Another draft of Merklist; remove unused images

infogulch 2021-07-08 22:03:08 +00:00
parent 1e057f0931
commit 53a6f8d5cd
4 changed files with 116 additions and 63 deletions


@ -5,7 +5,11 @@
"id": "83dd7287-bca5-49f9-b927-31bbc519d5b9",
"metadata": {},
"source": [
"# Merklist"
"# Merklist\n",
"> A definition for the hash of a list that is robust to arbitrary partitioning\n",
"\n",
"- toc: true\n",
"- categories: [merklist]"
]
},
{
@ -13,7 +17,7 @@
"id": "bf97974c-5582-4bf5-8ed8-6c43daf5036c",
"metadata": {},
"source": [
"Using matrix multiplication's associativity and non-commutativity properties provides a natural definition of a cryptographic hash / digest / summary of an ordered list of elements. Due to the non-commutativity property, lists that only differ in element order result in a different summary. Due to the associativity property, arbitrarily divided adjacent sub-lists can be summarized independently and combined to quickly find the summary of their concatenation. This definition provides exactly the properties needed to define a list, and does not impose any unnecessary structure that could cause two equivalent lists to produce different summaries. The name *Merklist* is intended to be reminicent of other hash-based data structures like [Merkle Tree](https://en.wikipedia.org/wiki/Merkle_tree) and [Merklix Tree](https://www.deadalnix.me/2016/09/24/introducing-merklix-tree-as-an-unordered-merkle-tree-on-steroid/)."
"Matrix multiplication's associativity and non-commutativity properties provide a natural definition for a [cryptographic hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) / digest / summary of an ordered list of elements while preserving concatenation operations. Due to the non-commutativity property, lists that differ in element order result in a different summary. Due to the associativity property, arbitrarily divided adjacent sub-lists can be summarized independently and combined to find the summary of their concatenation in one operation. This definition provides exactly the properties needed to define a list, and does not impose any unnecessary structure that could cause two equivalent lists to produce different summaries. The name *Merklist* is intended to be reminicent of other hash-based data structures like [Merkle Tree](https://en.wikipedia.org/wiki/Merkle_tree) and [Merklix Tree](https://www.deadalnix.me/2016/09/24/introducing-merklix-tree-as-an-unordered-merkle-tree-on-steroid/)."
]
},
{
@ -34,10 +38,10 @@
"This construction has a couple notable concequences:\n",
"\n",
"* The hash of a list with only one item is just the hash of the item itself.\n",
"* You can calculate the hash of any list concatenated with itself by matrix multiplication of the the hash with itself. This works for single elements as well as arbitrarily long lists.\n",
"* A list can have multiple copies of the same list item, and swapping them does not affect the list hash. Consider how swapping the first two elements in `[1, 1, 2]` doesn't change it.\n",
"* Concatenating two lists is accomplished by matrix multiplication of their hashes, in the correct order.\n",
"* Appending or prepending lists of 0 elements yields the same hash, as expected.\n",
"* You can calculate the hash of any list concatenated with a copy of itself by matrix multiplication of the the hash with itself. This works for single elements as well as arbitrarily long lists.\n",
"* A list can have multiple copies of the same list item, and swapping them does not affect the list hash. Consider how swapping the first two elements in `[1, 1, 2]` has no discernible effect.\n",
"* The hash of the concatenation of two lists is the matrix multiplication of their hashes.\n",
"* Concatenating a list with a list of 0 elements yields the same hash.\n",
"\n",
"Lets explore this definition in more detail with a simple implementation in python+numpy."
]
@ -47,12 +51,15 @@
"execution_count": 1,
"id": "99b521d8-1c66-49d7-98e9-6fa1d8d7c18f",
"metadata": {
"jupyter": {
"source_hidden": true
},
"tags": []
},
"outputs": [],
"source": [
"# setup\n",
"\n",
"#collapse-hide\n",
"# Setup and imports\n",
"import hashlib\n",
"import numpy as np\n",
"from functools import reduce\n",
@ -70,7 +77,7 @@
"metadata": {},
"source": [
"### The hash of a list element - `hash_m/1`\n",
"The function `hash_m/1` takes a buffer of bytes as its first argument, and returns the sha512 hash of the bytes formatted as an 8×8 2-d array of 8-bit unsigned integers with wrapping overflow. **This is the hash of a list element consisting of those bytes.** Based on a shallow wikipedia dive, someone familiar with linear algebra might say it's a [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring), $R_{256}^{8×8}$. Not coincidentally, sha512 outputs 512 bits = 64 bytes = 8 * 8 array of bytes, how convenient. (In fact, that might even be the primary reason why I chose sha512!)"
"The function `hash_m/1` takes a buffer of bytes as its first argument, and returns the sha512 hash of the bytes formatted as an 8×8 2-d array of 8-bit unsigned integers with wrapping overflow. **We define this hash to be the hash of the list element.** Based on a shallow wikipedia dive, someone familiar with linear algebra might say it's a [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring), $R_{256}^{8×8}$. Not coincidentally, sha512 outputs 512 bits = 64 bytes = 8 * 8 array of bytes, how convenient. (In fact, that might even be the primary reason why I chose sha512!)"
]
},
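The body of `hash_m` is not visible in this diff, so the following is only a minimal sketch of what the description implies: hash the bytes with sha512 and view the 64-byte digest as an 8×8 array of `uint8`. The use of `np.frombuffer` and the row-major layout are assumptions, not necessarily the notebook's choices.

```python
import hashlib

import numpy as np


def hash_m(data: bytes) -> np.ndarray:
    """Hash of a single list element (sketch, assumed layout)."""
    digest = hashlib.sha512(data).digest()  # always 64 bytes
    # View the digest as an 8x8 matrix of unsigned bytes, row by row.
    # uint8 arithmetic in numpy wraps around modulo 256.
    return np.frombuffer(digest, dtype=np.uint8).reshape(8, 8)


print(hash_m(b"Hello A"))
```

The returned array is a read-only view of the digest, which is enough for multiplying hashes together.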
{
@ -105,6 +112,9 @@
"execution_count": 3,
"id": "65aa7c7a-25d5-4971-8780-661f367e45ab",
"metadata": {
"jupyter": {
"source_hidden": true
},
"slideshow": {
"slide_type": "skip"
},
@ -136,6 +146,7 @@
}
],
"source": [
"#collapse-hide\n",
"print(hash_m(b\"Hello A\"))\n",
"print()\n",
"print(hash_m(b\"Hello B\"))"
@ -171,26 +182,42 @@
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6ae6ea62-8fb9-4015-8a62-4b5c3dbd98d3",
"execution_count": 17,
"id": "eb84e6e1-b1c1-48f4-aa50-3ae0edfc78af",
"metadata": {},
"outputs": [],
"source": [
"# list1 contains 3 elements\n",
"#\n",
"# `elements` is a list of 3 elements\n",
"elements = [b\"A\", b\"Hello\", b\"World\"]\n",
"# first hash each element\n",
"# first, make a new list with the hash of each element\n",
"element_hashes = [hash_m(e) for e in elements]\n",
"# get the hash of the list by reducing the hashes by matrix multiplication\n",
"list_hash1 = mul_m(mul_m(element_hashes[0], element_hashes[1]), element_hashes[2])\n",
"# an alternative way to write the reduction\n",
"list_hash2 = reduce(mul_m, element_hashes)"
"list_hash2 = reduce(mul_m, element_hashes)\n",
"# check that these alternative spellings are equivalent\n",
"assert_equal(list_hash1, list_hash2)"
]
},
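The reduction above can be wrapped in one helper to check some of the consequences listed earlier. This is a sketch under assumptions: `hash_m` and `mul_m` are re-declared here because their notebook definitions are not part of this diff, and `list_hash` is a hypothetical name. The identity matrix is used as the hash of the empty list.

```python
import hashlib
from functools import reduce

import numpy as np


def hash_m(data: bytes) -> np.ndarray:
    # Assumed definition: sha512 digest viewed as an 8x8 uint8 matrix.
    return np.frombuffer(hashlib.sha512(data).digest(), dtype=np.uint8).reshape(8, 8)


def mul_m(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Assumed definition: uint8 matrix product, wrapping modulo 256.
    return np.matmul(a, b)


def identity_m() -> np.ndarray:
    return np.identity(8, dtype=np.uint8)


def list_hash(elements) -> np.ndarray:
    """Hypothetical helper: reduce element hashes left to right."""
    return reduce(mul_m, (hash_m(e) for e in elements), identity_m())


# A one-element list hashes to the hash of that element.
assert np.array_equal(list_hash([b"A"]), hash_m(b"A"))
# The empty list hashes to the identity, so concatenating it changes nothing.
assert np.array_equal(list_hash([]), identity_m())
# Swapping two equal elements has no effect; swapping different ones does.
assert np.array_equal(list_hash([b"1", b"1", b"2"]), list_hash([b"1", b"1", b"2"][::-1][::-1]))
assert not np.array_equal(list_hash([b"A", b"Hello"]), list_hash([b"Hello", b"A"]))
```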
{
"cell_type": "markdown",
"id": "5632eb48-e5cd-4ec4-9bcd-1b59a5dc6042",
"metadata": {},
"source": [
"> Expand the sections below to see a comparison"
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 18,
"id": "694b4727-621e-4c1b-a2af-99296a8e664a",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true,
"source_hidden": true
},
"tags": []
},
"outputs": [
@ -238,13 +265,17 @@
}
],
"source": [
"#collapse-hide\n",
"#collapse-output\n",
"print(\"List of elements:\")\n",
"print(elements)\n",
"print(\"\\nHash of each element:\")\n",
"print()\n",
"print(\"Hash of each element:\")\n",
"print(element_hashes)\n",
"print(\"\\nHash of full list:\")\n",
"print()\n",
"print(\"Hash of full list:\")\n",
"print(list_hash1)\n",
"assert_equal(list_hash1, list_hash2)"
"# Expand the section below to see the output"
]
},
{
@ -257,18 +288,18 @@
"* [Associativity](#Associativity) - Associativity enables you to reduce a computation using any partitioning because all partitionings yield the same result. Addition is associative $(1+2)+3 = 1+(2+3)$, subtraction is not $(5-3)-2\\neq5-(3-2)$. ([Associative property](https://en.wikipedia.org/wiki/Associative_property))\n",
"* [Non-Commutativity](#Non-Commutativity) - Commutativity allows you to swap elements without affecting the result. Addition is commutative $1+2 = 2+1$, but division is not $1\\div2 \\neq2\\div1$. And neither is matrix multiplication. ([Commutative property](https://en.wikipedia.org/wiki/Commutative_property))\n",
"\n",
"This is an unusual combination of properties for an operation, at least not a combination encountered under normal algebra operations:\n",
"This is an unusual combination of properties for an operation. It's at least not a combination encountered in introductory algebra:\n",
"\n",
"| | associative | commutative |\n",
"| --- | --- | --- |\n",
"| + | ✅ | ✅ |\n",
"| * | ✅ | ✅ |\n",
"| - | ❌ | ❌ |\n",
"| / | ❌ | ❌ |\n",
"| exp | ❌ | ❌ |\n",
"| M×M | ✅ | ❌ |\n",
"| $a+b$ | ✅ | ✅ |\n",
"| $a*b$ | ✅ | ✅ |\n",
"| $a-b$ | ❌ | ❌ |\n",
"| $a/b$ | ❌ | ❌ |\n",
"| $a^b$ | ❌ | ❌ |\n",
"| $M×M$ | ✅ | ❌ |\n",
"\n",
"Upon consideration, these are the exact properties that one would want in order to define the hash of a list of items. Non-commutativity enables the order of elements in the list to be well-defined, since swapping different elements produces a different hash. Associativity enables caching the summary of an arbitrary sublist; I expect that doing this heirarchally on a huge list enables an algorithm to calculate the hash of any sublist at the cost of `O(log(N))` time and space.\n",
"Upon consideration, these are the exact properties that one would want in order to define the hash of a list of items. Non-commutativity enables the order of elements in the list to be well defined, since swapping different elements produces a different hash. Associativity enables calculating the hash of the list by performing the reduction operations in any order, and you still get the same hash.\n",
"\n",
"Lets sanity-check that these properties can hold for the construction described above."
]
@ -278,7 +309,9 @@
"id": "c6c8ef5e-99d2-4a7e-887f-54b93a7baf4a",
"metadata": {},
"source": [
"### Associativity"
"### Associativity\n",
"\n",
"If it's associative, we should get the same hash if we rearrange the parenthesis to indicate reduction in a different operation order. That is: $((e1 × e2) × e3) = (e1 × (e2 × e3))$"
]
},
{
@ -290,35 +323,47 @@
},
"outputs": [],
"source": [
"f1 = hash_m(b\"Hello A\")\n",
"f2 = hash_m(b\"Hello B\")\n",
"f3 = hash_m(b\"Hello C\")"
"e1 = hash_m(b\"Hello A\")\n",
"e2 = hash_m(b\"Hello B\")\n",
"e3 = hash_m(b\"Hello C\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b631007c-3784-4c32-9c3e-447267c45a24",
"id": "0452955b-2d7e-41e4-924f-8f00ef0c46cf",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# x is calculated by association ((f1 × f2) × f3)\n",
"x = np.matmul(np.matmul(f1, f2), f3)\n",
"x = np.matmul(np.matmul(e1, e2), e3)\n",
"y = np.matmul(e1, np.matmul(e2, e3))\n",
"\n",
"# y is calculated by association (f1 × (f2 × f3))\n",
"y = np.matmul(f1, np.matmul(f2, f3))\n",
"\n",
"# observe that they produce the same result\n",
"# observe that they produce the same summary\n",
"assert_equal(x, y)"
]
},
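Associativity is what makes the hash robust to arbitrary partitioning: cut a list at any points, summarize each piece on its own, and the product of the piece summaries equals the summary of the whole list. The sketch below checks this for a few cut patterns. `split_at` and `list_hash` are hypothetical helpers, and `hash_m` / `mul_m` are assumed definitions since the notebook's own are not shown in this diff.

```python
import hashlib
from functools import reduce

import numpy as np


def hash_m(data: bytes) -> np.ndarray:
    # Assumed definition: sha512 digest viewed as an 8x8 uint8 matrix.
    return np.frombuffer(hashlib.sha512(data).digest(), dtype=np.uint8).reshape(8, 8)


def mul_m(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.matmul(a, b)  # uint8 product, wraps modulo 256


def list_hash(elements) -> np.ndarray:
    # The identity matrix stands in for the hash of an empty list.
    return reduce(mul_m, (hash_m(e) for e in elements), np.identity(8, dtype=np.uint8))


def split_at(items, cuts):
    """Split items into adjacent sub-lists at the given sorted cut indexes."""
    bounds = [0, *cuts, len(items)]
    return [items[a:b] for a, b in zip(bounds, bounds[1:])]


elements = [b"item %d" % i for i in range(10)]
whole = list_hash(elements)

for cuts in ([], [1], [5], [3, 7], [0, 2, 2, 9]):
    # Summarize each partition independently, then combine in order.
    summaries = [list_hash(part) for part in split_at(elements, cuts)]
    assert np.array_equal(reduce(mul_m, summaries), whole)
```

Empty partitions (as in the last cut pattern) contribute the identity matrix and leave the result unchanged.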
{
"cell_type": "markdown",
"id": "bda2bacf-daf7-4f42-93cf-c422704dc067",
"metadata": {
"tags": []
},
"source": [
"> Expand the sections below to see a comparison"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b7a1906d-524c-4339-920a-978a0385d6cc",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true,
"source_hidden": true
},
"tags": []
},
"outputs": [
@ -347,6 +392,8 @@
}
],
"source": [
"#collapse-hide\n",
"#collapse-output\n",
"print(x)\n",
"print()\n",
"print(y)"
@ -357,33 +404,47 @@
"id": "c0fb04da-2cbd-4fa1-8b85-d48441cc8962",
"metadata": {},
"source": [
"### Non-Commutativity"
"### Non-Commutativity\n",
"\n",
"If it's not commutative, then swapping different elements should produce a different hash. That is, $e1 × e2 \\ne e2 × e1$:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "182652ac-a08f-4009-9354-7aaa8632d921",
"id": "2f3d139a-6c48-4ddb-9a34-7b2aa00853d6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# x is f1 × f2\n",
"x = np.matmul(f1, f2)\n",
"x = np.matmul(e1, e2)\n",
"y = np.matmul(e2, e1)\n",
"\n",
"# y is f2 × f1\n",
"y = np.matmul(f2, f1)\n",
"\n",
"# observe that they produce different results\n",
"# observe that they produce different summaries\n",
"assert_not_equal(x, y)"
]
},
{
"cell_type": "markdown",
"id": "a8c05183-08db-44f1-b0cc-5fb4aed77b1b",
"metadata": {
"tags": []
},
"source": [
"> Expand the sections below to see a comparison"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7f833e44-79d8-4c98-af41-0c915bee66ed",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true,
"source_hidden": true
},
"tags": []
},
"outputs": [
@ -412,6 +473,8 @@
}
],
"source": [
"#collapse-hide\n",
"#collapse-output\n",
"print(x)\n",
"print()\n",
"print(y)"
@ -451,7 +514,7 @@
"def identity_m():\n",
" return np.identity(8, dtype=np.uint8)\n",
"\n",
"# generalize to any length, not just doublings, performed in ln(N) matmuls\n",
"# generalize double_m to any length, not just doublings, performed in ln(N) matmuls\n",
"def repeat_m(m, n):\n",
" res = identity_m()\n",
" while n > 0:\n",
@ -506,25 +569,14 @@
" [ 19 135 80 115 75 242 242 5]\n",
" [244 165 250 28 76 43 188 254]\n",
" [233 46 187 39 151 241 175 130]\n",
" [132 138 6 215 20 132 89 33]]\n",
"\n",
"[[1 0 0 0 0 0 0 0]\n",
" [0 1 0 0 0 0 0 0]\n",
" [0 0 1 0 0 0 0 0]\n",
" [0 0 0 1 0 0 0 0]\n",
" [0 0 0 0 1 0 0 0]\n",
" [0 0 0 0 0 1 0 0]\n",
" [0 0 0 0 0 0 1 0]\n",
" [0 0 0 0 0 0 0 1]]\n"
" [132 138 6 215 20 132 89 33]]\n"
]
}
],
"source": [
"print(hash1)\n",
"print()\n",
"print(hash2)\n",
"print()\n",
"print(np.identity(8,\"B\"))"
"print(hash2)"
]
},
{
@ -567,7 +619,9 @@
"id": "6c62dcbc-63a7-4a20-8f15-295e7675f7a8",
"metadata": {},
"source": [
"Flex that associativity `(a × (a499 × b × a500) × (a500 × b × a499) × a)` = `(a500 × b × (a500 × a500) × b × a500)`"
"Flex that associativity - this statement is true and equivalent to the assertion below:\n",
"\n",
"$(a × (a499 × b × a500) × (a500 × b × a499) × a) = (a500 × b × (a500 × a500) × b × a500)$"
]
},
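Only the first lines of `repeat_m` appear in this diff, so the loop body below is an assumption: a standard square-and-multiply that needs O(log n) matrix products. The sketch then checks the identity stated above, with `a499` and `a500` denoting 499 and 500 repetitions of `a`. `hash_m` and `mul_m` are likewise assumed definitions.

```python
import hashlib
from functools import reduce

import numpy as np


def hash_m(data: bytes) -> np.ndarray:
    # Assumed definition: sha512 digest viewed as an 8x8 uint8 matrix.
    return np.frombuffer(hashlib.sha512(data).digest(), dtype=np.uint8).reshape(8, 8)


def mul_m(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.matmul(a, b)  # uint8 product, wraps modulo 256


def identity_m() -> np.ndarray:
    return np.identity(8, dtype=np.uint8)


def repeat_m(m: np.ndarray, n: int) -> np.ndarray:
    """Hash of a list repeated n times (assumed square-and-multiply body)."""
    res = identity_m()
    while n > 0:
        if n & 1:            # this bit of n is set: fold in the current power
            res = mul_m(res, m)
        m = mul_m(m, m)      # m, m^2, m^4, ...
        n >>= 1
    return res


a = hash_m(b"A")
b = hash_m(b"B")
a499, a500 = repeat_m(a, 499), repeat_m(a, 500)

lhs = reduce(mul_m, [a, reduce(mul_m, [a499, b, a500]), reduce(mul_m, [a500, b, a499]), a])
rhs = reduce(mul_m, [a500, b, mul_m(a500, a500), b, a500])
assert np.array_equal(lhs, rhs)
```

Both sides describe the same list: 500 copies of `a`, then `b`, then 1000 copies of `a`, then `b`, then 500 copies of `a`.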
{
@ -589,9 +643,8 @@
"\n",
"This appears to me to be a reasonable way to define the hash of a list. The mathematical definition of a list aligns very nicely with the properties offered by matrix multiplication. But is it appropriate to use for the same things that a Merkle Tree would be? The big questions are related to the valuable properties of hash functions:\n",
"\n",
"* Given a merklist summary or sublist summaries of it, can you derive the hashes of elements or their order? (Elements themselves are protected by the preimage resistance of the underlying hash function.)\n",
" * If yes, when is that a problem?\n",
"* Given a merklist summary but not the elements, is it possible to produce a different list of elements that hash to the same summary? (~preimage resistance)\n",
"* Given a merklist summary or sublist summaries of it, can you derive the hashes of elements or their order?\n",
"* Is it possible to predictably alter the merklist summary by concatenating it with some other sublist of real elements?\n",
"* Are there other desirable security properties that would be valuable for a list hash?\n",
"* Is there a better choice of hash function as a primitive than sha512?\n",
@ -614,7 +667,7 @@
"* Using a Merklist summary tree you can create a consistent hash of any ordered key-value store (like a btree) that can be maintained incrementally inline with regular node updates, e.g. as part of a [LSM-tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree). This could facilitate verification and sync between database replicas.\n",
"* The sublist summary tree structure can be as dense or sparse as desired. You could summarize down to pairs of elements akin to a merkle tree, but you could also summarize a compressed sublist of hundreds or even millions of elements with a single hash. Of course, calculating or verifying a proof of changes to the middle of that sublist would require rehashing the whole sublist, but this turns it from a fixed structure into a tuneable parameter.\n",
"* If all possible elements had an easily calculatable inverse, that would enable \"subtracting\" an element by inserting its inverse in front of it. That would basically extend the group from a ring into a field, and might have interesting implications.\n",
" * For example you could define a cryptographically-secure rolling hash where advancing either end can be calculated in `O(1)` time.\n",
" * For example you could define a cryptographically-secure rolling hash where advancing either end can be calculated in `O(1)` time and space.\n",
"\n",
"To be continued..."
]
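As a rough illustration of the summary tree idea in the list above, the sketch below caches sublist summaries in a binary tree so that the hash of any contiguous sublist, or the effect of replacing one element, costs O(log N) matrix products. Everything here is hypothetical: the `SummaryTree` class, its method names, and the padding with identity matrices are illustration choices, and `hash_m` / `mul_m` are assumed definitions not shown in this diff.

```python
import hashlib
from functools import reduce

import numpy as np


def hash_m(data: bytes) -> np.ndarray:
    # Assumed definition: sha512 digest viewed as an 8x8 uint8 matrix.
    return np.frombuffer(hashlib.sha512(data).digest(), dtype=np.uint8).reshape(8, 8)


def mul_m(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.matmul(a, b)  # uint8 product, wraps modulo 256


def identity_m() -> np.ndarray:
    return np.identity(8, dtype=np.uint8)


class SummaryTree:
    """Hypothetical binary tree of cached sublist summaries."""

    def __init__(self, elements):
        self.size = 1
        while self.size < len(elements):
            self.size *= 2
        # Leaves live at indexes size..2*size-1; unused leaves hold the identity.
        self.nodes = [identity_m()] * (2 * self.size)
        for i, e in enumerate(elements):
            self.nodes[self.size + i] = hash_m(e)
        for i in range(self.size - 1, 0, -1):
            self.nodes[i] = mul_m(self.nodes[2 * i], self.nodes[2 * i + 1])

    def update(self, index, element):
        """Replace one element and refresh the summaries above it."""
        i = self.size + index
        self.nodes[i] = hash_m(element)
        i //= 2
        while i >= 1:
            self.nodes[i] = mul_m(self.nodes[2 * i], self.nodes[2 * i + 1])
            i //= 2

    def range_hash(self, start, stop):
        """Summary of elements[start:stop], combining cached nodes in order."""
        left, right = identity_m(), identity_m()
        lo, hi = start + self.size, stop + self.size
        while lo < hi:
            if lo & 1:  # lo is a right child: take it, append on the left side
                left = mul_m(left, self.nodes[lo])
                lo += 1
            if hi & 1:  # hi - 1 is a left child: take it, prepend on the right side
                hi -= 1
                right = mul_m(self.nodes[hi], right)
            lo //= 2
            hi //= 2
        return mul_m(left, right)


elements = [b"item %d" % i for i in range(10)]
tree = SummaryTree(elements)
expected = reduce(mul_m, [hash_m(e) for e in elements[2:9]])
assert np.array_equal(tree.range_hash(2, 9), expected)
```

Because the operation is not commutative, the query keeps separate left and right accumulators so the cached summaries are always multiplied in list order.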

Binary file not shown (removed image, 42 KiB).

Binary file not shown (removed image, 52 KiB).

Binary file not shown (removed image, 7.0 KiB).