You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 

630 lines
21 KiB

{
"cells": [
{
"cell_type": "markdown",
"id": "2bdec887-ee29-4bef-8978-88a81940f7bc",
"metadata": {},
"source": [
"# Merklist Tree\n",
"Using matrix multiplication's associativity and non-commutativity to construct a digest / summary of an ordered list of elements where mutations to the list can be computed, verified, and stored using $O(log(N))$ time and space. Due to the associativity property, arbitrarily divided adjacent sub-lists can be summarized independently and combined to find the summary of their concatenation."
]
},
{
"cell_type": "markdown",
"id": "3f17d376-b03f-498b-a794-ea566e0b63f7",
"metadata": {},
"source": [
"## Construction"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "99b521d8-1c66-49d7-98e9-6fa1d8d7c18f",
"metadata": {
"jupyter": {
"source_hidden": true
},
"tags": []
},
"outputs": [],
"source": [
"# setup\n",
"\n",
"import hashlib\n",
"import numpy as np\n",
"from functools import reduce\n",
"\n",
"def assert_equal(a, b):\n",
" return np.testing.assert_equal(a, b)\n",
"\n",
"def assert_not_equal(a, b):\n",
" return np.testing.assert_raises(AssertionError, np.testing.assert_equal, a, b)"
]
},
{
"cell_type": "markdown",
"id": "fc1306b8-5e89-460a-997c-c9464c16615d",
"metadata": {},
"source": [
"### hash_m/1\n",
"The function `hash_m/1` takes a buffer of bytes as its first argument, and returns the sha512 hash of the bytes formatted as an 8×8 2-d array of 8-bit unsigned integers with wrapping overflow. Based on a shallow wikipedia dive, someone familiar with linear algebra might say it's a [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring), $R_{256}^{8×8}$. Not coincidentally, sha512 outputs 512 bits = 64 bytes = 8 * 8 array of bytes, how convenient. That might even be the primary reason why I chose sha512."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3ccc7fdc-fa6a-48e3-accb-3c1070b4559c",
"metadata": {},
"outputs": [],
"source": [
"def hash_m(s):\n",
" hash_bytes = list(hashlib.sha512(s).digest())[:64]\n",
" return np.array(hash_bytes, dtype=np.uint8).reshape((8,8))"
]
},
{
"cell_type": "markdown",
"id": "04132091-21b1-4fbb-99df-711ae5e0c819",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"8×8 seems big compared to 3×3 or 4×4 matrixes. The values are as random as you might expect a cryptographic hash to be:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "65aa7c7a-25d5-4971-8780-661f367e45ab",
"metadata": {
"jupyter": {
"source_hidden": true
},
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 14 184 108 217 131 164 222 93]\n",
" [132 227 82 144 111 178 195 109]\n",
" [ 25 250 155 17 131 183 151 217]\n",
" [212 60 138 36 0 60 115 181]\n",
" [ 51 0 87 43 93 252 56 61]\n",
" [108 239 175 222 23 142 41 216]\n",
" [203 98 234 13 65 169 255 240]\n",
" [ 46 127 15 167 112 153 222 94]]\n",
"\n",
"[[ 63 144 188 5 57 146 32 56]\n",
" [ 27 189 98 140 113 194 70 87]\n",
" [115 21 136 27 116 167 85 48]\n",
" [ 29 162 119 29 104 32 145 241]\n",
" [166 197 57 165 132 213 50 202]\n",
" [ 48 71 33 19 230 26 58 164]\n",
" [242 172 65 202 193 50 193 141]\n",
" [206 110 165 129 52 132 250 73]]\n"
]
}
],
"source": [
"print(hash_m(b\"Hello A\"))\n",
"print()\n",
"print(hash_m(b\"Hello B\"))"
]
},
{
"cell_type": "markdown",
"id": "c0c37110-b38d-4420-adf9-11ff5c5cd590",
"metadata": {},
"source": [
"## List\n",
"Ok so we've formatted hashes of bytes into matrixes, but we haven't actually done anything with them yet.\n",
"\n",
"Consider a list of arbitrarily many arbitrary byte buffers. **Define the 'hash' of the list to be reduction by matrix multiplication of the hash of each byte buffer.**"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "91afe2ad-19dc-475c-ad8b-17b70ba9fb79",
"metadata": {},
"outputs": [],
"source": [
"def mul_m(hm1, hm2):\n",
" return np.matmul(hm1, hm2, dtype=np.uint8)"
]
},
{
"cell_type": "markdown",
"id": "39638a4a-6a42-4710-bcd2-f4a41c24f4cf",
"metadata": {},
"source": [
"Consider an example:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6ae6ea62-8fb9-4015-8a62-4b5c3dbd98d3",
"metadata": {},
"outputs": [],
"source": [
"# list1 contains 3 elements\n",
"list1 = [b\"A\", b\"Hello\", b\"World\"]\n",
"# first hash each element\n",
"hashes1 = [hash_m(e) for e in list1]\n",
"# get the hash of the list by reducing the hashes by matrix multiplication\n",
"hash1 = mul_m(mul_m(hashes1[0], hashes1[1]), hashes1[2])\n",
"# an alternative way to write the reduction\n",
"hash2 = reduce(mul_m, hashes1)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "694b4727-621e-4c1b-a2af-99296a8e664a",
"metadata": {
"jupyter": {
"source_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"List of byte buffers:\n",
"[b'A', b'Hello', b'World']\n",
"\n",
"Hashes of byte buffers:\n",
"[array([[ 33, 180, 244, 189, 158, 100, 237, 53],\n",
" [ 92, 62, 182, 118, 162, 142, 190, 218],\n",
" [246, 216, 241, 123, 220, 54, 89, 149],\n",
" [179, 25, 9, 113, 83, 4, 64, 128],\n",
" [ 81, 107, 208, 131, 191, 204, 230, 97],\n",
" [ 33, 163, 7, 38, 70, 153, 76, 132],\n",
" [ 48, 204, 56, 43, 141, 197, 67, 232],\n",
" [ 72, 128, 24, 59, 248, 86, 207, 245]], dtype=uint8), array([[ 54, 21, 248, 12, 157, 41, 62, 215],\n",
" [ 64, 38, 135, 249, 75, 34, 213, 142],\n",
" [ 82, 155, 140, 199, 145, 111, 143, 172],\n",
" [127, 221, 247, 251, 213, 175, 76, 247],\n",
" [119, 211, 215, 149, 167, 160, 10, 22],\n",
" [191, 126, 127, 63, 185, 86, 30, 233],\n",
" [186, 174, 72, 13, 169, 254, 122, 24],\n",
" [118, 158, 113, 136, 107, 3, 243, 21]], dtype=uint8), array([[142, 167, 115, 147, 164, 42, 184, 250],\n",
" [146, 80, 15, 176, 119, 169, 80, 156],\n",
" [195, 43, 201, 94, 114, 113, 46, 250],\n",
" [ 17, 110, 218, 242, 237, 250, 227, 79],\n",
" [187, 104, 46, 253, 214, 197, 221, 19],\n",
" [193, 23, 224, 139, 212, 170, 239, 113],\n",
" [ 41, 29, 138, 172, 226, 248, 144, 39],\n",
" [ 48, 129, 208, 103, 124, 22, 223, 15]], dtype=uint8)]\n",
"\n",
"Hash of full list:\n",
"[[178 188 57 157 60 136 190 127]\n",
" [ 40 234 254 224 38 46 250 52]\n",
" [156 72 193 136 219 98 28 4]\n",
" [197 2 43 132 132 232 254 198]\n",
" [ 93 64 113 215 2 246 130 192]\n",
" [ 91 107 85 13 149 60 19 173]\n",
" [ 84 77 244 98 0 239 123 17]\n",
" [ 58 112 98 250 163 20 27 6]]\n"
]
}
],
"source": [
"print(\"List of byte buffers:\")\n",
"print(list1)\n",
"print(\"\\nHashes of byte buffers:\")\n",
"print(hashes1)\n",
"print(\"\\nHash of full list:\")\n",
"print(hash1)\n",
"assert_equal(hash1, hash2)"
]
},
{
"cell_type": "markdown",
"id": "de064a80-208d-4850-b95e-c5a707f7f3b3",
"metadata": {},
"source": [
"What does this give us? Generally speaking, multiplying two matrixes $M_1×M_2$ gives us at least these two properties:\n",
"\n",
"* [Associativity](#Associativity) - Associativity enables you to reduce a computation using any partitioning because all partitionings yield the same result. Addition is associative $(1+2)+3 = 1+(2+3)$, subtraction is not $(5-3)-2\\neq5-(3-2)$. ([Associative property](https://en.wikipedia.org/wiki/Associative_property))\n",
"* [Non-Commutativity](#Non-Commutativity) - Commutativity allows you to swap elements without affecting the result. Addition is commutative $1+2 = 2+1$, but division is not $1\\div2 \\neq2\\div1$. And neither is matrix multiplication. ([Commutative property](https://en.wikipedia.org/wiki/Commutative_property))\n",
"\n",
"This is an unusual combination of properties, at least not a combination encountered under normal algebra operations:\n",
"\n",
"| | associative | commutative |\n",
"| --- | --- | --- |\n",
"| + | ✅ | ✅ |\n",
"| * | ✅ | ✅ |\n",
"| - | ❌ | ❌ |\n",
"| / | ❌ | ❌ |\n",
"| exp | ❌ | ❌ |\n",
"| M×M | ✅ | ❌ |\n",
"\n",
"Upon consideration, these are the exact properties that one would want in order to define the hash of a list of items. Non-commutativity enables the order of elements in the list to be defined, since swapping them produces a different hash. Associativity enables caching the summary of an arbitrary sublist; doing this heirarchally on a huge list enables an algorithm to calculate the hash of any sublist at the cost of `O(log(N))` time and space."
]
},
{
"cell_type": "markdown",
"id": "c6c8ef5e-99d2-4a7e-887f-54b93a7baf4a",
"metadata": {},
"source": [
"### Associativity"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6da02d5e-a783-4a57-90ac-04a654d89006",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"f1 = hash_m(b\"Hello A\")\n",
"f2 = hash_m(b\"Hello B\")\n",
"f3 = hash_m(b\"Hello C\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b631007c-3784-4c32-9c3e-447267c45a24",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# x is calculated by association ((f1 × f2) × f3)\n",
"x = np.matmul(np.matmul(f1, f2), f3)\n",
"\n",
"# y is calculated by association (f1 × (f2 × f3))\n",
"y = np.matmul(f1, np.matmul(f2, f3))\n",
"\n",
"# observe that they produce the same result\n",
"assert_equal(x, y)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b7a1906d-524c-4339-920a-978a0385d6cc",
"metadata": {
"jupyter": {
"source_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 58 12 144 134 100 158 159 51]\n",
" [ 73 206 202 190 87 79 223 2]\n",
" [210 122 142 117 37 148 106 45]\n",
" [175 146 187 223 235 171 64 226]\n",
" [149 85 203 87 92 251 243 206]\n",
" [ 18 252 160 103 125 251 181 133]\n",
" [191 132 220 104 213 154 34 154]\n",
" [127 197 95 87 166 3 22 3]]\n",
"\n",
"[[ 58 12 144 134 100 158 159 51]\n",
" [ 73 206 202 190 87 79 223 2]\n",
" [210 122 142 117 37 148 106 45]\n",
" [175 146 187 223 235 171 64 226]\n",
" [149 85 203 87 92 251 243 206]\n",
" [ 18 252 160 103 125 251 181 133]\n",
" [191 132 220 104 213 154 34 154]\n",
" [127 197 95 87 166 3 22 3]]\n"
]
}
],
"source": [
"print(x)\n",
"print()\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"id": "c0fb04da-2cbd-4fa1-8b85-d48441cc8962",
"metadata": {},
"source": [
"### Non-Commutativity"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "182652ac-a08f-4009-9354-7aaa8632d921",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# x is f1 × f2\n",
"x = np.matmul(f1, f2)\n",
"\n",
"# y is f2 × f1\n",
"y = np.matmul(f2, f1)\n",
"\n",
"# observe that they produce different results\n",
"assert_not_equal(x, y)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7f833e44-79d8-4c98-af41-0c915bee66ed",
"metadata": {
"jupyter": {
"source_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 87 79 149 131 148 247 195 90]\n",
" [249 84 195 58 142 133 211 15]\n",
" [177 93 69 254 240 234 97 37]\n",
" [ 46 84 76 253 55 200 43 236]\n",
" [ 21 84 99 157 55 148 170 2]\n",
" [168 123 6 250 64 144 54 242]\n",
" [230 78 164 76 30 29 214 68]\n",
" [ 47 183 156 239 157 177 192 184]]\n",
"\n",
"[[149 18 239 238 84 188 191 109]\n",
" [239 150 214 235 59 161 9 133]\n",
" [ 89 174 59 14 70 113 124 243]\n",
" [ 66 113 176 124 227 247 17 25]\n",
" [247 138 152 181 177 143 184 97]\n",
" [113 249 199 153 154 75 45 105]\n",
" [121 201 225 42 249 213 180 244]\n",
" [ 85 31 72 28 181 182 140 176]]\n"
]
}
],
"source": [
"print(x)\n",
"print()\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"id": "2978d8f5-0c9e-445d-80d1-12229b589c24",
"metadata": {},
"source": [
"## Other functions"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "80ec6898-4163-4ba8-9460-c717a9e58c59",
"metadata": {},
"outputs": [],
"source": [
"# Create a list of 1024 elements and reduce them one by one\n",
"list1 = [hash_m(b\"A\") for _ in range(0, 1024)]\n",
"hash1 = reduce(mul_m, list1)\n",
"\n",
"# Take a starting element and square/double it 10 times. With 1 starting element over 10 doublings = 1024 elements\n",
"hash2 = reduce((lambda m, _ : mul_m(m, m)), range(0, 10), hash_m(b\"A\"))\n",
"\n",
"# Observe that these two methods of calculating the hash have the same result\n",
"assert_equal(hash1, hash2)\n",
"\n",
"# lets call it double\n",
"def double_m(m, d=1):\n",
" return reduce((lambda m, _ : mul_m(m, m)), range(0, d), m)\n",
"\n",
"assert_equal(hash1, double_m(hash_m(b\"A\"), 10))\n",
"\n",
"def identity_m():\n",
" return np.identity(8, dtype=np.uint8)\n",
"\n",
"# generalize to any length, not just doublings, performed in ln(N) matmuls\n",
"def repeat_m(m, n):\n",
" res = identity_m()\n",
" while n > 0:\n",
" # concatenate the current doubling iff the bit representing this doubling is set\n",
" if n & 1:\n",
" res = mul_m(res, m)\n",
" n >>= 1\n",
" m = mul_m(m, m) # double matrix m\n",
" # print(s)\n",
" return res\n",
"\n",
"# repeat_m can do the same as double_m\n",
"assert_equal(hash1, repeat_m(hash_m(b\"A\"), 1024))\n",
"\n",
"# but it can also repeat any number of times\n",
"hash3 = reduce(mul_m, (hash_m(b\"A\") for _ in range(0, 3309)))\n",
"assert_equal(hash3, repeat_m(hash_m(b\"A\"), 3309))\n",
"\n",
"# Even returns a sensible result when requesting 0 elements\n",
"assert_equal(identity_m(), repeat_m(hash_m(b\"A\"), 0))\n",
"\n",
"# make helper for reducing an iterable of hashes\n",
"def reduce_m(am):\n",
" return reduce(mul_m, am)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "84738470-61c9-44b5-b6b7-9971a02547bd",
"metadata": {
"jupyter": {
"source_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 68 252 159 3 14 52 199 199]\n",
" [136 124 6 34 58 174 206 54]\n",
" [ 3 234 2 13 120 240 7 163]\n",
" [102 47 66 61 87 234 246 72]\n",
" [ 19 135 80 115 75 242 242 5]\n",
" [244 165 250 28 76 43 188 254]\n",
" [233 46 187 39 151 241 175 130]\n",
" [132 138 6 215 20 132 89 33]]\n",
"\n",
"[[ 68 252 159 3 14 52 199 199]\n",
" [136 124 6 34 58 174 206 54]\n",
" [ 3 234 2 13 120 240 7 163]\n",
" [102 47 66 61 87 234 246 72]\n",
" [ 19 135 80 115 75 242 242 5]\n",
" [244 165 250 28 76 43 188 254]\n",
" [233 46 187 39 151 241 175 130]\n",
" [132 138 6 215 20 132 89 33]]\n",
"\n",
"[[1 0 0 0 0 0 0 0]\n",
" [0 1 0 0 0 0 0 0]\n",
" [0 0 1 0 0 0 0 0]\n",
" [0 0 0 1 0 0 0 0]\n",
" [0 0 0 0 1 0 0 0]\n",
" [0 0 0 0 0 1 0 0]\n",
" [0 0 0 0 0 0 1 0]\n",
" [0 0 0 0 0 0 0 1]]\n"
]
}
],
"source": [
"print(hash1)\n",
"print()\n",
"print(hash2)\n",
"print()\n",
"print(np.identity(8,\"B\"))"
]
},
{
"cell_type": "markdown",
"id": "b2a7b1fd-6d75-4790-ad46-b97767c4f98c",
"metadata": {},
"source": [
"# Exploring"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c9b7e5c8-db73-43e6-89ef-cc912ce1578d",
"metadata": {},
"outputs": [],
"source": [
"a = hash_m(b\"A\")\n",
"b = hash_m(b\"B\")\n",
"\n",
"a499 = repeat_m(a, 499)\n",
"a500 = repeat_m(a, 500)\n",
"\n",
"# this should work because they're all a's\n",
"assert_equal(reduce_m([a, a499]), a500)\n",
"assert_equal(reduce_m([a499, a]), a500)\n",
"\n",
"# these are lists of 999 elements of a, with one b at position 500 (x) or 501 (y)\n",
"x = reduce_m([a499, b, a500])\n",
"y = reduce_m([a500, b, a499])\n",
"\n",
"# shifting the b by one element changed the hash\n",
"assert_not_equal(x, y)"
]
},
{
"cell_type": "markdown",
"id": "6c62dcbc-63a7-4a20-8f15-295e7675f7a8",
"metadata": {},
"source": [
"Flex that associativity `(a × (a499 × b × a500) × (a500 × b × a499) × a)` = `(a500 × b × (a500 × a500) × b × a500)`"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "3a1602ec-db66-4ddf-95ec-80d0b3df7f58",
"metadata": {},
"outputs": [],
"source": [
"assert_equal(reduce_m([a, x, y, a]), reduce_m([a500, b, repeat_m(a500, 2), b, a500]))"
]
},
{
"cell_type": "markdown",
"id": "da076abe-bd14-4d5d-98be-2c8980e538e5",
"metadata": {},
"source": [
"## \"Merklist\"?\n",
"\n",
"Merklist ~ [Merklix](https://www.deadalnix.me/2016/09/24/introducing-merklix-tree-as-an-unordered-merkle-tree-on-steroid/) ~ [Merkle](https://en.wikipedia.org/wiki/Merkle_tree)"
]
},
{
"cell_type": "markdown",
"id": "4c4d4a83-8e2e-46d7-b2e3-2d59ba9c9e8c",
"metadata": {},
"source": [
"# \"Tree\"?\n",
"Ok great, so now we can construct a hash for a list that always produces the same hash for the same list, independent of which pairs in the list are reduced first. I think this enables verifyably adding or removing any number of elements at any point in the list with only $O(log(N))$ additional time and space, but what does that look like specifically? Or alternatively, where does the \"Tree\" part of \"Merlist Tree\" come in?\n",
"\n",
"To be continued..."
]
},
{
"cell_type": "markdown",
"id": "6cc30cb3-8079-4f8a-9b7c-a3b4f7e384a3",
"metadata": {},
"source": [
"# Conclusion / Security\n",
"This appears to me to be a reasonable way to define the hash of a list. Definitionally, it needs to preserve the order of elements; this is provided by the non-commutativity property. For efficiency, it would be nice if any two sublists can have a known equality; this is provided by the associativity property. It's not clear to me under what circumstances any information about what is contained in the list could be derived from the hash.\n",
"\n",
"Obviously, being \"not clear to me how\" is not a proof of impossibility. Matrixes are well-studied objects, perhaps such information is already known. If *you* know something about deriving the preimage of a [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring), $R_{256}^{8×8}$, I would be very interested to know.\n",
"\n",
"*If* this construction has the appropriate security properties, it seems to be a better merkle tree in all respects. Any use of a merkle tree could be replaced with this, and it would enable many more use-cases where merkle trees are not applicable. For example, this would allow you to track the hash of a btree-like structure over time with no additional cost (asymptotically). Of course, these ideas are putting the cart before the horse; we need to know more about its properties first."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
},
"toc-autonumbering": false,
"toc-showmarkdowntxt": false
},
"nbformat": 4,
"nbformat_minor": 5
}