notebook/2021-07-07-Merklist.ipynb

{
"cells": [
{
"cell_type": "markdown",
"id": "83dd7287-bca5-49f9-b927-31bbc519d5b9",
"metadata": {},
"source": [
"# Merklist\n",
"> A definition for the hash of a list that is robust to arbitrary partitioning\n",
"\n",
"- toc: true\n",
"- categories: [merklist]"
]
},
{
"cell_type": "markdown",
"id": "bf97974c-5582-4bf5-8ed8-6c43daf5036c",
"metadata": {},
"source": [
"Matrix multiplication's associativity and non-commutativity properties provide a natural definition for a [cryptographic hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) / digest / summary of an ordered list of elements while preserving concatenation operations. Due to the non-commutativity property, lists that differ in element order result in a different summary. Due to the associativity property, arbitrarily divided adjacent sub-lists can be summarized independently and combined to find the summary of their concatenation in one operation. This definition provides exactly the properties needed to define a list, and does not impose any unnecessary structure that could cause two equivalent lists to produce different summaries. The name *Merklist* is intended to be reminiscent of other hash-based data structures like [Merkle Tree](https://en.wikipedia.org/wiki/Merkle_tree) and [Merklix Tree](https://www.deadalnix.me/2016/09/24/introducing-merklix-tree-as-an-unordered-merkle-tree-on-steroid/)."
]
},
{
"cell_type": "markdown",
"id": "3f17d376-b03f-498b-a794-ea566e0b63f7",
"metadata": {},
"source": [
"## Definition\n",
"\n",
"This definition of a hash of a list of elements is pretty simple:\n",
"\n",
"* A **list element** is an arbitrary buffer of bytes. Any length, any content. Just bytes.\n",
"* A **list**, then, is a sequence of such elements.\n",
"* The **hash of a list element** is the cryptographic hash of its bytes, formatted into a square matrix with byte elements. (More details later.)\n",
"* The **hash of a list** is reduction by matrix multiplication of the hashes of all the list elements in the same order as they appear in the list.\n",
"* The **hash of a list with 0 elements** is the identity matrix.\n",
"\n",
"This construction has a few notable consequences:\n",
"\n",
"* The hash of a list with only one item is just the hash of the item itself.\n",
"* You can calculate the hash of any list concatenated with a copy of itself by matrix multiplication of the hash with itself. This works for single elements as well as arbitrarily long lists.\n",
"* A list can have multiple copies of the same list item, and swapping them does not affect the list hash. Consider how swapping the first two elements in `[1, 1, 2]` has no discernible effect.\n",
"* The hash of the concatenation of two lists is the matrix multiplication of their hashes.\n",
"* Concatenating a list with a list of 0 elements yields the same hash.\n",
"\n",
"Let's explore this definition in more detail with a simple implementation in Python and NumPy."
]
},
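{
"cell_type": "markdown",
"id": "merklist-sketch-concat",
"metadata": {},
"source": [
"As a sketch of why the concatenation property follows from associativity, write $h(e)$ for the hash of an element and $H(L)$ for the hash of a list (notation introduced here only for illustration). For lists $A = [a_1, \\dots, a_k]$ and $B = [b_1, \\dots, b_m]$:\n",
"\n",
"$$H(A \\Vert B) = h(a_1) × \\cdots × h(a_k) × h(b_1) × \\cdots × h(b_m) = \\big(h(a_1) × \\cdots × h(a_k)\\big) × \\big(h(b_1) × \\cdots × h(b_m)\\big) = H(A) × H(B)$$\n",
"\n",
"Taking $B = []$ gives $H(A) × I = H(A)$, which is consistent with defining the hash of the empty list as the identity matrix $I$."
]
},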
{
"cell_type": "code",
"execution_count": 1,
"id": "99b521d8-1c66-49d7-98e9-6fa1d8d7c18f",
"metadata": {
"jupyter": {
"source_hidden": true
},
"tags": []
},
"outputs": [],
"source": [
"#collapse-hide\n",
"# Setup and imports\n",
"import hashlib\n",
"import numpy as np\n",
"from functools import reduce\n",
"\n",
"def assert_equal(a, b):\n",
"    return np.testing.assert_equal(a, b)\n",
"\n",
"def assert_not_equal(a, b):\n",
"    return np.testing.assert_raises(AssertionError, np.testing.assert_equal, a, b)"
]
},
{
"cell_type": "markdown",
"id": "fc1306b8-5e89-460a-997c-c9464c16615d",
"metadata": {},
"source": [
"### The hash of a list element - `hash_m/1`\n",
"The function `hash_m/1` takes a buffer of bytes as its first argument, and returns the sha512 hash of the bytes formatted as an 8×8 2-d array of 8-bit unsigned integers with wrapping overflow. **We define this hash to be the hash of the list element.** Based on a shallow Wikipedia dive, someone familiar with linear algebra might say it's an element of a [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring), $R_{256}^{8×8}$: the 8×8 matrices over the integers modulo 256. Not coincidentally, sha512 outputs 512 bits = 64 bytes = an 8×8 array of bytes. How convenient! (In fact, that might even be the primary reason why I chose sha512!)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3ccc7fdc-fa6a-48e3-accb-3c1070b4559c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def hash_m(e):\n",
"    hash_bytes = list(hashlib.sha512(e).digest())[:64] # hash the bytes e, convert the digest into a list of 64 bytes\n",
"    return np.array(hash_bytes, dtype=np.uint8).reshape((8,8)) # convert the digest bytes into a numpy array with the appropriate data type and shape"
]
},
{
"cell_type": "markdown",
"id": "04132091-21b1-4fbb-99df-711ae5e0c819",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"8×8 seems big compared to 3×3 or 4×4 matrices. The values are as random as you might expect a cryptographic hash to be, and range from 0 to 255:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "65aa7c7a-25d5-4971-8780-661f367e45ab",
"metadata": {
"jupyter": {
"source_hidden": true
},
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 14 184 108 217 131 164 222 93]\n",
" [132 227 82 144 111 178 195 109]\n",
" [ 25 250 155 17 131 183 151 217]\n",
" [212 60 138 36 0 60 115 181]\n",
" [ 51 0 87 43 93 252 56 61]\n",
" [108 239 175 222 23 142 41 216]\n",
" [203 98 234 13 65 169 255 240]\n",
" [ 46 127 15 167 112 153 222 94]]\n",
"\n",
"[[ 63 144 188 5 57 146 32 56]\n",
" [ 27 189 98 140 113 194 70 87]\n",
" [115 21 136 27 116 167 85 48]\n",
" [ 29 162 119 29 104 32 145 241]\n",
" [166 197 57 165 132 213 50 202]\n",
" [ 48 71 33 19 230 26 58 164]\n",
" [242 172 65 202 193 50 193 141]\n",
" [206 110 165 129 52 132 250 73]]\n"
]
}
],
"source": [
"#collapse-hide\n",
"print(hash_m(b\"Hello A\"))\n",
"print()\n",
"print(hash_m(b\"Hello B\"))"
]
},
{
"cell_type": "markdown",
"id": "c0c37110-b38d-4420-adf9-11ff5c5cd590",
"metadata": {},
"source": [
"### The hash of a list - `mul_m/2`\n",
"Ok so we've got our element hashes, how do we combine them to construct the hash of a list? We defined the hash of the list to be reduction by matrix multiplication of the hash of each element:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "91afe2ad-19dc-475c-ad8b-17b70ba9fb79",
"metadata": {},
"outputs": [],
"source": [
"def mul_m(he1, he2):\n",
"    return np.matmul(he1, he2, dtype=np.uint8) # just, like, multiply them"
]
},
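{
"cell_type": "markdown",
"id": "merklist-sketch-mod256",
"metadata": {},
"source": [
"A note on the wrapping arithmetic, as a sketch: with `dtype=np.uint8`, each entry of the product is computed modulo 256 (assuming NumPy's usual wrap-around behaviour for fixed-width unsigned integer arrays). For 8×8 inputs $A$ and $B$:\n",
"\n",
"$$(A × B)_{ij} = \\left(\\sum_{k=1}^{8} A_{ik} B_{kj}\\right) \\bmod 256$$\n",
"\n",
"Reducing modulo 256 after every step gives the same result as reducing once at the end, so `mul_m` is ordinary matrix multiplication over the integers modulo 256 and inherits its associativity."
]
},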
{
"cell_type": "markdown",
"id": "39638a4a-6a42-4710-bcd2-f4a41c24f4cf",
"metadata": {},
"source": [
"Consider an example:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "eb84e6e1-b1c1-48f4-aa50-3ae0edfc78af",
"metadata": {},
"outputs": [],
"source": [
"#\n",
"# `elements` is a list of 3 elements\n",
"elements = [b\"A\", b\"Hello\", b\"World\"]\n",
"# first, make a new list with the hash of each element\n",
"element_hashes = [hash_m(e) for e in elements]\n",
"# get the hash of the list by reducing the hashes by matrix multiplication\n",
"list_hash1 = mul_m(mul_m(element_hashes[0], element_hashes[1]), element_hashes[2])\n",
"# an alternative way to write the reduction\n",
"list_hash2 = reduce(mul_m, element_hashes)\n",
"# check that these alternative spellings are equivalent\n",
"assert_equal(list_hash1, list_hash2)"
]
},
{
"cell_type": "markdown",
"id": "5632eb48-e5cd-4ec4-9bcd-1b59a5dc6042",
"metadata": {},
"source": [
"> Expand the sections below to see a comparison"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "694b4727-621e-4c1b-a2af-99296a8e664a",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true,
"source_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"List of elements:\n",
"[b'A', b'Hello', b'World']\n",
"\n",
"Hash of each element:\n",
"[array([[ 33, 180, 244, 189, 158, 100, 237, 53],\n",
" [ 92, 62, 182, 118, 162, 142, 190, 218],\n",
" [246, 216, 241, 123, 220, 54, 89, 149],\n",
" [179, 25, 9, 113, 83, 4, 64, 128],\n",
" [ 81, 107, 208, 131, 191, 204, 230, 97],\n",
" [ 33, 163, 7, 38, 70, 153, 76, 132],\n",
" [ 48, 204, 56, 43, 141, 197, 67, 232],\n",
" [ 72, 128, 24, 59, 248, 86, 207, 245]], dtype=uint8), array([[ 54, 21, 248, 12, 157, 41, 62, 215],\n",
" [ 64, 38, 135, 249, 75, 34, 213, 142],\n",
" [ 82, 155, 140, 199, 145, 111, 143, 172],\n",
" [127, 221, 247, 251, 213, 175, 76, 247],\n",
" [119, 211, 215, 149, 167, 160, 10, 22],\n",
" [191, 126, 127, 63, 185, 86, 30, 233],\n",
" [186, 174, 72, 13, 169, 254, 122, 24],\n",
" [118, 158, 113, 136, 107, 3, 243, 21]], dtype=uint8), array([[142, 167, 115, 147, 164, 42, 184, 250],\n",
" [146, 80, 15, 176, 119, 169, 80, 156],\n",
" [195, 43, 201, 94, 114, 113, 46, 250],\n",
" [ 17, 110, 218, 242, 237, 250, 227, 79],\n",
" [187, 104, 46, 253, 214, 197, 221, 19],\n",
" [193, 23, 224, 139, 212, 170, 239, 113],\n",
" [ 41, 29, 138, 172, 226, 248, 144, 39],\n",
" [ 48, 129, 208, 103, 124, 22, 223, 15]], dtype=uint8)]\n",
"\n",
"Hash of full list:\n",
"[[178 188 57 157 60 136 190 127]\n",
" [ 40 234 254 224 38 46 250 52]\n",
" [156 72 193 136 219 98 28 4]\n",
" [197 2 43 132 132 232 254 198]\n",
" [ 93 64 113 215 2 246 130 192]\n",
" [ 91 107 85 13 149 60 19 173]\n",
" [ 84 77 244 98 0 239 123 17]\n",
" [ 58 112 98 250 163 20 27 6]]\n"
]
}
],
"source": [
"#collapse-hide\n",
"#collapse-output\n",
"print(\"List of elements:\")\n",
"print(elements)\n",
"print()\n",
"print(\"Hash of each element:\")\n",
"print(element_hashes)\n",
"print()\n",
"print(\"Hash of full list:\")\n",
"print(list_hash1)\n",
"# Expand the section below to see the output"
]
},
{
"cell_type": "markdown",
"id": "de064a80-208d-4850-b95e-c5a707f7f3b3",
"metadata": {},
"source": [
2021-07-07 19:54:56 +00:00
"What does this give us? Generally speaking, multiplying two square matrices $M_1×M_2$ gives us at least these two properties:\n",
"\n",
"* [Associativity](#Associativity) - Associativity enables you to reduce a computation using any partitioning because all partitionings yield the same result. Addition is associative $(1+2)+3 = 1+(2+3)$, subtraction is not $(5-3)-2\\neq5-(3-2)$. ([Associative property](https://en.wikipedia.org/wiki/Associative_property))\n",
"* [Non-Commutativity](#Non-Commutativity) - Commutativity allows you to swap elements without affecting the result. Addition is commutative $1+2 = 2+1$, but division is not $1\\div2 \\neq2\\div1$. And neither is matrix multiplication. ([Commutative property](https://en.wikipedia.org/wiki/Commutative_property))\n",
"\n",
"This is an unusual combination of properties for an operation. It's at least not a combination encountered in introductory algebra:\n",
"\n",
"| | associative | commutative |\n",
"| --- | --- | --- |\n",
"| $a+b$ | ✅ | ✅ |\n",
"| $a*b$ | ✅ | ✅ |\n",
"| $a-b$ | ❌ | ❌ |\n",
"| $a/b$ | ❌ | ❌ |\n",
"| $a^b$ | ❌ | ❌ |\n",
"| $M×M$ | ✅ | ❌ |\n",
"\n",
"Upon consideration, these are the exact properties that one would want in order to define the hash of a list of items. Non-commutativity enables the order of elements in the list to be well defined, since swapping different elements produces a different hash. Associativity enables calculating the hash of the list by performing the reduction operations in any order, and you still get the same hash.\n",
"\n",
"Let's sanity-check that these properties can hold for the construction described above."
]
},
{
"cell_type": "markdown",
"id": "c6c8ef5e-99d2-4a7e-887f-54b93a7baf4a",
"metadata": {},
"source": [
"### Associativity\n",
"\n",
"If it's associative, we should get the same hash if we rearrange the parentheses to indicate reduction in a different operation order. That is: $((e1 × e2) × e3) = (e1 × (e2 × e3))$"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6da02d5e-a783-4a57-90ac-04a654d89006",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"e1 = hash_m(b\"Hello A\")\n",
"e2 = hash_m(b\"Hello B\")\n",
"e3 = hash_m(b\"Hello C\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "0452955b-2d7e-41e4-924f-8f00ef0c46cf",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"x = np.matmul(np.matmul(e1, e2), e3)\n",
"y = np.matmul(e1, np.matmul(e2, e3))\n",
"\n",
"# observe that they produce the same summary\n",
"assert_equal(x, y)"
]
},
{
"cell_type": "markdown",
"id": "bda2bacf-daf7-4f42-93cf-c422704dc067",
"metadata": {
"tags": []
},
"source": [
"> Expand the sections below to see a comparison"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b7a1906d-524c-4339-920a-978a0385d6cc",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true,
"source_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 58 12 144 134 100 158 159 51]\n",
" [ 73 206 202 190 87 79 223 2]\n",
" [210 122 142 117 37 148 106 45]\n",
" [175 146 187 223 235 171 64 226]\n",
" [149 85 203 87 92 251 243 206]\n",
" [ 18 252 160 103 125 251 181 133]\n",
" [191 132 220 104 213 154 34 154]\n",
" [127 197 95 87 166 3 22 3]]\n",
"\n",
"[[ 58 12 144 134 100 158 159 51]\n",
" [ 73 206 202 190 87 79 223 2]\n",
" [210 122 142 117 37 148 106 45]\n",
" [175 146 187 223 235 171 64 226]\n",
" [149 85 203 87 92 251 243 206]\n",
" [ 18 252 160 103 125 251 181 133]\n",
" [191 132 220 104 213 154 34 154]\n",
" [127 197 95 87 166 3 22 3]]\n"
]
}
],
"source": [
"#collapse-hide\n",
"#collapse-output\n",
"print(x)\n",
"print()\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"id": "c0fb04da-2cbd-4fa1-8b85-d48441cc8962",
"metadata": {},
"source": [
"### Non-Commutativity\n",
"\n",
"If it's not commutative, then swapping different elements should produce a different hash. That is, $e1 × e2 \\ne e2 × e1$:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2f3d139a-6c48-4ddb-9a34-7b2aa00853d6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"x = np.matmul(e1, e2)\n",
"y = np.matmul(e2, e1)\n",
"\n",
"# observe that they produce different summaries\n",
"assert_not_equal(x, y)"
]
},
{
"cell_type": "markdown",
"id": "a8c05183-08db-44f1-b0cc-5fb4aed77b1b",
"metadata": {
"tags": []
},
"source": [
"> Expand the sections below to see a comparison"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7f833e44-79d8-4c98-af41-0c915bee66ed",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true,
"source_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 87 79 149 131 148 247 195 90]\n",
" [249 84 195 58 142 133 211 15]\n",
" [177 93 69 254 240 234 97 37]\n",
" [ 46 84 76 253 55 200 43 236]\n",
" [ 21 84 99 157 55 148 170 2]\n",
" [168 123 6 250 64 144 54 242]\n",
" [230 78 164 76 30 29 214 68]\n",
" [ 47 183 156 239 157 177 192 184]]\n",
"\n",
"[[149 18 239 238 84 188 191 109]\n",
" [239 150 214 235 59 161 9 133]\n",
" [ 89 174 59 14 70 113 124 243]\n",
" [ 66 113 176 124 227 247 17 25]\n",
" [247 138 152 181 177 143 184 97]\n",
" [113 249 199 153 154 75 45 105]\n",
" [121 201 225 42 249 213 180 244]\n",
" [ 85 31 72 28 181 182 140 176]]\n"
]
}
],
"source": [
"#collapse-hide\n",
"#collapse-output\n",
"print(x)\n",
"print()\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"id": "2978d8f5-0c9e-445d-80d1-12229b589c24",
"metadata": {},
"source": [
"## Other functions"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "80ec6898-4163-4ba8-9460-c717a9e58c59",
"metadata": {},
"outputs": [],
"source": [
"# Create a list of 1024 elements and reduce them one by one\n",
"list1 = [hash_m(b\"A\") for _ in range(0, 1024)]\n",
"hash1 = reduce(mul_m, list1)\n",
"\n",
"# Take a starting element and square/double it 10 times. With 1 starting element over 10 doublings = 1024 elements\n",
"hash2 = reduce((lambda m, _ : mul_m(m, m)), range(0, 10), hash_m(b\"A\"))\n",
"\n",
"# Observe that these two methods of calculating the hash have the same result\n",
"assert_equal(hash1, hash2)\n",
"\n",
"# let's call it double\n",
"def double_m(m, d=1):\n",
"    return reduce((lambda m, _ : mul_m(m, m)), range(0, d), m)\n",
"\n",
"assert_equal(hash1, double_m(hash_m(b\"A\"), 10))\n",
"\n",
"def identity_m():\n",
"    return np.identity(8, dtype=np.uint8)\n",
"\n",
"# generalize double_m to any repeat count n, not just powers of two, using O(log n) matmuls\n",
"def repeat_m(m, n):\n",
"    res = identity_m()\n",
"    while n > 0:\n",
"        # concatenate the current doubling iff the bit representing this doubling is set\n",
"        if n & 1:\n",
"            res = mul_m(res, m)\n",
"        n >>= 1\n",
"        m = mul_m(m, m) # double matrix m\n",
"    return res\n",
"\n",
"# repeat_m can do the same as double_m\n",
"assert_equal(hash1, repeat_m(hash_m(b\"A\"), 1024))\n",
"\n",
"# but it can also repeat any number of times\n",
"hash3 = reduce(mul_m, (hash_m(b\"A\") for _ in range(0, 3309)))\n",
"assert_equal(hash3, repeat_m(hash_m(b\"A\"), 3309))\n",
"\n",
"# Even returns a sensible result when requesting 0 elements\n",
"assert_equal(identity_m(), repeat_m(hash_m(b\"A\"), 0))\n",
"\n",
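"# helper for reducing an iterable of hashes; starting from the identity\n",
"# makes an empty iterable hash to the identity matrix, per the definition\n",
"def reduce_m(am):\n",
"    return reduce(mul_m, am, identity_m())"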
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "84738470-61c9-44b5-b6b7-9971a02547bd",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 68 252 159 3 14 52 199 199]\n",
" [136 124 6 34 58 174 206 54]\n",
" [ 3 234 2 13 120 240 7 163]\n",
" [102 47 66 61 87 234 246 72]\n",
" [ 19 135 80 115 75 242 242 5]\n",
" [244 165 250 28 76 43 188 254]\n",
" [233 46 187 39 151 241 175 130]\n",
" [132 138 6 215 20 132 89 33]]\n",
"\n",
"[[ 68 252 159 3 14 52 199 199]\n",
" [136 124 6 34 58 174 206 54]\n",
" [ 3 234 2 13 120 240 7 163]\n",
" [102 47 66 61 87 234 246 72]\n",
" [ 19 135 80 115 75 242 242 5]\n",
" [244 165 250 28 76 43 188 254]\n",
" [233 46 187 39 151 241 175 130]\n",
" [132 138 6 215 20 132 89 33]]\n"
]
}
],
"source": [
"print(hash1)\n",
"print()\n",
"print(hash2)"
]
},
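{
"cell_type": "markdown",
"id": "merklist-sketch-repeat",
"metadata": {},
"source": [
"`repeat_m` is exponentiation by squaring. As a worked example (the decomposition below is plain arithmetic, not output from the notebook), 3309 in binary is `110011101101`, so:\n",
"\n",
"$$3309 = 2^{11} + 2^{10} + 2^{7} + 2^{6} + 2^{5} + 2^{3} + 2^{2} + 2^{0}$$\n",
"\n",
"$$m^{3309} = m^{2^{0}} × m^{2^{2}} × m^{2^{3}} × m^{2^{5}} × m^{2^{6}} × m^{2^{7}} × m^{2^{10}} × m^{2^{11}}$$\n",
"\n",
"The loop runs once per bit of $n$ (12 times here), squaring $m$ each time and multiplying it into the result for each of the 8 set bits: about 20 matrix multiplications instead of 3308. Powers of the same matrix commute with each other, so the order in which the doublings are folded in does not matter."
]
},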
{
"cell_type": "markdown",
"id": "f66e8f69-260c-40ca-bf26-306a85582ad6",
"metadata": {},
"source": [
"## Fun with associativity\n",
"\n",
"Does the hash of a list change even when swapping two elements in the middle of a very long list?"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c9b7e5c8-db73-43e6-89ef-cc912ce1578d",
"metadata": {},
"outputs": [],
"source": [
"a = hash_m(b\"A\")\n",
"b = hash_m(b\"B\")\n",
"\n",
"a499 = repeat_m(a, 499)\n",
"a500 = repeat_m(a, 500)\n",
"\n",
"# this should work because they're all a's\n",
"assert_equal(reduce_m([a, a499]), a500)\n",
"assert_equal(reduce_m([a499, a]), a500)\n",
"\n",
"# these are 1000-element lists: 999 copies of a, plus one b at (1-based) position 500 (x) or 501 (y)\n",
"x = reduce_m([a499, b, a500])\n",
"y = reduce_m([a500, b, a499])\n",
"\n",
"# shifting the b by one element changed the hash\n",
"assert_not_equal(x, y)"
]
},
{
"cell_type": "markdown",
"id": "6c62dcbc-63a7-4a20-8f15-295e7675f7a8",
"metadata": {},
"source": [
"Flex that associativity - this statement is true and equivalent to the assertion below:\n",
"\n",
"$(a × (a499 × b × a500) × (a500 × b × a499) × a) = (a500 × b × (a500 × a500) × b × a500)$"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "3a1602ec-db66-4ddf-95ec-80d0b3df7f58",
"metadata": {},
"outputs": [],
"source": [
"assert_equal(reduce_m([a, x, y, a]), reduce_m([a500, b, repeat_m(a500, 2), b, a500]))"
]
},
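{
"cell_type": "markdown",
"id": "merklist-sketch-regroup",
"metadata": {},
"source": [
"Spelled out step by step, as a sketch writing $a^{n}$ for `repeat_m(a, n)`, the left-hand side regroups as:\n",
"\n",
"$$a × (a^{499} × b × a^{500}) × (a^{500} × b × a^{499}) × a = (a × a^{499}) × b × (a^{500} × a^{500}) × b × (a^{499} × a) = a^{500} × b × a^{1000} × b × a^{500}$$\n",
"\n",
"Only the grouping changes; the left-to-right order of the 2002 elements never does."
]
},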
{
"cell_type": "markdown",
"id": "6cc30cb3-8079-4f8a-9b7c-a3b4f7e384a3",
"metadata": {},
"source": [
"## Unknowns\n",
"\n",
"This appears to me to be a reasonable way to define the hash of a list. The mathematical definition of a list aligns very nicely with the properties offered by matrix multiplication. But is it appropriate to use for the same things that a Merkle Tree would be? The big questions are related to the valuable properties of hash functions:\n",
"\n",
"* Given a merklist summary but not the elements, is it possible to produce a different list of elements that hash to the same summary? (~preimage resistance)\n",
"* Given a merklist summary or sublist summaries of it, can you derive the hashes of elements or their order?\n",
"* Is it possible to predictably alter the merklist summary by concatenating it with some other sublist of real elements?\n",
"* Are there other desirable security properties that would be valuable for a list hash?\n",
"* Is there a better choice of hash function as a primitive than sha512?\n",
"* Is there a better choice of reduction function that still retains associativity+non-commutativity than simple matmul?\n",
"* Is there a more appropriate size than an 8×8 matrix / 64 bytes to represent merklist summaries?\n",
"\n",
"Matrices are well-studied objects; perhaps such information is already known. If *you* know something about deriving preimages of products in the [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring) $R_{256}^{8×8}$, I would be very interested to know."
]
},
{
"cell_type": "markdown",
"id": "4c4d4a83-8e2e-46d7-b2e3-2d59ba9c9e8c",
"metadata": {},
"source": [
"## What's next?\n",
"\n",
"***If** this construction has the appropriate security properties*, it seems to be a better Merkle tree in all respects. Any use of a Merkle tree could be replaced with this, and it could enable use-cases where Merkle trees aren't useful. Some examples of what I think might be possible:\n",
"\n",
"* Using a Merklist with a sublist summary tree structure enables creating an $O(1)$-sized 'Merklist Proof' that can verify the addition and subtraction of any number of elements at any single point in the list using only $O(\\log N)$ time and $O(\\log N)$ static space. As a bonus, the proof generator and verifier can have totally different tree structures and still communicate the proof successfully.\n",
"* Using a Merklist summary tree you can create a consistent hash of any ordered key-value store (like a btree) that can be maintained incrementally inline with regular node updates, e.g. as part of a [LSM-tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree). This could facilitate verification and sync between database replicas.\n",
"* The sublist summary tree structure can be as dense or sparse as desired. You could summarize down to pairs of elements akin to a merkle tree, but you could also summarize a compressed sublist of hundreds or even millions of elements with a single hash. Of course, calculating or verifying a proof of changes to the middle of that sublist would require rehashing the whole sublist, but this turns it from a fixed structure into a tuneable parameter.\n",
"* If all possible elements had an easily computable inverse, that would enable \"subtracting\" an element by inserting its inverse next to it. The hashes would then live in a group (the invertible matrices under multiplication) rather than merely a ring; it would still not be a field, since multiplication remains non-commutative. This might have interesting implications.\n",
" * For example you could define a cryptographically-secure rolling hash where advancing either end can be calculated in `O(1)` time and space.\n",
"\n",
"To be continued..."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
},
"toc-autonumbering": false,
"toc-showmarkdowntxt": false
},
"nbformat": 4,
"nbformat_minor": 5
}