notebook/2021-07-07-Merklist.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "83dd7287-bca5-49f9-b927-31bbc519d5b9",
   "metadata": {},
   "source": [
    "# Merklist\n",
    "> A definition for the hash of a list that is robust to arbitrary partitioning\n",
    "\n",
    "- toc: true\n",
    "- categories: [merklist]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf97974c-5582-4bf5-8ed8-6c43daf5036c",
   "metadata": {},
   "source": [
    "Matrix multiplication's associativity and non-commutativity properties provide a natural definition for a [cryptographic hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) / digest / summary of an ordered list of elements while preserving concatenation operations. Due to the non-commutativity property, lists that differ in element order result in a different summary. Due to the associativity property, arbitrarily divided adjacent sub-lists can be summarized independently and combined to find the summary of their concatenation in one operation. This definition provides exactly the properties needed to define a list, and does not impose any unnecessary structure that could cause two equivalent lists to produce different summaries. The name *Merklist* is intended to be reminicent of other hash-based data structures like [Merkle Tree](https://en.wikipedia.org/wiki/Merkle_tree) and [Merklix Tree](https://www.deadalnix.me/2016/09/24/introducing-merklix-tree-as-an-unordered-merkle-tree-on-steroid/)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f17d376-b03f-498b-a794-ea566e0b63f7",
   "metadata": {},
   "source": [
    "## Definition\n",
    "\n",
    "This definition of a hash of a list of elements is pretty simple:\n",
    "\n",
    "* A **list element** is an arbitrary buffer of bytes. Any length, any content. Just bytes.\n",
    "* A **list**, then, is a sequence of such elements.\n",
    "* The **hash of a list element** is the cryptographic hash of its bytes, formatted into a square matrix with byte elements. (More details later.)\n",
    "* The **hash of a list** is reduction by matrix multiplication of the hashes of all the list elements in the same order as they appear in the list.\n",
    "* The **hash of a list with 0 elements** is the identity matrix.\n",
    "\n",
    "This construction has a couple notable concequences:\n",
    "\n",
    "* The hash of a list with only one item is just the hash of the item itself.\n",
    "* You can calculate the hash of any list concatenated with a copy of itself by matrix multiplication of the the hash with itself. This works for single elements as well as arbitrarily long lists.\n",
    "* A list can have multiple copies of the same list item, and swapping them does not affect the list hash. Consider how swapping the first two elements in `[1, 1, 2]` has no discernible effect.\n",
    "* The hash of the concatenation of two lists is the matrix multiplication of their hashes.\n",
    "* Concatenating a list with a list of 0 elements yields the same hash.\n",
    "\n",
    "Lets explore this definition in more detail with a simple implementation in python+numpy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "99b521d8-1c66-49d7-98e9-6fa1d8d7c18f",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#collapse-hide\n",
    "# Setup and imports\n",
    "import hashlib\n",
    "import numpy as np\n",
    "from functools import reduce\n",
    "\n",
    "def assert_equal(a, b):\n",
    "    return np.testing.assert_equal(a, b)\n",
    "\n",
    "def assert_not_equal(a, b):\n",
    "    return np.testing.assert_raises(AssertionError, np.testing.assert_equal, a, b)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc1306b8-5e89-460a-997c-c9464c16615d",
   "metadata": {},
   "source": [
    "### The hash of a list element - `hash_m/1`\n",
    "The function `hash_m/1` takes a buffer of bytes as its first argument, and returns the sha512 hash of the bytes formatted as an 8×8 2-d array of 8-bit unsigned integers with wrapping overflow. **We define this hash to be the hash of the list element.** Based on a shallow wikipedia dive, someone familiar with linear algebra might say it's a [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring), $R_{256}^{8×8}$. Not coincidentally, sha512 outputs 512 bits = 64 bytes = 8 * 8 array of bytes, how convenient. (In fact, that might even be the primary reason why I chose sha512!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3ccc7fdc-fa6a-48e3-accb-3c1070b4559c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def hash_m(e):\n",
    "    hash_bytes = list(hashlib.sha512(e).digest())[:64]          # hash the bytes e, convert the digest into a list of 64 bytes\n",
    "    return np.array(hash_bytes, dtype=np.uint8).reshape((8,8))  # convert the digest bytes into a numpy array with the appropriate data type and shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04132091-21b1-4fbb-99df-711ae5e0c819",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "8×8 seems big compared to 3×3 or 4×4 matrixes. The values are as random as you might expect a cryptographic hash to be, and range from 0-255:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "65aa7c7a-25d5-4971-8780-661f367e45ab",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 14 184 108 217 131 164 222  93]\n",
      " [132 227  82 144 111 178 195 109]\n",
      " [ 25 250 155  17 131 183 151 217]\n",
      " [212  60 138  36   0  60 115 181]\n",
      " [ 51   0  87  43  93 252  56  61]\n",
      " [108 239 175 222  23 142  41 216]\n",
      " [203  98 234  13  65 169 255 240]\n",
      " [ 46 127  15 167 112 153 222  94]]\n",
      "\n",
      "[[ 63 144 188   5  57 146  32  56]\n",
      " [ 27 189  98 140 113 194  70  87]\n",
      " [115  21 136  27 116 167  85  48]\n",
      " [ 29 162 119  29 104  32 145 241]\n",
      " [166 197  57 165 132 213  50 202]\n",
      " [ 48  71  33  19 230  26  58 164]\n",
      " [242 172  65 202 193  50 193 141]\n",
      " [206 110 165 129  52 132 250  73]]\n"
     ]
    }
   ],
   "source": [
    "#collapse-hide\n",
    "print(hash_m(b\"Hello A\"))\n",
    "print()\n",
    "print(hash_m(b\"Hello B\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0c37110-b38d-4420-adf9-11ff5c5cd590",
   "metadata": {},
   "source": [
    "### The hash of a list - `mul_m/2`\n",
    "Ok so we've got our element hashes, how do we combine them to construct the hash of a list? We defined the hash of the list to be reduction by matrix multiplication of the hash of each element:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "91afe2ad-19dc-475c-ad8b-17b70ba9fb79",
   "metadata": {},
   "outputs": [],
   "source": [
    "def mul_m(he1, he2):\n",
    "    return np.matmul(he1, he2, dtype=np.uint8) # just, like, multiply them"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39638a4a-6a42-4710-bcd2-f4a41c24f4cf",
   "metadata": {},
   "source": [
    "Consider an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "eb84e6e1-b1c1-48f4-aa50-3ae0edfc78af",
   "metadata": {},
   "outputs": [],
   "source": [
    "#\n",
    "# `elements` is a list of 3 elements\n",
    "elements = [b\"A\", b\"Hello\", b\"World\"]\n",
    "# first, make a new list with the hash of each element\n",
    "element_hashes = [hash_m(e) for e in elements]\n",
    "# get the hash of the list by reducing the hashes by matrix multiplication\n",
    "list_hash1 = mul_m(mul_m(element_hashes[0], element_hashes[1]), element_hashes[2])\n",
    "# an alternative way to write the reduction\n",
    "list_hash2 = reduce(mul_m, element_hashes)\n",
    "# check that these alternative spellings are equivalent\n",
    "assert_equal(list_hash1, list_hash2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5632eb48-e5cd-4ec4-9bcd-1b59a5dc6042",
   "metadata": {},
   "source": [
    "> Expand the sections below to see a comparison"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "694b4727-621e-4c1b-a2af-99296a8e664a",
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true,
     "source_hidden": true
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "List of elements:\n",
      "[b'A', b'Hello', b'World']\n",
      "\n",
      "Hash of each element:\n",
      "[array([[ 33, 180, 244, 189, 158, 100, 237,  53],\n",
      "       [ 92,  62, 182, 118, 162, 142, 190, 218],\n",
      "       [246, 216, 241, 123, 220,  54,  89, 149],\n",
      "       [179,  25,   9, 113,  83,   4,  64, 128],\n",
      "       [ 81, 107, 208, 131, 191, 204, 230,  97],\n",
      "       [ 33, 163,   7,  38,  70, 153,  76, 132],\n",
      "       [ 48, 204,  56,  43, 141, 197,  67, 232],\n",
      "       [ 72, 128,  24,  59, 248,  86, 207, 245]], dtype=uint8), array([[ 54,  21, 248,  12, 157,  41,  62, 215],\n",
      "       [ 64,  38, 135, 249,  75,  34, 213, 142],\n",
      "       [ 82, 155, 140, 199, 145, 111, 143, 172],\n",
      "       [127, 221, 247, 251, 213, 175,  76, 247],\n",
      "       [119, 211, 215, 149, 167, 160,  10,  22],\n",
      "       [191, 126, 127,  63, 185,  86,  30, 233],\n",
      "       [186, 174,  72,  13, 169, 254, 122,  24],\n",
      "       [118, 158, 113, 136, 107,   3, 243,  21]], dtype=uint8), array([[142, 167, 115, 147, 164,  42, 184, 250],\n",
      "       [146,  80,  15, 176, 119, 169,  80, 156],\n",
      "       [195,  43, 201,  94, 114, 113,  46, 250],\n",
      "       [ 17, 110, 218, 242, 237, 250, 227,  79],\n",
      "       [187, 104,  46, 253, 214, 197, 221,  19],\n",
      "       [193,  23, 224, 139, 212, 170, 239, 113],\n",
      "       [ 41,  29, 138, 172, 226, 248, 144,  39],\n",
      "       [ 48, 129, 208, 103, 124,  22, 223,  15]], dtype=uint8)]\n",
      "\n",
      "Hash of full list:\n",
      "[[178 188  57 157  60 136 190 127]\n",
      " [ 40 234 254 224  38  46 250  52]\n",
      " [156  72 193 136 219  98  28   4]\n",
      " [197   2  43 132 132 232 254 198]\n",
      " [ 93  64 113 215   2 246 130 192]\n",
      " [ 91 107  85  13 149  60  19 173]\n",
      " [ 84  77 244  98   0 239 123  17]\n",
      " [ 58 112  98 250 163  20  27   6]]\n"
     ]
    }
   ],
   "source": [
    "#collapse-hide\n",
    "#collapse-output\n",
    "print(\"List of elements:\")\n",
    "print(elements)\n",
    "print()\n",
    "print(\"Hash of each element:\")\n",
    "print(element_hashes)\n",
    "print()\n",
    "print(\"Hash of full list:\")\n",
    "print(list_hash1)\n",
    "# Expand the section below to see the output"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de064a80-208d-4850-b95e-c5a707f7f3b3",
   "metadata": {},
   "source": [
    "What does this give us? Generally speaking, multiplying two square matrixes $M_1×M_2$ gives us at least these two properties:\n",
    "\n",
    "* [Associativity](#Associativity) - Associativity enables you to reduce a computation using any partitioning because all partitionings yield the same result. Addition is associative $(1+2)+3 = 1+(2+3)$, subtraction is not $(5-3)-2\\neq5-(3-2)$. ([Associative property](https://en.wikipedia.org/wiki/Associative_property))\n",
    "* [Non-Commutativity](#Non-Commutativity) - Commutativity allows you to swap elements without affecting the result. Addition is commutative $1+2 = 2+1$, but division is not $1\\div2 \\neq2\\div1$. And neither is matrix multiplication. ([Commutative property](https://en.wikipedia.org/wiki/Commutative_property))\n",
    "\n",
    "This is an unusual combination of properties for an operation. It's at least not a combination encountered in introductory algebra:\n",
    "\n",
    "|     | associative | commutative |\n",
    "| --- | ---         | ---         |\n",
    "| $a+b$   | ✅ | ✅ |\n",
    "| $a*b$   | ✅ | ✅ |\n",
    "| $a-b$   | ❌ | ❌ |\n",
    "| $a/b$   | ❌ | ❌ |\n",
    "| $a^b$ | ❌ | ❌ |\n",
    "| $M×M$ | ✅ | ❌ |\n",
    "\n",
    "Upon consideration, these are the exact properties that one would want in order to define the hash of a list of items. Non-commutativity enables the order of elements in the list to be well defined, since swapping different elements produces a different hash. Associativity enables calculating the hash of the list by performing the reduction operations in any order, and you still get the same hash.\n",
    "\n",
    "Lets sanity-check that these properties can hold for the construction described above."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6c8ef5e-99d2-4a7e-887f-54b93a7baf4a",
   "metadata": {},
   "source": [
    "### Associativity\n",
    "\n",
    "If it's associative, we should get the same hash if we rearrange the parenthesis to indicate reduction in a different operation order. That is: $((e1 × e2) × e3) = (e1 × (e2 × e3))$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "6da02d5e-a783-4a57-90ac-04a654d89006",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "e1 = hash_m(b\"Hello A\")\n",
    "e2 = hash_m(b\"Hello B\")\n",
    "e3 = hash_m(b\"Hello C\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "0452955b-2d7e-41e4-924f-8f00ef0c46cf",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "x = np.matmul(np.matmul(e1, e2), e3)\n",
    "y = np.matmul(e1, np.matmul(e2, e3))\n",
    "\n",
    "# observe that they produce the same summary\n",
    "assert_equal(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bda2bacf-daf7-4f42-93cf-c422704dc067",
   "metadata": {
    "tags": []
   },
   "source": [
    "> Expand the sections below to see a comparison"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "b7a1906d-524c-4339-920a-978a0385d6cc",
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true,
     "source_hidden": true
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 58  12 144 134 100 158 159  51]\n",
      " [ 73 206 202 190  87  79 223   2]\n",
      " [210 122 142 117  37 148 106  45]\n",
      " [175 146 187 223 235 171  64 226]\n",
      " [149  85 203  87  92 251 243 206]\n",
      " [ 18 252 160 103 125 251 181 133]\n",
      " [191 132 220 104 213 154  34 154]\n",
      " [127 197  95  87 166   3  22   3]]\n",
      "\n",
      "[[ 58  12 144 134 100 158 159  51]\n",
      " [ 73 206 202 190  87  79 223   2]\n",
      " [210 122 142 117  37 148 106  45]\n",
      " [175 146 187 223 235 171  64 226]\n",
      " [149  85 203  87  92 251 243 206]\n",
      " [ 18 252 160 103 125 251 181 133]\n",
      " [191 132 220 104 213 154  34 154]\n",
      " [127 197  95  87 166   3  22   3]]\n"
     ]
    }
   ],
   "source": [
    "#collapse-hide\n",
    "#collapse-output\n",
    "print(x)\n",
    "print()\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0fb04da-2cbd-4fa1-8b85-d48441cc8962",
   "metadata": {},
   "source": [
    "### Non-Commutativity\n",
    "\n",
    "If it's not commutative, then swapping different elements should produce a different hash. That is, $e1 × e2 \\ne e2 × e1$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "2f3d139a-6c48-4ddb-9a34-7b2aa00853d6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "x = np.matmul(e1, e2)\n",
    "y = np.matmul(e2, e1)\n",
    "\n",
    "# observe that they produce different summaries\n",
    "assert_not_equal(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8c05183-08db-44f1-b0cc-5fb4aed77b1b",
   "metadata": {
    "tags": []
   },
   "source": [
    "> Expand the sections below to see a comparison"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "7f833e44-79d8-4c98-af41-0c915bee66ed",
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true,
     "source_hidden": true
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 87  79 149 131 148 247 195  90]\n",
      " [249  84 195  58 142 133 211  15]\n",
      " [177  93  69 254 240 234  97  37]\n",
      " [ 46  84  76 253  55 200  43 236]\n",
      " [ 21  84  99 157  55 148 170   2]\n",
      " [168 123   6 250  64 144  54 242]\n",
      " [230  78 164  76  30  29 214  68]\n",
      " [ 47 183 156 239 157 177 192 184]]\n",
      "\n",
      "[[149  18 239 238  84 188 191 109]\n",
      " [239 150 214 235  59 161   9 133]\n",
      " [ 89 174  59  14  70 113 124 243]\n",
      " [ 66 113 176 124 227 247  17  25]\n",
      " [247 138 152 181 177 143 184  97]\n",
      " [113 249 199 153 154  75  45 105]\n",
      " [121 201 225  42 249 213 180 244]\n",
      " [ 85  31  72  28 181 182 140 176]]\n"
     ]
    }
   ],
   "source": [
    "#collapse-hide\n",
    "#collapse-output\n",
    "print(x)\n",
    "print()\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2978d8f5-0c9e-445d-80d1-12229b589c24",
   "metadata": {},
   "source": [
    "## Other functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "80ec6898-4163-4ba8-9460-c717a9e58c59",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a list of 1024 elements and reduce them one by one\n",
    "list1 = [hash_m(b\"A\") for _ in range(0, 1024)]\n",
    "hash1 = reduce(mul_m, list1)\n",
    "\n",
    "# Take a starting element and square/double it 10 times. With 1 starting element over 10 doublings = 1024 elements\n",
    "hash2 = reduce((lambda m, _ : mul_m(m, m)), range(0, 10), hash_m(b\"A\"))\n",
    "\n",
    "# Observe that these two methods of calculating the hash have the same result\n",
    "assert_equal(hash1, hash2)\n",
    "\n",
    "# lets call it double\n",
    "def double_m(m, d=1):\n",
    "    return reduce((lambda m, _ : mul_m(m, m)), range(0, d), m)\n",
    "\n",
    "assert_equal(hash1, double_m(hash_m(b\"A\"), 10))\n",
    "\n",
    "def identity_m():\n",
    "    return np.identity(8, dtype=np.uint8)\n",
    "\n",
    "# generalize double_m to any length, not just doublings, performed in ln(N) matmuls\n",
    "def repeat_m(m, n):\n",
    "    res = identity_m()\n",
    "    while n > 0:\n",
    "        # concatenate the current doubling iff the bit representing this doubling is set\n",
    "        if n & 1:\n",
    "            res = mul_m(res, m)\n",
    "        n >>= 1\n",
    "        m = mul_m(m, m) # double matrix m\n",
    "        # print(s)\n",
    "    return res\n",
    "\n",
    "# repeat_m can do the same as double_m\n",
    "assert_equal(hash1, repeat_m(hash_m(b\"A\"), 1024))\n",
    "\n",
    "# but it can also repeat any number of times\n",
    "hash3 = reduce(mul_m, (hash_m(b\"A\") for _ in range(0, 3309)))\n",
    "assert_equal(hash3, repeat_m(hash_m(b\"A\"), 3309))\n",
    "\n",
    "# Even returns a sensible result when requesting 0 elements\n",
    "assert_equal(identity_m(), repeat_m(hash_m(b\"A\"), 0))\n",
    "\n",
    "# make helper for reducing an iterable of hashes\n",
    "def reduce_m(am):\n",
    "    return reduce(mul_m, am)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "84738470-61c9-44b5-b6b7-9971a02547bd",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 68 252 159   3  14  52 199 199]\n",
      " [136 124   6  34  58 174 206  54]\n",
      " [  3 234   2  13 120 240   7 163]\n",
      " [102  47  66  61  87 234 246  72]\n",
      " [ 19 135  80 115  75 242 242   5]\n",
      " [244 165 250  28  76  43 188 254]\n",
      " [233  46 187  39 151 241 175 130]\n",
      " [132 138   6 215  20 132  89  33]]\n",
      "\n",
      "[[ 68 252 159   3  14  52 199 199]\n",
      " [136 124   6  34  58 174 206  54]\n",
      " [  3 234   2  13 120 240   7 163]\n",
      " [102  47  66  61  87 234 246  72]\n",
      " [ 19 135  80 115  75 242 242   5]\n",
      " [244 165 250  28  76  43 188 254]\n",
      " [233  46 187  39 151 241 175 130]\n",
      " [132 138   6 215  20 132  89  33]]\n"
     ]
    }
   ],
   "source": [
    "print(hash1)\n",
    "print()\n",
    "print(hash2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f66e8f69-260c-40ca-bf26-306a85582ad6",
   "metadata": {},
   "source": [
    "# Fun with associativity\n",
    "\n",
    "Does the hash of a list change even when swapping two elements in the middle of a very long list?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "c9b7e5c8-db73-43e6-89ef-cc912ce1578d",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = hash_m(b\"A\")\n",
    "b = hash_m(b\"B\")\n",
    "\n",
    "a499 = repeat_m(a, 499)\n",
    "a500 = repeat_m(a, 500)\n",
    "\n",
    "# this should work because they're all a's\n",
    "assert_equal(reduce_m([a, a499]), a500)\n",
    "assert_equal(reduce_m([a499, a]), a500)\n",
    "\n",
    "# these are lists of 999 elements of a, with one b at position 500 (x) or 501 (y)\n",
    "x = reduce_m([a499, b, a500])\n",
    "y = reduce_m([a500, b, a499])\n",
    "\n",
    "# shifting the b by one element changed the hash\n",
    "assert_not_equal(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c62dcbc-63a7-4a20-8f15-295e7675f7a8",
   "metadata": {},
   "source": [
    "Flex that associativity - this statement is true and equivalent to the assertion below:\n",
    "\n",
    "$(a × (a499 × b × a500) × (a500 × b × a499) × a) = (a500 × b × (a500 × a500) × b × a500)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "3a1602ec-db66-4ddf-95ec-80d0b3df7f58",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert_equal(reduce_m([a, x, y, a]), reduce_m([a500, b, repeat_m(a500, 2), b, a500]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cc30cb3-8079-4f8a-9b7c-a3b4f7e384a3",
   "metadata": {},
   "source": [
    "# Unknowns\n",
    "\n",
    "This appears to me to be a reasonable way to define the hash of a list. The mathematical definition of a list aligns very nicely with the properties offered by matrix multiplication. But is it appropriate to use for the same things that a Merkle Tree would be? The big questions are related to the valuable properties of hash functions:\n",
    "\n",
    "* Given a merklist summary but not the elements, is it possible to produce a different list of elements that hash to the same summary? (~preimage resistance)\n",
    "* Given a merklist summary or sublist summaries of it, can you derive the hashes of elements or their order?\n",
    "* Is it possible to predictably alter the merklist summary by concatenating it with some other sublist of real elements?\n",
    "* Are there other desirable security properties that would be valuable for a list hash?\n",
    "* Is there a better choice of hash function as a primitive than sha512?\n",
    "* Is there a better choice of reduction function that still retains associativity+non-commutativity than simple matmul?\n",
    "* Is there a more appropriate size than an 8x8 matrix / 64 bytes to represent merklist summaries?\n",
    "\n",
    "Matrixes are well-studied objects, perhaps such information is already known. If *you* know something about deriving the preimage of the multiplication of a [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring), $R_{256}^{8×8}$, I would be very interested to know."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c4d4a83-8e2e-46d7-b2e3-2d59ba9c9e8c",
   "metadata": {},
   "source": [
    "# What's next?\n",
    "\n",
    "***If** this construction has the appropriate security properties*, it seems to be a better merkle tree in all respects. Any use of a merkle tree could be replaced with this, and it could enable use-cases where merkle trees aren't useful. Some examples of what I think might be possible:\n",
    "\n",
    "* Using a Merklist with a sublist summary tree structure enables creating a $O(1)$-sized 'Merklist Proof' that can verify the addition and subtraction of any number of elements at any single point in the list using only $O(log(N))$ time and $O(log(N))$ static space. As a bonus the proof generator and verifier can have totally different tree structures and can still communicate the proof successfully.\n",
    "* Using a Merklist summary tree you can create a consistent hash of any ordered key-value store (like a btree) that can be maintained incrementally inline with regular node updates, e.g. as part of a [LSM-tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree). This could facilitate verification and sync between database replicas.\n",
    "* The sublist summary tree structure can be as dense or sparse as desired. You could summarize down to pairs of elements akin to a merkle tree, but you could also summarize a compressed sublist of hundreds or even millions of elements with a single hash. Of course, calculating or verifying a proof of changes to the middle of that sublist would require rehashing the whole sublist, but this turns it from a fixed structure into a tuneable parameter.\n",
    "* If all possible elements had an easily calculatable inverse, that would enable \"subtracting\" an element by inserting its inverse in front of it. That would basically extend the group from a ring into a field, and might have interesting implications.\n",
    "    * For example you could define a cryptographically-secure rolling hash where advancing either end can be calculated in `O(1)` time and space.\n",
    "\n",
    "To be continued..."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  },
  "toc-autonumbering": false,
  "toc-showmarkdowntxt": false
 },
 "nbformat": 4,
 "nbformat_minor": 5
}