notebook/2021-07-07-Merklist.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "83dd7287-bca5-49f9-b927-31bbc519d5b9",
   "metadata": {},
   "source": [
    "# Merklist\n",
    "> An associative definition for the hash of an ordered sequence of elements\n",
    "\n",
    "- toc: true\n",
    "- categories: [merklist]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf97974c-5582-4bf5-8ed8-6c43daf5036c",
   "metadata": {},
   "source": [
    "Matrix multiplication's associativity and non-commutativity properties provide a natural definition for a [cryptographic hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) / digest / summary of an ordered list of elements while preserving concatenation operations. Due to the non-commutativity property, lists that differ in element order result in a different summary. Due to the associativity property, arbitrarily divided adjacent sub-lists can be summarized independently and combined to find the summary of their concatenation in one operation. This definition provides exactly the properties needed to define a list, and does not impose any unnecessary structure that could cause two equivalent lists to produce different summaries. The name *Merklist* is intended to be reminicent of other hash-based data structures like [Merkle Tree](https://en.wikipedia.org/wiki/Merkle_tree) and [Merklix Tree](https://www.deadalnix.me/2016/09/24/introducing-merklix-tree-as-an-unordered-merkle-tree-on-steroid/)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f17d376-b03f-498b-a794-ea566e0b63f7",
   "metadata": {},
   "source": [
    "## Definition\n",
    "\n",
    "This definition of a hash of a list of elements is pretty simple:\n",
    "\n",
    "* A **list element** is an arbitrary buffer of bytes. Any length, any content. Just bytes.\n",
    "* A **list**, then, is a sequence of such elements.\n",
    "* The **hash of a list element** is the cryptographic hash of its bytes, formatted into a square matrix with byte elements. (More details later.)\n",
    "* The **hash of a list** is reduction by matrix multiplication of the hashes of all the list elements in the same order as they appear in the list.\n",
    "* The **hash of a list with 0 elements** is the identity matrix.\n",
    "\n",
    "This construction has a couple notable concequences:\n",
    "\n",
    "* The hash of a list with only one item is just the hash of the item itself.\n",
    "* You can calculate the hash of any list concatenated with a copy of itself by matrix multiplication of the the hash with itself. This works for single elements as well as arbitrarily long lists.\n",
    "* A list can have multiple copies of the same list item, and swapping them does not affect the list hash. Consider how swapping the first two elements in `[1, 1, 2]` has no discernible effect.\n",
    "* The hash of the concatenation of two lists is the matrix multiplication of their hashes.\n",
    "* Concatenating a list with a list of 0 elements yields the same hash.\n",
    "\n",
    "Lets explore this definition in more detail with a simple implementation in python+numpy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "99b521d8-1c66-49d7-98e9-6fa1d8d7c18f",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#collapse-hide\n",
    "# Setup and imports\n",
    "import hashlib\n",
    "import numpy as np\n",
    "from functools import reduce\n",
    "\n",
    "def assert_equal(a, b):\n",
    "    return np.testing.assert_equal(a, b)\n",
    "\n",
    "def assert_not_equal(a, b):\n",
    "    return np.testing.assert_raises(AssertionError, np.testing.assert_equal, a, b)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc1306b8-5e89-460a-997c-c9464c16615d",
   "metadata": {},
   "source": [
    "### The hash of a list element - `hash_m/1`\n",
    "The function `hash_m/1` takes a buffer of bytes as its first argument, and returns the sha512 hash of the bytes formatted as an 8×8 2-d array of 8-bit unsigned integers with wrapping overflow. **We define this hash to be the hash of the list element.** Based on a shallow wikipedia dive, someone familiar with linear algebra might say it's a [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring), $R_{256}^{8×8}$. Not coincidentally, sha512 outputs 512 bits = 64 bytes = 8 * 8 array of bytes, how convenient. (In fact, that might even be the primary reason why I chose sha512!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3ccc7fdc-fa6a-48e3-accb-3c1070b4559c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def hash_m(e):\n",
    "    hash_bytes = list(hashlib.sha512(e).digest())[:64]          # hash the bytes e, convert the digest into a list of 64 bytes\n",
    "    return np.array(hash_bytes, dtype=np.uint8).reshape((8,8))  # convert the digest bytes into a numpy array with the appropriate data type and shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04132091-21b1-4fbb-99df-711ae5e0c819",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "8×8 seems big compared to 3×3 or 4×4 matrixes. The values are as random as you might expect a cryptographic hash to be, and range from 0-255:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "65aa7c7a-25d5-4971-8780-661f367e45ab",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 14 184 108 217 131 164 222  93]\n",
      " [132 227  82 144 111 178 195 109]\n",
      " [ 25 250 155  17 131 183 151 217]\n",
      " [212  60 138  36   0  60 115 181]\n",
      " [ 51   0  87  43  93 252  56  61]\n",
      " [108 239 175 222  23 142  41 216]\n",
      " [203  98 234  13  65 169 255 240]\n",
      " [ 46 127  15 167 112 153 222  94]]\n",
      "\n",
      "[[ 63 144 188   5  57 146  32  56]\n",
      " [ 27 189  98 140 113 194  70  87]\n",
      " [115  21 136  27 116 167  85  48]\n",
      " [ 29 162 119  29 104  32 145 241]\n",
      " [166 197  57 165 132 213  50 202]\n",
      " [ 48  71  33  19 230  26  58 164]\n",
      " [242 172  65 202 193  50 193 141]\n",
      " [206 110 165 129  52 132 250  73]]\n"
     ]
    }
   ],
   "source": [
    "#collapse-hide\n",
    "print(hash_m(b\"Hello A\"))\n",
    "print()\n",
    "print(hash_m(b\"Hello B\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0c37110-b38d-4420-adf9-11ff5c5cd590",
   "metadata": {},
   "source": [
    "### The hash of a list - `mul_m/2`\n",
    "Ok so we've got our element hashes, how do we combine them to construct the hash of a list? We defined the hash of the list to be reduction by matrix multiplication of the hash of each element:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "91afe2ad-19dc-475c-ad8b-17b70ba9fb79",
   "metadata": {},
   "outputs": [],
   "source": [
    "def mul_m(he1, he2):\n",
    "    return np.matmul(he1, he2, dtype=np.uint8) # just, like, multiply them"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39638a4a-6a42-4710-bcd2-f4a41c24f4cf",
   "metadata": {},
   "source": [
    "Consider an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "eb84e6e1-b1c1-48f4-aa50-3ae0edfc78af",
   "metadata": {},
   "outputs": [],
   "source": [
    "#\n",
    "# `elements` is a list of 3 elements\n",
    "elements = [b\"A\", b\"Hello\", b\"World\"]\n",
    "# first, make a new list with the hash of each element\n",
    "element_hashes = [hash_m(e) for e in elements]\n",
    "# get the hash of the list by reducing the hashes by matrix multiplication\n",
    "list_hash1 = mul_m(mul_m(element_hashes[0], element_hashes[1]), element_hashes[2])\n",
    "# an alternative way to write the reduction\n",
    "list_hash2 = reduce(mul_m, element_hashes)\n",
    "# check that these alternative spellings are equivalent\n",
    "assert_equal(list_hash1, list_hash2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5632eb48-e5cd-4ec4-9bcd-1b59a5dc6042",
   "metadata": {},
   "source": [
    "> Expand the sections below to see a comparison"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "694b4727-621e-4c1b-a2af-99296a8e664a",
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true,
     "source_hidden": true
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "List of elements:\n",
      "[b'A', b'Hello', b'World']\n",
      "\n",
      "Hash of each element:\n",
      "[array([[ 33, 180, 244, 189, 158, 100, 237,  53],\n",
      "       [ 92,  62, 182, 118, 162, 142, 190, 218],\n",
      "       [246, 216, 241, 123, 220,  54,  89, 149],\n",
      "       [179,  25,   9, 113,  83,   4,  64, 128],\n",
      "       [ 81, 107, 208, 131, 191, 204, 230,  97],\n",
      "       [ 33, 163,   7,  38,  70, 153,  76, 132],\n",
      "       [ 48, 204,  56,  43, 141, 197,  67, 232],\n",
      "       [ 72, 128,  24,  59, 248,  86, 207, 245]], dtype=uint8), array([[ 54,  21, 248,  12, 157,  41,  62, 215],\n",
      "       [ 64,  38, 135, 249,  75,  34, 213, 142],\n",
      "       [ 82, 155, 140, 199, 145, 111, 143, 172],\n",
      "       [127, 221, 247, 251, 213, 175,  76, 247],\n",
      "       [119, 211, 215, 149, 167, 160,  10,  22],\n",
      "       [191, 126, 127,  63, 185,  86,  30, 233],\n",
      "       [186, 174,  72,  13, 169, 254, 122,  24],\n",
      "       [118, 158, 113, 136, 107,   3, 243,  21]], dtype=uint8), array([[142, 167, 115, 147, 164,  42, 184, 250],\n",
      "       [146,  80,  15, 176, 119, 169,  80, 156],\n",
      "       [195,  43, 201,  94, 114, 113,  46, 250],\n",
      "       [ 17, 110, 218, 242, 237, 250, 227,  79],\n",
      "       [187, 104,  46, 253, 214, 197, 221,  19],\n",
      "       [193,  23, 224, 139, 212, 170, 239, 113],\n",
      "       [ 41,  29, 138, 172, 226, 248, 144,  39],\n",
      "       [ 48, 129, 208, 103, 124,  22, 223,  15]], dtype=uint8)]\n",
      "\n",
      "Hash of full list:\n",
      "[[178 188  57 157  60 136 190 127]\n",
      " [ 40 234 254 224  38  46 250  52]\n",
      " [156  72 193 136 219  98  28   4]\n",
      " [197   2  43 132 132 232 254 198]\n",
      " [ 93  64 113 215   2 246 130 192]\n",
      " [ 91 107  85  13 149  60  19 173]\n",
      " [ 84  77 244  98   0 239 123  17]\n",
      " [ 58 112  98 250 163  20  27   6]]\n"
     ]
    }
   ],
   "source": [
    "#collapse-hide\n",
    "#collapse-output\n",
    "print(\"List of elements:\")\n",
    "print(elements)\n",
    "print()\n",
    "print(\"Hash of each element:\")\n",
    "print(element_hashes)\n",
    "print()\n",
    "print(\"Hash of full list:\")\n",
    "print(list_hash1)\n",
    "# Expand the section below to see the output"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de064a80-208d-4850-b95e-c5a707f7f3b3",
   "metadata": {},
   "source": [
    "What does this give us? Generally speaking, multiplying two square matrixes $M_1×M_2$ gives us at least these two properties:\n",
    "\n",
    "* [Associativity](#Associativity) - Associativity enables you to reduce a computation using any partitioning because all partitionings yield the same result. Addition is associative $(1+2)+3 = 1+(2+3)$, subtraction is not $(5-3)-2\\neq5-(3-2)$. ([Associative property](https://en.wikipedia.org/wiki/Associative_property))\n",
    "* [Non-Commutativity](#Non-Commutativity) - Commutativity allows you to swap elements without affecting the result. Addition is commutative $1+2 = 2+1$, but division is not $1\\div2 \\neq2\\div1$. And neither is matrix multiplication. ([Commutative property](https://en.wikipedia.org/wiki/Commutative_property))\n",
    "\n",
    "This is an unusual combination of properties for an operation. It's at least not a combination encountered in introductory algebra:\n",
    "\n",
    "|     | associative | commutative |\n",
    "| --- | ---         | ---         |\n",
    "| $a+b$   | ✅ | ✅ |\n",
    "| $a*b$   | ✅ | ✅ |\n",
    "| $a-b$   | ❌ | ❌ |\n",
    "| $a/b$   | ❌ | ❌ |\n",
    "| $a^b$ | ❌ | ❌ |\n",
    "| $M×M$ | ✅ | ❌ |\n",
    "\n",
    "Upon consideration, these are the exact properties that one would want in order to define the hash of a list of items. Non-commutativity enables the order of elements in the list to be well defined, since swapping different elements produces a different hash. Associativity enables calculating the hash of the list by performing the reduction operations in any order, and you still get the same hash.\n",
    "\n",
    "Lets sanity-check that these properties can hold for the construction described above."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6c8ef5e-99d2-4a7e-887f-54b93a7baf4a",
   "metadata": {},
   "source": [
    "### Associativity\n",
    "\n",
    "If it's associative, we should get the same hash if we rearrange the parenthesis to indicate reduction in a different operation order. That is: $((e1 × e2) × e3) = (e1 × (e2 × e3))$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "6da02d5e-a783-4a57-90ac-04a654d89006",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "e1 = hash_m(b\"Hello A\")\n",
    "e2 = hash_m(b\"Hello B\")\n",
    "e3 = hash_m(b\"Hello C\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "0452955b-2d7e-41e4-924f-8f00ef0c46cf",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "x = np.matmul(np.matmul(e1, e2), e3)\n",
    "y = np.matmul(e1, np.matmul(e2, e3))\n",
    "\n",
    "# observe that they produce the same summary\n",
    "assert_equal(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bda2bacf-daf7-4f42-93cf-c422704dc067",
   "metadata": {
    "tags": []
   },
   "source": [
    "> Expand the sections below to see a comparison"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "b7a1906d-524c-4339-920a-978a0385d6cc",
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true,
     "source_hidden": true
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 58  12 144 134 100 158 159  51]\n",
      " [ 73 206 202 190  87  79 223   2]\n",
      " [210 122 142 117  37 148 106  45]\n",
      " [175 146 187 223 235 171  64 226]\n",
      " [149  85 203  87  92 251 243 206]\n",
      " [ 18 252 160 103 125 251 181 133]\n",
      " [191 132 220 104 213 154  34 154]\n",
      " [127 197  95  87 166   3  22   3]]\n",
      "\n",
      "[[ 58  12 144 134 100 158 159  51]\n",
      " [ 73 206 202 190  87  79 223   2]\n",
      " [210 122 142 117  37 148 106  45]\n",
      " [175 146 187 223 235 171  64 226]\n",
      " [149  85 203  87  92 251 243 206]\n",
      " [ 18 252 160 103 125 251 181 133]\n",
      " [191 132 220 104 213 154  34 154]\n",
      " [127 197  95  87 166   3  22   3]]\n"
     ]
    }
   ],
   "source": [
    "#collapse-hide\n",
    "#collapse-output\n",
    "print(x)\n",
    "print()\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0fb04da-2cbd-4fa1-8b85-d48441cc8962",
   "metadata": {},
   "source": [
    "### Non-Commutativity\n",
    "\n",
    "If it's not commutative, then swapping different elements should produce a different hash. That is, $e1 × e2 \\ne e2 × e1$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "2f3d139a-6c48-4ddb-9a34-7b2aa00853d6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "x = np.matmul(e1, e2)\n",
    "y = np.matmul(e2, e1)\n",
    "\n",
    "# observe that they produce different summaries\n",
    "assert_not_equal(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8c05183-08db-44f1-b0cc-5fb4aed77b1b",
   "metadata": {
    "tags": []
   },
   "source": [
    "> Expand the sections below to see a comparison"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "7f833e44-79d8-4c98-af41-0c915bee66ed",
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true,
     "source_hidden": true
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 87  79 149 131 148 247 195  90]\n",
      " [249  84 195  58 142 133 211  15]\n",
      " [177  93  69 254 240 234  97  37]\n",
      " [ 46  84  76 253  55 200  43 236]\n",
      " [ 21  84  99 157  55 148 170   2]\n",
      " [168 123   6 250  64 144  54 242]\n",
      " [230  78 164  76  30  29 214  68]\n",
      " [ 47 183 156 239 157 177 192 184]]\n",
      "\n",
      "[[149  18 239 238  84 188 191 109]\n",
      " [239 150 214 235  59 161   9 133]\n",
      " [ 89 174  59  14  70 113 124 243]\n",
      " [ 66 113 176 124 227 247  17  25]\n",
      " [247 138 152 181 177 143 184  97]\n",
      " [113 249 199 153 154  75  45 105]\n",
      " [121 201 225  42 249 213 180 244]\n",
      " [ 85  31  72  28 181 182 140 176]]\n"
     ]
    }
   ],
   "source": [
    "#collapse-hide\n",
    "#collapse-output\n",
    "print(x)\n",
    "print()\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2978d8f5-0c9e-445d-80d1-12229b589c24",
   "metadata": {},
   "source": [
    "## Other functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "80ec6898-4163-4ba8-9460-c717a9e58c59",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a list of 1024 elements and reduce them one by one\n",
    "list1 = [hash_m(b\"A\") for _ in range(0, 1024)]\n",
    "hash1 = reduce(mul_m, list1)\n",
    "\n",
    "# Take a starting element and square/double it 10 times. With 1 starting element over 10 doublings = 1024 elements\n",
    "hash2 = reduce((lambda m, _ : mul_m(m, m)), range(0, 10), hash_m(b\"A\"))\n",
    "\n",
    "# Observe that these two methods of calculating the hash have the same result\n",
    "assert_equal(hash1, hash2)\n",
    "\n",
    "# lets call it double\n",
    "def double_m(m, d=1):\n",
    "    return reduce((lambda m, _ : mul_m(m, m)), range(0, d), m)\n",
    "\n",
    "assert_equal(hash1, double_m(hash_m(b\"A\"), 10))\n",
    "\n",
    "def identity_m():\n",
    "    return np.identity(8, dtype=np.uint8)\n",
    "\n",
    "# generalize double_m to any length, not just doublings, performed in ln(N) matmuls\n",
    "def repeat_m(m, n):\n",
    "    res = identity_m()\n",
    "    while n > 0:\n",
    "        # concatenate the current doubling iff the bit representing this doubling is set\n",
    "        if n & 1:\n",
    "            res = mul_m(res, m)\n",
    "        n >>= 1\n",
    "        m = mul_m(m, m) # double matrix m\n",
    "        # print(s)\n",
    "    return res\n",
    "\n",
    "# repeat_m can do the same as double_m\n",
    "assert_equal(hash1, repeat_m(hash_m(b\"A\"), 1024))\n",
    "\n",
    "# but it can also repeat any number of times\n",
    "hash3 = reduce(mul_m, (hash_m(b\"A\") for _ in range(0, 3309)))\n",
    "assert_equal(hash3, repeat_m(hash_m(b\"A\"), 3309))\n",
    "\n",
    "# Even returns a sensible result when requesting 0 elements\n",
    "assert_equal(identity_m(), repeat_m(hash_m(b\"A\"), 0))\n",
    "\n",
    "# make helper for reducing an iterable of hashes\n",
    "def reduce_m(am):\n",
    "    return reduce(mul_m, am)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "84738470-61c9-44b5-b6b7-9971a02547bd",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 68 252 159   3  14  52 199 199]\n",
      " [136 124   6  34  58 174 206  54]\n",
      " [  3 234   2  13 120 240   7 163]\n",
      " [102  47  66  61  87 234 246  72]\n",
      " [ 19 135  80 115  75 242 242   5]\n",
      " [244 165 250  28  76  43 188 254]\n",
      " [233  46 187  39 151 241 175 130]\n",
      " [132 138   6 215  20 132  89  33]]\n",
      "\n",
      "[[ 68 252 159   3  14  52 199 199]\n",
      " [136 124   6  34  58 174 206  54]\n",
      " [  3 234   2  13 120 240   7 163]\n",
      " [102  47  66  61  87 234 246  72]\n",
      " [ 19 135  80 115  75 242 242   5]\n",
      " [244 165 250  28  76  43 188 254]\n",
      " [233  46 187  39 151 241 175 130]\n",
      " [132 138   6 215  20 132  89  33]]\n"
     ]
    }
   ],
   "source": [
    "print(hash1)\n",
    "print()\n",
    "print(hash2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f66e8f69-260c-40ca-bf26-306a85582ad6",
   "metadata": {},
   "source": [
    "# Fun with associativity\n",
    "\n",
    "Does the hash of a list change even when swapping two elements in the middle of a very long list?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "c9b7e5c8-db73-43e6-89ef-cc912ce1578d",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = hash_m(b\"A\")\n",
    "b = hash_m(b\"B\")\n",
    "\n",
    "a499 = repeat_m(a, 499)\n",
    "a500 = repeat_m(a, 500)\n",
    "\n",
    "# this should work because they're all a's\n",
    "assert_equal(reduce_m([a, a499]), a500)\n",
    "assert_equal(reduce_m([a499, a]), a500)\n",
    "\n",
    "# these are lists of 999 elements of a, with one b at position 500 (x) or 501 (y)\n",
    "x = reduce_m([a499, b, a500])\n",
    "y = reduce_m([a500, b, a499])\n",
    "\n",
    "# shifting the b by one element changed the hash\n",
    "assert_not_equal(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c62dcbc-63a7-4a20-8f15-295e7675f7a8",
   "metadata": {},
   "source": [
    "Flex that associativity - this statement is true and equivalent to the assertion below:\n",
    "\n",
    "$(a × (a499 × b × a500) × (a500 × b × a499) × a) = (a500 × b × (a500 × a500) × b × a500)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "3a1602ec-db66-4ddf-95ec-80d0b3df7f58",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert_equal(reduce_m([a, x, y, a]), reduce_m([a500, b, repeat_m(a500, 2), b, a500]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "624d5cb8-01cb-4c6d-b80a-4b2079e4ad54",
   "metadata": {},
   "source": [
    "## Security\n",
    "\n",
    "The security of the design of merklist leans on existing cryptographic hash function by the requiring that values that go into the final list digest came from some real hashed preimage, and any implementation can require that the primage of every element must be provided. This means that it's not possible to choose an *arbitrary* element hash as an input to the reduction operation, limiting the blast radius that any specific element can have on the final list digest to the bit fixing that is possible by asic bitcoin miners today. So then the question is if bit fixing is enough to degrade the preimage resistance of the merklist design, and if that is impossible to overcome. (Challenges anyone?)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9515603f-f8b5-482a-ae1b-ed504358b860",
   "metadata": {},
   "source": [
    "## Unknowns\n",
    "\n",
    "This appears to me to be a reasonable way to define the hash of a list. The mathematical description of a list aligns very nicely with the properties offered by matrix multiplication. But is it appropriate to use for the same things that a Merkle Tree would be? The big questions are related to the valuable properties of hash functions:\n",
    "\n",
    "* Given a merklist summary but not the elements, is it possible to produce a different list of elements that hash to the same summary? (~preimage resistance)\n",
    "* Given a merklist summary or sublist summaries of it, can you derive the hashes of elements or their order?\n",
    "* Is it possible to predictably alter the merklist summary by concatenating it with some other sublist of real elements?\n",
    "* Are there other desirable security properties that would be valuable for a list hash?\n",
    "* Is there a better choice of hash function as a primitive than sha512?\n",
    "* Is there a better choice of reduction function that still retains associativity+non-commutativity than simple matmul?\n",
    "* Is there a more appropriate size than an 8x8 matrix / 64 bytes to represent merklist summaries?\n",
    "\n",
    "Matrixes are well-studied objects, perhaps some of these questions already have an answer. If *you* know something about deriving a preimage of a [matrix ring](https://en.wikipedia.org/wiki/Matrix_ring) $R_{256}^{8×8}$ using just multiplication, I would be very interested to know. Given the simplicity of the definition, I wouldn't be surprised if there was already a big wrench in the cogs here."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c4d4a83-8e2e-46d7-b2e3-2d59ba9c9e8c",
   "metadata": {},
   "source": [
    "## What's next?\n",
    "\n",
    "*If this construction has appropriate security properties*, it seems to be a better merkle tree in all respects. Any use of a merkle tree could be replaced with this, and it could enable use-cases where merkle trees aren't useful. Some examples of what I think might be possible:\n",
    "\n",
    "* Using a Merklist with a sublist summary tree structure enables creating a $O(1)$-sized 'Merklist Proof' that can verify the addition and subtraction of any number of elements at any single point in the list using only $O(log(N))$ time and $O(log(N))$ static space. As a bonus the proof generator and verifier can have totally different tree structures and can still communicate the proof successfully.\n",
    "* Using a Merklist summary tree you can create a consistent hash of any ordered key-value store (like a btree) that can be maintained incrementally inline with regular node updates, e.g. as part of a [LSM-tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree). This could facilitate verification and sync between database replicas.\n",
    "* The sublist summary tree structure can be as dense or sparse as desired. You could summarize down to pairs of elements akin to a merkle tree, but you could also summarize a compressed sublist of hundreds or even millions of elements with a single hash. Of course, calculating or verifying a proof of changes to the middle of that sublist would require rehashing the whole sublist, but this turns it from a fixed structure into a tuneable parameter.\n",
    "* If all possible elements had an easily calculatable inverse, that would enable \"subtracting\" an element by inserting its inverse in front of it. That would basically extend the group from a ring into a field, and might have interesting implications.\n",
    "    * For example you could define a cryptographically-secure rolling hash where advancing either end can be calculated in `O(1)` time and space.\n",
    "\n",
    "To be continued..."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  },
  "toc-autonumbering": false,
  "toc-showmarkdowntxt": false
 },
 "nbformat": 4,
 "nbformat_minor": 5
}