Merkle Tree

Merkle tree is a structure used in blockchain that serves to encode data more efficiently and securely. It is a collision-resistant hash function, which takes n-inputs and outputs a Merkle root hash. Collision-resistant hash function makes impossible to generate two identical outputs to two different inputs, which in theoretical terms could be understand as a following formula:
H(x)≠H(x′)
where H(x) stands for “hash generated from input x”.

It is worth to mention that generated output is assigned to the concrete input. For example H(x) will always be y, while H(x’) will always be y’. We’ll get a little ahead of ourself by explaining right now how this works, but here it goes:
  • function takes in input x,

  • it checks wheter output H(x) already exists,

    • If it exists, it returns that output,

    • If not it creates new output, associates it to this input and returns that out.

Merkle tree are also known as “binary hash trees.”

What is it used for?

Hash trees are useful for verifying data stored, handled and transferred in and between computers. They comes very helpful to ensure that received data (e.g. from other peers in a peer-to-peer network) is undamaged and unaltered.
The biggest advantage of Merkle tree usage is that a prover (who has a large set of data) can convince a verifier (who has access to those set’s Merkle root hash), that a piece of data (e.g., a single transaction) is in set, just by giving the verifier Merkle proof. That action is secure and trustworthy, not to mention easy to verify since verifier doesn’t have to download whole blockchain to confirm a single transaction or single piece of data.
Merkle proof is a small part of the Merkle tree.
To sum up, Merkle proof makes possible to verify that particular piece of data is a part of hash tree and/or wasn’t in any way altered.

How does it work?

Let’s start from explaining a graphic below:

../../_images/merkle_tree_scheme.jpg
This Merkle tree scheme depicts the main work scheme within Merkle tree. As you can see there are three main parts: leaves, branches and root.
Leaves are hashes of blocks of data. That’s the first level of hashing. Next, every two hashes are concatenated and hashed again into one hash till there are only two of them left. This level of hashing is called branches and it involves every level where concatenation takes part. The last level is root where the last two hashes are one last time concatenated and hashed, which creates in the end Merkle root hash.
If number of blocks on any level of the Merkle tree would be odd, the left out hash is duplicated and hashed with itself.

Merkle tree takes arbitrarily-sized input and then outputs fixed-size hashes. It works like a “compressing”, which is nice since verification should be fast, secure and efficient and that could be reached thanks to small size of the data.

To simplify this, we can say that Merkle root hash is a concatenated hash value of all block hashes:
../../_images/merkle_tree_scheme_simplify.jpg

Note

Notice that instead of keeping n hashes, you could just keep one root hash created from all previous ones. This is very efficient and saves some space - that’s why we can compare it to a compression.