Merkle trees are a data structure created with the aim of facilitating the verification of large amounts of organized data by relating them using various cryptographic and information management techniques.
DI enter any block of the Bitcoin network we find a structure called "Merkle Tree".
A Merkle tree is a data structure divided into several layers whose purpose is to relate each node with a single root associated with them. To achieve this, each node must be identified with a unique identifier (hash). These initial nodes, called child (leaf) nodes, are then associated with a higher node called the parent (branch) node. The parent node will have a unique identifier resulting from the hash of its child nodes. This structure is repeated until the root node or Merkle root (Merkle Root), whose imprint is associated with all the nodes of the tree.
Thanks to this unique structure, Merkle trees allow a large amount of data to be related at a single point (Merkle Root). In this way, the verification and validation of this data can become very efficient, having to only verify the Merkle Root instead of the entire structure.
This design was created by Ralph Merkle, in the year 1979, in order to streamline the verification process for large amounts of data.
How does a Merkle tree work?
A Merkle tree is a structure that relates all transactions and groups them between pairs to obtain a Root hash o "root address ”. This root hash is related to all the hashes of the tree. Check all Transactions of a network would be extremely slow and inefficient. For this reason, this system was implemented. Since, if a hash is changed, all the others would change until reaching the root (root hash). This will invalidate the authenticity of the information for the entire tree. It is precisely this function that allows Merkels trees to provide the high level of safety that characterizes them.
To understand how a Merkle tree works in more depth, examine the following example:
Imagine a data block which has a unique and unrepeatable imprint or hash. Each of these blocks is organized in layers in what we would see as a pyramid structure. These blocks are linked to a higher layer by means of these hashes. In this way, the upper blocks always point to the lower blocks, but more importantly, the hash of these upper blocks is the result of the sum of the information contained in the new block with the hash of the previous block. That way, as you continue to scale, the same structure repeats as all blocks are connected to one large block of data.
The fact that it works in this way, makes altering the hash of a block, invalidating the hashes of the rest of the blocks. In this way the system facilitates two things. First, it makes it easier to verify the data blocks. Second, it serves as a mechanism to prevent tampering. This is thanks to the fact that this mechanism allows detecting hashes changes in each data block. If a change is detected, the entire tree is invalidated because it has been altered and its data is not valid.
Characteristics of Merkle trees
Some of the most outstanding characteristics of Merkle trees are:
- They are an efficient means of generating a distributed data structure.
- They provide great security and resistance to data alterations.
- They enable a high level of data transmission performance on distributed networks. Thanks to this, they reduce the amount of data necessary for its correct operation.
- They are computationally inexpensive and efficient in creating, processing, and verifying information.
- They allow “dissection” to make verification checks faster. All this, without compromising the security and traceability of the transactions carried out.
- Thanks to the “dissection” feature, they are also able to save storage resources.
- They offer great adaptability to different computer problems. Thanks to this, Merkle trees have been widely used in different systems. For example, database software, file systems, public key structures, versioning systems, distributed networks (P2P), among others.
Uses today
Merkle trees have a wide number of uses in computer systems today, and we'll talk about some of them here.
Blockchain technology
The use of Merkle trees in technology blockchain it is vital. Through its use, the client software can download the entire history of the blockchain and verify it on the fly. In fact, its use facilitates the process by allowing «prune» (take only part of the history) the history and reduce the download size.
For example, a user who wants to install a Bitcoin client does not have to download all the history of the blockchain. Instead, you can reduce your download to just a few hundred or thousands of blocks behind. This way, you have access to a lighter version of the history that better meets your requirements.
Contrary to what you may think, this does not reduce customer security. Well, thanks to the Merkle tree, it is possible to download a specific "root hash" and from there start creating a history. Since that root hash is related to the blocks before it, all you have to do is verify it. To do this, you can go to a series of Bitcoin full nodes (with all the history) and verify that the "root hash" taken matches. Having absolute consensus on this point, the "root hash" is given as valid. And from that point, the user can perfectly use their new Bitcoin client node.
File systems
Another utility that we can see from Merkle trees is reflected in file systems. A file system is a data structure that an operating system uses to keep track of the files it stores. Normally this structure is applied on a hard disk and even inside the memory cards that our smartphones use.
Some of these peculiar creations make use of Merkle trees in order to manage and guarantee the correct use of the stored data. Special cases of mention in this group are file systems ZFS y Btrfs.
ZFS known as Zettabyte File System, is considered in the computer world as the best filing system. His abilities far exceed his best-known opponents as NTFS, FAT:, exFAT o ext3/4. It has resistance to failures, recovery and correction of errors, de-duplication of data, replication, copy-on-write (CoW) and high scalability. All this makes it a perfect option to be deployed in critical environments. Designed by Sun Microsystems and introduced in 2004, it is currently the leader in computer file systems.
But to achieve this ZFS has a secret tool in the middle of all its code: the extensive and intensive use of Merkle trees. All this in order to have a system that allows rapid verification of the data stored within its structures. Its error recovery, de-duplication, CoW (Copy-On-Write) and replication properties strongly depend on this technique, otherwise it would entail a high computational cost that would undermine its advantages.
Version control systems
Another use of Merkle trees can be seen in software versioning systems, well-known cases of Go y mercurial. Of both, the most widely known in Git, the same that enables the operation of platforms such as Gitlab y Github, or the development of the Linux kernel, the latter was the first project to use it in production, since the creator of Git is Linus Torvalds.
The use of Merkle trees in this type of software is related to the change tracking system within the repository or workspace where the files are stored. In this way, each change made (new blocks of data) goes through a hashing process that, when going through all the content of the repository, generates a unique hash of said workspace, which is called commit.
This marking system allows us, for example, to go to a specific commit to see what the status of the project's code was at a certain time, while the general status of the project remains intact. Seeing it in a simpler way, Merkle trees in Git allow us to create a kind of time machine, which allows us to navigate through the different commits of the project, keeping the files and their changes as they were at that particular moment.
A utility that is very useful when designing software in high traffic and distributed work environments, as Linus Torvalds has not demonstrated in his great Linux kernel project.