Merkle Trees
Ensuring blocks haven't been tampered with
In a bitcoin block, there may be many thousands of transactions. If they were not stored in any particularly clever way, the block itself would always need to store all the information about these transactions. "Full nodes" do this - they store everything. However, for most people this is not sufficient.
For most use cases, you do not need to download the full list of transactions. However, you want to be able to verify it later, if needed. Have a look at our lesson on verifying blocks for more information. So instead of storing everything, you have the option of just downloading the header information. In this header information, all of the transactions are represented by a single hash value. This is the merkle root and is derived from a merkle tree that stores all of the transactions in a bitcoin block.
Let's have a look at one in action. First, navigate to this block's information. The merkle root is 7252b7…781be. The block is made from a whole bunch of transactions, such as this one. If you look at the Input Scripts section of that page, you can see the raw transaction information (this is what is hashed). This is a SHA256 hash, but how is it derived?
Let's consider a simple method of storing the transactions, as a simple list. We take each of the transactions, put them in a list, hash the list, and this gives a merkle root. This alone provides a lot of benefits. If someone changes just one of the values in this list (including simply reording the transactions), the overall hash would change. The resulting hash wouldn't match the one published on the blockchain, and everyone will know something has changed. People can then check each transaction independently to see which transaction has been modified. However, this means downloading all of the transactions to do a verification.
To get around this issue, a merkle tree is used. The procedure is as follows:
- The transactions are ordered. The order is up to the miner, who will usually just sort by the highest fees and include those transactions.
- Every transaction is hashed using the SHA256 function
- The transactions are paired, with transaction 1 paired with 2. Tx 3 with tx 4. And so on…
- A pair is then hashed, by concatenating the hashes and then hashing the result. The order of the hashed pairs is maintained.
- Each of those hashes is then paired, concatenated and hashed.
- Step 5 is repeated until only one hash remains, and this is the merkle root.
The full set of transactions, hashed pairs and so on is the merkle tree.
Benefits
So why bother doing this? We already have looked at a list, which provides the benefit of knowing when things are tampered with. If any transaction is altered, the merkle root changes. Therefore a merkle tree provides the same protection as a simple list-of-transactions. It is at least as good.
Next, if the block has been tampered with, the list-of-hashes approach requires that we download every transaction to verify it. With a merkle tree, we can just download the first level down (i.e. the two hashes that combined to create the merkle root). If the merkle root is different to the published one, then one of these hashes will also be different. We then only need to verify that half of the merkle tree. Further, we can download that hashes "parents" to find out which of those is invalid, further reducing our search space. In effect, we keep looking down the tree, only looking at branches that have invalid values.
With computers, it normally takes longer to download a hash, then it takes to compute a hash locally. This means that we can reduce the amount of network traffic required to find an altered transaction by using a merkle tree.
In summary, merkle trees provide a way for us to verify that the transactions included in a block have not been modified. Those with lower-powered computers can simply download the headers for a block, rather than the full list of transactions. They then know that a block's hasn't been tampered with later on, because they have a record of the merkle root and can verify that. Merkle trees are also faster to use to find any modified data.