In a nutshell, a hash algorithm is a mathematical function that transforms any input into a fixed size output. To be cryptographically secure — and usable in blockchain technology — the hash function needs to be collision resistant, which means that it is difficult to find two inputs that produce the same output. To accomplish this, a hash function needs to have the following properties:

One-way: It is possible to go from input to output in a hash function but not vice versa. This makes it impossible to reverse engineer a collision from the desired hash output Large output space: The only way to find a hash collision is via a brute-force search. According to the Pigeonhole Principle, this requires checking as many inputs as the hash function has possible outputs. This number should be large enough to make a brute force search infeasible Non-locality: In a non-local hash function, similar inputs produce very dissimilar outputs. This is important to protect against hill-climbing attacks

A hash algorithm is considered secure until it is possible to find a collision for it. Once this has occurred, it is officially deprecated, like MD5 and SHA-1.

Uses of hash functions in blockchain

Hash functions are commonly used to protect the integrity of data. Given a trusted hash of the data, it is possible to calculate the hash of the data and compare the two values. If they match, then the data has likely not been modified since the original hash was created. The blockchain’s digital ledger is designed to store valuable information that can benefit an attacker if modified in their favor. Additionally, this ledger is stored and transmitted by a network of mutually distrusting nodes. As a result, the blockchain has a number of different uses for hash functions and the integrity protection that they provide. Some of the most common uses of hash function in blockchain include:

Digital signatures: Hash functions are a vital part of digital signature algorithms, summarizing the data to a compact value while preserving its integrity. Digital signatures are used for preserving data integrity and authentication for blockchain transactions and blocks Merkle trees: Merkle tree summarizes a list of transactions contained within a block into a single value within a block header. They use hash functions to ensure that it is infeasible to find two Merkle trees with the same root hash. This way, by storing the root hash within a block header and protecting the integrity of the block header, the integrity of the transactions contained within the block’s body is protected as well Proof of Work consensus: The Proof of Work consensus algorithm defines a valid block as one whose header has a hash value less than a given threshold. Hash function collision resistance is essential here since it ensures that finding a valid block is difficult The blockchain’s “chains”: Each block in the blockchain contains the hash of the previous block header within its header. This ensures that changing a single block in the blockchain without detection is impossible. Modifying one block in the chain requires generating new versions of every following block as well, increasing the difficulty and helping to preserve the integrity of the blockchain’s digital ledger

Hash function security for blockchain

Hash functions are vital to protecting the immutability of the digital ledger. If the hash function used by a blockchain is broken, then an attacker could find collisions for crucial hash values (such as the block’s chains or a Merkle tree’s values). This would enable malicious nodes to more easily rewrite the history of the blockchain network and cause a blockchain system to fall apart. For this reason, the security of hash functions is essential to the security of the blockchain. The security of a hash function can be threatened in a few different ways:

Algorithm strength: Hash functions are designed to be collision-resistant, but cryptographic algorithms are occasionally broken. If a flaw is discovered in a hash function, it may make it possible to effectively find hash collisions Hash output length: Hash functions are designed to make the best way to find a collision a brute-force search, with the search space equal in size to the space of possible outputs of the hash function. If such a space becomes effectively searchable — due to the use of a hash function with too short of an output length — then the hash function is no longer collision resistant and is vulnerable to attack Quantum computing: Grover’s Algorithm is an algorithm designed for quantum computers that reduces the space that an attacker needs to search to find a hash collision. Once quantum computers become a reality, this may enable collisions to be found for hash functions that were still secure against attacks by classical computers

Conclusion

Hash functions are widely used in blockchain to protect the integrity and immutability of data stored on the distributed ledger. Since the ledger is stored in a decentralized fashion, with each node maintaining its own copy, ledger immutability is critical. Otherwise, nodes could change their versions of the ledger to their own benefit, breaking the network’s consensus. This makes hash function security vital to blockchain security. If a blockchain’s hash function becomes vulnerable to attack, then the security of the entire system is broken

Sources

Mathematics | The Pigeonhole Principle, GeeksforGeeks Introduction to Hill Climbing | Artificial Intelligence, GeeksforGeeks Daan Sprenkels, “Grover’s algorithm“