The legal and practical implications of recent attacks on 128–bit cryptographic hash functions
First Monday

The legal and practical implications of recent attacks on 128-bit cryptographic hash functions by Praveen Gauravaram, Adrian McCullagh and Ed Dawson

This paper discusses the legal and practical implications of attacks, presented at Crypto ’2004, against various 128–bit hash functions and in particular MD5 due to its wide usage. These attacks are significant because a number of important applications depend on MD5. It is argued in this paper that the MD–x style of hash function designs for various applications can be a single point of failure. New hash function design schemes with some strict security properties should be developed in order to avoid new attacks in the future.


1. Introduction
2. Operation of hash functions
3. Details of the attacks
4. Impact of collisions in MD5 on various applications
5. Need for new hash function designs
6. Conclusion




1. Introduction

Cryptographic hash functions serve an essential role within a wide range of information security applications. These security applications provide certain security services such as data integrity, authentication and non–repudiation. These applications also include non–exhaustively, digital signature generation and verification, session key establishment in key agreement protocols, management of password schemes and commitment schemes in cryptographic protocols such as electronic auctions.

Hash functions, also called message–digest algorithms, compress an arbitrary finite m-bit input message x, without the provision of any secret parameter, into a fixed n-bit output message y. y is called the hash value or message digest. A hash function can be expressed as h(x) = y where h is a publicly known hash function and the computation of y from x must be easy. Hash functions are well discussed in the classic Handbook of Applied Cryptography [31] and in the doctoral dissertation of Bart Preneel [36].

The principal security properties of hash functions are:

  • pre–image resistance: A hash function h is said to be pre–image resistant if for a given hash value y, it is “computationally infeasible” or “hard” to find an input message x with only y. In other words, it should be hard to invert the hash function.
  • 2nd pre–image resistance: A hash function h is said to be 2nd pre–image resistant if for a given message x and its corresponding hash value y, it is hard to find another input message x' such that h(x') = h(x) = y.
  • collision resistance: A hash function h is said to be collision resistant if it is hard to find any two input messages x and x' such that h(x) = h(x').
  • Near-collision resistance: A hash function h is said to be near collision resistant if it is hard to find any two inputs x and x' such that the difference between h(x) and h (x') is small. This property serves as a certificational property for hash functions [31].

A hash algorithm that satisfies the first two properties mentioned above is said to be a “one–way” hash function (OWHF); whereas a hash algorithm that satisfies the first three properties is said to be a “collision resistant” hash function (CRHF) [36] (In reality due to some technical reasons pre–image resistance is not a compulsory requirement for a hash function to be classified as CRHF [31]). The practical security of any hash function lies in its output bit size to prevent Yuval’s Birthday attack to find collisions [42]. Collisions in hash functions are easier to find than finding pre–images or pre–images. It requires computations to find a collision for an n–bit hash function according to the Birthday attack; whereas it requires effort to find either pre–images or pre–images by the brute–force technique [31].

The attacks on the 128–bit hash functions MD4, MD5, RIPEMD and HAVAL–128 presented at Crypto ’2004 [41] have established that it is no longer secure to use any of these four hash functions for various information processing applications where the collision resistance property is relevant. The attacks clearly show that these hash functions can no longer be considered as CRHFs. The security properties expected from the hash functions vary as per the application in which they are used as discussed in Section 4 (Impact of collisions in MD5 on various applications).

The rest of the paper is organised as follows: Section 2 ( Operation of hash functions) gives a brief working procedure of hash functions, Section 3 (Details of the attacks) explains the attacks on these algorithms and their outcomes, Section 4 discusses the significance of these attacks on some important applications, Section 5 (Need for new hash function designs) emphasises the necessity for new hash function designs and finally Section 6 provides some concluding remarks.



2. Operation of hash functions

Hash functions that belong to the MD–x family are based on the Merkle/Damgard style of iterative structure [ 13, 32] as shown in Figure 1. An arbitrary finite size input message, which has to be hashed, is split into blocks of equal size depending on the algorithm used, as detailed in Table 1. A predefined reversible padding procedure is employed on the last block to make its size equal to that of remaining blocks. This is necessary for certain security reasons [36]. Blocks are processed one by one by the repeated application of a compression function as shown in Figure 1 and the hash value y is obtained after processing all of the blocks comprising the messages including the last padded block. Hash Functions use a fixed constant at the start of their operation which is called the Initial Value (IV). The bit–length of the IV will equal the bit–length of the hash output and the IV becomes a chaining variable as the IV’s value is updated with each repeated iteration of the compression function as applied to the message blocks. This chaining variable at the end of the process becomes the message digest or hash value y. Table 1 shows various hash functions that are based on Merkle/Damgard iterative structure with their block sizes, number of compression function iterations and output hash value sizes in bit–length.


Merkle/Damagard iterative structure for hash functions

Figure 1: Merkle/Damagard iterative structure for hash functions.




Details of the attacks

The attacks on MD4, MD5, RIPEMD and HAVAL-128 presented at Crypto ’2004 by Xiaoyun Wang et al show that these hash algorithms can no longer be classified as CRHFs, as collisions can be found for these algorithms with less computational complexity than required by the Birthday attack technique. The results of the collisions on MD4, MD5, RIPEMD and HAVAL-128 are available at [41]. MD5 due to its wide usage will primarily be focused upon in the remainder of this paper. MD5 is widely used in various applications, such as, the generation and verification of digital signatures, software integrity assurance of products such as open source Apache Web server and Sun Microsystems’ Solaris Fingerprint Database and other cryptographic protocols.


Table 1: Various cryptographic hash functions.

Hash function
block size (bits)
CF iterations
Hash value (bit–length)
96, 128, 160
128, 160, 190, 224, 256


The recently discovered weaknesses in MD5 make its future use in various applications (especially in applications where digital signatures are created) questionable and the National Institute of Standards and Technology (NIST) plans [34] to phase out SHA–1 (weakened variant of SHA–1 was also broken [16]) by 2010 and recommends that SHA–1 and algorithms of similar strengths (160–bit hash functions) should be replaced with more robust algorithms such as SHA–224, SHA–256, SHA–384 and SHA–512 available in Federal Information Processing Standard (FIPS) 180–2 approved by the NIST [8].

The attack on MD5 works on two 1024–bit message blocks. An overview of the technique employed in finding the collisions on MD5 [41] is set out below, but this information may change, as the complete analytical details of the attack have not yet been released publicly by Xiaoyun Wang, et al.

The input messages are always longer than just one block (512 bits). Given the same input 128–bit chaining value, the idea is to find two very similar outputs for two messages M and M', such that the next blocks “correct” the difference in the hash outputs of M and M' and result in a collision. Thus, there are always two (or more) blocks of input. Hence, given an instant equation with a fixed differential , the technique involves finding two second block messages and with another fixed differential , whose equation is such that and “correct” the difference in the intermediate chaining value established by M and M'. Here, the term “differential” refers to difference in the two input messages and the term “difference” refers to difference of any two chaining values (for example difference between h1 and h1' as shown in Figure 2). This attack technique can find collisions for any given 128–bit IV but the same inputs do not produce collisions for any IV. The differential does not always produce a collision. A collision will depend on the starting value of the chaining variable and the contents of the message itself. In short, the attack uses some relatively efficient method for finding messages given a chaining value that can yield a specific differential [38] which is not disclosed in [41]. Trying random messages does not yield collisions effectively. This attack technique is depicted in Figure 2.

The attack also demonstrates that there are some messages that belong to the class maintaining differential that are not pre–image resistant. This concept was not completely analyzed in [41]. For a given instant equation and a given second block message , there is an effcient technique, (that is not disclosed in [41]), which appears to be better than a brute force attack and can be used to find another message such that results in a collision. If this is the case, it shows that MD5 is not pre–image resistant.

The attacks on MD4, RIPEMD and HAVAL–128 work on one block messages. The technique involves finding a differential which when applied to a certain message causes a collision; though this technique is not revealed in [41]. This is given by an instant equation . The value of is diferent for different hash functions but always the same for the same hash function. These attacks establish that these hash functions will not satisfy the collision resistance property and pre–image resistance required for hash functions. Whether the attacks on MD4, RIPEMD and HAVAL–128 can be extended to more than one block is still under investigation.

One of the possible reasons for these attacks could be that they are all based on the design principles of MD4. The compression function of all the above hash functions operate as unbalanced Feistel networks in a non–linear feedback shift register mode [39, 19].


Cryptanalysis of MD5

Figure 2: Cryptanalysis of MD5.




4. Impact of collisions in MD5 on various applications

In this Section, we focus on the impact of collisions in hash functions, particularly MD5 considering its wide usage, on some important e–commerce applications. This Section will discuss, the affect of the attack on the digital signature applications where the application requires a CRHF, such as MD5 for signing purposes. Other the applications will also be discussed, where there is no immediate threat in the wake of these recent attacks.

Digital signatures

Digital signatures are used to authenticate the signers of electronic messages. Digital signatures are substantially dependent upon the document, that is being digitally signed, as well as the private key and the algorithms used to affix the digital signature. If the document changes in any way then the digital signature will also substantially change. Consequently, a digital signature is the resultant of encrypting the hash of the document that is being digitally signed, using a private key held by the person causing the digital signature to be created. In general, digital signatures are constructed using hash functions along with Public Key Technology (PKT). A historical background on PKT is given in [15]. Using PKT, the signer encrypts the “message digest” of the relevant electronic message/document using a public key algorithm such as RSA and the private key that only the signer of the message has access in order to create the digital signature. Anyone, with the public key that corresponds with the private key used to affix the digital signature, can verify the signature of the signer. The basic premise behind PKT and its ability to provide authentication and non–repudiation services is that the private key remains private and has not been compromised by its disclosure to third parties.

Consider the following scenario between two parties Alice and Bob, wherein Alice wishes to send a digitally signed message to Bob.

The following steps would be performed by Alice.

  1. Alice hashes the message x that she wishes to send to Bob using MD5. MD5(x) is the value of the message digest. Let MD5(x) be h.
  2. Alice then encrypts h using her private key using some public key technology such as RSA to compute the digital signature . This is expressed as follows.
  3. Alice sends message x and the signature on x, , to Bob.

The following steps would be performed by Bob after receiving x and to verify the digital signature of Alice.

  1. Bob hashes the message x that he received from Alice using MD5. MD5(x) is the value of the message digest. Let MD5(x) be h'. Note this is the same initial step performed by Alice above.
  2. Bob decrypts the signature using the public key of Alice to get the message digest h''. This is expressed as follows.
  3. Bob compares the digests obtained in steps 1 and 2 and if they are equal then the integrity of the message is established and provided the private key remains private, the authenticity of the person attributed as signing the message is also established. This is sometimes knowns as the non–repudiation property.

Due to the Xioayun attacks it is now possible for both the signer and the verifier of a digitally signed message that relies upon the MD5 hash algorithm to cheat each other and thus obviate the non–repudiation property that has been continually argued by various researchers as being an essential property of digital signature technology. This will be shown in the following two scenarios.

Scenario 1: The collision attack on the MD5 algorithm described in Section 3 can easily be used by Alice to cheat Bob. Let the innocuous message x, that Alice wants to sign, be something like “I, Alice, am selling my property of five acres to Bob for a price of $123,456.” Assume that Alice can come up with an alternate message x' that gives the same digest as x. Let x' be “I, Alice, am selling my property of two acres to Bob for a price of $223,456.” Let it also be assumed that these two messages have the following characteristic that MD5(x) = MD5(x'). Alice then sends x and the signature computed on MD5(x) (noting that it would be same on MD5(x')) to Bob. Bob, believes that the message x is the legitimate message but later Alice claims x' as the legitimate message by producing the same signature on x'. Furthermore Alice has destroyed all evidence regarding the digital signing of x.

From an evidentiary perspective this raises some serious doubts as to the probity of the two messages and in particular the fact that the same digital signature can be verified for both messages. At common law the onus of proof is generally placed upon the plaintiff. That is, it is the plaintiff who has the onus of proving her case.

In this case, if Alice raises a dispute over the contents of the electronic communications that have transpired between them, she will accordingly inform Bob of the problem. If Bob commences proceedings to prove his case, he will have the onus of proving the value of the contract; but in doing so he will encounter the defense that Alice has evidence that the contract was different to that which Bob claims. The difficulty for the arbiter of fact, namely the judge in this case, is that the judge will needs to decide which message is the correct reflection of the contract. Other evidence will need to be adduced by Bob, which could cause substantial expense being placed upon Bob in proving his case. Remembering that Bob, does not have access to Alice’s private key; so Bob can not generate a new digital signature for x'. Alice would argue that she does not know how this occurred. She can also show that her public key can verify the digital signature attached to x' that is in her possession.

Alice will argue that x' is the orginial message and that Bob has somehow developed x such that MD5(x) = MD5(x'). Further, she may claim that Bob has stripped the original signature from x' and attached it to x. Bob will have other evidence such as time of receipt of x that has the digital signature of x, but this kind of evidence can be easily spoofed or altered without leaving a trace by the recipient of an electronic communication. Such a case could be decided solely upon the oral evidence of the parties and not upon the technology that underpins the case. Of course a court in this case would rightly make some substantial disparaging remarks about the technology and its lack of the non-repudiation and authentication properties.

Scenario 2: The collision attack on MD5 described in Section 3 can easily be used by Bob to cheat Alice. Let the message x that Alice signs be something like “I, Alice, am selling my property of five acres to Bob for a price of $223,456.” Bob, the verifier of the message x, upon receiving x and the signature affixed to it by Alice with her private key, can come up with a similar message x' something like “I, Alice, am selling my property of 10 acres to Bob for a price of $123,456” producing the same message digest as x. Bob, then strips the signature attached to message x and attaches it to message x'. Bob, later claims that the message sent by Alice is x' not x.

In this case, if Bob raises a dispute over the contents of the electronic communications that have transpired between them, he will accordingly inform Alice of the problem. If Alice commences proceedings to prove her case, she will have the onus of proving the value of the contract; but in doing so she will encounter the defense that Bob has evidence that the contract was different to that which Alice claims. Further, Bob does not have access the private key of Alice as it is known only to her and hence the signature on x' could only have been signed by Alice. That is, since Bob is able to verify the digital signature affixed to x' using Alice’s public key and Bob has no access to Alice’s private key then Bob will argue that Alice is trying to defraud him of the contract value.

Bob will force Alice to admit that her private key has not been compromised, which means that theoretically she is the only person who had access to the relevant private key and was the only person capable of activating its use to digitally sign x'. Since Bob had no access to Alice’s private key, and it is Alice’s public key that is used to verify the digital signature affixed to the electronic communication that is being relied upon by Bob, it will be very difficult for Alice to discredit this evidence in the court proceedings, even though the original message x has been substituted by Bob for x'.

The collision of y for both x and x' is a substantial flaw in digital signature technology, where MD5 is being used as the hash algorithm. Such a flaw will completely undermine the concept of non–repudiation regarding forged digital signatures. The concept of non–repudiation has been a principal attribute promoted by a number digital signature technology providers.

In Scenario 2 a court, at a minimum, would come to the conclusion that there is some uncertainty regarding the validity of the two messages x and x' and at a maximum the electronic communication in the possession of Alice that has Alice’s digital signature affixed, has been altered by Alice, which could give rise to an allegation of giving false evidence to tampering with evidence.

It can be seen that these attacks on MD5 greatly undermine the evidential value of digital signature technology where MD5 or any of the other mentioned hash algorithms are used for digital signature purposes. The collision attack on MD5 discussed in Section 3 can be used to construct ASCII message sequences and for the given equation , which can result in the same hash value.

There are no legal cases where digital signatures have been specifically disputed, though it is generally accepted that a digital signature will not be non–repudiatable because there are many legal reasons where a party may be able to successfully repudiate a digital signature attributed to them, such as unconscionable conduct or undue influence or duress [29]. What has been generally taken as the base position, is that digital signature technology will greatly reduce the incidence of forgeries, but since the successful attacks by Xiaoyun, et al. even the issue of forgeries has now become an issue, which undermines the concept of non–repudiation even further. When the underlying hash function technology is weak, it could result in the compromise of the non–repudiation security property.

A further effect of the undermining of the non–repudiation property is the long term archiving of digitally signed documents. It is not unusual for a dispute to take a substantial amount of time to elapse before it will be heard by a judge. During this time both parties have to ensure that the evidence they have in their possession does not become tainted and maintains its integrity. In Australia section 11(3) of the Electronic Transactions Act 1999 (Cth) [“ETA”] provides that an electronic document can be endorsed by a third party for the purposes of integrity. If the PKT used to affix a digital signature is undermined due to some technological advancement, it is not correct to get the parties to resign the document as this would alter the document in a substantial manner. It may also be impractical for this to occur as one of the parties may not be available or even died in the meantime. The better approach is to get a trusted third party to endorse the document by affixing another digital signature which uses a more robust PKT. This can be undertaken multiple times. The orginal document with its original digital signature or digital signatures is maintained and thus the integrity of the document is preserved. The trusted third party will need to be noted as an endorer so as not to be confused as an original signatory. When the case is finally heard by a Judge, each endorsement digital signature will be verfied until such time as the orginal signatures can be verified. For long term archiving pursoses this appraoch is recommended. The Uniform Electronic Transactions Act that has been adopted by a substantial number of U.S. states does not specifically recognise the endorsement mechanism as provided in the Australian Electronic Transactions Act but EUTA does make provision for security procedures which are to be used to maintain the integrity of digitally signed documents and therefore the procedure would be similar as noted above.

The issue of obviating the non–repudiation property has an even more catastrophic affect when digital certificates are the documents that are being digitally signed. One of the difficulties in using PKT is the deployment of the corresponding public keys that are used to verify digitally signed documents. Remembering that it is the public key that corresponds to the private key that is used to verify the digital signature. To distribute public keys, digital certificates were created, which are currently based upon the ISO standard X.509 v.3 [22].

A X. 509 v.3 certificate is itself a digital document that will have embodied in it an entities public key as well as other information such as the hash algorithm and the public key algorithm that was used to digitally sign the document. A X. 509 v.3 certificate will be digitally signed by the issuing authority which is generally known as a certification authority (CA) or trusted third party (TTP).

An entity in need of a digital certificate submits to the CA their identification details that evidences their identity, the public key and other required information. The CA issues the certificate once the identity of the entity is sufficiently confirmed with the details provided. The certificate, in general, contains certificate version number, a unique certificate serial number, signature algorithms used by the holder of the corresponding private key such as RSA with MD5, or ElGamal with SHA–1, the issuer name (that is the name of the CA that issues the certificate), certificate validity period, subject of the certificate (the entity for which the certificate is issued), public key of the subject, some extension fields and the signature of the CA. Digital certificates are well discussed in [23].

Due to the attacks on MD5, it is now possible for a third party to alter the contents of a certificate in a restricted sense to some other information without altering the digital signature of the CA, that is attached to the certificate. The restricted sense is that the attacker will need to identify an x' that corresponds to x where h(x)= h(x'). It may be that no such x' can be identified for all certificates but it may exist for a subset of all possible certificates. It will not be possible for the attacker to resign the certificate as this would require the attacker to have access to the private key of the CA. A fraudulent entity might be able to come up with a duplicate certificate that had a corresponding hash collision with the legitimate certificate that the entity gets from the CA and then transfers the CA’s signature on the true certificate on to a fake certificate.

This would have catastrophic implications for identity management as this attack could result in an increased incidence of identity fraud in non–face–to–face transactions where a merchant or customer is relying upon the X.509 v.3 certificate as the basis of identifying the other person to the transaction. From an evidential position this could result in an innocent party being held liable for a transaction to which they did not in reality participate.

An attack scenario could be as follows. When a customer’s Web browser connects to the server operated by a bank that has Internet banking operations, the bank’s server sends the digital certificate signed by the CA that issued the certificate. The certificate will contain the bank’s public key, which the customer’s machine can use to establish the secure sessions with the bank’s server. A malicious attacker can generate two certificate requests as follows. “Digital certificate request for and here are my personal details” and “Digital certificate request for and here are my personal details” that contain the same public key and producing the same MD5 message digest. But the attacker sends the initial request to get the digital certificate signed by the CA and later inserts the signature on to the fake second message thereby making a perfect forgery if the signature scheme used is malleable. Any browser can easily trust the’s digital certificate as the genuine certificate and masquerades as a bank thus sneaking the personal details of a customer. This attack is similar to the one described in Section 4 where the non–repudiation property of security is violated. This type of attack could be used to great effect in the incidence of the e–mail phishing scam that has become so prevalent recently.

The practical possibility of this attack is still in doubt. The reason is that when the CA signs a certificate, CA specifies a unique serial number in the certificate. It depends on how much control the attacker has on the serial number field. However, it is recommended that CAs move away from using MD5 for signing purposes while issuing digital certificates. The attacks on hash functions discussed in Section 3 cannot be used to tamper the existing certificates, for example Secure Socket Layer (SSL) Web server certificates, as it needs an attack on the pre–image resistance property of MD5 [7] and so far the best known attack to violate this property is the brute–force attack which requires 2128 practical computations of the MD5 algorithm and hence infeasible.

Internet security protocols such as SSL [18] and Internet Protocol Security (IPSec) could be affected by the attack on MD5 described in Section 3 if their digital certificates use the MD5 algorithm. But these protocols are designed in such a way that MD5 can be replaced with the SHA–1 configuration, if it is essential [6]. MD5 is also used in the actual protocols like SSL version 3, Transport Layer Security (TLS) [14] and IPSec along with HMAC for the key establishment and other purposes. This combination would not affect the security of any of these protocols due to the reasons given in Section 4.

Password verification schemes

Users of computer systems have to evidence that they are who they claim to be. For example, a user accessing a database application, supplies the username and password for authentication. The MD5 algorithm is widely used to hash the password and the message digest of the password is stored in a database which achieves application level security as shown in Figure 3.

An attacker who gains access to the password message digest database cannot see the username and password combinations but can see the username and hash of the password. In some systems, a random string is attached to the password upon login to the system to make dictionary attacks less effective [31] and the random string and the hash of the password are stored in the database. The correct password of the user can be generated from the known hash value only if an attacker is able to invert the hash function — MD5. If an attacker is able to find any other input password that hashes to the known or stored hash value then she will be authenticated to the system. In this particular application, it is not concerned finding any collision, but focuses on finding collisions for the known message digest of the password. It does not matter whether an attacker can get the exact password used by the user for authentication as part of the collision. As long as attacker finds some input that hashes to the known digest that is enough for her to be authenticated to the system.

Historically, design and analysis of hash functions is done more as an art than as science.

The feasibility of the attacks described in Section 3 in finding a valid password to the known message digest is negligible. Historically, design and analysis of hash functions is done more as an art than as science. To the best of the authors’ understanding, the attacks on the four 128–bit hash functions presented in [41] are more an art rather than science. The technique of crafting the input message blocks with particular differentials does not work in the case of password schemes because this technique requires finding and with a differential for the known equation . This means that an attacker has to initially find messages to get a collision, which requires the attacker to invert the MD5 algorithm for the known message digest, that she gets from the password file. In this case, it is the pre–image resistance property of the MD5 algorithm that has to be violated to compromise the password validation scheme which requires practical computations and so far there are no known practical attacks on the MD5 algorithm that can be used to invert it. Once a password input for the known or stored digest is known, there is no point in finding collisions to the MD5 algorithm. Hence the attacks described in Section 3 do not apply to the password schemes.

It is also not possible for an attacker to eavesdrop on passwords in transit because eavesdropping attacks are prevented by encrypting the communication between the client and the server by using secure protocols such as SSL [35]. The protocols such as SSL and TLS that provide secure communications, use MD5 in the form of HMAC. The attack on MD5 described in Section 3 is not applicable on HMAC–MD5 due to the reasons discussed in Section 4. But it is recommended to use more robust hash functions (and encryption schemes) than MD5, for example, in Unix systems (modern Unix and Linux systems use the MD5 algorithm with a 256–bit character limitation) due to the weakness in their password schemes and in light of new technological advances in the computer systems [25].

Hierarchical PKI

Here we show that the collisions in MD5 do not effect the flow of the infrastructure of a hierarchical PKI. It will not be possible for an attacker to impersonate a CA within a PKI in order to issue false certificates. The hierarchical PKI model looks as shown in Figure 4. In this model, the trust relationship among the CAs is established using a single digital certificate [23]. A superior CA issues certificates to the subordinate CAs. Any compromise of a CA in any particular hierarchical path would result in the lose of services for that particular section only and would not affect any other part that is not dependent upon the affected section within the heirachical path. In order to compromise the entire hierarchy the attacker would need to succeed against the root certificate.


MD5 for password checking

Figure 3: MD5 for password checking.


A malicious attacker can create a new certificate C2 by changing the public key of any intermediate digital certificate of a subordinate CA (say CA1) keeping other entries of the certificate fixed still producing the same message digest as C1. This could be possible if the new certificate C2 is chosen in such a way that specific differential value of corresponding to the differences in the public key entries of the two certificates C1 and C2 is maintained. Then the attacker can splice the signature of the superior CA on C1 onto C2. By doing so, the attacker can masquerade as an intermediate CA. Obviously by impersonating as a true CA, the attacker cannot issue certificates as he cannot derive the corresponding private key from the public key used in his certificate.

Consequently, it appears that the collision attacks on the hash functions have no adverse effect on the working of PKI.

HMAC applications

Keyed hash functions, also called Message Authentication Codes (MACs) are keyed hash primitives that are used to protect the integrity of the message since the time the message was created, transmitted or stored by an authorized source over an unreliable medium [31]. MACs are also called cryptographic checksums or integrity check values. The sender computes the MAC value of the message using a MAC algorithm which utilises a secret key and the message as input parameters. This MAC value is appended to the message and is sent to the receiver as shown in Figure 5. The receiver upon receiving the message with the appended MAC value, separates the message from the MAC value and computes the MAC using the same shared MAC key on the message and compares the received MAC value with the sent MAC value. A match between these two values guarantees the authenticity and the integrity of the message.


Hierarchical PKI

Figure 4: Hierarchical PKI.


Traditionally block ciphers are used in Cipher Block Chaining (CBC) mode to construct MACs for authentication purposes. Such kind of MAC algorithms are called CBC–MAC algorithms. The security analysis of CBC–MACs was presented in [12, 37]. MACs can also be constructed using unkeyed cryptographic hash functions such as MD5 and SHA–1. Bellare, et al. have shown in [10, 11] such a design scheme called HMAC with a deeper security analysis. The generalization of HMAC is specified in [26, 9].


Message Authentication Code

Figure 5: Message authentication code.


The collisions on the MD5 algorithm described in Section 3 do not apply to its usage as HMAC as the properties required from MD5 are different in the HMAC context [27]. The attack on MD5 described in Section 3 works for specific value of differential which when applied to certain messages causes a collision. The trick here is that it does not always cause a collision; whether it does or not depends on the initialisation vector (IV) and the contents of the message itself. These kind of collisions based on known IVs in the underlying hash function such as MD5 are not relevant to the security of HMAC. The collisions on HMACs based on MD5 would be relevant only for a variable and secret IV [27]. So the analysis required for the hash functions used in HMACs is completely different from those in [41].

The Transport Layer Security (TLS) protocol — which is defined as a proposed Internet standard version for SSL version 3 in [14] — uses HMAC algorithm defined in [26] as a MAC function and also as a Pseudo Random Function (PRF). The HMAC used in TLS protocol employs either MD5 or SHA–1 as the underlying hash function. The MAC function defined in SSL version 3 protocol is similar to the HMAC algorithm with a minor difference [40]. None of these HMAC applications are affected by the usage of MD5 in HMAC due to the recent attacks on MD5.

Software protection

Software vendors want to protect the integrity of the software they sell to their clients [36]. Hash functions are used to check the integrity of the software that the user receives from the vendor; they ensure that users will not receive viruses or malicious programs. A user obtains software from vendors through the CDs or DVDs; as downloads from the vendor’s Web site or from vendor’s mirror Web site; or, through some other means. Some users may also download software for free from the Internet and expect the programs to be free of malicious content.

There are many free tools available on the Internet [5] that use MD5 algorithm for data integrity control and verification of network file transfer, e–mail messages and files or software downloaded from the Internet. The MD5 algorithm is used to check the integrity of a Cisco Internetworking Operating System (IOS) software image [2]. The MD5 file validation feature of the Cisco IOS uses MD5 to create a 128–bit checksum of the Cisco IOS software image on some of the Cisco released products and compares that with the MD5 checksum of the images on those releases posted on the Cisco Web site. The Apache Web server [3], the most popular server on the Internet, develops and maintains an open source Hyper Text Transfer Protocol (HTTP) server for Unix and Windows NT operating systems. It uses MD5 hashes as one of the options, the other being the Pretty Good Privacy (PGP) signatures, to ensure the verifiability of downloads from its home page and other mirror sites [4]. It uses appropriate MD5 embedded programs on the Unix and Windows distributions to achieve this. The Solaris Fingerprint Database (sfpDB), a free Sun Microsystems security tool, uses MD5 to verify the integrity of the files distributed with the Solaris Operating Environment [1]. The sfpDB ensures that its official binary distributions contain authentic files but not adapted ones that compromise system security. The sfpDB uses MD5 to compare the digest of the binary distributions with the trusted hashes stored on its homepage and hence identify any mismatches, if present.

In the wake of attack on MD5 described in Section 3, there may be more immediate threat to applications where MD5 is used as a CRHF to achieve data integrity [30]. Since the data input to these applications is at the control of the attacker, it is possible for an attacker to create identical MD5 checksums using the attack technique on MD5 for the true software content and the malicious content and replace genuine code with malicious code. Hence, when the hash function used for the integrity checking is not robust, the end user cannot identify virus infected code from true code.



Need for new hash function designs

The cryptographic hash functions listed in Table 1 have their compression functions operating like Unbalanced Feistel Networks (UFNs) in the non–linear feedback shift register (NLFSR) mode of operation [39, 19]. These designs based on MD–x family, operate in the style of block cipher in feed–forward mode. SHA–0, the main alternative to the MD5 algorithm, was issued as Federal Information Processing Standard (FIPS–180) by NIST in 1993 [17]. It was superseded by SHA–1 [28] (FIPS 180–1) which differs from SHA–0 only in the circular shift operation after the linear message expansion of the input block to address the unpublished weaknesses found in SHA–0 by the National Security Agency (NSA). Recently it was shown in [16], two near collisions for the full compression function, many full collisions for the 65–round version of SHA–0 and collisions for the 34–round version of SHA–1. Near collisions for 45–round version of SHA–1 were also shown in [16]. Antoine Joux has shown that finding multiple collisions (more than just two messages hashing to the same digest) in iterated hash functions is not much harder than finding ordinary collisions [24]. The security analysis of hash functions like SHA–256 and SHA–512 has already started [20, 21].

These advances in the cryptanalysis of hash functions of MD–x family show that having a unique style of approach in their design might be a single point of failure for cryptographic hash functions. It was recently recommended in [19] to seek alternative design paradigms for secure and efficient cryptographic hashing due to the increasing interest in all forms of hash functions.




Cryptanalysis of MD5 and other hash functions presented in [41] highly affect security in applications where the attacker has a control over the input messages that are being hashed. Considering the attacks on various 128–bit hash functions and the NIST’s recent announcement [34] of the phaseout of the stronger hash function SHA–1 (due to an attack on its reduced version) by 2010, it is highly recommended not to use either current 128–bit or 160–bit cryptographic hash functions of MD–x family in the future applications as CRHFs. It is further recommended that new approaches in the design of hash functions be developed by imposing some stringent restrictions in the fundamental security properties of these designs to thwart present and new kinds of cryptanalysis. Given this context, it would be appropriate to have a worldwide competition for a new hash function standard like the NIST’s Advanced Encryption Standard process (AES) [33] for the new block cipher standard.

The current commercial use of the MD–x family of hash algorithms has been substantially undermined as an effective trusted mechanism for digital signature implementations. Even though digital signature technology has not become as pervasive as was once expected, its use is still substantial. These attacks do call into question the trust value that has been attributed to digital signature technology when MD5 has been used. We have made an intermediate investigation of a number of Web sites and have identified that within the sites investigated a substantial number using a SSL session were also relying upon a digital certificate that used the MD5 hash algorithm for signature purposes, which must cause some concern. End of article


About the authors

Praveen Gauravaram, B.Technology (Electrical and Electronics Engineering) (Sri Venkateswara University College of Engineering, Tirupati, India), MIT (QUT, Australia) is a PhD candidate at the Information Security Institute, Queensland University of Technology (QUT). His home page is at
E–mail: p [dot] gauravaram [at] isi [dot] qut [dot] edu [dot] au

Professor Adrian McCullagh, B.App.Sc (Computing) (QIT), L.L.B.(Hons) (QIT), Ph.D. (QUT), is Professor of Telecommunications and Secure Electronic Business Law at the Information Security Institute at QUT and Special Counsel for Phillips Fox Lawyers, Australia.
E–mail: a [dot] mccullagh [at] isi [dot] qut [dot] edu [dot] au

Professor Ed Dawson, BSc (University of Washington), DipEd (University of Washington), MA (University of Sydney), MLitStu (University of Queensland), MSc (University of Queensland), PhD (Queensland University of Technology), is the research director of the Information Security Institute, Queensland University of Technology (QUT), Australia.
E–mail: e [dot] dawson [at] [dot] qut [dot] edu [dot] au



1. Vasanthan Dasan, Alex Noodergraaf, and Lou Ordorica, 2001. “The Solaris Fingerprint Database — A Security Tool for Solaris Operating Environment Files,” at, accessed 30 September 2004.

2. “MD5 file validation,” a document on MD5 file validaion available on Cisco Web site at, accessed 17 September 2004.

3. “Apache HTTP Server Project,” at, accessed 30 October 2004.

4. “Apache HTTP server source code distributions,” at, accessed 30 September 2004.

5. “Data invariability and integrity control,” at, accessed 16 September 2004.

6. “FAQ: MD5, SHA–0 and hash collisions,” Certicom’s frequently asked questions on recent attacks, 2004; at, accessed 13 September 2004.

7. Cryptography Research, 2004. “Hash collision Q&A,” at, accessed 30 September 2004.

8. National Institute of Standards and Technology (NIST), Computer Systems Laboratory, 2002. Secure hash standard. Federal Information Processing Standards Publication (FIPS) 180–2, at, accessed 15 July 2004.

9. American Bankers Association, 2000. Keyed Hash Message Authentication Code, ANSI X9.71. Washington, D.C.: ABA.

10. M. Bellare, R. Canetti, and H. Krawczyk, 1996. “Keying hash functions for message authentication,” Lecture Notes in Computer Science, volume 1109, pp. 1–15.

11. Mihir Bellare, Ran Canetti, and Hugo Krawczyk, 1996. “Message authentication using hash functions: The HMAC construction,” CryptoBytes, volume 2, number 1, pp. 12–15.

12. Mihir Bellare, Joe Kilian, and Phillip Rogaway, 1994. “The security of cipher block chaining,” Lecture Notes in Computer Science, volume 839, pp. 341–358.

13. I.B. Damgard, 1990. “A design principle for hash functions,” Lecture Notes in Computer Science, volume 435, pp. 416–427.

14. Tim Dierks and Christopher Allen, 1999. “The TLS Protocol Version 1.0, Internet Request for Comment RFC 2246,” (January), at, accessed 30 September 2004.

15. Whitfield Diffe, 1988. “The first ten years of public–key cryptography,” Proceedings of the IEEE, volume 76, pp. 560–576.

16. Rafi Chen Eli Biham, 2004. “Near–collisions of sha–0,” Cryptology ePrint Archive, Report 2004/146, at, accessed 1 July 2004.

17. National Institute of Standards and Technology (U.S.), 1993. Secure hash standard. Federal information processing standards publication (FIPS PUB) 180. Springfield, Va.: National Technical Information Service; available at, accessed 1 August 2004.

18. Alan O. Freier, Philip Kariton, and Paul C. Kocher, 1996. “The SSL Protocol, Version 3.0,” (March), at, accessed 14 August 2004.

19. Praveen Gauravaram, William Millan, and Lauren May, 2004. “CRUSH: A new cryptographic hash function using iterated halving technique,” Proceedings of the Workshop on Cryptographic Algorithms and Their Uses (Goldcoast, Australia, 4–5 July), pp. 28–39. This paper is available at

20. Henri Gilbert and Helena Handschuh, 2004. “Security analysis of SHA–256 and sisters,” Lecture Notes in Computer Science, volume 3006, pp. 175–193.

21. Philip Hawkes, Michael Paddon, and Gregory G. Rose, 2004. “On corrective patterns for the SHA–2 family,” Cryptology ePrint Archive, Report 2004/207, at, accessed 25 October 2004.

22. R. Housley, W. Ford, W. Polk, and D. Solo, 1999. “RFC 2459: Internet X.509 public key infrastructure certificate and CRL profile,” (January), at, accessed 25 October 2004.

23. Russ Housley and Tim Polk, 2001. Planning for PKI: Best practices guide for deploying public key infrastructure. New York: Wiley.

24. Antoine Joux, 2004. “Multicollisions in iterated hash functions: Application to cascaded constructions,” Lecture Notes in Computer Science, volume 3152, pp. 306–316 (Matt Franklin (editor). Advances in Cryptology — CRYPTO 2004. Proceedings of the 24th Annual International Cryptology Conference, Santa Barbara, Calif., 15–19 August 2004. New York: Springer).

25. Gershon Kedem and Yuriko Ishihara, 1999. “Brute force attack on UNIX passwords with SIMD computer,” Proceedings of the Eighth USENIX Security Symposium, pp. 93–98, and at, accessed 15 September 2004.

26. H. Krawczyk, M. Bellare, and R. Canetti, 1997. “HMAC: Keyed–hashing for message authentication,” RFC 2104, at, accessed 29 August 2004.

27. Hugo Krawczyk, 2004. “HMAC with MD5 and SHA–1” (22 August), at, accessed 30 August 2004.

28. National Institute of Standards and Technology (U.S.), Computer Systems Laboratory, 1995. Secure hash standard. Federal information processing standards publication (FIPS PUB) 180–1. Springfield, Va.: National Technical Information Service; available at, accessed 1 July 2004.

29. Adrian McCullagh and William Caelli, 2000. “Non–repudiation in the digital environment,” First Monday, volume 5, number 8 (August), at, accessed 14 September 2004.

30. Declan McCullagh, 2004. “Crypto researchers abuzz over flaws,” CNET (17 August), at, accessed 1 September 2004.

31. Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, 1997. Handbook of applied cryptography. Boca Raton, Fla.: CRC Press. See chapter on hash functions and data integrity, pp. 321–383.

32. Ralph C. Merkle, 1990. “One way hash functions and DES,” Lecture Notes in Computer Science, volume 435, pp. 428–446.

33. National Institute of Standards and Technology (NIST). “Advanced encryption standard,” The details of AES process can be found at, accessed 3 September 2004.

34. National Institute of Standards and Technology, 2004. “NIST brief comments on recent cryptanalytic attacks on secure hashing functions and the continued security provided by SHA–1,” at (25 August 2004), accessed 1 July 2004.

35. Benny Pinkas and Tomas Sander, 2002. “Securing passwords against dictionary attacks,” In: Vijay Atlury (editor). Proceedings of the Ninth ACM Conference on Computer and Communication Security (CCS ’02), pp. 161–170.

36. Bart Preneel, 1993. “Analysis and design of cryptographic hash functions,” PhD thesis. Leuven: Katholieke Universiteit.

37. Bart Preneel and Paul C. van Oorschot, 1995. “MDx–MAC and building fast MACs from hash functions,” Lecture Notes in Computer Science, volume 963, pp. 1–14.

38. Greg Rose, Personal communication. August, 2004.

39. B. Schneier and J. Kelsey, 1996. “Unbalanced Feistel networks and block cipher design,” Lecture Notes in Computer Science, volume 1039, pp. 121–144.

40. William Stallings, 2003. Cryptography and network security: Principles and practices. Third edition. Upper Saddle River, N.J.: Prentice–Hall. See especially the chapter on Web security, pp. 527–562.

41. Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu, 2004. “Collisions for hash functions MD4, MD5, HAVAL–128 and RIPEMD,” Cryptology ePrint Archive, Report 2004/199, at, accessed 30 December 2005.

42. Gideon Yuval, 1979. “How to swindle Rabin,” Cryptologia, volume 3, number 3 (July), pp. 187–189.


Editorial history

Paper received 22 November 2005; accepted 15 December 2005.

Contents Index

Copyright ©2006, First Monday

Copyright ©2006, Praveen Gauravaram, Adrian McCullagh and Ed Dawson

The legal and practical implications of recent attacks on 128–bit cryptographic hash functions by Praveen Gauravaram, Adrian McCullagh and Ed Dawson
First Monday, volume 11, number 1 (January 2006),

A Great Cities Initiative of the University of Illinois at Chicago University Library.

© First Monday, 1995-2018. ISSN 1396-0466.