The Chabaud-Joux Attack Against SHA-0

Posted on June 30, 2008
Filed Under Computer Science | Leave a Comment

Look [here ] if you want to learn about cryptographic hash functions, their properties, or their applications.

History of SHA-0

In 1993, SHA-0 descended from heaven. The NSA published SHA-0 in FIPS-180, which contained the algorithm with no explanation of its security. Concerns were thrown aside as the algorithm became widely adopted. The stated collision resistance [the resistance most pertinent to the attack listed below] was the theoretical maximum of 2^80.

In 1995, the NiST released an improvement on the algorithm in FIPS 180-1 . The new algorithm (called SHA-1) had just an extra shift operation in the message expansion. Other than this detail, the new algorithm was identical to SHA-0. According to the NSA, this corrected a weakness in the message expansion.

In 1998, researchers Chabard and Joux released an elementary attack on SHA-0, described below. It exploited a fundamental weakness in the simple round structure of SHA-1, combined with an exceptionally simple message expansion. Their attack employs a search algorithm that requires only examining 2^61 messages instead of 2^80 messages to find a collision.

This was only a theoretical break in SHA-0: the collision resistance was still improbably high (though breakable through a distributed computing project), and the only collisions produced were gibberish.

More than one weakness in SHA-0 was found in the meantime! Biham and Chen discovered another route to finding collisions and near-collisions, and Chinese researchers Yang, Yu, and Yin have effectively pwnt [to use a technical term] SHA-0. In their most recent attack, they found a search space that requires examining only 2^39 different messages. This is possible on a beefy desktop computer. They have only released the attack vector and a high-level overview of how they arrived at the vector. They have not released the details for how they developed their attack in an English language journal.

The Biham and Chen attack is actually cooler than the Chibaud-Joux attack, but much more complicated. If this essay is well-received, I will write a post on that.

Description of SHA-0

SHA-0 is a block hash with an input of 512 bits. This means that any input message is broken into 512-bit blocks. The output from processing the first block is the input for processing the second block, and so on and so forth. The output of the final processing step is your hash value.

Writing the description is far less clear than just reading the pseudocode, so I urge you to read the pseudocode listed on Wikipedia or a reference implementation I wrote in Python [note: Python is not my primary language, and I wrote it so that a Math major with no coding experience could understand it, so I make no guarantees about it.]

The overall structure of SHA is as follows:

1) Message padding: Because SHA-0 operates on blocks, we need to make sure that we can evenly break the message into 512-bit blocks. The length of the message must also be appended on the end of the message. If the message + the length is not 0 (mod 512), then in between the message and the appended length, the string "100…..000" is added to make the message the right length.

2) Message Expansion: The message is then divided into 512-bit blocks. Each 512-bit block can be broken down into 16 32-bit words. However, just plugging these 16 words into the round functions (described below) would give attackers too much control over the internal state of the round function, the 16 words are expanded into 80 words using the recurrence W[i] = W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16] for 17<i<80. Note that the first 16 words of the expansion are the 16 words from the block.

3) 80 rounds : The input of the hash function, in addition to the message, is also a 5 word magic constant. These 5 words are transformed using the results of the message expansion, and then added to the original magic word to produce the hash.

After that, the resulting 5 registers are added with the initial state of the 5 registers. For the first block, this means the result is added to the "magic number".

Two messages side-by-side

If we take two messages, m1 and m2, and let m1=m2, obviously SHA-0(m1) = SHA-0(m2). Cryptographic hash functions MUST be deterministic by definition.

The ultimate goal is to find two messages that hash to the same value: Hash(some_message) = Hash(other_message). This, by itself, is overwhelming. So Chabaud and Joux decided to cheat: They would find collisions for successively stronger versions of SHA-0 until they found a collision attack against the full SHA-0.

Local Collisions

Chabaud and Joux needed better tools to attack SHA-0, so they decided to find necessary conditions for finding collisions for just a few rounds of SHA-0. They decided to see what happens if they flip a single bit in a round, and ignore the message expansion. So they are allowed to put whatever they want into each round of the hash function by the rules of this simplified attack.

Let’s go back to identical messages m1 and m2. Let’s alter m2 a little: we will flip a single bit. If we make a change to a bit in some round r , the register A will be the only register affected in round r . In round r+1 , the registers A and B will be affected. So on and so forth, so it will take until round r+4 for the bit difference in m2 to propagate all the way to the last register, E .

Chabaud and Joux decided that they wanted to find the minimum amount of rounds necessary to createa  local collision from r . It turns out that if you have full control over the input to the hash function, you can create a local collision at round r+5 : The hash at round r+5 of m1 and m2 is the same even though m1 and m2 are different functions. This is called a 6-round local collision (Rounds: r, r+1, r+2, r+3, r+4, r+5 ).

How?

Let’s look at round r+1 , the round after the first bit-change occurred. The changes that happen to the A register are defined as:

A = (A <<< 5) + f
(B, C, D) + E + round_constant + input[r+1
]

where f is one of the 4 round functions defined in SHA-0. We have full control over the term "input[r+1]".

and the changes that happen to the B register are:

B = A

We can’t correct the difference in B that propagates from A (because we have no additive term in this equation), but we CAN correct the difference that occurs in A. After all, we know what bit we changed, so getting A to match up in m1 and m2 is easy: we just have to do the math.

Chabaud and Joux define specific masks for r+1, r+2, r+3, r+4, and r+5 , and voila! A local collision.

Why does this work?

This works because of a special fact about the A register: it is the only register that is directly modified by the message, AND it is the only register whose value is produced by the values of other registers. Since the other registers aren’t sharing bit differences, we only have to fight the changes in the A register in order to succeed.

Problems!

If it were really this easy to find a local collision, SHA-0 would have been completely unusable. Fortunately for its strength, there are a few complicating factors:

Message expansion : The message expansion produces an additional constraint: this produces the need to chain together a bunch of local collisions inside of the 80 rounds in order to come up with one full collision. The constraints of the message expansion need to be satisfied by your chosen starting points for all of the local collisions (as your bit differences that you introduce will be scattered by the message expansion in the exact same way).

Round-specific functions : To make matters worse, there are round-specific functions that change every 20 rounds. Each of the functions are functions of the registers B , C , and D , and can therefore be viewed as functions of past values of A . Two of the functions are non-linear : they can’t be modeled by an XOR, and they have their own specific constraints for when a local collision can be produced. The round functions prevent the local collision from being automatic: depending on the round we start the collision, we need to search for which bits we need to flip. The Chabaud-Joux attack on SHA-0 has a search space of 2^61 for bits that need to be flipped in order to find the correct hash.

The weak message expansion

However, there is still an extra foothold that Chabaud and Joux were able to use: the message expansion of SHA-0 is just an XOR. When they were satisfying the constraints of the bits they were flipping, they were able to keep flipping the same bit in each message word instead of needing to come up with complicated arrangements of bits that needed to be flipped. The extra rotation function added to SHA-1 caused this particular attack to be impractical: the search space for bits to be flipped rose dramatically, and new approaches had to be developed.

Popularity: 6% [?]

Good Advice From the Creator of C++

Posted on June 27, 2008
Filed Under Programming, Quote, Software Development | Leave a Comment

Know the foundations of computer science: algorithms, machine architectures, data structures, etc. Don’t just blindly copy techniques from application to application. Know what you are doing, that it works, and why it works. Don’t think you know what the industry will be in five years time or what you’ll be doing then, so gather a portfolio of general and useful skills. Try to write better, more principled code. Work to make “programming” more of a professional activity and less of a low-level “hacking” activity (programming is also a craft, but not just a craft). Learn from the classics in the field and the better advanced textbooks; don’t be satisfied with the easily digested “how to” guides and online documentation - it’s shallow.

~Bjarne Stroustrup

Popularity: 4% [?]

How Do You Write Programs So Non-Programmers Can Read Them?

Posted on June 23, 2008
Filed Under Programming, Software Development | Leave a Comment

The Setup

In my final semester at TCNJ, I researched cryptographic hash algorithm with a math major. His programming experience was extremely limited: he had only programmed using Matlab, so he knew what functions and variables were, but that was the extent of his knowledge. In school I was a math minor, so we had a common ground for communication. However, part of my job was to actually write the programs that our research needed. Not only that, but the math major had to be able to modify what I had written in meaningful ways. I needed to write the clearest code that I had ever written.

So how did I do it? Language, Good Names/Comments, Structure, and Examples. Basically, all of the ordinary tools for making code more readable to programmers.

Language

I had a few criteria for selection:

I ended up choosing Python. First of all, downloading it and installing it on any OS is a piece of cake, running a Python file takes exactly 1 step: fire up the interpreter with the file as an argument. Second of all, it’s essentially executable pseudo-code. The syntax necessary to write a Python program looks very clean, and the indentation of Python forces a certain level of readability.

Third, I’ve never explicitly "known" Python, but I’ve always reached the "get things done" level of competency within a day of picking it up for a project.

I ignored Java and C++ because getting the compiler tools installed on new Windows machines is a hassle compared to Python, the installation is in 2 steps, and there is just more syntax for the newbie to read. YMMV, but I’ve always picked Python back up quicker than I’ve picked up Java or C++ after a long hiatus, and I "know" Java and C++.

Good Names/Comments

Extensive comments obviously helped the project. Normally, I describe what a function does at a high level, and then comment any tricky parts. This is enough for programmers, but it turned out to be insufficient for non-programmers. I showed my partner some C++ code in the beginning of the project using this technique and he didn’t understand how it worked.

At the beginning of each file, I added a description of the structure, and pointers to where he could find key parts of the code. At the beginning of each function, I commented the hell out of it, describing what it was supposed to do. At every logical sub-section of the code, I left a comment describing what it accomplished.

The best addition to commenting was using good names: consistent case, minimize abbreviation for the most useful names, and consistent naming across functions.

Structure

Every file that I wrote had to have the same general structure:

Since everything was always in the same place, my partner never had to look hard to find what he needed.

Examples

The best thing I did to talk to my parter in programming languages was to implement hash algorithms that he already understood. For example, here is my implementation of SHA-1 [.txt file due to hosting limitations]. I sat down with him and made sure that he knew how to run it, and from there, let him on his merry way.

The Result?

"Actually, it was a lot clearer than I thought it was going to be"

The code was a success. He was able to modify and read it successfully, and do everything that needed to be done.

Is This Relevent to Ordinary Programming?

No. I haven’t added anything new to the discussion, only approached an unusual situation with the usual solutions.

Popularity: 6% [?]

keep looking »