Date: 01 September 2011
Click here for printable version
I am sure many of you have heard of Blowfish (if not have a read on Wikipedia or head over to the horse's mouth of Bruce Schneier). It is as symmetric block cipher that seems to be one of the faster methods of encryption out there.
Recently (a month or two ago) a 13 year old bug in some implementations were found that had the potential to drastically reduce the security of data secured with those implementations. Specifically this came to light because of some differences noticed between the "crypt_blowfish" implementation when compared against another (correct) implementation. The fact that this took 13 years to find might lead you to the conclusion that it was a subtle bug - and you would be correct.
This specific library is used by SUSE (and some others) for password hashing, which is the reason behind the recent 8 bit SUSE password bulletin.
The bug comes about due to the method a compiler uses to interact with signed numbers when they need to have the number of bits used to represent that number increased. There are three bits of information you need to know before the bug will make sense.
The first bit of information you will need to understand this bug is an idea of what Two's Complement representation of numbers is. The most important thing to note is that while positive numbers "default" to having leading 0's, negative numbers have leaving 1's.
The second bit of background information you need is about how small or 8 bit numbers can interact with longer 16 or 32 bit numbers. In order for two numbers of different "size" to interact, they need to become the same size. Normally this would be as simple as sticking a bunch of 0's on the front, however negative numbers need 1's on the front. To accomplish this the x86 instruction set has two different "extend" instructions: MOVSX (Move with sign-extend), and MOVZX (Move with zero-extend). The first (MOVSX) adds 0's or 1's as is required, the second (MOVZX) always adds 0's to the front of the number (even if it is negative).
The third bit of information is related to how a char is represented in the C programming language. Now, my knowledge of C is ok, but in this specific area I differ to the all-knowing Wikipedia for some easy to digest information. In C all types have an unsigned and signed version. Why you need a signed char I am not sure, but you get it. However what I found really interesting is that, unlike all the other types, char has three different types: unsigned (specified by "unsigned char"), signed (specified by "signed char"), and just char which is a magical specification that "may be a signed type or an unsigned type, depending on the compiler and the character set" (Wikipedia). So what I always thought of as one of the most simple types in C - char - is actually a bit of an unknown - YAY. In this instance it turns out to be signed.
The Bug Exploding Blowfish Sushi
These three items all tie together in the code that turns four 8 bit characters into a 32 bit ... collection of bits. To do this four char's are essentially put one after the other into a 32 bit variable. This is done by OR-ing the 32 bit result with the 8 bit char and then shifting the 32 bit result 8 bits over until all four 8 bit characters are combined.
Or at least that is what is supposed to happen. In practice because any char that is over 0x80 (thankfully outside normal 7 bit ASCII, however the pound sign (£) triggers it) is considered signed and has to interact with the longer 32 bit result, it must be extended. Our friendly compiler chooses the MOVSX because the char is signed, and that results in any of the four characters that have already been put into the result getting wiped over with 1's.
Confused? It took me a couple of re-reads to form a mental picture, so lets see if I can draw one for you. The first table is what we want to happen, the second shows how it does happen due to the sign-extension. In the second, the values are OR'ed together into the result meaning that the sign-extension causes the "cat" to be overwritten with 1's.
In the tables I am assuming that the binary representation of £ (decimal 163) is represented as 10100011 in 8 bits which is interpreted as -93 (decimal) when extending into two's complement. I am reasonably sure this is what is happening, but it makes me further question the reason for having a signed char! If anyone has a reason why a signed char is a good idea I would love to hear from you. This page also has a good discussion about signed and unsigned types in C and where the unknown-signed "char" type came from.
The Ending (almost)
As you can (hopefully) see - the pound sign causes all 3 other letters to get over-written with 1's. This means that "cat£", "aaa£" and "zzz£" would all produce the same result. Of course this will only happen when you have non 7 bit ASCII characters, but that is becoming more normal these days given all the different languages that people speak and type. In the worst case a password of 16 characters (quite a reasonable length) could be reduced to the equivalent of 4 characters, fortunately I don't believe there will be too many examples of this.
This problem has already been corrected and the patch includes a "sign_extension_bug" flag that can be set so that backwards compatibility can be maintained. If you are interested in reading more about this the following are some posts/articles that I found interesting:
One Final Thing
While I was looking for further information about this bug I found that a very similar bug was found back in 1996. This one seems to be the same sort of bug in a different part of a different code base for blowfish - but goes to show that I was not the only one who was slightly surprised to find that a char is C is (normally) a signed value.
Hope you enjoyed reading - please feel free to send us any other interesting bugs like this that you find.