Getting rid of the mismatch between source code and compiled machine code may mean having to debug the machine code. Here's some of the latest research on finding a tool to combat the problem.
By Paul Anderson and Thomas W. Reps
The source-code representation of a computer program is often thought of as the authoritative, precise, and unambiguous specification of what the program does when it executes. Of course, plenty of errors can occur in source code, and static-analysis tools that find such errors can be effective in pinpointing where the problems are.
However, tools for analyzing source code have a key weakness: computers don't execute source code; they execute machine-code programs that may be generated from source code. The WYSINWYX phenomenon (What You See Is Not What You eXecute) refers to the mismatch between what the source-code description seems to indicate and what is actually executed by the processor.1 The consequence of WYSINWYX is that source-code-analysis tools are fundamentally blind to some kinds of code weaknesses--weaknesses that can only be detected by directly analyzing the machine code.
Compilers introduce WYSINWYX effects for several reasons. Sometimes they're created by machine-code optimization. Another reason is that the compiler author may have interpreted the source-language specification in an unexpected way. Some effects can even be maliciously introduced. Finally, compilers are themselves fairly complex programs and as such may have their own bugs.
In this article, we'll describe some of these effects and their consequences with a few real-world examples. The cure for WYSINWYX is to use tools that analyze machine code directly, an approach we've taken in our research. We'll discuss some of the challenges we faced.
Incurable optimizations
A classic example of the WYSINWYX effect was found during a security review at Microsoft and reported by Michael Howard.2 When writing secure code, a good guiding principle is to limit the lifetime of sensitive data. This reduces the risk that the data can be retrieved by an attacker or leaked by accident, such as in a crash dump. A common technique is to overwrite the sensitive data with zeroes. In this example, the requirements were that the code should read a password into a buffer, use it to authorize some kind of secret operation, and then scrub it from memory. The code (slightly simplified) looked something like the code in Listing 1.
[Listing 1: code image not reproduced]
The programmer's intent was sound, and few reviewers would say this code had a problem. The problem is not in the source code. When compiled with optimization, the call to memset is removed, so the function returns without scrubbing the password, and this sensitive information is left sitting on the stack. The compiler removes that function call because it uses a standard optimization technique called dead-store removal. It observes that the buffer goes out of scope at the end of the function, and the zeroes written by the call are never read, so it concludes that the entire call to memset is redundant and removes it.
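The pattern described above can be sketched as follows. This is a minimal illustration, not the code from the Microsoft review: the function names, the 64-byte buffer, and the hard-coded comparison that stands in for the secret operation are all assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the secret operation the password unlocks. */
static int authorize(const char *password) {
    return strcmp(password, "letmein") == 0;
}

/* Copy the password into a local buffer, use it, then scrub it.
   `typed` stands in for whatever the real code reads from the user. */
int check_password(const char *typed) {
    char password[64];
    assert(typed != NULL);
    snprintf(password, sizeof password, "%s", typed);

    int ok = authorize(password);

    /* The buffer is never read again and goes out of scope on return,
       so an optimizer is entitled to treat this as a dead store and
       delete the call. */
    memset(password, 0, sizeof password);
    return ok;
}
```

The function returns the same result whether or not the call to memset survives, which is exactly why the compiler considers it removable. Only inspection of the generated machine code shows whether the scrub is still there.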
Howard suggests three solutions to prevent the optimizer from removing the call:
• Touch the password after the call to memset so the data appears used,
• Replace the call to memset with something that will not get optimized away, or
• Turn off optimizations for that code.
For the first solution, he suggests adding the source code in Listing 2.
[Listing 2: code image not reproduced]
Unfortunately, this fix only helps a little! Some compilers are sophisticated enough to recognize that this statement only touches a single byte of the buffer and therefore conclude that the call to memset can be transformed into code that zeros out only the first byte.3
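A statement of the kind being discussed is presumably a volatile read and write of the buffer's first byte. The sketch below shows that idea under the same assumptions as before (hypothetical function name, 64-byte buffer, hard-coded comparison standing in for the secret operation); it is not taken from Howard's text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

int check_password_touched(const char *typed) {
    char password[64];
    assert(typed != NULL);
    snprintf(password, sizeof password, "%s", typed);

    /* Stand-in for the secret operation. */
    int ok = strcmp(password, "letmein") == 0;

    memset(password, 0, sizeof password);

    /* Touch the buffer through a volatile lvalue so byte 0 appears used.
       The compiler must keep this access, but nothing obliges it to keep
       zeroing bytes 1 through 63. */
    *(volatile char *)password = *(volatile char *)password;
    return ok;
}
```

Because only one byte is observed after the call, an aggressive optimizer can legally narrow the memset to that byte, which is the weakness the paragraph above points out.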
Other functions could be called to zero out the buffer, but some suffer from the same disadvantage as memset. As a consequence of finding this problem, Microsoft introduced the SecureZeroMemory() function, whose implementation guarantees that the buffer will be cleared.
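Where a platform-provided function is not available, one commonly used approach in the same spirit is to write every byte through a volatile pointer, so that each store counts as an observable access. The helper below is a sketch under that assumption; the name `scrub` is hypothetical, and this is not presented as Microsoft's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Zero `len` bytes at `buf`, writing through a volatile pointer so the
   compiler may not discard or narrow the stores. */
void scrub(void *buf, size_t len) {
    assert(buf != NULL || len == 0);
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--) {
        *p++ = 0;
    }
}
```

A caller would write `scrub(password, sizeof password);` in place of the call to memset. Even with a helper like this, checking the generated machine code remains the only way to confirm that the stores were actually emitted.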
A second example, in which the optimizer unwittingly introduced a vulnerability into code that otherwise looked reasonable, came to light recently when it was used to demonstrate an exploitable vulnerability in the Linux kernel.4 The offending code is shown in Listing 3.
[Listing 3: code image not reproduced]
In this case, the source code is not entirely innocent. It contains contradictory assumptions: if tun could in fact be NULL, then the first line will (under normal circumstances) cause a null-pointer dereference and the kernel will crash. If tun could never be null, the entire if statement is redundant because the condition will never be true. This is a kind of weakness that can be found by some advanced static-analysis tools for source code (including CodeSonar, the tool we work on).
The WYSINWYX effect arises because the compiler notices this redundancy and eliminates the if statement entirely. Although this transformation is usually reasonable, removing the if statement in this case introduces the vulnerability: an attacker can map the NULL address into user space, so the dereference no longer crashes, and from there take control of the kernel.
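The dereference-then-check pattern under discussion can be sketched as follows. The types are hypothetical, stripped-down stand-ins for the kernel's structures, `POLLERR_SKETCH` is an invented constant, and the function bodies are reduced to the two lines that matter; this is an illustration, not the kernel source.

```c
#include <assert.h>
#include <stddef.h>

#define POLLERR_SKETCH 0x008u          /* stand-in for the kernel's error flag */

struct sock { int pending; };          /* hypothetical, heavily simplified */
struct tun_struct { struct sock *sk; };

/* Problem pattern: tun is dereferenced before it is tested. Having seen
   the dereference, the compiler may assume tun is non-NULL and delete
   the check that follows. */
unsigned int tun_poll_sketch(struct tun_struct *tun) {
    struct sock *sk = tun->sk;
    if (!tun)
        return POLLERR_SKETCH;
    return sk->pending ? 1u : 0u;
}

/* Same logic with the test placed before the dereference, so there is
   no earlier access from which the compiler could infer non-NULL. */
unsigned int tun_poll_checked(struct tun_struct *tun) {
    if (!tun)
        return POLLERR_SKETCH;
    struct sock *sk = tun->sk;
    return sk->pending ? 1u : 0u;
}

/* Small self-check: both versions agree on a valid device, and only the
   checked version may safely be handed NULL. */
void tun_poll_demo(void) {
    struct sock s = { 1 };
    struct tun_struct t = { &s };
    assert(tun_poll_sketch(&t) == tun_poll_checked(&t));
    assert(tun_poll_checked(NULL) == POLLERR_SKETCH);
}
```

In source form the two functions look almost interchangeable. The difference only becomes visible in the machine code, where the first version may contain no test of `tun` at all.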