In my last article I dug down into some of the joys of undefined pointer behavior in C. But I did it in the context of an architecture that most developers aren't too likely to ever see again, the Intel 8086. I wanted to show that this stuff matters even with more mainstream architectures because compilers are free to do a lot of non-obvious things. C is not assembly.
The United States Computer Emergency Readiness Team (US-CERT) "is charged with providing response support and defense against cyber attacks for the Federal Civil Executive Branch (.gov) and information sharing and collaboration with state and local government, industry and international partners."
With a U.S. Department of Homeland Security badge on their page you know they're serious. When you overflow your buffers the terrorists win. So you'd think they'd take C seriously.
I found a real WTF gem caused by programmers treating C like assembly. Vulnerability Note VU#162289
"In the C language, given the following types:
char *buf; int len;
some C compilers will assume that buf+len >= buf. As a result, code that performs wrapping checks similar to the following:
len = 1<<30; [...] if(buf+len < buf) /* wrap check */ [...overflow occurred...]
are optimized out by these compilers; no object code to perform the check will appear in the resulting executable program. In the case where the wrap test expression is optimized out, a subsequent manipulation of len could cause an overflow. As a result, applications that perform such checks may be vulnerable to buffer overflows."
The advisory is careful to admit that compilers are free to do just that. Here's why: greatly simplified, the C standard says a pointer must point at a valid object or just past the end. Any pointer arithmetic that might cause a pointer to step outside those bounds yields undefined behavior. Now, there's some code missing from their snippet, but I have to assume that the compiler knows that buf points at the beginning of something and not into the middle. So by definition either buf + len >= buf or the program is free to do anything up to and including launching shoulder mounted kitten missiles at Capitol Hill.
Still, there is a WTF to lay on the compiler writer here. If a programmer writes an "if" test then presumably he or she had some reason to believe that sometimes the test might be true. Before optimizing away the conditional the compiler really should have issued a warning.
In order for this code to have any hope of working a few assumptions must hold: sizeof(int) <= sizeof(char *), overflow must happen "silently", etc. But there's another major assumption here: the buffer pointed to by buf must be located at the end of its address space. With a check like this, if there are any objects located higher in the same overflow segment then those objects are getting some kitten missiles. So another WTF is a developer making an assumption about how a compiler works in the face of undefined code.
Now, there are a few scenarios where all these assumptions might be justified. For instance, if you're targeting some special purpose embedded device then memory layout might be well understood. In such situations, the optimization performed by the compiler might be shocking indeed, even if technically permissible.
The problem is that the developer is thinking at the assembly code level but the C standard says the compiler doesn't have to "think" the same way. In assembly the distance between what you write and the object code generated is pretty small (barring some kind of complicated macro magic). In C the distance is quite a bit larger.
Repeat after me, C is not assembly.
In my mind the real WTF is the US-CERT's proposed solution.
"Cast objects of type char* to uintptr_t before comparison. The faulty wrapping check listed above would be written
#include <stdint.h> [...] if((uintptr_t)buf+len < (uintptr_t)buf) [...]
Alternatively, developers can use size_t on platforms that do not provide the uintptr_t type."
Yes, that's right, the proposed solution is to replace one set of undefined behavior with another. Well...WTF? Didn't we get here because the compiler was free to do anything in the face of undefined semantics?
C is not assembly. Not even when you do some casting. Pointers might be segmented in such a way that overflow behavior for pointers is totally different from that of integral types. Or sizeof(int) > sizeof(uintptr_t). Or the cast to an integral type could do anything at all like invert the bits. Or shoulder mounted kitten missiles. Whatever.
The Real Solution
The best solution is "don't do that." Give the buffer a size and compare len to that size. The assumption that the buffer is the last object in its address space is just playing with fire. But I'll assume they really did need to use memory at the end of the address space. Fine. What's the solution?
It's this: you can't say "end of address space" portably in C so tread carefully when you use undefined behavior. Minimize the amount of undefined behavior you depend on. Pull the non-portable stuff out into separate compilation units and document that the assumptions. Maybe even write the non-portable stuff in assembly where you know what's going to happen, but at least write a unit test that ensures that the compiler behaves the way you think it does. And always, always remember that C is not assembly.