CSCI 530 Lab
Application Security
This lab is divided into two parts:
1 - stack overflow mechanics - we will not go so far as to overflow the stack with any particular attack data or code, but will examine its structure and operation in detail to grasp what overflowing it means.
2 - a C sign extension bug in a password hashing library
These were chosen from the endless myriad of "soft spots" in software. There are too many kinds of vulnerabilities to mention, not counting the ones nobody knows about yet. These two were chosen because they are feasible, representative, and fun, and because they might fit in an hour and a half. Together they should give you some idea of the variety of ways software can be weak.
Part one: stack overflow mechanics
As advance preparation for this exercise, read through page 8 of the Hackin9 magazine article "Overflowing the stack on Linux x86" by Piotr Sobolewski. This exercise derives from it.
Use the CentOS/snort virtual machine (the exercise may not behave the same way on other machines). In it, install the GNU debugger:
yum install gdb
Obtain the source files. They're in a zip file. One way to get it is:
wget www-scf.usc.edu/~csci530l/downloads/softwaresecurity.zip
unzip softwaresecurity.zip
In order to get some gdb experience, play with vars.c. Examine its source code. Compile it with debugging data in the output, then bring up the output binary "vars" in the debugger gdb:
gcc vars.c -o vars -ggdb
gdb vars
Press ctrl-L to clear the screen. Run these commands in succession within the debugger:
list
break 5
run
print $esp
print $ebp
This runs the program up to (but not including) the first assignment, in line 5. Then it examines the values in the stack pointer and base pointer (these are registers, not memory locations). You should note that the base pointer exceeds the stack pointer by 0x28, or 40. The 40 bytes between the 2 addresses are "the stack," which we would like to examine. The command for you to execute is:
x/10 $esp
This means: examine ten 4-byte words, starting at the address in the stack pointer and ending with the word just below the address in the base pointer. You'll see 40 bytes that don't mean much yet. Now execute the next line, "a=1;":
next
And re-display the stack:
x/10 $esp
The change in the stack is interesting. Note that the final 4-byte word now holds 0x00000001 while before it was something else. If you now run the next 2 assignments you will see the stack evolve to contain them:
next
x/10 $esp
next
x/10 $esp
(You can use command history in gdb just like the shell, where the up-arrow key recalls recent commands to the command line without your having to re-type them.) Issue "next" a couple more times to bring the program to completion.
vars.c shows one of the things the stack contains: local variables for a function. rvals.c shows what happens when there is more than one function. The stack also holds return addresses: for each called function, the location in the calling function's code to which control should return when the called function finishes. Since each function has its own data (local variables, return address), the stack is divided internally into "sub-stacks," a separate, private one for every function. They are called frames. Another thing that goes in each function's frame is the previous frame pointer, which points back in the stack to the calling function's frame. (This isn't the same as the return address. Both the return address and the previous frame pointer belong to the calling function, but the return address points into its code area of memory while the frame pointer points into the stack.) rvals.c shows these new items appearing in the stack alongside the variables we've already seen there. The slides for this lab document how to use rvals.c in the debugger to show this. Review their depiction of what happens when rvals.c runs.
Repeat the rvals run shown in the slides' screenshots
"Do" the screenshots by mimicking them on your machine to "bring it alive." rvals.c called a function but passed it nothing. If at any point needed information has scrolled off the screen, shift-pgup will scroll the display backward a screenful at a time for you.
Next we're interested in how the parameters passed to a function during a call appear in the stack. That's what stack_2.c demonstrates.
Repeat the stack_2.c run shown in the slides' screenshots by mimicking it on your machine.
Now let's shift our attention to a program that passes parameters, stack_1.c.
gcc stack_1.c -o stack_1 -ggdb
gdb stack_1
In the debugger:
list
break 8
break 3
break 4
run AAAAAAAAAA
x/6 $esp
next
x/24 $esp
next
x/24 $esp
The breaks are 1) in main just before the function call, 2) in the function just before filling the buffer (which, as a variable, is in the stack) with what the user typed, and 3) just after filling it. When you do the first "x/24 $esp" you'll see a block of text. Find embedded in it the smaller block of text you got before when you did "x/6 $esp". The new stuff is the "stack growth" and is the current function's stack frame laid onto the main function's stack frame. The stack as a whole has the two frames now. After you do the second "x/24 $esp" compare the resulting block of text with the one before. It should show the "AAAAAAAAAA" has inserted itself. The A's appear as their hex code value 41. Find the ten 41's.
Now locate the return address in the stack. The function is depending on it for later use. When the function terminates, the return address will guide the program back into the main function from whence it came. The return address is important. To find the return address:
print $ebp
Add 4 to the number that results and treat the sum as an address pointing to a place in the stack. The number stored at that place is the return address. Where are your ten A's? How far are they from the return address? If you'd typed 11 A's on the command line, the last A would be one byte closer to the return address. What if you typed 12 or 13? What would happen to the return address if you typed 100 A's?
Part two: sign extension bug
The code flaw resulting in the sign extension bug in crypt_blowfish is embodied in my sample program sign-extension-bug.c. Download, unzip, compile, and run it.
wget http://www-scf.usc.edu/~csci530l/downloads/sign-extension-bug.zip
unzip sign-extension-bug.zip
gcc sign-extension-bug.c -o sign-extension-bug
./sign-extension-bug
You will see, live, the same result shown in the slide entitled "A code embodiment" among this lab's slides.
Let's do this again, but replacing the first 3 bytes of the "key" variable with the first 3 letters of your last name. Use an editor to change the first 3 bytes of "key". They are 0x11, 0x22, and 0x44 respectively. Replace them with the ASCII codes (look them up) for the upper-case first three letters of your last name (use your last name of record for this class, the one on the official roster). For example, my last name being Morgan, I need to insert the ASCII codes for M, O, and R. They are 0x4d, 0x4f, and 0x52. So I will change the line in the program that reads:
char key[4] = { 0x11, 0x22, 0x44, 0x88 };
so that it instead reads:
char key[4] = { 0x4d, 0x4f, 0x52, 0x88 };
Do similarly, using letters of your name instead of mine. Do not change the 0x88. After editing, compile and run the program again.
gcc sign-extension-bug.c -o sign-extension-bug
./sign-extension-bug
This time, same thing as before, but now it's personal. Instead of 112244, it's your name getting trashed! Though the letters of your name are correctly loaded at intermediate stages of the program, in the end they get destroyed, overwritten by FFs. That wasn't supposed to happen; that's not how we wrote the code! But it did, and the code is wrong.
How could we fix this? My slide titled "A code embodiment" shows the result of running a fixed version, but doesn't show the actual fix. My slide entitled "Information sources" shows several references dealing with this bug, some of which contain suggested fixes. Please consult the references, then fix the code. Work with your fellow students if helpful. When fixed, run the code again. The letters of your name should survive and appear in the last line of output. Once you have achieved that, please capture the information I would like you to include in your submittal:
cat sign-extension-bug.c > fixmyname.txt
./sign-extension-bug >> fixmyname.txt
After you have performed the above lab components, answer the following questions.
1. In stack_1.c the separation in the stack between the beginning of the argument buf and the beginning of the stored return address is 28 bytes. See the screenshot in the slide entitled "Stack separation between argument and return address," also the article's page 7. In the screenshot, the argument starts at 0xbfffd530. And the return address (always at ebp plus 4 bytes, and ebp holds 0xbfffd548) is at 0xbfffd54c. The separation between their starting points is therefore
0xbfffd54c - 0xbfffd530 = 0x0000001c
or twenty-eight bytes.
Change the code in stack_1.c to provide a buffer length of 2 instead of 10. That is, change line 2 from "char buf[10];" to "char buf[2];". (If you don't know how to use a Linux character-mode editor, ask your lab assistant or a fellow student for some quick help; it's just one character!) Recompile and investigate (break/interrupt just after the stack buffer has been given data). Determine where exactly the argument starts by executing "print &buf". Determine where the return address lies by executing "print $ebp" and adding 4. What is the separation now?
2. An easier, more casual way to observe the buffer overflow is to run stack_1 from the command line instead of the debugger, supplying the buffer's content as a command-line argument (note the program is written with formal parameters to receive it). While you can get away with passing a few more bytes than the buffer is designed for without error, at a certain length and beyond you'll get "Segmentation fault" error messages. Since segmentation is a memory management technique, and you have corrupted memory pointers, the terminology is no surprise. Run stack_1 and cheat by a byte by passing it 11 characters (using "12345678901..." lets you keep track visually). Pass it increasing numbers of characters until you get the "Segmentation fault". Note the lowest value at which the problem first appears.
a. What is that value?
b. In broad terms, why is more than 10 OK up to a point?
c. Extra credit: I expected the problem to first appear at 1 byte more than it actually did. Why was I off by 1 byte, and why does the problem appear when it does? What is getting clobbered at that particular point?
3. Incorporate/append at the end of your submittal file the contents of the "fixmyname.txt" file you generated above.