Hack Proofing Your Network-4

mic64 · 2004-07-16, 04:31 PM

Chapter 4
Summary
Solutions Fast Track
Frequently Asked Questions
100 Chapter 4 • Methodology
Introduction
There are several ways to approach any problem, and which approach you choose
usually depends on the resources available to you and the methodology with
which you are most comfortable. In the case of vulnerability research challenges,
the resources may be code, time, or tools.
In some cases, you may be dealing with a software program for which the
source code is readily available. For many people, reading the source code may be
the easiest way for them to determine whether or not there are vulnerabilities;
many vulnerabilities are tied to particular language functions or ways of calling
external functions. The source code often gives the clearest picture of how this
happens in a given program.
Another method of determining how a program works, and therefore
whether there are holes, is reverse engineering, which may require special tools,
such as disassemblers and debuggers. Since much is lost in the translation from
source code to object code, it can often be more difficult to determine exactly
what is happening in reverse engineered code.
The last method is black box testing. Black box testing allows only for the
manipulation of the inputs and the viewing of a given system’s outputs, without
the internals being known. In some cases (such as attempting to penetrate a
remote system), black box testing may be the only method initially available. In
other cases, it may be used to help choose where to focus further efforts.
In this chapter, we cover the various methodologies used for vulnerability
research, with examples for each method.
Understanding Vulnerability
Research Methodologies
Let us break down vulnerability research methodologies using easily understood terms.
A vulnerability is a problem, either exploitable or not, in anything from a microcontroller
to a supercomputer. Research is the process of gathering information
that may or may not lead to the discovery of a vulnerability. Methodologies are the
commonly used, recommended, or widely accepted methods of vulnerability
research.
Vulnerability research methods are fundamentally the same everywhere. From
the security enthusiast at home to the corporate code auditor, the methods and
tools are the same. Methods ranging from lucky guesses to the scientific method
and tools ranging from hex editors to code disassemblers are applied in everyday
www.syngress.com
practice. Some of these methods can appear to be chaotic, while some present
themselves as more detail-oriented and organized. Less experienced researchers
might prefer a more organized approach to vulnerability research, whereas seasoned
researchers with programming experience may rely more on instinct. The
choice of methods tends to be a matter of personal preference.
It should also be mentioned that different data types require different research
methods. Handling binary data requires a very different approach than handling
source code, so let’s examine these approaches separately.
NOTE
There are a number of different organization schemes used by
researchers in the security community when researching vulnerabilities.
These methods are varied; some individuals or groups rely on methodical,
organized, militant audits of programs, performed on a piece-by-piece
basis, whereas others use methods with the consistency and
organization of white noise.
Organization is subjective, and best suited to a researcher’s taste. It is
worth mentioning that a number of vulnerability tracking and software
audit tracking packages are freely available; some packages are no more
complex than a Web CGI and SQL Database, while others, such as
Bugzilla, offer a number of features such as user accounts, bug ID numbers
and tracking, and nice interfaces.
Source Code Research
Source code research entails obtaining the source of the program in its proverbial
“potential energy” state. The program source may be written in any of a
number of languages, such as C, Perl, Java, C++, ASP, or PHP. Source
code research typically begins with a search for error-prone functions.
Searching For Error-Prone Functions
Source is audited in a number of ways. The first method is to use searching utilities
to discover the use of certain error-prone functions in the source code. These
functions may be searched for with utilities such as grep.
Some functions that may be researched are strcpy and sprintf. These C functions
are habitually misused or exploited to perform nefarious activities. The use
of these functions can often result in buffer overflows due to lack of bounds
checking. Other functions, such as mktemp, may result in exploitable race conditions
and the overwriting of files, or elevated privileges.
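The kind of search grep performs can also be sketched in C. The following is a minimal illustration, not a replacement for grep: the function name find_risky_call and the short list of names are assumptions made for this example, and a real audit would use a longer list. It reports the first error-prone function that appears to be called on a line of source, skipping matches that are part of a longer identifier (such as fgets).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative list only; extend it to suit the audit at hand. */
static const char *const risky_names[] = {
    "strcpy", "strcat", "sprintf", "gets", "mktemp"
};

/* Return the first listed function that looks like it is called on
 * this line, or NULL if none is found. */
const char *find_risky_call(const char *line)
{
    assert(line != NULL);
    for (size_t i = 0; i < sizeof risky_names / sizeof risky_names[0]; i++) {
        const char *name = risky_names[i];
        size_t len = strlen(name);
        const char *p;

        for (p = strstr(line, name); p != NULL; p = strstr(p + 1, name)) {
            const char *q = p + len;

            /* Skip matches inside longer identifiers (fgets, my_strcpy). */
            if (p > line && (isalnum((unsigned char)p[-1]) || p[-1] == '_'))
                continue;
            /* A call is the name, optional blanks, then "(". */
            while (*q == ' ' || *q == '\t')
                q++;
            if (*q == '(')
                return name;
        }
    }
    return NULL;
}
```

Feeding each line of a source file to this function gives a rough list of places to look at first; like grep, it knows nothing about comments or string literals, so every hit still needs to be read in context.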
Line-By-Line Review
The next source code review method is a line-by-line review. Line-by-line
reviews involve following the program through execution sequences. This is a
more in-depth look at the program, which requires spending time to get familiar
with all parts of the program.
This type of research usually involves a person following the source through
hypothetical execution sequences. Hypothetical execution sequences use a combination
of different options supported by the program with varying input. The execution
of the program is traced visually, with the researcher mentally tracking the
various data passing through functions as they are handled by the program.
Discovery Through Difference
Discovery through difference is another method used to determine a package’s vulnerabilities.
This type of research is performed when a vendor fixes a vulnerability
in a software package, but doesn’t release details about the problem. This method
determines whether a file has been altered, and if so, which parts of the file
have been altered from one release to the next.
One of the most important utilities used in this type of research is diff. Diff is
distributed with most UNIX operating systems, and is also available for a wide
variety of other platforms through such groups as the Free Software Foundation.
Diff compares two data samples, and displays any differences encountered. This
program can be used on source files to output the exact differences between the
source bases.
The method of discovery through difference is usually performed to determine
the nature and mode of a vulnerability about which the vendor has released
few details. For example, software update announcements made by Freshmeat
often include vague details about updates to a package that “may affect security,”
such as a recent vulnerability discovered in the axspawn program.
The vulnerability patch was announced as a security update for a potential
buffer overflow. However, no other details were given about the vulnerability.
Upon downloading the 0.2.1 and 0.2.1a versions of the packages, and using the
diff utility to compare them, the problem became apparent:
elliptic@ellipse:~$ diff axspawn-0.2.1/axspawn.c axspawn-0.2.1a/axspawn.c
491c491
< envc = 0;
---
> envc = 0;
493c493
< sprintf(envp[envc++], "AXCALL=%s", call);
---
> sprintf(envp[envc++], "AXCALL=%.22s", call);
495c495
< sprintf(envp[envc++], "CALL=%s", (char *)user);
---
> sprintf(envp[envc++], "CALL=%.24s", (char *)user);
497c497
< sprintf(envp[envc++], "PROTOCOL=%s", protocol);
---
> sprintf(envp[envc++], "PROTOCOL=%.20s", protocol);
500c500
< envp[envc] = NULL;
---
> envp[envc] = NULL;
As we can see, the first version of axspawn.c uses sprintf without any restrictions
on the data length. In the second version, the data is length-restricted by
adding precision specifiers (such as %.22s) to the format strings.
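To see why a precision makes the output length predictable, consider this small sketch. The helper name format_axcall and the AXCALL_BUF constant are made up for this illustration and do not come from the axspawn source. With "%.22s", sprintf copies at most 22 bytes of the argument, so the largest possible result is the 7-byte prefix plus 22 bytes plus the terminating NUL, or 30 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* sizeof "AXCALL=" is 8 (7 characters plus NUL), so this is 30 bytes. */
enum { AXCALL_BUF = sizeof "AXCALL=" + 22 };

/* Hypothetical helper mirroring the patched line. The caller must
 * supply a buffer of at least AXCALL_BUF bytes. Returns the number
 * of characters written, not counting the NUL. */
int format_axcall(char *out, const char *call)
{
    assert(out != NULL && call != NULL);
    /* The precision caps how much of call is copied: 22 bytes at most. */
    return sprintf(out, "AXCALL=%.22s", call);
}
```

The fix is only as good as the arithmetic behind it: the precision has to be chosen so that prefix, data, and NUL together fit the destination, which we cannot confirm for axspawn from the diff alone.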
In some situations, the vendor may already do this work for us by releasing a
patch that is a diff between the two source bases. This is usually the case with
BSD-based operating systems such as FreeBSD. In January of 2002, a vulnerability
was discovered in the FreeBSD package tools that could allow a user to
extract data into a temporary directory and alter it. While this information was
disclosed via the full disclosure method, the patch distributed for pkg_add tells us
exactly where the vulnerability is:
--- usr.sbin/pkg_install/lib/pen.c 17 May 2001 12:33:39 -0000
+++ usr.sbin/pkg_install/lib/pen.c 7 Dec 2001 20:58:46 -0000
@@ -106,7 +106,7 @@
cleanup(0);
errx(2, __FUNCTION__ ": can't mktemp '%s'", pen);
}
- if (chmod(pen, 0755) == FAIL) {
+ if (chmod(pen, 0700) == FAIL) {
cleanup(0);
errx(2, __FUNCTION__ ": can't mkdir '%s'", pen);
}
The sections of source being removed by the patch are denoted with a minus
sign, while the plus sign denotes added sections. As we can see, the line that
set the directory’s permissions to 0755 is replaced with a line that sets them
to 0700.
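The effect of the tighter mode can be checked with a few lines of C. This is a sketch under stated assumptions, not the pkg_add code: it swaps in the POSIX mkdtemp function, which creates the directory with mode 0700 in one step, and the helper name make_private_dir is invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a uniquely named directory from a writable template ending
 * in "XXXXXX" and return its permission bits, or -1 on failure.
 * mkdtemp creates the directory with mode 0700, so other users can
 * neither list nor alter what is extracted into it. */
int make_private_dir(char *template_path)
{
    struct stat st;

    assert(template_path != NULL);
    if (mkdtemp(template_path) == NULL)
        return -1;
    if (stat(template_path, &st) != 0)
        return -1;
    return (int)(st.st_mode & 0777);
}
```

Called with a buffer such as "/tmp/penXXXXXX", the function should report 0700; with 0755, as in the unpatched code, any local user could read the staging directory.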
Research may not always be this easy—that said, let’s take a look at
researching binary-only software.
Binary Research
While auditing source is the first-choice method of vulnerability research, binary
research is often the only method we are left with. With the advent of the GNU
General Public License and the open source movement, obtaining the source code is
more feasible, but not all vendors have embraced the movement. As such, a great
many software packages remain closed-source.
Tracing Binaries
One method used to spot potential vulnerabilities is tracing the execution of the
program. Various tools can be used to perform this task. Sun packages the truss
program with Solaris for this purpose. Other operating systems include their own
versions, such as strace for Linux.
Tracing a program involves watching the program as it interacts with the
operating system. Environment variables polled by the program can be revealed
with flags used by the trace program. Additionally, the trace reveals memory
addresses used by the program, along with other information. Tracing a program
through its execution can yield information about problems at certain points of
execution in the program.
The use of tracing can help determine when and where in a given program a
vulnerability occurs.
Debuggers
Debuggers are another method of researching vulnerabilities within a program.
Debuggers can be used to find problems within a program while it runs. There
are various implementations of debuggers available. One of the more commonly
used is the GNU Debugger, or GDB.
Debuggers can be used to control the flow of a program as it executes. With a
debugger, the whole of the program may be executed, or just certain parts. A
debugger can display information such as registers, memory addresses, and other
valuable information that can lead to finding an exploitable problem.
Guideline-Based Auditing
Another method of auditing binaries is by using established design documents
(which should not be confused with source code). Design documents are typically
engineering diagrams or information sheets, or specifications such as a
Request For Comments (RFC).
Researching a program through a protocol specification can lead to a number
of different conclusions. This type of research can not only lead to determining
the compliance of a software package with design specifications, it can also detail
options within the program that may yield problems. By examining the foundation
of a protocol such as Telnet or POP3, it is possible to test services against
these protocols to determine their compliance. Also, applying known types of
attacks (such as buffer overflows or format string attacks) to certain parts of the
protocol implementation could lead to exploitation.
Sniffers
One final method we will mention is the use of sniffers as vulnerability research
tools. Sniffers can be applied to networks as troubleshooting mechanisms or
debugging tools. However, sniffers may also be used for a different purpose.
Sniffers can be used to monitor interactivity between systems and users. This can
allow the graphing of trends that occur in packages, such as the generation of
sequence numbers. It may also allow the monitoring of infrastructures like
Common Gateway Interface, to determine the purpose of different CGIs, and
gather information about how they may be made to misbehave.
Sniffers work hand-in-hand with our previously mentioned Guideline-based
auditing. Sniffers may also be used in the research of Web interfaces, or other network
protocols which are not necessarily specified by any sort of public standard,
but are commonly used.
The Importance of Source Code Reviews
Auditing source should be a part of any service deployment process. The act of
auditing source involves searching for error-prone functions and using line-by-line
auditing methodologies. Often, problems are obscured by the fact that a
given application’s source code may span multiple files. While the code of some
applications may be contained in a single source file, the source code of applications
such as mail transport agents, Web servers, and the like spans several source
files, header files, make files, and directories.
Searching Error-Prone Functions
Let us dig into the process of searching for error-prone functions. This type of
search can be performed using a few different methods. One way is to use an
editor and search for error-prone functions by opening each file and using the
editor’s search function. This is time consuming. The more expedient and efficient
method involves using the grep utility.
Let’s look at a few rudimentary examples of problems we may find in source
code that uses the above-mentioned functions.
Buffer Overflows
A buffer overflow, also known as a boundary condition error, occurs when more
data is placed in memory than the storage set aside for it can hold. Elias Levy,
also known as Aleph1, wrote an article about this, titled “Smashing the Stack for
Fun and Profit.” It is available in Phrack issue 49, article number 14.
Observe the following program:
/* scpybufo.c */
/* Hal Flynn <mrhal@mrhal.com> */
/* December 31, 2001 */
/* scpybufo.c demonstrates the problem */
/* with the strcpy() function which */
/* is part of the c library. This */
/* program demonstrates strcpy not */
/* sufficiently checking input. When */
/* executed with an 8 byte argument, a */
/* buffer overflow occurs */
#include <stdio.h>
#include <string.h>

void overflow_function(char *b);

int main(int argc, char *argv[])
{
    overflow_function(*++argv);
    return (0);
}

void overflow_function(char *b)
{
    char c[8];

    strcpy(c, b);   /* no bounds check: this is the bug */
    return;
}
In this C program, we can see the use of the strcpy function. Data is taken
from argv[1], then copied into a character array of 8 bytes with the strcpy function.
Since the length of the input is never checked against the size of the destination,
the 8-byte boundary of c can be overrun, which results in a buffer overflow.
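For contrast, here is one way the copy could be guarded. This is a minimal sketch, and copy_checked is a name chosen for the example rather than a library function: it measures the input first and refuses to copy anything that would not fit, terminating NUL included.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst only if it fits, NUL included.
 * Returns 0 on success, or -1 (leaving dst empty) if src is too long. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && dst_size > 0);
    len = strlen(src);
    if (len >= dst_size) {      /* no room for the data plus the NUL */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* len + 1 copies the NUL as well */
    return 0;
}
```

With an 8-byte destination, a 7-byte argument is accepted and an 8-byte argument is rejected, which is exactly the boundary the program above steps over.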
Another commonly encountered error-prone function is sprintf. The sprintf
function is another source of habitual buffer overflow problems. Observe the
following code:
/* sprbufo.c */
/* Hal Flynn <mrhal@mrhal.com> */
/* December 31, 2001 */
/* sprbufo.c demonstrates the problem */
/* with the sprintf() function which */
/* is part of the c library. This */
/* program demonstrates sprintf not */
/* sufficiently checking input. When */
/* executed with an argument of 8 bytes */
/* or more a buffer overflow occurs. */
#include <stdio.h>

void overflow_function(char *b);

int main(int argc, char *argv[])
{
    overflow_function(*++argv);
    return (0);
}

void overflow_function(char *b)
{
    char c[8];

    sprintf(c, "%s", b);   /* no bounds check: this is the bug */
    return;
}
As in the previous example, we have a string taken from argv[1] being copied
to an array of 8 bytes. There is no check performed to ensure that the
amount of data being copied between the arrays will actually fit, thus resulting in
a potential buffer overflow.
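The standard snprintf function takes the size of the destination and never writes past it. The sketch below (format_checked is a hypothetical helper name) also uses the return value, which is the length the full output would have had, to detect that the data was cut short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format src into dst without ever writing past dst_size bytes.
 * Returns 1 if the whole string fit, 0 if it was truncated or an
 * error occurred. */
int format_checked(char *dst, size_t dst_size, const char *src)
{
    int n;

    assert(dst != NULL && src != NULL && dst_size > 0);
    /* snprintf returns the length the output would have had if the
     * buffer were large enough, so n >= dst_size means truncation. */
    n = snprintf(dst, dst_size, "%s", src);
    return n >= 0 && (size_t)n < dst_size;
}
```

Whether silent truncation is acceptable depends on the program; checking the return value at least lets the caller decide instead of the stack.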
Similar to the strcpy function is strcat. A common programming error is the
use of the strcat function without first checking the size of the array. This can be
seen in the following example:
/* scatbufo.c */
/* Hal Flynn <mrhal@mrhal.com> */
/* December 31, 2001 */
/* scatbufo.c demonstrates the problem */
/* with the strcat() function which */
/* is part of the c library. This */
/* program demonstrates strcat not */
/* sufficiently checking input. When */
/* executed with a 7 byte argument, a */
/* buffer overflow occurs. */
#include <stdio.h>
#include <string.h>

void overflow_function(char *b);

int main(int argc, char *argv[])
{
    overflow_function(*++argv);
    return (0);
}

void overflow_function(char *b)
{
    char c[8] = "0";

    strcat(c, b);   /* no bounds check: this is the bug */
    return;
}
Data is passed from argv[1] to overflow_function. The data is then concatenated
onto c, an 8-byte character array. Since the size of the data in argv[1] is not
checked, the boundary of c may be overrun.
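With concatenation, the space that matters is what is left after the existing contents, not the size of the whole array. A minimal sketch of that check follows; append_checked is an illustrative name, and the function assumes dst already holds a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append src to the string already in dst, but only if the result,
 * NUL included, fits in dst_size bytes.
 * Returns 0 on success, or -1 (leaving dst unchanged) if it does not fit. */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t used, need;

    assert(dst != NULL && src != NULL && dst_size > 0);
    used = strlen(dst);         /* bytes already occupied, NUL excluded */
    need = strlen(src);
    if (used >= dst_size || need >= dst_size - used)
        return -1;              /* not enough room for src plus the NUL */
    memcpy(dst + used, src, need + 1);
    return 0;
}
```

Starting from the one-character string "0" in an 8-byte array, six more bytes fit and seven do not, matching the 7-byte argument that overflows the program above.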
The gets function is another problematic function in C. The GNU C
Compiler will produce a warning message when it compiles code using the gets
function. Gets does not perform checks on the amount of input received from a
user. Observe the following code:
/* getsbufo.c */
/* Hal Flynn <mrhal@mrhal.com> */
/* December 31, 2001 */
/* This program demonstrates how NOT */
/* to use the gets() function. gets() */
/* does not sufficiently check input */
/* length, and can result in serious */
/* problems such as buffer overflows */
#include <stdio.h>

void get_input(void);

int main(void)
{
    get_input();
    return (0);
}

void get_input(void)
{
    char c[8];

    printf("Enter a string greater than seven bytes: ");
    gets(c);   /* no bounds check: this is the bug */
    return;
}
We can see the use of the gets function. When called, it places the data in the
c character array. However, since this array is only 8 bytes in length, and gets does
not perform proper checking of input, it is easily overflowed.
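The usual replacement is fgets, which is told how large the buffer is. The sketch below wraps it in a helper we have named read_line for this example; it reads from any stream, so it can be pointed at stdin in place of the gets call above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Read one line from in, storing at most size - 1 bytes plus a NUL.
 * A trailing newline, if one was read, is stripped.
 * Returns 0 on success, or -1 on end of file or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL);
    assert(size > 1 && size <= (size_t)INT_MAX);
    if (fgets(buf, (int)size, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline if present */
    return 0;
}
```

Input longer than the buffer is left in the stream for the next read rather than written past the end of the array; a caller that wants to reject overlong lines outright would need to drain and report them.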
For additional in-depth information on buffer overflows please refer to
Chapter 8.
Input Validation Bugs
Another common programming problem is the lack of input validation by the
program. The lack of input validation can allow a user to exploit programs such
as setuid executables or Web applications such as CGIs, causing them to misbehave
by passing various types of data to them.
This type of problem can result in format string vulnerabilities. A format string
vulnerability is exploited by passing several format specifiers, such as %i%i%i%i or
%n%n%n%n, to a program, possibly resulting in code execution. Format
strings are covered in depth in Chapter 9.
Rather than covering them in depth, we will provide an example of a format
string vulnerability in code. Observe the following:
/* fmtstr.c */
/* Hal Flynn <mrhal@mrhal.com> */
/* December 31, 2001 */
/* fmtstr.c demonstrates a format */
/* string vulnerability. By supplying */
/* format specifiers as arguments, */
/* attackers may read or write to */
/* memory. */
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf(*++argv);   /* user input used as the format string */
    return (0);
}
By running the above program with a string of %n format specifiers, a user
could write to arbitrary locations in memory. If this were a setuid root executable,
this could be exploited to execute code with root privileges.
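The repair for this class of bug is to pass user-supplied text as an argument to a fixed format string, never as the format itself. A minimal sketch, with print_untrusted as a name invented for the example:

```c
#include <assert.h>
#include <stdio.h>

/* Print untrusted text as data. Because the format string is the
 * constant "%s", any % sequences in text are printed literally
 * instead of being interpreted.
 * Returns the number of characters written, or a negative value on error. */
int print_untrusted(FILE *out, const char *text)
{
    assert(out != NULL && text != NULL);
    return fprintf(out, "%s", text);
}
```

Calling print_untrusted(stdout, argv[1]) with an argument of %x%x%n simply prints those six characters.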
Lack of input validation by Web applications such as CGIs is another commonly
occurring problem. Often, poorly written CGIs (especially those written in
Perl) permit the escaping of commands by encapsulating them in special characters.
This can allow one to execute arbitrary commands on a system with the privileges
of the Web user. The problem could be exploited to carry out commands
such as removing the index.html, if that file is owned and write-accessible by the
HTTP process. It could even result in a user binding a shell to an arbitrary port
on the system, gaining local access with the permissions of the HTTP process.
This type of problem could also result in a user being able to execute arbitrary
SQL commands. CGI is commonly used to facilitate communication
between a Web front-end and an SQL database back-end, such as Oracle,
MySQL, or Microsoft SQL Server. A user who is able to execute arbitrary SQL
commands could view arbitrary tables, perform functions within the database, and
potentially even drop tables.
Observe the following open:
#!/usr/bin/perl
open(LS, "ls $ARGV[0] |");
print while <LS>;
close(LS);
This open call does not check the input from $ARGV[0]. The intended directory
may be escaped by supplying dot-dot (..) specifiers to the command, which
could list the directory above, and potentially reveal sensitive information. A
deeper discussion of input validation bugs is available in Chapter 7.
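One common defense is to accept only input that matches a narrow whitelist before it gets anywhere near a command or a path. The sketch below is written in C, to match the other examples in this chapter, rather than Perl; the function name is_safe_name and the particular set of allowed characters are assumptions chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if name is a plain file or directory name we are willing
 * to pass along, 0 otherwise. Only letters, digits, '_', '-' and '.'
 * are allowed, and any ".." sequence is refused, so neither path
 * traversal nor shell metacharacters get through. */
int is_safe_name(const char *name)
{
    const char *p;

    assert(name != NULL);
    if (name[0] == '\0' || strstr(name, "..") != NULL)
        return 0;
    for (p = name; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p) &&
            *p != '_' && *p != '-' && *p != '.')
            return 0;
    }
    return 1;
}
```

Listing what is allowed, rather than trying to enumerate every dangerous character, tends to fail safe: anything the author did not think of is rejected by default.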
Race Conditions
Race conditions are a commonly occurring programming error that can result in
some serious implications. A race condition can be defined as a situation where
one can beat a program to a certain event. This can be anything from the locking
of memory to prevent another process from altering the data in a shared segment
scenario, to the creation of a file within the file system.
A common programming problem is the use of the mktemp function. Let’s
look at the following program:
/* mtmprace.c */
/* Hal Flynn <mrhal@mrhal.com> */
/* mtmprace.c creates a file in the */
/* temporary directory that can be */
/* easily guessed, and exploited */
/* through a symbolic link attack. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *example;
    FILE *outfile;
    char ex[] = "/tmp/exampleXXXXXX";

    example = ex;
    mktemp(example);                 /* only picks a name */
    outfile = fopen(example, "w");   /* race: the file is created later */
    return (0);
}
This program will, on some operating systems, create a file in the temporary
directory whose name consists of a predetermined prefix (example in the above
source) followed by six characters, the first five being the process ID, and the
final being a letter. The first problem in this program is that a race occurs
between the check for the existence of the file name and the creation of the file.
Additionally, the name can be easily guessed, as the process ID can be predicted.
Therefore, the maximum number of names the file could use is limited by the
English alphabet, totaling 26 variations. This could result in a symbolic link
attack. To determine whether or not an operating system is using a vulnerable
implementation, examine the files created by this program in the /tmp directory.
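The POSIX mkstemp function closes this window by choosing the name and creating the file in a single step, failing rather than following anything that already exists under that name. A minimal sketch follows; open_temp is a helper name made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create and open a uniquely named temporary file for writing.
 * template_path must be writable and end in "XXXXXX"; on success it
 * is rewritten with the name that was actually used.
 * Returns a stream, or NULL on failure. */
FILE *open_temp(char *template_path)
{
    int fd;
    FILE *fp;

    assert(template_path != NULL);
    fd = mkstemp(template_path);   /* picks the name and creates the file atomically */
    if (fd == -1)
        return NULL;
    fp = fdopen(fd, "w");
    if (fp == NULL) {              /* clean up if the stream cannot be made */
        close(fd);
        unlink(template_path);
    }
    return fp;
}
```

Because there is no gap between choosing the name and creating the file, an attacker who plants a symbolic link under a guessed name only causes mkstemp to try a different one.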
By using a utility such as grep, we can investigate large amounts of code for
common problems. Does this still ensure we are safe from vulnerabilities? No. It
does, however, help us find and eliminate the larger part of the programming
problems encountered in programs. The only sure method that one can use to
ensure a secure piece of software is to have multiple parties perform a line-by-line
audit. And even then, the security of the software can only be considered
“high,” and not totally secure.
Reverse Engineering Techniques
Reverse engineering programs is one of the most commonly used and accurate
methods of finding vulnerabilities in a closed-source program. Reverse engineering
can be performed with a number of different tools, varying by operating
system and personal taste. However, the methods used to reverse engineer are
similar in most instances.
Generally, you will want to start at a high level and work your way down. In
most cases, this will mean starting with some system monitoring tools to determine
what kinds of files and other resources the program accesses. (A notable
exception is if the program is primarily a network program, in which case you
may want to skip straight to packet sniffing.)
Windows doesn’t come with any tools of this sort, so we have to go to a
third party to get them. To date, the premier source of these kinds of tools for
Windows has been the SysInternals site, which can be found at www.sysinternals.com.
In particular, the tools of interest are FileMon, RegMon, and if you’re
using NT, HandleEx. You’ll learn more about these tools in Chapter 5. All you
need to know here is that these tools will allow you to monitor a running program
(or programs) to see what files are being accessed, whether a program is
reading or writing, where in the file it is, and what other files it’s looking for.
That’s the FileMon piece. RegMon allows you to monitor much the same for the
Windows Registry; what keys the program is accessing, modifying, reading,
looking for, etc. HandleEx shows similar information on NT, but is organized in
a slightly different manner. Its output is organized by process, file handle, and
what the file handle is pointing to.
VB Decompilers
A fair amount of the code in the world is written in Visual Basic (VB). This
includes both malicious code and regular programs. VB presents a special
challenge to someone wanting to reverse engineer compiled code
written in that language. The last publicly-available VB decompiler only
works up through VB3. Starting in VB5, parts of a compiled VB program
will be “native code” (regular Windows calls), and parts of it will be “pcode”,
which is a bytecode, similar in concept to that to which Java compiles.
The Visual Basic DLL contains an interpreter for this code. The
problem is, there is very little documentation available as to what codes
translate to what VB functions in a compiled program. You could always
decompile the VB DLL, and make your own map, but that would be a
massive undertaking.
The main response to the problem by the underground has been to
use debugging techniques instead. However, this group of people has a
different goal in mind, mainly cracking copy protection mechanisms.
Thus, the information available in those areas is not always directly
applicable to the problem at hand. Most of the public work done in
those areas involves stepping through the code in order to find a section
that checks for a serial number, for example, and disables portions of the
program that don’t check out. The goal in that case is to install a bypass.
Still, such information is a start for the VB analyst.
Notes from the Underground…
As an added bonus, there are free versions of nearly all the SysInternals tools,
and most come with source code! (The SysInternals guys run a companion Web
site named Winternals.com where they sell the non-free tools with a little more
functionality added.) UNIX users won’t find that to be a big deal, but it’s still
pretty uncommon on the Windows side.
Most UNIX distributions come with a set of tools that perform the equivalent
function. According to the Rosetta Stone (a list of what a function is called,
cross-referenced by OS, which can be found at
http://bhami.com/rosetta.html), there are a number of tracing programs. Of
course, since this is a pretty low-level function, each tracing tool tends to work
with a limited set of OSes. Examples include trace, strace, ktrace, and truss. The following
example is done on Red Hat Linux, version 6.2, using the strace utility.
What strace (and most of the other trace utilities mentioned) does is show system
(kernel) calls and their parameters. We can learn a lot about how a program
works this way.
Rather than just dump a bunch of raw output into your lap, I’ve inserted
explanatory comments in the output:
[elliptic@ellipse]$ echo hello > test
[elliptic@ellipse]$ strace cat test
execve("/bin/cat", ["cat", "test"], [/* 21 vars */]) = 0
Strace output doesn’t begin until the program execution call is made for cat.
Thus, we don’t see the process the shell went through to find cat. By the time
strace kicks in, it’s been located in /bin. We see cat is started with an argument of
“test,” and a list of 21 environment variables. First item of input: arguments.
Second: environment variables.
brk(0) = 0x804b160
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40014000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
The execve call begins its normal loading process: allocating memory, etc. Note
that the return value of the open call is -1, which indicates an error. The error
interpretation is “No such file...”; indeed, no such file exists. While not exactly
“input,” this makes it clear that if we were able to drop a file by that name into
the /etc directory, naming a library of ours with the right function names, the
loader would happily run parts of it for us. That
would be really useful if root came by later and ran something. Of course, to be
able to do that, we’d need to be able to drop a new file into /etc, which we can’t
do unless someone has messed up the file system permissions. On most UNIX
systems, the ability to write to /etc means we can get root access any number of
ways. This is just another reason why regular users shouldn’t be able to write to
/etc. Of course, if we’re going to hide a Trojan horse somewhere (after we’ve
already broken root), this might be a good spot.
open("/etc/ld.so.cache", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=12431, ...}) = 0
old_mmap(NULL, 12431, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40015000
close(4) = 0
open("/lib/libc.so.6", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0755, st_size=4101324, ...}) = 0
read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\210\212"..., 4096) = 4096
The first 4K of libc is read. Libc is the standard shared library that contains the
functions you call when you do C programming (such as printf, scanf, etc.).
old_mmap(NULL, 1001564, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40019000
mprotect(0x40106000, 30812, PROT_NONE) = 0
old_mmap(0x40106000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0xec000) = 0x40106000
old_mmap(0x4010a000, 14428, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4010a000
close(4) = 0
mprotect(0x40019000, 970752, PROT_READ|PROT_WRITE) = 0
mprotect(0x40019000, 970752, PROT_READ|PROT_EXEC) = 0
munmap(0x40015000, 12431) = 0
personality(PER_LINUX) = 0
getpid() = 9271
brk(0) = 0x804b160
brk(0x804b198) = 0x804b198
brk(0x804c000) = 0x804c000
open("/usr/share/locale/locale.alias", O_RDONLY) = 4
fstat64(0x4, 0xbfffb79c) = -1 ENOSYS (Function not implemented)
fstat(4, {st_mode=S_IFREG|0644, st_size=2265, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
read(4, "# Locale name alias data base.\n#"..., 4096) = 2265
read(4, "", 4096) = 0
close(4) = 0
munmap(0x40015000, 4096) = 0
When programs contain a setlocale function call, libc reads the locale information
to determine the correct way to display numbers, dates, times, etc. Again,
permissions are such that you can’t modify the locale files without root access,
but it’s still something to watch for. Notice that the file permissions are conveniently
printed in each fstat call (that’s the 0644 above, for example). This makes
it easy to visually watch for bad permissions. If you do find a locale file to which
you can write, you might be able to cause a buffer overflow in libc. Third (indirect)
item of input: locale files.
open("/usr/share/i18n/locale.alias", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES", O_RDONLY) = 4
fstat(4, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
close(4) = 0
open("/usr/share/locale/en_US/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=44, ...}) = 0
old_mmap(NULL, 44, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40015000
close(4) = 0
open("/usr/share/locale/en_US/LC_MONETARY", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=93, ...}) = 0
old_mmap(NULL, 93, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40016000
close(4) = 0
open("/usr/share/locale/en_US/LC_COLLATE", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=29970, ...}) = 0
old_mmap(NULL, 29970, PROT_READ, MAP_PRIVATE, 4, 0) = 0x4010e000
close(4) = 0
brk(0x804d000) = 0x804d000
open("/usr/share/locale/en_US/LC_TIME", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=508, ...}) = 0
old_mmap(NULL, 508, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40017000
close(4) = 0
open("/usr/share/locale/en_US/LC_NUMERIC", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=27, ...}) = 0
old_mmap(NULL, 27, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40018000
close(4) = 0
open("/usr/share/locale/en_US/LC_CTYPE", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=87756, ...}) = 0
old_mmap(NULL, 87756, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40116000
close(4) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 4), ...}) = 0
open("test", O_RDONLY|O_LARGEFILE) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
Finally, cat opens our file “test.” Certainly, it counts as input, but we can feel
pretty safe that cat won’t blow up based on anything inside the file, because of
what cat’s function is. In other cases, you would definitely want to count the
input files.
read(4, "hello\n", 512) = 6
write(1, "hello\n", 6) = 6
read(4, "", 512) = 0
close(4) = 0
close(1) = 0
_exit(0) = ?
To finish, cat reads up to 512 bytes from the file (and gets 6) and writes them
to the screen (well, to file descriptor 1, standard output, which is the terminal
at the time). It then tries to read up to another 512 bytes of the file, and it
gets 0, which is the indicator that it’s at the end of the file. So, it closes
its file descriptors and exits cleanly (an exit code of 0 is a normal exit).
Naturally, I picked a super-simple example to demonstrate. The cat command
is simple enough that we can easily guess what it does, processing-wise, between
calls. In pseudocode:
int count, handle
string contents
handle = open (argv[1])
while (count = read (handle, contents, 512))
write (STDOUT, contents, count)
exit (0)
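For readers who want something they can compile, the pseudocode might be fleshed out in C roughly as follows. This is a minimal sketch and not the actual cat source: the names copy_fd and cat_file are made up for illustration, and the 512-byte buffer is simply borrowed from the trace above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy everything readable from descriptor `in` to descriptor `out`,
 * 512 bytes at a time, mirroring the read/write pairs in the trace.
 * Returns the number of bytes copied, or -1 on error.
 * (copy_fd is a hypothetical helper name, not part of any library.) */
long copy_fd(int in, int out)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                 /* write() may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;            /* n == 0 means end of file */
}

/* Open `path` read-only and copy it to `out`, roughly what `cat path` does.
 * Returns bytes copied, or -1 if the file cannot be opened or read. */
long cat_file(const char *path, int out)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    long copied = copy_fd(fd, out);
    close(fd);
    return copied;
}
```

Calling cat_file("test", STDOUT_FILENO) from a small main and running it under strace should show an open, a read that returns data, a write, and a final read that returns 0, much like the trace above.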
For comparison purposes, here’s the output from truss for the same command
on a Solaris 7 (x86) machine:
execve("/usr/bin/cat", 0x08047E50, 0x08047E5C) argc = 2
open("/dev/zero", O_RDONLY) = 3
mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xDFBE1000
xstat(2, "/usr/bin/cat", 0x08047BCC) = 0
sysconfig(_CONFIG_PAGESIZE) = 4096
open("/usr/lib/libc.so.1", O_RDONLY) = 4
fxstat(2, 4, 0x08047A0C) = 0
mmap(0x00000000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xDFBDF000
mmap(0x00000000, 598016, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xDFB4C000
mmap(0xDFBD6000, 24392, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 561152) = 0xDFBD6000
mmap(0xDFBDC000, 6356, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xDFBDC000
close(4) = 0
open("/usr/lib/libdl.so.1", O_RDONLY) = 4
fxstat(2, 4, 0x08047A0C) = 0
mmap(0xDFBDF000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 0) = 0xDFBDF000
close(4) = 0
close(3) = 0
sysi86(SI86FPHW, 0xDFBDD8C0, 0x08047E0C, 0xDFBFCEA0) = 0x00000000
fstat64(1, 0x08047D80) = 0
open64("test", O_RDONLY) = 3
fstat64(3, 0x08047CF0) = 0
llseek(3, 0, SEEK_CUR) = 0
mmap64(0x00000000, 6, PROT_READ, MAP_SHARED, 3, 0) = 0xDFB4A000
read(3, " h", 1) = 1
memcntl(0xDFB4A000, 6, MC_ADVISE, 0x0002, 0, 0) = 0
write(1, " h e l l o\n", 6) = 6
llseek(3, 6, SEEK_SET) = 6
munmap(0xDFB4A000, 6) = 0
llseek(3, 0, SEEK_CUR) = 6
close(3) = 0
close(1) = 0
llseek(0, 0, SEEK_CUR) = 296569
_exit(0)
Based on the bit at the end, we can infer that the Solaris cat command works
a little differently; it appears that it uses a memory-mapped file to pass a memory
range straight to a write call. An experiment (not shown here) with a larger file
showed that it would do the mmap/write pair in a loop, handling 256KB
at a time.
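The mmap-then-write approach inferred from the truss output might look something like the following in C. This is a sketch under assumptions: cat_mmap is a hypothetical name, the whole file is mapped in one go rather than in 256K pieces, and none of it is taken from the Solaris source.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map `path` read-only and hand the mapped range straight to write().
 * Returns the number of bytes written, or -1 on error.
 * Assumption: the file is small enough to map in a single call. */
long cat_mmap(const char *path, int out)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {                /* a zero-length mmap fails */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    long total = 0;
    const char *p = map;
    while ((size_t)total < len) {         /* write() may be partial */
        ssize_t w = write(out, p + total, len - (size_t)total);
        if (w < 0) {
            total = -1;
            break;
        }
        total += w;
    }

    munmap(map, len);
    close(fd);
    return total;
}
```

Note that no read call is needed to move the data: the kernel pages the file in when write touches the mapping, which is why the trace shows an mmap64 followed almost directly by a write.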
The point of showing these traces was not to learn how to use the trace tools
(that would take several chapters to describe properly, though it is worth
learning). Rather, it was to demonstrate the kinds of things you can learn by
asking the operating system to tell you what it’s up to.
For a more involved program, you’d be looking for things like fixed-name
/tmp files, reading from files writeable by anyone, any exec calls, and so on.
Disassemblers, Decompilers, and Debuggers
Drilling down to attacks on the binary code itself is the next stop. A debugger is a
piece of software that will take control of another program and allow things like
stopping at certain points in the execution, changing variables, and even changing
the machine code on the fly in some cases. However, the debugger’s ability to do
this may depend on whether the symbol table is attached to the executable (for
most binary-only files, it won’t be). Under those circumstances, the debugger may
be able to do some functions, but you may have to do a lot of manual work, like
setting breakpoints on memory addresses rather than function names.
A disassembler is a program that takes binary code and turns it into assembly
language. A decompiler goes a step further and tries to produce a higher-level
language; some can do rudimentary C code, but the code ends up being pretty
rough. The two terms are often used interchangeably, as they are in the rest of
this section. A decompiler attempts to deduce some of the original source code
from the binary (object) code, but a lot of information that programmers rely
on during development is lost during the compilation process; for example,
variable names. Often, a decompiler can only name variables with non-useful
numeric names while decompiling unless the symbol tables are present.
The problem more or less boils down to you having to be able to read
assembly code in order for a decompiler to be useful to you. Having said that,
let’s take a look at an example of what a decompiler produces.
One commercial decompiler for Windows that has a good reputation is IDA
Pro, from DataRescue (shown in Figure 4.1). IDA Pro is capable of decompiling
code for a large number of processor families, including the Java Virtual Machine.
Figure 4.1 IDA Pro in Action
Here, we’ve used IDA Pro to disassemble mspaint.exe (Paintbrush). We’ve
scrolled to the section where IDA Pro has identified the external functions upon
which mspaint.exe calls. For OSes that support shared libraries (like Windows and
all modern UNIX systems), an executable program has to keep a list of the libraries
it will need. This list is usually human readable if you look inside the binary file.
The OS needs this list of libraries so it can load them for the program’s use.
Decompilers take advantage of this, and are able to insert the names into the
code in most cases, to make it easier for people to read.
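To see why the library list is so easy to spot, here is a rough sketch of the kind of scan a strings-style utility performs: walk the bytes of the file and print any run of printable characters above a minimum length. The function name print_strings and the minimum-length parameter are illustrative choices, not the interface of any real tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Print every run of at least `min_len` printable ASCII bytes found in
 * `buf`, one run per line, and return how many runs were printed.
 * Library names such as "KERNEL32.dll" show up as exactly such runs. */
size_t print_strings(const unsigned char *buf, size_t len,
                     size_t min_len, FILE *out)
{
    assert(buf != NULL && out != NULL && min_len > 0);
    size_t count = 0;
    size_t start = 0;                     /* start of the current run */

    for (size_t i = 0; i <= len; i++) {
        int printable = (i < len) && isprint(buf[i]);
        if (printable)
            continue;
        /* Run ended at i (non-printable byte or end of buffer). */
        if (i - start >= min_len) {
            fwrite(buf + start, 1, i - start, out);
            fputc('\n', out);
            count++;
        }
        start = i + 1;
    }
    return count;
}
```

Reading a whole executable into a buffer and passing it to this function with a minimum length of 4 or so would typically surface the imported library names along with plenty of noise, which is why a tool that understands the file format does a better job.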
We don’t have the symbol table for mspaint.exe, so most of this file is
unnamed assembly code.
If you want to try out IDA Pro for yourself, a limited trial version of IDA Pro
is available for download at www.datarescue.com/idabase/ida.htm. A very
popular debugger is SoftICE, from Numega. Information about
SoftICE can be found at http://www.compuware.com/products/nu...rivercentral/.
To contrast, I’ve prepared a short C program (the classic “Hello World”) that
I’ve compiled with symbols, to use with the GNU Debugger (GDB). Here’s the
C code:
#include <stdio.h>

int main ()
{
    printf ("Hello World\n");
    return (0);
}
Then, I compile it with the debugging information turned on (the -g option):
[elliptic@ellipse]$ gcc -g hello.c -o hello
[elliptic@ellipse]$ ./hello
Hello World
I then run it through GDB. Comments inline:
[elliptic@ellipse]$ gdb hello
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
you are welcome to change it and/or distribute copies of it under
certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
This GDB was configured as "i386-redhat-linux"...
(gdb) break main
I set a breakpoint at the main function. As soon as the program enters main,
the execution pauses and I get control. The breakpoint is set before run.
Breakpoint 1 at 0x80483d3: file hello.c, line 5.
(gdb) run
The run command executes our hello program in the debugger.
Starting program: /home/ryan/hello
Breakpoint 1, main () at hello.c:5
5 printf ("Hello World\n");
(gdb) disassemble
Now that we have reached the breakpoint we set up during the execution of
the debugging session, we issue the disassemble command to display some further
information about the program.
Dump of assembler code for function main:
0x80483d0 <main>: push %ebp
0x80483d1 <main+1>: mov %esp,%ebp
0x80483d3 <main+3>: push $0x8048440
0x80483d8 <main+8>: call 0x8048308 <printf>
0x80483dd <main+13>: add $0x4,%esp
0x80483e0 <main+16>: xor %eax,%eax
0x80483e2 <main+18>: jmp 0x80483e4 <main+20>
0x80483e4 <main+20>: leave
0x80483e5 <main+21>: ret
End of assembler dump.
This is what “hello world” looks like in x86 Linux assembly. Examining your
own programs in a debugger is a good way to get used to disassembly listings.
(gdb) s
printf (format=0x8048440 "Hello World\n") at printf.c:30
printf.c: No such file or directory.
I then “step” (s command) to the next command, which is the printf call. GDB
indicates that it doesn’t have the printf source code to give any further details.
(gdb) s
31 in printf.c
(gdb) s
Hello World
35 in printf.c
(gdb) c
Continuing.
A couple more steps into printf, and we get our output. I use “continue” (c
command) to tell GDB to keep running the program until it gets to another
breakpoint or finishes.
Program exited normally.
(gdb)
Other related tools include nm and objdump from the GNU binutils collection.
objdump is a program for displaying information about object files. It can be
used to display symbols in an object file, display the headers in an object file,
or even disassemble an object file into assembly code. nm overlaps with objdump
in one respect: it lists the symbols defined or referenced by an object file.
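As a small taste of what a header display involves, the sketch below inspects only the first five bytes of a file image: the ELF magic number (0x7f followed by "ELF") and the class byte that says whether the file is 32-bit or 64-bit. The enum and function names are made up for this example, and a real tool reads far more of the header than this.

```c
#include <assert.h>
#include <stddef.h>

/* Result of looking at the first bytes of a file image. */
enum elf_kind { NOT_ELF = 0, ELF_32 = 1, ELF_64 = 2, ELF_UNKNOWN_CLASS = 3 };

/* Classify a buffer by its ELF identification bytes.
 * Bytes 0-3 must be 0x7f 'E' 'L' 'F'; byte 4 is the class
 * (1 = 32-bit objects, 2 = 64-bit objects). */
enum elf_kind classify_elf(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 5)
        return NOT_ELF;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return NOT_ELF;

    switch (buf[4]) {
    case 1:  return ELF_32;
    case 2:  return ELF_64;
    default: return ELF_UNKNOWN_CLASS;    /* possibly a tampered header */
    }
}
```

An unexpected class byte on a file that otherwise looks like ELF is the sort of oddity worth checking by hand.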
Tools & Traps…
Tools Are No Substitutes For Knowledge
Some of the disassembly and debugging tools are fantastic in the
number of features they offer. However, like any tool, they are not perfect.
This is especially true when dealing with malicious code (viruses,
worms, Trojans) or binary exploits. Often the authors of these types of
binary code specifically want to make analysis difficult, and will take
steps to make the tools less functional. For example, the RST Linux virus
checks to see if it is being debugged, and will exit if that is the case. The
same virus modifies the ELF file headers when it infects a file in such a
way as to make some disassemblers unable to access the virus portion
of the binary directly. (Specifically, there is no declared code segment for
the virus code, but it gets loaded along with the previous segment, and
will still execute.) It’s very common for a piece of malicious code to be
somewhat protected with encryption or compression. The Code Red
worms existed in the wild only as half overflow string/half code,
meaning that none of the standard file headers were present.
All of the above means that you will still need to know how to do
things manually if need be. You will need to be able to tell from examining
a file header that portions have been modified, and how to interpret
the changes. You may need to be able to perform several iterations
of code analysis for encrypted code. You will have to analyze the decryption
routine, replicate the code that does the work, and then analyze the
results.
You may not only have to be able to read assembly language, but
be able to write it in order to copy a decryption or decompression function.
Writing assembly code is generally harder than reading it.
This is not to indicate that the tools are useless. Far from it. You may
hit a stumbling block for which the tool is inadequate, but once past it,
you will want to plug the results right back into the tool and continue
from there. Besides, sometimes using the tools is the best way to learn
how things work in the first place.
Black Box Testing
The term black box refers to any component or part of a system whose inner
functions are hidden from the system user. There are no exposed settings or controls;
it just accepts input and produces output. It is not intended to be open or
modified and there are no user serviceable parts inside.
Black box testing can be likened to binary auditing. Both types of auditing
require dealing with binary data. Black boxes, however, appear with varying
degrees of transparency. We recognize two different classes of problems with
which we may be presented: black box, and obsidian box. Of course, these are conceptual
boxes rather than physical objects. The type of box refers to our level of
visibility into the workings of the system we want to attack.
Naturally, the very idea of a black box is an anathema to most hackers. How
could you have a box that performs some neat function, and not want to know
how it does it? We will be discussing ideas on how to attack a true black box, but
in reality we will be spending most of our energy trying to pry the lid off.
Chips
Imagine you have a piece of electronics gear that you would like to reverse engineer.
Most equipment of that type nowadays would be built mostly around integrated
circuits (ICs) of some kind. In our hypothetical situation, you open the
device, and indeed, you see an IC package as expected, but the identifying marks
have been sanded off! You pull the mystery chip out of its socket and try to
determine which chip it is.
Unknown ICs are a good example of a real-life black box (they’re even
black). Without the markings, you may have a lot of difficulty determining what
kind of chip it is.
What can you tell from a visual inspection? You can tell it has 16 pins, and
that’s about it. If you examine the circuit board it came out of, and start visually
following the traces in the board, you can probably pretty easily determine the
pins to which the power goes, and that can be verified with a volt meter.
Guessing which pins take power (and how much) can be fun, because if you get
it wrong, you can actually fry the chip.
Beyond that, you’ll probably have to try to make inferences based on any
other components in the gadget. You can start to make a list of components that
attach to the chip, and to which pins they attach. For example, perhaps two of the
pins eventually connect to a light emitting diode (LED).
If it turns out that the chip is a simple Transistor-Transistor Logic (TTL)
device, you might be able to deduce simple logic functions by applying the
equivalent of true-and-false signals to various pins and measuring for output on
other pins. If you could deduce, for example, that the chip was simply a bunch of
NAND (not-and) gates, you could take that information, go to a chip catalog,
and figure out pretty quickly which chip (or equivalent) you have.
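The probing procedure just described can be sketched in C. Everything here is hypothetical: probe_fn stands in for whatever rig drives two input pins and reads one output pin, and simulated_nand is a software stand-in for the mystery chip so the sketch can be run without hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe: drive two input pins (0 or 1), read one output pin. */
typedef int (*probe_fn)(int a, int b);

/* Stand-in for the unknown chip, used only to exercise the sketch. */
int simulated_nand(int a, int b)
{
    return !(a && b);
}

/* Try all four input combinations and pack the observed outputs into a
 * 4-bit truth table; the bit at index (a << 1 | b) holds the output. */
unsigned truth_table(probe_fn probe)
{
    assert(probe != NULL);
    unsigned table = 0;
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if (probe(a, b))
                table |= 1u << ((a << 1) | b);
    return table;
}

/* Look the observed table up in a tiny "chip catalog" of two-input gates. */
const char *gate_name(unsigned table)
{
    switch (table & 0xFu) {
    case 0x8: return "AND";    /* output high only for 1,1 */
    case 0x7: return "NAND";   /* output low only for 1,1 */
    case 0xE: return "OR";
    case 0x1: return "NOR";
    case 0x6: return "XOR";
    case 0x9: return "XNOR";
    default:  return "unknown";
    }
}
```

With only two inputs there are four combinations to try, so the whole table fits in four bits. The same exhaustive approach stops being practical very quickly as the number of pins and internal states grows.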
On the other hand, the chip could turn out to be something as complex as a
small microprocessor or an entire embedded system. If it were the latter case, there
would be far, far too many combinations of inputs and outputs for a trial-and-error
map. For an embedded system, there will probably also be analog components (for
example, a speaker driver) that will frustrate any efforts to map binary logic.
For an example of a small computer on a chip of this sort, go to
http://www.parallaxinc.com/html_file...dule_bs2p.asp.
Parallax produces a family of chips that have built-in BASIC interpreters, as
well as various combinations of input and output mechanisms. The underlying
problem with such a complex device is that the device in question has way more
states than you could possibly enumerate. Even a tiny computer with a very small
amount of memory can produce a practically endless amount of nonrepeating output. For
a simple example, imagine a single-chip computer that can do addition on huge
integers. All it has to do is run a simple program that adds 1 to the number each
time and outputs that for any input you give it. You’d probably pretty quickly
infer that there was a simple addition program going on, but you wouldn’t be
able to infer any other capabilities of the chip. You wouldn’t be able to tell if it
was a general-purpose programmable computer, or if it was hardware designed to
do just the one function.
Some folks have taken advantage of the fact that special sequences are very
unlikely to be found in black boxes, either by accident or when actively looked
for. All the person hiding a sequence has to do is make sure the space of possibilities
is sufficiently large to hide his special sequence. For a concrete example, read
the following article: http://www.casinoguru.com/features/0...9_tocatch.htm.
It tells of a slot machine technician who replaced the chip in some slot machines,
so that they would pay a jackpot every time a particular sequence of coins was put
in the machine, and the handle pulled. Talk about the ultimate Easter egg!
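To get a feel for how large the space of possibilities gets, consider the arithmetic: a sequence of n inputs, each chosen from k possibilities, can be any one of k to the power n candidates. The sketch below computes that figure and pairs it with a toy trigger detector. Both are purely illustrative; the names, the struct, and the naive restart rule are assumptions of this example and say nothing about how the machine in the article actually worked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of distinct input sequences of length n over k symbols (k^n),
 * saturating at UINT64_MAX instead of overflowing. */
uint64_t sequence_space(uint64_t k, unsigned n)
{
    uint64_t total = 1;
    for (unsigned i = 0; i < n; i++) {
        if (k != 0 && total > UINT64_MAX / k)
            return UINT64_MAX;
        total *= k;
    }
    return total;
}

/* Hypothetical black box that remembers progress through a secret. */
struct trigger {
    const int *secret;   /* the hidden sequence */
    size_t len;          /* its length */
    size_t pos;          /* how many symbols have matched so far */
};

/* Feed one input symbol. Returns 1 when the full secret has just been
 * entered, 0 otherwise. A mismatch restarts matching (naively: it only
 * checks whether the mismatching symbol itself begins the secret). */
int trigger_feed(struct trigger *t, int input)
{
    assert(t != NULL && t->secret != NULL && t->len > 0);
    if (input == t->secret[t->pos])
        t->pos++;
    else
        t->pos = (input == t->secret[0]) ? 1 : 0;

    if (t->pos == t->len) {
        t->pos = 0;
        return 1;
    }
    return 0;
}
```

As a made-up example, with three kinds of coin and a secret eight coins long, sequence_space(3, 8) comes to 6,561 candidates; each additional coin in the secret triples that number, while the person who knows the sequence still needs only one try.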
So, if you can’t guess or infer from the information and experiments available
to you what this chip does, what do you do? You open it! Open a chip? Sure.
Researchers of “tamper-proof” packaging for things like smart cards have done
any number of experiments on these types of packages, including using acid to
burn off the packaging, and examining the chip layout under a microscope. We’ll
cover this kind of hardware hacking in Chapter 14.
So, as indicated before, our response to being frustrated at not being able to
guess the internals of a black box is to rip it open. An analogy can be found in
this author’s experiences visiting Arizona’s obsidian mines: held at arm’s length,
obsidian looks like a black rock. However, if held up to a bright light, one can see
the light through the stone. There are no truly “black boxes,” but rather, they are
“obsidian boxes” that permit varying degrees of vision into them. In other words,
you always have some way to gain information about the problem you’re trying
to tackle.
Summary
Vulnerability research methodologies are the commonly used principles of
auditing systems for vulnerabilities. The process of source code research begins
with searching the source code for error-prone directives such as strcpy and sprintf.
Another method is the line-by-line review of source code by the person auditing
the program, which is a comprehensive audit of the program through all of its
execution sequences. Discovery through difference is another method, using the
diff utility on different versions of the same software to yield information about
security fixes. The method of undertaking binary research can involve various
utilities such as tracing tools, debuggers, guideline-based auditing, and sniffers.
A source code audit involves searching for error-prone functions
and reviewing the code line by line. In this chapter, we looked at an
example of an exploitable buffer overflow using strcpy, an example using sprintf, an
example using strcat, and an example using gets. We dissected input validation
bugs, such as a format string vulnerability using printf, and an open function
call written in Perl. We also examined a race condition vulnerability in the mktemp
function.
Reverse engineering is one of the most commonly used and accurate
methods of finding vulnerabilities in a closed-source program. This type of
research is performed from the top down. Windows auditing tools are available
from sysinternals.com, and the Rosetta Stone list can be used to map system
utilities across platforms. In this chapter, we traced the execution of the cat
program, first on a Red Hat Linux system, then on a Solaris 7 system.
Disassemblers and debuggers drill down into binary code. A disassembler
(often loosely called a decompiler) is a program that takes binary code and turns it
into a more readable form such as assembly language. A debugger is a program that can control
the execution of another program. In this chapter, we examined the output
of disassembly on the Windows platform using IDA Pro, then performed a
debugging session with GDB on a Linux system. We also discussed objdump, a
program used to examine object files; and nm, a program that displays the
symbol information contained in object files.
A black box is a (conceptual) component whose inner functions are hidden
from the user; black box testing is similar to binary auditing, in that both deal
with binary data. Our example was an unmarked integrated circuit: one may
identify such a chip by deduction from its outputs, or by literally ripping it open
to examine it. Black boxes have varying degrees of transparency.
Solutions Fast Track
Understanding Vulnerability Research Methodologies
Source research and review is the most ideal vulnerability research
methodology.
Source research is often conducted through searching for error-prone
directives, line-by-line review, and discovery through difference.
Binary research is often performed through tracing binaries, debuggers,
guideline-based auditing, and sniffers.
The Importance of Source Code Review
Source review is a necessary part of ensuring secure programs.
Searching for error-prone directives in source can yield buffer overflows,
input validation bugs, and race conditions.
The grep utility can be used to make the searching of error-prone
directives efficient.
Reverse Engineering Techniques
Freely available auditing tools for Windows are available from
www.sysinternals.com.
The Rosetta Stone (at http://bhami.com/rosetta.html) can be used to
map system utilities across platforms.
Debuggers can be used to control the execution of a program, and find
problem sections of code.
Black Box Testing
Black box testing is the process of discovering the internals of a
component that is hidden from the naked eye.
Ripping open a black box is the easiest way to determine the internals.
There are no true black boxes. Most allow varying degrees of
transparency.
Frequently Asked Questions
Q: What is the best method of researching vulnerabilities?
A: This question can only yield a subjective answer. The best methods a
researcher can use are the ones he or she is most comfortable with, and are
most productive for the research. The recommended approach is to experiment
with various methods and organization schemes.
Q: Is decompiling and other reverse engineering legal?
A: In the United States, reverse engineering may soon be illegal. The Digital
Millennium Copyright Act includes a provision designed to prevent the circumvention
of technological measures that control access to copyrighted
works. Source code can be copyrighted, and the provision could therefore make
the reverse engineering of copyrighted code illegal.
Q: Are there any tools to help with more complicated source code review?
A: Tools such as SCCS and CVS may make source review easier. Additionally,
integrated development environments (IDEs) may also make source review an
easier task.
Q: Where can I learn about safe programming?
A: A couple of different resources one may use are the Secure UNIX Programming
FAQ at www.whitefang.com/sup/secure-faq.html, and the secprog mailing list
moderated by Oliver Friedrichs.
Q: Where can I download the source to these example programs?
A: The source is available at www.syngress.com/solutions.

2004-07-16, 04:31 PM	#1
mic64 註冊會員榮譽勳章勳章總數2 UID - 582 在線等級: 註冊日期: 2002-12-06 VIP期限: 2007-04 住址: MIB總部文章: 412 精華: 0 現金: 499 金幣資產: 499 金幣	Hack Proofing Your Network-4 Chapter 4 99 Summary Solutions Fast Track Frequently Asked Questions 100 Chapter 4 • Methodology Introduction There are several ways to approach any problem; and which approach you choose usually depends on the resources available to you and the methodology with which you are most comfortable. In the case of vulnerability research challenges, the resources may be code, time, or tools. In some cases, you may be dealing with a software program for which the source code is readily available. For many people, reading the source code may be the easiest way for them to determine whether or not there are vulnerabilities; many vulnerabilities are tied to particular language functions or ways of calling external functions.The source code often gives the clearest picture of how this happens in a given program. Another method of determining how a program works, and therefore whether there are holes, is reverse engineering, which may require special tools, such as disassemblers and debuggers. Since much is lost in the translation from source code to object code, it can often be more difficult to determine exactly what is happening in reverse engineered code. The last method is black box testing. Black box testing allows only for the manipulation of the inputs and the viewing of a given system outputs, without the internals being known. In some cases (such as attempting to penetrate a remote system), black box testing may be the only method initially available. In other cases, it may be used to help chose where to focus further efforts. In this chapter, we cover the various methodologies used for vulnerability research, with examples for each method. Understanding Vulnerability Research Methodologies Let us break down vulnerability research methodologies using easily understood terms. 
A vulnerability is a problem, either exploitable or not, in anything from a microcontroller to a supercomputer. Research is the process of gathering information that may or may not lead to the discovery of a vulnerability. Methodologies are the commonly used, recommended, or widely accepted methods of vulnerability research. Vulnerability research methods are fundamentally the same everywhere. From the security enthusiast at home to the corporate code auditor, the methods and tools are the same. Methods ranging from lucky guesses to the scientific method and tools ranging from hex editors to code disassemblers are applied in everyday www.syngress.com www.syngress.com practice. Some of these methods can appear to be chaotic, while some present themselves as more detail-oriented and organized. Less experienced researchers might prefer a more organized approach to vulnerability research, whereas seasoned researchers with programming experience may rely more on instinct.The choice of methods tends to be a matter of personal preference. It should also be mentioned that different data types require different research methods. Handling binary data requires a very different approach than handling source code, so let’s examine these approaches separately. NOTE There are a number of different organization schemes used by researchers in the security community when researching vulnerabilities. These methods are varied; some individuals or groups rely on methodical, organized, militant audits of programs, performed on a piece-bypiece basis whereas others use methods with the consistency and organization of white noise. Organization is subjective, and best suited to a researcher’s taste. 
It is worth mentioning that a number of vulnerability tracking and software audit tracking packages are freely available; some packages are no more complex than a Web CGI and SQL Database, while others, such as Bugzilla, offer a number of features such as user accounts, bug ID numbers and tracking, and nice interfaces. Source Code Research Source code research entails obtaining the source of the program in its proverbial “potential energy” state.The program source may be written in one of any number of languages such as C, Perl, Java, C++,ASP, PHP, or the like. Source code research is typically first begun by searching for error-prone functions. Searching For Error-Prone Functions Source is audited in a number of ways.The first method is to use searching utilities to discover the use of certain error-prone functions in the source code.These functions may be searched for via the use of utilities such as grep. Some functions that may be researched are strcpy and sprintf.These C functions are habitually misused or exploited to perform nefarious activities.The use Methodology • Chapter 4 101 102 Chapter 4 • Methodology of these functions can often result in buffer overflows due to lack of bounds checking. Other functions, such as mktemp, may result in exploitable race conditions and the overwriting of files, or elevated privileges. Line-By-Line Review The next source code review method is a line-by-line review. Line-by-line reviews involve following the program through execution sequences.This is a more in-depth look at the program, which requires spending time to get familiar with all parts of the program. This type of research usually involves a person following the source through hypothetical execution sequences. 
Hypothetical execution sequences use a combination of different options supported by the program with varying input.The execution of the program is traced visually, with the researcher mentally tracking the various data passing through functions as they are handled by the program. Discovery Through Difference Discovery through difference is another method used to determine a package’s vulnerabilities. This type of research is performed when a vendor fixes a vulnerability in a software package, but doesn’t release details about the problem.This method is determines whether a file has been altered, and if so, which parts of the file have been altered from one release to the next. One of the most important utilities used in this type of research is diff. Diff is distributed with most UNIX operating systems, and is also available for a wide variety of other platforms through such groups as the Free Software Foundation. Diff compares two data samples, and displays any differences encountered.This program can be used on source files to output the exact differences between the source bases. The method of discovery through difference is usually performed to determine the nature and mode of a vulnerability about which the vendor has released few details. For example, software update announcements made by Freshmeat often include vague details about updates to a package that “may affect security,” such as a recent vulnerability discovered in the axspawn program. The vulnerability patch was announced as a security update for a potential buffer overflow. However, no other details were given about the vulnerability. 
Upon downloading the 0.2.1 and 0.2.1a versions of the packages, and using the diff utility to compare them, the problem became apparent: www.syngress.com Methodology • Chapter 4 103 elliptic@ellipse:~$ diff axspawn-0.2.1/axspawn.c axspawn- 0.2.1a/axspawn.c 491c491 < envc = 0; --- > envc = 0; 493c493 < sprintf(envp[envc++], "AXCALL=%s", call); --- > sprintf(envp[envc++], "AXCALL=%.22s", call); 495c495 < sprintf(envp[envc++], "CALL=%s", (char )user); --- > sprintf(envp[envc++], "CALL=%.24s", (char )user); 497c497 < sprintf(envp[envc++], "PROTOCOL=%s", protocol); --- > sprintf(envp[envc++], "PROTOCOL=%.20s", protocol); 500c500 < envp[envc] = NULL; --- > envp[envc] = NULL; As we can see, the first version of axspawn.c uses sprintf without any restrictions on the data length. In the second version, the data is length-restricted by adding format length specifiers. In some situations, the vendor may already do this work for us by releasing a patch that is a diff between the two source bases.This is usually the case with BSD-based operating systems such as FreeBSD.A vulnerability in the FreeBSD package tools during January of 2002 was discovered that could allow a user to extract data into a temporary directory and alter it.While this information was disclosed via the full disclosure method, the patch distributed for pkg_add tells us exactly where the vulnerability is at: --- usr.sbin/pkg_install/lib/pen.c 17 May 2001 12:33:39 -0000 +++ usr.sbin/pkg_install/lib/pen.c 7 Dec 2001 20:58:46 -0000 @@ -106,7 +106,7 @@ www.syngress.com 104 Chapter 4 • Methodology cleanup(0); errx(2, __FUNCTION__ ": can't mktemp '%s'", pen); } - if (chmod(pen, 0755) == FAIL) { + if (chmod(pen, 0700) == FAIL) { cleanup(0); errx(2, __FUNCTION__ ": can't mkdir '%s'", pen); } The sections of source being removed by the patch are denoted with a minus sign, while the plus sign denotes added sections.As we can see, the section of source that created the directory with permissions of 0755 is being 
replaced with a section that sets the permissions to 0700. Research may not always be this easy. That said, let’s take a look at researching binary-only software.

Binary Research

While auditing source is the first-choice method of vulnerability research, binary research is often the only method we are left with. With the advent of the GNU General Public License and the open source movement, the option of obtaining the source code is more feasible, but not all vendors have embraced the movement. As such, a great many software packages remain closed-source.

Tracing Binaries

One method used to spot potential vulnerabilities is tracing the execution of the program. Various tools can be used to perform this task. Sun packages the truss program with Solaris for this purpose. Other operating systems include their own versions, such as strace for Linux.

Tracing a program involves watching the program as it interacts with the operating system. Environment variables polled by the program can be revealed with flags used by the trace program. Additionally, the trace reveals memory addresses used by the program, along with other information. Tracing a program through its execution can yield information about problems at certain points of execution in the program. The use of tracing can help determine when and where in a given program a vulnerability occurs.

Debuggers

Debuggers are another method of researching vulnerabilities within a program. Debuggers can be used to find problems within a program while it runs. There are various implementations of debuggers available. One of the more commonly used is the GNU Debugger, or GDB.

Debuggers can be used to control the flow of a program as it executes. With a debugger, the whole of the program may be executed, or just certain parts. A debugger can display information such as registers, memory addresses, and other valuable information that can lead to finding an exploitable problem.
Guideline-Based Auditing

Another method of auditing binaries is by using established design documents (which should not be confused with source code). Design documents are typically engineering diagrams or information sheets, or specifications such as a Request For Comments (RFC).

Researching a program through a protocol specification can lead to a number of different conclusions. This type of research can not only lead to determining the compliance of a software package with design specifications, it can also detail options within the program that may yield problems. By examining the foundation of a protocol such as Telnet or POP3, it is possible to test services against these protocols to determine their compliance. Also, applying known types of attacks (such as buffer overflows or format string attacks) to certain parts of the protocol implementation could lead to exploitation.

Sniffers

One final method we will mention is the use of sniffers as vulnerability research tools. Sniffers can be applied to networks as troubleshooting mechanisms or debugging tools. However, sniffers may also be used for a different purpose. Sniffers can be used to monitor interactivity between systems and users. This can allow the graphing of trends that occur in packages, such as the generation of sequence numbers. It may also allow the monitoring of infrastructures like Common Gateway Interface, to determine the purpose of different CGIs, and gather information about how they may be made to misbehave.

Sniffers work hand-in-hand with our previously mentioned guideline-based auditing. Sniffers may also be used in the research of Web interfaces, or other network protocols which are not necessarily specified by any sort of public standard, but are commonly used.
The Importance of Source Code Reviews

Auditing source should be a part of any service deployment process. The act of auditing source involves searching for error-prone functions and using line-by-line auditing methodologies. Often, problems are obscured by the fact that a given application’s source code may span multiple files. While the code of some applications may be contained in a single source file, the source code of applications such as mail transport agents, Web servers, and the like spans several source files, header files, make files, and directories.

Searching Error-Prone Functions

Let us dig into the process of searching for error-prone functions. This type of search can be performed using a few different methods. One way is to use an editor and search for error-prone functions by opening each file and using the editor’s search function. This is time consuming. The more expedient and efficient method involves using the grep utility.

Let’s look at a few rudimentary examples of problems we may find in source code that involve such error-prone functions.

Buffer Overflows

A buffer overflow, also known as a boundary condition error, occurs when more data is placed in memory than the storage set aside for it can hold. Elias Levy, also known as Aleph1, wrote an article about this, titled “Smashing the Stack for Fun and Profit.” It is available in Phrack issue 49, article number 14.

Observe the following program:

    /* scpybufo.c                           */
    /* Hal Flynn <mrhal@mrhal.com>          */
    /* December 31, 2001                    */
    /* scpybufo.c demonstrates the problem  */
    /* with the strcpy() function which     */
    /* is part of the c library. This       */
    /* program demonstrates strcpy not      */
    /* sufficiently checking input. When    */
    /* executed with an 8 byte argument, a  */
    /* buffer overflow occurs               */

    #include <stdio.h>
    #include <string.h>

    void overflow_function(char *b);

    int main(int argc, char *argv[])
    {
        overflow_function(*++argv);
        return (0);
    }

    void overflow_function(char *b)
    {
        char c[8];
        strcpy(c, b);   /* no length check: this is the bug */
        return;
    }

In this C program, we can see the use of the strcpy function. Data is taken from argv[1], then copied into a character array of 8 bytes with the strcpy function. Since no size checking is performed on either variable, the 8-byte boundary of the second variable can be overrun, which results in a buffer overflow.

Another commonly encountered error-prone function is sprintf, which is a habitual source of buffer overflow problems. Observe the following code:

    /* sprbufo.c                            */
    /* Hal Flynn <mrhal@mrhal.com>          */
    /* December 31, 2001                    */
    /* sprbufo.c demonstrates the problem   */
    /* with the sprintf() function which    */
    /* is part of the c library. This       */
    /* program demonstrates sprintf not     */
    /* sufficiently checking input. When    */
    /* executed with an argument of 8 bytes */
    /* or more a buffer overflow occurs.    */

    #include <stdio.h>

    void overflow_function(char *b);

    int main(int argc, char *argv[])
    {
        overflow_function(*++argv);
        return (0);
    }

    void overflow_function(char *b)
    {
        char c[8];
        sprintf(c, "%s", b);   /* no length check: this is the bug */
        return;
    }

As in the previous example, we have an array taken from argv[1] being copied to an array of 8 bytes of data. There is no check performed to ensure that the amount of data being copied between the arrays will actually fit, thus resulting in a potential buffer overflow.

Similar to the strcpy function is strcat. A common programming error is the use of the strcat function without first checking the size of the array. This can be seen in the following example:

    /* scatbufo.c                           */
    /* Hal Flynn <mrhal@mrhal.com>          */
    /* December 31, 2001                    */
    /* scatbufo.c demonstrates the problem  */
    /* with the strcat() function which     */
    /* is part of the c library. This       */
    /* program demonstrates strcat not      */
    /* sufficiently checking input. When    */
    /* executed with a 7 byte argument, a   */
    /* buffer overflow occurs.              */

    #include <stdio.h>
    #include <string.h>

    void overflow_function(char *b);

    int main(int argc, char *argv[])
    {
        overflow_function(*++argv);
        return (0);
    }

    void overflow_function(char *b)
    {
        char c[8] = "0";
        strcat(c, b);   /* no length check: this is the bug */
        return;
    }

Data is passed from argv[1] to overflow_function. The data is then concatenated onto c, an 8-byte character array. Since the size of the data in argv[1] is not checked, the boundary of c may be overrun.

The gets function is another problematic function in C. The GNU C Compiler will produce a warning message when it compiles code using the gets function. Gets does not perform checks on the amount of input received from a user. Observe the following code:

    /* getsbufo.c                           */
    /* Hal Flynn <mrhal@mrhal.com>          */
    /* December 31, 2001                    */
    /* This program demonstrates how NOT    */
    /* to use the gets() function. gets()   */
    /* does not sufficiently check input    */
    /* length, and can result in serious    */
    /* problems such as buffer overflows    */

    #include <stdio.h>

    void get_input(void);

    int main()
    {
        get_input();
        return (0);
    }

    void get_input(void)
    {
        char c[8];
        printf("Enter a string greater than seven bytes: ");
        gets(c);   /* no length check: this is the bug */
        return;
    }

We can see the use of the gets function. When called, it places the data in the c character array. However, since this array is only 8 bytes in length, and gets does not perform proper checking of input, it is easily overflowed. For additional in-depth information on buffer overflows please refer to Chapter 8.

Input Validation Bugs

Another common programming problem is the lack of input validation by the program. The lack of input validation can allow a user to exploit programs such as setuid executables or Web applications such as CGIs, causing them to misbehave by passing various types of data to them.
This type of problem can result in format string vulnerabilities. A format string vulnerability is exploited by passing format specifiers such as %i%i%i%i or %n%n%n%n to a program, possibly resulting in code execution. Format strings are covered in depth in Chapter 9, so here we will only provide an example of a format string vulnerability in code. Observe the following:

    /* fmtstr.c                             */
    /* Hal Flynn <mrhal@mrhal.com>          */
    /* December 31, 2001                    */
    /* fmtstr.c demonstrates a format       */
    /* string vulnerability. By supplying   */
    /* format specifiers as arguments,      */
    /* attackers may read or write to       */
    /* memory.                              */

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        printf(*++argv);   /* user input used as the format string: this is the bug */
        return (0);
    }

By running the above program with a string of %n format specifiers, a user could write to arbitrary locations in memory. If this were a setuid root executable, this could be exploited to execute code with root privileges.

Lack of input validation by Web applications such as CGIs is another commonly occurring problem. Often, poorly written CGIs (especially those written in Perl) permit the escaping of commands by encapsulating them in special characters. This can allow one to execute arbitrary commands on a system with the privileges of the Web user. The problem could be exploited to carry out commands such as removing the index.html, if that file is owned and write-accessible by the HTTP process. It could even result in a user binding a shell to an arbitrary port on the system, gaining local access with the permissions of the HTTP process.

This type of problem could also result in a user being able to execute arbitrary SQL commands. CGI is commonly used to facilitate communication between a Web front-end and an SQL database back-end, such as Oracle, MySQL, or Microsoft SQL Server.
A user who is able to execute arbitrary SQL commands could view arbitrary tables, perform functions within the database, and potentially even drop tables.

Observe the following open:

    #!/usr/bin/perl
    open(LS, "ls $ARGV[0] |");

This function does not check the input from $ARGV[0]. The intended directory may be escaped by supplying dot-dot (..) specifiers to the command, which could list the directory above, and potentially reveal sensitive information. A deeper discussion of input validation bugs is available in Chapter 7.

Race Conditions

Race conditions are a commonly occurring programming error that can have serious implications. A race condition can be defined as a situation where one can beat a program to a certain event. This can be anything from the locking of memory to prevent another process from altering the data in a shared segment scenario, to the creation of a file within the file system.

A common programming problem is the use of the mktemp function. Let’s look at the following program:

    /* mtmprace.c                           */
    /* Hal Flynn <mrhal@mrhal.com>          */
    /* mtmprace.c creates a file in the     */
    /* temporary directory that can be      */
    /* easily guessed, and exploited        */
    /* through a symbolic link attack.      */

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        char *example;
        FILE *outfile;
        char ex[] = "/tmp/exampleXXXXXX";

        example = ex;
        mktemp(example);                  /* picks a name only */
        outfile = fopen(example, "w");    /* creates the file later: the race */
        return (0);
    }

This program will, on some operating systems, create a file in the temporary directory whose name consists of a predetermined prefix (it’s called example in the above source) followed by six characters, the first five being the process ID, and the final one being a letter. The first problem in this program is that a race occurs between the check for the existence of the file name and the creation of the file. Additionally, the name can be easily guessed as the process ID can be predicted.
Because the process ID can be predicted, the number of names the file could use is limited by the English alphabet, totaling 26 variations. This could result in a symbolic link attack. To determine whether or not an operating system is using a vulnerable implementation, examine the files created by this program in the /tmp directory.

By using a utility such as grep, we can investigate large amounts of code for common problems. Does this still ensure we are safe from vulnerabilities? No. It does, however, help us find and eliminate the larger part of the programming problems encountered in programs. The only sure method that one can use to ensure a secure piece of software is to have multiple parties perform a line-by-line audit. And even then, the security of the software can only be considered “high,” and not totally secure.

Reverse Engineering Techniques

Reverse engineering is one of the most commonly used and accurate methods of finding vulnerabilities in a closed-source program. Reverse engineering can be performed with a number of different tools, varying by operating system and personal taste. However, the methods used to reverse engineer are similar in most instances.

Generally, you will want to start at a high level and work your way down. In most cases, this will mean starting with some system monitoring tools to determine what kinds of files and other resources the program accesses. (A notable exception is if the program is primarily a network program, in which case you may want to skip straight to packet sniffing.)

Windows doesn’t come with any tools of this sort, so we have to go to a third party to get them. To date, the premier source of these kinds of tools for Windows has been the SysInternals site, which can be found at www.sysinternals.com. In particular, the tools of interest are FileMon, RegMon, and if you’re using NT, HandleEx. You’ll learn more about these tools in Chapter 5.
All you need to know here is that these tools will allow you to monitor a running program (or programs) to see what files are being accessed, whether a program is reading or writing, where in the file it is, and what other files it’s looking for. That’s the FileMon piece. RegMon allows you to monitor much the same for the Windows Registry: what keys the program is accessing, modifying, reading, looking for, etc. HandleEx shows similar information on NT, but is organized in a slightly different manner. Its output is organized by process, file handle, and what the file handle is pointing to.

As an added bonus, there are free versions of nearly all the SysInternals tools, and most come with source code! (The SysInternals guys run a companion Web site named Winternals.com where they sell the non-free tools with a little more functionality added.) UNIX users won’t find that to be a big deal, but it’s still pretty uncommon on the Windows side.

Notes from the Underground…

VB Decompilers

A fair amount of the code in the world is written in Visual Basic (VB). This includes both malicious code and regular programs. VB presents a special challenge to someone wanting to reverse engineer compiled code written in that language. The last publicly-available VB decompiler only works up through VB3. Starting in VB5, parts of a compiled VB program will be “native code” (regular Windows calls), and parts of it will be “pcode”, which is a bytecode, similar in concept to that to which Java compiles. The Visual Basic DLL contains an interpreter for this code. The problem is, there is very little documentation available as to what codes translate to what VB functions in a compiled program. You could always decompile the VB DLL, and make your own map, but that would be a massive undertaking.

The main response to the problem by the underground has been to use debugging techniques instead. However, this group of people has a different goal in mind, mainly cracking copy protection mechanisms. Thus, the information available in those areas is not always directly applicable to the problem at hand. Most of the public work done in those areas involves stepping through the code in order to find a section that checks for a serial number, for example, and disables portions of the program that don’t check out. The goal in that case is to install a bypass. Still, such information is a start for the VB analyst.

Most UNIX distributions come with a set of tools that perform the equivalent function. According to the Rosetta Stone (a list of what a function is called, cross-referenced by OS, which can be found at http://bhami.com/rosetta.html), there are a number of tracing programs. Of course, since this is a pretty low-level function, each tracing tool tends to work with a limited set of OSes. Examples include trace, strace, ktrace, and truss. The following example is done on Red Hat Linux, version 6.2, using the strace utility.

What strace (and most of the other trace utilities mentioned) does is show system (kernel) calls and their parameters. We can learn a lot about how a program works this way. Rather than just dump a bunch of raw output into your lap, I’ve inserted explanatory comments in the output:

    [elliptic@ellipse]$ echo hello > test
    [elliptic@ellipse]$ strace cat test
    execve("/bin/cat", ["cat", "test"], [/* 21 vars */]) = 0

Strace output doesn’t begin until the program execution call is made for cat. Thus, we don’t see the process the shell went through to find cat. By the time strace kicks in, it’s been located in /bin. We see cat is started with an argument of “test,” and a list of 21 environment variables. First item of input: arguments. Second: environment variables.
    brk(0) = 0x804b160
    old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40014000
    open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)

The execve call begins its normal loading process: allocating memory, etc. Note that the return value of the open call is -1, which indicates an error. The error interpretation is “No such file...”; indeed, no such file exists. While not exactly “input,” this makes it clear that if we were able to drop a file by that name, with the right function names, into the /etc directory, execve would happily run parts of it for us. That would be really useful if root came by later and ran something. Of course, to be able to do that, we’d need to be able to drop a new file into /etc, which we can’t do unless someone has messed up the file system permissions. On most UNIX systems, the ability to write to /etc means we can get root access any number of ways. This is just another reason why regular users shouldn’t be able to write to /etc. Of course, if we’re going to hide a Trojan horse somewhere (after we’ve already broken root), this might be a good spot.

    open("/etc/ld.so.cache", O_RDONLY) = 4
    fstat(4, {st_mode=S_IFREG|0644, st_size=12431, ...}) = 0
    old_mmap(NULL, 12431, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40015000
    close(4) = 0
    open("/lib/libc.so.6", O_RDONLY) = 4
    fstat(4, {st_mode=S_IFREG|0755, st_size=4101324, ...}) = 0
    read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\210\212"..., 4096) = 4096

The first 4K of libc is read. Libc is the standard shared library that holds the functions you call when you do C programming (such as printf, scanf, etc.).
    old_mmap(NULL, 1001564, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0x40019000
    mprotect(0x40106000, 30812, PROT_NONE) = 0
    old_mmap(0x40106000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 4, 0xec000) = 0x40106000
    old_mmap(0x4010a000, 14428, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4010a000
    close(4) = 0
    mprotect(0x40019000, 970752, PROT_READ|PROT_WRITE) = 0
    mprotect(0x40019000, 970752, PROT_READ|PROT_EXEC) = 0
    munmap(0x40015000, 12431) = 0
    personality(PER_LINUX) = 0
    getpid() = 9271
    brk(0) = 0x804b160
    brk(0x804b198) = 0x804b198
    brk(0x804c000) = 0x804c000
    open("/usr/share/locale/locale.alias", O_RDONLY) = 4
    fstat64(0x4, 0xbfffb79c) = -1 ENOSYS (Function not implemented)
    fstat(4, {st_mode=S_IFREG|0644, st_size=2265, ...}) = 0
    old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
    read(4, "# Locale name alias data base.\n#"..., 4096) = 2265
    read(4, "", 4096) = 0
    close(4) = 0
    munmap(0x40015000, 4096) = 0

When programs contain a setlocale function call, libc reads the locale information to determine the correct way to display numbers, dates, times, etc. Again, permissions are such that you can’t modify the locale files without root access, but it’s still something to watch for. Notice that the file permissions are conveniently printed in each fstat call (that’s the 0644 above, for example). This makes it easy to visually watch for bad permissions. If you do find a locale file to which you can write, you might be able to cause a buffer overflow in libc. Third (indirect) item of input: locale files.
    open("/usr/share/i18n/locale.alias", O_RDONLY) = -1 ENOENT (No such file or directory)
    open("/usr/share/locale/en_US/LC_MESSAGES", O_RDONLY) = 4
    fstat(4, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    close(4) = 0
    open("/usr/share/locale/en_US/LC_MESSAGES/SYS_LC_MESSAGES", O_RDONLY) = 4
    fstat(4, {st_mode=S_IFREG|0644, st_size=44, ...}) = 0
    old_mmap(NULL, 44, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40015000
    close(4) = 0
    open("/usr/share/locale/en_US/LC_MONETARY", O_RDONLY) = 4
    fstat(4, {st_mode=S_IFREG|0644, st_size=93, ...}) = 0
    old_mmap(NULL, 93, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40016000
    close(4) = 0
    open("/usr/share/locale/en_US/LC_COLLATE", O_RDONLY) = 4
    fstat(4, {st_mode=S_IFREG|0644, st_size=29970, ...}) = 0
    old_mmap(NULL, 29970, PROT_READ, MAP_PRIVATE, 4, 0) = 0x4010e000
    close(4) = 0
    brk(0x804d000) = 0x804d000
    open("/usr/share/locale/en_US/LC_TIME", O_RDONLY) = 4
    fstat(4, {st_mode=S_IFREG|0644, st_size=508, ...}) = 0
    old_mmap(NULL, 508, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40017000
    close(4) = 0
    open("/usr/share/locale/en_US/LC_NUMERIC", O_RDONLY) = 4
    fstat(4, {st_mode=S_IFREG|0644, st_size=27, ...}) = 0
    old_mmap(NULL, 27, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40018000
    close(4) = 0
    open("/usr/share/locale/en_US/LC_CTYPE", O_RDONLY) = 4
    fstat(4, {st_mode=S_IFREG|0644, st_size=87756, ...}) = 0
    old_mmap(NULL, 87756, PROT_READ, MAP_PRIVATE, 4, 0) = 0x40116000
    close(4) = 0
    fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 4), ...}) = 0
    open("test", O_RDONLY|O_LARGEFILE) = 4
    fstat(4, {st_mode=S_IFREG|0664, st_size=6, ...}) = 0

Finally, cat opens our file “test.” Certainly, it counts as input, but we can feel pretty safe that cat won’t blow up based on anything inside the file, because of what cat’s function is. In other cases, you would definitely want to count the input files.

    read(4, "hello\n", 512) = 6
    write(1, "hello\n", 6) = 6
    read(4, "", 512) = 0
    close(4) = 0
    close(1) = 0
    _exit(0) = ?
To finish, cat reads up to 512 bytes from the file (and gets 6) and writes them to the screen (well, file handle 1, which goes to STDOUT at the time). It then tries to read up to another 512 bytes of the file, and it gets 0, which is the indicator that it’s at the end of the file. So, it closes its file handles and exits clean (exit code of 0 is normal exit).

Naturally, I picked a super-simple example to demonstrate. The cat command is simple enough that we can easily guess what it does, processing-wise, between calls. In pseudocode:

    int count, handle
    string contents
    handle = open (argv[1])
    while (count = read (handle, contents, 512))
        write (STDOUT, contents, count)
    exit (0)

For comparison purposes, here’s the output from truss for the same command on a Solaris 7 (x86) machine:

    execve("/usr/bin/cat", 0x08047E50, 0x08047E5C) argc = 2
    open("/dev/zero", O_RDONLY) = 3
    mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xDFBE1000
    xstat(2, "/usr/bin/cat", 0x08047BCC) = 0
    sysconfig(_CONFIG_PAGESIZE) = 4096
    open("/usr/lib/libc.so.1", O_RDONLY) = 4
    fxstat(2, 4, 0x08047A0C) = 0
    mmap(0x00000000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xDFBDF000
    mmap(0x00000000, 598016, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xDFB4C000
    mmap(0xDFBD6000, 24392, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 561152) = 0xDFBD6000
    mmap(0xDFBDC000, 6356, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xDFBDC000
    close(4) = 0
    open("/usr/lib/libdl.so.1", O_RDONLY) = 4
    fxstat(2, 4, 0x08047A0C) = 0
    mmap(0xDFBDF000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 0) = 0xDFBDF000
    close(4) = 0
    close(3) = 0
    sysi86(SI86FPHW, 0xDFBDD8C0, 0x08047E0C, 0xDFBFCEA0) = 0x00000000
    fstat64(1, 0x08047D80) = 0
    open64("test", O_RDONLY) = 3
    fstat64(3, 0x08047CF0) = 0
    llseek(3, 0, SEEK_CUR) = 0
    mmap64(0x00000000, 6, PROT_READ, MAP_SHARED, 3, 0) = 0xDFB4A000
    read(3, " h", 1) = 1
    memcntl(0xDFB4A000, 6, MC_ADVISE, 0x0002, 0, 0) = 0
    write(1, " h e l l o\n", 6) = 6
    llseek(3, 6, SEEK_SET) = 6
    munmap(0xDFB4A000, 6) = 0
    llseek(3, 0, SEEK_CUR) = 6
    close(3) = 0
    close(1) = 0
    llseek(0, 0, SEEK_CUR) = 296569
    _exit(0)

Based on the bit at the end, we can infer that the Solaris cat command works a little differently; it appears that it uses a memory-mapped file to pass a memory range straight to a write call. An experiment (not shown here) with a larger file showed that it would do the memory map/write pair in a loop, handling 256K bytes at a time.

The point of showing these traces was not to learn how to use the trace tools (that would take several chapters to describe properly, though it is worth learning). Rather, it was to demonstrate the kinds of things you can learn by asking the operating system to tell you what it’s up to. For a more involved program, you’d be looking for things like fixed-name /tmp files, reading from files writeable by anyone, any exec calls, and so on.

Disassemblers, Decompilers, and Debuggers

Drilling down to attacks on the binary code itself is the next stop. A debugger is a piece of software that will take control of another program and allow things like stopping at certain points in the execution, changing variables, and even changing the machine code on the fly in some cases. However, the debugger’s ability to do this may depend on whether the symbol table is attached to the executable (for most binary-only files, it won’t be). Under those circumstances, the debugger may be able to do some functions, but you may have to do a lot of manual work, like setting breakpoints on memory addresses rather than function names.

A decompiler (also called a disassembler) is a program that takes binary code and turns it into some higher-level language, often assembly language. Some can do rudimentary C code, but the code ends up being pretty rough.
A decompiler attempts to deduce some of the original source code from the binary (object) code, but a lot of information that programmers rely on during development is lost during the compilation process; for example, variable names. Often, a decompiler can only name variables with non-useful numeric names while decompiling unless the symbol tables are present.

The problem more or less boils down to you having to be able to read assembly code in order for a decompiler to be useful to you. Having said that, let’s take a look at an example of what a decompiler produces.

One commercial decompiler for Windows that has a good reputation is IDA Pro, from DataRescue (shown in Figure 4.1). IDA Pro is capable of decompiling code for a large number of processor families, including the Java Virtual Machine.

Figure 4.1 IDA Pro in Action

Here, we’ve used IDA Pro to disassemble mspaint.exe (Paintbrush). We’ve scrolled to the section where IDA Pro has identified the external functions upon which mspaint.exe calls. For OSes that support shared libraries (like Windows and all the modern UNIX systems), an executable program has to keep a list of the libraries it will need. This list is usually human readable if you look inside the binary file. The OS needs this list of libraries so it can load them for the program’s use. Decompilers take advantage of this, and are able to insert the names into the code in most cases, to make it easier for people to read.

We don’t have the symbol table for mspaint.exe, so most of this file is unnamed assembly code. If you want to try out IDA Pro for yourself, a limited trial version is available for download at www.datarescue.com/idabase/ida.htm. Another very popular debugger is the SoftICE debugger from Numega. Information about SoftICE can be found at http://www.compuware.com/products/nu...rivercentral/.
To contrast, I’ve prepared a short C program (the classic “Hello World”) that I’ve compiled with symbols, to use with the GNU Debugger (GDB). Here’s the C code:

    #include <stdio.h>

    int main ()
    {
        printf ("Hello World\n");
        return (0);
    }

Then, I compile it with the debugging information turned on (the -g option):

    [elliptic@ellipse]$ gcc -g hello.c -o hello
    [elliptic@ellipse]$ ./hello
    Hello World

I then run it through GDB. Comments inline:

    [elliptic@ellipse]$ gdb hello
    GNU gdb 19991004
    Copyright 1998 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and
    you are welcome to change it and/or distribute copies of it under
    certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for
    details.
    This GDB was configured as "i386-redhat-linux"...
    (gdb) break main

I set a breakpoint at the main function. As soon as the program enters main, the execution pauses and I get control. The breakpoint is set before run.

    Breakpoint 1 at 0x80483d3: file hello.c, line 5.
    (gdb) run

The run command executes our hello program in the debugger.

    Starting program: /home/ryan/hello
    Breakpoint 1, main () at hello.c:5
    5 printf ("Hello World\n");
    (gdb) disassemble

Now that we have reached the breakpoint we set up during the execution of the debugging session, we issue the disassemble command to display some further information about the program.

    Dump of assembler code for function main:
    0x80483d0 <main>: push %ebp
    0x80483d1 <main+1>: mov %esp,%ebp
    0x80483d3 <main+3>: push $0x8048440
    0x80483d8 <main+8>: call 0x8048308 <printf>
    0x80483dd <main+13>: add $0x4,%esp
    0x80483e0 <main+16>: xor %eax,%eax
    0x80483e2 <main+18>: jmp 0x80483e4 <main+20>
    0x80483e4 <main+20>: leave
    0x80483e5 <main+21>: ret
    End of assembler dump.

This is what “hello world” looks like in x86 Linux assembly.
Examining your own programs in a debugger is a good way to get used to disassembly listings.
    (gdb) s
    printf (format=0x8048440 "Hello World\n") at printf.c:30
    printf.c: No such file or directory.
I then “step” (s command) to the next command, which is the printf call. GDB indicates that it doesn’t have the printf source code to give any further details.
    (gdb) s
    31 in printf.c
    (gdb) s
    Hello World
    35 in printf.c
    (gdb) c
    Continuing.
A couple more steps into printf, and we get our output. I use “continue” (c command) to tell GDB to keep running the program until it gets to another breakpoint or finishes.
    Program exited normally.
    (gdb)
Other related tools include nm and objdump from the GNU binutils collection. Objdump is a program for displaying information about object files. It can be used to display the symbols in an object file, display the headers in an object file, or even disassemble an object file into assembly code. Nm covers part of the same ground, listing the symbols referenced by an object file.
Tools & Traps…
Tools Are No Substitutes For Knowledge
Some of the disassembly and debugging tools are fantastic in the number of features they offer. However, like any tool, they are not perfect. This is especially true when dealing with malicious code (viruses, worms, Trojans) or binary exploits. Often the authors of these types of binary code specifically want to make analysis difficult, and will take steps to make the tools less functional. For example, the RST Linux virus checks to see if it is being debugged, and will exit if that is the case. The same virus modifies the ELF file headers when it infects a file in such a way as to make some disassemblers unable to access the virus portion of the binary directly. (Specifically, there is no declared code segment for the virus code, but it gets loaded along with the previous segment, and will still execute.)
It’s very common for a piece of malicious code to be somewhat protected with encryption or compression. The Code Red worms existed in the wild only as half overflow string/half code, meaning that none of the standard file headers were present.
All of the above means that you will still need to know how to do things manually if need be. You will need to be able to tell from examining a file header that portions have been modified, and how to interpret the changes. You may need to be able to perform several iterations of code analysis for encrypted code. You will have to analyze the decryption routine, replicate the code that does the work, and then analyze the results.
You may not only have to be able to read assembly language, but be able to write it in order to copy a decryption or decompression function. Writing assembly code is generally harder than reading it.
This is not to indicate that the tools are useless. Far from it. You may hit a stumbling block for which the tool is inadequate, but once past it, you will want to plug the results right back into the tool and continue from there. Besides, sometimes using the tools is the best way to learn how things work in the first place.
Black Box Testing
The term black box refers to any component or part of a system whose inner functions are hidden from the system user. There are no exposed settings or controls; it just accepts input and produces output. It is not intended to be opened or modified, and there are no user-serviceable parts inside.
Black box testing can be likened to binary auditing. Both types of auditing require dealing with binary data. Black boxes, however, appear with varying degrees of transparency. We recognize two different classes of problems with which we may be presented: black box, and obsidian box. Of course, these are conceptual boxes rather than physical objects. The type of box refers to our level of visibility into the workings of the system we want to attack.
Naturally, the very idea of a black box is anathema to most hackers. How could you have a box that performs some neat function, and not want to know how it does it? We will be discussing ideas on how to attack a true black box, but in reality we will be spending most of our energy trying to pry the lid off.
Chips
Imagine you have a piece of electronics gear that you would like to reverse engineer. Most equipment of that type nowadays would be built mostly around integrated circuits (ICs) of some kind. In our hypothetical situation, you open the device, and indeed, you see an IC package as expected, but the identifying marks have been sanded off! You pull the mystery chip out of its socket and try to determine which chip it is.
Unknown ICs are a good example of a real-life black box (they’re even black). Without the markings, you may have a lot of difficulty determining what kind of chip it is. What can you tell from a visual inspection? You can tell it has 16 pins, and that’s about it. If you examine the circuit board it came out of, and start visually following the traces in the board, you can probably pretty easily determine the pins to which the power goes, and that can be verified with a voltmeter. Guessing which pins take power (and how much) can be fun, because if you get it wrong, you can actually fry the chip.
Beyond that, you’ll probably have to try to make inferences based on any other components in the gadget. You can start to make a list of components that attach to the chip, and to which pins they attach. For example, perhaps two of the pins eventually connect to a light emitting diode (LED).
If it turns out that the chip is a simple Transistor-Transistor Logic (TTL) device, you might be able to deduce simple logic functions by applying the equivalent of true-and-false signals to various pins and measuring for output on other pins.
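The pin-probing idea can be sketched in C for the simplest possible case. This is only an illustration under stated assumptions: the chip is modeled as a hypothetical function taking two input pins and returning one output pin, and sim_nand and sim_xor stand in for real hardware. Every name here is invented for the example. The code applies all four input combinations, records the outputs as a 4-bit signature, and compares that signature against the truth tables of the common two-input gates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for driving two input pins and reading one output. */
typedef int (*probe_fn)(int a, int b);

enum gate_kind {
    GATE_UNKNOWN, GATE_AND, GATE_NAND, GATE_OR, GATE_NOR, GATE_XOR, GATE_XNOR
};

/* Bit (a*2 + b) of the result holds the output observed for inputs (a, b). */
unsigned truth_signature(probe_fn probe)
{
    unsigned sig = 0;

    assert(probe != NULL);
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if (probe(a, b))
                sig |= 1u << (a * 2 + b);
    return sig;
}

/* Match the observed signature against known two-input truth tables. */
enum gate_kind identify_gate(probe_fn probe)
{
    switch (truth_signature(probe)) {
    case 0x8: return GATE_AND;   /* output high only for (1,1) */
    case 0x7: return GATE_NAND;  /* output low only for (1,1) */
    case 0xE: return GATE_OR;    /* output low only for (0,0) */
    case 0x1: return GATE_NOR;   /* output high only for (0,0) */
    case 0x6: return GATE_XOR;   /* output high when inputs differ */
    case 0x9: return GATE_XNOR;  /* output high when inputs match */
    default:  return GATE_UNKNOWN;
    }
}

/* Simulated "mystery chips" used in place of real hardware. */
int sim_nand(int a, int b) { return !(a && b); }
int sim_xor(int a, int b)  { return (a != 0) != (b != 0); }
```

With two inputs there are only four combinations to try, which is why this works. Each added input pin doubles the table, and any internal state (a latch, a counter) makes the output depend on history rather than on the current inputs alone, so the approach stops scaling very quickly.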
If you could deduce, for example, that the chip was simply a bunch of NAND (not-and) gates, you could take that information, go to a chip catalog, and figure out pretty quickly which chip (or equivalent) you have.
On the other hand, the chip could turn out to be something as complex as a small microprocessor or an entire embedded system. If it were the latter case, there would be far, far too many combinations of inputs and outputs for a trial-and-error map. For an embedded system, there will probably also be analog components (for example, a speaker driver) that will frustrate any efforts to map binary logic.
For an example of a small computer on a chip of this sort, go to http://www.parallaxinc.com/html_file...dule_bs2p.asp. Parallax produces a family of chips that have built-in BASIC interpreters, as well as various combinations of input and output mechanisms. The underlying problem with such a complex device is that it has far more states than you could possibly enumerate. Even a tiny computer with a very small amount of memory can produce a practically unlimited amount of nonrepeating output. For a simple example, imagine a single-chip computer that can do addition on huge integers. All it has to do is run a simple program that adds 1 to the number each time and outputs that for any input you give it. You’d probably pretty quickly infer that there was a simple addition program going on, but you wouldn’t be able to infer any other capabilities of the chip. You wouldn’t be able to tell if it was a general-purpose programmable computer, or if it was hardware designed to do just the one function.
Some folks have taken advantage of the fact that special sequences are very unlikely to be found in black boxes, either by accident or when actively looked for. All the person hiding a sequence has to do is make sure the space of possibilities is sufficiently large to hide his special sequence.
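To get a feel for how quickly the space of possibilities grows, here is a small back-of-the-envelope sketch in C. The function names and the figures used with them are hypothetical and chosen only for illustration; nothing here describes a real device. If each step of a hidden sequence can take one of `choices` values and the sequence is `length` steps long, there are choices^length candidates to try.

```c
#include <stdint.h>

/*
 * Number of distinct sequences of the given length when each step has
 * `choices` possible values. Saturates at UINT64_MAX instead of
 * overflowing, since the point is only to show how fast the count grows.
 */
uint64_t sequence_space(uint64_t choices, unsigned length)
{
    uint64_t total = 1;

    for (unsigned i = 0; i < length; i++) {
        if (choices != 0 && total > UINT64_MAX / choices)
            return UINT64_MAX;      /* would overflow: report "huge" */
        total *= choices;
    }
    return total;
}

/* Time needed to try every candidate at a fixed rate of attempts. */
double seconds_to_exhaust(uint64_t space, double trials_per_second)
{
    return (double)space / trials_per_second;
}
```

For instance, a sequence of 10 steps with 2 choices per step gives only 1,024 candidates, while 8 steps with 256 choices per step already exceeds what a 64-bit counter can hold. A tester who does not know the sequence length is in a worse position still, because every plausible length has to be covered.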
For a concrete example, read the following article: http://www.casinoguru.com/features/0...9_tocatch.htm. It tells of a slot machine technician who replaced the chip in some slot machines, so that they would pay a jackpot every time a particular sequence of coins was put in the machine, and the handle pulled. Talk about the ultimate Easter egg!
So, if you can’t guess or infer from the information and experiments available to you what this chip does, what do you do? You open it! Open a chip? Sure. Researchers of “tamper-proof” packaging for things like smart cards have done any number of experiments on these types of packages, including using acid to burn off the packaging, and examining the chip layout under a microscope. We’ll cover this kind of hardware hacking in Chapter 14.
So, as indicated before, our response to being frustrated at not being able to guess the internals of a black box is to rip it open. An analogy can be found in this author’s experiences visiting Arizona’s obsidian mines: held at arm’s length, obsidian looks like a black rock. However, if it is held up to a bright light, one can see the light through the stone. There are no truly “black boxes,” but rather, they are “obsidian boxes” that permit varying degrees of vision into them. In other words, you always have some way to gain information about the problem you’re trying to tackle.
Summary
Vulnerability research methodologies are the commonly used principles of auditing systems for vulnerabilities. The process of source code research begins with searching the source code for error-prone directives such as strcpy and sprintf. Another method is the line-by-line review of source code by the person auditing the program, which is a comprehensive audit of the program through all of its execution sequences.
Discovery through difference is another method, using the diff utility on different versions of the same software to yield information about security fixes. The method of undertaking binary research can involve various utilities such as tracing tools, debuggers, guideline-based auditing, and sniffers.
An auditing source code review involves the search for error-prone functions and line-by-line auditing methodologies. In this chapter, we looked at an example of an exploitable buffer overflow using strcpy, an example using sprintf, an example using strcat, and an example using gets. We dissected input validation bugs, such as a format string vulnerability using printf, and an open function call written in Perl. We also examined a race condition vulnerability in the mktemp function.
Reverse engineering is one of the most commonly used and accurate methods of finding vulnerabilities in a closed-source program. This type of research is performed from the top down. Windows auditing tools are available from sysinternals.com, and the Rosetta Stone list can be used to map system calls across platforms. In this chapter, we traced the execution of the cat program, first on a Red Hat Linux system, then a Solaris 7 system.
Disassemblers and debuggers drill down into binary code. A disassembler is a program that takes binary code and turns it into a more human-readable language such as assembly; a decompiler attempts to go further and recover source code. A debugger is a program that can control the execution of another program. In this chapter, we examined the output of disassembly on the Windows platform using IDA Pro, then performed a debugging session with GDB on a Linux system. We also discussed objdump, a program used to display information about object files; and nm, a program that displays the symbol information contained in object files.
A black box is a (conceptual) component whose inner functions are hidden from the user; black box testing is similar to binary auditing, and can involve reverse engineering hardware such as integrated circuits.
One may also identify a chip by deduction of output, or by literally ripping it open to examine it. Black boxes have varying degrees of transparency.

Solutions Fast Track

Understanding Vulnerability Research Methodologies
Source research and review is the ideal vulnerability research methodology.
Source research is often conducted through searching for error-prone directives, line-by-line review, and discovery through difference.
Binary research is often performed through tracing binaries, debuggers, guideline-based auditing, and sniffers.

The Importance of Source Code Review
Source review is a necessary part of ensuring secure programs.
Searching for error-prone directives in source can yield buffer overflows, input validation bugs, and race conditions.
The grep utility can be used to make the searching of error-prone directives efficient.

Reverse Engineering Techniques
Freely available auditing tools for Windows are available from www.sysinternals.com.
The Rosetta Stone (at http://bhami.com/rosetta.html) can be used to map system utilities across platforms.
Debuggers can be used to control the execution of a program, and find problem sections of code.

Black Box Testing
Black box testing is the process of discovering the internals of a component that is hidden from the naked eye.
Ripping open a black box is the easiest way to determine the internals.
There are no true black boxes. Most allow varying degrees of transparency.

Frequently Asked Questions

Q: What is the best method of researching vulnerabilities?
A: This question can only yield a subjective answer. The best methods a researcher can use are the ones he or she is most comfortable with, and that are most productive for the research. The recommended approach is to experiment with various methods and organization schemes.
Q: Is decompiling and other reverse engineering legal?
A: In the United States, reverse engineering may soon be illegal. The Digital Millennium Copyright Act includes a provision designed to prevent the circumvention of technological measures that control access to copyrighted works. Because source code can be copyrighted, reverse engineering copyrighted code may fall under this provision.
Q: Are there any tools to help with more complicated source code review?
A: Tools such as SCCS and CVS may make source review easier. Additionally, integrated development environments (IDEs) may also make source review an easier task.
Q: Where can I learn about safe programming?
A: A couple of resources one may use are the Secure UNIX Programming FAQ at www.whitefang.com/sup/secure-faq.html, or the secprog mailing list moderated by Oliver Friedrichs.
Q: Where can I download the source to these example programs?
A: The source is available at www.syngress.com/solutions.
