The Reverse Challenge: analysis

 

 

This chapter describes in detail what we did, how we did it and why; it is quite a long story.

 

Our first step in analyzing the recently downloaded ‘the-binary’ file was to determine what type of file it is:

 

# file the-binary

the-binary: ELF 32-bit LSB executable, Intel 80386, version 1, statically linked, stripped

 

That gives a lot of useful information:

 

-          It is a standard ELF executable for the Intel 80386, probably a Linux binary.

-          There is no debug information: symbols are stripped, and since the binary is statically linked, no shared library is used. That makes our task a lot harder: forget about easy debugging with gdb, and forget about tools like ltrace, which work by intercepting calls to shared libraries.
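The fields that file reports come straight from the ELF header. As an illustration (a minimal Python sketch, not something we ran at the time), the hypothetical helper describe_elf below decodes the identification bytes and the e_type, e_machine, e_version and e_entry fields, using the constants ET_EXEC = 2 and EM_386 = 3 from the ELF specification:

```python
import struct

ELF_MAGIC = b"\x7fELF"
CLASSES = {1: "32-bit", 2: "64-bit"}     # EI_CLASS values
ENDIANNESS = {1: "LSB", 2: "MSB"}        # EI_DATA values


def describe_elf(header: bytes) -> dict:
    """Decode the ELF header fields that file(1) summarises.

    `header` must hold at least the first 28 bytes of the file
    (32 bytes for a 64-bit ELF, whose e_entry is 8 bytes long).
    describe_elf is a hypothetical helper name.
    """
    if header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = CLASSES[header[4]]
    endian = ENDIANNESS[header[5]]
    prefix = "<" if endian == "LSB" else ">"
    # e_type, e_machine and e_version follow the 16-byte e_ident array
    e_type, e_machine, e_version = struct.unpack_from(prefix + "HHI", header, 16)
    word = "I" if ei_class == "32-bit" else "Q"
    (e_entry,) = struct.unpack_from(prefix + word, header, 24)
    return {
        "class": ei_class,
        "endian": endian,
        "executable": e_type == 2,       # ET_EXEC
        "i386": e_machine == 3,          # EM_386
        "version": e_version,
        "entry": e_entry,
    }
```

Called on the first 64 bytes of the-binary, it should report a 32-bit LSB executable for the i386 whose entry point is the start address objdump shows further down (0x08048090).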

 

Let’s continue our static analysis:

 

# strings -a the-binary

  <-lots of output lines deleted->

 

From its output we can conclude several things:

 

Multiple entries such as:

 

GCC: (GNU) 2.7.2.l.2

 

indicate the program has been generated with gcc version 2.7.2.1.2.

 

@(#) The Linux C library 5.3.12

 

This confirms that it is a Linux binary, built with version 5 of the Linux C library (libc5).

 

These lines are quite interesting:

 

[mingetty]

/tmp/.hj237349

/bin/csh -f -c "%s" 1> %s 2>&1

TfOjG

/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:.

PATH

HISTFILE

linux

TERM

/bin/sh

/bin/csh -f -c "%s"

 

mingetty is a minimal getty for virtual consoles (see mingetty(8)). Why would this string be here?

/tmp/.hj237349 looks like a temporary filename; the leading dot is a simple attempt to hide it from a plain ls.

/bin/csh -f -c "%s" 1> %s 2>&1 and the other csh reference indicate that, at some point, the program will try to execute a command with csh, redirecting its output (both stdout and stderr) to a file.

TfOjG seems to be a password-like string. Maybe at some point it is used to validate user input.

PATH, HISTFILE, TERM, etc. are common shell environment variables. Together with /bin/sh, they suggest that somewhere the program is able to open a shell.
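The csh format string can be expanded exactly as printf would expand it. The Python sketch below does just that; pairing it with /tmp/.hj237349 as the output file is only our guess at this stage (strings alone does not tell us how the two are combined), and csh_command_line is a hypothetical name. Note that -f makes csh skip reading the user's .cshrc, and that "1> file 2>&1" sends both standard output and standard error to the file.

```python
# Format string recovered by strings(1).
CSH_TEMPLATE = '/bin/csh -f -c "%s" 1> %s 2>&1'


def csh_command_line(command: str, outfile: str = "/tmp/.hj237349") -> str:
    """Expand the template the way printf-style %s substitution would.

    Assumption: the output file is /tmp/.hj237349; nothing in the strings
    output proves that the two strings are used together.
    """
    return CSH_TEMPLATE % (command, outfile)
```

For instance, csh_command_line("uname -a") yields /bin/csh -f -c "uname -a" 1> /tmp/.hj237349 2>&1.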

 

%d.%d.%d.%d

%u.%u.%u.%u

%c%s

gethostby*.getanswer: asked for "%s", got "%s"

RESOLV_HOST_CONF

/etc/host.conf

order

resolv+: %s: "%s" command incorrectly formatted.

 

These lines, and others like them, indicate that the resolv+ library (now part of libc) is included in the binary. So, at some point, the program will probably try to resolve hostnames or IP addresses, and some network activity is to be expected.

 

The lines

yplib.c,v 2.6 1994/05/27 14:34:43 swen Exp

/var/yp/binding

 

and many others like them indicate the presence of libc NIS calls. The resolv+ library uses them, but they could also be called directly.

 

The string

*nazgul*

 

seemed very suspicious (a password or the like), but a quick web search showed us that it marks the beginning of a compiled Linux message catalog. It is so fun to learn...

 

We then proceeded to get file information from objdump.

 

# objdump -x the-binary

 

the-binary:     file format elf32-i386

the-binary

architecture: i386, flags 0x00000102:

EXEC_P, D_PAGED

start address 0x08048090

 

Program Header:

    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12

         filesz 0x00024222 memsz 0x00024222 flags r-x

    LOAD off    0x00024228 vaddr 0x0806d228 paddr 0x0806d228 align 2**12

         filesz 0x0000c094 memsz 0x00011970 flags rw-

             

Sections:

Idx Name          Size      VMA       LMA       File off  Algn

  0 .init         00000008  08048080  08048080  00000080  2**4

                  CONTENTS, ALLOC, LOAD, READONLY, CODE

  1 .text         0001f53c  08048090  08048090  00000090  2**4

                  CONTENTS, ALLOC, LOAD, READONLY, CODE

  2 __libc_subinit 00000004  080675cc  080675cc  0001f5cc  2**2

                  CONTENTS, ALLOC, LOAD, READONLY, DATA

  3 .fini         00000008  080675d0  080675d0  0001f5d0  2**4

                  CONTENTS, ALLOC, LOAD, READONLY, CODE

  4 .rodata       00004c4a  080675d8  080675d8  0001f5d8  2**2

                  CONTENTS, ALLOC, LOAD, READONLY, DATA

  5 .data         0000c084  0806d228  0806d228  00024228  2**2

                  CONTENTS, ALLOC, LOAD, DATA

  6 .ctors        00000008  080792ac  080792ac  000302ac  2**2

                  CONTENTS, ALLOC, LOAD, DATA

  7 .dtors        00000008  080792b4  080792b4  000302b4  2**2

                  CONTENTS, ALLOC, LOAD, DATA

  8 .bss          000058dc  080792bc  080792bc  000302bc  2**2

                  ALLOC

  9 .note         00000d5c  00000000  00000000  000302bc  2**0

                  CONTENTS, READONLY    

 10 .comment      00000ea6  00000000  00000000  00031018  2**0

                  CONTENTS, READONLY

objdump: the-binary: no symbols  

 

****************

The program will be loaded at memory address 0x08048000 (the vaddr of the first LOAD segment).

Execution will begin at the start address 0x08048000 + 0x90 = 0x08048090, which is the beginning of the .text section.

The .text section is "big": 0x0001f53c bytes (128316 bytes).

It is located at offset 0x90 in the file.

And it is aligned on a 16-byte boundary: 2**4 = 16.

****************
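The relation between file offsets and virtual addresses follows from the two LOAD entries of the program header: within a segment, address = vaddr + (offset - off). The Python sketch below (segment values copied from the objdump -x output above; offset_to_vaddr and vaddr_to_offset are our own hypothetical names) does the conversion both ways:

```python
# LOAD segments from the objdump -x program header: (offset, vaddr, filesz, memsz)
SEGMENTS = [
    (0x00000, 0x08048000, 0x24222, 0x24222),  # r-x: .init .text .fini .rodata
    (0x24228, 0x0806D228, 0x0C094, 0x11970),  # rw-: .data .ctors .dtors, then .bss
]


def offset_to_vaddr(offset: int) -> int:
    """Virtual address at which a given file offset is mapped."""
    for off, vaddr, filesz, _ in SEGMENTS:
        if off <= offset < off + filesz:
            return vaddr + (offset - off)
    raise ValueError(f"offset {offset:#x} is not in any LOAD segment")


def vaddr_to_offset(addr: int):
    """File offset backing a virtual address; None for zero-filled memory (.bss)."""
    for off, vaddr, filesz, memsz in SEGMENTS:
        if vaddr <= addr < vaddr + filesz:
            return off + (addr - vaddr)
        if vaddr + filesz <= addr < vaddr + memsz:
            return None  # memsz > filesz: this part is zero-filled at load time
    raise ValueError(f"address {addr:#x} is not mapped")
```

For example, offset_to_vaddr(0x90) gives 0x08048090, the entry point. The same arithmetic also shows that 0x08048090 + 0x1f53c = 0x080675cc, exactly where __libc_subinit begins, and that the second segment's memsz - filesz = 0x11970 - 0xc094 = 0x58dc is precisely the size of .bss.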

 

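Nothing really new in the objdump -x output. But then, for the first time, we generated a HUGE assembler listing with objdump -d (which disassembles only the code sections) and objdump -D (which disassembles all sections, data included): (lots of info suppressed)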

 

# objdump -d the-binary 

objdump: the-binary: no symbols

 

the-binary:     file format elf32-i386

 

Disassembly of section .init:

 

08048080 <.init>:

 8048080:       e8 23 f5 01 00          call   0x80675a8

 8048085:       c2 00 00                ret    $0x0

Disassembly of section .text:

 

08048090 <.text>:

 8048090:       59                      pop    %ecx    

...

 80675cb:       90                      nop

Disassembly of section .fini:

 

080675d0 <.fini>:

 80675d0:       e8 3b 0b fe ff          call   0x8048110

 80675d5:       c2 00 00                ret    $0x0

 

 

# objdump -D the-binary

...

...

080675cc <__libc_subinit>:

 80675cc:       3c 6d                   cmp    $0x6d,%al

 80675ce:       05                      .byte 0x5

 80675cf:       08                      .byte 0x8

Disassembly of section .fini:

 

080675d0 <.fini>:

 80675d0:       e8 3b 0b fe ff          call   0x8048110

 80675d5:       c2 00 00                ret    $0x0

...

...

Disassembly of section .rodata:

 

080675d8 <.rodata>:

 80675d8:       5b                      pop    %ebx        

...

...

0806d228 <.data>:

 806d228:       00 00                   add    %al,(%eax)        

...

...

080792ac <.ctors>:

 80792ac:       ff                      (bad)

 80792ad:       ff                      (bad)

 80792ae:       ff                      (bad)

 80792af:       ff 00                   incl   (%eax)

 80792b1:       00 00                   add    %al,(%eax)

        ...

Disassembly of section .dtors:

 

080792b4 <.dtors>:

 80792b4:       ff                      (bad)

...

 

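It was mostly unreadable: with -D, objdump also disassembles data sections such as .rodata, .data, .ctors and .dtors, producing meaningless instructions like the (bad) entries above.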

 

We then tried to determine which system calls the binary could execute. On Linux/i386, system calls are made in the following way:

 

-          Calls are made through the INT 0x80 instruction.

-          The system call is identified by its number, placed in the EAX register.

-          The first 5 parameters are passed in the EBX, ECX, EDX, ESI and EDI registers.

-          Calls needing more parameters receive instead a pointer to a block of memory holding them (the old mmap() works this way); newer kernels also pass a sixth parameter in EBP.

 

The call numbers are defined in /usr/include/asm/unistd.h. Locating system calls is therefore easy: just search the listing for the int $0x80 instruction (or for "cd 80", the machine code bytes of that instruction). Doing that:

 

# objdump -d the-binary 2>/dev/null| grep "cd 80" | wc -l

     47    

 

So there are 47 places in the code where a system call is made (the same call may appear more than once).

 

To see each system call together with the instructions that set up its parameters:

 

# objdump -d the-binary | grep -B 7 "cd 80" | more

...

EXAMPLE:

--

 80480e6:       e8 49 00 00 00          call   0x8048134

 80480eb:       50                      push   %eax

 80480ec:       e8 cb de 00 00          call   0x8055fbc

 80480f1:       5b                      pop    %ebx

 80480f2:       8d b4 26 00 00 00 00    lea    0x0(%esi,1),%esi

 80480f9:       8d b4 26 00 00 00 00    lea    0x0(%esi,1),%esi

 8048100:       b8 01 00 00 00          mov    $0x1,%eax

 8048105:       cd 80                   int    $0x80

--   

 

At this point, we wrote a small Perl script, syscall.pl (available as appendix 0), to identify system calls in the objdump output. With it, we could generate a clear list of the system calls in the code (parameters omitted for brevity):

 

 80480b4: cd 80                      int    $0x80 # personality()

 8048105: cd 80                      int    $0x80 # exit()

 8056a11: cd 80                      int    $0x80 # wait4()

 8056a54: cd 80                      int    $0x80 # socketcall()

 8056a9c: cd 80                      int    $0x80 # socketcall()

 8056ae4: cd 80                      int    $0x80 # socketcall()

 8056b26: cd 80                      int    $0x80 # socketcall()

 8056b72: cd 80                      int    $0x80 # socketcall()

 8056bcc: cd 80                      int    $0x80 # socketcall()

 8056c1e: cd 80                      int    $0x80 # socketcall()

 8056c78: cd 80                      int    $0x80 # socketcall()

 8056cd1: cd 80                      int    $0x80 # socketcall()

 8056d1c: cd 80                      int    $0x80 # socketcall()

 8057140: cd 80                      int    $0x80 # chdir()

 805716c: cd 80                      int    $0x80 # close()

 805719b: cd 80                      int    $0x80 # dup2()

 80571ca: cd 80                      int    $0x80 # execve()

 80571f0: cd 80                      int    $0x80 # fork()

 8057214: cd 80                      int    $0x80 # geteuid()

 8057238: cd 80                      int    $0x80 # getpid()

 8057263: cd 80                      int    $0x80 # gettimeofday()

 8057292: cd 80                      int    $0x80 # ioctl()

 80572bf: cd 80                      int    $0x80 # kill()

 80572ee: cd 80                      int    $0x80 # open()

 805731e: cd 80                      int    $0x80 # read()

 8057344: cd 80                      int    $0x80 # setsid()

 8057372: cd 80                      int    $0x80 # sigprocmask()

 805739c: cd 80                      int    $0x80 # uname()

 80573c8: cd 80                      int    $0x80 # unlink()

 80573fa: cd 80                      int    $0x80 # write()

 8057424: cd 80                      int    $0x80 # alarm()

 8057450: cd 80                      int    $0x80 # time()

 8057482: cd 80                      int    $0x80 # writev()

 80574ac: cd 80                      int    $0x80 # select()

 80574f7: cd 80                      int    $0x80 # sigaction()

 8057530: cd 80                      int    $0x80 # sigsuspend()

 8057560: cd 80                      int    $0x80 # exit()

 8065d23: cd 80                      int    $0x80 # mmap()

 8065d65: cd 80                      int    $0x80 # stat()

 8065da1: cd 80                      int    $0x80 # fstat()

 8066106: cd 80                      int    $0x80 # fcntl()

 8066136: cd 80                      int    $0x80 # lseek()

 8066163: cd 80                      int    $0x80 # munmap()

 8066192: cd 80                      int    $0x80 # readv()

 80661c6: cd 80                      int    $0x80 # mremap()

 8066206: cd 80                      int    $0x80 # brk()

 8066244: cd 80                      int    $0x80 # brk()

 

 

Of course, the binary could still hide more system calls: more code could be hidden in other parts of the program, posing as data, or the program could modify itself under certain conditions. But this list is a start...

 

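There are many socketcall() invocations, which supports the hypothesis of heavy network usage: on Linux/i386, all socket operations (socket(), bind(), recv(), send() and so on) go through this single system call. There are also some potentially dangerous system calls, such as execve(), kill() and unlink().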
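Our syscall.pl is in appendix 0. The same idea can be sketched in Python as follows; this is not a translation of our script, the table holds only a few of the i386 numbers from asm/unistd.h, and annotate is a hypothetical name. It relies on a heuristic: the system call number is assumed to be loaded into EAX with an immediate mov shortly before the int $0x80.

```python
import re

# Partial i386 system-call table (numbers from asm/unistd.h); extend as needed.
SYSCALLS = {
    1: "exit", 2: "fork", 3: "read", 4: "write", 5: "open", 6: "close",
    12: "chdir", 13: "time", 49: "geteuid", 66: "setsid", 67: "sigaction",
    102: "socketcall", 136: "personality",
}

ADDR = re.compile(r"^\s*([0-9a-f]+):")
MOV_EAX = re.compile(r"\bmov\s+\$0x([0-9a-f]+),%eax\b")
INT_80 = re.compile(r"\bint\s+\$0x80\b")


def annotate(lines):
    """Yield (address, syscall name) for each 'int $0x80' in objdump -d output.

    Heuristic: the number comes from the last 'mov $imm,%eax' seen before the
    interrupt; if EAX is loaded some other way, the call is reported as unknown.
    """
    eax = None
    for line in lines:
        mov = MOV_EAX.search(line)
        if mov:
            eax = int(mov.group(1), 16)
        elif INT_80.search(line):
            addr = ADDR.match(line)
            yield (addr.group(1) if addr else "?", SYSCALLS.get(eax, "unknown"))
            eax = None
```

Fed the lines of objdump -d the-binary, it would, for the EXAMPLE above, report exit() at 8048105 (mov $0x1,%eax followed by int $0x80).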

 

We then decided to give IDA Pro a try... With it, we generated another HUGE assembler listing. Its main advantage here is that IDA does a great job with some constructs, like switch statements, making the code more readable. More interestingly, it automatically identifies the Linux system calls and puts a comment on the corresponding line. It would not be the last time we discovered an easier way to do something we had already done.

 

Finally, we ran DEC against the binary. It generates C-like code, so it helps to turn those dark lines full of CMP, JNZ, JZ, etc. instructions into something more readable. But it still produces a very long (and incomprehensible) listing.

 

It was totally impractical to analyze such beasts directly without more help, so we tried another approach.

 

It was time to start a bit of dynamic analysis.

 

Running such a program in an unprotected environment could be dangerous, so we built our test box:

 

First of all, we created a VMware Linux virtual disk inside our original Linux test system. This gives strong isolation, and the system can be restored in minutes if needed. The VMware network configuration used was “host-only”. This setting creates a virtual network, based on an internal VMware virtual hub, that connects the guest and host operating systems without needing a real network connection. The result is a controlled, isolated environment where any network test can be run without harming other systems.

 

Inside it, to check whether the same setup could be used on other systems without VMware, we created a chroot environment containing a shell plus some basic tools, like strace. The process is easy: just copy the binaries and the shared libraries they use, which the ldd command lists.
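Copying the binaries and their libraries by hand gets tedious. A minimal Python sketch that automates it, assuming the usual ldd output format (lines like "libc.so.6 => /lib/libc.so.6 (0x...)", plus a line for the dynamic loader itself); parse_ldd and copy_into_chroot are hypothetical helper names, not tools we used:

```python
import os
import shutil
import subprocess


def parse_ldd(output: str) -> list:
    """Extract absolute library paths from ldd(1) output."""
    paths = []
    for line in output.splitlines():
        line = line.strip()
        if "=>" in line:
            target = line.split("=>", 1)[1].split()
            if target and target[0].startswith("/"):   # skips "not found"
                paths.append(target[0])
        elif line.startswith("/"):
            paths.append(line.split()[0])              # the dynamic loader itself
    return paths


def copy_into_chroot(binary: str, root: str) -> list:
    """Copy an absolute-path `binary` and the libraries ldd reports under `root`."""
    out = subprocess.run(["ldd", binary], capture_output=True, text=True).stdout
    copied = []
    for path in [binary] + parse_ldd(out):
        dest = os.path.join(root, path.lstrip("/"))    # keep the original layout
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(path, dest)
        copied.append(dest)
    return copied
```

For example, copy_into_chroot("/bin/bash", "./chroot") would copy bash and its libraries under ./chroot, preserving their paths.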

 

A chroot’ed environment is not totally secure: there are several ways to escape from it if you are root. So we decided to start the program as a non-root user, using a simple program named change-user that switches the uid and gid (real and effective) to those of a test account.
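We do not reproduce change-user here, but what such a tool must do is simple: drop supplementary groups, change the gid, then the uid (in that order, since once the uid is changed the process may no longer change its groups), and finally exec a shell. A hypothetical Python sketch, assuming the uid/gid 500 test account shown in the session below:

```python
import os


def drop_privileges(uid: int, gid: int) -> None:
    """Permanently switch to `uid`/`gid` (hypothetical change-user logic)."""
    if os.geteuid() == 0:
        os.setgroups([])   # drop root's supplementary groups (needs privilege)
    os.setgid(gid)         # group first: after setuid() we may lose the right to
    os.setuid(uid)         # when called as root, sets real, effective and saved uid
    if os.getuid() != uid or os.geteuid() != uid or os.getegid() != gid:
        raise RuntimeError("failed to drop privileges")


def main() -> None:
    # Assumption: 500/500 is the test account used in our chroot session.
    drop_privileges(500, 500)
    os.execv("/bin/bash", ["bash"])   # replace this process with a shell
```

Calling main() from a root shell inside the chroot would leave us in a bash running as the test user.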

 

Here is the script session of what we did:

 

# chroot ./chroot

bash# change-user

uid=500(test) gid=500(test) groups=500(test)

bash$ strace ./the-binary

execve("./the-binary", ["./the-binary"], [/* 25 vars */]) = 0

personality(PER_LINUX)                  = 0

geteuid()                               = 500

_exit(-1)                               = ?

bash$

 

Oops! The program gets its effective uid at the very beginning and exits. Most likely it expects to run under a privileged account and refuses to run under a normal one.

 

 

We decided to run the binary as root. After all, the worst case would be reinstalling our VMware environment, and no chroot() system call (the easiest way to escape a chroot jail) had been found in the binary:

 

bash# strace ./the-binary

execve("./the-binary", ["./the-binary"], [/* 25 vars */]) = 0

personality(PER_LINUX)                  = 0

geteuid()                               = 0

sigaction(SIGCHLD, {SIG_IGN}, {SIG_DFL}, 0x40037c68) = 0

fork()                                  = 1767

_exit(0)                                = ?

 

Now the program has created a child process and exited. However, a quick check with ps shows that no process with PID 1767 is running.

 

Let’s try again, but now using the -f option to strace, so it follows child processes:

 

bash# strace -f ./the-binary

execve("./the-binary", ["./the-binary"], [/* 25 vars */]) = 0

personality(PER_LINUX)                  = 0

geteuid()                               = 0

sigaction(SIGCHLD, {SIG_IGN}, {SIG_DFL}, 0x40037c68) = 0

fork()                                  = 1777

[pid  1777] setsid( <unfinished ...>

[pid  1776] _exit(0)                    = ?

<... setsid resumed> )                  = 1777

sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0

fork()                                  = 1778

[pid  1777] _exit(0)                    = ?

chdir("/")                              = 0

close(0)                                = 0

close(1)                                = 0

close(2)                                = 0

time(NULL)                              = 1020713618

socket(PF_INET, SOCK_RAW, 0xb /* IPPROTO_??? */) = 0

sigaction(SIGHUP, {SIG_IGN}, {SIG_DFL}, 0x40037c68) = 0

sigaction(SIGTERM, {SIG_IGN}, {SIG_DFL}, 0x40037c68) = 0

sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0

sigaction(SIGCHLD, {SIG_IGN}, {SIG_IGN}, 0x80575a8) = 0

recv(0,       (ctrl-C was pressed here)

 <unfinished ...>

 

OK. Now we have a lot of information.

 

First of all, two fork() calls are executed in cascade, with a setsid() in between. That explains why there was no trace of the child process created before: it had already exited after creating a grandchild. This looked suspicious at first, since many Unix daemons fork only once. However, fork, setsid, fork is a well-known daemonization idiom: the second fork guarantees that the daemon is not a session leader, so it cannot reacquire a controlling terminal. Either way, it also makes our job as analysts more difficult.

 

After that, a bit of action takes place: the program changes to the root directory (it does not seem to worry about being in a chroot’ed environment), closes stdin, stdout and stderr, and opens a raw socket, which gets file descriptor 0, the lowest one available, so it takes the place of standard input. SIGHUP and SIGTERM are ignored, as SIGCHLD already was (the binary does not want to be killed easily). It then tries to receive something through that socket. After a while it was clear that nothing else was going to happen, so we stopped it with Ctrl-C.
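The sequence strace showed can be reproduced in a few lines. The Python sketch below (daemonize is a hypothetical name; the binary itself is compiled C) follows the same order of calls: ignore SIGCHLD, fork and let the parent exit, setsid(), fork again, chdir("/"), close descriptors 0, 1 and 2, and ignore SIGHUP and SIGTERM.

```python
import os
import signal


def daemonize() -> None:
    """Detach from the terminal, mirroring the call order strace showed."""
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    if os.fork() > 0:
        os._exit(0)                      # original process exits
    os.setsid()                          # new session, no controlling terminal
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    if os.fork() > 0:
        os._exit(0)                      # the session leader exits too
    # The grandchild is not a session leader, so opening a terminal later
    # cannot give it a controlling terminal again.
    os.chdir("/")
    for fd in (0, 1, 2):                 # close stdin, stdout and stderr
        try:
            os.close(fd)
        except OSError:
            pass                         # already closed
    signal.signal(signal.SIGHUP, signal.SIG_IGN)
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
```

After daemonize() returns, the next socket() call would receive descriptor 0, exactly as in the strace output.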

 

Interestingly, the socket is opened with protocol number 0xB (11 in decimal). So the program is listening for quite a strange kind of network traffic, which most likely nobody would send in normal conditions.

 

Protocol 0xB was unknown to strace and to us, so we looked it up and found this entry in /etc/protocols:

 

nvp     11      NVP-II          # Network Voice Protocol

 

Specifications for the Network Voice Protocol (NVP) are available in RFC741.

 

It could be a very specialized sniffer, but most likely it is just waiting for someone to tell it what to do, with protocol 0xB acting as a covert channel.
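On Linux, an IPv4 SOCK_RAW socket returns each received datagram with its IP header included (see raw(7)), so the first thing a listener like this one has to do is skip that header. A minimal Python sketch of such a listener (parse_ipv4 and listen_proto11 are hypothetical names, and opening the socket requires root):

```python
import socket


def parse_ipv4(packet: bytes):
    """Split a raw IPv4 datagram into (protocol, source, destination, payload)."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (packet[0] & 0x0F) * 4          # header length in bytes, options included
    protocol = packet[9]
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])
    return protocol, src, dst, packet[ihl:]


def listen_proto11():
    """Block until one protocol-11 datagram arrives (requires root)."""
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, 11) as sock:
        return parse_ipv4(sock.recv(65535))
```

The IHL field is used rather than a fixed 20 bytes, so headers carrying IP options are handled too.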

 

We decided to write a program capable of sending data over IP protocol 0xB to the binary, to check how it reacts. Writing such a program is quite trivial: we named it talk.c, and it is available as appendix 0.

 

At this point we also worried that the binary might check specific fields in the IP header, so we wrote another program able to speak protocol 0xB, this time based on the libnet library [3]. That program would give us an easy way to control the network headers if we needed to. We named it rev.c, and it is available as appendix 0.
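Building the IP header by hand is exactly what libnet saves us from. For illustration only (this is not rev.c), here is a Python sketch of the same job: a 20-byte IPv4 header for protocol 11 with the RFC 1071 checksum, sent through a raw socket with IP_HDRINCL. The helper names are ours, and sending requires root.

```python
import socket
import struct


def ip_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's complement of the one's-complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      proto: int = 11, ttl: int = 64, ident: int = 0) -> bytes:
    """Build a 20-byte IPv4 header (no options) for protocol `proto`."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,             # version 4, IHL = 5 words
        0,                        # type of service
        20 + payload_len,         # total length
        ident,                    # identification
        0,                        # flags / fragment offset
        ttl,
        proto,
        0,                        # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    return header[:10] + struct.pack("!H", ip_checksum(header)) + header[12:]


def send_raw(src: str, dst: str, payload: bytes) -> None:
    """Send our own header plus payload (IP_HDRINCL); requires root."""
    packet = build_ipv4_header(src, dst, len(payload)) + payload
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW) as s:
        s.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)
        s.sendto(packet, (dst, 0))
```

On Linux the kernel fills in the checksum itself for IP_HDRINCL sockets (see raw(7)); computing it anyway keeps the header valid if the packet is injected by other means.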

 

In parallel, we discovered fenris, found through simple web browsing while searching for a miraculous tool that would help us in our task. We started by playing around with version 0.01. Better not to talk about the hours lost trying to compile and set up the tool and to find the correct command-line options... only to discover, several days later, that the fenris author had published instructions on how to start using it against "the-binary", and that a version 0.02 with more options was available. By the time we found this information, we had already solved those questions ourselves...

 

Anyway, fenris is a great tool, but it is not easy to make it work in a chroot environment, so we decided to run it outside the chroot. (Again: this is just a VMware test environment. We are not that crazy.) It would be nice to have an “attach to running pid” option in fenris.

 

You should read the fenris documentation, but one of the most useful things it does is identify library functions. It does this by taking the first bytes (24 by default) of every function in a library and computing an MD5 checksum over them. After that, every time a function is called, the checksum of its first bytes is computed and compared with the stored references. If there is a match, voilà: we have (probably, since there can be false positives) identified a function.
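The matching idea is easy to sketch. The Python below hashes the first 24 bytes of each function and looks the result up in a table. It is a simplification, not fenris' actual code: fenris prints 8-hex-digit signatures (such as 168E4F1E below), so its on-disk format clearly differs from the full MD5 digests used here, and the helper names are ours. Storing a list of names per signature mirrors fenris reporting several aliases for one match, as in "fork libc_fork vfork".

```python
import hashlib

SIG_LEN = 24  # bytes hashed per function (the default mentioned above)


def signature(code: bytes, length: int = SIG_LEN) -> str:
    """MD5 of the first `length` bytes of a function's machine code."""
    return hashlib.md5(code[:length]).hexdigest()


def build_db(functions: dict) -> dict:
    """Map each signature to the list of library function names sharing it."""
    db = {}
    for name, code in functions.items():
        db.setdefault(signature(code), []).append(name)
    return db


def identify(db: dict, code: bytes) -> list:
    """Names whose signature matches `code`, or an empty list.

    A match is only probable: different functions can share their first bytes.
    """
    return db.get(signature(code), [])
```

Only the prefix matters, so a called function is recognised no matter how long its body is.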

 

As ‘the-binary’ was linked against libc version 5, the special signature database shipped as support/fn-libc5.dat should be used so that functions are properly identified.

 

So, we ran:

 

# ./fenris -s -f -p -L support/fn-libc5.dat /root/chroot/reverse/the-binary

 

+++ Executing '/root/chroot/reverse/the-binary' (pid 12261, static) +++

[00000000] 0:00 \ new map: 40000000:77824 (/lib/ld-linux.so.2)

[080480ba] 12261:00 SYS personality (0x0) = -1073742596 (Unknown error 1073742596)

[080480c7] 12261:00 local fnct_1 (0, 1, l/bffffcf4, l/bffffcfc)

[080480c7] 12261:00 + fnct_1 = 0x805756c

[080480c7] 12261:00 # Matches for signature 168E4F1E: setfpucw

[08057579] 12261:01  <8057579> cndt: on-match block +5 executed

[080575a6] 12261:00 ...return from function = <void>

[080480cf] 12261:00 local fnct_2 ()

[080480cf] 12261:00 + fnct_2 = 0x8056d44

[080480cf] 12261:00 # Matches for signature 9C89C698: libc_init

....

 

These lines show the personality() system call and identify several libc startup functions, such as setfpucw and libc_init. For each function that fenris identified at a given address, we updated our assembler listings (IDA, objdump and REC). Changing something like “call 0x08056d44” into “call libc_init” certainly makes life easier.
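We made these changes by hand, but they are easy to automate. A Python sketch (rename_calls is a hypothetical helper; the address table is filled with the functions fenris identified in the run shown in this section) that rewrites every "call 0xADDR" whose target is known:

```python
import re

# Addresses identified by fenris in the run shown in this section.
FENRIS_NAMES = {
    0x805756C: "setfpucw",
    0x8056D44: "libc_init",
    0x8055F08: "atexit",
    0x8055F34: "new_exitfn",
    0x805720C: "geteuid",
    0x8057764: "memset",
    0x80569BC: "signal",
    0x80571E8: "fork",
    0x805733C: "setsid",
    0x8057444: "time",
    0x8057554: "exit",
}

CALL = re.compile(r"\bcall\s+0x([0-9a-fA-F]+)")


def rename_calls(listing: str, names: dict = FENRIS_NAMES) -> str:
    """Replace 'call 0xADDR' with 'call NAME' wherever ADDR is known."""
    def replace(match):
        addr = int(match.group(1), 16)
        return f"call   {names[addr]}" if addr in names else match.group(0)
    return CALL.sub(replace, listing)
```

Unknown targets are left untouched, so the listing can be refined incrementally as fenris identifies more functions.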

 

[080480d9] 12261:00 + fnct_4 = 0x8055f08

[080480d9] 12261:00 # Matches for signature D8F7AA72: atexit

...

[08055f0f] 12261:01  + fnct_5 = 0x8055f34

[08055f0f] 12261:01  # Matches for signature B1845073: new_exitfn

...

[0804817b] 12261:02   + fnct_9 = 0x805720c

[0804817b] 12261:02   # Matches for signature 5527EA2B: geteuid libc_geteuid

 

These lines show more library functions being identified: atexit, new_exitfn and geteuid.

 

[080481a3] 12261:02   local fnct_10 (l/bffffd97 "/root/chroot/reverse/the-binary", 0, 31)

[080481a3] 12261:02   + fnct_10 = 0x8057764

[080481a3] 12261:02   \ new buffer candidate: bffffd97:32

[080481a3] 12261:02   # Matches for signature 4E05FA21: memset

 

This is something really interesting: the program is calling memset with these parameters: a string containing its name, a 0, and 31 (the length of its name). Most likely, the program is erasing its own name!

 

At this point, a quick check with the ps command showed that the binary had indeed been messing around with its name: all its instances appear with “[mingetty]” as their process name. That explains why this string was in the binary: it is an attempt to hide itself by posing as a system process.

 

Let’s continue with fenris output:

 

[080481d0] 12261:02   local fnct_11 (17, 1)

[080481d0] 12261:02   + fnct_11 = 0x80569bc

[080481d0] 12261:02   # Matches for signature 8AE66F9A: signal ssignal

 

It is capturing signal 17 (SIGCHLD) as we already knew from strace output.

 

[080481d5] 12261:02   + fnct_13 = 0x80571e8

[080481d5] 12261:02   # Matches for signature BCF79788: fork libc_fork vfork

[080571f0] 12261:03    fork () = 12262

+++ New process 12262 attached +++

 

Here the fork() function is identified. The created process is also traced, as we specified the -f option to fenris.

 

[08056026] 12261:04     local fnct_18 (0)

[08056026] 12261:04     + fnct_18 = 0x8057554

[08056026] 12261:04     # Matches for signature 84D91FB0: exit

 

The parent process exits, after flushing its buffers...

 

[080481e8] 12262:02   + fnct_14 = 0x805733c

[080481e8] 12262:02   # Matches for signature DD587118: libc_setsid setsid

[08057348] 12262:03    SYS setsid () = 12262

 

The child process continues, and as a first step it executes setsid().

 

And then, after several signal() calls it creates another child:

 

[080481f6] 12262:02   # Matches for signature BCF79788: fork libc_fork vfork

[080571f0] 12262:03    fork () = 12263

+++ New process 12263 attached +++

 

We later found out that we had been lucky the first time: two quick forks are likely to confuse fenris enough that the second child is not analyzed! Attaching to a new process sometimes takes a while...

 

This second child performs the actions we already knew about, which allowed us to identify the library functions chdir and close. Then something interesting happens:

 

[0804824b] 12263:02   local fnct_17 (0)

[0804824b] 12263:02   + fnct_17 = 0x8057444

[0804824b] 12263:02   # Matches for signature 58B72F00: libc_time time

[08057454] 12263:03    SYS time (0x0) = 1021219858 [Sun May 12 18:10:58 2002]

[08057456] 12263:03    <8057456> cndt: if-above block (signed) +16 executed

[0805746c] 12263:02   ...return from function = <void>

[08048254] 12263:02   local fnct_18 (1021219858)

[08048254] 12263:02   + fnct_18 = 0x80559a0

[08048254] 12263:02   # No matches for signature BAEE4234.

 

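It calls time() to get the current time (in seconds since the Epoch), and then calls an unknown function, fnct_18(), with the result. (This fnct_18, at 0x80559a0, is not the exit() that fenris labelled fnct_18 in process 12261.) fnct_18 then enters a kind of loop, repeatedly calling another unknown function, fnct_19: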

 

[08055b9c] 12263:03    local fnct_19 ()

[08055b9c] 12263:03    # No matches for signature 60DCBA5A.

[08055e42] 12263:04     <8055e42> cndt: on-match block +36 skipped

[08055e93] 12263:04     <8055e93> cndt: if-below block (signed) +19 executed

[08055eba] 12263:04     <8055eba> cndt: if-below block (signed) +10 executed

[08055ecb] 12263:03    ...return from function = <void>

[08055bae] 12263:03    <8055bae> cndt: if-above block (unsigned) -20 repeated

[08055b9c] 12263:03    local fnct_19 ()

[08055b9c] 12263:03    # No matches for signature 60DCBA5A.

[08055e42] 12263:04     <8055e42> cndt: on-match block +36 skipped

[08055e93] 12263:04     <8055e93> cndt: if-below block (signed) +19 executed

....lots of similar lines here...
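As a side note, time() returns seconds since the Epoch (UTC), while fenris rendered the value in the test machine's local time. A quick Python check (the UTC+2 offset is our inference from the two readings, consistent with Central European Summer Time; epoch_to_utc is a hypothetical name):

```python
from datetime import datetime, timedelta, timezone


def epoch_to_utc(seconds: int) -> datetime:
    """Convert a time() return value (seconds since the Epoch) to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stamp = 1021219858                               # value returned by time() in the fenris run
utc = epoch_to_utc(stamp)                        # 2002-05-12 16:10:58 UTC
# Assumption: the analysis machine was two hours ahead of UTC.
local = utc.astimezone(timezone(timedelta(hours=2)))
print(local.strftime("%a %b %d %H:%M:%S %Y"))    # Sun May 12 18:10:58 2002
```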

 

What could the time be needed for? At this point we had several alternatives:

 

-