|Info:||See <https://www.honeynet.org/gsoc/project1> for
|Author:||Zhijie Chen (Joyan) <[email protected]>|
|Description:||Mid-term report on PHoneyC GSoC project 1. This report
describes what I have done on PHoneyC’s libemu integration
for shellcode and heapspray detection during the first half of
GSoC. So far, the main ideas behind this feature have been
implemented quickly (to be honest, in poor coding style) and the
whole flow works well, although some code rewriting and performance
optimization will be needed in the future.
PHoneyC is a low-interaction honeyclient written by Jose Nazario. The
shellcode (SC for short) and heapspray (HS for short) detection module
for PHoneyC is a listed GSoC project this year, and I feel lucky to have
been chosen to implement it. This report describes the main idea of how
to detect SC/HS in PHoneyC and how to build and run this version of
PHoneyC. Note that this module (I call it honeyjs) is currently far from
complete and this report is only for the midterm evaluation. So
it is possible that the way to build and run it won’t work in the
As for the introduction to PHoneyC, I think I’d better quote what the
original developer said in his paper ‘PhoneyC: A Virtual client
My approach to detecting shellcode and heapspray can be simply
1. First, I modified python-spidermonkey v0.0.1a so that script
execution is interrupted on each assignment.
2. Then I check whether the r-value of the assignment is a string.
If so, I use libemu to check the string for shellcode. If
the string contains shellcode, an alert message is appended
to the alert list.
3. A series of shellcode alerts relating to one variable will
be summarized into a potential heapspray alert.
4. Finally, I analyze the shellcode for malicious download URLs and
other information using libemu.
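The steps above can be sketched roughly as follows. This is only an illustration and not the honeyjs code: `on_assignment` is a hypothetical hook name, `looks_like_shellcode` is a toy stand-in for the libemu check (libemu emulates x86 code and uses GetPC heuristics, which this does not), and the threshold of three alerts is an arbitrary assumption.

```python
from collections import defaultdict

NOP = 0x90  # x86 NOP opcode, a common sled byte


def looks_like_shellcode(data: bytes, min_sled: int = 16) -> bool:
    """Toy stand-in for the libemu check (NOT how libemu works).

    Flags a run of NOPs that is followed by at least one non-NOP byte.
    """
    pos = data.find(bytes([NOP]) * min_sled)
    if pos < 0:
        return False
    return any(b != NOP for b in data[pos + min_sled:])


class AssignmentMonitor:
    """Collects shellcode alerts per variable and summarizes heap sprays."""

    def __init__(self, spray_threshold: int = 3):
        self.alerts = []              # list of (variable name, message)
        self.hits = defaultdict(int)  # variable name -> number of alerts
        self.spray_threshold = spray_threshold  # assumed value

    def on_assignment(self, name, value):
        """Hypothetical hook, called once per JavaScript assignment."""
        if isinstance(value, str):
            # JavaScript strings are UTF-16, so "\u9090" is two 0x90 bytes.
            data = value.encode("utf-16-le", "surrogatepass")
        elif isinstance(value, bytes):
            data = value
        else:
            return  # step 2: only string r-values are checked
        if looks_like_shellcode(data):
            self.hits[name] += 1
            self.alerts.append((name, "shellcode in assigned string"))

    def heapspray_suspects(self):
        """Step 3: many shellcode alerts on one variable suggest a spray."""
        return sorted(n for n, c in self.hits.items()
                      if c >= self.spray_threshold)


monitor = AssignmentMonitor()
payload = "\u9090" * 16 + "\u4141"   # sled plus a dummy payload word
for _ in range(3):
    monitor.on_assignment("block", payload)
monitor.on_assignment("title", "hello")
print(monitor.heapspray_suspects())
```

Keeping the detector as a plain function makes it easy to swap the toy check for a real libemu call without touching the alert bookkeeping.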
There are also some optimizations, such as a mal-value hash table that
avoids checking the same value twice, and dataflow tracking (e.g.
the concatenation of a mal-string (a string that contains shellcode)
with any other string results in a mal-string).
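A minimal sketch of these two optimizations, under my own assumptions: the class name `MalValueCache`, its methods, and the use of SHA-256 as the hash key are illustrative and not taken from honeyjs, and `toy_detector` again stands in for the libemu check.

```python
import hashlib


class MalValueCache:
    """Caches verdicts by content hash and propagates them on concat."""

    def __init__(self, detector):
        self.detector = detector  # callable: bytes -> bool
        self.verdicts = {}        # digest -> bool
        self.scans = 0            # how often the detector really ran

    @staticmethod
    def _key(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def is_malicious(self, data: bytes) -> bool:
        """Scan a value only the first time it is seen."""
        key = self._key(data)
        if key not in self.verdicts:
            self.scans += 1
            self.verdicts[key] = bool(self.detector(data))
        return self.verdicts[key]

    def concat(self, left: bytes, right: bytes) -> bytes:
        """Dataflow rule: mal-string + any string is a mal-string."""
        result = left + right
        if (self.verdicts.get(self._key(left))
                or self.verdicts.get(self._key(right))):
            # Record the verdict without running the detector again.
            self.verdicts[self._key(result)] = True
        return result


def toy_detector(data: bytes) -> bool:
    """Stand-in for the expensive libemu check."""
    return b"\x90" * 8 in data


cache = MalValueCache(toy_detector)
bad = b"\x90" * 8 + b"\xcc"
cache.is_malicious(bad)
cache.is_malicious(bad)                # answered from the hash table
joined = cache.concat(b"prefix", bad)  # tainted by `bad`, no new scan
print(cache.is_malicious(joined), cache.scans)
```

In this run the detector executes once, although three checks are made, because the second lookup and the concatenated value are both answered from the table.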
The above is all I have done in the first half of this GSoC, and the
python module I implemented is named honeyjs.
To successfully compile the honeyjs module, the following
software and libraries are required:
“Pyrex – a Language for Writing Python Extension Modules.”
use version 0.9.8.5.
“SpiderMonkey is the code-name for the Mozilla’s C
<http://www.mozilla.org/js/spidermonkey/>. I use version 1.8.0
“libemu is a small library written in c offering basic x86
emulation and shellcode detection using GetPC heuristics.”
<http://libemu.carnivore.it/>. I use the version from the CVS.
Because I will rewrite the Pyrex code in C to use the
latest version of python-spidermonkey, it makes little sense to write
an automatic install script at this moment. For now, you have to make
sure the packages above are correctly installed, manually change the
paths to the libraries and header files in ./lib/setup.py, and run the
command make to build it.
To test this branch of PHoneyC, change the LINK variable in
honeyclient.py to your URL and run it. The shellcode/heapspray
alerts will be printed, the shellcode will be analyzed and the
URLs will be stored in a python list if it is a download-and-exec
NOTE: The deobfuscation module is being developed by another GSoC
student, so the current deobfuscation ability is limited. We will merge
our work at the end of GSoC.
For example, running honeyclient.py on the test sample 2448.html
prints output like this:
There are some known problems with the current implementation, which
- The ‘strange’ behavior of libemu’s shellcode analysis
function. Sometimes the shellcode can’t be profiled completely.
For example, the download-and-exec shellcode in
ssreader_0day.html sometimes only gets through the LoadLibrary
and GetProcAddress calls in the emulation and does not go on to
invoke the GetSystemDirectory and URLDownloadToFile APIs, as seen
in the shellcode profile.
- Checking a heapspray sled for shellcode takes too much time.
Some optimization is needed.
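One possible direction for the second problem, given only as a hedged sketch and not as something already implemented or decided: shorten long runs of a single repeated byte before the string is handed to the detector, so that the emulator does not have to walk a huge sled. The function name `trim_sled` and the limit of 64 bytes are my assumptions, and the trick only helps with single-byte sleds.

```python
import re


def trim_sled(data: bytes, keep: int = 64) -> bytes:
    """Shorten every run of one repeated byte to at most `keep` bytes."""
    # (.) captures one byte; \1{keep,} matches `keep` or more repeats of it,
    # so the whole match is a run of at least keep + 1 identical bytes.
    pattern = re.compile(rb"(.)\1{%d,}" % keep, re.DOTALL)
    return pattern.sub(lambda m: m.group(1) * keep, data)


spray = b"\x90" * 5000 + b"\xeb\xfe"   # long NOP sled plus a tiny payload
short = trim_sled(spray)
print(len(spray), len(short))
```

The payload bytes after the sled are left untouched, so a detector that looks at the code following the sled should still see the same instructions.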
Things I will do next:
- Read the source code of the latest python-spidermonkey module
- Rewrite the whole honeyjs module in C. The current version of
honeyjs is based on python-spidermonkey v0.0.1a, which is
written in Pyrex, but the latest version of python-spidermonkey
(v0.0.8) is written in C and has fewer bugs. Another
PHoneyC GSoC project is also based on the latest version of
python-spidermonkey, so it’s necessary for me to implement
honeyjs on top of python-spidermonkey 0.0.8, too.
- Write a more user-friendly install script for release.
- Document the implementation.
- Merge in Geng Wang’s DOM simulation code.
- Try some new features, for example, hooking more of the APIs
that are called by shellcode.