GSoC Project #2 - Develop and Improve PhoneyC

PhoneyC is a low-interaction client honeypot designed to let researchers quickly and easily identify and analyze malicious websites and their malware. This summer we hope to add DOM emulation and automated shellcode detection using LibEmu, among other features, to improve detection and performance.

Primary Mentor: Jose Nazario
Student: Geng Wang

Deliverables
Our final goal is to deobfuscate all kinds of obfuscated scripts reliably, extract the suspicious part, and leave it for further analysis. This requires our honeypot to behave as closely as possible to a real browser while running scripts.
For the two main parts of this, we wish to use python-spidermonkey (like an interpreter in Python, but more powerful) to bind DOM objects in Python to the JavaScript context and run scripts from Python. Since we use less JavaScript, the honeypot will be more stable and more like a browser. The interaction between Python and JavaScript is also better than with C or Java, so this will make our client honeypot easier to maintain (quicker reaction to new attacks or deobfuscation techniques).

Timeline
May 23rd: Make sure the new honeypot is ready for most regular webpages. (It is half ready now!)
July 6th: Work on all deobfuscation techniques, innerHTML and so on.
August 10th: Implement more detection scripts.

A review of what we have done so far

Our work mainly focuses on DOM simulation. I believe the following items are the most important for deobfuscation. We have also done a lot more so that our program can handle normal web pages, but we will not list that here.
Our code can be found at:
http://code.google.com/p/phoneyc/source/browse/phoneyc#phoneyc/branches/phoneyc_wanggeng
1. DOM tree generation.

A python object: It can be everything!

The code is like this:
class unknown_obj(object):
    def __call__(self, *arg): return unknown_obj()
    def __getitem__(self, key): return unknown_obj()
    def __getattr__(self, name): return unknown_obj()
 
The three methods are: __call__ for function calls (*arg collects the argument list); __getitem__ for access to members using '[]', such as a[3], where 3 is the key; and __getattr__, as mentioned, for any access to members using '.'. So almost any kind of code is legal on an object like this. For example:
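Here is a minimal sketch of our own. The `window` name and the chained expression are illustrative assumptions, not taken from the original post. It shows that a browser-style expression mixing '.', '[]' and '()' runs without error and always yields another unknown_obj:

```python
class unknown_obj(object):
    """Stand-in object that accepts calls, indexing, and attribute access."""
    def __call__(self, *arg): return unknown_obj()
    def __getitem__(self, key): return unknown_obj()
    def __getattr__(self, name): return unknown_obj()

# 'window' is a hypothetical name chosen for this illustration.
window = unknown_obj()

# Attribute access, indexing by number and by string, then a call.
result = window.document.forms[0].elements["user"].focus()

print(type(result).__name__)  # unknown_obj
```

Because every operation returns a fresh unknown_obj, a script that touches parts of the DOM we have not modelled can keep running instead of stopping with an error.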

A few differences between IE7 and FF3, what we discovered in coding

There are of course more of them, but we only list those that cause
confusion in our code. Note that the current version is based on IE,
not FF, since it's more vulnerable.

I don't know how to write HTML in this blog, so I hope I can make them clear without examples.

1. In both IE and FF, we can use the ID of a DOM object to refer to it. But we cannot always use 'document.id' to do so. In FF, document.f (where f is the id of a form) is undefined, but in IE, document.i (where i is the id of an image) and some other DOM objects are undefined.

Something about python: __setattr__ and __getattr__

It seems that there were some problems with this blog system, and I was busy with my final exams, so I haven't written a blog post for a long time since the project started.

But the work has been going on. I've spent some time studying the language facilities of JavaScript and comparing them with Python. Though both are scripting languages, Python is somewhat more powerful here. We'll see this from the differences between JavaScript's setter/getter functions and Python's __setattr__/__getattr__ methods.
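On the Python side, a rough sketch of what these two methods allow might look like the following. The class name `Element`, its `log` list and the attribute names are hypothetical, chosen only for illustration, and this is not code from PhoneyC. The point is that __setattr__ sees every assignment whatever the attribute name is, while __getattr__ is called only when normal lookup fails:

```python
class Element(object):
    """Hypothetical DOM-like node that records every attribute write."""

    def __init__(self, tag):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "tag", tag)
        object.__setattr__(self, "log", [])

    def __setattr__(self, name, value):
        # Called for every assignment, for any attribute name.
        self.log.append((name, value))
        object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        return None


div = Element("div")
div.innerHTML = "<b>hi</b>"   # goes through __setattr__

print(div.log)        # [('innerHTML', '<b>hi</b>')]
print(div.innerHTML)  # <b>hi</b>
print(div.missing)    # None
```

A hook like this is one way a simulated DOM object could notice a script writing to something like innerHTML without declaring each property in advance.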

First, let's see what's in JavaScript, using the example from the Mozilla website:
