Project 3 - Cuckoo Sandbox

Student: Abdulellah Alsaheel
Primary mentor: Claudio Guarnieri
Backup mentors: Dario Fernandes, Tiffany Youzhi Bao

Google Melange: http://www.google-melange.com/gsoc/project/google/gsoc2012/cs_saheel/19001

Project Overview:
Cuckoo Sandbox is an automated malware analysis system developed by Claudio Guarnieri.
It is a lightweight solution that performs automated dynamic analysis of
provided Windows binaries and returns comprehensive reports on key API calls and network activity.

Project Plan:
• April 23rd - May 20th: Community Bonding Period
• May 21st - May 27th: Review libwireshark implementations and read its documentation.
• May 28th - May 31st: Familiarize myself with SWIG (Simplified Wrapper and
Interface Generator) in order to bind Wireshark dissectors.
• June 1st - June 7th: Evaluate the use of libwireshark bindings.
• June 8th - June 30th: Write bindings for libwireshark using SWIG if this proves viable;
otherwise, switch to Scapy and write protocol dissectors for these protocols:
TCP, UDP, ICMP, DNS, HTTP, FTP, IRC, SMB, SIP, TELNET, SMTP, SSH, IMAP and POP.
• July 1st - July 9th: Test the protocol dissectors.
July 9th - July 13th: Mid Term Assessments
• July 10th - July 25th: Develop a component to reconstruct the data streams and
to recover the downloaded files whenever possible.
• July 26th - July 29th: Test and refactor the previous component.
• July 30th - Aug 13th: Integrate all the work into Cuckoo, including report generation.
August 13th: Suggested "pencils down" date, coding close to done
• Aug 14th - Aug 20th: Prepare the documentation.
August 20th: Firm "pencils down" date, coding must be done
August 24th - August 27th: Final Assessments
August 31st - Public code uploaded and available to Google

Notes:
All code developed must respect the Cuckoo coding guidelines (essentially PEP-8),
be fully documented in docstring format and, when possible, come with unit tests.

Project Deliverables:
Improve Cuckoo's ability to analyze network traffic by delivering these components:
- Protocol dissectors.
- Data stream reconstruction, including recovery of downloaded files.

Project Source Code Repository:
https://github.com/cssaheel/dissectors
https://github.com/cuckoobox/cuckoo
Student Weekly Blog:
https://www.honeynet.org/blog/340

Project Useful Links:
http://www.cuckoobox.org/index.php
http://malwr.com/

Project Updates:
June 4th:
Done last week:
- Read "Interfacing C/C++ and Python with SWIG". (Status: Passed).
- Tested SWIG with small test cases. (Status: Passed).
- Tested SWIG with some of the header files (.h) of Debian's libwireshark. (Status: Passed).
- Started compiling Ethereal in order to understand how libwireshark is built.

Planned for next week:
- Finish compiling Ethereal.
- Use SWIG to generate the wrapper file (file_wrap.c) for each header file (file.h) in Debian's libwireshark.
- Modify Ethereal's build scripts so that every generated file (file_wrap.c) is built with Ethereal. The goal is to obtain shared objects (.so files) that can be imported directly from Python.

June 10th:
Done last week:
- Finished compiling Ethereal. (Status: Passed).
- Modifying Ethereal's build scripts to include every generated file (file_wrap.c) in the Ethereal build. (Status: Not Passed).
- Wrote a brief report on the issues I faced while writing Wireshark bindings.

Planned for next week:
- Using Scapy to read pcap files, I will implement the base classes that will be used for reading packets of any protocol.

June 18th:
Done last week:
- Implemented a library that takes a ".cap" file, extracts its packets, dissects the packets' fields and returns them as a list. The code is here: https://github.com/cssaheel/dissectors/blob/master/dissector.py
- Implemented a simple program that uses this library to demonstrate its usage. The code is here: https://github.com/cssaheel/dissectors/blob/master/usedissector.py
- Wrote a brief comparison between Scapy and other libraries.
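
As a rough illustration of the first step such a library performs, the sketch below pulls the raw packets out of a capture file. It does not reproduce dissector.py, which relies on Scapy; it uses only Python's struct module so that it stays self-contained, it handles only the classic little-endian pcap layout, and the name read_pcap_packets is hypothetical.

```python
import struct

# Classic pcap layout: a 24-byte global header, then 16-byte record headers.
PCAP_GLOBAL_HEADER = struct.Struct("<IHHiIII")  # magic, version, zone, sigfigs, snaplen, linktype
PCAP_RECORD_HEADER = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len
PCAP_MAGIC = 0xA1B2C3D4


def read_pcap_packets(data):
    """Return a list of (timestamp, raw_bytes) tuples from classic pcap data.

    Hypothetical helper: only little-endian, microsecond-resolution
    captures are handled here.
    """
    if len(data) < PCAP_GLOBAL_HEADER.size:
        raise ValueError("capture is too short to hold a pcap header")
    magic = PCAP_GLOBAL_HEADER.unpack_from(data, 0)[0]
    if magic != PCAP_MAGIC:
        raise ValueError("unsupported capture format")

    packets = []
    offset = PCAP_GLOBAL_HEADER.size
    while offset + PCAP_RECORD_HEADER.size <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = PCAP_RECORD_HEADER.unpack_from(data, offset)
        offset += PCAP_RECORD_HEADER.size
        payload = data[offset:offset + incl_len]
        if len(payload) < incl_len:
            break  # truncated final record, stop reading
        packets.append((ts_sec + ts_usec / 1e6, payload))
        offset += incl_len
    return packets
```

Reading a capture would then amount to opening the file in binary mode and passing its contents to read_pcap_packets; each returned payload is a raw link-layer frame that still has to be dissected.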

Planned for next week:
- Start adding the rest of the required protocols so that they are dissected as well:
HTTP, FTP, IRC, SIP, TELNET, SMTP, SSH, IMAP and POP.

June 24th:
Done last week:
- Implemented 5 protocol dissectors:
1- HTTP dissector:
https://github.com/cssaheel/dissectors/blob/master/http.py [RFC2616]
2- FTP dissector:
https://github.com/cssaheel/dissectors/blob/master/ftp.py [RFC959]
3- IRC dissector:
https://github.com/cssaheel/dissectors/blob/master/irc.py [RFC1459]
4- SIP dissector:
https://github.com/cssaheel/dissectors/blob/master/sip.py [RFC2543]
5- TELNET dissector:
https://github.com/cssaheel/dissectors/blob/master/telnet.py
[RFC856,RFC857,RFC858,RFC859,RFC860,RFC726,RFC652,RFC653,RFC654,RFC655,RFC656,RFC657,
RFC658,RFC698,RFC727,RFC735,RFC732,RFC1043,RFC734,RFC736,RFC749,RFC779,RFC1091,RFC885,
RFC927,RFC933,RFC946,RFC1041,RFC1053,RFC1073,RFC1079,RFC1372,RFC1184,RFC1096,RFC1408,
RFC1571,RFC2941,RFC2946,RFC1572,RFC1647,RFC2217]
I have also tested them and they are working fine. The dissectors were implemented from the RFC documents, and I tested them by comparing their results with Wireshark's output.

Planned for next week:
- Continue adding the rest of the required protocols, which are:
SMTP, SSH, IMAP and POP.
I will also improve the previously implemented protocols.

July 2nd:
Done last week:
Last week focused on debugging the existing dissectors. I also implemented 2 protocol dissectors:
1- SMTP dissector:
https://github.com/cssaheel/dissectors/blob/master/smtp.py [RFC2821]
2- SSH dissector:
https://github.com/cssaheel/dissectors/blob/master/ssh.py [RFC4250,RFC4251,RFC4252,RFC4253,RFC4254]
The SSH dissector is a large one, however, and it still needs some modifications and more testing and debugging.

Note: the dissectors were implemented from the RFC documents, and I tested them by comparing their results with Wireshark's output.

July 9:
Done last week:
Last week focused on debugging the existing dissectors, especially SSH, FTP and SMTP.
- Implemented 2 protocol dissectors:
1- IMAP dissector:
https://github.com/cssaheel/dissectors/blob/master/imap.py
2- POP dissector:
https://github.com/cssaheel/dissectors/blob/master/pop.py
- Applied PEP-8 code style to all of the code files.

Planned for next week:
Continue improving the implemented code.

July 16:
Done last week:
Last week focused on applying some modifications to comply with Cuckoo's coding conventions.
This involved:
1- How the data should be represented.
2- Returning only the useful data, so that no empty fields are sent out of the library.
3- Adding comments to all of the code files, in a style similar to Cuckoo's comments.
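
Point 2 can be pictured with the small sketch below, which drops empty fields from a dissected packet before it leaves the library. The dict layout and the name strip_empty_fields are assumptions made for illustration, not the library's actual data representation.

```python
def strip_empty_fields(fields):
    """Return a copy of a dissected-packet dict without empty values.

    Hypothetical helper: None, empty strings and empty containers are
    treated as "empty"; 0 and False are kept because they can be meaningful.
    """
    cleaned = {}
    for name, value in fields.items():
        if isinstance(value, dict):
            value = strip_empty_fields(value)  # clean nested structures first
        if value is None or (hasattr(value, "__len__") and len(value) == 0):
            continue
        cleaned[name] = value
    return cleaned
```

Keeping numeric zeros is a deliberate choice in this sketch, since a field such as a port or a flag can legitimately be 0.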

Planned for next week:
Continue with the next tasks in the plan.

July 22:
Done last week:
1- Traced the output of the library and modified it to comply with Cuckoo's code.
2- Tested and improved the code.

Planned for next week:
Start studying how to reconstruct TCP streams.

July 30:
Done last week:
Implemented TCP stream reconstruction for these protocols:
HTTP, FTP, IRC, SIP, IMAP and POP.
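
The core idea of stream reconstruction can be sketched as follows: take the TCP segments of one direction of a connection, order them by sequence number, and concatenate their payloads. This is a simplified illustration, not the project's implementation; the (seq, payload) tuple input and the name reassemble_stream are assumptions, and sequence-number wraparound is ignored.

```python
def reassemble_stream(segments):
    """Rebuild one direction of a TCP stream from (seq, payload) tuples.

    Segments may arrive out of order or be retransmitted. Bytes already
    covered by earlier data are skipped; a gap ends the reconstruction.
    """
    stream = bytearray()
    next_seq = None
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if not payload:
            continue
        if next_seq is None:
            next_seq = seq  # first data byte seen defines the stream start
        if seq > next_seq:
            break  # missing segment: stop rather than guess the content
        overlap = next_seq - seq
        if overlap < len(payload):
            stream += payload[overlap:]  # append only the bytes not seen yet
            next_seq = seq + len(payload)
    return bytes(stream)
```

Stopping at the first gap keeps the recovered data contiguous, which matters when the stream is later parsed by an application-layer dissector.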

Planned for next week:
Continue implementing TCP stream reconstruction for the rest of the protocols.

August 6:
Done last week:
- Implemented TCP stream reconstruction for these protocols:
SSH, TELNET, SMTP.

Planned for next week:
Start implementing recovery of downloaded files for HTTP, FTP and SMTP.

August 13:
Done last week:
- Implemented recovery of downloaded files for the HTTP, FTP and SMTP protocols.
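
For HTTP, recovering a downloaded file from a reconstructed stream essentially means separating the response headers from the body. The sketch below shows that step for a single response that carries a Content-Length header. It is an illustration under those assumptions, not the project's code; extract_http_body is a hypothetical name, and chunked or compressed transfers are not handled.

```python
def extract_http_body(stream):
    """Return the body of the first HTTP response in a reassembled stream.

    Returns None when the stream has no complete header block, no valid
    Content-Length header, or fewer body bytes than the header announces.
    """
    head, sep, rest = stream.partition(b"\r\n\r\n")
    if not sep:
        return None
    length = None
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            try:
                length = int(value.strip())
            except ValueError:
                return None
    if length is None or length < 0 or len(rest) < length:
        return None
    return rest[:length]
```

The returned bytes could then be written to disk in binary mode to obtain the recovered file.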

Planned for next week:
- Start preparing the documentation.

August 17:
Done last week:
- The documentation has been prepared: https://github.com/cssaheel/dissectors/blob/master/documentation.pdf