GSoC 2010 Proposed Ideas

Please note that GSoC 2011 has now successfully completed. This content is being retained for reference only.

GSoC Project Ideas

Below is a list of project ideas that we were keen to develop during GSoC 2010 (you can find previous GSoC project ideas here). We are always also interested in hearing any ideas for additional relevant honeynet-related R&D projects (although remember that to qualify for receiving GSoC funding from Google your project’s deliverables need to fit in to GSoC’s 3-month project timescales!). If you have a suitable and interesting project, we’ll always try and find the right resources to mentor it and support you. We are also always looking for volunteers who are enthusiastic and interested in getting involved in honeynet R&D.

Each sponsored GSoC project will have one or more mentors available to provide a guaranteed contact point to students, plus one or more technical advisors to help applicants with the technical direction and delivery of the project (often the original author of the tool or its current maintainer, and usually someone recognised as an international expert in their particular field). Our Google Summer of Code organisational administrators will also be available to all sponsored GSoC students for general advice and logistical support. For all questions about the Honeynet Project, the GSoC program or our projects, please contact us on #gsoc-honeynet on irc.freenode.net or email us at [email protected]. To learn more about the Google Summer of Code event, see the GSoC Website.

Project 1 – Improve our low interaction client honeypot PHoneyC

Attacks against Internet users are increasingly delivered through web browsers via client side exploits. Browser scripts have become a major client-side exploit delivery mechanism, with drive-by download attacks using obfuscated Javascript becoming the dominant form. Client honeypots have been developed to access potentially malicious web content and attempt to determine whether the content returned is malicious or not. Low-interaction client honeypots have the advantages of higher performance, greater scalability and lower resource consumption than high-interaction client honeypots, and are used in many Internet initiatives such as the Google Safe-Browsing Project. However, to achieve higher detection rates, low-interaction client honeypots must develop effective deobfuscation mechanisms to deal with obfuscated Javascript.

Honeynet Project members have been working on a low interaction, emulated client honeypot called PHoneyC that attempts to detect malicious content in the wild in a number of ways. It is designed to be faster and more scalable than traditional high interaction client honeypots (you can find the latest draft of Jose’s LEET 09 paper here for background reading). Development over the past year (including under GSoC 2009) has added detection of malicious shellcode within javascript byte code using the LibEmu generic x86 emulator for shellcode detection, integration of Pyprofjsploit proof of concept into PHoneyC, refactoring code, better DOM emulation, merging different branches and adding new malicious PDF analysis capabilities. PHoneyC is now regularly used as a drive-by download attack detection system or for analysis of suspicious URLs.

However, we would like to add further additional features to make PHoneyC still more useful to analysts. These include:

1. Anomaly based detection, such as that described in the Wepawet white paper released in 2010
2. An automated signature generation system
3. A global proxy, sample & url sharing system
4. A database of malicious CLSIDs vs exploits / CVE IDs

Skills required:

C programming, Python programming, good understanding of Javascript and the DOM model

Mentors:

Jose Nazario (US), Angelo Dell’Aera (IT), Georg Wicherski (DE), Chengyu Song (CN), Jianwei Zhuge (CN) and Thanh Nguyen (VN)

Project 2 – PHP/RFI Sandbox

Rationale: Currently Honeynet Project and security community members routinely collect remote file inclusion (RFI) requests in many server and honeypot logs. RFI attacks usually send PHP or Perl code to the web server. However, we lack a dedicated RFI sandbox specifically to run and analyze potentially malicious PHP and Perl code.

RFI attacks are still largely uncharacterized but have increasingly been used to build substantial attack botnets. Analysts currently have to spend considerable amounts of time manually inspecting RFIs to classify them and, if they are bots, characterizing them. Automated sandbox analysis would substantially increase our analysis capabilities, enabling analysts to focus on the results rather than the manual analysis and classification.

We have a couple of previous projects that might be able to form a basis for this work, or the project could be built from scratch. Previous relevant projects are:

– A project called pKaji (written by M Hafiz of the Malaysian MyCERT chapter)
– Some basic Python scripts from last year’s Glastopf project (written by Lukas Rist of our Chicago Chapter), which currently parse sandbox HTML output, extract the C&C information and store it in a sqlite database. There is also a small bot which joins an IRC server and channel to collect some information that is used in analysis of PHP files collected by Glastopf

Requirements:
– Support PHP and Perl
– Accurate
– Configurable (run duration, ports to block, connection throttles, etc)
– Thorough
– Secure (support chroot, to minimize risk of sandbox compromise)
– Provide a simple, human readable distillation of results akin to norman sandbox, wepawet, Anubis sandbox, CWSandbox, joebox, etc.
– Easily deployable by others (e.g. something someone could download and install as a simple software package)

Inputs (support all as alternatives):
– URL
– Script
– Programmatic (e.g. pipelined into existing honeypot sensor deployments)
– Should support collecting sensor info, too

Outputs:
– XML report (schema to be determined, base it on CWS 2.x)
– Network connections
– Support SMTP, HTTP, IRC distillations
– Subprocesses and args
– Local system read and write information
– Deobfuscated code if needed
– AV classification

License:
TBD, but something that would allow anyone else to use it openly, maybe GPL or BSD, targeting a publicly accessible release.

Milestones:
1. PHP (or Perl) wrapper takes a PHP script in, reads its configuration, and provides a summary of what the RFI script does. Alternative: PHP interpreter extension (funcall)
2. Store results in an SQL DB (support MySQL, pgSQL)
3. Web UI usable by others
– Search by various parameters, reports, etc
– Display sandbox results
– Accept uploads and inputs, similar functionality as wepawet or anubis

4. Deployable package anyone can download, set up, and use within a short time.

A candidate solution would probably target PHP5 and some of the built-in extensions to avoid having to hack the interpreter (which is prone to being outdated quickly). There are extensions that support function overloading (e.g. define a hook before the real function to log the arguments and result codes). Perl has this support built into a base module.
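The hooking idea described above (run a hook before the real function and log the arguments and result) can be illustrated independently of PHP. The following is a minimal Python sketch of the same pattern, not the API of funcall or of any PHP extension; the names `hooked`, `CALL_LOG` and `fetch_remote` are hypothetical and exist only for this illustration.

```python
import functools

# Audit trail of intercepted calls; a real sandbox would persist this.
CALL_LOG = []

def hooked(func):
    """Wrap func so that its arguments and result are logged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {"function": func.__name__, "args": args, "kwargs": kwargs}
        CALL_LOG.append(entry)          # record the call before it runs
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            entry["error"] = repr(exc)  # record failures as well
            raise
        entry["result"] = result
        return result
    return wrapper

@hooked
def fetch_remote(url):
    # Hypothetical stand-in for a sensitive call such as a remote include.
    # It never touches the network; it only reports what would be fetched.
    return "blocked: " + url

fetch_remote("http://example.invalid/shell.txt")
```

After the call, `CALL_LOG` holds one entry with the function name, its arguments and the returned value, which is the kind of record a sandbox report could be built from.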

Background reading:
Jose has provided some additional background reading. Also see the online documentation for funcall and perlsub.

Skills required:
PHP, Perl or Python programming, experience with database and web based UI design, good understanding of web servers and web requests

Mentors:
Jose Nazario (US), M Hafiz (MY) and Hugo Gonzales (MX)

Project 3 – Improving the Dionaea low interaction honeypot

Honeynet Project members have developed a number of solutions for emulating vulnerable computer systems and automatically collecting attacks against them. Honeypots such as Nepenthes and HoneyTrap have proven to be successful at capturing known attacks, but have generally proved difficult to extend and add signatures to for newly discovered vulnerabilities. They have also struggled to reliably detect and capture previously unknown, zero day exploits. Shellcode emulation in LibEmu has helped, but integration with existing honeypots has been demanding.

Dionaea was another of our successful GSoC 2009 projects, developed by Markus Koetter and now being used as a next generation replacement for traditional low interaction network based malware detectors such as Nepenthes. Dionaea includes detection of unknown attacks via LibEmu and better updatability and scalability.

During GSoC2010 we’d like to extend Dionaea further by adding:

– A VoIP vulnerability module
– A SIP module, perhaps using Skinny IM client
– UDP popup spam catching
– Retrofitting submit-http support, to help people move between Nepenthes and Dionaea sensors without having to change their submission backend at the same time
– Matching attacks to Snort signatures, so that a Snort attack ID was logged with any detected exploits, OS fingerprints, attacker details where a fingerprint exists
– Integration with Nebula, etc for dynamic generation of Snort signatures from successfully detected attacks
– Polished XMPP submission server and downloadable backend user interface for viewing attackers submitted over XMPP and HTTP
– ntlmv2 authentication for smb + testsuite (can be metasploit) (easy)
– support for the 10 most used dce rpc calls which are not implemented yet, a testsuite to verify that they work correctly (most likely windows binaries using the windows api), and documentation of why each call (e.g. SAMR.Connect4 or SAMR.Connect5) is of any use to the attacker. (nasty)
– the dce rpc calls which are used by conficker to bruteforce accounts (easy)
– gss api negotiation for smb + testsuite (can be smbclient) (nightmare)
– modify incidents to allow carrying lists and dicts in c and python (average?)
– maybe support for MS10-012/CVE-2010-0020 – if required?

We believe that this project is important because existing low interaction honeypots are used by a wide range of researchers and organisations to study internet attacks, so increasing attack detection rates will potentially benefit many people with interests in this area.

Skills required:

C programming, Python programming, understanding Windows x86 shellcode and exploit/malware propagation
Previous experience with Dionaea, Nepenthes, HoneyTrap and LibEmu would be very useful

Mentors:

Markus Koetter (DE) and Mark Schloesser (DE)

Project 4 – VOIP (SIP) honeypots

The goal of this project is to design a low interaction SIP honeypot that passively listens for nefarious SIP traffic, and integrate this into the Dionaea framework (see project details above), so that the pre-existing reporting modules can be used to document details of attacks.

VoIP with SIP is becoming the de-facto standard for voice communication on the Internet. As this technology becomes more common, malicious parties have more opportunity and stronger motive to take control of these systems to conduct nefarious activities.

This project is intended to mature and extend the functionality of existing honeypots (mostly implemented in Python and simple shell scripts), so that SIP (UDP port 5060) can be integrated as another module within Dionaea.
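As a rough illustration of what passively listening for SIP over UDP involves, the sketch below parses the request line and headers of a datagram and includes a simple receive loop. It is a standalone example under stated assumptions, not a Dionaea module: the function names `parse_sip_request` and `listen` are hypothetical, only the six core request methods of RFC 3261 are recognised, and nothing is ever sent back to the caller.

```python
import socket

# Core request methods defined in RFC 3261.
SIP_METHODS = {"INVITE", "ACK", "BYE", "CANCEL", "OPTIONS", "REGISTER"}

def parse_sip_request(data):
    """Return (method, uri, headers) for a SIP request, or None if not SIP."""
    text = data.decode("utf-8", errors="replace")
    head = text.split("\r\n\r\n", 1)[0]       # ignore any message body
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if (len(parts) != 3 or parts[0] not in SIP_METHODS
            or not parts[2].startswith("SIP/")):
        return None
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return parts[0], parts[1], headers

def listen(host="0.0.0.0", port=5060, max_packets=None):
    """Passively receive UDP datagrams and print any parsed SIP requests."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    seen = 0
    try:
        while max_packets is None or seen < max_packets:
            data, addr = sock.recvfrom(65535)
            seen += 1
            print(addr, parse_sip_request(data))
    finally:
        sock.close()
```

Calling `listen()` would bind to UDP port 5060 and log what arrives; in an integrated module the `print` call would presumably be replaced by whatever reporting interface the framework provides.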

Dionaea was one of our GSoC 2009 projects, developed by Markus Koetter. It is considered the next generation replacement for the Nepenthes project. Note that there

This project will likely have 2 parts, which may be able to be split between a team of two, but preferably a single student will complete the project.

Design of low interaction SIP honeypot (3 weeks)
Testing of honeypot (2 weeks, in parallel with Dionaea integration)
Design of integration as a Dionaea module (3 weeks)

References:
Dionaea framework – http://dionaea.carnivore.it/

SIP related honeynet references:
http://www.usken.no/
https://honeynet.org.au/?q=phoneynet_part2
http://sipvicious.org/

Skills required:
Python/C/C++
Working knowledge of SIP protocol
Must work closely with the Dionaea developers to integrate module.

Mentors:
Sjur Usken (NO), Ben Reardon (AU), Markus Koetter (DE) and Mark Schloesser (DE)

Project 5 – Skype Honeypot

Spam/spim, phishing lures and drive-by-download URLs are increasingly propagated through instant messenger clients like Skype. We would like to build a Skype honeypot that is capable of simultaneously logging in to multiple Skype accounts that would be created with usernames and profiles likely to be found by malicious searchers. The system may be high interaction, such as launching multiple Skype logins and then automatically capturing/scraping screen output within a desktop environment, or may be a low interaction honeypot that emulates a Skype client via utilisation of the Skype API. The output from the system would be an audit trail of suspect URLs, timestamps, senders and message bodies suitable for use as an input into a client honeypot or sandbox system (and ideally, if time allows, the system would attempt to visualise attacker activity in a simple UI).
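The audit trail described above could take a shape like the following. This is a minimal sketch that assumes messages have already been captured by some other means (screen scraping or an API); it does not talk to Skype at all, and the function name `audit_records`, the record fields and the URL pattern are illustrative assumptions.

```python
import re
from datetime import datetime, timezone

# Deliberately simple pattern: http(s) URLs up to the next whitespace or quote.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def audit_records(sender, body, received_at=None):
    """Return one audit record per URL found in an incoming message."""
    if received_at is None:
        received_at = datetime.now(timezone.utc)
    records = []
    for url in URL_RE.findall(body):
        records.append({
            "timestamp": received_at.isoformat(),
            "sender": sender,
            "url": url.rstrip(".,;:!?)"),   # drop trailing punctuation
            "body": body,
        })
    return records
```

Each record carries the four items named in the text (URL, timestamp, sender and message body), so a list of them could be handed to a client honeypot or sandbox queue as it stands.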

Skills:

The Skype protocol is notorious for being a ‘black-box’, although some research has been done in this field. Unless the student is exceptionally capable, we’d suggest this project doesn’t become an attempt to reverse engineer the Skype protocol (bringing with it associated legal issues such as extracting keys from binaries). Some background information:

“On each login session, Skype generates a session key from 192 random bits. The session key is encrypted with the hard-coded login server’s 1536-bit RSA key to form an encrypted session key.”

http://en.wikipedia.org/wiki/Skype_protocol

Suitable skills for developing UI automation, data extraction and results presentation using the tools of your choosing.

Mentors:

David Watson (UK) or one of the Norwegian Honeynet Project (NO)

Project 6 – Mobile device honeypot

As mobile devices and smartphones such as the iPhone, Android or Blackberry become increasingly powerful and important, attackers are increasingly targeting them. Most of the honeynet technologies developed over the past decade have yet to be ported to mobile platforms. This project would attempt to create either a low or high interaction honeypot solution for mobile devices. Challenges include the closed nature of some mobile handsets (and associated need for jailbreaking), network connections to mobile carrier networks being less easily examinable than over traditional wired IP networks, and the generic lack of mobile forensics tools. The output of this project would be a prototype mobile honeypot device capable of logging and reporting network based attacks and possible compromises of mobile devices.

Skills:

Experience working with mobile device development relevant to the selected platform, Java, Objective C, python, etc.
Depending upon student experience and whether emulation or real hardware is to be used, it might be easiest to specify Android (since you can unlock Android phones easily and officially, unlike the iPhone, which can be jail-broken but this may change at a future date) and there is more open sourced development information available.

http://devsushi.com/2007/11/15/getting-started-with-the-blackberry-java-development-environment-jde/

Mentors:

Jamie Riden (UK), Thorsten Holz (DE), David Watson (UK) or one of the Norwegian Honeynet Project (NO)

Project 7 – Developing an Instant Messenger Honeypot

We are currently only aware of two published approaches to employing honeypot principles to analyze attacks using instant messaging as their transmission vector:

HoneyIM: Fast Detection and Suppression of Instant Messaging Malware in Enterprise-like Networks

Analyzing Network and Content Characteristics of Spim using Honeypots

Both previously published methods employ a high interaction honeypot approach, requiring a complete operating system with common IM clients installed as the honeypot.

We would like to build a low interaction IM honeypot client that would allow for easier installation, greater performance, ease of management and updates, plus standardised reporting.

Requirements:

– Support for common IM protocols, such as at least OSCAR (ICQ, AIM), MSN and maybe XMPP (GoogleTalk) and Yahoo Messenger.

– Able to detect malicious instant messages.

– Able to analyze malicious content using sandboxes (CWSandbox, Anubis, joebox) and honeyclients (PHoneyC, CaptureHPC).

– Storing results in a database

– User interface with visualization of malicious messages that have been received

Skills required:

C/Perl/Python programming, experience with database and web based UI design, good understanding of web servers and web requests

Mentors:

M Hafiz (MY) and Hugo Gonzales (MX)

Project 8 – Botnet Command & Control Spy/Monitor

There are a number of closed source or restricted release projects that are intended to allow security analysts to covertly join and monitor botnet command & control (C&C) channels, for research and monitoring purposes. Most are modified IRC clients with extended logging but relatively limited functionality.

We believe that it would be useful for the security community to have a powerful open source tool to monitor all common botnet C&C infrastructures and this project would provide the basis for such a tool to be developed.

Requirements:

– Able to handle at least HTTP and IRC C&C infrastructures
– Able to provide detailed information about the C&C server
– In the case of IRC: collecting information about other bots in the channel and the bot herders
– Risk minimisation and stealth/obfuscation of operator’s location
– Logging to a database
– Scalability to handle many simultaneous connections and updates
– User interface capable of managing information from large amounts of botnet activity
– If time allows, attempt to mimic a drone from the currently monitored botnet in terms of requesting data and responding properly to the bot herder’s commands, based on results from the PHP sandbox or observed behavior from other drones
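For the IRC case, two of the requirements above (collecting information from the channel and logging to a database) can be sketched as follows. This is a simplified illustration rather than a proposed design: it parses raw lines in the RFC 1459 message format and stores them in SQLite, and the names `parse_irc_line`, `open_log` and `log_line` as well as the table layout are hypothetical. Connecting to a server, stealth and scalability are out of scope here.

```python
import sqlite3

def parse_irc_line(line):
    """Split a raw IRC line into (prefix, command, params) per RFC 1459."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    trailing = None
    if " :" in line:
        line, _, trailing = line.partition(" :")
    parts = line.split()
    if not parts:
        raise ValueError("empty IRC message")
    command, params = parts[0].upper(), parts[1:]
    if trailing is not None:
        params.append(trailing)     # the trailing parameter may contain spaces
    return prefix, command, params

def open_log(path=":memory:"):
    """Open (or create) the event database."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS events "
               "(prefix TEXT, command TEXT, target TEXT, text TEXT)")
    return db

def log_line(db, raw):
    """Parse one raw line and store it as an event row."""
    prefix, command, params = parse_irc_line(raw)
    target = params[0] if params else None
    text = params[-1] if len(params) > 1 else None
    db.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
               (prefix, command, target, text))
    db.commit()
```

The `prefix` column keeps the full `nick!user@host` string of the sender, which is the raw material for the information about other bots and bot herders mentioned in the requirements.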

Skills required:

Python programming, experience with database and web based UI design, good understanding of web servers and web requests

Mentors:

M Hafiz (MY) and Angelo Dell’Aera (IT)

Project 9 – Glastopf Honeypot Improvements

Glastopf was one of last year’s successful GSoC projects, with Lukas Rist as the sponsored student. Currently Sven Vetsch, another student from Switzerland, is rewriting the Glastopf architecture so that it can better handle the complexity the project has now reached. The results will form his bachelor thesis, but there are a number of additional ideas that we would also like to see added to Glastopf, including:

– Glastopf sandbox interaction module
Create a module for the Glastopf honeypot which offers a generic interface to use sandboxes to analyze the collected malicious scripts and executables

– Glastopf SQL Injection simulator module
Create a module for the Glastopf honeypot which is capable of analyzing SQL Injection attacks and generates appropriate responses (low interaction SQL injection honeypot)

– Glastopf template based response module
Create a module for the Glastopf honeypot which can determine which common web applications (such as TYPO3, Joomla, WordPress, Drupal etc.) are being attacked and dynamically present itself as the victim system

– Glastopf Sensor
Create a lightweight sensor which can forward all requests to a central installation of a full-featured Glastopf. The main Glastopf server would also need an additional interface to manage such lightweight sensors.
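To make the template based response idea more concrete, the first step (deciding which web application a request appears to target) might look like the sketch below. It is not part of Glastopf; the function name `guess_target_app` and the `APP_HINTS` table are hypothetical, and the handful of path fragments is only illustrative. A real module would need a much larger, maintained signature set.

```python
from urllib.parse import urlsplit

# Illustrative path and query fragments only.
APP_HINTS = [
    ("WordPress", ("/wp-login.php", "/wp-content/", "/wp-admin/")),
    ("Joomla", ("/administrator/", "option=com_")),
    ("Drupal", ("/sites/default/", "q=node")),
    ("TYPO3", ("/typo3/", "/typo3conf/")),
]

def guess_target_app(request_uri):
    """Return the name of the web application a request appears to target."""
    parts = urlsplit(request_uri.lower())
    haystack = parts.path + "?" + parts.query
    for app, hints in APP_HINTS:
        if any(hint in haystack for hint in hints):
            return app
    return None     # unknown target: fall back to a generic response
```

The returned name could then select which response template the honeypot serves, so that it presents itself as the application being attacked.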

Skills required:

Python programming, experience with database and web based UI design, good understanding of web servers and web requests

Mentors:

Sven Vetsch (DE)

Project 10 – Improve our high interaction client honeypot Capture-HPC

Capture-HPC is one of our most actively developed public projects (including during GSoC 2009). Capture-HPC provides a method of driving a real high interaction windows system running within a virtual machine to potentially malicious websites, obtained from sources such as spam or DNS typosquatting. State changes to the VM are monitored and malicious activity is detected by measuring unexpected changes. It is regularly used in surveys of malicious websites and has been extended to support a number of Internet enabled applications and file formats. CaptureBAT is the original behavioural analysis tool that Capture-HPC is based on, using Windows API hooking to monitor state.
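The principle of flagging unexpected state changes can be shown with a simple comparison of two state snapshots against a list of expected changes. This is a deliberately simplified sketch and not how Capture-HPC is implemented (the text above describes monitoring via Windows API hooking rather than snapshot comparison); the function name `unexpected_changes` and the wildcard exclusion format are assumptions made for the example.

```python
from fnmatch import fnmatchcase

def unexpected_changes(before, after, exclusions):
    """Return state entries created or removed that no exclusion covers.

    before and after are sets of state identifiers (for example file paths
    or registry keys); exclusions are shell-style wildcard patterns that
    describe changes expected during normal operation.
    """
    changed = (after - before) | (before - after)
    return sorted(
        item for item in changed
        if not any(fnmatchcase(item, pattern) for pattern in exclusions)
    )
```

For example, a browser writing to its own cache directory would be covered by an exclusion pattern, while a new executable appearing in a system directory would be reported.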

Capture-HPC has been widely used and been described as the state-of-art high interaction client honeypot system in many academic papers, but has several drawbacks:

1. It does not contain a fine grained attack detection mechanism, i.e. Capture-HPC cannot tell us which vulnerability is exploited, or is likely being exploited
2. It is more like a malware analysis system for downloaded malware than a detection system for drive-by download attacks. If downloaded malware does not perform any malicious activities, there might be a false negative
3. It does not handle the ‘insufficient’ plugin problem
4. The recovery system still relies on the snapshot system from VMware, which is slow.

The goal of this project would be to continue the current planned development of Capture-HPC and CaptureBAT, addressing these issues and continuing to advance Capture-HPC, possibly splitting the workload into several individual projects depending on a student’s experience with client honeypots and drive by downloads. We also seek input for the future development roadmap of Capture-HPC v3.

We believe that continuing to improve Capture-HPC will encourage more automated analysis of malicious websites, helping to detect new generations of client focused attacks and further improve web browser security for Internet users.

Skills required:

C programming, Java programming, familiarity with Windows and Internet Explorer internals

Mentors:

Peter Komisarczuk (NZ), Chengyu Song (CN) and Jianwei Zhuge (CN)

Project 11 – Improve reliability and stealth of Capture-BAT to run in sandboxes

This project is related to project 10. It is based on Capture-HPC which is one of our most actively developed public projects. Capture-HPC provides a method of driving a real high interaction windows system running within a virtual machine to potentially malicious websites, obtained from sources such as spam, DNS typosquatting or scanning the Internet. State changes to the operating system and VM are monitored and malicious activity is detected by measuring unexpected changes. It is regularly used in surveys of malicious websites and has been extended to support a number of Internet enabled applications and file formats. Capture-BAT is the original behavioural analysis tool that Capture-HPC is based on, using Windows API hooking to monitor operating system state.

Capture-HPC has been widely used and been described as the state-of-art high interaction client honeypot system in many academic papers, but has several drawbacks:

1. An attacker can detect that Capture-BAT and Capture-HPC are installed on a system through registry entries, processes/services, API hooks etc. We need to effectively hide Capture-BAT and Capture-HPC. You will investigate how a potential attacker can detect these and then find ways to mitigate the detection.

2. The current version of Capture-HPC can analyse over one hundred thousand URLs before needing a reboot. Can you analyse where the problem is and make the system more reliable? This may include developing the system to support different virtualisation solutions such as VirtualBox, or to use bare metal systems.

3. Work with project 10 to integrate developments so that we have fine grained detection, better download detection and more reliability (i.e. fewer false positives) in the system

Like project 10 the goal of this project would be to continue the current planned development of Capture-HPC and CaptureBAT, addressing these issues and continuing to advance Capture-HPC, possibly splitting the workload into several individual projects depending on a student’s experience with client honeypots and drive by downloads. We also seek input for the future development roadmap of Capture-HPC v3.

We believe that continuing to improve Capture-HPC will encourage more automated analysis of malicious websites, helping to detect new generations of client focused attacks and further improve web browser security for Internet users.

Skills:

C programming, Java programming, familiarity with Windows and Internet Explorer internals, and familiarity with virtualisation technology. Some scripting may be advantageous but is not essential.

Mentors:

Peter Komisarczuk (NZ)

Project 12 – Improve high interaction honeypot capabilities

GSoC 2009 saw the first attempts to move high interaction client honeypot data collection from kernel/user space into the hypervisor layer (with Qebek). We have been slowly continuing this work and would like to further improve our virtual machine introspection capabilities (not necessarily using the Qebek qemu based approach, this project could be an alternative solution based on the pros and cons we have learned over the past year).

High interaction honeypot project goals for 2010 are:

1. Develop a module lister that can run completely external to the kernel and list the DLLs loaded by a process and the drivers loaded by the kernel
2. Develop a more powerful network monitor
3. Develop a solution that works not only with Windows but also with Linux guest OSes within the hypervisor
4. Expand work on QEBEK to hook non-keyboard input, extract downloaded files, etc
5. Develop a VMSafe API based introspection solution for use in high interaction honeypots

And also to continue improving our existing host based high interaction honeypot solution Sebek too, to:

1. Add support for new OSes like Vista and Win7, mostly by porting the network monitor to the new network filtering system and kernel socket that replaced TDI
2. Evaluate moving to a bootkit based solution

Skills required:

C programming, kernel driver programming, familiarity with Windows or Linux internals, virtualisation.

Mentors:

Chengyu Song (CN, Win32/VMI), Jianwei Zhuge (CN, Win32/VMI), Brian Hay (US, Win32/VMI), Rob McMillen (US, Linux/VMI), Eugene Teo (SG, Linux), Georg Wicherski (DE, VMI), Ron Dodge (US, VMI), Thanh Nguyen (VN)

Project 13 – Infected host detection through DNS analysis

Organizations are already using DNS information to detect infections. This project could develop a toolkit for detecting network based infections in an intelligent manner.

Using a list of known bad domains or fully qualified DNS names, it’s obvious that potentially infected hosts can be identified. Perhaps a little less obvious would be to use DNS requests as a behavioral detection mechanism to more accurately pick out infected hosts.

This could potentially be a Snort plugin or a completely separate set of tools. The requirements are simply that a set of known bad domains and DNS data (either log data or packets) are accepted as input.

The output of the project would be a solution that uses log/packet analysis of DNS traffic to identify infected networked hosts.
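The simplest form of this analysis, matching logged DNS queries against a list of known bad domains, can be sketched as follows. The log format (`timestamp client_ip qname`, separated by whitespace) is a hypothetical one chosen for this example, as are the function names `is_bad` and `infected_hosts`. A query counts as a match if the name equals a listed domain or is a subdomain of one.

```python
def is_bad(qname, bad_domains):
    """True if qname equals a listed domain or is a subdomain of one."""
    labels = qname.lower().rstrip(".").split(".")
    # Test every suffix: a.b.example -> a.b.example, b.example, example
    return any(".".join(labels[i:]) in bad_domains for i in range(len(labels)))

def infected_hosts(log_lines, bad_domains):
    """Map each client IP to the set of known bad names it queried.

    Assumes each log line is 'timestamp client_ip qname' separated by
    whitespace, a hypothetical format chosen for this sketch.
    """
    bad = {d.lower().rstrip(".") for d in bad_domains}
    hits = {}
    for line in log_lines:
        fields = line.split()
        if len(fields) != 3:
            continue                    # skip malformed lines
        _, client, qname = fields
        if is_bad(qname, bad):
            hits.setdefault(client, set()).add(qname.lower().rstrip("."))
    return hits
```

Suffix matching on whole labels avoids false hits such as `notbad.example` matching a listed `bad.example`. The behavioural detection mentioned earlier would need additional features (query rates, failed lookups and so on) that this sketch does not attempt.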

Some previous work was done by the Malaysian Honeynet Project chapter, who developed a similar project for tracking malicious DNS domain name queries. They developed this project to keep track of DNS queries from Conficker infections, with some example output available here and here.

Skills:

C or Python programming, good knowledge of network traffic protocols and IDS signatures

Mentors:

Jeff Nathan (US), Tillmann Werner (DE), Felix Leder (DE), Angelo Dell’Aera (IT), Mahmud Ab Rahman (MY) and Hugo Gonzales (MX)

Project 14 – TraceExploit: Replay the collected network trace to perform successful exploit

During the deployment and operation of distributed Honeynet projects such as the Honeynet Project’s GDH effort, we have collected a large amount of network traces that carry the server-side exploits, among which there may be valuable exploits targeting 0day vulnerabilities.

If we had a mature tool that could replay a collected trace against the targeted service to reconstruct an exploit scenario (adapting to a different hostname, IP address, port number, session cookie and version), we could still expose the exploit and perform vulnerability analysis without the original exploit code (which we generally cannot collect in our honeynet). This would demonstrate the value of honeynets and support additional forensics.

Although this idea has been studied in some academic research efforts, such as Protocol-Independent Adaptive Replay of Application Dialog [NDSS’06] and Replayer: Automatic Protocol Replay by Binary Analysis [CCS’06], we are not aware of any open source or free tools which provide such functionality. Furthermore, we can have more advanced expectations of packaging the exploit dialogs based on the collected exploit trace, much like the well-known Metasploit: for example, using different shellcode payloads, targeting different platforms and versions, and other features that you can propose.

Skills:

C or Python programming, good knowledge of network traffic protocols, shell codes and exploits

Mentors:

Jianwei Zhuge (CN), Tillmann Werner (DE), Felix Leder (DE) and Thanh Nguyen (VN)

Project 15 – A uniform sandbox/sandnet with data collection capabilities

Many Honeynet Project and security community members could benefit from a fully open sourced sandbox/sandnet solution to either locally analyse the malware they collect, send malware samples to a central analysis platform, or be a node in an analysis cluster architecture helping the community. Various public sandboxes exist (Threatexpert, Anubis, CWSandbox …) and some chapters have their own solution, whether bare metal or virtualised, but all of these may lack a standard analysis model and some tools to extract critical information, even though they may all complement each other.

The proposed analysis infrastructure would probably be based on VirtualBox for its ease of use and performance, plus scripts to control it and some added tools for the analysis.

The proposed architecture is intended to be globally deployed and provide all the chapters with at least a simple analysis, while plugins or modules could extend the functionality.

A few main components could fit in place together to help organise such infrastructure:
• scheduler: takes samples from a repository and decides what, when and for how long to analyse (starts workers)
• workers (sandboxes): Windows based, running on a simple to administer virtual machine (while also allowing bare metal systems); VirtualBox proposed
• data-collectors and analysers (scripts and tools): modular components that perform the analysis (extract critical information)
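The relationship between the scheduler and its workers can be sketched with a task queue and a pool of worker threads. This is a minimal single-machine illustration under stated assumptions: `analyse` is a hypothetical placeholder that does not start any virtual machine, and `run_cluster` stands in for a scheduler that in practice would pull samples from a repository and might drive workers on other hosts.

```python
import queue
import threading

def analyse(sample, max_seconds):
    # Hypothetical placeholder: a real worker would start a virtual machine,
    # run the sample for at most max_seconds and collect the results.
    return {"sample": sample, "max_seconds": max_seconds, "status": "done"}

def run_cluster(samples, workers=2, max_seconds=60):
    """Schedule samples onto worker threads and gather their reports."""
    tasks = queue.Queue()
    reports = []
    lock = threading.Lock()

    def worker():
        while True:
            sample = tasks.get()
            if sample is None:          # sentinel: no more work
                break
            report = analyse(sample, max_seconds)
            with lock:
                reports.append(report)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for sample in samples:              # scheduler: hand out the work
        tasks.put(sample)
    for _ in threads:                   # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return reports
```

Because workers only ever see items taken from the queue, the same structure would still apply if the queue were replaced by a networked job store, which is roughly what the clustering and scaling goals would require.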

This would be designed to work regardless of the number of components and location. It would allow clustering and scaling.
• a local C&C with a few systems locally (one server for example)
• a local C&C with a few systems locally or in the near region (different servers in the vicinity)
• a global C&C with a lot more worldwide located nodes (global honeynet distributed system)

The latter could use the power of all the nodes deployed when idle.

Skills:

TBC

Mentors:

Felix Leder (DE), Thanh Nguyen (VN) and Nicolas Collery (FR/SG)

Project 16 – Honeeebox Data Management Interface (developing a user interface for analysing collected low interaction honeypot data)

Honeynet Project members have developed a number of leading open source low interaction honeypot solutions that are used to automatically record data about network based malware attacks, such as Nepenthes and HoneyTrap. We have a number of active international sensor deployments to collect malware globally and are in the process of rolling out a larger low interaction sensor network during 2009. However, currently there is no publicly available web based reporting interface available for users of low interaction sensor systems.

The goal of this project would be to implement a web based user interface and management reporting tool to allow analysts to easily explore large amounts of malware data. Typical tasks will be to search for high level trends (growth of a particular malware strain over time, attacks from a certain location on a particular day, etc). End users will be the operators of malware collection sensors or interested analysts within the security community.

As input, the system will take reasonably simple CSV type data from low interaction malware sensors (such as timestamp, source IP, attack type, attacker IP address, MD5sum, etc.), submitted in the form of an HTTP POST. This data is then automatically enriched by submitting the malware binary samples to multiple sandbox and antivirus engines for analysis (both public and private). The output from this post processing analysis is usually returned as XML or text after a short period, by HTTP or email. We also perform IP geo-location and ASN resolution against IP addresses to provide more information about sources, including latitude and longitude for spatial mapping.
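
As a rough illustration of the input step, the sketch below parses the CSV body of one sensor submission into typed records. The column order (epoch seconds, source IP, attack type, MD5 sum) and the names `AttackEvent` and `parse_submission` are assumptions made for this example; the real sensor format may differ.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AttackEvent:
    timestamp: datetime
    source_ip: str
    attack_type: str
    md5sum: str


def parse_submission(body: str) -> list[AttackEvent]:
    """Parse the CSV body of one sensor HTTP POST into events.

    Assumed (hypothetical) columns: epoch seconds, source IP,
    attack type, MD5 sum. Blank or malformed rows are skipped.
    """
    events = []
    for row in csv.reader(io.StringIO(body)):
        if len(row) != 4:
            continue
        ts, ip, attack, md5 = (field.strip() for field in row)
        try:
            when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
        except ValueError:
            continue  # timestamp column was not an integer
        events.append(AttackEvent(when, ip, attack, md5.lower()))
    return events
```

Normalising the MD5 sum to lower case at this point means later lookups against sandbox and antivirus results do not depend on how each sensor formats its hashes.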

This data will be persisted in a database, processed and then presented via a new web interface to multiple distributed analyst users. This interesting project and malware data set provides many potential data analysis, information presentation and information security data visualisation options for interested GSoC students. We have a number of prototype reporting interface examples available internally, or you are free to develop a new system from scratch. Background reading and design inspiration might be found by looking at how leading network security and antivirus vendors or open source groups currently present similar information, or by applying skills you bring to the project from your personal experiences and specialisms. Successful students will also be lucky enough to have access to a number of the leading subject matter experts in this field as technical advisors.
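
One possible shape for the persistence and trend-query step, using only Python's standard `sqlite3` module. The table layout and the function names `open_store`, `record` and `daily_counts` are hypothetical; a production system would likely need more columns and a different database.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small event store with an assumed schema."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "ts INTEGER NOT NULL, source_ip TEXT NOT NULL, md5sum TEXT NOT NULL)"
    )
    return db


def record(db, ts, source_ip, md5sum):
    """Store one attack event; ts is seconds since the Unix epoch."""
    db.execute("INSERT INTO events VALUES (?, ?, ?)", (ts, source_ip, md5sum))


def daily_counts(db, md5sum):
    """Return [(YYYY-MM-DD, count)] for one sample, oldest day first."""
    rows = db.execute(
        "SELECT date(ts, 'unixepoch') AS day, COUNT(*) FROM events "
        "WHERE md5sum = ? GROUP BY day ORDER BY day",
        (md5sum,),
    )
    return rows.fetchall()
```

A query like `daily_counts` answers the "growth of a particular malware strain over time" question directly, and its output is already in the shape a charting library expects.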

We believe that this project is important to the community as it will help researchers to more easily understand the types of attacks routinely occurring on the Internet today.

Skills required:

Probably Python and JavaScript programming plus some database experience, although any suitable previous web development and user interface development experience would be good. We are happy to support whatever development toolkit you are most capable in, and follow a development approach of releasing small updates often, for maximum user feedback.

Mentor: David Watson (UK)

Project 17 – Interactive Visualisation of Honeynet Data

Honeynets generate a great deal of data and the steps involved in using this data to provide meaningful information are challenging. Some of the challenges we are facing in visualization include missing data, data integrity, data formatting, data manipulation, data normalization, visualizing time series, interactive visualizations, and the development of dashboard approaches to visualization. While there are numerous potential projects associated with the visualization of honeynet data, we are particularly interested in increasing our suite of data visualization tools by addressing some of these challenges.

Current high priorities include:

1) Interactive visualizations of data over time
2) Development of an interactive visualization platform for standard data sets like Nepenthes, Dionaea, Snort, and others
3) innovative dashboard approaches to visualization

The goal of this project is to develop an interactive visualisation platform for honeynet data. A Browser/Flash application would be the end product, perhaps using a visual development tool such as processing.org, a Java-like platform, or another suitable tool of your choosing. The goal would be to provide better rapid visualisation tools for network attacks. The input would be standard data sets such as Nepenthes, Dionaea, Snort, etc., which would be converted into effective, informative visualisations.
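
Before any data set can be plotted over time, raw event timestamps usually have to be bucketed into a regular series. The sketch below is one simple way to do that in Python. It assumes integer Unix timestamps as input, and the function name `hourly_series` is made up for this example. Empty hours are filled with zero so that gaps in the data stay visible on a time axis.

```python
from collections import Counter
from datetime import datetime, timezone

HOUR = 3600  # bucket width in seconds


def hourly_series(timestamps):
    """Turn integer Unix timestamps into [(iso_hour, count)] pairs.

    Every hour between the first and last event is present in the
    result, with a count of 0 where no events were seen.
    """
    if not timestamps:
        return []
    # Round each timestamp down to the start of its hour.
    counts = Counter(ts - ts % HOUR for ts in timestamps)
    first, last = min(counts), max(counts)
    return [
        (datetime.fromtimestamp(hour, tz=timezone.utc).isoformat(), counts[hour])
        for hour in range(first, last + HOUR, HOUR)
    ]
```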

Some recent examples hacked together by Honeynet Project members:

– Conficker Timeline (David)
– HonEeeBox (David)
– HonEeeBox demo1 (David)
– HonEeeBox demo2 (David)
– http://dataviz.com.au/gallery.html (Ben)
– SecViz.org

Some background reading/viewing.

Skills:

Experience with data processing and graphical visualisation packages. Enthusiasm and willingness to explore ideas.

Mentors:

Ben Reardon (AU), Kara Nance (US), Sebastien Tricaud (FR), David Watson (UK) and Hugo Gonzales (MX)

Project 18 – Log file anonymization

A library to perform generic log file anonymization. Right now we have a problem with data sharing: people are scared to do it. The idea of this log anonymization library is to help produce anonymized data that stays consistent between logs and network captures.

LogAnon is a log anonymization project that targets anyone who wants to share logs but is scared to do so because they may contain sensitive data. Please note this project is just a hobby and won’t be big and professional like FLAIM (http://flaim.ncsa.illinois.edu).

LogAnon's mission statement is:

* Anonymize various logs at a glance in a consistent manner. IPs appearing in a PCAP and in Syslog will be anonymized identically
* Provide a simple API
* Core written in C, Python bindings
* Cross platform (read: compiles with gcc *and* Visual Studio)
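
To make the consistency goal concrete, here is a minimal Python sketch of keyed pseudonymization: the same input address always maps to the same output address for a given secret key, whichever log it appears in. This is not LogAnon's actual design (the core is planned in C); the names `anonymize_ip` and `anonymize_line` and the HMAC-based scheme are assumptions chosen for illustration. Note that truncating the digest to 32 bits does not preserve network prefixes and can, rarely, map two addresses to the same pseudonym.

```python
import hashlib
import hmac
import ipaddress
import re

# Loose pattern for dotted quads; candidates are validated before replacement.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize_ip(ip: str, key: bytes) -> str:
    """Map an IPv4 address to a stable pseudonymous IPv4 address."""
    packed = ipaddress.IPv4Address(ip).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(digest[:4]))


def anonymize_line(line: str, key: bytes) -> str:
    """Replace every valid IPv4 address in one log line."""
    def replace(match):
        try:
            return anonymize_ip(match.group(0), key)
        except ipaddress.AddressValueError:
            return match.group(0)  # e.g. 999.1.1.1 is not an address
    return IPV4_RE.sub(replace, line)
```

Running `anonymize_line` with the same key over a Syslog line and over a textual packet dump gives matching pseudonyms for the same host, so correlations between the two sources survive anonymization.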

We are starting the library from scratch, so good project management skills are required.

Skills:

C programming, valgrind, Python with ctypes

Mentors:

Sebastien Tricaud (FR), Jamie Riden (UK), Brian Hay (US) and Hugo Gonzales (MX)

Project 19 – Honeywall UI redevelopment

As anyone who has used it knows only too well, the walleye user interface for the honeywall needs some serious UI TLC! This project would replace the current relatively static web based UI with a more dynamic interface that takes advantage of recent advances in web development.

Skills required:

Web development experience (Python and Javascript programming preferred), experience with database and web based UI design, good understanding of web applications, previous exposure to network and honeynet data types (or the Honeywall) useful.

Mentors:

Rob McMillen (US) and David Watson (UK)

Project 20 – Design and Implement the Portal of HoneyCloud Service

Description:

In the future, The Honeynet Project may provide a HoneyCloud Service for the security community that integrates all kinds of Internet threat monitoring, detection, analysis and tracking services, including Dionaea (LI Honeypot), Honeywall and Sebek (HI Honeypot), Capture-HPC (HI Client Honeypot), PHoneyC (LI Client Honeypot) and other tools and services under development.

The most flexible portal for normal users to access the HoneyCloud Service is IM (Instant Messaging), so we are planning to design and implement the HoneyCloud Portal Service.

The requirements include:

– An RBAC-based user/service management system, which provides flexible mechanisms for users in different roles to register and request the services in the HoneyCloud.

– A robot that provides the communication interface between the users and the HoneyCloud services; the more intelligently it handles natural language queries, the better.

– Design and implement a universal and flexible interface between the Portal and the services in the HoneyCloud, so that services can be integrated with minimum development effort. The protocol between the Portal and the services would use XMPP or any other protocol that you feel would be superior.

– Cooperate with other developers to integrate sample HoneyCloud services (e.g. PHoneyC, Dionaea).

– The Portal should support XMPP (GoogleTalk) for its user interface. Supporting other IM protocols such as OSCAR (ICQ, AIM), MSN, QQ and Yahoo Messenger is an optional extra, giving users the widest possible access with their preferred IM software. Libpurple may be a good aid for achieving this.

– A web-based UI that shows status and statistical information for the registered HoneyCloud services, plus records of the services provided.
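
The RBAC and robot requirements above could fit together roughly as follows. This is a protocol-agnostic Python sketch: no XMPP library is involved, and the role names, command names and the `Portal` class are all hypothetical. An IM transport would simply pass each incoming message's sender and text to `handle_message` and send the returned string back.

```python
# Hypothetical role -> permitted commands table (the RBAC policy).
ROLE_PERMISSIONS = {
    "guest": {"status"},
    "analyst": {"status", "submit_url"},
    "admin": {"status", "submit_url", "register_service"},
}


class Portal:
    """Routes IM text commands to registered services, enforcing roles."""

    def __init__(self):
        self.user_roles = {}  # IM account -> role name
        self.handlers = {}    # command -> callable(args: str) -> reply str

    def register_handler(self, command, handler):
        self.handlers[command] = handler

    def handle_message(self, sender, text):
        parts = text.strip().split(maxsplit=1)
        if not parts:
            return "Please send a command."
        command = parts[0].lower()
        args = parts[1] if len(parts) > 1 else ""
        # Unknown senders fall back to the least privileged role.
        role = self.user_roles.get(sender, "guest")
        if command not in self.handlers:
            return f"Unknown command: {command}"
        if command not in ROLE_PERMISSIONS.get(role, set()):
            return f"Permission denied for role '{role}'."
        return self.handlers[command](args)
```

Keeping the permission check inside the portal, rather than in each service, means a newly integrated service only has to register a handler to inherit the access rules.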

Skills:

C/C++ (or Java), Python programming, experience with database and web based UI design, good understanding of Cloud Computing, IM protocols and software

Mentors: Jianwei Zhuge (CN)