Research Projects

Honeynet Project Research Topics - Last Updated 12 March, 2007.

We are very excited about the opportunity to apply for Google Summer of Code 2007. Below are the projects we would like to mentor. Each project is mentored by a highly experienced member. As an organization dedicated to open source, we will license all tools below under GPLv2, with the author retaining copyright, unless otherwise noted. In addition to Google SoC, these projects can be used by universities, graduate students, and research organizations as topics in security research. To learn more, contact research@honeynet.org.

The Topics:

Libemu Project
The `libemu' project aims to create a generic x86 emulation API as an independent library for use in free network security products. With this library, a program can execute shellcode, or data suspected to be shellcode, on a virtual, emulated processor and observe its actions. This allows a program using the library to detect shellcode in network streams generically and to analyze the shellcode's behaviour automatically. Two possible candidates for inclusion are the snort stream processor, where the library would provide a generic, vulnerability-independent alert for unencoded shellcode seen on the network, and the nepenthes low-interaction honeypot. Nepenthes currently matches shellcode against static signatures to learn how the malware can be downloaded from a malicious host; with this library, nepenthes could generically understand previously unknown shellcode. Internally, the library emulates major parts of the x86 instruction set and modifies the emulated registers. Mapping abstract information about system libraries onto the emulated virtual memory allows the library to detect and emulate API usage, and hence to understand the `actions' the shellcode performs. Data that executes and uses the API without substantial failures is a sure indicator of shellcode, which provides generic shellcode detection. Requirements include:

  • Your C is as perfect as your assembly.
  • You really love the x86 architecture and have always wanted to read all the docs about it.
  • You know Windows as well as Linux, as you will emulate a Windows process environment on a Linux development system.
  • You can team up with other developers.
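
The detection idea described above can be illustrated with a toy sketch. This is not libemu's API or implementation: it handles only four x86 instructions, keeps a single register, and the API address in `API_MAP` is a hypothetical placeholder. It only shows how mapping library functions onto emulated addresses lets an emulator recognise API usage and treat a clean run as a shellcode indicator.

```python
import struct

# Hypothetical address at which a "system library" function is mapped in the
# emulated address space; real values depend on the emulated environment.
API_MAP = {0x7C801D7B: "LoadLibraryA"}


class EmulationError(Exception):
    """Raised when the bytes cannot be executed by this toy emulator."""


def _imm32(code, offset):
    """Read a little-endian 32-bit immediate, failing on truncated input."""
    if offset + 4 > len(code):
        raise EmulationError("truncated immediate")
    return struct.unpack_from("<I", code, offset)[0]


def emulate(code):
    """Execute a tiny x86 subset and return the API calls that were observed.

    Supported: nop (90), push imm32 (68), mov eax, imm32 (B8), call eax (FF D0).
    Each call is recorded as (api_name, value_on_top_of_stack).
    """
    eax = 0
    stack = []   # emulated stack; the last element is the top
    calls = []
    eip = 0
    while eip < len(code):
        op = code[eip]
        if op == 0x90:                                   # nop
            eip += 1
        elif op == 0x68:                                 # push imm32
            stack.append(_imm32(code, eip + 1))
            eip += 5
        elif op == 0xB8:                                 # mov eax, imm32
            eax = _imm32(code, eip + 1)
            eip += 5
        elif op == 0xFF and code[eip + 1:eip + 2] == b"\xd0":   # call eax
            if eax not in API_MAP:
                raise EmulationError("call to unmapped address")
            calls.append((API_MAP[eax], stack[-1] if stack else None))
            eip += 2
        else:
            raise EmulationError("unsupported opcode 0x%02x" % op)
    return calls


def looks_like_shellcode(data):
    """True if the data runs without failing and calls at least one API."""
    try:
        return len(emulate(data)) > 0
    except EmulationError:
        return False


# push 0x1234; mov eax, <LoadLibraryA>; call eax
SAMPLE = (b"\x68" + struct.pack("<I", 0x1234)
          + b"\xb8" + struct.pack("<I", 0x7C801D7B)
          + b"\xff\xd0")
```

Calling `looks_like_shellcode(SAMPLE)` reports the sample as shellcode, while ordinary protocol text such as an HTTP request fails on its first byte. A real emulator would need far more of the instruction set, proper memory and stack modelling, and argument decoding.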

Mentored by Georg Wicherski of the German Honeynet Project.

Random Server Response Selection
The problem is how to generate unbiased server input (more precisely, server requests) for client honeypots. Currently, researchers guess where to find malicious servers (SPAM, specific topic areas, typo DNS, etc.) and task a client honeypot with inspecting these areas. This approach is likely to identify malicious servers, but any conclusion drawn about them is limited by the input that was used. For example, one can state that the number of malicious servers hosting adult content indexed by the Google search engine is declining. However, one cannot make a more general statement about malicious servers on the Internet. To do so, it is necessary to select the server input in an unbiased manner.

Besides allowing more general conclusions about malicious servers on the Internet, random input also sheds light on areas that no one has examined yet. There might be areas of the Internet that have not been considered to serve malicious content, and a random selection is likely to investigate them. This should allow us to create heuristics for searching for malicious servers more efficiently. The student would be required to implement a tool (preferably in Java) that selects web pages at random. This is not as trivial as building a crawler and selecting web pages at random intervals, since that would introduce a bias towards pages that many other pages link to. In addition, a simple crawler would miss web pages that are isolated islands on the Internet, such as the servers that SPAM messages point to. The student is expected to:

  • review a handful of papers in the literature that deal with selecting web pages at random
  • develop a short proposal on how she believes a web page can be selected at random
  • develop the tool and unit tests
  • create documentation for the tool
  • integrate the tool into a HoneyC Queuer component

Requirements include:

  • development experience with Java
  • experience developing unit tests
  • experience in web development
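
The link bias mentioned above, and one known way to correct it, can be sketched on a toy graph. The code below is an illustration only, written in Python rather than the preferred Java for brevity. It assumes an undirected link graph (the real web is directed) and uses a Metropolis-Hastings random walk, a general sampling technique chosen here for illustration rather than a method prescribed by this project. The graph `STAR` and all function names are made up for the example.

```python
import random
from collections import Counter


def simple_walk(graph, start, steps, rng):
    """Naive crawl: always follow a random link. Visits favour well-linked pages."""
    counts = Counter()
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])
        counts[node] += 1
    return counts


def metropolis_hastings_walk(graph, start, steps, rng):
    """Random walk whose long-run visit frequencies are uniform over pages.

    A move from u to a random neighbour v is accepted with probability
    min(1, deg(u) / deg(v)); otherwise the walk stays at u for this step.
    """
    counts = Counter()
    node = start
    for _ in range(steps):
        candidate = rng.choice(graph[node])
        accept = min(1.0, len(graph[node]) / len(graph[candidate]))
        if rng.random() < accept:
            node = candidate
        counts[node] += 1
    return counts


# Toy undirected link graph: one heavily linked "hub" page and four leaf pages.
STAR = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub"],
}

if __name__ == "__main__":
    rng = random.Random(0)
    steps = 100000
    for walk in (simple_walk, metropolis_hastings_walk):
        counts = walk(STAR, "hub", steps, rng)
        shares = {page: round(counts[page] / steps, 3) for page in sorted(STAR)}
        print(walk.__name__, shares)
```

On this graph the naive walk spends about half of its steps on the hub, while the corrected walk spends roughly a fifth of its steps on each of the five pages. Neither walk reaches pages that have no links to the rest of the graph, so the isolated islands mentioned above would still need a separate source of seed URLs.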

Mentored by Christian Seifert of the New Zealand Honeynet Project.

Monitoring of Botnets With Advanced Communication Channels
Botnets are one of the main problems on today's Internet. A botnet can be defined as a network of compromised machines that can be remotely controlled by an attacker. On every compromised machine, a so-called bot is installed, which establishes a connection to a remote control network through which the attacker can issue arbitrary commands. Typical examples of these remote control networks are IRC networks or HTTP servers, but obfuscated / encrypted and even Peer-to-Peer based communication channels have been observed in the last few years. Botnets can be used by an attacker for many malicious activities: carrying out Distributed Denial-of-Service (DDoS) attacks, sending out millions of spam e-mails, stealing sensitive information from the compromised machines, or seeding new malware are just a few examples.

Honeypots are very good at capturing information about botnets. For example, the honeypot software nepenthes is able to automatically collect a binary of autonomously spreading malware by emulating the vulnerable parts of a network service. We are thus able to automatically collect bots and similar malware. With the help of an automated analysis process, for example with tools like Norman Sandbox or CWSandbox, it is often possible to extract all information relevant to a given botnet. For IRC-based botnets, for example, we can extract the botnet's Command & Control server, channel, nickname and similar information. Based on this information, we can then start to track the botnet.
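
As a rough illustration of the kind of information that can be extracted for an IRC-based botnet, the sketch below pulls the nickname, server password and channels out of the lines a bot sends to its server. It relies only on the standard IRC client commands NICK, PASS and JOIN. The function name and the sample lines are hypothetical, and it does not reflect the output format of Norman Sandbox or CWSandbox.

```python
def extract_irc_settings(client_lines):
    """Collect nickname, server password and channels from client-to-server lines."""
    info = {"nick": None, "password": None, "channels": []}
    for raw in client_lines:
        parts = raw.strip().split()
        # Drop an optional ":prefix" token at the start of the message.
        if parts and parts[0].startswith(":"):
            parts = parts[1:]
        if len(parts) < 2:
            continue
        command, args = parts[0].upper(), parts[1:]
        if command == "NICK":
            info["nick"] = args[0].lstrip(":")
        elif command == "PASS":
            info["password"] = args[0].lstrip(":")
        elif command == "JOIN":
            # JOIN takes comma-separated channels and, optionally, matching keys.
            keys = args[1].split(",") if len(args) > 1 else []
            for i, channel in enumerate(args[0].split(",")):
                key = keys[i] if i < len(keys) else None
                info["channels"].append((channel, key))
    return info


# Hypothetical lines as they might appear in a captured bot session.
SAMPLE_SESSION = [
    "PASS s3cret",
    "NICK bot-4711",
    "USER x x x :x",
    "JOIN #cmd,#spam key1",
]
```

For the sample session, the function returns the nickname `bot-4711`, the password `s3cret`, and the channels `#cmd` (with key `key1`) and `#spam` (without a key). The server address itself would come from the connection's destination IP and port rather than from the IRC payload.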

However, if the botnet uses non-standard protocols or advanced communication channels, such as obfuscated / encrypted or Peer-to-Peer based communication, tracking becomes considerably harder. One way to track this kind of botnet is to execute the bot within a controlled environment and observe what it does. A honeypot can serve as a basic building block for such an environment: thanks to the Data Control and Data Capture mechanisms offered by a honeynet, we can execute the bot on a honeypot and monitor its behaviour. Based on the collected data, it is then possible to infer what the botnet is used for.

Within this project, the necessary infrastructure to track advanced botnets should be developed: a given bot binary is uploaded to a honeypot and executed. For a predefined period of time, all communication between the bot binary and other hosts on the Internet is observed, and an analysis report is generated. Moreover, several Data Analysis tools should be developed to make the whole analysis process easier. To enable a scalable approach that can also track several botnets on one physical machine, different virtualization and emulation tools should be evaluated for their suitability.
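
The reporting step could look roughly like the sketch below, which summarises the flows observed during a fixed observation window. The flow record layout `(timestamp, dst_ip, dst_port, protocol, n_bytes)` is an assumption made for this example, not the format of any existing honeynet tool, and the addresses are documentation-range placeholders.

```python
from collections import defaultdict


def build_report(flows, start, duration):
    """Summarise the bot's outbound communication within the observation window.

    flows: iterable of (timestamp, dst_ip, dst_port, protocol, n_bytes) tuples.
    Returns (destination, stats) pairs sorted by transferred bytes, largest first.
    """
    end = start + duration
    summary = defaultdict(lambda: {"connections": 0, "bytes": 0})
    for ts, dst_ip, dst_port, protocol, n_bytes in flows:
        if not (start <= ts < end):
            continue  # outside the predefined observation period
        entry = summary[(dst_ip, dst_port, protocol)]
        entry["connections"] += 1
        entry["bytes"] += n_bytes
    return sorted(summary.items(), key=lambda item: item[1]["bytes"], reverse=True)


# Hypothetical flows recorded while a bot binary ran on a honeypot.
SAMPLE_FLOWS = [
    (10, "192.0.2.10", 6667, "tcp", 400),
    (20, "192.0.2.10", 6667, "tcp", 600),
    (30, "198.51.100.7", 25, "tcp", 5000),
    (4000, "198.51.100.7", 25, "tcp", 9999),  # after the window, ignored
]
```

With `build_report(SAMPLE_FLOWS, start=0, duration=3600)`, the report lists the SMTP destination first (one connection, 5000 bytes) and the IRC-port destination second (two connections, 1000 bytes in total). A real report would add payload-level detail, which matters most for the encrypted and Peer-to-Peer channels discussed above.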

Mentored by Thorsten Holz of the German Honeynet Project.

Data Analysis
Of all our listed ideas, this one is the broadest. We face a variety of data analysis challenges because we capture and analyze many different kinds of data. We are looking for a range of participants to help with:

  • Extending the Sebek data capture tool. The Honeynet Project wishes to extend the functionality of the Sebek client (running on multiple operating systems) and its associated command line tools to allow automatic mapping of the attacker's source IP address to captured keystrokes. Currently, correlating the source IP address of a hostile network connection with the associated keystrokes and IO activity is a highly manual task, and adding such a capability will enable much more powerful and automated data analysis tools to be produced. The mentored student will review the current codebase and a small body of published materials before attempting to extend the Sebek client on one or more OSes so that the Sebek command line tools return the attacker's source IP in their output, or at least a best approximation of the source of the network activity spawning shell activity. Alternative approaches will also be considered. Strong C development skills and good knowledge of one or more OS kernels and network sockets are essential.
  • Developing an IRC data explorer. Attackers often install IRC servers or bots on compromised honeypots, resulting in sizeable plain text data sets that often contain multiple IRC channels, talkers and languages. Manual review of IRC data is very time consuming, and efforts to visualise IRC activity are currently rather basic. The student will review previous approaches and then attempt to develop an application, ideally a web based GUI, that enables analysts to process and manipulate IRC data extracted from pcap files (extraction mechanisms are already available). A session based approach will be taken, with the application enabling effective parsing, filtering, searching, tagging and classification of interesting content. Optional extensions include automated assessment of language per talker and channel, multi-user collaboration and data/session persistence, and exploring effective methods of visualising IRC activity over time. Development would ideally be in Python, and experience with databases and web development frameworks such as TurboGears would be useful, although no particular approach is set and alternatives can also be considered.
  • Network security data analysis tool development. The student will join the Honeynet Project's existing Data Analysis team and help develop one or more components or areas of functionality within a significant Data Analysis framework currently under development. Possible areas of interest include Afterglow style visualisations, automated generation of dynamic event timelines, modules for distributed trending, etc. Python development experience would be ideal, as would knowledge of databases and some prior exposure to network data structures; however, the exact areas of project focus within the larger initiative will depend upon the student's particular skill sets.
  • Centralised malware collection and analysis. The Honeynet Project is about to launch a project to consolidate as much existing data as possible from current discrete malware collection efforts into a single secure central malware repository, and then to expand current malware collection capabilities by producing and supporting a set of malware capture systems that will be freely available for download from honeynet.org. This will allow interested parties to automatically capture malware locally and optionally upload samples to the central repository. To support this initiative, back end and front end malware analysis and reporting systems need to be developed, and centralised automated analysis of all collected malware will need to be performed using tools like CWSandbox, VirusTotal, Norman Sandbox, etc. New tools will also be developed to improve our capabilities in this area, in particular with regard to automated analysis and classification of malware samples. Depending upon technical experience, students will be involved in developing and enhancing the system at all levels, from collection through analysis and trend reporting.
  • A revised and improved Honeywall UI. The Honeywall is the Honeynet Project's data capture and data control tool. Students will be involved in extending and overhauling the current basic web management and analysis interface, applying the latest web 2.0 style techniques to the interface and enabling much more use of dynamic client/server interaction. Depending upon areas of experience, individual areas of functionality will be assigned from within a larger set of objectives, with students working alongside existing Honeynet Project development team members.
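
To make the Sebek correlation task above more concrete, here is a deliberately naive sketch of a "best approximation" done after the fact. It attributes each captured keystroke record to the most recently opened inbound connection that was still open at that time. The record layouts and names are assumptions made for this example and do not correspond to Sebek's actual data format. A solution inside the Sebek client would more likely follow process and socket relationships in the kernel.

```python
from bisect import bisect_right


def attribute_keystrokes(connections, keystrokes):
    """Guess the source IP behind each keystroke record by timing alone.

    connections: list of (start_time, end_time, src_ip) for inbound sessions.
    keystrokes:  list of (timestamp, text) captured on the honeypot.
    Returns (timestamp, src_ip_or_None, text) for every keystroke record.
    """
    conns = sorted(connections)
    starts = [conn[0] for conn in conns]
    attributed = []
    for ts, text in keystrokes:
        source = None
        # Consider only connections that started at or before the keystroke,
        # newest first, and pick the first one that was still open.
        for start, end, src_ip in reversed(conns[:bisect_right(starts, ts)]):
            if ts <= end:
                source = src_ip
                break
        attributed.append((ts, source, text))
    return attributed


# Hypothetical data: two overlapping inbound sessions and four keystroke records.
SAMPLE_CONNECTIONS = [(100, 200, "203.0.113.5"), (150, 400, "198.51.100.9")]
SAMPLE_KEYSTROKES = [(120, "id"), (160, "uname -a"), (300, "ls -la"), (500, "w")]
```

For the sample data, the keystroke at time 120 is attributed to `203.0.113.5`, those at 160 and 300 to `198.51.100.9`, and the one at 500 to no connection at all. The overlap at time 160 shows why timing alone is only an approximation: both sessions were open, and the heuristic simply prefers the newer one.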

Mentored by David Watson of the UK Honeynet Project.

