We are very excited about the opportunity to apply for
Google Summer of Code 2007. Below
are the projects we would like to mentor. Each project is mentored
by a highly experienced member. As an organization
dedicated to open source, we will license all tools below under GPLv2, and
the author retains copyright, unless
otherwise noted. In addition to Google SoC, these projects can be used by
universities, graduate students, and research organizations for
topics in security research. To learn more, contact
research@honeynet.org.
The Topics:
Libemu Project
The `libemu' project aims at creating a generic x86 emulation API as an
independent library, to be used in free network security products. With
this library, it is possible to execute shellcodes or data suspected to
be a shellcode on a virtual, emulated processor and observe the
shellcode's actions. This allows a program using the library to
generically detect shellcodes in network streams and to automatically
analyze the shellcode's behaviour.
Two possible candidates for integration are the Snort stream processor,
where the library could raise a generic, vulnerability-independent alert
for unencoded shellcodes seen on the network, and the nepenthes
low-interaction honeypot. Nepenthes currently matches shellcodes against
static signatures to learn how the malware can be downloaded from a
malicious host. Using this library would allow nepenthes to generically
understand previously unknown shellcodes.
Internally, the library emulates major parts of the x86 instruction
set and modifies the emulated registers. Mapping abstract
information about system libraries onto the emulated virtual
memory allows the library to detect and emulate API usage, and hence to
understand the `actions' the shellcode performs. Using the API without
substantial failures is a sure indicator for the presence of a
shellcode, hence providing generic shellcode detection.
Requirements include
- Your C is as perfect as your assembly.
- You really love the x86 architecture and have always wanted to read all
the docs about it.
- You know Windows as well as Linux, as you will emulate a Windows
process environment on a Linux development system.
- You can team up with other developers.
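The API-detection idea described above can be illustrated with a toy
sketch in Python. It is not libemu's interface and covers only three
instructions (nop, mov r32, imm32 and call r32); the API_TABLE addresses
and the emulate/detect helpers are invented for this example. Every offset
of a byte stream is tried as an entry point, and a candidate counts as a
hit only if emulation reaches a call into a mapped API address.

```python
import struct

# Hypothetical addresses at which library exports are "mapped" into the
# emulated address space; real values depend on the emulated environment.
API_TABLE = {
    0x7C800000: "LoadLibraryA",
    0x7C800010: "WinExec",
}

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def emulate(code, start=0, max_steps=64):
    """Run a tiny subset of x86 from offset `start`; return API names called."""
    regs = dict.fromkeys(REGS, 0)
    calls = []
    eip = start
    for _ in range(max_steps):
        if eip >= len(code):
            break
        op = code[eip]
        if op == 0x90:  # nop
            eip += 1
        elif 0xB8 <= op <= 0xBF and eip + 5 <= len(code):  # mov r32, imm32
            regs[REGS[op - 0xB8]] = struct.unpack_from("<I", code, eip + 1)[0]
            eip += 5
        elif op == 0xFF and eip + 2 <= len(code) and 0xD0 <= code[eip + 1] <= 0xD7:
            # call r32: accepted only if the target is a mapped API address
            target = regs[REGS[code[eip + 1] - 0xD0]]
            if target not in API_TABLE:
                break
            calls.append(API_TABLE[target])
            eip += 2  # pretend the API returned to the next instruction
        else:
            break  # unsupported instruction: give up on this candidate
    return calls


def detect(stream):
    """Try every offset of the stream as a possible shellcode entry point."""
    hits = {}
    for offset in range(len(stream)):
        calls = emulate(stream, offset)
        if calls:
            hits[offset] = calls
    return hits


# mov eax, 0x7C800010 ; call eax, preceded by two nops and unrelated bytes
sample = (b"GET /" + b"\x90\x90" + b"\xB8" + struct.pack("<I", 0x7C800010)
          + b"\xFF\xD0" + b"zz")
print(detect(sample))
```

Here the only entry points that reach an API call are the two nops and the
mov instruction itself; plain text such as b"hello world" yields no hits.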
Mentored by Georg Wicherski
of the German Honeynet Project.
Random Server Response Selection
This project deals with generating an unbiased selection of servers (or,
more precisely, server requests) as input to client honeypots. Currently,
researchers guess where to find malicious servers (spam, specific topic
areas, typo domains, etc.) and task a client honeypot with inspecting these
areas for malicious servers. This approach is likely to identify malicious
servers, but any conclusion about them is limited by the input
that was used. For example, one can state that the number of malicious
servers on adult content indexed by the Google search engine is declining.
However, one cannot make a more general statement about malicious servers
on the Internet. In order to do so, it is necessary to select the input
servers in an unbiased manner.
Besides allowing a more general conclusion about malicious servers on the
Internet, random input also sheds light on areas that no one
has examined yet. There might be areas on the Internet that have not
been considered to serve malicious content, and a random selection is likely
to investigate these areas. This should allow us to create heuristics
for finding malicious servers in a more efficient way.
The student would be required to implement a tool (preferably in Java) that
allows her to select web pages at random. This is not as trivial as building
a crawler and selecting web pages at random intervals, since this would
introduce bias towards pages with a high in-degree, that is, pages that
many other pages link to. In addition, a simple crawler would neglect web
pages that are isolated islands on the Internet, such as servers that spam
messages point to.
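To make the in-degree bias concrete, here is a minimal Python sketch of
one correction found in the random-walk sampling literature, a
Metropolis-Hastings walk. It is only an illustration under strong
assumptions, not the proposed tool: the link graph is a small in-memory
dictionary treated as undirected and connected, and the names
mh_random_walk and toy_graph are invented. It does not address the
isolated-island problem, since no walk can reach pages that nothing
links to.

```python
import random
from collections import Counter


def mh_random_walk(graph, start, steps, rng):
    """Metropolis-Hastings walk whose long-run visit frequencies are uniform.

    `graph` maps each node to a list of neighbours and is assumed to be
    undirected and connected.
    """
    visits = Counter()
    current = start
    for _ in range(steps):
        candidate = rng.choice(graph[current])
        # Accept with probability min(1, deg(current) / deg(candidate)),
        # which cancels the plain walk's preference for well-linked nodes.
        if rng.random() < len(graph[current]) / len(graph[candidate]):
            current = candidate
        visits[current] += 1
    return visits


# Toy link graph: one hub page connected to five otherwise unlinked pages.
toy_graph = {"hub": ["a", "b", "c", "d", "e"]}
toy_graph.update({leaf: ["hub"] for leaf in "abcde"})

visits = mh_random_walk(toy_graph, "hub", 100_000, random.Random(0))
hub_share = visits["hub"] / sum(visits.values())
print(f"share of samples that are the hub: {hub_share:.3f}")
```

A plain random walk on this graph would spend half of its steps on the
hub; with the acceptance step the hub's share should settle near 1/6, the
same as every other page.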
The student is expected to
- review a handful of papers in the literature that deal with the selection of web pages at random
- develop a short proposal on how she believes a web page can be selected at random
- develop the tool and unit tests
- create documentation of the tool
- integrate the tool into a HoneyC Queuer component
Requirements include
- development experience with Java
- experience developing unit tests
- experience in web development
Mentored by Christian Seifert
of the New Zealand Honeynet Project.
Monitoring of Botnets With Advanced Communication Channels
Botnets are one of the main problems in today's Internet. A botnet can be defined as a
network of compromised machines that can be remotely controlled by an attacker. On every
compromised machine, a so-called bot is installed which establishes a connection to a
remote control network through which the attacker can issue arbitrary commands. Typical examples
of these remote control networks are IRC networks or HTTP servers, but obfuscated /
encrypted and even Peer-to-Peer based communication channels have been observed in the
last few years. Botnets can be used by an attacker for many malicious activities: carrying
out Distributed Denial-of-Service (DDoS) attacks, sending out millions of spam e-mails,
stealing sensitive information from the compromised machines, or seeding new malware are just a few examples.
Honeypots are very good at capturing information about botnets. For example, the honeypot
software nepenthes is able to automatically collect a binary of autonomously spreading malware
by emulating the vulnerable parts of a network service. We are thus able to automatically
collect bots and similar malware. With the help of an automated analysis process, for example
with tools like Norman Sandbox or CWSandbox, it is often possible to extract all information
relevant to a given botnet, e.g., for IRC-based botnets we can extract information about the
botnet's Command & Control server, channel, nickname and similar information. Based on this
information, we can then start to track the botnet.
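As a rough illustration of the kind of extraction described above, the
following Python sketch pulls the password, nicknames and channels out of
client-side IRC lines. It assumes the lines have already been recovered
from a sandbox report or a traffic capture; the function name
extract_irc_details and the sample lines are invented. The Command &
Control server address itself would come from the connection endpoint,
which is not modelled here.

```python
def extract_irc_details(lines):
    """Collect password, nicknames and channels from client-side IRC lines."""
    details = {"password": None, "nicknames": [], "channels": []}
    for raw in lines:
        parts = raw.strip().split()
        if len(parts) < 2:
            continue
        command, argument = parts[0].upper(), parts[1]
        if command == "PASS":
            details["password"] = argument.lstrip(":")
        elif command == "NICK":
            details["nicknames"].append(argument.lstrip(":"))
        elif command == "JOIN":
            # JOIN takes a comma-separated channel list, optionally
            # followed by a list of channel keys
            for channel in argument.split(","):
                if channel not in details["channels"]:
                    details["channels"].append(channel)
    return details


# Invented example traffic, as a bot might send it after connecting
captured = [
    "PASS s3cret",
    "NICK bot-4711",
    "USER bot 0 * :bot",
    "JOIN #cmd,#scan key1",
    "PRIVMSG #cmd :ready",
]
details = extract_irc_details(captured)
print(details)
```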
However, if the botnet uses non-standard protocols or advanced communication channels like
obfuscated / encrypted or Peer-to-Peer based communication, tracking becomes considerably harder.
One possibility to track this kind of botnet is to execute the bot within a controlled environment
and then observe what the bot is doing. A honeypot can be used as a basic building block for such a
controlled environment: due to the Data Control and Data Capture mechanisms offered by a honeynet,
we can execute the bot on a honeypot and then observe what it is doing. Based on the collected data
it is then possible to infer for what purposes the botnet is used.
Within this project, the necessary infrastructure to track advanced botnets should be developed: a
given bot binary is uploaded to a honeypot and executed. For a predefined period of time, all
communication of the bot binary with other hosts on the Internet is observed and an analysis report
is generated. Moreover, several tools for Data Analysis should be developed in order to make the
whole analysis process easier. To enable a scalable approach that can also track several
botnets on one physical machine, different virtualization and emulation tools should be evaluated
for their suitability.
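The report step could look roughly like the following Python sketch,
which aggregates the flows seen during the observation window per remote
endpoint. The record layout (timestamp, remote IP, remote port, byte
count), the function name summarise_flows and the sample data are all
assumptions made for illustration; how flows are actually captured on the
honeypot is left open.

```python
from collections import defaultdict


def summarise_flows(flows, window_start, window_end):
    """Aggregate flows seen inside the observation window per remote endpoint."""
    totals = defaultdict(lambda: {"flows": 0, "bytes": 0})
    for timestamp, remote_ip, remote_port, byte_count in flows:
        if window_start <= timestamp < window_end:
            entry = totals[(remote_ip, remote_port)]
            entry["flows"] += 1
            entry["bytes"] += byte_count
    # Busiest endpoints first
    return sorted(totals.items(), key=lambda item: item[1]["bytes"], reverse=True)


# Invented flow records using documentation-only IP addresses
flows = [
    (10.0, "192.0.2.10", 6667, 1200),
    (25.0, "192.0.2.10", 6667, 800),
    (40.0, "198.51.100.7", 80, 300),
    (999.0, "203.0.113.5", 25, 50),  # outside the window, ignored
]
report = summarise_flows(flows, 0.0, 600.0)
for (ip, port), stats in report:
    print(f"{ip}:{port} flows={stats['flows']} bytes={stats['bytes']}")
```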
Mentored by Thorsten Holz
of the German Honeynet Project.
Data Analysis
Of all of our listed ideas, this one is the broadest. We capture and analyze a variety of
different data and therefore face a variety of different data analysis challenges.
We are looking for a range of different participants in this, including help with
- Extending the Sebek data capture tool.
The Honeynet Project wishes to extend the functionality of the Sebek
client (running on multiple operating systems) and associated command
line tools to allow automatic mapping of attacker source IP address to
captured keystrokes. Currently, correlating the source IP address of a
hostile network connection to the associated keystrokes and IO activity
is a highly manual task, and adding such a capability will enable much
more powerful and automated data analysis tools to be produced.
The mentored student will review the current codebase and a small body of
published materials, before attempting to extend the Sebek client on one
or more OSes so that Sebek command line tools return the attacker's
source IP in their output, or at least a best approximation to the
source of the network activity spawning shell activity. Alternative
approaches will also be considered. Strong C development skills and good knowledge of one or more OS kernels
and network sockets would be essential.
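As an illustration of the "best approximation" mentioned above, the
following Python sketch attributes each keystroke to the most recently
started connection that was still open at that time. This is a purely
time-based heuristic on invented record layouts, not how Sebek works or is
planned to work; the function name attribute_keystrokes and the sample
data are hypothetical.

```python
from bisect import bisect_right


def attribute_keystrokes(connections, keystrokes):
    """Attach to each keystroke the source IP of the latest open connection.

    connections: list of (start, end, source_ip)
    keystrokes:  list of (timestamp, text)
    """
    connections = sorted(connections)
    starts = [start for start, _, _ in connections]
    attributed = []
    for timestamp, text in keystrokes:
        source = None
        # Only connections that started before the keystroke are candidates;
        # walk them from the most recent start backwards.
        candidates = connections[:bisect_right(starts, timestamp)]
        for start, end, ip in reversed(candidates):
            if timestamp <= end:
                source = ip
                break
        attributed.append((timestamp, source, text))
    return attributed


# Invented sample data using documentation-only IP addresses
connections = [(100, 200, "203.0.113.9"), (150, 400, "198.51.100.23")]
keystrokes = [(120, "id"), (180, "uname -a"), (300, "w"), (500, "ls")]
result = attribute_keystrokes(connections, keystrokes)
for row in result:
    print(row)
```

The keystroke at time 180 shows the weakness of timing alone: both
connections are open, so the attribution is only a guess. Data from the
kernel, as proposed above, would be needed to resolve such cases.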
- Developing an IRC data explorer.
Attackers often install IRC servers or bots on compromised honeypots,
resulting in sizeable plain text data sets that often contain multiple
IRC channels, talkers and languages. Manual review of IRC data is very
time consuming, and efforts to visualise IRC activity are currently
rather basic.
The student will review previous approaches and then attempt to develop
an application, ideally a web based GUI, that enables analysts to
process and manipulate IRC data extracted from pcap files (extraction
mechanisms already available). A session based approach will be taken,
with the application enabling effective parsing, filtering, searching,
tagging and classification of interesting content. Optional extensions
include automated assessment of language per talker and channel,
multi-user collaboration and data/session persistence, and exploring
effective methods of visualising IRC activity over time.
Development would ideally be in Python, and experience with databases
and web development frameworks such as TurboGears would be useful,
although no particular approach is set and alternatives can also be
considered.
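A minimal Python sketch of the parsing, filtering and per-talker
bookkeeping described above might look as follows. It assumes the IRC
lines have already been extracted from the pcap files as text and handles
only server-relayed PRIVMSG lines; the function names and the sample log
are invented, and no persistence or GUI is shown.

```python
import re
from collections import defaultdict

# ":nick!user@host PRIVMSG <target> :<text>" as relayed by an IRC server
PRIVMSG = re.compile(
    r"^:(?P<nick>[^!\s]+)!\S+ PRIVMSG (?P<target>\S+) :(?P<text>.*)$"
)


def parse_messages(lines):
    """Turn raw IRC lines into (talker, channel, text) records."""
    records = []
    for line in lines:
        match = PRIVMSG.match(line.rstrip("\r\n"))
        if match:
            records.append((match["nick"], match["target"], match["text"]))
    return records


def search(records, keyword, channel=None):
    """Case-insensitive keyword filter, optionally limited to one channel."""
    keyword = keyword.lower()
    return [r for r in records
            if keyword in r[2].lower() and (channel is None or r[1] == channel)]


def talkers_per_channel(records):
    """Count messages per talker within each channel."""
    counts = defaultdict(lambda: defaultdict(int))
    for nick, channel, _ in records:
        counts[channel][nick] += 1
    return {channel: dict(nicks) for channel, nicks in counts.items()}


# Invented sample log
log = [
    ":alice!a@host1 PRIVMSG #ops :scan 10.0.0.0/8 started\r\n",
    ":bob!b@host2 PRIVMSG #ops :ok\r\n",
    ":alice!a@host1 PRIVMSG #chat :Scan finished\r\n",
    "PING :irc.example.net\r\n",
]
records = parse_messages(log)
print(search(records, "scan", channel="#ops"))
print(talkers_per_channel(records))
```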
- Network security data analysis tool development.
The student will join the Honeynet Project's existing Data Analysis team
and help develop one or more components or areas of functionality within
a significant Data Analysis framework currently under development.
Possible areas of interest include Afterglow style visualisations,
automated generation of dynamic event timelines, modules for distributed
trending, etc.
Python development experience would be ideal, as would knowledge of
databases and some prior exposure to network data structures. However,
the exact areas of project focus within the larger initiative will depend
upon the student's particular skill sets.
- Centralised malware collection and analysis.
The Honeynet Project is about to launch a project to consolidate as much
existing data from current discrete malware collection efforts as
possible into a single secure central malware repository, and then
expand current malware collection capabilities by producing and
supporting a set of downloadable malware capture systems, to be made
freely available from honeynet.org. This will allow interested
parties to automatically capture malware locally and optionally upload
samples to the central repository.
To support this initiative, back end and front end malware analysis and
reporting systems need to be developed, and centralised automated
analysis of all collected malware will need to be performed using tools
like CWSandbox, VirusTotal, Norman Sandbox, etc. New tools will also be
developed to improve our capabilities in this area, in particular with
regard to automated analysis and classification of malware samples.
Depending upon technical experience, students will be involved in
developing and enhancing the system at all levels from collection
through analysis and trend reporting.
- A revised and improved Honeywall UI.
The Honeywall is the Honeynet Project's data capture
and data control tool.
Students will be involved in extending and overhauling the current basic
web management and analysis interface, applying the latest web 2.0 style
techniques to the interface and enabling much more use of dynamic
client/server interaction. Depending upon areas of experience,
individual areas of functionality will be assigned from within a larger
set of objectives, with students working alongside existing Honeynet
Project development team members.
Mentored by David Watson
of the UK Honeynet Project.