UPDATE: the log data is posted here. A notification group about new log sharing is here.
This WASL 2009 workshop reminded me that I always used to bitch that some academic researchers use antediluvian data sets for their research (the Lincoln Labs 1998 set used in 2008 “security research” makes me want to just curse and kick people in the balls, then laugh, then cry, then cry more…).
However, why are they doing it? Don’t they realize that testing their “innovative intrusion detection” or “neural network-based log analysis” on such prehistoric data will not make it relevant to today’s threats, and will only ensure ensuing hilarity? 🙂
Well, maybe the explanation is simpler: there is no public, real-world source of logs that allows comparison between different security research efforts.
I hereby promise to make my collection of real-world logs (mostly collected from the honeypots run in 2004-2006) public (UPDATE: logs available here). Here is the description of the collection:
Size: 100MB compressed; about 1GB uncompressed (more is available upon request)
Date collected: 2006
Type: Linux logs /var/log/messages, /var/log/secure, process accounting records /var/log/pacct, other Linux logs; Apache web server logs /var/log/httpd/access_log, /var/log/httpd/error-log, /var/log/httpd/referer-log and /var/log/httpd/audit_log; Sendmail /var/log/maillog; Squid /var/log/squid/access_log, /var/log/squid/store_log, /var/log/squid/cache_log, etc. Firewall and Snort NIDS logs are also available.
License: public; use for whatever you want. Acknowledging the source is nice; a Beerware license is even better.
Sanitization: No additional sanitization is required before use for research.
So, for now, if your research requires real-world logs with normal operation data, suspicious data, anomalous data and attack data – drop me an email or get them here.