Trends in Discovery Techniques

Human attackers and automated worms were found to employ several strategies to find vulnerable systems. This section describes these strategies and identifies trends in their use.

Search-Based Strategies

PHP Shell

PHPShell is a PHP script which allows shell commands to be executed on a web server. Typically the PHPShell script is protected by a password so that only the server administrator can access it. We deployed honeypots that advertised an unrestricted PHPShell application, which attackers often tried to exploit.

The majority of attacks on PHPShell honeypots that we observed were preceded by a discovery request containing a referrer from a search engine. The search-engine queries themselves were revealed to us by default browser behavior, which sends the query string to the destination site in the Referer header. This technique benefits the attacker because most of the time-consuming work of finding potentially vulnerable systems has already been done by the search engine, eliminating the need for the attacker to probe many hosts themselves. Some captured copies of PERL/Shellbot contained routines to search Google for certain scripts, while other searches appear to have been performed manually. See Appendix C for example code from a captured copy of such a bot.
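To illustrate how these queries surface in server logs, the following sketch extracts search terms from the Referer field of Apache combined-format log lines. The log file name, the engine list, and the query parameter names are illustrative assumptions, not details taken from our captures.

    import re
    from urllib.parse import urlparse, parse_qs

    # Minimal sketch: pull search-engine queries out of the Referer field
    # of Apache combined-format log lines. The log path is hypothetical.
    REFERER_RE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"')
    SEARCH_ENGINES = {"www.google.com": "q", "search.yahoo.com": "p"}

    def extract_query(log_line):
        m = REFERER_RE.search(log_line)
        if not m:
            return None
        ref = urlparse(m.group("referer"))
        param = SEARCH_ENGINES.get(ref.netloc)
        if param:
            # The terms the attacker searched for arrive in the query string.
            return parse_qs(ref.query).get(param, [None])[0]
        return None

    with open("access_log") as log:
        for line in log:
            query = extract_query(line)
            if query:
                print("search-engine discovery:", query)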

One disadvantage to attackers of using search engines is the single point of failure they create. For instance, the Santy worm used Google to search for new targets; when Google began blocking Santy's queries, the worm's further spread was halted. It should be noted, however, that some bots have been observed using Yahoo search rather than Google.

IP-Based Strategies

Some probes appeared to use IP-based scanning, as seen in several captures of the Lupper worm. When run inside a virtual machine environment, the worm scanned a sequential range of IP addresses to determine which, if any, were running a web server. If a web server was present, the worm attacked it with several exploits that attempted to execute code on the server. IP-based scanning entails a relatively high cost per system infected in terms of search time and network resources, assuming a low density of targets. Worms that use search engines to locate their targets have a much lower cost per target because the search engine does the work of finding potentially vulnerable hosts.
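For illustration, the sketch below shows this kind of sequential scan in schematic form; the address range (TEST-NET-1, reserved for documentation), port, and timeout are hypothetical and are not taken from the Lupper captures.

    import socket
    from ipaddress import ip_address

    # Minimal sketch of sequential IP scanning: walk an address range and
    # report hosts that accept a connection on the web server port.
    def scan_range(start, count, port=80, timeout=0.5):
        base = ip_address(start)
        for offset in range(count):
            addr = str(base + offset)
            try:
                with socket.create_connection((addr, port), timeout=timeout):
                    yield addr  # a web server answered on this address
            except OSError:
                continue  # closed, filtered, or unreachable

    for host in scan_range("192.0.2.1", 254):
        print("web server found at", host)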

Note that IP-based scanning will not work against name-based virtual hosts, a technique introduced in HTTP/1.1 for hosting many websites on a single IP address. Under this method, the request for a web page must name the desired site, such as 'www.example.com', in its Host header. Since an IP-based scanning program has no way to determine this name, it cannot successfully exploit a site that uses virtual hosting. Name-based virtual hosting is popular with shared web hosting providers because they do not have to provide each website with a unique IP address.
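The following sketch illustrates the point: two requests sent to the same IP address select different sites purely through the Host header, which is exactly the value an IP-based scanner cannot guess. The server address and hostnames below are placeholders.

    import http.client

    # Minimal sketch: two HTTP/1.1 requests to one IP address differ only
    # in the Host header; a name-based virtual host setup serves a
    # different site for each. The IP and hostnames are illustrative.
    SERVER_IP = "192.0.2.10"

    for hostname in ("www.example.com", "www.example.org"):
        conn = http.client.HTTPConnection(SERVER_IP, 80, timeout=5)
        # Setting the Host header manually selects the virtual host.
        conn.request("GET", "/", headers={"Host": hostname})
        resp = conn.getresponse()
        print(hostname, resp.status, len(resp.read()))
        conn.close()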

Spider-Based Strategies

While observing hits on our honeynet, we noticed a large amount of traffic from spiders. A spider is a program that fetches a series of web pages for analysis; Google's and Yahoo's web crawlers are well-known examples. Typically a spider announces itself as such in the 'User-Agent' field of an HTTP request, for example 'Googlebot 1.0'. Other spider programs we observed announced themselves as a typical web browser, although the tiny interval between successive requests showed that they were running without user interaction. We determined that the spamming attempts we received were caused by the presence of web forms on our honeypot. Search engines cannot be used to search for a form within a web site, so a spider or other parsing tool must have discovered a form on our honeypot. Once a form was discovered, spam was immediately inserted into it, with no regard for the more valuable shell access the honeypot advertised. This points to an automated, spider-based attacker rather than a human.
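A rough sketch of the two detection heuristics just described, applied to honeypot log records, appears below. The input tuple format, the spider signature list, and the one-second interval threshold are illustrative assumptions rather than values from our analysis.

    from collections import defaultdict

    # Minimal sketch of the two spider heuristics described above: a
    # crawler User-Agent string, or request intervals too short for a
    # human visitor. The (ip, timestamp, user_agent) format is assumed.
    KNOWN_SPIDERS = ("googlebot", "yahoo! slurp")
    MIN_HUMAN_INTERVAL = 1.0  # seconds between successive page fetches

    def classify(requests):
        last_seen = defaultdict(lambda: None)
        for ip, ts, user_agent in requests:
            ua = user_agent.lower()
            if any(s in ua for s in KNOWN_SPIDERS):
                yield ip, "self-announced spider"
            elif last_seen[ip] is not None and ts - last_seen[ip] < MIN_HUMAN_INTERVAL:
                # Browser-like User-Agent, but requests arrive faster
                # than a human could click: likely an automated tool.
                yield ip, "suspected spider"
            last_seen[ip] = ts

    sample = [("10.0.0.5", 100.0, "Googlebot/1.0"),
              ("10.0.0.9", 101.0, "Mozilla/4.0"),
              ("10.0.0.9", 101.2, "Mozilla/4.0")]
    for ip, label in classify(sample):
        print(ip, label)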