The Discoverer module (see zhongjie's blog entry) has been completed.
It consists of two programs: Format Discovery and Pre-Replay processing.
Format Discovery is essentially what I described in my earlier post.
Since that entry, I've completed the to-do tasks:
1) Add a function that summarises all output of the program.
2) Fix a memory leak in the program.
3) Match each replay packet to a format; if the length of a segment changes (eg: due to a shellcode change), the length field must change too.
4) Given the replay IPs, find the IP tokens and change them.
With points 1 and 2 done, Format Discovery is complete.
Points 3 and 4 are performed in Pre-Replay processing. Overall, it works like this:
- Inputs needed:
1) Network traces to replay
2) Shellcode and its offset
3) Target server IP and local client IP
- Read each packet in the traces
- 1) Scan for the presence of an IP address
- If one is present, the tool decides automatically whether to replace it with the server or the client IP, based on this observation:
- clients usually make requests to servers
- hence the source IP of the 1st packet is recorded as the client's IP
- An IP address can appear in 3 notations.
For example, 11.22.33.44 could be represented as "11.22.33.44", "11,22,33,44" or 0x0B 0x16 0x21 0x2C
- Find a matching format in the Format Discovery output by searching for a format with the same token pattern
- From the format found, find the length field/bytes affected by the change in IP length
(eg: changing the IP from "22.214.171.124" (in the network trace) to "126.96.36.199" (user-specified IP))
- 2) If shellcode is present, use it and its offset to find the matching format in the Format Discovery output as well
- This part uses a different algorithm from step 1, because the presence of shellcode may have changed the token pattern of the original format.
However, we observe that the packet is constructed like this: [Front of Original Token Pattern][Shellcode][Back of Original Token Pattern]
Therefore, we find all formats whose front and back token patterns match, since those parts still conform to the original format.
From these formats, we pick the best match using a simple scoring system:
1) A matching constant token value is given 2 points
2) A token value (of the replay packet) that is found in the format's FD set of values is given 1 point
The points differ because values that match exactly and are identified as constant by Discoverer should carry more weight.
The FD set of values holds the possible values a specific token could take. Because it depends on the quality of the sample packets Discoverer works on, these matches are given less weight.
- Once the format is found, the old shellcode (in the original network trace) is replaced with the new user-supplied shellcode
- The length fields/tokens affected by the shellcode change are updated accordingly
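To make the front/back matching and the scoring in step 2 more concrete, here is a rough Python sketch. It is not the actual tool code: the FormatToken and Format classes, the (kind, value) token tuples and all function names are made up for illustration. It also assumes that a token marked constant can only score through rule 1, and that ties between formats are broken by list order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical data model; the real Format Discovery output may look different.
@dataclass
class FormatToken:
    kind: str                                        # token type label, e.g. "text" or "binary"
    constant: Optional[bytes] = None                 # set if the token was identified as constant
    values: Set[bytes] = field(default_factory=set)  # FD set of values seen in sample packets

@dataclass
class Format:
    name: str
    tokens: List[FormatToken]

PacketToken = Tuple[str, bytes]  # (kind, value) of a token in the replay packet


def pattern_matches(fmt: Format, front: List[PacketToken], back: List[PacketToken]) -> bool:
    """True if the format starts with the front token kinds and ends with the back ones."""
    kinds = [t.kind for t in fmt.tokens]
    front_kinds = [k for k, _ in front]
    back_kinds = [k for k, _ in back]
    if len(front_kinds) + len(back_kinds) > len(kinds):
        return False
    front_ok = kinds[:len(front_kinds)] == front_kinds
    # kinds[-0:] would be the whole list, so an empty back is handled separately.
    back_ok = not back_kinds or kinds[-len(back_kinds):] == back_kinds
    return front_ok and back_ok


def score(fmt: Format, front: List[PacketToken], back: List[PacketToken]) -> int:
    """2 points per matching constant value, 1 point per value found in the FD set."""
    pairs = list(zip(fmt.tokens, front))
    if back:
        pairs += list(zip(fmt.tokens[-len(back):], back))
    total = 0
    for spec, (_, value) in pairs:
        if spec.constant is not None:
            if value == spec.constant:
                total += 2
        elif value in spec.values:
            total += 1
    return total


def best_format(formats: List[Format], front: List[PacketToken],
                back: List[PacketToken]) -> Optional[Format]:
    """Keep formats whose front/back patterns match, then return the top scorer."""
    candidates = [f for f in formats if pattern_matches(f, front, back)]
    if not candidates:
        return None
    return max(candidates, key=lambda f: score(f, front, back))
```

Here front and back are the replay packet's tokens before and after the shellcode. How the packet is split into tokens around the shellcode offset is left out of the sketch.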
These 2 tools were tested on SMB packets and ms08067.
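The IP scan from step 1 of Pre-Replay processing could look roughly like the Python sketch below. It builds the 3 notations of an address, searches a packet payload for them and swaps in the new address. The function names are hypothetical and not taken from the tool. The rule in pick_replacement, that any IP other than the recorded client IP is treated as the server's, is my assumption. The plain substring search is also a simplification: it would, for example, wrongly match "11.22.33.44" inside "111.22.33.44".

```python
import ipaddress
from typing import Dict, List, Tuple


def ip_notations(ip: str) -> Dict[str, bytes]:
    """Return the dotted, comma-separated and raw 4-byte forms of an IPv4 address."""
    packed = ipaddress.IPv4Address(ip).packed  # raises ValueError on a bad address
    octets = [str(b) for b in packed]
    return {
        "dotted": ".".join(octets).encode("ascii"),
        "comma": ",".join(octets).encode("ascii"),
        "raw": packed,
    }


def find_ip(payload: bytes, ip: str) -> List[Tuple[int, str]]:
    """List (offset, notation) for every occurrence of the IP in the payload."""
    hits = []
    for name, pattern in ip_notations(ip).items():
        start = payload.find(pattern)
        while start != -1:
            hits.append((start, name))
            start = payload.find(pattern, start + 1)
    return sorted(hits)


def pick_replacement(found_ip: str, trace_client_ip: str,
                     local_client_ip: str, target_server_ip: str) -> str:
    """trace_client_ip is the source IP of the 1st packet in the trace.

    Assumption: every other IP found in the trace belongs to the server.
    """
    return local_client_ip if found_ip == trace_client_ip else target_server_ip


def replace_ip(payload: bytes, old_ip: str, new_ip: str) -> bytes:
    """Replace every notation of old_ip with the same notation of new_ip."""
    old, new = ip_notations(old_ip), ip_notations(new_ip)
    for name in old:
        payload = payload.replace(old[name], new[name])
    return payload
```

Note that the textual notations can change the payload length (the raw 4-byte form never does), which is why the affected length field has to be located through the matching format afterwards.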