PEEPDF: Adding a scoring system in peepdf

Project Name: Project 13 - PEEPDF2: Adding a scoring system in peepdf
Mentor: Jose Miguel Esparza (ES)
Student: Rohit Dua
Skills required: Python, JavaScript/HTML5
Project type: Improve existing tool
Project goal: Add a scoring system to give a better idea about the maliciousness of a PDF file.

peepdf is a Python tool to explore PDF files in order to find out if a file can be harmful or not. The aim of this tool is to provide all the necessary components that a security researcher could need in a PDF analysis without using 3 or 4 tools to perform all the tasks. With peepdf it's possible to see all the objects in the document, with the suspicious elements highlighted. It supports the most commonly used filters and encodings, and it can parse different versions of a file, object streams and encrypted files. With PyV8 and Pylibemu installed, it also provides JavaScript and shellcode analysis wrappers. Apart from this, it is able to create new PDF files and to modify or obfuscate existing ones.

Currently, it is possible to identify the suspicious elements in a PDF file because they are shown in a different color (yellow). While this helps experienced analysts or users with some knowledge of the PDF format and/or threat analysis, it can be difficult for less skilled users to understand. The first step in this project would be to identify the elements that make it possible to distinguish whether a PDF file is malicious or not, such as JavaScript code, lonely objects, huge gaps between objects, detected vulnerabilities, etc. The next step would be to create the system that derives a score from these elements and to test it with a large collection of malicious and non-malicious PDF files in order to tweak it.
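
The two steps described above can be illustrated with a minimal Python sketch. It is not peepdf's actual scoring code: the indicator names, the weights, the 0 to 10 scale and the helper functions score_pdf and evaluate are all assumptions made up for illustration. The idea is simply that each suspicious element found in a file adds a weight to the score, and that the weights and the decision threshold are then tuned against a labelled collection of files.

```python
# Hypothetical indicator weights. These names and values are illustrative
# assumptions, not the ones used by peepdf.
INDICATOR_WEIGHTS = {
    "javascript": 3.0,
    "lonely_objects": 1.0,
    "large_gaps": 1.5,
    "known_vulnerability": 5.0,
}

MAX_SCORE = 10.0  # assumed upper bound of the scale


def score_pdf(indicators):
    """Return a maliciousness score between 0 and MAX_SCORE.

    `indicators` maps an indicator name to the number of times it was found.
    Each indicator contributes its weight once, however often it occurs.
    """
    raw = sum(
        weight
        for name, weight in INDICATOR_WEIGHTS.items()
        if indicators.get(name, 0) > 0
    )
    return min(raw, MAX_SCORE)


def evaluate(samples, threshold):
    """Return the fraction of labelled samples classified correctly.

    `samples` is a list of (indicators, is_malicious) pairs. A file is
    flagged as malicious when its score reaches `threshold`.
    """
    correct = sum(
        1
        for indicators, is_malicious in samples
        if (score_pdf(indicators) >= threshold) == is_malicious
    )
    return correct / len(samples)
```

Running evaluate over a large collection with different weights and thresholds is one simple way to carry out the tweaking step; a real system would probably also weigh how often an element occurs and how elements combine.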

Project page:
Google code:
Use cases:


Jose and Rohit compiled a great blog post summarizing the results of their GSoC.