Student: Sarthak Khattar (GitHub)
Mentors: Matteo Lodi and Eshaan Bansal
Organization: The Honeynet Project
Tag: Information Security
IntelOwl is an Open Source Intelligence (OSINT) solution for getting threat intelligence data about a specific file, IP or domain from a single API at scale. It integrates a number of analyzers available online and is meant for anyone who needs a single point to query for information about a specific file or observable.
I proposed a new, more robust way of verifying analyzers’ configurations via strict rules through database models/serializers and a new configuration format, allowing support for real-time feedback of configuration status/errors in the frontend. This involved refactoring a major part of the codebase, improving and redesigning the test suite for the backend, creating a new test suite for the SDKs along with the necessary changes across all repositories for the new features added. A few new analyzers were also implemented.
The overall goal was to improve the resilience of the application and make it more configurable and accessible for the end-user.
List of pull requests merged before GSoC’s coding period:
I made over 100 commits and 30 pull requests across 3 project repositories, namely: IntelOwl (Django app), IntelOwl-ng (Angular app) and pyintelowl (CLI client).
The following major tasks were completed and maintained over time:
Previously, IntelOwl used a fairly simple JSON format for storing the configuration information of every analyzer. While this worked fine functionally, it lacked verification of its contents, allowing typos and unsupported values to go unnoticed until the analyzers were actually executed. This also meant that analyzers that required API keys and other secrets were able to run even if those secrets were missing.
After a discussion with the mentors, a new format was developed that allowed us to better express the information analyzers require to function, as well as to verify its integrity.
The Connectors parts were handled by Shubham, and our mentors helped a lot with code review and with clearing up doubts throughout the process.
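To make the idea concrete, here is a minimal sketch of this kind of verification in plain Python. It is not IntelOwl's actual implementation: the function `verify_analyzer_config`, the `AnalyzerConfigCheck` class, and the config keys (`type`, `python_module`, `secrets`, `required`) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Assumed set of valid analyzer types (hypothetical, for illustration only).
ALLOWED_TYPES = {"file", "observable"}


@dataclass
class AnalyzerConfigCheck:
    """Result of verifying one analyzer's configuration entry."""
    name: str
    configured: bool
    errors: list = field(default_factory=list)


def verify_analyzer_config(name, entry, available_secrets):
    """Check one config entry and collect the reasons it is not usable."""
    errors = []
    if entry.get("type") not in ALLOWED_TYPES:
        errors.append(f"unsupported type: {entry.get('type')!r}")
    if not isinstance(entry.get("python_module"), str):
        errors.append("python_module must be a string")
    # Every required secret must be present before the analyzer may run.
    for secret_name, spec in entry.get("secrets", {}).items():
        if spec.get("required", False) and secret_name not in available_secrets:
            errors.append(f"missing required secret: {secret_name}")
    return AnalyzerConfigCheck(name=name, configured=not errors, errors=errors)


# Hypothetical configuration with one missing secret and one typo.
config = {
    "ExampleAnalyzer": {
        "type": "observable",
        "python_module": "example.Example",
        "secrets": {"api_key_name": {"required": True}},
    },
    "TypoAnalyzer": {"type": "observabel", "python_module": "typo.Typo"},
}

report = {
    name: verify_analyzer_config(name, entry, available_secrets=set())
    for name, entry in config.items()
}
for check in report.values():
    print(check.name, "configured" if check.configured else check.errors)
```

A report like this can be computed ahead of time and shown in the frontend, so a misconfigured analyzer is flagged before anyone tries to run it.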
IntelOwl provided an /api/send_analysis_request endpoint for scanning files and observables. This was slightly convoluted and lacked verification for certain request data parameters. With the recent major refactoring, it was the perfect opportunity to split this endpoint and use DRF’s serializers to enforce verification on the request parameters.
As a result, 2 endpoints were created in place of the single one: /api/analyze_file and /api/analyze_observable. New serializer classes were written, one each for file and observable analysis, containing verification functions for the request data.
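The sketch below illustrates the kind of per-request checks such a serializer might perform for observables. It deliberately uses only the standard library rather than DRF, so it runs on its own; the field names (`observable`, `analyzers`), the classification labels, and the helpers `classify_observable` and `validate_observable_request` are hypothetical and not taken from the IntelOwl codebase.

```python
import ipaddress
import re

# MD5, SHA-1 or SHA-256 hex digests.
HASH_RE = re.compile(r"^(?:[a-fA-F0-9]{32}|[a-fA-F0-9]{40}|[a-fA-F0-9]{64})$")
# Simplified domain pattern; real-world validation is more involved.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$"
)


class RequestValidationError(Exception):
    """Carries a field -> message mapping, similar in spirit to serializer errors."""

    def __init__(self, errors):
        super().__init__(errors)
        self.errors = errors


def classify_observable(value):
    """Return 'ip', 'hash' or 'domain', or raise ValueError."""
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    if HASH_RE.match(value):
        return "hash"
    if DOMAIN_RE.match(value):
        return "domain"
    raise ValueError(f"cannot classify observable: {value!r}")


def validate_observable_request(data, known_analyzers):
    """Validate request data and return a cleaned copy, or raise with all errors."""
    errors = {}
    classification = None
    observable = data.get("observable")
    if not isinstance(observable, str) or not observable.strip():
        errors["observable"] = "this field is required"
    else:
        observable = observable.strip()
        try:
            classification = classify_observable(observable)
        except ValueError as exc:
            errors["observable"] = str(exc)

    analyzers = data.get("analyzers", [])
    if not isinstance(analyzers, list) or not all(isinstance(a, str) for a in analyzers):
        errors["analyzers"] = "expected a list of analyzer names"
    else:
        unknown = sorted(set(analyzers) - set(known_analyzers))
        if unknown:
            errors["analyzers"] = f"unknown analyzers: {', '.join(unknown)}"

    if errors:
        raise RequestValidationError(errors)
    return {
        "observable": observable,
        "classification": classification,
        "analyzers": analyzers,
    }
```

Collecting every error before raising, instead of stopping at the first one, lets the client see all the problems with a request in a single response.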
Eshaan, one of my mentors, suggested improving the existing testing suite by making it almost completely dynamic. Earlier, every time a developer added a new analyzer, they had to add a new test for it as well. Dynamic testing, in contrast, iterates through existing as well as new analyzers, removing the need to explicitly define new tests. Moreover, the tests were now to be run asynchronously as Celery tasks, significantly decreasing the overall execution time. This would also address a number of issues in the previous testing suite (#229), such as missing coverage for certain generic functions and missing tests for groups/permissions.
To implement this, the entire testing suite had to be rewritten from scratch while keeping in mind the recent refactoring changes, the new config format, and the requirements of the new test suite.
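The core idea of dynamic testing can be sketched with the standard library's `unittest` and its `subTest` context manager. This is only an illustration: `ANALYZER_REGISTRY`, the toy analyzers in it, and `run_suite` are hypothetical, and the asynchronous execution through Celery described above is not covered here.

```python
import unittest

# Hypothetical registry mapping analyzer names to callables.
# In a real project this would be built from the analyzer configuration.
ANALYZER_REGISTRY = {
    "ReverseText": lambda observable: {"result": observable[::-1]},
    "Length": lambda observable: {"result": len(observable)},
}


class DynamicAnalyzerTests(unittest.TestCase):
    def test_every_registered_analyzer(self):
        # One loop covers existing and future analyzers alike; each
        # iteration is reported separately thanks to subTest.
        for name, run in ANALYZER_REGISTRY.items():
            with self.subTest(analyzer=name):
                report = run("example.com")
                self.assertIsInstance(report, dict)
                self.assertIn("result", report)


def run_suite():
    """Run the dynamic tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DynamicAnalyzerTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Adding an entry to the registry is enough for it to be picked up by the test; no new test method has to be written.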
As #558 details, several issues occurred after the testing changes were implemented. These included:
I’m grateful to our mentor, Eshaan, who spent a great amount of time and effort solving most of the above-mentioned issues, which significantly sped up the process.
IntelOwl’s Python SDK, pyintelowl, lacked a robust testing suite: it had no class-based test cases, no support for running tests on GitHub CI, no coverage reports, etc. (#65)(#106)(#59). Moreover, the previously written CLI tests were superficial and did not cover major functionality of the SDK.
As a result, these tests were replaced by a new and more thorough testing suite. More than 50 new tests were implemented with a coverage of almost 80% (connector related tests were added by Shubham). Tox support was added to run the tests on GitHub with codecov.io for reporting the coverage changes.
Some important analyzers like VirusTotal needed to be updated, and others like ClamAV had been planned for integration for a long time. Thus, this opportunity was taken to add support for 2 new VirusTotal endpoints as well as to integrate ClamAV into IntelOwl.
I’m very glad to have worked on this amazing project and would like to continue doing so in the future. It has been a major source of learning for me. Some of the things I hope to tackle in the future are:
I would like to thank The Honeynet Project and Google Summer of Code for providing me with this opportunity. Special thanks to my mentors Eshaan Bansal and Matteo Lodi for being kind and helpful to me throughout this amazing journey.