Skip to main content

A framework to create malware detectors based on machine learning.

Project description

DetEXE

Python 3.9

A tool to build and analyze static malware detectors, based on machine learning.

DetEXE allows the selection of different features to train malware detectors through the LightGBM framework. The project is developed in a way that users can contribute by adding new features and combining them. It also offers the the options to compare the created models and evaluate the detectors' robustness by perturbing malware files.

Installation

To install the latest version:

$ pip install detexe

Setup

  1. Set up DetEXE environment variable.
$ export DETEXE_ROOT=$PWD
  1. Create a project layout containing the needed directories to store the data of the project.
$ detexe setup
  1. Add executable samples to the benign and malware directories. You can obtain them from different sources. SOREL, ViruSshare... (As you are working with malware samples, please, take the safety measures).
  2. Configure the features_selection.txt file with the features you wish to extract from the files.
  3. In case you would like to select the feature OpCodeVectors, you will need to use previously the following command, to create the W2V model.
$ detexe opcodesw2v

How to use

CLI

  1. Train your model.
$ detexe train --model="foo"
  1. Execute adversarial attacks on your trained model.

    It is possible to select one specific attack, or all ddiferent attacks with one command:

$ detexe attack padding --model="foo" --malware="/malware/path.exe"
$ detexe attack all --model="foo" --malware="/malware/path.exe"
  1. Compare the trained models.
$ detexe compare 
  1. Search for optimal parameters to obtain better result in training. These parameters will be saved in the model directory.
$ detexe tune --model="foo"
  1. Scan a PE file with a trained model.
$ detexe scan --model="foo" --exe="/malware/path.exe"

Python

  1. Import functions and classes.
import os
from detexe import configure_layout, train_opcode_vectors, Detector, Attacker, compare
  1. Setup project directories.
os.environ["DETEXE_ROOT"] = os.path.dirname(os.path.abspath(__file__))
configure_layout()
  1. Configure the features_selection.txt file with the features you wish to extract from the files.
  2. In case you would like to select the feature OpCodeVectors, you will need to train previously the W2V model.
train_opcode_vectors()
  1. Instanciate a detector object
detector = Detector(model="model_foo", config_features="/path/to/features_selection.txt")
  1. With the instance of detector you will be able to train, tune and scan.
detector.train()  # Train the model
detector.tune()  # Tune the hyperparameters
detector.scan("/path/to/exe")  # Scan a file 
  1. The efficiency of the created models can be compared, and visualized in a created graph.
compare("model_comparation.png")
  1. Evaluate the robustness of a certain model.
attacker = Attacker(model="model_foo")
attacker.malware("/path/to/malware.exe")  # Choose the malware to ve modified for attacking the model.
attacker.all_attacks() # Choose one specific attack or all.

Add your own features

  1. Add new feature class in separated file under ./detexe/ped/features/your_feature.
  2. Update ./features_selection.txt file.

Built With

  • LIEF - A cross-platform library which can parse, modify and abstract ELF, PE and MachO formats.
  • EMBER - Elastic Malware Benchmark for Empowering Researchers.
  • SecML Malware - Python library for creating adversarial attacks against Windows Malware detectors.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

detexe-0.0.2.4.2.tar.gz (50.5 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page