
Preprocessing for jet tagging



UPP: Umami PreProcessing

This is a modular preprocessing pipeline for jet tagging. It addresses several issues with the current umami preprocessing workflow, and uses the atlas-ftag-tools package extensively.

Documentation is under construction.

Comparisons with umami

Main changes

  • modular, class-based design
  • h5 virtual datasets to wrap the source files
  • 2 main stages: resample -> merge -> done!
  • parallelised processing of flavours within a sample
  • support for different resampling "regions", which is useful for Xbb preprocessing
  • ndim sampling support, which is also useful for Xbb
  • "new" improved training file format (which is actually just the tdd output format)
    • structured arrays are smaller on disk and therefore faster to read
    • only one dataloader is needed and can be reused for training and testing
    • other plotting scripts can support a single file format
    • normalisation/concatenation is applied on the fly during training
    • training files can contain supersets of variables used for training
  • new "countup" sampling, which is more efficient than pdf sampling (it uses more of the available statistics and reduces duplication of jets)
  • the code estimates the number of unique jets for you and saves this number as an attribute in the output file
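To illustrate why a "countup" style selection can duplicate fewer jets than pdf sampling, here is a minimal sketch. It is not UPP's implementation: the function names `pdf_select` and `countup_select`, the one-dimensional binning, and the toy numbers are all assumptions made for illustration. The idea shown is that pdf sampling draws jets with replacement according to per-bin weights, whereas a countup selection takes the available jets in each bin first and only repeats jets when a bin holds fewer than the target.

```python
import numpy as np


def pdf_select(bin_ids, target_counts, rng):
    """Hypothetical pdf sampling: draw jets with replacement using per-jet weights."""
    available = np.bincount(bin_ids, minlength=len(target_counts))
    # Weight each jet by (target count) / (available count) of its bin.
    weights = target_counts[bin_ids] / available[bin_ids]
    weights = weights / weights.sum()
    n_total = int(target_counts.sum())
    return rng.choice(len(bin_ids), size=n_total, replace=True, p=weights)


def countup_select(bin_ids, target_counts, rng):
    """Hypothetical countup selection: use distinct jets first, repeat only if needed."""
    selected = []
    for b, n_target in enumerate(target_counts):
        in_bin = np.flatnonzero(bin_ids == b)
        take = in_bin[:n_target]  # distinct jets, as many as the bin can supply
        if 0 < len(take) < n_target:
            # Bin is short of jets: top up by repeating jets from the same bin.
            extra = rng.choice(in_bin, size=n_target - len(take), replace=True)
            take = np.concatenate([take, extra])
        selected.append(take)
    return np.concatenate(selected)


rng = np.random.default_rng(0)
bin_ids = rng.integers(0, 4, size=1000)       # toy jets spread over 4 bins
target_counts = np.array([100, 100, 100, 100])  # toy target histogram

pdf_idx = pdf_select(bin_ids, target_counts, rng)
countup_idx = countup_select(bin_ids, target_counts, rng)

n_unique_pdf = len(np.unique(pdf_idx))
n_unique_countup = len(np.unique(countup_idx))
print("unique jets (pdf):", n_unique_pdf)
print("unique jets (countup):", n_unique_countup)
```

In this toy setup every bin holds more jets than the target, so the countup selection contains no repeated jets, while the pdf draw typically repeats some. Counting unique indices, as done at the end, is one simple way to obtain a number of unique jets like the one mentioned above.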

Performance and LOC

Relative to a comparable preprocessing config from umami:

  1. train file size decreased by 30%
  2. train read speed improved by 30% (separate from file size reduction, by using read_direct)
  3. only one command is needed to generate all preprocessing outputs (running with --split=all will produce train/val/test files)
  4. 4x fewer lines of code than umami
  5. more than 10x faster than default umami preprocessing (0.06 vs 0.825 hours/million jets, a ratio of about 14x)
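The read speed item refers to `read_direct`, which is a method of h5py datasets that reads into a preallocated array instead of allocating a new one on each read. The following is a minimal sketch of that pattern with a structured array, followed by normalisation and concatenation applied on the fly. The dataset name `jets`, the variable names, the file name, and the use of per-batch statistics for normalisation are assumptions for illustration only, not UPP's actual file layout or dataloader.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical jet variables stored as one structured (compound) dataset.
dtype = np.dtype([("pt", "f4"), ("eta", "f4"), ("flavour_label", "i4")])
rng = np.random.default_rng(0)
jets = np.zeros(1000, dtype=dtype)
jets["pt"] = rng.uniform(20e3, 250e3, size=1000)
jets["eta"] = rng.uniform(-2.5, 2.5, size=1000)
jets["flavour_label"] = rng.integers(0, 3, size=1000)

# Write a toy file to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "train.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("jets", data=jets)

# Preallocate the batch buffer once and fill it in place with read_direct.
batch_size = 256
batch = np.empty(batch_size, dtype=dtype)
with h5py.File(path, "r") as f:
    f["jets"].read_direct(batch, source_sel=np.s_[0:batch_size])

# The file may hold more variables than are trained on: select a subset,
# then concatenate and normalise on the fly.
train_vars = ["pt", "eta"]
x = np.stack([batch[v] for v in train_vars], axis=-1)
# Placeholder normalisation using batch statistics (an assumption; a real
# pipeline would more likely use precomputed means and standard deviations).
x = (x - x.mean(axis=0)) / x.std(axis=0)
print(x.shape)
```

Because the buffer is reused, a training loop can call `read_direct` repeatedly with a different `source_sel` without allocating a new array per batch.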

