Utilities for rapid text file processing using Intel Hyperscan in Python

HyperGrep

os: linux | python: 3.10+ | python style: google | imports: isort | code style: black | code style: pycodestyle | doc style: pydocstyle | static typing: mypy | linting: pylint | testing: pytest | security: bandit | license: MIT

HyperGrep is a Python + Intel Hyperscan Global Regular Expression Processing library. While a standard grep is designed to print matches, HyperGrep is designed to give full control over how matches are processed. The library supports scanning plaintext, gzip-compressed, and zstd-compressed files for regular expressions, and lets you customize the action taken on each match.

For full information on the performance that can be obtained with Intel Hyperscan, refer to:
Hyperscan
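
The callback-driven model described above can be illustrated with a minimal standard-library sketch. This is not HyperGrep's implementation and does not use Hyperscan: it relies on Python's `re` and `gzip` modules, skips zstd, and the names `scan_file` and `found` are hypothetical. It only shows the idea of handing each match to a caller-supplied function instead of printing it.

```python
import gzip
import os
import re
import tempfile
from typing import Callable


def scan_file(path: str, pattern: str, on_match: Callable[[int, str], None]) -> int:
    """Call on_match(line_number, line) for each matching line; return the match count."""
    regex = re.compile(pattern)
    # Pick the opener from the file extension: gzip for .gz, plain open otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    hits = 0
    with opener(path, "rt", errors="ignore") as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                on_match(number, line.rstrip("\n"))
                hits += 1
    return hits


# Demo: write a small gzip file, then collect matches instead of printing them.
found = []
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log.gz")
    with gzip.open(path, "wt") as handle:
        handle.write("INFO started\nERROR disk full\nINFO done\n")
    total = scan_file(path, r"ERROR", lambda number, line: found.append((number, line)))

print(total, found)  # 1 [(2, 'ERROR disk full')]
```

Because the caller owns the callback, the same scan can feed a counter, a database writer, or an alert without changing the scanning code.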

Compatibility

  • To guarantee performance, Hyperscan does not support every regex construct. For details, refer to Unsupported Constructs.
  • Currently supported only on Linux. It may be possible to build manually on Windows/OSX.
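
As a rough illustration of the first point, the sketch below flags a few constructs that Hyperscan documents as unsupported (backreferences, lookahead and lookbehind assertions, atomic groups) before a pattern is handed to the scanner. The helper `find_unsupported` is hypothetical and not part of HyperGrep. The check is a text heuristic, so it is incomplete and can produce false positives on escaped characters; Hyperscan's own compiler remains the authority on what it accepts.

```python
import re

# Heuristic markers for a few constructs Hyperscan lists as unsupported.
# Illustrative and incomplete: escaped sequences such as a literal "\\1"
# may be flagged incorrectly.
_UNSUPPORTED = {
    "backreference": re.compile(r"\\[1-9]|\(\?P="),
    "lookahead": re.compile(r"\(\?[=!]"),
    "lookbehind": re.compile(r"\(\?<[=!]"),
    "atomic group": re.compile(r"\(\?>"),
}


def find_unsupported(pattern: str) -> list[str]:
    """Return the names of suspicious constructs found in the pattern text."""
    return [name for name, marker in _UNSUPPORTED.items() if marker.search(pattern)]


print(find_unsupported(r"(a)\1"))        # ['backreference']
print(find_unsupported(r"foo(?=bar)"))   # ['lookahead']
print(find_unsupported(r"fo+bar\d{2}"))  # []
```

A pre-check like this can give a friendlier error message than a compile failure, but it should not replace testing the pattern against the library itself.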

Getting Started

Installation

Install HyperGrep via pip:

pip install hypergrep

Or via git clone:

git clone <path to fork>
cd hypergrep
pip install .

Or build and install from wheel:

# Build locally.
git clone <path to fork>
cd hypergrep
make wheel

# Copy dist/hypergrep*.tar.gz to the environment where it will be installed.
pip install dist/hypergrep*.tar.gz

Examples

Read a file with the example single-threaded command:

# hypergrep <regex> <file>
hypergrep/scanner.py pattern ./hypergrep/scanner.py

Read multiple files with the multithreaded hyperscanner example command:

# hypergrep <regex> <file(s)>
hypergrep pattern ./hypergrep/scanner.py

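Perform a custom operation on each match:

from hypergrep.common import hyper_utils

# Example inputs; replace with your own file path and regular expression.
file = './hypergrep/scanner.py'
pattern = 'pattern'


def on_match(matches: list, count: int) -> None:
    """Print each matched line with a custom prefix."""
    # Process the first `count` entries of `matches`.
    for index in range(count):
        match = matches[index]
        # match.line is bytes; decode it and drop undecodable bytes.
        line = match.line.decode(errors='ignore')
        print(f'Custom print: {line.rstrip()}')


hyper_utils.hyperscan(file, [pattern], on_match)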
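
A callback does not have to print. The sketch below tallies matched lines by log level, using the same `(matches, count)` signature and the `match.line` bytes attribute shown above. The `LevelCounter` class and the stand-in match objects are hypothetical, so the example runs without HyperGrep installed. Whether the library accepts a callable object, and whether state accumulated in the callback is visible afterwards, depends on how HyperGrep invokes the callback, which this README does not state.

```python
from collections import Counter
from types import SimpleNamespace


class LevelCounter:
    """Callable that tallies matched lines by the first log level found in each."""

    def __init__(self, levels=("ERROR", "WARNING", "INFO")):
        self.levels = levels
        self.counts = Counter()

    def __call__(self, matches: list, count: int) -> None:
        for index in range(count):
            # Lines are assumed to arrive as bytes, as in the example above.
            line = matches[index].line.decode(errors="ignore")
            for level in self.levels:
                if level in line:
                    self.counts[level] += 1
                    break


# Stand-in match objects (assumption: only the .line attribute is needed).
fake_matches = [
    SimpleNamespace(line=b"ERROR disk full\n"),
    SimpleNamespace(line=b"INFO started\n"),
    SimpleNamespace(line=b"ERROR timeout\n"),
]

counter = LevelCounter()
counter(fake_matches, len(fake_matches))
print(dict(counter.counts))  # {'ERROR': 2, 'INFO': 1}
```

With HyperGrep installed, the same object could be passed in place of `on_match`, for example `hyper_utils.hyperscan(file, [pattern], counter)`, subject to the assumptions noted above.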

Source Distribution

hypergrep-3.0.0.tar.gz (3.7 MB)
