Utilities for rapid text file processing using Intel Hyperscan in Python

HyperGrep

HyperGrep is a Python + Intel Hyperscan Global Regular Expression Processing library. While a standard grep is designed to print matches, HyperGrep is designed to give full control over how matches are processed. The library supports scanning plaintext, gzip, and zstd compressed files for regular expressions, and customizing the action taken on each match.

For full information on the amazing performance that can be obtained with Intel Hyperscan, refer to:
Hyperscan

Compatibility
Getting Started
- Installation
Examples

Compatibility

To guarantee performance, Hyperscan does not support every regex construct. For more information, refer to Unsupported Constructs.
HyperGrep is currently supported only on Linux. It may be possible to build it manually on Windows/OSX.
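
Because some constructs are rejected at compile time, it can help to screen patterns before handing them to the scanner. The sketch below is not part of HyperGrep: `find_unsupported` and `_UNSUPPORTED_HINTS` are hypothetical names, and the checks are rough heuristics covering only two constructs that Hyperscan documents as unsupported (lookaround assertions and backreferences). They can misfire on unusual patterns, for example when the same characters appear inside a character class.

```python
import re

# Heuristic probes only (an assumption, not HyperGrep's own validation).
# Each probe looks for syntax that Hyperscan documents as unsupported.
_UNSUPPORTED_HINTS = {
    # (?=...), (?!...), (?<=...), (?<!...)
    "lookaround assertion": re.compile(r"\(\?<?[=!]"),
    # \1 .. \9 where the backslash is not itself escaped
    "backreference": re.compile(r"(?<!\\)(?:\\\\)*\\[1-9]"),
}


def find_unsupported(pattern: str) -> list:
    """Return the names of suspicious constructs found in `pattern`."""
    return [
        name
        for name, probe in _UNSUPPORTED_HINTS.items()
        if probe.search(pattern)
    ]


for candidate in (r"error \d+", r"foo(?=bar)", r"(\w+) \1"):
    issues = find_unsupported(candidate)
    print(candidate, "->", issues or "no known issues")
```

An empty result does not prove the pattern will compile; the Unsupported Constructs reference remains the authority.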

Getting Started

Installation

Install HyperGrep via pip:

pip install hypergrep

Or via git clone:

git clone <path to fork>
cd hypergrep
pip install .

Or build and install from wheel:

# Build locally.
git clone <path to fork>
cd hypergrep
make wheel

# Push dist/hypergrep*.tar.gz to environment where it will be installed.
pip install dist/hypergrep*.tar.gz

Examples

Read a file with the example single-threaded command:

# hypergrep <regex> <file>
hypergrep/scanner.py pattern ./hypergrep/scanner.py

Read multiple files with the multithreaded hyperscanner example command:

# hypergrep <regex> <file(s)>
hypergrep pattern ./hypergrep/scanner.py

Perform a custom operation on each match:

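from hypergrep.common import hyper_utils

def on_match(matches: list, count: int) -> None:
    # Process only the first `count` entries of `matches`.
    for index in range(count):
        match = matches[index]
        # match.line is bytes; decode it, ignoring undecodable bytes.
        line = match.line.decode(errors='ignore')
        print(f'Custom print: {line.rstrip()}')


# `file` (the path to scan) and `pattern` (the regex) must be defined by the caller.
hyper_utils.hyperscan(file, [pattern], on_match)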
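
The callback does not have to print. As a minimal sketch, the variant below collects matched lines into a list so they can be processed after the scan. It relies only on what the example above shows: a callback taking `(matches, count)` and match objects with a bytes `line` attribute. `make_collector` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the snippet runs without HyperGrep installed.

```python
from types import SimpleNamespace


def make_collector():
    """Return a callback and the list it appends decoded lines to."""
    results = []

    def on_match(matches: list, count: int) -> None:
        # Mirror the callback shape from the example above.
        for index in range(count):
            line = matches[index].line.decode(errors='ignore')
            results.append(line.rstrip())

    return on_match, results


# Stand-in match objects (an assumption for illustration only).
fake_matches = [
    SimpleNamespace(line=b'first pattern hit\n'),
    SimpleNamespace(line=b'second pattern hit\n'),
    SimpleNamespace(line=b'unused slot\n'),
]

on_match, results = make_collector()
on_match(fake_matches, 2)  # only the first two entries are consumed
print(results)
```

With HyperGrep installed, the same `on_match` could presumably be passed as the third argument to `hyper_utils.hyperscan` in place of the printing callback.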
