Utilities for rapid text file processing using Intel Hyperscan in Python
HyperGrep
HyperGrep is a Python library for global regular expression processing, built on Intel Hyperscan. Where a standard grep is designed to print matches, HyperGrep is designed to give full control over how matches are processed. The library supports scanning plaintext, gzip, and zstd compressed files for regular expressions, and customizing the action to take on each match.
For full information on the performance that can be obtained through Intel Hyperscan, refer to:
Hyperscan
Compatibility
- To guarantee performance, Hyperscan does not support every regex construct. For more information, refer to Unsupported Constructs.
- Currently supported only on Linux. It may be possible to build it manually on Windows/OSX.
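As a rough illustration of the first point, backreferences, lookaround assertions, and atomic groups are among the constructs Hyperscan does not support. The sketch below is a heuristic pre-check that uses only the Python standard library. `find_unsupported` is a hypothetical helper, not part of HyperGrep, and its list is deliberately incomplete; the Unsupported Constructs page remains the authoritative reference.

```python
import re

# Heuristic checks for a few constructs that Hyperscan does not support.
# Illustrative only: the list is not exhaustive, and plain text matching
# like this can misfire on escaped sequences such as a literal backslash
# followed by a digit.
_UNSUPPORTED = {
    'lookahead': re.compile(r'\(\?[=!]'),
    'lookbehind': re.compile(r'\(\?<[=!]'),
    'backreference': re.compile(r'\\[1-9]|\(\?P=|\\k<'),
    'atomic group': re.compile(r'\(\?>'),
}


def find_unsupported(pattern: str) -> list:
    """Return the names of suspect constructs found in `pattern`.

    Hypothetical helper for illustration; not part of HyperGrep.
    """
    return [name for name, check in _UNSUPPORTED.items() if check.search(pattern)]


print(find_unsupported(r'(\w+) \1'))    # pattern with a backreference
print(find_unsupported(r'foo(?=bar)'))  # pattern with a lookahead
print(find_unsupported(r'error: \d+'))  # plain pattern, nothing flagged
```

A check like this can only suggest that a pattern deserves a second look. Whether a pattern actually compiles is decided by Hyperscan itself.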
Getting Started
Installation
Install HyperGrep via pip:
pip install hypergrep
Or via git clone:
git clone <path to fork>
cd hypergrep
pip install .
Or build and install from wheel:
# Build locally.
git clone <path to fork>
cd hypergrep
make wheel
# Push dist/hypergrep*.tar.gz to the environment where it will be installed.
pip install dist/hypergrep*.tar.gz
Examples
Read a file with the example single-threaded command:
# hypergrep <regex> <file>
hypergrep/scanner.py pattern ./hypergrep/scanner.py
Read multiple files with the multithreaded hyperscanner example command:
# hypergrep <regex> <file(s)>
hypergrep pattern ./hypergrep/scanner.py
Perform custom operation on match:
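from hypergrep.common import hyper_utils


def on_match(matches: list, count: int) -> None:
    # Process the first `count` entries of `matches`.
    for index in range(count):
        match = matches[index]
        line = match.line.decode(errors='ignore')
        print(f'Custom print: {line.rstrip()}')


# `file` (the file to scan) and `pattern` (the regex) must be defined by the caller.
hyper_utils.hyperscan(file, [pattern], on_match)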
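The callback does not have to print. Below is a minimal sketch of a callback that collects matching lines instead. It assumes only what the example above shows: the callback receives `matches` and `count`, and each match exposes a `line` attribute holding bytes. `LineCollector` is a hypothetical helper, not part of HyperGrep, and the `SimpleNamespace` objects are stand-ins for real match objects so the sketch runs without Hyperscan installed.

```python
from types import SimpleNamespace


class LineCollector:
    """Hypothetical callback object that stores decoded matching lines."""

    def __init__(self) -> None:
        self.lines = []

    def __call__(self, matches: list, count: int) -> None:
        # Mirror the example callback: read only the first `count` entries.
        for index in range(count):
            line = matches[index].line.decode(errors='ignore')
            self.lines.append(line.rstrip())


# Stand-in match objects; real ones would be supplied by the scanner.
fake_matches = [
    SimpleNamespace(line=b'first hit\n'),
    SimpleNamespace(line=b'second hit\n'),
    SimpleNamespace(line=b'left over from an earlier batch\n'),
]

collector = LineCollector()
collector(fake_matches, 2)  # only the first two entries are read
print(collector.lines)
```

Under the same assumptions, an instance could be passed in place of `on_match`, for example `hyper_utils.hyperscan(file, [pattern], collector)`. Whether the library accepts a callable object rather than a plain function, and whether collected state survives the multithreaded scanner, are not confirmed here and would need checking against the library.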