Skip to main content

Fast multi-keyword search engine for text strings

Project description

Author: Stefan Behnel

What is Acora?

Acora is ‘fgrep’ for Python, a fast multi-keyword text search engine.

Based on a set of keywords, it generates a search automaton (DFA) and runs it over string input, either unicode or bytes.

It is based on the Aho-Corasick algorithm and an NFA-to-DFA transformation.

Features

  • works with unicode strings and byte strings

  • about 2-3x as fast as Python’s regular expression engine

  • finds overlapping matches, i.e. all matches of all keywords

  • support for case insensitive search (~10x as fast as ‘re’)

  • frees the GIL while searching

  • additional (slow but short) pure Python implementation

  • support for Python 2.5+ and 3.x

  • support for searching in files

How do I use it?

Import the package:

>>> from acora import AcoraBuilder

Collect some keywords:

>>> builder = AcoraBuilder('ab', 'bc', 'de')
>>> builder.add('a', 'b')

Generate the Acora search engine:

>>> ac = builder.build()

Search a string for all occurrences:

>>> ac.findall('abc')
[('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
>>> ac.findall('abde')
[('a', 0), ('ab', 0), ('b', 1), ('de', 2)]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

acora-1.0.tar.gz (49.4 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page