acora 1.0
Fast multi-keyword search engine for text strings
Latest Version: 1.7
Author: Stefan Behnel
What is Acora?
Acora is 'fgrep' for Python, a fast multi-keyword text search engine.
Based on a set of keywords, it generates a search automaton (DFA) and runs it over string input, either unicode or bytes.
It is based on the Aho-Corasick algorithm and an NFA-to-DFA transformation.
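To make the idea concrete, here is a minimal pure-Python sketch of Aho-Corasick matching. It is not Acora's implementation: it follows failure links at search time instead of compiling them away with an NFA-to-DFA transformation, and the names `build_automaton` and `find_all` are hypothetical helpers chosen for this illustration.

```python
from collections import deque


def build_automaton(keywords):
    """Build a keyword trie with Aho-Corasick failure links (illustrative only)."""
    goto = [{}]   # goto[node] maps a character to the child node
    fail = [0]    # fail[node] is the node for the longest proper suffix in the trie
    out = [[]]    # out[node] lists every keyword that ends at this node

    # Phase 1: insert all keywords into the trie.
    for kw in keywords:
        node = 0
        for ch in kw:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(kw)

    # Phase 2: breadth-first pass that sets failure links and merges outputs.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(keywords, text):
    """Return (keyword, start_position) for every match, overlaps included."""
    goto, fail, out = build_automaton(keywords)
    node = 0
    matches = []
    for i, ch in enumerate(text):
        # Fall back along failure links until a transition exists (or the root).
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for kw in out[node]:
            matches.append((kw, i - len(kw) + 1))
    return matches
```

A DFA-based engine precomputes the fallback steps into a transition table, so each input character costs a single lookup; the sketch above trades that speed for brevity.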
Features
- works with unicode strings and byte strings
- about 2-3x as fast as Python's regular expression engine
- finds overlapping matches, i.e. all matches of all keywords
- support for case-insensitive search (~10x as fast as 're')
- frees the GIL while searching
- additional (slow but short) pure Python implementation
- support for Python 2.5+ and 3.x
- support for searching in files
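The "overlapping matches" feature is easiest to see by contrast with a regular expression alternation, which consumes the text it matches and therefore reports at most one keyword per stretch of input. The sketch below uses only the standard library; `overlapping_hits` is a hypothetical brute-force helper written for this comparison, not part of Acora, and it makes no claim about relative speed.

```python
import re

keywords = ['ab', 'bc', 'de', 'a', 'b']
text = 'abc'

# An alternation of all keywords: matches never overlap.
pattern = re.compile('|'.join(re.escape(k) for k in keywords))
regex_hits = [(m.group(), m.start()) for m in pattern.finditer(text)]


def overlapping_hits(keywords, text):
    """Brute-force search that reports every occurrence of every keyword."""
    hits = []
    for kw in keywords:
        start = text.find(kw)
        while start != -1:
            hits.append((kw, start))
            start = text.find(kw, start + 1)
    # Sort by position, then keyword, for a stable order.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


all_hits = overlapping_hits(keywords, text)
print(regex_hits)  # [('ab', 0)]
print(all_hits)    # [('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
```

The regular expression finds only 'ab', while the exhaustive search also reports 'a', 'b' and 'bc', which is the set of results a multi-keyword engine is expected to return.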
How do I use it?
Import the package:
>>> from acora import AcoraBuilder
Collect some keywords:
>>> builder = AcoraBuilder('ab', 'bc', 'de')
>>> builder.add('a', 'b')
Generate the Acora search engine:
>>> ac = builder.build()
Search a string for all occurrences:
>>> ac.findall('abc')
[('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
>>> ac.findall('abde')
[('a', 0), ('ab', 0), ('b', 1), ('de', 2)]
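Because every keyword occurrence is reported, callers often want to filter the result list. The following sketch operates on a plain list of `(keyword, position)` tuples shaped like the output above, so it runs without Acora installed; `longest_per_position` is a hypothetical helper name, and keeping the longest keyword per start position is only one of several reasonable filtering policies.

```python
from itertools import groupby
from operator import itemgetter


def longest_per_position(matches):
    """Keep only the longest keyword among matches that share a start position."""
    by_position = sorted(matches, key=itemgetter(1))
    result = []
    for _pos, group in groupby(by_position, key=itemgetter(1)):
        result.append(max(group, key=lambda match: len(match[0])))
    return result


# The result list shown above for the input 'abc'.
matches = [('a', 0), ('ab', 0), ('b', 1), ('bc', 1)]
print(longest_per_position(matches))  # [('ab', 0), ('bc', 1)]
```

Note that the remaining matches can still overlap each other ('ab' and 'bc' share the 'b'); removing those as well would need an additional pass over the end positions.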
| File | Type | Py Version | Uploaded on | Size | # downloads |
|---|---|---|---|---|---|
| acora-1.0.tar.gz (md5, pgp) | Source | | 2010-01-29 | 48KB | 442 |
- Home Page: http://pypi.python.org/pypi/acora
- Download URL: http://pypi.python.org/packages/source/a/acora/acora-1.0.tar.gz
Categories
- Intended Audience :: Developers
- Intended Audience :: Information Technology
- License :: OSI Approved :: BSD License
- Operating System :: OS Independent
- Programming Language :: Cython
- Programming Language :: Python :: 2
- Programming Language :: Python :: 2.5
- Programming Language :: Python :: 2.6
- Programming Language :: Python :: 3
- Programming Language :: Python :: 3.1
- Topic :: Text Processing
- Package Index Owner: scoder
- DOAP record: acora-1.0.xml
