A python implementation of the GCL region algebra and query language described by Clarke et al.
Project description
pyra - Python Region Algebra
Pyra is a python implementation of the region query algebra described in [1]. Region algebras are used to efficiently query semi-structured text documents. For a quick online introduction to this region algebra, and why it is useful, visit the [Wumpus Search Docs](http://www.wumpus-search.org/docs/gcl.html). In general, region algebras are good for extracting data from documents that have lightweight structure (semi-structured), and are an alternative to more heavyweight solutions like XPath queries.
# Setup the corpus corpus = “the quick brown fox jumps over the lazy dog and the brown dog runs away” tokens = corpus.split()
# List regions starting with ‘brown’ and ending with ‘dog’, containing # the phrase ‘fox jumps over’.
iidx = InvertedIndex(tokens) g = GCL(iidx)
- for s in g.Contains( g.BoundedBy( g.Term(‘brown’), g.Term(‘dog’) ), g.Phrase(‘fox’, ‘jumps’, ‘over’) ):
print s print “’%s’” % (tokens[s],)
The above prints:
slice(2,9)
‘brown fox jumps over the lazy dog’
References
- [1] Clarke, C. L., Cormack, G. V., & Burkowski, F. J. (1995). An algebra for structured text search
and a framework for its implementation. The Computer Journal, 38(1), 43-56. Chicago
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.