Skip to main content

A python implementation of the GCL region algebra and query language described by Clarke et al.

Project description

pyra - Python Region Algebra

Pyra is a python implementation of the region query algebra described in [1]. Region algebras are used to efficiently query semi-structured text documents. For a quick online introduction to this region algebra, and why it is useful, visit the [Wumpus Search Docs](http://www.wumpus-search.org/docs/gcl.html). In general, region algebras are good for extracting data from documents that have lightweight structure (semi-structured), and are an alternative to more heavyweight solutions like XPath queries.

# Setup the corpus corpus = “the quick brown fox jumps over the lazy dog and the brown dog runs away” tokens = corpus.split()

# List regions starting with ‘brown’ and ending with ‘dog’, containing # the phrase ‘fox jumps over’.

iidx = InvertedIndex(tokens) g = GCL(iidx)

for s in g.Contains( g.BoundedBy( g.Term(‘brown’), g.Term(‘dog’) ), g.Phrase(‘fox’, ‘jumps’, ‘over’) ):

print s print “’%s’” % (tokens[s],)

The above prints:

slice(2,9)

‘brown fox jumps over the lazy dog’

References

[1] Clarke, C. L., Cormack, G. V., & Burkowski, F. J. (1995). An algebra for structured text search

and a framework for its implementation. The Computer Journal, 38(1), 43-56. Chicago

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyra-0.2.3dev.tar.gz (6.7 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page