pyra 0.2.3dev

A Python implementation of the GCL region algebra and query language described by Clarke et al.

Latest Version: 0.2.6dev

pyra - Python Region Algebra

Pyra is a Python implementation of the region query algebra described in [1]. Region algebras are used to query semi-structured text documents efficiently. For a quick online introduction to this region algebra and why it is useful, see the [Wumpus Search Docs](http://www.wumpus-search.org/docs/gcl.html). In general, region algebras are well suited to extracting data from documents with lightweight structure (semi-structured documents), and are an alternative to more heavyweight solutions such as XPath queries.

    # Assumption: the original listing omits its import; this import path is a guess.
    from pyra import InvertedIndex, GCL

    # Set up the corpus
    corpus = "the quick brown fox jumps over the lazy dog and the brown dog runs away"
    tokens = corpus.split()

    iidx = InvertedIndex(tokens)
    g = GCL(iidx)

    # List regions starting with 'brown' and ending with 'dog', containing
    # the phrase 'fox jumps over'.
    for s in g.Contains(
            g.BoundedBy(g.Term('brown'), g.Term('dog')),
            g.Phrase('fox', 'jumps', 'over')):
        print(s)
        # Join the tokens so the region prints as text rather than as a list
        print("'%s'" % " ".join(tokens[s]))

The above prints:

    slice(2,9)
    'brown fox jumps over the lazy dog'
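To make the semantics of the query above more concrete, here is a naive pure-Python sketch of the three operators it uses. This is not pyra's implementation: the function names (`term`, `phrase`, `minimal`, `bounded_by`, `contains`) are hypothetical, regions are plain `(start, end)` tuples with an exclusive end, and the matching is quadratic rather than index-driven. It assumes the GCL convention that a result set keeps only minimal regions (no region in the set contains another), which is consistent with the single result printed above.

```python
def term(tokens, word):
    """Regions covering each occurrence of a single token (end is exclusive)."""
    return [(i, i + 1) for i, t in enumerate(tokens) if t == word]


def phrase(tokens, *words):
    """Regions covering each occurrence of a run of consecutive tokens."""
    n = len(words)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == list(words)]


def minimal(regions):
    """Drop every region that contains another, different region in the set."""
    unique = set(regions)
    return sorted(r for r in unique
                  if not any(o != r and r[0] <= o[0] and o[1] <= r[1]
                             for o in unique))


def bounded_by(starts, ends):
    """Minimal regions that begin with a `starts` region and end with a later `ends` region."""
    return minimal([(s[0], e[1]) for s in starts for e in ends if s[1] <= e[0]])


def contains(outer, inner):
    """Regions from `outer` that fully enclose at least one region from `inner`."""
    return [o for o in outer
            if any(o[0] <= i[0] and i[1] <= o[1] for i in inner)]


corpus = "the quick brown fox jumps over the lazy dog and the brown dog runs away"
tokens = corpus.split()

hits = contains(
    bounded_by(term(tokens, 'brown'), term(tokens, 'dog')),
    phrase(tokens, 'fox', 'jumps', 'over'),
)

for start, end in hits:
    print((start, end), " ".join(tokens[start:end]))
# prints: (2, 9) brown fox jumps over the lazy dog
```

Without the `minimal` step, the longer region from the first 'brown' to the second 'dog' would also enclose the phrase and appear in the result; the minimality rule is what narrows the answer to one region in this sketch.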

References

[1] Clarke, C. L., Cormack, G. V., & Burkowski, F. J. (1995). An algebra for structured text search and a framework for its implementation. The Computer Journal, 38(1), 43-56.
 
File                  Type    Py Version  Uploaded on  Size
pyra-0.2.3dev.tar.gz  Source              2014-02-10   6KB
  • Author: Adam Fourney
  • Home Page: http://github.com/afourney/pyra
  • License:
    Copyright (c) 2014, Adam Fourney
    All rights reserved.
    
    Redistribution and use in source and binary forms, with or without modification,
    are permitted provided that the following conditions are met:
    
    * Redistributions of source code must retain the above copyright notice, this
      list of conditions and the following disclaimer.
    
    * Redistributions in binary form must reproduce the above copyright notice, this
      list of conditions and the following disclaimer in the documentation and/or
      other materials provided with the distribution.
    
    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
    ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
    WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
    DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
    ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
    (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
    LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
    ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
    SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • Package Index Owner: afourney