pyra 0.2.5dev

A Python implementation of the GCL region algebra and query language described by Clarke et al.

Latest Version: 0.2.6dev

pyra - Python Region Algebra
============================


Pyra is a Python implementation of the region algebra and query language described in [1].
Region algebras are used to query semi-structured text documents efficiently. For a quick
online introduction to this region algebra, and why it is useful, visit:

[Wumpus Search Docs](http://www.wumpus-search.org/docs/gcl.html)

In general, region algebras are good for extracting data from documents that have lightweight structure
(semi-structured), and are an alternative to more heavyweight solutions like XPath queries.


Algebra and Query Language
===========================

Our region algebra consists of the following elements, which follow essentially the same
conventions as Wumpus (see above):

Elementary Types
----------------
    "token"          Tokens are quoted strings. Use \" to escape quotes and \\ to escape backslashes
    "a", "b", "c"    Phrases are comma-separated tokens
    INT              Positions are indicated as bare integers (e.g., 4071)
    [INT]            Lengths are indicated as integers inside square brackets
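To make the elementary syntax concrete, here is a minimal tokenizer sketch. It is not pyra's parser: the token kinds (`STRING`, `LENGTH`, `POSITION`, `OP`) and the `lex` function are hypothetical names chosen for illustration, and the sketch only covers the elements listed above plus the three operators described next.

```python
import re
from typing import List, Tuple

# Hypothetical token kinds for the query syntax listed above.
_TOKEN_RE = re.compile(r'''
    (?P<STRING>"(?:\\.|[^"\\])*")   # quoted token; \" and \\ are escapes
  | (?P<LENGTH>\[\d+\])             # [INT]: a length in tokens
  | (?P<POSITION>\d+)               # INT: a bare position
  | (?P<OP>\.\.|<|>|,|\(|\))        # operators, phrase commas, parentheses
  | (?P<SPACE>\s+)                  # whitespace between tokens
''', re.VERBOSE)


def lex(query: str) -> List[Tuple[str, str]]:
    """Split a query string into (kind, text) pairs, unescaping quoted tokens."""
    out: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(query):
        m = _TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected character {query[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "STRING":
            # Drop the surrounding quotes, then turn \" into " and \\ into \.
            out.append((kind, re.sub(r'\\(.)', r'\1', text[1:-1])))
        elif kind == "LENGTH":
            out.append((kind, text[1:-1]))  # keep only the digits
        elif kind != "SPACE":
            out.append((kind, text))
        pos = m.end()
    return out


print(lex('"<title>".."</title>" < [6]'))
# [('STRING', '<title>'), ('OP', '..'), ('STRING', '</title>'), ('OP', '<'), ('LENGTH', '6')]
print(lex(r'"say \"hi\"", 4071'))
# [('STRING', 'say "hi"'), ('OP', ','), ('POSITION', '4071')]
```

Matching quoted tokens first means characters such as `<` and `>` inside a token (e.g. `"<title>"`) are never mistaken for operators.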

Operators (here A and B are arbitrary region algebra expressions)
-----------------------------------------------------------------
    A .. B    Minimal extents that start with A and end with B
    A > B     Extents of A that contain an extent of B
    A < B     Extents of A that are contained in an extent of B

More operators will be implemented in the future. Even with these three, we can express many useful queries (see below).
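The following is a naive sketch of the three operators, assuming GCL's conventions that an extent is a range of token positions and that `..` keeps only minimal extents (no result encloses another). The helper names (`tok`, `minimal`, `followed_by`, `containing`, `contained_in`) and the toy document are hypothetical and are not pyra's API. Each operator compares every pair of extents; a practical implementation would avoid this quadratic scan.

```python
from typing import List, Tuple

# An extent is a half-open range of token positions [start, end),
# like the slice(start, end) values printed in the examples.
Extent = Tuple[int, int]


def tok(tokens: List[str], word: str) -> List[Extent]:
    """Every occurrence of a single token, as a one-token extent."""
    return [(i, i + 1) for i, t in enumerate(tokens) if t == word]


def minimal(extents: List[Extent]) -> List[Extent]:
    """Sort, deduplicate, and drop any extent that encloses another one."""
    unique = sorted(set(extents))
    return [e for e in unique
            if not any(f != e and e[0] <= f[0] and f[1] <= e[1] for f in unique)]


def followed_by(a_list: List[Extent], b_list: List[Extent]) -> List[Extent]:
    """A .. B: extend each a to the nearest b that starts at or after a ends."""
    candidates = []
    for a in a_list:
        later = [b for b in b_list if b[0] >= a[1]]
        if later:
            nearest = min(later, key=lambda b: b[1])
            candidates.append((a[0], nearest[1]))
    return minimal(candidates)


def containing(a_list: List[Extent], b_list: List[Extent]) -> List[Extent]:
    """A > B: extents of A that contain at least one extent of B."""
    return [a for a in a_list
            if any(a[0] <= b[0] and b[1] <= a[1] for b in b_list)]


def contained_in(a_list: List[Extent], b_list: List[Extent]) -> List[Extent]:
    """A < B: extents of A that lie inside at least one extent of B."""
    return [a for a in a_list
            if any(b[0] <= a[0] and a[1] <= b[1] for b in b_list)]


# A toy document: one play with its own title and one act title.
doc = "<play> <title> hamlet </title> <act> <title> act i </title> </act> </play>".split()

titles = followed_by(tok(doc, "<title>"), tok(doc, "</title>"))
plays = followed_by(tok(doc, "<play>"), tok(doc, "</play>"))
print(titles)                                  # [(1, 4), (5, 9)]
print(contained_in(titles, plays))             # [(1, 4), (5, 9)]
print(containing(titles, tok(doc, "hamlet")))  # [(1, 4)]

# "<play>" .. "</title>" stops at the first closing title tag after <play>,
# so only the play's own title fits inside it.
first = contained_in(titles, followed_by(tok(doc, "<play>"), tok(doc, "</title>")))
print([doc[s:e] for s, e in first])  # [['<title>', 'hamlet', '</title>']]
```

In this toy document, every title lies inside the play, but only the play's own title lies inside the minimal extent running from `<play>` to the next `</title>`. That is one way to select just the first title in each play.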


Examples
========

Suppose we have indexed the complete works of Shakespeare, as an XML-like document. We can then
run the following queries using pyra:


Return the titles of all plays, acts, scenes, etc.

"<title>".."</title>"

Results:
slice(15,23): the tragedy of antony and cleopatra
slice(68,72): dramatis personae
slice(279,283): act i
slice(284,295): scene i alexandria a room in cleopatra s palace
slice(1097,1105): scene ii the same another room
slice(3526,3534): scene iii the same another room
slice(4889,4898): scene iv rome octavius caesar s house
slice(5885,5893): scene v alexandria cleopatra s palace

... and many more ...


Return the titles of all plays
(i.e., the first title found in the play)

("<title>".."</title>") < ("<play>".."</title>")

Results:
slice(15,23): the tragedy of antony and cleopatra
slice(40514,40522): all s well that ends well
slice(75567,75573): as you like it
slice(107909,107915): the comedy of errors
slice(130779,130785): the tragedy of coriolanus
slice(173424,173427): cymbeline
slice(214962,214969): a midsummer night s dream
slice(239304,239313): the tragedy of hamlet prince of denmark

... and many more ...


Return the titles of all plays containing the word 'henry'

(("<title>".."</title>") < ("<play>".."</title>")) > "henry"

Results:
slice(322005,322014): the second part of henry the fourth
slice(361126,361134): the life of henry the fifth
slice(399220,399229): the first part of henry the sixth
slice(431541,431550): the second part of henry the sixth
slice(469240,469249): the third part of henry the sixth
slice(505920,505932): the famous history of the life of henry the ei...


Return short play titles (4 or fewer words)
(Note: We have to include the tags in the token count)

(("<title>".."</title>") < ("<play>".."</title>")) < [6]

Results:
slice(75567,75573): as you like it
slice(107909,107915): the comedy of errors
slice(130779,130785): the tragedy of coriolanus
slice(173424,173427): cymbeline
slice(677133,677138): measure for measure
slice(744759,744765): the tragedy of macbeth
slice(771553,771559): the merchant of venice
slice(802540,802546): much ado about nothing
slice(875994,876000): pericles prince of tyre
slice(1081750,1081754): the tempest
slice(1233968,1233974): the winter s tale
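Because the result slices are half-open token ranges that include the opening and closing tags, `stop - start` counts the title's words plus two tags. The check below uses slices copied from the results above. `within_length` is a hypothetical helper that assumes `A < [n]` keeps exactly the extents of A that are at most n tokens long (ignoring document edges, an extent fits inside some length-n window exactly when it is no longer than n).

```python
# Title extents copied from the example results above.
results = {
    "the tragedy of antony and cleopatra": slice(15, 23),
    "as you like it": slice(75567, 75573),
    "cymbeline": slice(173424, 173427),
    "the tragedy of hamlet prince of denmark": slice(239304, 239313),
}


def within_length(extents, n):
    """Hypothetical A < [n]: keep extents that are at most n tokens long."""
    return {text: s for text, s in extents.items() if s.stop - s.start <= n}


for text, s in results.items():
    # Each extent covers the title's words plus its opening and closing tags.
    print(f"{text}: {s.stop - s.start} tokens = {len(text.split())} words + 2 tags")

print(sorted(within_length(results, 6)))  # ['as you like it', 'cymbeline']
```

So `[6]` admits titles of up to 6 - 2 = 4 words, which is why "cymbeline" (3 tokens) qualifies while "the tragedy of antony and cleopatra" (8 tokens) does not.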


Return the titles of all plays containing the phrase 'to be or not to be'

(("<title>".."</title>") < ("<play>".."</title>")) < (("<play>".."</play>") > ("to", "be", "or", "not", "to", "be"))

Results:
slice(239304,239313): the tragedy of hamlet prince of denmark
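A phrase such as `("to", "be", "or", "not", "to", "be")` matches wherever its tokens occur at consecutive positions, producing one extent per occurrence; the `>` operator then keeps only the plays that contain at least one such extent. The sketch below uses a hypothetical `phrase` helper and a made-up token list; pyra may locate phrases differently (for example, through a positional index).

```python
from typing import List, Tuple


def phrase(tokens: List[str], words: Tuple[str, ...]) -> List[Tuple[int, int]]:
    """Half-open extents [i, i + len(words)) where `words` occur consecutively."""
    n = len(words)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == words]


line = "to be or not to be that is the question".split()
print(phrase(line, ("to", "be", "or", "not", "to", "be")))  # [(0, 6)]
print(phrase(line, ("to", "be")))                           # [(0, 2), (4, 6)]
```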




References
==========

[1] Clarke, C. L., Cormack, G. V., & Burkowski, F. J. (1995). An algebra for structured text search
and a framework for its implementation. The Computer Journal, 38(1), 43-56.

  • File: pyra-0.2.5dev.tar.gz (source distribution, uploaded 2014-02-12, 11KB)
  • Author: Adam Fourney
  • Home Page: http://github.com/afourney/pyra
  • License:
    Copyright (c) 2014, Adam Fourney
    All rights reserved.
    
    Redistribution and use in source and binary forms, with or without modification,
    are permitted provided that the following conditions are met:
    
    * Redistributions of source code must retain the above copyright notice, this
      list of conditions and the following disclaimer.
    
    * Redistributions in binary form must reproduce the above copyright notice, this
      list of conditions and the following disclaimer in the documentation and/or
      other materials provided with the distribution.
    
    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
    ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
    WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
    DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
    ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
    (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
    LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
    ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
    SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • Package Index Owner: afourney