
Higher level query system for the zope.catalog


Hurry Query

The hurry query system for the zope.catalog builds on its catalog indexes, as well as the indexes in zc.catalog. It is in part inspired by AdvancedQuery for Zope 2 by Dieter Maurer, though it has an independent origin.

Setup

Let’s define a simple content object. First its interface:

>>> from zope.interface import Interface, Attribute, implementer
>>> class IContent(Interface):
...     f1 = Attribute('f1')
...     f2 = Attribute('f2')
...     f3 = Attribute('f3')
...     f4 = Attribute('f4')
...     t1 = Attribute('t1')
...     t2 = Attribute('t2')

And its implementation:

>>> from zope.container.contained import Contained
>>> @implementer(IContent)
... class Content(Contained):
...     def __init__(self, id, f1='', f2='', f3='', f4='', t1='', t2=''):
...         self.id = id
...         self.f1 = f1
...         self.f2 = f2
...         self.f3 = f3
...         self.f4 = f4
...         self.t1 = t1
...         self.t2 = t2
...     def __lt__(self, other):
...         # Order by id so sorted() gives a predictable result order.
...         return self.id < other.id

The id attribute is just there so we can easily identify the objects we find. By defining the __lt__ method we make sure search results can be sorted by id, so the displayed order is stable.

We use a fake int id utility here so we can test independently of the full-blown Zope environment:

>>> from zope import interface
>>> import zope.intid.interfaces
>>> @interface.implementer(zope.intid.interfaces.IIntIds)
... class DummyIntId(object):
...     MARKER = '__dummy_int_id__'
...     def __init__(self):
...         self.counter = 0
...         self.data = {}
...     def register(self, obj):
...         intid = getattr(obj, self.MARKER, None)
...         if intid is None:
...             setattr(obj, self.MARKER, self.counter)
...             self.data[self.counter] = obj
...             intid = self.counter
...             self.counter += 1
...         return intid
...     def getObject(self, intid):
...         return self.data[intid]
...     def __iter__(self):
...         return iter(self.data)
>>> intid = DummyIntId()
>>> from zope.component import provideUtility
>>> provideUtility(intid, zope.intid.interfaces.IIntIds)
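The registration logic of DummyIntId does not depend on Zope at all. As a sketch, here is the same logic as a plain class (the names PlainIntIds and Thing are hypothetical, and the IIntIds declaration is dropped), showing that ids are handed out starting at 0 and that registering the same object twice returns the same id:

```python
class PlainIntIds(object):
    """Zope-free copy of DummyIntId's registration logic (hypothetical name)."""
    MARKER = '__dummy_int_id__'

    def __init__(self):
        self.counter = 0
        self.data = {}

    def register(self, obj):
        # Reuse the id already stored on the object, if any.
        intid = getattr(obj, self.MARKER, None)
        if intid is None:
            # First registration: stamp the object and remember it.
            setattr(obj, self.MARKER, self.counter)
            self.data[self.counter] = obj
            intid = self.counter
            self.counter += 1
        return intid

    def getObject(self, intid):
        return self.data[intid]


class Thing(object):
    """Hypothetical stand-in for a content object."""


ids = PlainIntIds()
first, second = Thing(), Thing()
print(ids.register(first), ids.register(second), ids.register(first))  # 0 1 0
```

Note that the check is `is None` rather than a truth test: 0 is a valid id, and a truth test would wrongly re-register the first object.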

Now let’s register a catalog:

>>> from zope.catalog.interfaces import ICatalog
>>> from zope.catalog.catalog import Catalog
>>> catalog = Catalog()
>>> provideUtility(catalog, ICatalog, 'catalog1')

And set it up with various indexes:

>>> from zope.catalog.field import FieldIndex
>>> from zope.catalog.text import TextIndex
>>> catalog['f1'] = FieldIndex('f1', IContent)
>>> catalog['f2'] = FieldIndex('f2', IContent)
>>> catalog['f3'] = FieldIndex('f3', IContent)
>>> catalog['f4'] = FieldIndex('f4', IContent)
>>> catalog['t1'] = TextIndex('t1', IContent)
>>> catalog['t2'] = TextIndex('t2', IContent)

Now let’s create some objects so that they’ll be cataloged:

>>> content = [
... Content(1, 'a', 'b', 'd'),
... Content(2, 'a', 'c'),
... Content(3, 'X', 'c'),
... Content(4, 'a', 'b', 'e'),
... Content(5, 'X', 'b', 'e'),
... Content(6, 'Y', 'Z')]

And catalog them now:

>>> for entry in content:
...     catalog.index_doc(intid.register(entry), entry)
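Conceptually, a field index keeps a mapping from each attribute value to the set of int ids of the documents that have it. The sketch below builds such a mapping with plain dicts and sets for the f1 values above; it only illustrates the idea (the helper build_index is hypothetical, and the real index is more elaborate). Since the dummy utility hands out ids starting at 0, Content 1 receives int id 0, Content 2 int id 1, and so on:

```python
from collections import defaultdict

# f1 value of each Content object above, keyed by the int id it received.
f1_by_intid = {0: 'a', 1: 'a', 2: 'X', 3: 'a', 4: 'X', 5: 'Y'}


def build_index(values_by_intid):
    """Map each value to the set of int ids that have it (hypothetical helper)."""
    index = defaultdict(set)
    for intid, value in values_by_intid.items():
        index[value].add(intid)
    return dict(index)


f1_index = build_index(f1_by_intid)
print(sorted(f1_index['a']))  # [0, 1, 3]
```

An equality query then becomes a single lookup in this mapping.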

Now let’s register a query utility:

>>> from hurry.query.query import Query
>>> from hurry.query.interfaces import IQuery
>>> provideUtility(Query(), IQuery)

Set up some code to make querying and displaying the results easy:

>>> from zope.component import getUtility
>>> from hurry.query.interfaces import IQuery
>>> def displayQuery(q):
...     query = getUtility(IQuery)
...     r = query.searchResults(q)
...     return [e.id for e in sorted(list(r))]

FieldIndex Queries

Now for a query where f1 equals 'a':

>>> from hurry.query import Eq
>>> f1 = ('catalog1', 'f1')
>>> displayQuery(Eq(f1, 'a'))
[1, 2, 4]

Not equals (this is more efficient than the generic ~ operator):

>>> from hurry.query import NotEq
>>> displayQuery(NotEq(f1, 'a'))
[3, 5, 6]

Testing whether a field is in a set:

>>> from hurry.query import In
>>> displayQuery(In(f1, ['a', 'X']))
[1, 2, 3, 4, 5]

Whether documents are in a specified range:

>>> from hurry.query import Between
>>> displayQuery(Between(f1, 'X', 'Y'))
[3, 5, 6]

You can leave out one end of the range:

>>> displayQuery(Between(f1, 'X', None)) # 'X' < 'a'
[1, 2, 3, 4, 5, 6]
>>> displayQuery(Between(f1, None, 'X'))
[3, 5]
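The results above follow from ordinary string ordering: uppercase ASCII letters sort before lowercase ones, so 'X' < 'Y' < 'a'. The following plain-Python sketch (the between helper is hypothetical, not hurry.query's implementation) treats None as an open end of the range, which matches how the examples above behave:

```python
# f1 values from the FieldIndex example, keyed by Content id.
f1_values = {1: 'a', 2: 'a', 3: 'X', 4: 'a', 5: 'X', 6: 'Y'}


def between(values, low, high):
    """Ids whose value v satisfies low <= v <= high; None leaves that end open."""
    return sorted(
        key for key, v in values.items()
        if (low is None or low <= v) and (high is None or v <= high)
    )


print(between(f1_values, 'X', 'Y'))   # [3, 5, 6]
print(between(f1_values, 'X', None))  # [1, 2, 3, 4, 5, 6]
print(between(f1_values, None, 'X'))  # [3, 5]
```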

You can also use greater-equals and lesser-equals for the same purpose:

>>> from hurry.query import Ge, Le
>>> displayQuery(Ge(f1, 'X'))
[1, 2, 3, 4, 5, 6]
>>> displayQuery(Le(f1, 'X'))
[3, 5]

It’s also possible to negate a query with the ~ operator:

>>> displayQuery(~Eq(f1, 'a'))
[3, 5, 6]
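Negation can be read as a set complement: every indexed document minus the documents matching the inner query. A plain-set sketch of that reading (an illustration of the semantics, not the library's code), which gives the same ids as NotEq(f1, 'a') above:

```python
# Content ids and their f1 values from the FieldIndex example.
f1_values = {1: 'a', 2: 'a', 3: 'X', 4: 'a', 5: 'X', 6: 'Y'}
all_ids = set(f1_values)

# Documents matching Eq(f1, 'a').
eq_a = {key for key, v in f1_values.items() if v == 'a'}

# ~Eq(f1, 'a') read as a complement against all indexed ids.
not_eq_a = all_ids - eq_a

print(sorted(not_eq_a))  # [3, 5, 6]
```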

Using and (&):

>>> f2 = ('catalog1', 'f2')
>>> displayQuery(Eq(f1, 'a') & Eq(f2, 'b'))
[1, 4]

Using or (|):

>>> displayQuery(Eq(f1, 'a') | Eq(f2, 'b'))
[1, 2, 4, 5]

These can be chained:

>>> displayQuery(Eq(f1, 'a') & Eq(f2, 'b') & Between(f1, 'a', 'b'))
[1, 4]
>>> displayQuery(Eq(f1, 'a') | Eq(f1, 'X') | Eq(f2, 'b'))
[1, 2, 3, 4, 5]

And nested:

>>> displayQuery((Eq(f1, 'a') | Eq(f1, 'X')) & (Eq(f2, 'b') | Eq(f2, 'c')))
[1, 2, 3, 4, 5]

“and” and “or” can also be spelled with the And and Or query classes:

>>> from hurry.query import And, Or
>>> displayQuery(And(Eq(f1, 'a'), Eq(f2, 'b')))
[1, 4]
>>> displayQuery(Or(Eq(f1, 'a'), Eq(f2, 'b')))
[1, 2, 4, 5]
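The & and | operators, and the And and Or spellings, combine result sets by intersection and union. The sketch below models that with a hypothetical SetQuery wrapper and hypothetical eq, and_ and or_ helpers over plain Python sets; hurry.query's own query classes evaluate against the catalog indexes instead, so treat this as an illustration of the semantics only:

```python
from functools import reduce

# f1 and f2 values from the FieldIndex example, keyed by Content id.
f1_values = {1: 'a', 2: 'a', 3: 'X', 4: 'a', 5: 'X', 6: 'Y'}
f2_values = {1: 'b', 2: 'c', 3: 'c', 4: 'b', 5: 'b', 6: 'Z'}


class SetQuery(object):
    """Hypothetical stand-in for a query: wraps a precomputed set of ids."""

    def __init__(self, ids):
        self.ids = frozenset(ids)

    def __and__(self, other):
        # & keeps only ids matched by both sides.
        return SetQuery(self.ids & other.ids)

    def __or__(self, other):
        # | keeps ids matched by either side.
        return SetQuery(self.ids | other.ids)


def eq(values, wanted):
    return SetQuery(k for k, v in values.items() if v == wanted)


def and_(*queries):
    # Fold any number of queries with &, like And(...).
    return reduce(lambda a, b: a & b, queries)


def or_(*queries):
    # Fold any number of queries with |, like Or(...).
    return reduce(lambda a, b: a | b, queries)


print(sorted(and_(eq(f1_values, 'a'), eq(f2_values, 'b')).ids))  # [1, 4]
print(sorted(or_(eq(f1_values, 'a'), eq(f2_values, 'b')).ids))   # [1, 2, 4, 5]
nested = ((eq(f1_values, 'a') | eq(f1_values, 'X'))
          & (eq(f2_values, 'b') | eq(f2_values, 'c')))
print(sorted(nested.ids))  # [1, 2, 3, 4, 5]
```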

Combination of In and &

A combination of ‘In’ and ‘&’:

>>> displayQuery(In(f1, ['a', 'X', 'Y', 'Z']))
[1, 2, 3, 4, 5, 6]
>>> displayQuery(In(f1, ['Z']))
[]
>>> displayQuery(In(f1, ['a', 'X', 'Y', 'Z']) & In(f1, ['Z']))
[]

SetIndex queries

The SetIndex is defined in zc.catalog. Let’s make a catalog which uses it:

>>> intid = DummyIntId()
>>> provideUtility(intid, zope.intid.interfaces.IIntIds)
>>> from zope.catalog.interfaces import ICatalog
>>> from zope.catalog.catalog import Catalog
>>> catalog = Catalog()
>>> provideUtility(catalog, ICatalog, 'catalog1')
>>> from zc.catalog.catalogindex import SetIndex
>>> catalog['f1'] = SetIndex('f1', IContent)
>>> catalog['f2'] = FieldIndex('f2', IContent)

First let’s set up some new data:

>>> content = [
... Content(1, ['a', 'b', 'c'], 1),
... Content(2, ['a'], 1),
... Content(3, ['b'], 1),
... Content(4, ['c', 'd'], 2),
... Content(5, ['b', 'c'], 2),
... Content(6, ['a', 'c'], 2)]

And catalog them now:

>>> for entry in content:
...     catalog.index_doc(intid.register(entry), entry)

Now do an ‘any of’ query, which returns all documents that contain any of the values listed:

>>> from hurry.query.set import AnyOf
>>> displayQuery(AnyOf(f1, ['a', 'c']))
[1, 2, 4, 5, 6]
>>> displayQuery(AnyOf(f1, ['c', 'b']))
[1, 3, 4, 5, 6]
>>> displayQuery(AnyOf(f1, ['a']))
[1, 2, 6]
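An ‘any of’ query matches a document when its set of values overlaps the requested values at all. A plain-Python sketch using the data above (the any_of helper is hypothetical, not the SetIndex implementation):

```python
# f1 value sets from the SetIndex example, keyed by Content id.
set_values = {
    1: {'a', 'b', 'c'},
    2: {'a'},
    3: {'b'},
    4: {'c', 'd'},
    5: {'b', 'c'},
    6: {'a', 'c'},
}


def any_of(values, wanted):
    """Ids whose value set shares at least one element with wanted."""
    wanted = set(wanted)
    return sorted(k for k, v in values.items() if v & wanted)


print(any_of(set_values, ['a', 'c']))  # [1, 2, 4, 5, 6]
print(any_of(set_values, ['c', 'b']))  # [1, 3, 4, 5, 6]
```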

Do an ‘all of’ query, which returns all documents that contain all of the values listed:

>>> from hurry.query.set import AllOf
>>> displayQuery(AllOf(f1, ['a']))
[1, 2, 6]
>>> displayQuery(AllOf(f1, ['a', 'b']))
[1]
>>> displayQuery(AllOf(f1, ['a', 'c']))
[1, 6]
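An ‘all of’ query is a subset test: the requested values must all appear in the document's value set. A plain-Python sketch with a hypothetical all_of helper; note that for a single value it gives the same result as an ‘any of’ query, as the two ['a'] examples above show:

```python
# f1 value sets from the SetIndex example, keyed by Content id.
set_values = {
    1: {'a', 'b', 'c'},
    2: {'a'},
    3: {'b'},
    4: {'c', 'd'},
    5: {'b', 'c'},
    6: {'a', 'c'},
}


def all_of(values, wanted):
    """Ids whose value set contains every element of wanted."""
    wanted = set(wanted)
    return sorted(k for k, v in values.items() if wanted <= v)


print(all_of(set_values, ['a', 'b']))  # [1]
print(all_of(set_values, ['a', 'c']))  # [1, 6]
```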

We can combine this with other queries:

>>> displayQuery(AnyOf(f1, ['a']) & Eq(f2, 1))
[1, 2]

ValueIndex queries

The ValueIndex is defined in zc.catalog and provides a generalization of the standard field index.

>>> from hurry.query import value

Let’s set up a catalog that uses this index:

>>> intid = DummyIntId()
>>> provideUtility(intid, zope.intid.interfaces.IIntIds)
>>> from zope.catalog.interfaces import ICatalog
>>> from zope.catalog.catalog import Catalog
>>> catalog = Catalog()
>>> provideUtility(catalog, ICatalog, 'catalog1')
>>> from zc.catalog.catalogindex import ValueIndex
>>> catalog['f1'] = ValueIndex('f1', IContent)

Next we set up some content data to fill the indices:

>>> content = [
... Content(1, 'a'),
... Content(2, 'b'),
... Content(3, 'c'),
... Content(4, 'd'),
... Content(5, 'c'),
... Content(6, 'a')]

And catalog them now:

>>> for entry in content:
...     catalog.index_doc(intid.register(entry), entry)

Let’s now query for all objects where f1 equals ‘a’:

>>> f1 = ('catalog1', 'f1')
>>> displayQuery(value.Eq(f1, 'a'))
[1, 6]

Next, let’s find all objects where f1 does not equal ‘a’; this is more efficient than the generic ~ operator:

>>> displayQuery(value.NotEq(f1, 'a'))
[2, 3, 4, 5]

Even if every item in the catalog satisfies the NotEq condition, the query does not fail:

>>> displayQuery(value.NotEq(f1, 'z'))
[1, 2, 3, 4, 5, 6]
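One way to read NotEq is “all indexed documents minus those equal to the value”. Under that reading, a value that no document has simply subtracts nothing, which is the situation covered by the 0.9.3 bug fix listed under CHANGES. A plain-set sketch with a hypothetical not_eq helper (not the library's code):

```python
# f1 values from the ValueIndex example, keyed by Content id.
f1_values = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'c', 6: 'a'}


def not_eq(values, unwanted):
    """All ids minus the ids whose value equals unwanted."""
    all_ids = set(values)
    matching = {k for k, v in values.items() if v == unwanted}
    return sorted(all_ids - matching)


print(not_eq(f1_values, 'a'))  # [2, 3, 4, 5]
print(not_eq(f1_values, 'z'))  # [1, 2, 3, 4, 5, 6]
```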

You can also query for all objects where the value of f1 is in a set of values:

>>> displayQuery(value.In(f1, ['a', 'd']))
[1, 4, 6]

The next set of queries compares values against a range. For example, you can ask for all objects whose value lies between two bounds:

>>> displayQuery(value.Between(f1, 'a', 'c'))
[1, 2, 3, 5, 6]
>>> displayQuery(value.Between(f1, 'a', 'c', exclude_min=True))
[2, 3, 5]
>>> displayQuery(value.Between(f1, 'a', 'c', exclude_max=True))
[1, 2, 6]
>>> displayQuery(value.Between(f1, 'a', 'c',
...                            exclude_min=True, exclude_max=True))
[2]
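The exclude_min and exclude_max flags switch each end of the range from inclusive (<=) to exclusive (<). A plain-Python sketch of that behavior (the between helper is hypothetical; only the keyword names are taken from the examples above):

```python
# f1 values from the ValueIndex example, keyed by Content id.
f1_values = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'c', 6: 'a'}


def between(values, low, high, exclude_min=False, exclude_max=False):
    """Ids whose value lies in the range; None leaves that end open."""
    def above_low(v):
        if low is None:
            return True
        return low < v if exclude_min else low <= v

    def below_high(v):
        if high is None:
            return True
        return v < high if exclude_max else v <= high

    return sorted(k for k, v in values.items() if above_low(v) and below_high(v))


print(between(f1_values, 'a', 'c'))                                      # [1, 2, 3, 5, 6]
print(between(f1_values, 'a', 'c', exclude_min=True))                    # [2, 3, 5]
print(between(f1_values, 'a', 'c', exclude_max=True))                    # [1, 2, 6]
print(between(f1_values, 'a', 'c', exclude_min=True, exclude_max=True))  # [2]
```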

You can also leave out one end of the range:

>>> displayQuery(value.Between(f1, 'c', None))
[3, 4, 5]
>>> displayQuery(value.Between(f1, None, 'c'))
[1, 2, 3, 5, 6]

You can also use greater-equals and lesser-equals for the same purpose:

>>> displayQuery(value.Ge(f1, 'c'))
[3, 4, 5]
>>> displayQuery(value.Le(f1, 'c'))
[1, 2, 3, 5, 6]

Of course, you can chain those queries with the others as demonstrated before.

The value module also supports zc.catalog extents. The first query is ExtentAny, which returns all documents matching the extent. If the extent is None, all document ids are returned:

>>> displayQuery(value.ExtentAny(f1, None))
[1, 2, 3, 4, 5, 6]

If we now create an extent that covers only the first four documents (int ids 0 to 3),

>>> from zc.catalog.extentcatalog import FilterExtent
>>> extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(4):
...     extent.add(i, i)

then only the first four are returned:

>>> displayQuery(value.ExtentAny(f1, extent))
[1, 2, 3, 4]
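One reading of ExtentAny that is consistent with the output above: the indexed ids, restricted to the extent, with None meaning no restriction. A plain-set sketch (the extent_any helper and the content_id mapping are hypothetical; the dummy utility gave Content 1 to 6 the int ids 0 to 5):

```python
# Int ids handed out by the dummy utility: Content 1..6 received 0..5.
indexed = {0, 1, 2, 3, 4, 5}
content_id = {intid: intid + 1 for intid in indexed}


def extent_any(indexed_ids, extent):
    """Indexed ids restricted to the extent; None means no restriction."""
    if extent is None:
        return set(indexed_ids)
    return set(indexed_ids) & set(extent)


print(sorted(content_id[i] for i in extent_any(indexed, None)))      # [1, 2, 3, 4, 5, 6]
print(sorted(content_id[i] for i in extent_any(indexed, range(4))))  # [1, 2, 3, 4]
```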

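The opposite query is ExtentNone, which returns all ids in the extent that are not in the index. To demonstrate it, we register three new objects with the int id utility without cataloging them, and build an extent that covers all nine ids: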

>>> id = intid.register(Content(7, 'b'))
>>> id = intid.register(Content(8, 'c'))
>>> id = intid.register(Content(9, 'a'))
>>> extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(9):
...     extent.add(i, i)
>>> displayQuery(value.ExtentNone(f1, extent))
[7, 8, 9]
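As plain sets, ExtentNone is the extent minus the indexed ids. In this sketch (the extent_none helper is hypothetical), Content 1 to 6 were cataloged under int ids 0 to 5, while Content 7 to 9 received int ids 6 to 8 but were never cataloged:

```python
indexed = set(range(6))  # int ids of the cataloged Content 1..6
extent = set(range(9))   # also covers the uncataloged Content 7..9


def extent_none(indexed_ids, extent):
    """Ids in the extent for which the index holds no value."""
    return set(extent) - set(indexed_ids)


# Int id i belongs to Content i + 1.
print(sorted(i + 1 for i in extent_none(indexed, extent)))  # [7, 8, 9]
```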

CHANGES

1.0.0 (2009-11-30)

  • Refresh dependencies. Use zope.catalog and zope.intid instead of zope.app.catalog and zope.app.intid respectively. Don’t use zope.app.zapi.

  • Make package description more modern.

  • Clean up the code style.

0.9.3 (2008-09-29)

  • BUG: NotEq query no longer fails when all values in the index satisfy the NotEq condition.

0.9.2 (2006-09-22)

  • First release on the cheeseshop.

0.9.1 (2006-06-16)

  • Make zc.catalog a dependency of hurry.query.

0.9 (2006-05-16)

  • Separate hurry.query from the other hurry packages. Eggification work.

  • Support for ValueIndex from zc.catalog.

0.8 (2006-05-01)

  • Initial public release.


Source distribution: hurry.query-1.0.0.tar.gz (12.7 kB)
