
Higher level query system for the zope.catalog


Hurry Query

The hurry query system for the zope.catalog builds on its catalog indexes, as well as the indexes in zc.catalog. It is in part inspired by AdvancedQuery for Zope 2 by Dieter Maurer, though it has an independent origin.

Setup

Let’s define a simple content object. First its interface:

>>> from zope.interface import Interface, Attribute, implements
>>> class IContent(Interface):
...     f1 = Attribute('f1')
...     f2 = Attribute('f2')
...     f3 = Attribute('f3')
...     f4 = Attribute('f4')
...     t1 = Attribute('t1')
...     t2 = Attribute('t2')

And its implementation:

>>> from zope.container.contained import Contained
>>> class Content(Contained):
...     implements(IContent)
...     def __init__(self, id, f1='', f2='', f3='', f4='', t1='', t2=''):
...         self.id = id
...         self.f1 = f1
...         self.f2 = f2
...         self.f3 = f3
...         self.f4 = f4
...         self.t1 = t1
...         self.t2 = t2
...     def __cmp__(self, other):
...         return cmp(self.id, other.id)

The id attribute is only there so that we can easily identify the objects we find. Including the __cmp__ method ensures that search results can be sorted stably.
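
`__cmp__` and the `cmp()` builtin exist only on Python 2; Python 3 ignores `__cmp__`, so sorting these objects there would raise `TypeError`. A minimal sketch of an equivalent ordering for Python 3, assuming only the `id` attribute matters, uses `functools.total_ordering`. The class `OrderedContent` is a hypothetical stand-in, not part of hurry.query:

```python
from functools import total_ordering


@total_ordering
class OrderedContent(object):
    """Hypothetical stand-in for Content, ordered by id as __cmp__ did."""

    def __init__(self, id, f1=''):
        self.id = id
        self.f1 = f1

    def __eq__(self, other):
        if not isinstance(other, OrderedContent):
            return NotImplemented
        # Mirrors cmp(self.id, other.id) == 0.
        return self.id == other.id

    def __lt__(self, other):
        if not isinstance(other, OrderedContent):
            return NotImplemented
        # total_ordering derives <=, > and >= from __eq__ and __lt__.
        return self.id < other.id

    # Defining __eq__ removes the inherited __hash__ on Python 3;
    # hash on id so instances still work in sets and dicts.
    def __hash__(self):
        return hash(self.id)


items = [OrderedContent(3), OrderedContent(1), OrderedContent(2)]
print([e.id for e in sorted(items)])  # [1, 2, 3]
```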

We use a fake intid utility here so that we can test independently of the full-blown Zope environment:

>>> from zope import interface
>>> import zope.intid.interfaces
>>> class DummyIntId(object):
...     interface.implements(zope.intid.interfaces.IIntIds)
...     MARKER = '__dummy_int_id__'
...     def __init__(self):
...         self.counter = 0
...         self.data = {}
...     def register(self, obj):
...         intid = getattr(obj, self.MARKER, None)
...         if intid is None:
...             setattr(obj, self.MARKER, self.counter)
...             self.data[self.counter] = obj
...             intid = self.counter
...             self.counter += 1
...         return intid
...     def getObject(self, intid):
...         return self.data[intid]
...     def __iter__(self):
...         return iter(self.data)
>>> intid = DummyIntId()
>>> from zope.component import provideUtility
>>> provideUtility(intid, zope.intid.interfaces.IIntIds)

Now let’s register a catalog:

>>> from zope.catalog.interfaces import ICatalog
>>> from zope.catalog.catalog import Catalog
>>> catalog = Catalog()
>>> provideUtility(catalog, ICatalog, 'catalog1')

And set it up with various indexes:

>>> from zope.catalog.field import FieldIndex
>>> from zope.catalog.text import TextIndex
>>> catalog['f1'] = FieldIndex('f1', IContent)
>>> catalog['f2'] = FieldIndex('f2', IContent)
>>> catalog['f3'] = FieldIndex('f3', IContent)
>>> catalog['f4'] = FieldIndex('f4', IContent)
>>> catalog['t1'] = TextIndex('t1', IContent)
>>> catalog['t2'] = TextIndex('t2', IContent)

Now let’s create some objects so that they’ll be cataloged:

>>> content = [
... Content(1, 'a', 'b', 'd'),
... Content(2, 'a', 'c'),
... Content(3, 'X', 'c'),
... Content(4, 'a', 'b', 'e'),
... Content(5, 'X', 'b', 'e'),
... Content(6, 'Y', 'Z')]

And catalog them now:

>>> for entry in content:
...     catalog.index_doc(intid.register(entry), entry)
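
Conceptually, a field index keeps a forward mapping from each indexed value to the set of documents carrying that value, so an equality query is a single lookup. The sketch below models this with a plain `dict` of `set`s, keyed by the Content `id` rather than by intid for readability. The helper `build_index` is hypothetical, and the real `FieldIndex` keeps its data in BTrees rather than dicts:

```python
from collections import defaultdict

# (id, f1, f2) for the six Content objects cataloged above.
CONTENT = [(1, 'a', 'b'), (2, 'a', 'c'), (3, 'X', 'c'),
           (4, 'a', 'b'), (5, 'X', 'b'), (6, 'Y', 'Z')]


def build_index(rows, column):
    """Hypothetical helper: map each value in `column` to the ids holding it."""
    index = defaultdict(set)
    for row in rows:
        index[row[column]].add(row[0])
    return index


f1_index = build_index(CONTENT, 1)
f2_index = build_index(CONTENT, 2)

# An equality query is one dictionary lookup.
print(sorted(f1_index['a']))  # [1, 2, 4]
print(sorted(f2_index['b']))  # [1, 4, 5]
```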

Now let’s register a query utility:

>>> from hurry.query.query import Query
>>> from hurry.query.interfaces import IQuery
>>> provideUtility(Query(), IQuery)

Set up some code to make querying and display the result easy:

>>> from zope.component import getUtility
>>> from hurry.query.interfaces import IQuery
>>> def displayQuery(q, context=None):
...     query = getUtility(IQuery)
...     r = query.searchResults(q, context)
...     return [e.id for e in sorted(list(r))]

FieldIndex Queries

Now for a query where f1 equals a:

>>> from hurry.query import Eq
>>> f1 = ('catalog1', 'f1')
>>> displayQuery(Eq(f1, 'a'))
[1, 2, 4]

Not equals (this is more efficient than the generic ~ operator):

>>> from hurry.query import NotEq
>>> displayQuery(NotEq(f1, 'a'))
[3, 5, 6]
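
The efficiency remark can be pictured with sets. The model below assumes that `~` complements a result against every document the intid utility knows about, while `NotEq` only complements within the documents stored in that one index, which is usually a smaller set to work with. Both give the same answer here because every document has an f1 value. The functions `not_eq` and `invert` are hypothetical:

```python
# Forward index for f1, keyed by Content id (data as cataloged above).
f1_index = {'a': {1, 2, 4}, 'X': {3, 5}, 'Y': {6}}
# Every document the (dummy) intid utility knows about.
all_ids = {1, 2, 3, 4, 5, 6}


def not_eq(index, value):
    """Complement within the index: documents holding some other value."""
    indexed = set().union(*index.values())
    return indexed - index.get(value, set())


def invert(matches, universe):
    """Generic negation: complement against every known document."""
    return universe - matches


print(sorted(not_eq(f1_index, 'a')))           # [3, 5, 6]
print(sorted(invert(f1_index['a'], all_ids)))  # [3, 5, 6]
print(sorted(not_eq(f1_index, 'z')))           # [1, 2, 3, 4, 5, 6]
```

In this model a value missing from the index simply matches nothing, so its complement is the whole index.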

Testing whether a field is in a set:

>>> from hurry.query import In
>>> displayQuery(In(f1, ['a', 'X']))
[1, 2, 3, 4, 5]

Testing whether a field value lies in a specified range:

>>> from hurry.query import Between
>>> displayQuery(Between(f1, 'X', 'Y'))
[3, 5, 6]

You can leave out one end of the range:

>>> displayQuery(Between(f1, 'X', None)) # 'X' < 'a'
[1, 2, 3, 4, 5, 6]
>>> displayQuery(Between(f1, None, 'X'))
[3, 5]
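
Range ends are compared with ordinary string ordering, which goes by code point, so upper-case letters sort before lower-case ones ('X' < 'Y' < 'a'); that is why Between(f1, 'X', None) matches every document. A linear-scan sketch that reproduces the results above follows. The `between` helper is hypothetical; an index can use its sorted keys instead of scanning every document:

```python
# f1 value per Content id, as cataloged above.
f1_values = {1: 'a', 2: 'a', 3: 'X', 4: 'a', 5: 'X', 6: 'Y'}


def between(values, low=None, high=None):
    """Inclusive range query; None leaves that end of the range open."""
    return sorted(doc for doc, v in values.items()
                  if (low is None or v >= low) and (high is None or v <= high))


print(between(f1_values, 'X', 'Y'))   # [3, 5, 6]
print(between(f1_values, 'X', None))  # [1, 2, 3, 4, 5, 6]
print(between(f1_values, None, 'X'))  # [3, 5]
```

In this model, Ge(f1, 'X') corresponds to `between(f1_values, low='X')` and Le(f1, 'X') to `between(f1_values, high='X')`.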

You can also use greater-equals and lesser-equals for the same purpose:

>>> from hurry.query import Ge, Le
>>> displayQuery(Ge(f1, 'X'))
[1, 2, 3, 4, 5, 6]
>>> displayQuery(Le(f1, 'X'))
[3, 5]

It’s also possible to use not with the ~ operator:

>>> displayQuery(~Eq(f1, 'a'))
[3, 5, 6]

Using and (&):

>>> f2 = ('catalog1', 'f2')
>>> displayQuery(Eq(f1, 'a') & Eq(f2, 'b'))
[1, 4]

Using or (|):

>>> displayQuery(Eq(f1, 'a') | Eq(f2, 'b'))
[1, 2, 4, 5]

These can be chained:

>>> displayQuery(Eq(f1, 'a') & Eq(f2, 'b') & Between(f1, 'a', 'b'))
[1, 4]
>>> displayQuery(Eq(f1, 'a') | Eq(f1, 'X') | Eq(f2, 'b'))
[1, 2, 3, 4, 5]

And nested:

>>> displayQuery((Eq(f1, 'a') | Eq(f1, 'X')) & (Eq(f2, 'b') | Eq(f2, 'c')))
[1, 2, 3, 4, 5]

“and” and “or” can also be spelled differently:

>>> from hurry.query import And, Or
>>> displayQuery(And(Eq(f1, 'a'), Eq(f2, 'b')))
[1, 4]
>>> displayQuery(Or(Eq(f1, 'a'), Eq(f2, 'b')))
[1, 2, 4, 5]
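
The `&`, `|` and `~` spellings rely on Python operator overloading (`__and__`, `__or__`, `__invert__`), with `&` meaning set intersection, `|` set union and `~` the complement. A minimal sketch of such a term algebra over plain sets follows. The classes `Term`, `EqTerm`, `AndTerm`, `OrTerm` and `NotTerm` are hypothetical, and hurry.query's own classes may be structured differently:

```python
class Term(object):
    """Hypothetical base class: a term evaluates to a set of doc ids."""

    def __and__(self, other):
        return AndTerm(self, other)

    def __or__(self, other):
        return OrTerm(self, other)

    def __invert__(self):
        return NotTerm(self)


class EqTerm(Term):
    def __init__(self, values, value):
        self.values, self.value = values, value

    def apply(self, universe):
        return {doc for doc in universe if self.values.get(doc) == self.value}


class AndTerm(Term):
    def __init__(self, *terms):
        self.terms = terms

    def apply(self, universe):
        result = set(universe)
        for term in self.terms:
            result &= term.apply(universe)  # intersection
        return result


class OrTerm(Term):
    def __init__(self, *terms):
        self.terms = terms

    def apply(self, universe):
        result = set()
        for term in self.terms:
            result |= term.apply(universe)  # union
        return result


class NotTerm(Term):
    def __init__(self, term):
        self.term = term

    def apply(self, universe):
        return set(universe) - self.term.apply(universe)


f1 = {1: 'a', 2: 'a', 3: 'X', 4: 'a', 5: 'X', 6: 'Y'}
f2 = {1: 'b', 2: 'c', 3: 'c', 4: 'b', 5: 'b', 6: 'Z'}
docs = set(f1)

print(sorted((EqTerm(f1, 'a') & EqTerm(f2, 'b')).apply(docs)))        # [1, 4]
print(sorted((EqTerm(f1, 'a') | EqTerm(f2, 'b')).apply(docs)))        # [1, 2, 4, 5]
print(sorted((~EqTerm(f1, 'a')).apply(docs)))                         # [3, 5, 6]
print(sorted(AndTerm(EqTerm(f1, 'a'), EqTerm(f2, 'b')).apply(docs)))  # [1, 4]
```

Chaining such as `a & b & c` simply nests terms, so AndTerm(AndTerm(a, b), c) evaluates left to right.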

Combination of In and &

A combination of ‘In’ and ‘&’:

>>> displayQuery(In(f1, ['a', 'X', 'Y', 'Z']))
[1, 2, 3, 4, 5, 6]
>>> displayQuery(In(f1, ['Z']))
[]
>>> displayQuery(In(f1, ['a', 'X', 'Y', 'Z']) & In(f1, ['Z']))
[]

SetIndex queries

The SetIndex is defined in zc.catalog. Let’s make a catalog which uses it:

>>> intid = DummyIntId()
>>> provideUtility(intid, zope.intid.interfaces.IIntIds)
>>> from zope.catalog.interfaces import ICatalog
>>> from zope.catalog.catalog import Catalog
>>> catalog = Catalog()
>>> provideUtility(catalog, ICatalog, 'catalog1')
>>> from zc.catalog.catalogindex import SetIndex
>>> catalog['f1'] = SetIndex('f1', IContent)
>>> catalog['f2'] = FieldIndex('f2', IContent)

First let’s set up some new data:

>>> content = [
... Content(1, ['a', 'b', 'c'], 1),
... Content(2, ['a'], 1),
... Content(3, ['b'], 1),
... Content(4, ['c', 'd'], 2),
... Content(5, ['b', 'c'], 2),
... Content(6, ['a', 'c'], 2)]

And catalog them now:

>>> for entry in content:
...     catalog.index_doc(intid.register(entry), entry)

Now do an ‘any of’ query, which returns all documents that contain any of the listed values:

>>> from hurry.query.set import AnyOf
>>> displayQuery(AnyOf(f1, ['a', 'c']))
[1, 2, 4, 5, 6]
>>> displayQuery(AnyOf(f1, ['c', 'b']))
[1, 3, 4, 5, 6]
>>> displayQuery(AnyOf(f1, ['a']))
[1, 2, 6]

Now do an ‘all of’ query, which returns all documents that contain all of the listed values:

>>> from hurry.query.set import AllOf
>>> displayQuery(AllOf(f1, ['a']))
[1, 2, 6]
>>> displayQuery(AllOf(f1, ['a', 'b']))
[1]
>>> displayQuery(AllOf(f1, ['a', 'c']))
[1, 6]
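
In terms of per-value document sets (one posting set per value, derived by hand from the six documents above), ‘any of’ is the union of the sets for the listed values and ‘all of’ is their intersection. A minimal sketch with the hypothetical helpers `any_of` and `all_of`; it treats an empty value list as matching nothing, which may differ from zc.catalog's behaviour:

```python
# Which documents contain each f1 value, per the SetIndex data above.
postings = {'a': {1, 2, 6}, 'b': {1, 3, 5}, 'c': {1, 4, 5, 6}, 'd': {4}}


def any_of(postings, wanted):
    """Union: documents containing at least one of the wanted values."""
    return set().union(*(postings.get(v, set()) for v in wanted))


def all_of(postings, wanted):
    """Intersection: documents containing every wanted value."""
    sets = [postings.get(v, set()) for v in wanted]
    return set.intersection(*sets) if sets else set()


print(sorted(any_of(postings, ['a', 'c'])))  # [1, 2, 4, 5, 6]
print(sorted(all_of(postings, ['a', 'c'])))  # [1, 6]
```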

We can combine this with other queries:

>>> displayQuery(AnyOf(f1, ['a']) & Eq(f2, 1))
[1, 2]

ValueIndex queries

The ValueIndex is defined in zc.catalog and provides a generalization of the standard field index.

>>> from hurry.query import value

Let’s set up a catalog that uses this index:

>>> intid = DummyIntId()
>>> provideUtility(intid, zope.intid.interfaces.IIntIds)
>>> from zope.catalog.interfaces import ICatalog
>>> from zope.catalog.catalog import Catalog
>>> catalog = Catalog()
>>> provideUtility(catalog, ICatalog, 'catalog1')
>>> from zc.catalog.catalogindex import ValueIndex
>>> catalog['f1'] = ValueIndex('f1', IContent)

Next we set up some content data to fill the indices:

>>> content = [
... Content(1, 'a'),
... Content(2, 'b'),
... Content(3, 'c'),
... Content(4, 'd'),
... Content(5, 'c'),
... Content(6, 'a')]

And catalog them now:

>>> for entry in content:
...     catalog.index_doc(intid.register(entry), entry)

Let’s now query for all objects where f1 equals ‘a’:

>>> f1 = ('catalog1', 'f1')
>>> displayQuery(value.Eq(f1, 'a'))
[1, 6]

Next, let’s find all objects where f1 does not equal ‘a’; this is more efficient than the generic ~ operator:

>>> displayQuery(value.NotEq(f1, 'a'))
[2, 3, 4, 5]

If all the items in the catalog satisfy the NotEq condition, the query does not crash.

>>> displayQuery(value.NotEq(f1, 'z'))
[1, 2, 3, 4, 5, 6]

You can also query for all objects where the value of f1 is in a set of values:

>>> displayQuery(value.In(f1, ['a', 'd']))
[1, 4, 6]

The next set of queries compares the indexed values. For example, you can ask for all objects whose value lies within a given range:

>>> displayQuery(value.Between(f1, 'a', 'c'))
[1, 2, 3, 5, 6]
>>> displayQuery(value.Between(f1, 'a', 'c', exclude_min=True))
[2, 3, 5]
>>> displayQuery(value.Between(f1, 'a', 'c', exclude_max=True))
[1, 2, 6]
>>> displayQuery(value.Between(f1, 'a', 'c',
...                            exclude_min=True, exclude_max=True))
[2]
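
The exclude_min and exclude_max flags turn an inclusive bound into a strict one (>= becomes >, <= becomes <), and None leaves an end open. A hypothetical linear-scan sketch over the ValueIndex data above that reproduces these results, including the open-ended forms used by Ge and Le:

```python
# f1 value per Content id, as cataloged in the ValueIndex example.
f1_values = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'c', 6: 'a'}


def between(values, low=None, high=None, exclude_min=False, exclude_max=False):
    """Range query; None leaves an end open, exclude_* makes an end strict."""
    def above_low(v):
        if low is None:
            return True
        return v > low if exclude_min else v >= low

    def below_high(v):
        if high is None:
            return True
        return v < high if exclude_max else v <= high

    return sorted(doc for doc, v in values.items()
                  if above_low(v) and below_high(v))


print(between(f1_values, 'a', 'c'))                    # [1, 2, 3, 5, 6]
print(between(f1_values, 'a', 'c', exclude_min=True))  # [2, 3, 5]
print(between(f1_values, 'a', 'c', exclude_max=True))  # [1, 2, 6]
```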

You can also leave out one end of the range:

>>> displayQuery(value.Between(f1, 'c', None))
[3, 4, 5]
>>> displayQuery(value.Between(f1, None, 'c'))
[1, 2, 3, 5, 6]

You can also use greater-equals and lesser-equals for the same purpose:

>>> displayQuery(value.Ge(f1, 'c'))
[3, 4, 5]
>>> displayQuery(value.Le(f1, 'c'))
[1, 2, 3, 5, 6]

Of course, you can chain those queries with the others as demonstrated before.

The value module also supports zc.catalog extents. The first query is ExtentAny, which returns all documents matching the extent. If the extent is None, all document ids are returned:

>>> displayQuery(value.ExtentAny(f1, None))
[1, 2, 3, 4, 5, 6]

If we now create an extent that covers only the first four documents (intids 0 to 3),

>>> from zc.catalog.extentcatalog import FilterExtent
>>> extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(4):
...     extent.add(i, i)

then only the first four are returned:

>>> displayQuery(value.ExtentAny(f1, extent))
[1, 2, 3, 4]

The opposite query is the ExtentNone query, which returns all ids in the extent that are not in the index:

>>> id = intid.register(Content(7, 'b'))
>>> id = intid.register(Content(8, 'c'))
>>> id = intid.register(Content(9, 'a'))
>>> extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(9):
...     extent.add(i, i)
>>> displayQuery(value.ExtentNone(f1, extent))
[7, 8, 9]
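
Read as set operations on intids, the results above are consistent with ExtentAny keeping the indexed documents that fall inside the extent (or all indexed documents when the extent is None) and ExtentNone keeping the extent members the index has never seen. A sketch that assumes the extent behaves like a plain set of intids (a real `FilterExtent` does more, such as applying its filter on add); `extent_any`, `extent_none` and `ids` are hypothetical:

```python
indexed = set(range(6))                       # intids 0-5 hold Content 1-6
to_content_id = {i: i + 1 for i in range(9)}  # DummyIntId counts from 0


def extent_any(indexed, extent):
    """Indexed documents inside the extent; all of them if extent is None."""
    return set(indexed) if extent is None else set(indexed) & set(extent)


def extent_none(indexed, extent):
    """Extent members that the index does not contain."""
    return set(extent) - set(indexed)


def ids(intids):
    return sorted(to_content_id[i] for i in intids)


print(ids(extent_any(indexed, None)))       # [1, 2, 3, 4, 5, 6]
print(ids(extent_any(indexed, range(4))))   # [1, 2, 3, 4]
print(ids(extent_none(indexed, range(9))))  # [7, 8, 9]
```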

Querying different indexes

It’s possible to pass a context to searchResults when performing a query. This context determines which catalog is searched.

First, set up a second registry and a second catalog, and populate it:

>>> catalog2 = Catalog()
>>> from zope.component.registry import Components
>>> import zope.component.interfaces
>>> import zope.interface
>>> intid1 = DummyIntId()
>>> class MockSite(object):
...     zope.interface.implements(zope.component.interfaces.IComponentLookup)
...     def __init__(self):
...         self.registry = Components('components')
...     def queryUtility(self, interface, name='', default=None):
...         if name == '': return intid1
...         else: return catalog2
...     def getSiteManager(self):
...         return self.registry
>>> from zope.component.hooks import setSite
>>> site1 = MockSite()
>>> setSite(site1)
>>> catalog2['f1'] = FieldIndex('f1', IContent)
>>> content = [
... Content(1,'A'),
... Content(2,'B'),]
>>> for entry in content:
...     catalog2.index_doc(intid1.register(entry), entry)

Now we can query this catalog by specifying the context:

>>> query = getUtility(IQuery)
>>> displayQuery(Eq(f1, 'A'), context=site1)
[1]
>>> displayQuery(In(f1, ['A', 'B']), context=site1)
[1, 2]

Sorting and limiting the resultset

It’s possible to have the resultset sorted on one of the fields in the query.

>>> catalog = Catalog()
>>> provideUtility(catalog, ICatalog, 'catalog1')
>>> catalog['f1'] = FieldIndex('f1', IContent)
>>> catalog['f2'] = FieldIndex('f2', IContent)
>>> catalog['t'] = TextIndex('t1', IContent)

First let’s set up some new data:

>>> content = [
... Content(1, 'a', 2, t1='Beautiful is better than ugly.'),
... Content(2, 'a', 3, t1='Explicit is better than implicit'),
... Content(3, 'b', 9, t1='Simple is better than complex'),
... Content(4, 'c', 8, t1='Complex is better than complicated'),
... Content(5, 'c', 7, t1='Readability counts'),
... Content(6, 'a', 1, t1='Although practicality beats purity')]

And catalog them now:

>>> for entry in content:
...     catalog.index_doc(intid.register(entry), entry)

Define a convenience function that displays a result set without sorting it ourselves:

>>> def displayResult(q, context=None, **kw):
...     query = getUtility(IQuery)
...     r = query.searchResults(q, context, **kw)
...     return [e.id for e in r]

Without using sorting in the query itself, the resultset has an undefined order. We “manually” sort the results here to have something testable.

>>> f1 = ('catalog1', 'f1')
>>> sorted(displayResult(Eq(f1, 'a')))
[1, 2, 6]

Now we sort on the f2 index.

>>> f1 = ('catalog1', 'f1')
>>> displayResult(Eq(f1, 'a'), sort_field=('catalog1', 'f2'))
[6, 1, 2]

Reverse the order.

>>> f1 = ('catalog1', 'f1')
>>> displayResult(Eq(f1, 'a'), sort_field=('catalog1', 'f2'), reverse=True)
[2, 1, 6]

We can limit the number of items returned.

>>> f1 = ('catalog1', 'f1')
>>> displayResult(Eq(f1, 'a'), sort_field=('catalog1', 'f2'), limit=2)
[6, 1]

We can limit the reversed resultset too.

>>> f1 = ('catalog1', 'f1')
>>> displayResult(
...   Eq(f1, 'a'), sort_field=('catalog1', 'f2'), limit=2, reverse=True)
[2, 1]
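
The sort_field, reverse and limit options compose as follows: order the matching documents by the sort index, reverse that order if requested, then keep the first limit items. Because the limit is applied after reversing, reverse=True with limit=2 yields the two documents with the largest f2 values. A plain-Python sketch with the hypothetical helper `sorted_result`; per the changelog, hurry.query itself delegates sorting to an index implementing IIndexSort:

```python
# (id, f1, f2) for the six documents cataloged above.
rows = [(1, 'a', 2), (2, 'a', 3), (3, 'b', 9),
        (4, 'c', 8), (5, 'c', 7), (6, 'a', 1)]
f2_by_id = {doc: f2 for doc, _, f2 in rows}


def sorted_result(doc_ids, sort_key, reverse=False, limit=None):
    """Order doc ids by sort_key, optionally reversed, then truncate."""
    ordered = sorted(doc_ids, key=sort_key, reverse=reverse)
    return ordered if limit is None else ordered[:limit]


matches = [doc for doc, f1, _ in rows if f1 == 'a']  # like Eq(f1, 'a')
print(sorted_result(matches, f2_by_id.get))                         # [6, 1, 2]
print(sorted_result(matches, f2_by_id.get, reverse=True))           # [2, 1, 6]
print(sorted_result(matches, f2_by_id.get, limit=2))                # [6, 1]
print(sorted_result(matches, f2_by_id.get, reverse=True, limit=2))  # [2, 1]
```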

If the index given as sort_field does not support sorting, a ValueError is raised.

>>> f1 = ('catalog1', 'f1')
>>> displayResult(Eq(f1, 'a'), sort_field=('catalog1', 't'))
Traceback (most recent call last):
...
ValueError: Index t in catalog catalog1 does not support sorting.
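
The error comes from checking whether the chosen index can sort. The changelog says the sort index must implement IIndexSort; the sketch below replaces that interface check with a duck-typed test for a `sort` method. The classes `SortableIndex` and `TextLikeIndex` and the function `sort_results` are all hypothetical, only loosely modelled on that interface:

```python
class SortableIndex(object):
    """Hypothetical index exposing a sort() method."""

    def __init__(self, values):
        self.values = values  # doc id -> indexed value

    def sort(self, docids, reverse=False, limit=None):
        ordered = sorted(docids, key=self.values.__getitem__, reverse=reverse)
        return ordered if limit is None else ordered[:limit]


class TextLikeIndex(object):
    """Hypothetical index without sort(), standing in for a text index."""


def sort_results(catalog_name, index_name, index, docids, **kw):
    # Duck-typed stand-in for an IIndexSort.providedBy(index) check.
    if not callable(getattr(index, 'sort', None)):
        raise ValueError('Index %s in catalog %s does not support sorting.'
                         % (index_name, catalog_name))
    return index.sort(docids, **kw)


f2 = SortableIndex({1: 2, 2: 3, 6: 1})
print(sort_results('catalog1', 'f2', f2, [1, 2, 6]))  # [6, 1, 2]
try:
    sort_results('catalog1', 't', TextLikeIndex(), [1, 2, 6])
except ValueError as exc:
    print(exc)  # Index t in catalog catalog1 does not support sorting.
```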

The resultset can still be reversed and limited even if no sort_field is given. Note that without explicit sorting the order of the result set is undefined; this test assumes that the natural order of the tested index is deterministic enough to test against.

>>> f1 = ('catalog1', 'f1')
>>> displayResult(Eq(f1, 'a'), limit=2)
[1, 2]
>>> f1 = ('catalog1', 'f1')
>>> displayResult(Eq(f1, 'a'), limit=2, reverse=True)
[6, 2]

CHANGES

1.1.1 (2012-06-22)

  • ExtentNone in set.py missed a parameter index_id. Thanks to Danilo Botelho for the bug report.

1.1.0 (2010-07-12)

  • Allow the searchResults method of a Query to take an additional keyword argument sort_field that defines the (catalog_name, index_name) to sort on. That index in that catalog should implement IIndexSort.

    In addition to this keyword argument, limit and reverse keyword arguments can be passed, which limit the sorted resultset and/or reverse its order.

  • Allow the searchResults method of a Query object to take an additional optional context argument. This context will determine which catalog the search is performed on.

1.0.0 (2009-11-30)

  • Refresh dependencies. Use zope.catalog and zope.intid instead of zope.app.catalog and zope.app.intid respectively. Don’t use zope.app.zapi.

  • Make package description more modern.

  • Clean up the code style.

0.9.3 (2008-09-29)

  • BUG: NotEq query no longer fails when all values in the index satisfy the NotEq condition.

0.9.2 (2006-09-22)

  • First release on the cheeseshop.

0.9.1 (2006-06-16)

  • Make zc.catalog a dependency of hurry.query.

0.9 (2006-05-16)

  • Separate hurry.query from the other hurry packages. Eggification work.

  • Support for ValueIndex from zc.catalog.

0.8 (2006-05-01)

  • Initial public release.
