repoze.pgtextindex 1.0

Text index for repoze.catalog based on PostgreSQL 8.4+

Latest Version: 1.2

repoze.pgtextindex is an indexing plugin for repoze.catalog that provides a text search engine based on the powerful text indexing capabilities of PostgreSQL 8.4 and above. It is designed to take the place of any text search index based on zope.index. Installation typically requires few or no changes to code that already uses repoze.catalog.

The advantages of repoze.pgtextindex over zope.index.text include:

  • Performance. For large datasets, repoze.pgtextindex can be orders of magnitude faster than zope.index, mainly because repoze.pgtextindex does not have the overhead of unpickling objects that zope.index has.
  • Lower RAM consumption. Users of zope.index work around the unpickling overhead by keeping large caches of unpickled objects in RAM. Even worse, each thread keeps its own copy of the object cache. PostgreSQL, on the other hand, does not need to maintain complex structures in RAM. The PostgreSQL process size tends to be constant and reasonable.
  • Maintenance. The text indexing features of PostgreSQL are well documented and receive a great deal of active maintenance, while zope.index has not received much developer attention for years.

repoze.pgtextindex does not cause PostgreSQL to be involved in every catalog query and update. Only operations that use or change the text index hit PostgreSQL.

Usage

repoze.pgtextindex is used just like any other index in repoze.catalog:

from repoze.pgtextindex import PGTextIndex

index = PGTextIndex(
    discriminator,
    dsn,
    table='pgtextindex',
    ts_config='english',
    drop_and_create=False)

The arguments to the constructor are as follows:

discriminator
The repoze.catalog discriminator for this index. For more information on discriminators, see the repoze.catalog documentation. This argument is required.
dsn
The connection string for connecting to PostgreSQL. This argument is required.
table
The table to use for the index. The default is 'pgtextindex'.
ts_config
The PostgreSQL text search configuration to use for the index. The default is 'english', the built-in configuration that ships with PostgreSQL. For more information on text search configurations, see the PostgreSQL full text search documentation.
drop_and_create
If True, the table and index will be dropped (if they exist) and (re)created. The default is False.
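
As a minimal sketch of how these arguments fit together: repoze.catalog accepts either an attribute name or a callable taking (object, default) as a discriminator, so the example below uses a callable. The get_text function, the title and body attributes, the make_text_index helper, and the connection string are all hypothetical names chosen for illustration; the DSN is assumed to be a libpq-style string.

```python
def get_text(obj, default):
    """Discriminator: return the text to index for obj, or default to skip it."""
    # 'title' and 'body' are hypothetical attributes of the content object.
    title = getattr(obj, "title", "")
    body = getattr(obj, "body", "")
    text = " ".join(part for part in (title, body) if part)
    return text if text else default


def make_text_index(dsn):
    """Build the index with the documented constructor arguments."""
    # Imported here so this module can be loaded without the package installed.
    from repoze.pgtextindex import PGTextIndex

    return PGTextIndex(
        get_text,
        dsn,
        table="pgtextindex",
        ts_config="english",
        drop_and_create=False,
    )


# Hypothetical connection string (assumed libpq style); adjust for your server.
EXAMPLE_DSN = "dbname=catalog user=catalog host=localhost"
```

The resulting index can then be assigned into a repoze.catalog Catalog under a name of your choosing, the same way as any other index, for example catalog['text'] = make_text_index(EXAMPLE_DSN).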

1.0 (2012-09-01)

  • Retry on IntegrityError to avoid meaningless errors.
  • Added metrics using the perfmetrics package.

0.5 (2012-04-27)

  • Switched to read committed isolation and removed explicit locking. The explicit locking was reducing write performance and may have been interfering with autovacuum. This change raises the probability of temporary inconsistency, but since this package did not provide ACID compliance anyway, developers already need to be prepared for temporary inconsistency.

0.4 (2011-11-18)

  • Truncate text to 1MB per document in order to stay under (silly) limit imposed by PostgreSQL.

0.3 (2011-06-30)

  • Fixed PostgreSQL ProgrammingError when query string contains a backslash character. (LP #798725)
  • Added ability to mark content with arbitrary markers which can be used as discriminators at query time. (LP #792334)
  • Support searches for words containing an apostrophe. (LP #801265)

0.2 (2011-06-15)

  • Reworked the scoring method: added a per-document score coefficient. The score coefficient can boost the score of documents known to be trustworthy.
  • Added the IWeightedText interface. The discriminator function can return an IWeightedText instance to control the weights and coefficient.
  • Added the IWeightedQuery interface. Text index queries can pass an IWeightedQuery instance to control the weight values.
  • Allow persistent objects to be indexed, since the usual objection (accidental ZODB references) does not apply.
  • Do not drop and create the table by default, making PGTextIndex easier to use outside ZODB.
  • Added the 'get_contextual_summaries' and 'get_contextual_summary' methods to the index.
  • Compatibility with repoze.catalog 0.8.0.

0.1 (2011-01-20)

  • Initial release.
 
File: repoze.pgtextindex-1.0.tar.gz (md5)
Type: Source
Uploaded on: 2012-09-01
Size: 22KB