
Addon for ElasticSearch integration with Plone

Project description

Celery tasks for Elasticsearch integration of Plone content

  • auto-create ElasticSearch…
    • index

    • mapping from Plone schema using a flexible conversions file (JSON).

    • ingest-attachment pipelines using the same JSON conversions file.

  • tasks to
    • index a content object with all data given plus allowedRolesAndUsers and section (primary path)

    • unindex a content object

  • configure from environment variables:
    • celery,

    • elasticsearch,

    • sentry logging (optional)

Installation

Install collective.elastic.ingest (redis-ready) using pip:

pip install collective.elastic.ingest redis

collective.elastic.ingest requires the elasticsearch Python package. Specify its version to match the version of your Elasticsearch server. For example:

pip install 'elasticsearch~=7.0'

Starting

Define the configuration as environment variables:

CELERY_BROKER=redis://localhost:6379/0
ELASTICSEARCH_INGEST_SERVER=localhost:9200
ELASTICSEARCH_INGEST_USE_SSL=0
PLONE_SERVICE=http://localhost:8080
PLONE_PATH=Plone
PLONE_USER=admin
PLONE_PASSWORD=admin

Optional (defaults used if not given):

ANALYSIS_FILE=/full/path/to/analysis.json
MAPPINGS_FILE=/full/path/to/mappings.json
PREPROCESSINGS_FILE=/full/path/to/preprocessings.json
SENTRY_DSN= (disabled by default)
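
For local development the variables can be collected in a .env file and loaded with source .env before starting the worker (see also the development install below). A minimal sketch using the values from above:

export CELERY_BROKER=redis://localhost:6379/0
export ELASTICSEARCH_INGEST_SERVER=localhost:9200
export ELASTICSEARCH_INGEST_USE_SSL=0
export PLONE_SERVICE=http://localhost:8080
export PLONE_PATH=Plone
export PLONE_USER=admin
export PLONE_PASSWORD=admin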

Then run celery:

celery worker -A collective.elastic.ingest.celery.app -l info

Or with debug information:

celery worker -A collective.elastic.ingest.celery.app -l debug
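
Note: starting with Celery 5 the command-line syntax changed; the worker subcommand comes right after the app option:

celery -A collective.elastic.ingest.celery.app worker -l info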

Text Analysis

Text analysis is optional. Skip this section on a first installation.

Search results can be enhanced with tailored text analysis. This is an advanced topic; detailed information can be found in the Elasticsearch documentation on text analysis. We provide an example analysis configuration for better search of German compound words.

Example: With a decompounder using the word list ‘Lehrstelle, Börse’ and an additional stemmer, a document containing the string ‘Lehrstellenbörse’ can be found by querying ‘Lehrstelle’ as well as ‘Börse’.

The example analyzer configuration also applies a stemmer, which handles inflected word forms, an important enhancement. Even fuzzy search, which works without any analysis configuration, has its limits in a beautiful but complex language like German.

The analysis configuration is simply a set of analyzer definitions. The provided example contains two: german_analyzer and german_exact_analyzer. The first decompounds words according to the word list in lexicon.txt and adds a stemmer. The second allows exact queries with a quoted search string. Both analyzers are applied to fields via the mapping. Example:

"behaviors/plone.basic/title": {
    "type": "text",
    "analyzer": "german_analyzer",
    "fields": {
        "exact": {
            "type": "text",
            "analyzer": "german_exact_analyzer"
        }
    }
},
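
The analyzer definitions themselves belong in the analysis configuration file. The following is a reduced sketch of how they could look, built from standard Elasticsearch token filters and matching the filter names used in the _analyze request below; the shipped analysis.json.example may differ in details:

{
    "settings": {
        "analysis": {
            "filter": {
                "custom_dictionary_decompounder": {
                    "type": "dictionary_decompounder",
                    "word_list_path": "elasticsearch-lexicon.txt"
                },
                "light_german_stemmer": {
                    "type": "stemmer",
                    "language": "light_german"
                }
            },
            "analyzer": {
                "german_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "custom_dictionary_decompounder", "light_german_stemmer", "unique"]
                },
                "german_exact_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"]
                }
            }
        }
    }
}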

Check your configured analysis with:

POST {{elasticsearchserver}}/_analyze

{
    "text": "Lehrstellenbörse",
    "tokenizer": "standard",
    "filter": [
        "lowercase",
        "custom_dictionary_decompounder",
        "light_german_stemmer",
        "unique"
    ]
}

The response delivers the tokens for the analyzed text “Lehrstellenbörse”.
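
The same check can be done with curl (assuming Elasticsearch listens on localhost:9200 without SSL):

curl -X POST 'http://localhost:9200/_analyze' \
    -H 'Content-Type: application/json' \
    -d '{"text": "Lehrstellenbörse", "tokenizer": "standard", "filter": ["lowercase", "custom_dictionary_decompounder", "light_german_stemmer", "unique"]}'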

Note: The file elasticsearch-lexicon.txt, which contains the word list used by the decompounder in the sample analysis configuration analysis.json.example, has to be located in the configuration directory of your Elasticsearch server.

Source Code

The sources are in a Git DVCS with its main branches at GitHub. There you can report issues too.

We’d be happy to see many forks and pull-requests to make this addon even better.

Maintainers are Jens Klein, Peter Holzer and the BlueDynamics Alliance developer team. We appreciate any contribution; if a release needs to be made on PyPI, please contact one of us. We also offer commercial support if training, coaching, integration or adaptations are needed.

Contributions

Initial implementation was made possible by Evangelisch-reformierte Landeskirche des Kantons Zürich.

Idea and testing by Peter Holzer

Concept & code by Jens W. Klein

Text analysis code and configuration by Katja Süss

Install for development

  • clone source code repository,

  • enter repository directory

  • recommended: create a virtualenv: python -m venv env

  • development install: ./env/bin/pip install -e .

  • add redis support: ./env/bin/pip install redis

  • load environment configuration: source .env (the steps are combined as shell commands below)
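
Combined as a shell session (the repository URL is a placeholder; use the one published on GitHub):

git clone <repository-url> collective.elastic.ingest
cd collective.elastic.ingest
python -m venv env
./env/bin/pip install -e .
./env/bin/pip install redis
source .env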

Todo

  • query status of a task

  • simple statistics about tasks-count: pending, done, errored

  • celery retry on failure, e.g. after a restart of Elasticsearch, Plone, …

License

The project is licensed under the GPLv2.

Changelog

1.1 (2023-03-03)

  • Index allowedRolesAndUsers and section (primary path) [ksuess]

1.0 (2022-11-08)

  • Update to elasticsearch-py 8.x [ksuess]

  • Add optional configuration of text analysis (stemmer, decompounder, etc.) [ksuess]

  • Keep source on rewrite [ksuess]

  • Initial release. [jensens]
