
A pythonic tool for batch loading data files (JSON, parquet, CSV, TSV) into Elasticsearch


elasticsearch\_loader |Build Status| |Can I Use Python 3?| |PyPI version|
=========================================================================

Main features:
~~~~~~~~~~~~~~

- Batch upload CSV (actually any \*SV) files to Elasticsearch
- Batch upload JSON files / JSON lines to Elasticsearch
- Batch upload parquet files to Elasticsearch
- Pre-define custom mappings
- Delete index before upload
- Index documents with \_id from the document itself
- Load data directly from a URL
- SSL and basic auth support

Test matrix
~~~~~~~~~~~

+-------------+-------+-------+-------+
| python / es | 2.4.6 | 5.6.5 | 6.1.1 |
+=============+=======+=======+=======+
| 2.7         | ✅    | ✅    | ✅    |
+-------------+-------+-------+-------+
| 3.6         | ✅    | ✅    | ✅    |
+-------------+-------+-------+-------+

Installation
~~~~~~~~~~~~

``pip install elasticsearch-loader``

To add parquet support, run ``pip install elasticsearch-loader[parquet]``.

Usage
~~~~~

::

    (venv)/tmp $ elasticsearch_loader --help
    Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...

    Options:
      -c, --config-file TEXT          Load default configuration file from esl.yml
      --bulk-size INTEGER             How many docs to collect before writing to
                                      ElasticSearch (default 500)
      --es-host TEXT                  Elasticsearch cluster entry point. (default
                                      http://localhost:9200)
      --verify-certs                  Make sure we verify SSL certificates
                                      (default false)
      --use-ssl                       Turn on SSL (default false)
      --ca-certs TEXT                 Provide a path to CA certs on disk
      --http-auth TEXT                Provide username and password for basic auth
                                      in the format of username:password
      --index TEXT                    Destination index name  [required]
      --delete                        Delete index before import? (default false)
      --progress                      Enable progress bar - NOTICE: in order to
                                      show progress the entire input should be
                                      collected and can consume more memory than
                                      without progress bar
      --type TEXT                     Docs type  [required]
      --id-field TEXT                 Specify field name that be used as document
                                      id
      --as-child                      Insert _parent, _routing field, the value is
                                      same as _id. Note: must specify --id-field
                                      explicitly
      --with-retry                    Retry if ES bulk insertion failed
      --index-settings-file FILENAME  Specify path to json file containing index
                                      mapping and settings, creates index if
                                      missing
      -h, --help                      Show this message and exit.

    Commands:
      csv
      json     FILES with the format of [{"a": "1"}, {"b": "2"}]
      parquet

Examples
~~~~~~~~

Load two CSV files into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``elasticsearch_loader --index incidents --type incident csv file1.csv file2.csv``

Load JSON files into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``elasticsearch_loader --index incidents --type incident json *.json``
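The ``json`` command expects each file to hold a JSON array of objects, each of which becomes one document. As a minimal sketch, a valid input file could be produced like this (the filename and field names are illustrative):

```python
import json

# Each element of the array becomes one Elasticsearch document.
incidents = [
    {"incident_id": "1", "severity": "high"},
    {"incident_id": "2", "severity": "low"},
]

# Write the array format shown in the --help output: [{"a": "1"}, {"b": "2"}]
with open("incidents.json", "w") as f:
    json.dump(incidents, f)
```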

Load all git commits into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``git log --pretty=format:'{"sha":"%H","author_name":"%aN", "author_email": "%aE","date":"%ad","message":"%f"}' | elasticsearch_loader --type git --index git json --json-lines -``
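With ``--json-lines``, each input line is a standalone JSON object rather than part of an array, which is exactly what the ``git log`` format string above emits. A sketch of how such a stream breaks into documents (the helper name is illustrative, not part of the tool):

```python
import json

def parse_json_lines(stream):
    """Yield one document per non-empty line of a JSON-lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

sample = '{"sha": "abc123", "message": "init"}\n{"sha": "def456", "message": "fix"}'
docs = list(parse_json_lines(sample.splitlines()))
```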

Load parquet files into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``elasticsearch_loader --index incidents --type incident parquet file1.parquet``

Load CSV from a GitHub repo (any http/https URL works)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``elasticsearch_loader --index data --type avg_height --id-field country json https://raw.githubusercontent.com/samayo/country-data/master/src/country-avg-male-height.json``

Load data from stdin
^^^^^^^^^^^^^^^^^^^^

``generate_data | elasticsearch_loader --index data --type incident csv -``
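Here ``generate_data`` stands for any process that writes CSV to stdout, and the trailing ``-`` tells the loader to read from stdin. A hypothetical generator could be as simple as:

```python
import csv
import sys

def generate_data(out=sys.stdout):
    """Write a header row plus data rows as CSV to the given stream."""
    writer = csv.writer(out)
    writer.writerow(["incident_id", "severity"])
    writer.writerow(["1", "high"])
    writer.writerow(["2", "low"])

if __name__ == "__main__":
    generate_data()
```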

Read \_id from the incident\_id field
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``elasticsearch_loader --id-field incident_id --index incidents --type incident csv file1.csv file2.csv``

Load custom mappings
^^^^^^^^^^^^^^^^^^^^

``elasticsearch_loader --index-settings-file samples/mappings.json --index incidents --type incident csv file1.csv file2.csv``
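The file passed to ``--index-settings-file`` is an ordinary index-creation body (settings plus mappings); ``samples/mappings.json`` in the repo is the canonical example. As a rough sketch, a hand-built body in the Elasticsearch 5.x/6.x style might look like this (field names are illustrative):

```python
import json

# Hypothetical index-creation body: settings plus a per-type mapping
# (Elasticsearch 5.x/6.x style, matching the --type incident example above).
index_body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "incident": {
            "properties": {
                "incident_id": {"type": "keyword"},
                "description": {"type": "text"},
            }
        }
    },
}

with open("mappings.json", "w") as f:
    json.dump(index_body, f, indent=2)
```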

Tests and sample data
~~~~~~~~~~~~~~~~~~~~~

End-to-end and regression tests are located under the ``test`` directory
and can be run with ``./test.py``. Sample input formats can be found under ``samples``.

.. |Build Status| image:: https://travis-ci.org/moshe/elasticsearch_loader.svg?branch=master
:target: https://travis-ci.org/moshe/elasticsearch_loader
.. |Can I Use Python 3?| image:: https://caniusepython3.com/project/elasticsearch-loader.svg
:target: https://caniusepython3.com/project/elasticsearch-loader
.. |PyPI version| image:: https://badge.fury.io/py/elasticsearch_loader.svg
:target: https://pypi.python.org/pypi/elasticsearch-loader
