csv2es

Bulk import a CSV or TSV into Elasticsearch

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English
Programming Language
Topic
- Internet
- Utilities

Project description

The csv2es project is an Apache 2.0 licensed commandline utility, written in Python, that loads a CSV (or TSV) file into an Elasticsearch instance. That’s pretty much it. That’s all it does. The first row of the file should contain the field names to use for the Elasticsearch documents; otherwise things will get weird. There’s a little trick documented below for adding a header row in case the file is missing one.

Features

- Minimal commandline interface
- Load CSVs or TSVs
- Customize the delimiter to something else
- Uses the Elasticsearch bulk API
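
To give a feel for what "uses the bulk API" means, here is a minimal sketch of how delimited rows can be turned into a bulk request body. This is not csv2es's own code; the function name rows_to_bulk_body is hypothetical, and the "_type" field in the action line assumes an older Elasticsearch release that still supports document types.

```python
import csv
import io
import json


def rows_to_bulk_body(csv_text, index_name, doc_type, delimiter=","):
    """Turn delimited text with a header row into a bulk API request body.

    Hypothetical helper for illustration only, not part of csv2es.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    lines = []
    for row in reader:
        # Action line: tells Elasticsearch where the next document goes.
        # "_type" is an assumption that fits older Elasticsearch versions.
        lines.append(json.dumps({"index": {"_index": index_name, "_type": doc_type}}))
        # Source line: the document itself, keyed by the header's field names.
        lines.append(json.dumps(dict(row)))
    # The bulk API expects newline-delimited JSON, each line ending in a newline.
    return "".join(line + "\n" for line in lines)


sample = 'potato_id,potato_type,description\n33,sweet,"kinda oval"\n17,regular,bumpy\n'
bulk_body = rows_to_bulk_body(sample, "potatoes", "potato")
print(bulk_body)
```

Every value stays a string here, because the csv module does no type conversion; the field types are left to Elasticsearch or to a mapping.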

Installation

To install csv2es, simply:

$ pip install csv2es

Examples

Let’s say we’ve got a potatoes.csv file with a nice header that looks like this:

potato_id,potato_type,description
33,sweet,"kinda oval"
17,regular,bumpy
91,regular,"perfectly round"
18,sweet,delightful
42,fried,crispy
37,"extra special",crispy

Now we can stuff it into Elasticsearch:

csv2es --index-name potatoes --doc-type potato --import-file potatoes.csv

But what if it was tomatoes.tsv and separated by tabs? Well, we can do this:

csv2es --index-name tomatoes --doc-type tomato --import-file tomatoes.tsv --tab

Advanced Examples

What if we have a super cool pipe-delimited file and want to wipe out the existing “pipes” index every time we load it up? This ought to handle that case:

csv2es --index-name pipes --delete-index --doc-type pipe --import-file pipes.psv --delimiter '|'
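
The delimiter matters because it decides how each line is split into fields. The sketch below uses Python's standard csv module to show the effect; it is an illustration rather than csv2es's implementation, and the helper name parse_delimited and the sample field names are made up.

```python
import csv
import io


def parse_delimited(text, delimiter):
    """Parse delimited text whose first row holds the field names.

    Hypothetical helper for illustration only.
    """
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


# Made-up sample data: one tab-separated file, one pipe-separated file.
tsv_text = "tomato_id\ttomato_type\n1\troma\n2\tcherry\n"
psv_text = "pipe_id|material\n7|copper\n8|pvc\n"

tomatoes = parse_delimited(tsv_text, "\t")
pipes = parse_delimited(psv_text, "|")

# Parsing the pipe-separated text with the wrong delimiter gives one fused field.
wrong = parse_delimited(psv_text, ",")

print(tomatoes[0])
print(pipes[1])
print(wrong[0])
```

With the wrong delimiter nothing fails outright: each row simply becomes a single field named after the whole header line, which is the kind of weirdness a matching delimiter option avoids.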

Elasticsearch is great, but it’s doing something strange to our documents when we try to facet by certain fields. Let’s create our own custom mapping file, called potatoes.mapping.json, to specify the field types Elasticsearch should use for that potatoes.csv:

{
    "dynamic": "true",
    "properties": {
        "potato_id": {"type": "long"},
        "potato_type": {"type": "string", "index": "not_analyzed"},
        "description": {"type": "string", "index": "not_analyzed"}
    }
}

Now let’s load the data with a custom mapping file:

csv2es --index-name potatoes --doc-type potato --mapping-file potatoes.mapping.json --import-file potatoes.csv
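
Before loading a big file, it can be worth checking that the mapping file is valid JSON and covers every column in the header. The following is a rough sketch using only the standard library; check_mapping is a hypothetical helper, not something csv2es provides. Python's json parser is strict, so a trailing comma after the last property raises an error.

```python
import json

# Same mapping as potatoes.mapping.json, inlined to keep the sketch self-contained.
mapping_text = """
{
    "dynamic": "true",
    "properties": {
        "potato_id": {"type": "long"},
        "potato_type": {"type": "string", "index": "not_analyzed"},
        "description": {"type": "string", "index": "not_analyzed"}
    }
}
"""


def check_mapping(text, header_fields):
    """Parse a mapping and return the header fields it does not cover.

    Hypothetical helper for illustration only.
    """
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON,
    # including a trailing comma after the last property.
    mapping = json.loads(text)
    mapped = set(mapping.get("properties", {}))
    return [field for field in header_fields if field not in mapped]


missing = check_mapping(mapping_text, ["potato_id", "potato_type", "description"])
print(missing)
```

An empty list means every header field has an explicit type; anything listed would fall back to whatever Elasticsearch infers dynamically.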

What if my file is missing the header row, and it’s super huge because there are so many potatoes in it, and everything is terrible? We can use sed to tack on a nice header with something like this:

sed -i 1i"potato_id,potato_type,description" potatoes.csv

As long as you have more disk space than the size of the file, this should be fine.
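
The disk space caveat exists because sed -i writes a complete temporary copy of the file before swapping it into place. Where GNU sed is not available, roughly the same thing can be done in Python. This is a sketch under that assumption; prepend_header is a hypothetical helper and the demo file is created only for illustration.

```python
import os
import shutil
import tempfile


def prepend_header(path, header_line):
    """Rewrite the file at path with header_line as its first line.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary copy lives next to the original so the final swap
    # stays on one filesystem; this copy is why extra disk space is needed.
    with tempfile.NamedTemporaryFile("wb", dir=directory, delete=False) as tmp:
        tmp.write(header_line.encode("utf-8") + b"\n")
        with open(path, "rb") as src:
            # Streams in chunks, so a huge file is never held in memory.
            shutil.copyfileobj(src, tmp)
        tmp_path = tmp.name
    os.replace(tmp_path, path)


# Demo on a throwaway file that has no header row.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "potatoes.csv")
with open(demo_path, "wb") as f:
    f.write(b'33,sweet,"kinda oval"\n17,regular,bumpy\n')

prepend_header(demo_path, "potato_id,potato_type,description")

with open(demo_path, "rb") as f:
    result = f.read()
print(result.decode("utf-8"))
```

Note that the rewritten file takes on the temporary file's permissions, so they may need to be restored afterwards.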

Contribute

Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
Fork the repository on GitHub to start making your changes to the master branch (or branch off of it).
Write a test which shows that the bug was fixed or that the feature works as expected.
Send a pull request and bug the maintainer until it gets merged and published. :) Make sure to add yourself to AUTHORS.

History

1.0.0.dev2 (2015-04-19)

Switch over to Click for handling executable
Fix import errors
Fix --delete-index flag
Add --version option

1.0.0.dev1 (2015-04-18)

Tinkering with documentation and PyPI updates

1.0.0.dev0 (2015-04-18)

First dev version now exists
Apache 2.0 license applied
Finalize commandline interface
Sanitizing some setup.py and test suite running
Added Travis CI support

Release history

1.0.1 (Jun 2, 2015)
1.0.0 (Apr 24, 2015)
1.0.0.dev3 pre-release (Apr 19, 2015), this version
1.0.0.dev1 pre-release (Apr 19, 2015)
1.0.0.dev0 pre-release (Apr 19, 2015)

Download files

Source Distribution

csv2es-1.0.0.dev3.tar.gz (10.0 kB)

Uploaded Apr 19, 2015 Source

Hashes for csv2es-1.0.0.dev3.tar.gz
Algorithm	Hash digest
SHA256	`2b61bc5f0e05a414af70e02ad219ecc94b05450cece1a003c8a8d48b20a81685`
MD5	`22617d178d354b48c638a1cf3f2241c0`
BLAKE2b-256	`7467db387772d74c1bcd61c8a5cf6c13587ff6682bf005aff5ebe58ec180b4a8`