Python utilities for working with inaturalist-open-data
pyinaturalist-open-data
This is a work in progress and not yet complete!
pyinaturalist-open-data is a Python library and CLI tool for working with inaturalist-open-data. Its goal is to make it easy to import and use this dataset in a Python application backed by any SQLAlchemy-compatible database engine (SQLite by default), or simply for local data exploration.
Installation
Install with pip:
pip install pyinaturalist-open-data
Or for local development:
git clone https://github.com/JWCook/pyinaturalist-open-data.git
cd pyinaturalist-open-data
pip install poetry && poetry install
Usage
This package provides the command pynat. See --help for commands and options:
Usage: pynat [OPTIONS] COMMAND [ARGS]...
Commands for working with inaturalist open data
Options:
-v, --verbose Show more detailed output
--help Show this message and exit.
Commands:
db Load contents of CSV files into a database
dl Download and extract inaturalist open data archive
init Just create tables (if they don't already exist) without populating...
load Download and load all data into a database.
Run everything
The simplest command is load, which runs all steps:
- Download and extract the dataset
- Create database tables and indices
- Load the data into the database
Options:
Usage: pynat load [OPTIONS]
Options:
-d, --download-dir TEXT Alternate path for downloads
-u, --uri TEXT Alternate database URI to connect to
--help Show this message and exit.
By default, this will create a new SQLite database. Alternatively, you can provide a URI for any supported database.
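As a rough sketch of how these options fit together, the helper below builds the argument list for pynat load from Python. The function name build_load_command and the example URI (including its credentials and database name) are illustrative assumptions, not part of this package; only the --download-dir and --uri options come from the help text above. The URI follows SQLAlchemy's usual dialect://user:password@host:port/database form.

```python
import subprocess
from typing import List, Optional


def build_load_command(
    uri: Optional[str] = None, download_dir: Optional[str] = None
) -> List[str]:
    """Build the argument list for `pynat load` (hypothetical helper)."""
    cmd = ["pynat", "load"]
    if download_dir:
        cmd += ["--download-dir", download_dir]
    if uri:
        cmd += ["--uri", uri]
    return cmd


# Placeholder credentials and database name; replace with your own
cmd = build_load_command(uri="postgresql://user:password@localhost:5432/inat")
print(" ".join(cmd))

# Uncomment to actually run the command (requires pynat to be installed):
# subprocess.run(cmd, check=True)
```

A non-SQLite URI like this one would typically also need the matching database driver installed (for example, psycopg2 for PostgreSQL).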
Run individual steps
Other commands are available if you only want to run one of those steps at a time.
dl command:
Usage: pynat dl [OPTIONS]
Download and extract all files in the inaturalist open data archive
Options:
-d, --download-dir TEXT Alternate path for downloads
--help Show this message and exit
Note: Both dl and load will reuse local data if it already exists and is up to date.
db command:
Usage: pynat db [OPTIONS]
Load contents of CSV files into a database. Also creates tables and
indexes, if they don't already exist.
Options:
-d, --download-dir TEXT Alternate path for downloads
-i, --init Just initialize the database with tables
+ indexes without loading data
-t, --tables [observation|photo|taxon|user]
Load only these specific tables
-u, --uri TEXT Alternate database URI to connect to
--help Show this message and exit.
Note: This can take a long time to run. Depending on the database type, you will likely get better performance with database-specific bulk loading tools (for example, psql with COPY for PostgreSQL).
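For illustration, a bulk load through psql might be assembled as in the sketch below. This is not something the package provides: build_psql_copy_command is a hypothetical helper, and the database name, table name, and file name are placeholders. It also assumes the file is tab-delimited with a header row and that its columns match the table's column order, so check both against your own data before using it.

```python
import subprocess
from typing import List


def build_psql_copy_command(dbname: str, table: str, csv_path: str) -> List[str]:
    """Build a psql invocation that bulk-loads one file with \\copy (hypothetical helper)."""
    # Assumptions: tab-delimited file, header row present,
    # and file columns in the same order as the table columns
    copy_sql = (
        f"\\copy {table} FROM '{csv_path}' "
        "WITH (FORMAT csv, HEADER true, DELIMITER E'\\t')"
    )
    # psql accepts a single backslash command via -c
    return ["psql", "-d", dbname, "-c", copy_sql]


# Placeholder database, table, and file names
copy_cmd = build_psql_copy_command("inat", "observation", "observations.csv")
print(copy_cmd[-1])

# Uncomment to run (requires psql and an existing, empty target table):
# subprocess.run(copy_cmd, check=True)
```

The tables would still need to exist first, for example by running pynat db --init against the same database.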
Python package
To use as a Python package instead of a CLI tool:
from pyinaturalist_open_data import download_metadata, load_all
download_metadata()
load_all()
Full package documentation on readthedocs will be coming soon.
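Until those docs exist, a loaded SQLite database can be inspected with nothing more than the standard library. The sketch below counts rows in every table it finds; count_rows is a hypothetical helper, not part of this package, and the demo table is an invented stand-in rather than the package's real schema. Point db_path at the database file created by the load step instead.

```python
import os
import sqlite3
import tempfile
from typing import Dict


def count_rows(db_path: str) -> Dict[str, int]:
    """Return a {table_name: row_count} mapping for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names are read from the database itself and quoted as identifiers
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()


# Demo on a throwaway database with an invented schema
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.db")
    demo_conn = sqlite3.connect(demo_path)
    demo_conn.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT)")
    demo_conn.executemany(
        "INSERT INTO taxon (name) VALUES (?)", [("Aves",), ("Insecta",)]
    )
    demo_conn.commit()
    demo_conn.close()
    demo_counts = count_rows(demo_path)

print(demo_counts)
```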
Planned features
Some features I would ideally like to add to this:
- Performance optimizations
- Basic querying features
- Image downloads based on query results
- Integration with iNaturalist API data via pyinaturalist
- Integration with CSV data from the iNaturalist export tool