Iterable API for tabular datasets including CSV, XLSX, XML, & JSON.
IterTable is a Pythonic API for iterating through tabular data formats, including CSV, XLSX, XML, and JSON.
from itertable import load_file
for row in load_file("example.xlsx"):
    print(row.date, row.name)
Note: Prior to version 2.0, IterTable was wq.io, a submodule of the wq framework. The package has been renamed to avoid confusion with the wq framework website (https://wq.io). Similarly, IterTable's *IO classes have been renamed to *Iter, as the API is not intended to match that of Python's StringIO or other io classes.
- from wq.io import CsvFileIO
- data = CsvFileIO(filename='data.csv')
+ from itertable import CsvFileIter
+ data = CsvFileIter(filename='data.csv')
Getting Started
# Recommended: create virtual environment
# python3 -m venv venv
# . venv/bin/activate
python3 -m pip install itertable
# GIS support (Fiona & Shapely)
python3 -m pip install itertable[gis]
# Excel 97-2003 (.xls) support
python3 -m pip install itertable[oldexcel]
# (xlsx support is enabled by default)
# Pandas integration
python3 -m pip install itertable[pandas]
Overview
IterTable provides a general purpose API for loading, iterating over, and writing tabular datasets. The goal is to avoid needing to remember the unique usage of e.g. csv, openpyxl, or xml.etree every time one needs to work with external data. Instead, IterTable abstracts these libraries into a consistent interface that works as an iterable of namedtuples. Whenever possible, the field names for a dataset are automatically determined from the source file, e.g. the column headers in an Excel spreadsheet.
from itertable import ExcelFileIter
data = ExcelFileIter(filename='example.xlsx')
for row in data:
    print(row.name, row.date)
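To make the "iterable of namedtuples" idea concrete, here is a minimal standard-library sketch of the general technique: field names are taken from the header row and each data row becomes a namedtuple. This is not IterTable's implementation; the function name iter_rows and the sample data are hypothetical.

```python
import csv
import io
from collections import namedtuple


def iter_rows(text):
    """Yield one namedtuple per data row, using the header row for field names."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # rename=True swaps invalid field names (e.g. "site id") for positional
    # names such as _1, so arbitrary column headers do not raise an error.
    Row = namedtuple("Row", header, rename=True)
    for values in reader:
        yield Row(*values)


# Hypothetical sample data standing in for an external file.
sample = "name,date\nAlice,2024-01-01\nBob,2024-01-02\n"
rows = list(iter_rows(sample))
for row in rows:
    print(row.name, row.date)
```

The attribute access in the loop (row.name, row.date) is the same style of access the IterTable examples use.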
IterTable provides a number of built-in classes like the above, including a CsvFileIter, XmlFileIter, and JsonFileIter. There is also a convenience function, load_file(), that attempts to automatically determine which class to use for a given file.
from itertable import load_file
data = load_file('example.csv')
for row in data:
    print(row.name, row.date)
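This README does not say how load_file() chooses a class. One plausible approach, shown here purely as an assumption, is a lookup keyed on the file extension. The mapping and the guess_iter_class helper below are hypothetical. The class names are stored as strings so the sketch runs without IterTable installed.

```python
import os

# Hypothetical mapping for illustration only. The class names are the ones
# mentioned in this README; the lookup itself is an assumption.
ITER_CLASS_BY_EXTENSION = {
    ".csv": "CsvFileIter",
    ".xlsx": "ExcelFileIter",
    ".xml": "XmlFileIter",
    ".json": "JsonFileIter",
}


def guess_iter_class(filename):
    """Return the name of the class that would handle `filename`."""
    extension = os.path.splitext(filename)[1].lower()
    try:
        return ITER_CLASS_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"No known class for extension {extension!r}") from None


print(guess_iter_class("example.csv"))
print(guess_iter_class("Example.XLSX"))
```

When the format cannot be inferred from the filename, instantiating a specific class such as CsvFileIter directly avoids the guess altogether.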
All of the included *FileIter classes support both reading and writing to external files.
Network Client
IterTable also provides network-capable equivalents of each of the above classes, to facilitate loading data from third party webservices.
from itertable import JsonNetIter
class WebServiceIter(JsonNetIter):
    url = "http://example.com/api"

data = WebServiceIter(params={'type': 'all'})
for row in data:
    print(row.timestamp, row.value)
The powerful requests library is used internally to load data over HTTP.
Pandas Analysis
When Pandas is installed (via itertable[pandas]), the as_dataframe() method on itertable classes can be used to create a DataFrame, enabling more extensive analysis possibilities.
instance = WebServiceIter(params={'type': 'all'})
df = instance.as_dataframe()
print(df.value.mean())
GIS Support
When Fiona and Shapely are installed (via itertable[gis]), itertable can also open and create shapefiles and other OGR-compatible geographic data formats.
from itertable import ShapeIter
data = ShapeIter(filename='sites.shp')
for id, site in data.items():
    print(id, site.geometry.wkt)
More information on IterTable's GIS support is available in the project documentation.
Command-Line Interface
IterTable provides a simple CLI for rendering the content of a file or Iter class. This can be useful for inspecting a file or for integrating with a shell automation workflow. The default output is CSV, but it can be changed to JSON by setting -f json.
python3 -m itertable example.json # JSON to CSV
python3 -m itertable -f json example.csv # CSV to JSON
python3 -m itertable example.xlsx "start_row=5"
python3 -m itertable http://example.com/example.csv
python3 -m itertable itertable.CsvNetIter "url=http://example.com/example.csv"
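The trailing arguments in the commands above ("start_row=5", "url=...") are key=value pairs. As a rough illustration of how such arguments could be turned into keyword options, here is a small sketch. The parse_options helper and its integer-conversion rule are assumptions for illustration, not IterTable's actual CLI code.

```python
def parse_options(args):
    """Turn a list like ["start_row=5", "url=http://..."] into a dict."""
    options = {}
    for arg in args:
        # Split on the first "=" only, so values such as URLs may contain "=".
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {arg!r}")
        # Assumption: purely numeric values are converted to integers.
        options[key] = int(value) if value.isdecimal() else value
    return options


opts = parse_options(["start_row=5", "url=http://example.com/example.csv"])
print(opts)
```

Quoting each pair in the shell, as the examples above do, keeps characters such as "&" or "?" in a URL from being interpreted by the shell.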
Extending IterTable
It is straightforward to extend IterTable to support arbitrary formats. Each provided class is composed of a BaseIter class and mixin classes (loaders, parsers, and mappers) that handle the various steps of the process.
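The loader / parser / mapper composition described above can be sketched with plain Python mixins. The classes below are hypothetical stand-ins written for this illustration (all prefixed with Sketch); they are not IterTable's own classes, and their method names are assumptions. Consult the IterTable source for the real base class and mixin interfaces.

```python
import csv
import io
from collections import namedtuple


class SketchStringLoader:
    """Loader step: make the raw source available as a file-like object."""

    def load(self):
        self.file = io.StringIO(self.string)


class SketchCsvParser:
    """Parser step: turn the loaded file into a list of dicts."""

    def parse(self):
        self.data = list(csv.DictReader(self.file))


class SketchTupleMapper:
    """Mapper step: expose each parsed record as a namedtuple."""

    def map_rows(self):
        if not self.data:
            return []
        Row = namedtuple("Row", list(self.data[0].keys()), rename=True)
        return [Row(*record.values()) for record in self.data]


class SketchBaseIter:
    """Hypothetical base class that runs the three steps and supports iteration."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
        self.load()
        self.parse()
        self.rows = self.map_rows()

    def __iter__(self):
        return iter(self.rows)


class CsvStringIter(SketchStringLoader, SketchCsvParser, SketchTupleMapper, SketchBaseIter):
    """Composed class: one loader, one parser, one mapper, plus the base."""


data = CsvStringIter(string="name,date\nAlice,2024-01-01\n")
for row in data:
    print(row.name, row.date)
```

With this layout, supporting a new format means swapping in a different parser mixin while the loader and mapper stay unchanged.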