stream-read-ods
Python function to extract data from an ODS spreadsheet on the fly, without having to store the entire file in memory or on disk.
To construct ODS spreadsheets on the fly, try stream-write-ods.
Installation
pip install stream-read-ods
Usage
To extract the rows, call the stream_read_ods function, passing it an iterable of bytes instances. It returns an iterable of (sheet_name, sheet_rows) pairs.
from stream_read_ods import stream_read_ods
import httpx

def ods_chunks():
    # Iterable that yields the bytes of an ODS file
    with httpx.stream('GET', 'https://www.example.com/my.ods') as r:
        yield from r.iter_bytes(chunk_size=65536)

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    for sheet_row in sheet_rows:
        print(sheet_row)  # Tuple of cells
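Since the input only needs to be an iterable of bytes instances, the source does not have to be HTTP. As a minimal sketch, a local file could be read in chunks like this; the name file_chunks and the default chunk size are assumptions made for illustration and are not part of this package.

```python
def file_chunks(path, chunk_size=65536):
    # Hypothetical helper: yield the contents of a local file as bytes chunks
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty read means the end of the file has been reached
                break
            yield chunk
```

The resulting generator could then be passed to stream_read_ods in place of ods_chunks().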
If the spreadsheet has a fairly simple structure, the sheet_rows from above can be passed to the simple_table function to extract the names of the columns and the rows of the table.
from stream_read_ods import stream_read_ods, simple_table

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    columns, rows = simple_table(sheet_rows, skip_rows=2)
    for row in rows:
        print(row)  # Tuple of cells
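With column names and row tuples in hand, it is often convenient to work with one dictionary per row. The sketch below assumes columns is a sequence of names and each row is a tuple of the same length; rows_as_dicts is a hypothetical helper, and the data shown is a stand-in for what simple_table would return.

```python
def rows_as_dicts(columns, rows):
    # Pair each cell with its column name, one row at a time
    for row in rows:
        yield dict(zip(columns, row))


# Stand-in data; in practice these would come from simple_table
columns = ('name', 'amount')
rows = iter([('a', 1), ('b', 2)])
for record in rows_as_dicts(columns, rows):
    print(record)
```

Because this is a generator, only one row is held at a time.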
This can then be used to construct a pandas DataFrame from the ODS file, although doing so stores the entire sheet in memory.
import pandas as pd
from stream_read_ods import stream_read_ods, simple_table

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    columns, rows = simple_table(sheet_rows, skip_rows=2)
    df = pd.DataFrame(rows, columns=columns)
    print(df)
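If holding a whole sheet in memory is not acceptable, one alternative is to write each row out as it arrives. The following is a rough sketch using the standard library csv module; write_csv is a hypothetical helper, the data is a stand-in for the output of simple_table, and it assumes the rows iterable is consumed lazily.

```python
import csv
import io


def write_csv(columns, rows, out):
    # Write the header first, then each row as the iterable produces it
    writer = csv.writer(out)
    writer.writerow(columns)
    for row in rows:
        writer.writerow(row)


# Stand-in data; in practice columns and rows would come from simple_table
out = io.StringIO()
write_csv(('name', 'amount'), iter([('a', 1), ('b', 2)]), out)
print(out.getvalue())
```

In real use, out would more likely be a file opened with newline='' than an in-memory buffer.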
Types
There are 8 possible data types in an Open Document Spreadsheet: boolean, currency, date, float, percentage, string, time, and void. These are converted to Python types according to the following table.
ODS type | Python type |
---|---|
boolean | bool |
currency | stream_read_ods.Currency |
date | date or datetime |
float | Decimal |
percentage | stream_read_ods.Percentage |
string | str |
time | stream_read_ods.Time |
void | NoneType |
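Code that consumes cells usually needs to branch on these Python types. As an illustrative sketch that uses only the standard library, the hypothetical to_jsonable function below converts a cell to something json.dumps accepts. It relies on Currency and Percentage being subclasses of Decimal and on Time being a namedtuple, as described in the sections that follow.

```python
import json
from datetime import date
from decimal import Decimal


def to_jsonable(cell):
    # None, bool, int and str are already JSON-compatible
    if cell is None or isinstance(cell, (bool, int, str)):
        return cell
    # This check also matches subclasses of Decimal
    if isinstance(cell, Decimal):
        return str(cell)
    # datetime is a subclass of date, so one check handles both
    if isinstance(cell, date):
        return cell.isoformat()
    # namedtuples are tuples; convert each member in turn
    if isinstance(cell, tuple):
        return [to_jsonable(member) for member in cell]
    raise TypeError(f'Unexpected cell type: {type(cell)!r}')


# Stand-in row containing several of the types from the table
row = (True, Decimal('1.50'), date(2023, 1, 31), 'text', None)
print(json.dumps([to_jsonable(cell) for cell in row]))
```

Decimal values are emitted as strings here so that no precision is lost; converting to float instead would be a design choice for the caller.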
stream_read_ods.Currency
A subclass of Decimal with an additional attribute code that contains the currency code, for example the string GBP. This can be None if the ODS file does not specify a code.
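To show what working with such a value could look like, here is a stand-in class with the same shape: a Decimal subclass carrying a code attribute. CurrencySketch is hypothetical and is not the library's own implementation; it exists only so the example runs without an ODS file.

```python
from decimal import Decimal


class CurrencySketch(Decimal):
    # Hypothetical stand-in, not the class provided by stream_read_ods
    def __new__(cls, value, code=None):
        obj = super().__new__(cls, value)
        obj.code = code
        return obj


price = CurrencySketch('9.99', code='GBP')
# The value compares like any other Decimal
print(price == Decimal('9.99'))
# The code may be None, so check before relying on it
print(price.code if price.code is not None else 'no code specified')
```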
stream_read_ods.Percentage
A subclass of Decimal.
stream_read_ods.Time
The Python built-in timedelta type is not used since timedelta does not offer a way to store intervals of years or months, other than converting to days which would be a loss of information.
Instead, a namedtuple is defined, stream_read_ods.Time, with members:
Member | Type |
---|---|
sign | str |
years | int |
months | int |
days | int |
hours | int |
minutes | int |
seconds | Decimal |
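When a value has no years or months component, converting it to a timedelta loses no calendar information. The sketch below uses a stand-in namedtuple, TimeSketch, with the same members as the table above. The to_timedelta helper is hypothetical, and treating a sign of '-' as marking a negative duration is an assumption that should be checked against real data.

```python
from collections import namedtuple
from datetime import timedelta
from decimal import Decimal

# Stand-in with the same members as stream_read_ods.Time
TimeSketch = namedtuple(
    'TimeSketch',
    ['sign', 'years', 'months', 'days', 'hours', 'minutes', 'seconds'],
)


def to_timedelta(t):
    # Years and months have no fixed length, so refuse rather than guess
    if t.years or t.months:
        raise ValueError('Cannot convert years or months to a timedelta')
    delta = timedelta(
        days=t.days,
        hours=t.hours,
        minutes=t.minutes,
        seconds=float(t.seconds),  # timedelta does not accept Decimal
    )
    # Assumption: a sign of '-' marks a negative duration
    return -delta if t.sign == '-' else delta


print(to_timedelta(TimeSketch('+', 0, 0, 1, 2, 3, Decimal('4.5'))))
```

Note that timedelta has microsecond resolution, so any finer precision held in the Decimal seconds is lost in the conversion.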
Running tests
pip install -r requirements-dev.txt
pytest