Abstract NetCDF data objects, providing fast data transfer between analysis packages.

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Natural Language
- English
Operating System
Programming Language
Topic
- Scientific/Engineering

Project description

ncdata

Generic NetCDF data in Python.

Provides fast data exchange between analysis packages, and full control of storage formatting.

Especially : Ncdata exchanges data between Xarray and Iris as efficently as possible

"lossless, copy-free and lazy-preserving".

This enables the user to freely mix+match operations from both projects, getting the "best of both worlds".

Motivation
- Primary Use
- Secondary Uses
Principles
Working Usage Examples
API documentation
Installation
Project Status
References
Developer Notes

Motivation

Primary Use

Fast and efficient translation of data between Xarray and Iris objects.

This allows the user to mix+match features from either package in code.

For example:

from ncdata.iris_xarray import dataset_to_cubes, cubes_to_dataset

# Apply Iris regridder to xarray data
dataset = xarray.open_dataset('file1.nc')
cube, = dataset_to_cubes(dataset)
cube2 = cube.regrid(grid_cube, iris.analysis.PointInCell)
dataset2 = cubes_to_dataset(cube2)

# Apply Xarray statistic to Iris data
cubes = iris.load('file1.nc')
dataset = cubes_to_dataset(cubes)
dataset2 = dataset.group_by('time.dayofyear').argmin()
cubes2 = dataset_to_cubes(dataset2)

data conversion is equivalent to writing to a file with one library, and reading it back with the other ..
- .. except that no actual files are written
both real (numpy) and lazy (dask) variable data arrays are transferred directly, without copying or computing

Secondary Uses

Exact control of file formatting

Ncdata can also be used as a transfer layer between Iris or Xarray file i/o and the exact format of data stored in files.
I.E. adjustments can be made to file data before loading it into Iris/Xarray; or Iris/Xarray saved output can be adjusted before writing to a file.

This allows the user to workaround any package limitations in controlling storage aspects such as : data chunking; reserved attributes; missing-value processing; or dimension control.

For example:

from ncdata.xarray import from_xarray
from ncdata.iris import to_cubes
from ncdata.netcdf4 import to_nc4, from_nc4

# Rename a dimension in xarray output
dataset = xr.open_dataset('file1.nc')
xr_ncdata = from_xarray(dataset)
dim = xr_ncdata.dimensions.pop('dim0')
xr_ncdata.dimensions['newdim'] = dim
for var in xr_ncdata.variables.values():
    var.dimensions = [
        'newdim' if dim == 'dim0' else dim
        for dim in var.dimensions
    ]
to_nc4(ncdata, 'file_2a.nc')

# Fix chunking in Iris input
ncdata = from_nc4('file1.nc')
for var in ncdata.variables:
    var.chunking = lambda: (
        100.e6 if dim == 'dim0' else -1
        for dim in var.dimensions
    )
cubes = to_cubes(ncdata)

Manipulation of data

ncdata can also be used for data extraction and modification, similar to the scope of CDO and NCO command-line operators but without file operations.
However, this type of usage is as yet still undeveloped : There is no inbuilt support for data consistency checking, or obviously useful operations such as indexing by dimension. This could be added in future, but it is also true that many such operations (like indexing) may be better done using Iris/Xarray.

Principles

ncdata represents NetCDF data as Python objects
ncdata objects can be freely manipulated, independent of any data file
ncdata variables can contain either real (numpy) or lazy (Dask) arrays
ncdata can be losslessly converted to and from actual NetCDF files
Iris or Xarray objects can be converted to and from ncdata, in the same way that they are read from and saved to NetCDF files
translation between Xarray and Iris is based on conversion to ncdata, which is in turn equivalent to file i/o
- thus, Iris/Xarray translation is equivalent to saving from one package into a file, then loading the file in the other package
ncdata exchanges variable data directly with Iris/Xarray, with no copying of real data or computing of lazy data
ncdata exchanges lazy arrays with files using Dask 'streaming', thus allowing transfer of arrays larger than memory

Code Examples

mostly TBD
proof-of-concept script for netCDF4 file i/o
proof-of-concept script for iris-xarray conversions

API documentation

see the ReadTheDocs build

Installation

Not yet building as a package, or uploading to PyPI / conda-forge. Though this is planned for future "proper" releases.

Code is pure Python. Download and add repo>/lib to PYTHONPATH.
An installation process will be provided shortly (Jan 2024) ...

Project Status

Code Stability

We intend to follow PEP 440 or (older) SemVer versioning principles.

Release version is at "v0.0.1".
This is a first complete implementation, with functional operational of all public APIs.
A release "v0.1.0" will follow when build and deployment mechanisms are sorted out.
The code is however still experimental, and APIs are not stable (hence no major version yet).

Iris and Xarray Compatibility

Iris

tested working with latest Iris
compatible with iris >= v3.7.0
- see : support added in v3.7.0

Xarray

appears working against 'current' Xarray at time of writing
- v2023.11.0 (November 16, 2023)

Known limitations

Unsupported features : not planned

user-defined datatypes are not supported
- this includes compound and variable-length types

Unsupported features : planned for future release

groups (not yet fully supported ?)
file output chunking control

Untested features : probably done, pending test

unlimited dimensions (not yet fully supported)
file compression and encoding options
iris and xarray load/save keywords generally

References

Iris issue : https://github.com/SciTools/iris/issues/4994
planning presentation : https://github.com/SciTools/iris/files/10499677/Xarray-Iris.bridge.proposal.--.NcData.pdf
in-Iris code workings : https://github.com/pp-mo/iris/pull/75

Developer Notes

for a full docs-build, a simple make html will do for now.
- the docs/Makefile wipes the API docs and invokes sphinx-apidoc for a full rebuild
- results are then available at docs/_build/html/index.html

Project details

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Natural Language
- English
Operating System
Programming Language
Topic
- Scientific/Engineering

Release history Release notifications | RSS feed

0.1.1

Mar 19, 2024

0.1.0

Jan 24, 2024

This version

0.0.2

Jan 3, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ncdata-0.0.2.tar.gz (420.4 kB view hashes)

Uploaded Jan 3, 2024 Source

Built Distribution

ncdata-0.0.2-py3-none-any.whl (24.6 kB view hashes)

Uploaded Jan 3, 2024 Python 3

Hashes for ncdata-0.0.2.tar.gz

Hashes for ncdata-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`3bbf08adf9f9ddd6b3d12c39159427bb3f6115adc4a013e125043405dd7f09f5`
MD5	`2eb73809deeb4c084693fa307b1634af`
BLAKE2b-256	`f1b675e7e828dcde72030cd67e98e21a98d54886811e4c52307f62371b34e611`

Hashes for ncdata-0.0.2-py3-none-any.whl

Hashes for ncdata-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d15864959b1c5af377016730a38fd18ad9c5fa938ffa05ec167fddf08f195178`
MD5	`74969fe97187d8e787478f14a887072b`
BLAKE2b-256	`13ef2274e9d0eba246e5eb27a81bed1262a3fa3ae14bad4dd81edcd80972a915`