An open source dataset transformation, standardization, and normalization Python library.

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language

Project description

Elwood

An open source dataset transformation, standardization, and normalization Python library.

Usage

To start using Elwood, simply run:

pip install elwood

Now you can use any of the dataset transformation, standardization, or normalization functions exposed by this library. To start, add from elwood import elwood to your Python file.

Standardization

elwood.process(args)

Given a dataset containing geospatial data (with columns and rows) in an arbitrary, non-standard format, together with annotations (a dictionary) describing the dataset, Elwood can standardize it. Standardization means creating an output dataset with stable and predictable columns. The data can be normalized, regridded, scaled, and resolved to standard country names using GADM; the latitude and longitude of each event or measurement can also be resolved. A typical standardized output contains the following columns: timestamp, country, admin1, admin2, admin3, lat, lng, followed by the other measurements, events, or features of interest contained in the input dataset (additional columns to the right of the standard ones).

#TODO document standardization further

Transformation

The transformation functions include geographical extent clipping (latitude/longitude), geographical regridding (gridded data such as NetCDF or GeoTIFF), temporal clipping, and temporal scaling.

Geospatial Clipping

elwood.clip_geo(dataframe, geo_columns, polygons_list)

This function takes a pandas dataframe, a geo_columns dict giving the column names for latitude and longitude, e.g. {'lat_column': 'latitude', 'lon_column': 'longitude'}, and a list of polygons to clip the data to. Each polygon is itself a list of objects giving the lat and lng of its vertices, e.g.:

[
    [
        {
            "lat": 11.0,
            "lng": 42.0
        },
        {
            "lat": 11.0,
            "lng": 43.0
        },
        {
            "lat": 12.0,
            "lng": 43.0
        },
        {
            "lat": 12.0,
            "lng": 42.0
        }
    ],
    ...
]
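
To make the clipping semantics concrete, here is a minimal standard-library sketch of keeping only the points that fall inside at least one polygon. It does not call Elwood and is not its implementation: the helper names point_in_polygon and clip_rows are hypothetical, rows are plain dicts rather than a dataframe, and the ray-casting test is simply one common way to check containment.

```python
def point_in_polygon(lat, lng, polygon):
    """Ray-casting test: True if (lat, lng) lies inside the polygon.

    `polygon` is a list of {"lat": ..., "lng": ...} vertices, in the same
    shape as the polygons_list entries shown above.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line at `lat` can be crossed.
        if (a["lat"] > lat) != (b["lat"] > lat):
            # Longitude at which the edge a-b crosses that line.
            t = (lat - a["lat"]) / (b["lat"] - a["lat"])
            cross_lng = a["lng"] + t * (b["lng"] - a["lng"])
            if lng < cross_lng:
                inside = not inside
    return inside


def clip_rows(rows, geo_columns, polygons_list):
    """Keep rows whose point lies inside at least one polygon."""
    lat_col = geo_columns["lat_column"]
    lon_col = geo_columns["lon_column"]
    return [
        row for row in rows
        if any(point_in_polygon(row[lat_col], row[lon_col], poly)
               for poly in polygons_list)
    ]


# Illustrative data only.
rows = [
    {"latitude": 11.5, "longitude": 42.5, "value": 1},  # inside the square
    {"latitude": 13.0, "longitude": 42.5, "value": 2},  # north of it
    {"latitude": 11.5, "longitude": 44.0, "value": 3},  # east of it
]
square = [
    {"lat": 11.0, "lng": 42.0},
    {"lat": 11.0, "lng": 43.0},
    {"lat": 12.0, "lng": 43.0},
    {"lat": 12.0, "lng": 42.0},
]
geo_columns = {"lat_column": "latitude", "lon_column": "longitude"}
clipped = clip_rows(rows, geo_columns, [square])
```

Only the first row survives here. How Elwood itself treats points that sit exactly on a polygon edge is not documented above, so this sketch makes no promise about that case.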

Geospatial Regridding

elwood.regrid_dataframe_geo(dataframe, geo_columns, scale_multi)

This function takes a dataframe and regrids its geography by the scale multiplier provided. The multiplier is used to divide the current geographical scale, producing a coarser-resolution dataset. The dataframe must have a detectable geographical scale, meaning each lat/lon pair represents the point at the center of a grid cell in the data provided. The latitude and longitude columns are determined by the geo_columns argument, a dict of the column names, e.g. {'lat_column': 'my_latitude', 'lon_column': 'my_longitude'}
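
As an illustration of what coarsening a grid can look like, the following standard-library sketch detects the spacing of a regular grid, enlarges each cell edge by the multiplier, and averages the values that land in the same coarse cell. Several things here are assumptions rather than documented Elwood behavior: the helper names detect_spacing and regrid_rows, the value_column parameter, the reading of the multiplier as a factor on cell size, and the use of the mean as the aggregation.

```python
import math
from collections import defaultdict


def detect_spacing(values):
    """Smallest gap between distinct sorted coordinates, taken as the grid spacing.

    Needs at least two distinct coordinates, otherwise min() raises ValueError.
    """
    unique = sorted(set(values))
    return min(b - a for a, b in zip(unique, unique[1:]))


def regrid_rows(rows, geo_columns, scale_multi, value_column="value"):
    """Coarsen cell-centered points by `scale_multi`, averaging values per coarse cell."""
    lat_col = geo_columns["lat_column"]
    lon_col = geo_columns["lon_column"]
    # Assumption: the coarse cell edge is the detected spacing times the multiplier.
    lat_step = detect_spacing(r[lat_col] for r in rows) * scale_multi
    lon_step = detect_spacing(r[lon_col] for r in rows) * scale_multi

    cells = defaultdict(list)
    for r in rows:
        key = (math.floor(r[lat_col] / lat_step), math.floor(r[lon_col] / lon_step))
        cells[key].append(r[value_column])

    # Report each coarse cell by its center point, mirroring the input convention.
    return [
        {
            lat_col: (i + 0.5) * lat_step,
            lon_col: (j + 0.5) * lon_step,
            value_column: sum(vals) / len(vals),
        }
        for (i, j), vals in sorted(cells.items())
    ]


# Illustrative 0.5-degree grid, coarsened to 1.0 degree.
rows = [
    {"my_latitude": 0.25, "my_longitude": 10.25, "value": 1},
    {"my_latitude": 0.25, "my_longitude": 10.75, "value": 2},
    {"my_latitude": 0.75, "my_longitude": 10.25, "value": 3},
    {"my_latitude": 0.75, "my_longitude": 10.75, "value": 4},
    {"my_latitude": 1.25, "my_longitude": 10.25, "value": 10},
]
geo_columns = {"lat_column": "my_latitude", "lon_column": "my_longitude"}
coarse = regrid_rows(rows, geo_columns, scale_multi=2)
```

The four points of the first 1.0-degree cell collapse into a single row centered at (0.5, 10.5), and the lone point in the next cell becomes a row centered at (1.5, 10.5).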

Temporal Clipping

elwood.clip_dataframe_time(dataframe, time_column, time_ranges)

This function produces a dataframe that includes only the rows whose time_column values fall within time_ranges. The time_ranges argument is a list of objects, each containing a start and an end time, e.g. [{"start": datetime, "end": datetime}, ...]
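
The filtering described above can be sketched with the standard library alone. This is not Elwood's code: clip_rows_time is a hypothetical helper working on plain dicts, and treating both range bounds as inclusive is an assumption, since the text does not say how boundaries are handled.

```python
from datetime import datetime


def clip_rows_time(rows, time_column, time_ranges):
    """Keep rows whose time falls inside at least one range.

    Assumption: both "start" and "end" are inclusive.
    """
    return [
        row for row in rows
        if any(r["start"] <= row[time_column] <= r["end"] for r in time_ranges)
    ]


# Illustrative data only.
rows = [
    {"date": datetime(2023, 1, 15), "value": 1},
    {"date": datetime(2023, 3, 1), "value": 2},
    {"date": datetime(2023, 6, 30), "value": 3},
]
time_ranges = [
    {"start": datetime(2023, 1, 1), "end": datetime(2023, 1, 31)},
    {"start": datetime(2023, 6, 1), "end": datetime(2023, 6, 30)},
]
kept = clip_rows_time(rows, "date", time_ranges)
```

The March row is dropped because it falls in neither range, while the row dated exactly at the end of the second range is kept under the inclusive-bounds assumption.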

Temporal Scaling

elwood.rescale_dataframe_time(dataframe, time_column, time_bucket, aggregation_function_list)

This function produces a dataframe whose rows are the data aggregated by the given time bucket using the given list of aggregation functions. The time_column is the name of the column containing the time values to rescale. The time_bucket is a DateOffset, Timedelta, or str representing the desired time granularity, e.g. 'M', 'A', '2H'. The aggregation_function_list is a list of aggregation functions to apply to the data, e.g. ['sum'] or ['sum', 'min', 'max']
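
To show the shape of the result, here is a small standard-library sketch that buckets rows by calendar month and applies each requested aggregation. It is an illustration, not Elwood's implementation: rescale_rows_monthly and its value_column parameter are hypothetical, only monthly buckets are handled, and labeling each bucket by the first day of its month is a choice made for this sketch.

```python
from collections import defaultdict
from datetime import datetime

# Aggregation names mapped to plain Python functions (sketch only).
AGGREGATIONS = {"sum": sum, "min": min, "max": max}


def rescale_rows_monthly(rows, time_column, value_column, aggregation_function_list):
    """Group rows by calendar month and apply each named aggregation."""
    buckets = defaultdict(list)
    for row in rows:
        t = row[time_column]
        # Label the bucket by the first day of the month (sketch convention).
        buckets[datetime(t.year, t.month, 1)].append(row[value_column])

    result = []
    for bucket_start, values in sorted(buckets.items()):
        aggregated = {time_column: bucket_start}
        for name in aggregation_function_list:
            aggregated[name] = AGGREGATIONS[name](values)
        result.append(aggregated)
    return result


# Illustrative data only.
rows = [
    {"date": datetime(2023, 1, 5), "value": 1},
    {"date": datetime(2023, 1, 20), "value": 3},
    {"date": datetime(2023, 2, 10), "value": 5},
]
monthly = rescale_rows_monthly(rows, "date", "value", ["sum", "min", "max"])
```

Each output row carries one column per aggregation, so the two January rows become a single row with sum 4, min 1, and max 3.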

0 to 1 Normalization

elwood.normalize_features(dataframe, output_file)

This function expects a dataframe in long format, with a "feature" column and a "value" column: each entry for a feature has its own feature/value row. It returns a dataframe in which the numerical values in the "value" column have been scaled to the range 0 to 1 for each "feature". Optionally, you may specify an output_file name to also generate a parquet file of the dataframe.
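
The per-feature scaling can be illustrated with ordinary min-max normalization, (value - min) / (max - min), computed separately for each feature. The sketch below uses only the standard library and plain dicts; normalize_rows is a hypothetical helper, and mapping a feature whose values are all equal to 0.0 is an assumption made here to avoid dividing by zero, not documented Elwood behavior.

```python
def normalize_rows(rows):
    """Min-max scale "value" to the range 0 to 1 within each "feature"."""
    # First pass: find the (min, max) of the values for every feature.
    bounds = {}
    for row in rows:
        lo, hi = bounds.get(row["feature"], (row["value"], row["value"]))
        bounds[row["feature"]] = (min(lo, row["value"]), max(hi, row["value"]))

    # Second pass: rescale each value using the bounds of its own feature.
    scaled_rows = []
    for row in rows:
        lo, hi = bounds[row["feature"]]
        # Assumption: a constant feature maps to 0.0.
        scaled = 0.0 if hi == lo else (row["value"] - lo) / (hi - lo)
        scaled_rows.append({**row, "value": scaled})
    return scaled_rows


# Illustrative long-format data: one feature/value pair per row.
rows = [
    {"feature": "rainfall", "value": 10.0},
    {"feature": "rainfall", "value": 20.0},
    {"feature": "rainfall", "value": 30.0},
    {"feature": "temperature", "value": 5.0},
    {"feature": "temperature", "value": 15.0},
]
normalized = normalize_rows(rows)
```

Because the bounds are tracked per feature, the rainfall and temperature values each span the full 0 to 1 range independently of one another.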

History

0.1.4 Added new regridding functionality, updated existing numpy and pandas based regridding.

0.1.4

0.1.2

0.1.1

0.1.0

Release history

- 0.1.4 (this version): Aug 7, 2023
- 0.1.3: Jul 18, 2023
- 0.1.2: Apr 19, 2023
- 0.1.1: Feb 17, 2023
- 0.1.0: Feb 17, 2023

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

File	Size	Uploaded	Python versions
elwood-0.1.4-py2.py3-none-any.whl	67.1 kB	Aug 7, 2023	Python 2, Python 3
elwood-0.1.4-11-py2.py3-none-any.whl	73.3 kB	Jan 19, 2024	Python 2, Python 3
elwood-0.1.4-10-py2.py3-none-any.whl	73.3 kB	Jan 19, 2024	Python 2, Python 3
elwood-0.1.4-9-py2.py3-none-any.whl	73.3 kB	Dec 14, 2023	Python 2, Python 3
elwood-0.1.4-8-py2.py3-none-any.whl	73.3 kB	Dec 14, 2023	Python 2, Python 3
elwood-0.1.4-7-py2.py3-none-any.whl	73.3 kB	Dec 13, 2023	Python 2, Python 3
elwood-0.1.4-6-py2.py3-none-any.whl	73.3 kB	Dec 13, 2023	Python 2, Python 3
elwood-0.1.4-4-py2.py3-none-any.whl	72.5 kB	Aug 10, 2023	Python 2, Python 3
elwood-0.1.4-3-py2.py3-none-any.whl	72.6 kB	Aug 10, 2023	Python 2, Python 3
elwood-0.1.4-2-py2.py3-none-any.whl	72.6 kB	Aug 8, 2023	Python 2, Python 3
elwood-0.1.4-1-py2.py3-none-any.whl	72.6 kB	Aug 8, 2023	Python 2, Python 3

Hashes for elwood-0.1.4-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`4541e92580a60f1f08eedc4fa29b3f273acff8211fd72f44ea79702a890ef2a6`
MD5	`2887b8d77d3dc45e1ae8cb31dfa3061d`
BLAKE2b-256	`6fb9d638840a020594ae984d638c6222a77c4aa77617c02a0afe846f7034ef52`

Hashes for elwood-0.1.4-11-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`8fdf86a9ffb537641fde7e2261ee60288298b8a6d062172cafce556bea9b039f`
MD5	`3be7d636d57f04e7c5f3a7b95e8ffc42`
BLAKE2b-256	`5ff54ffe6f21cad240529baa0eee92a5a30d3b2321909584e5578efcd4196073`

Hashes for elwood-0.1.4-10-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`bb0c1afcd8c23d1c28c122b45a0316e3112004fe8a8d38786c7fbcf9820b5054`
MD5	`89c42f5d2f0e2901be7730b0baebbc96`
BLAKE2b-256	`ffb00ec9fb559ff476013650ae6219161f15897097d47d9066aba11fa73fdc8d`

Hashes for elwood-0.1.4-9-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`2f76b00a800684cf23b94eb692af07e77a8a7754c1f42b23401c6459c320fc42`
MD5	`19687db3b7c1a19f77d698832161b76e`
BLAKE2b-256	`0208e52600fd19504ee6f57f2dc25ee952510378c4e8ea9fd5787d19dd891035`

Hashes for elwood-0.1.4-8-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`e097f45712b33f24ee30dfe87a1015d34bf1954286f82a5e443925e359c97a1c`
MD5	`7bd5113ee93264f6a719d7ae88cbc9f6`
BLAKE2b-256	`be257db86aa285059e373bde74884d5df7d5651248928474d663a8dba46f8c69`

Hashes for elwood-0.1.4-7-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`86579b19b06038f98eb5aff076ee47330211c18a1653dd341e82fadcdd72fd84`
MD5	`ee6405622302d257b2c38a97bb127229`
BLAKE2b-256	`5c209d084226891d338e52aa1df6c8de23342103dd6f9652ce154b96de5c2a8b`

Hashes for elwood-0.1.4-6-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`7632ca41b6e4494e2aded49dadae6de7a0e35cf7d0b613b877dfaea73355c89e`
MD5	`8f3786aa230663145b30942f99fb266b`
BLAKE2b-256	`8c1522131f4cf0afdb38e3dcafd61f0bfd002d28d9d340d705d9b9118e517631`

Hashes for elwood-0.1.4-4-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`df581062df5626a80cbd6e318c09e43212d478a7f031110d1319fb2b06111cd7`
MD5	`b1d58473c3360a49686b954c7883fc4c`
BLAKE2b-256	`3de24c5b6de66ad9fffcd3aaed365e09f865a1efe719326d7d619dc3d406e0bf`

Hashes for elwood-0.1.4-3-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`d1b7975954a4fbb0723471163933e9f3a4439fce2d3fcd977935afd1b9cdf72a`
MD5	`82fd9e278322db66d5553cd0b58760aa`
BLAKE2b-256	`dcdf545c02c1cf7f9be98a5e0de43308fb4684a7e0494ddf74f051b28c5a4bd2`

Hashes for elwood-0.1.4-2-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`4f8acf82d2f9e3c6084f34ba1ac1bdf526b77ee7afdf29191ec163707efaf4cb`
MD5	`31c4d8a02d8be67f175d2804cb008702`
BLAKE2b-256	`d5eff49eb6b0934f2dc38bca9cf774c2d06faeb5f242879f7aa770b49f192d00`

Hashes for elwood-0.1.4-1-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`5e48dc1a921d02619c996c8ea249d915eff8ba8f00f7949eaec73b4d263b15d1`
MD5	`951a77ef42484d6fab03ad03965285f2`
BLAKE2b-256	`cde558a4943f80d37fde57a731fe0a1a973cdfebe219d47c1c319103bf587961`