YData SDK lets you use the *Data-Centric* tools from the YData ecosystem to accelerate AI development.

Project description

YData SDK

🎊 YData SDK for improved data quality everywhere!

ydata-sdk v0.1.0 is here! Create a YData account so you can start using it today!

Documentation | More on YData

Overview

The YData SDK is an ecosystem of methods that lets users adopt, through a Python interface, a Data-Centric approach to AI development. It includes a set of integrated components for data ingestion, standardized data quality evaluation and data improvement (such as synthetic data generation), enabling iterative improvement of the datasets used in high-impact business applications.

Synthetic data can be used as a Machine Learning performance enhancer, to augment real data or to mitigate the presence of bias in it. It can also be used as a Privacy Enhancing Technology, to enable data-sharing initiatives or to fuel testing environments.

Under the hood, the YData SDK provides a set of algorithms and metrics, based on statistics and deep learning techniques, that help you accelerate your data preparation.

What you can expect:

YData SDK is composed of the following main modules:

Datasources
- The YData SDK includes several connectors for easy integration with existing data sources. It supports several storage types, such as filesystems and RDBMS. Check the list of connectors.
- The SDK's Datasources run on top of Dask, which lets them handle not only small workloads but also larger volumes of data.
Synthesizers
- A simplified interface to train a generative model that learns, in a data-driven manner, the behavior, patterns and distribution of the original data. Optimize your model for privacy or utility use cases.
- From a trained synthesizer, you can generate synthetic samples on demand and set the number of records to produce.
Synthetic data quality report (coming soon)
- An extensive synthetic data quality report that measures three dimensions of the generated data: privacy, utility and fidelity. The report can be downloaded as a PDF for ease of sharing and compliance purposes, or as JSON for integration into data flows.
Profiling (coming soon)
- A set of metrics and algorithms that summarizes dataset quality along three main dimensions: warnings, univariate analysis and a multivariate perspective.
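
The train-then-sample workflow described under Synthesizers can be illustrated with a toy stand-in. The sketch below is not the YData SDK and does not use its API: `ToyColumnSynthesizer`, its `fit` and `sample` methods, and the sample records are all hypothetical names chosen for illustration. It resamples each column independently, so unlike a real generative model it does not preserve relationships between columns.

```python
import random
from typing import Dict, List, Sequence


class ToyColumnSynthesizer:
    """Hypothetical stand-in for a synthesizer; NOT the YData implementation."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._columns: Dict[str, List[object]] = {}

    def fit(self, records: Sequence[Dict[str, object]]) -> "ToyColumnSynthesizer":
        # "Training" here just stores the observed values of every column.
        if not records:
            raise ValueError("need at least one record to fit")
        self._columns = {
            name: [row[name] for row in records] for name in records[0]
        }
        return self

    def sample(self, n_samples: int) -> List[Dict[str, object]]:
        # Draw each column independently from its observed values.
        if not self._columns:
            raise RuntimeError("call fit() before sample()")
        return [
            {name: self._rng.choice(values) for name, values in self._columns.items()}
            for _ in range(n_samples)
        ]


# Made-up records standing in for a real dataset.
real = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 27, "plan": "basic"},
]

synth = ToyColumnSynthesizer(seed=42).fit(real)
synthetic = synth.sample(n_samples=5)  # the caller chooses the record count
```

The point of the sketch is the shape of the interface: fit once on the original data, then request as many synthetic records as needed.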

Supported data formats

Tabular: The RegularSynthesizer is well suited to synthesizing high-dimensional, time-independent data with high-quality results.
Time-Series: The TimeSeriesSynthesizer is well suited to synthesizing both regularly and irregularly spaced time series, from smart sensors to stocks.

Project details

Release history

Version	Release date
0.12.1 (this version)	Apr 15, 2024
0.12.0	Mar 12, 2024
0.11.1	Mar 4, 2024
0.11.1rc1 (pre-release)	Mar 4, 2024
0.11.0	Feb 19, 2024
0.11.0rc1 (pre-release)	Feb 19, 2024
0.10.1rc1 (pre-release)	Feb 6, 2024
0.10.0	Jan 18, 2024
0.9.0	Jan 17, 2024
0.8.0	Jan 16, 2024
0.7.0	Dec 21, 2023
0.6.1	Sep 8, 2023
0.6.0	Jun 20, 2023
0.5.0	May 22, 2023
0.4.0	Apr 11, 2023
0.3.0	Mar 29, 2023
0.2.1	Mar 21, 2023
0.2.0	Mar 13, 2023
0.1.0	Mar 7, 2023
0.1.0rc1 (pre-release)	Mar 7, 2023
0.0.0.dev1 (pre-release)	Feb 25, 2023
0.0.0.dev0 (pre-release)	Feb 25, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

ydata_sdk-0.12.1-py310-none-any.whl (122.3 kB)

Uploaded Apr 15, 2024 Python 3.10

ydata_sdk-0.12.1-py39-none-any.whl (121.3 kB)

Uploaded Apr 15, 2024 Python 3.9

ydata_sdk-0.12.1-py38-none-any.whl (121.5 kB)

Uploaded Apr 15, 2024 Python 3.8
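
Each built distribution above targets one Python version, encoded in the Python tag of the wheel filename (`py310`, `py39`, `py38`). Installers such as pip normally pick the right file on their own; the sketch below only illustrates how the filename maps to an interpreter version. The helper names `parse_wheel_filename` and `pick_wheel` are hypothetical, and the matching is deliberately simplified to an exact comparison on the Python tag.

```python
import sys
from typing import NamedTuple, Optional, Sequence


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(*parts)


def pick_wheel(
    filenames: Sequence[str],
    major: int = sys.version_info.major,
    minor: int = sys.version_info.minor,
) -> Optional[str]:
    """Return the first wheel whose Python tag equals py<major><minor>."""
    wanted = f"py{major}{minor}"
    for name in filenames:
        if parse_wheel_filename(name).python_tag == wanted:
            return name
    return None  # no wheel listed for this interpreter version


WHEELS = [
    "ydata_sdk-0.12.1-py310-none-any.whl",
    "ydata_sdk-0.12.1-py39-none-any.whl",
    "ydata_sdk-0.12.1-py38-none-any.whl",
]

chosen = pick_wheel(WHEELS, major=3, minor=9)
```

With the three files listed for 0.12.1, an interpreter outside Python 3.8 to 3.10 finds no match in this simplified check.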

Hashes for ydata_sdk-0.12.1-py310-none-any.whl
Algorithm	Hash digest
SHA256	`019a825e81d4dab1611cc52b853297a599e3e153353c26644e6e5134cb21d919`
MD5	`696f9e97aca0d99fbb00956547c98992`
BLAKE2b-256	`6401deb5b4e6843e845b9a3799abc1252832074170868c1361a238748f6b670d`

Hashes for ydata_sdk-0.12.1-py39-none-any.whl
Algorithm	Hash digest
SHA256	`3c97ff43277d16692568eefc26cb5bb9394e3216a449358d522870e4cf93b6ab`
MD5	`e0250d236051f048fd720f0f836c6fca`
BLAKE2b-256	`ee86f59ce3d856ba1bd908872422ad203785f1a57cca97bd4f28658fff2be12f`

Hashes for ydata_sdk-0.12.1-py38-none-any.whl
Algorithm	Hash digest
SHA256	`9fd6892a62838d5eef36bb9292339bd42441888a501b12b297979281c6b8455e`
MD5	`70090cd7b436c3a4d7670a9e26dac8dc`
BLAKE2b-256	`e020c376d0f768033a361eda8d7dd8da039d97de5563b916283f1a35e2a2fcc2`
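
The digests above can be used to check that a downloaded wheel was not corrupted or altered. The following is a minimal sketch using Python's standard `hashlib`; the helper names `file_digests` and `matches_published` are hypothetical, and the local file path is an assumption about where the wheel was saved. It covers SHA256 and BLAKE2b-256 (BLAKE2b with a 32-byte digest) and leaves out MD5.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def file_digests(path: Union[str, Path], chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute SHA256 and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path: Union[str, Path], algorithm: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest."""
    return file_digests(path)[algorithm] == expected_hex.strip().lower()


# SHA256 copied from the py310 table above; the local path is an assumption.
EXPECTED_SHA256 = "019a825e81d4dab1611cc52b853297a599e3e153353c26644e6e5134cb21d919"
wheel = Path("ydata_sdk-0.12.1-py310-none-any.whl")
if wheel.exists():
    print("SHA256 matches:", matches_published(wheel, "SHA256", EXPECTED_SHA256))
```

If the comparison returns False, the file should be downloaded again before installing it.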