data-toolset
data-toolset is a Python package that simplifies data processing tasks by offering a more user-friendly alternative to traditional JAR utilities such as avro-tools and parquet-tools. It handles several data file formats, including Avro and Parquet, through a simple command-line interface.
Installation
Python 3.9 and 3.10 are supported and tested (to some extent).
pip install --user data-toolset
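Because only Python 3.9 and 3.10 are listed as supported, it can help to check the interpreter before installing. The following is a minimal sketch; the helper name `is_supported` is hypothetical and not part of data-toolset.

```python
import sys

# (major, minor) versions the README lists as supported and tested.
SUPPORTED = {(3, 9), (3, 10)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one the README lists."""
    return tuple(version_info[:2]) in SUPPORTED


print("supported" if is_supported() else "untested Python version")
```

Other versions may still work; the check only reflects what the README states as tested.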
Usage
$ data-toolset -h
usage: data-toolset [-h] {head,tail,meta,schema,stats,query,validate,merge,count,to_json,to_csv} ...

positional arguments:
  {head,tail,meta,schema,stats,query,validate,merge,count,to_json,to_csv}
                        commands
    head                Print the first N records from a file
    tail                Print the last N records from a file
    meta                Print a file's metadata
    schema              Print the Avro schema for a file
    stats               Print statistics about a file
    query               Query a file
    validate            Validate a file
    merge               Merge multiple files into one
    count               Count the number of records in a file
    to_json             Convert a file to JSON format
    to_csv              Convert a file to CSV format

optional arguments:
  -h, --help            show this help message and exit
Examples
Print the first 10 records of a Parquet file:
data-toolset head my_data.parquet -n 10
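The same command can be driven from a Python script through `subprocess`. This is a sketch only: `head_command` and `run_cli` are hypothetical helpers, the default of 10 records is an arbitrary choice for the example, and the argument layout simply mirrors the command shown above.

```python
import subprocess


def head_command(path, n=10):
    """Build the argv for `data-toolset head <path> -n <n>`."""
    return ["data-toolset", "head", str(path), "-n", str(n)]


def run_cli(argv):
    """Run the command and return its standard output as text.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Example (requires data-toolset to be installed and on PATH):
# print(run_cli(head_command("my_data.parquet", 10)))
```

Passing the arguments as a list avoids shell quoting issues with unusual file names.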
Query a Parquet file using a SQL-like expression:
data-toolset query my_data.parquet "SELECT * FROM 'my_data.parquet' WHERE age > 25"
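The query string contains both spaces and single quotes, so quoting is easy to get wrong when the command is assembled by a script. The sketch below builds the argument list in Python and uses `shlex.join` to print a shell-safe form. `query_command` is a hypothetical helper, and it assumes, as in the example above, that the table is referenced by the quoted file path.

```python
import shlex


def query_command(path, where):
    """Build the argv for a `data-toolset query` call with a WHERE filter."""
    # Assumption: the file is referenced as a quoted path, as in the README example.
    sql = f"SELECT * FROM '{path}' WHERE {where}"
    return ["data-toolset", "query", path, sql]


argv = query_command("my_data.parquet", "age > 25")

# Print a copy-pasteable command line with correct shell quoting.
print(shlex.join(argv))
```

The filter is interpolated as plain text, so this helper is only suitable for conditions you write yourself, not for untrusted input.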
Merge multiple Avro files into one:
data-toolset merge file1.avro file2.avro file3.avro merged_file.avro
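When the list of input files is produced by a script, the merge command can be assembled programmatically. This sketch assumes, based only on the example above, that the output file is the last argument; `merge_command` is a hypothetical helper, and sorting the inputs by name is an arbitrary choice made here for reproducibility.

```python
import os


def merge_command(inputs, output):
    """Build the argv for `data-toolset merge <inputs...> <output>`."""
    # Accept both strings and pathlib.Path objects; sort for a stable order.
    inputs = sorted(os.fspath(p) for p in inputs)
    output = os.fspath(output)
    if not inputs:
        raise ValueError("at least one input file is required")
    if output in inputs:
        raise ValueError("output file must not be one of the inputs")
    # Assumption: the output path comes last, as in the README example.
    return ["data-toolset", "merge", *inputs, output]


# Example: merge every .avro file in a (hypothetical) parts/ directory.
# import glob
# argv = merge_command(glob.glob("parts/*.avro"), "merged_file.avro")
```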
Contributing
Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue on GitHub.
TODO
- proper online documentation
- update README
- add tests for merge
- create random_sample function
- create schema_evolution function
- mature create_sample function
- optimizations TBD
- support Python 3.11+
Hashes for data_toolset-0.1.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 74b05899c553a24e2418b6f5c0c6b43f7d93390b3d1876da45232f1b3db54166
MD5 | ae28eab0aab3e75389f887beb8e8175d
BLAKE2b-256 | b4d419a08f4608e61d16eeb32dfaef2b5e299fece533150a69e53bc1d55b3f7f