Package Placeholder

These details have not been verified by PyPI

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Reason this release was yanked:

testing

Project description

Dataset API

Dataset API structure

dataset_api
├── conda
│   └── recipes
│       ├── py38_recipe
│       └── py39_recipe
├── src
│   └── dataset_librarian
│       ├── dataset_api
│       ├── scripts
│       ├── __init__.py
│       ├── dataset.py
│       └── datasets_urls.json
├── MANIFEST.in
├── README.md
├── pyproject.toml
└── requirements.txt

Environment setup

Clone the Model Zoo for Intel® Architecture repository and navigate to the dataset_api directory.

# Step 1 (recommended): Create and activate a virtual environment
## Option 1: Using virtualenv
virtualenv -p python3 venv
. venv/bin/activate
## Option 2: Using conda
conda create -n venv python=<3.8 or 3.9> -c conda-forge
conda activate venv

# Step 2: Installing package
## Option 1: Installing from source code
cd models/datasets/dataset_api
python -m pip install --upgrade pip build setuptools wheel
python -m pip install .
## Option 2: Installing from PyPI
python -m pip install dataset-librarian

The package is published on PyPI under the name dataset-librarian.
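After either install option, one quick way to confirm that the package is visible to the active environment is to query its installed metadata. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of dataset-librarian.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "dataset-librarian" is the distribution name used in the pip command above.
    version = installed_version("dataset-librarian")
    if version is None:
        print("dataset-librarian is not installed in this environment")
    else:
        print(f"dataset-librarian {version} is installed")
```

Running this inside the activated virtual environment shows whether the install step targeted the interpreter you expect.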

Datasets

Dataset name	Description	Download	Preprocessing	Command
`brca`	Breast Cancer dataset that contains categorized contrast-enhanced mammography data and radiologists’ notes.	supported	Supported, with a prerequisite: use a browser to download the Low Energy and Subtracted images, then pass the path of the directory that contains them with the `--directory` argument.	`python -m dataset_librarian.dataset -n brca --download --preprocess -d <path to the dataset directory>`
`tabformer`	Credit card data for TabFormer.	supported	not supported	`python -m dataset_librarian.dataset -n tabformer --download`
`dureader-vis`	DuReader-vis for document automation: a Chinese Open-domain Document Visual Question Answering (Open-Domain DocVQA) dataset, containing about 15K question-answering pairs and 158K document images from the Baidu search engine.	supported	not supported	`python -m dataset_librarian.dataset -n dureader-vis --download`
`msmarco`	MS MARCO is a collection of datasets focused on deep learning in search.	supported	not supported	`python -m dataset_librarian.dataset -n msmarco --download`
`mvtec-ad`	MVTec Anomaly Detection dataset for industrial inspection. It contains over 5000 high-resolution images divided into fifteen different object and texture categories.	supported	supported	`python -m dataset_librarian.dataset -n mvtec-ad --download --preprocess -d <path to the dataset directory>`
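The commands in the table follow one pattern, so they can be assembled programmatically. The following is a minimal sketch, not part of the package: the `PREPROCESS_SUPPORTED` mapping is copied by hand from the table above, and `build_command` is a hypothetical helper name.

```python
import sys
from typing import List, Optional

# Copied from the table above: dataset name -> is preprocessing supported?
PREPROCESS_SUPPORTED = {
    "brca": True,
    "tabformer": False,
    "dureader-vis": False,
    "msmarco": False,
    "mvtec-ad": True,
}


def build_command(name: str, directory: Optional[str] = None) -> List[str]:
    """Build the argument list for `python -m dataset_librarian.dataset`."""
    if name not in PREPROCESS_SUPPORTED:
        raise ValueError(f"unknown dataset: {name!r}")
    cmd = [sys.executable, "-m", "dataset_librarian.dataset", "-n", name, "--download"]
    if PREPROCESS_SUPPORTED[name]:
        # Only request preprocessing where the table lists it as supported.
        cmd.append("--preprocess")
    if directory is not None:
        cmd += ["-d", directory]
    return cmd
```

The returned list can be passed to `subprocess.run(build_command("msmarco"), check=True)`. For `brca`, the manually downloaded images still have to be in place in the target directory first.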

Command-line Interface

Input Arguments	Description
--list (-l)	List the supported datasets.
--name (-n)	Name of the dataset.
--directory (-d)	Directory where the raw dataset will be saved on your system. The preprocessed dataset files are written to the same location. If not set, a directory named after the dataset is created.
--download	Download the specified dataset.
--preprocess	Preprocess the dataset, if preprocessing is supported for it.
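To make the flag semantics concrete, the sketch below mirrors the documented arguments with `argparse`. It is an illustrative stand-in written for this page, not the package's actual parser. The `resolve_directory` helper is hypothetical and encodes the documented fallback for `--directory`.

```python
import argparse
from pathlib import Path


def make_parser() -> argparse.ArgumentParser:
    """Build a parser with the same flags as the table above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="dataset_librarian.dataset")
    parser.add_argument("-l", "--list", action="store_true",
                        help="list the supported datasets")
    parser.add_argument("-n", "--name", help="name of the dataset")
    parser.add_argument("-d", "--directory",
                        help="where raw and preprocessed files are written")
    parser.add_argument("--download", action="store_true",
                        help="download the specified dataset")
    parser.add_argument("--preprocess", action="store_true",
                        help="preprocess the dataset if supported")
    return parser


def resolve_directory(args: argparse.Namespace) -> Path:
    """Fall back to a directory named after the dataset when -d is not given."""
    if args.directory:
        return Path(args.directory)
    if not args.name:
        raise ValueError("either --directory or --name is required")
    return Path(args.name)
```

For example, parsing `["-n", "msmarco", "--download"]` yields `download=True`, `preprocess=False`, and a resolved directory of `msmarco`.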

Python API

from dataset_librarian.dataset_api.download import download_dataset
from dataset_librarian.dataset_api.preprocess import preprocess_dataset

# Replace the placeholder with the path to the raw dataset directory
dataset_dir = "<path to the raw dataset directory>"

# Download the dataset
download_dataset('brca', dataset_dir)

# Preprocess the dataset
preprocess_dataset('brca', dataset_dir)
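The two calls above can be combined into a small wrapper that skips preprocessing for datasets that do not support it. This is a sketch under stated assumptions: `prepare_dataset` and `PREPROCESS_SUPPORTED` are hypothetical names, the supported set is copied from the Datasets table, and creating the target directory up front is an assumption rather than documented API behavior. The package functions are imported lazily so that stand-ins can be passed in for a dry run.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Assumption: taken from the Datasets table, not queried from the package.
PREPROCESS_SUPPORTED = {"brca", "mvtec-ad"}


def prepare_dataset(
    name: str,
    directory: Optional[str] = None,
    download: Optional[Callable[[str, str], object]] = None,
    preprocess: Optional[Callable[[str, str], object]] = None,
) -> Tuple[Path, bool]:
    """Download a dataset, then preprocess it only where that is supported.

    Returns the target directory and whether preprocessing was run.
    """
    if download is None or preprocess is None:
        # Imported lazily so the sketch can be exercised without the package.
        from dataset_librarian.dataset_api.download import download_dataset
        from dataset_librarian.dataset_api.preprocess import preprocess_dataset
        download = download or download_dataset
        preprocess = preprocess or preprocess_dataset

    # Mirror the CLI default: use a directory named after the dataset.
    target = Path(directory) if directory else Path(name)
    target.mkdir(parents=True, exist_ok=True)

    download(name, str(target))
    run_preprocess = name in PREPROCESS_SUPPORTED
    if run_preprocess:
        preprocess(name, str(target))
    return target, run_preprocess
```

Calling `prepare_dataset("tabformer")` with the package installed would download into `./tabformer` and skip preprocessing. For `brca`, the Low Energy and Subtracted images must already be in the directory.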

Project details

Release history

Version	Release date	Notes
1.0.4	Aug 10, 2023
1.0.3	Jun 13, 2023
1.0.2	Jun 5, 2023
1.0.1	Jun 5, 2023	yanked (reason: incorrect Python compatibility versions)
1.0.0	May 30, 2023
0.0.0.dev1	May 26, 2023	pre-release, yanked (reason: testing); this version
0.0.0.dev0	May 5, 2023	pre-release, yanked (reason: testing)

Download files

Source Distribution

dataset_librarian-0.0.0.dev1.tar.gz (3.2 kB)

Uploaded May 26, 2023 Source

Built Distribution

dataset_librarian-0.0.0.dev1-py3-none-any.whl (3.5 kB)

Uploaded May 26, 2023 Python 3

Hashes for dataset_librarian-0.0.0.dev1.tar.gz
Algorithm	Hash digest
SHA256	`f2bf1a54db8ee573e4bc850ed22c533a1f3eac8eb72f2a79cfc1577d796a49ac`
MD5	`c013458ecf01377612aa39bb21b9143d`
BLAKE2b-256	`96c61847a77c29ab6f3567ea8b8f2f4e571d82e403dd324de1ccf1db9416ce63`

Hashes for dataset_librarian-0.0.0.dev1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e8e95a44385a69ed0979af6d9b05fd170398c54504bf97edbc58e39a4cdcb27a`
MD5	`760c031e83691c0fd6e4266d60c80c1c`
BLAKE2b-256	`d0ca2e39e4ff7b35125206391775ae51c712568681d899614deb80c67797945e`
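A downloaded file can be checked against the digests listed above with the standard library. The sketch below computes SHA256 and BLAKE2b-256 in one pass over the file. MD5 is left out because some hardened Python builds disable it. The helper names `file_digests` and `matches` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def file_digests(path: Union[str, Path], chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Return hex digests of a file, keyed by the algorithm names used above."""
    hashers = {
        "SHA256": hashlib.sha256(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches(path: Union[str, Path], algorithm: str, expected: str) -> bool:
    """Compare one computed digest with an expected hex string."""
    return file_digests(path)[algorithm] == expected.strip().lower()
```

For instance, `matches("dataset_librarian-0.0.0.dev1.tar.gz", "SHA256", "f2bf1a54db8ee573e4bc850ed22c533a1f3eac8eb72f2a79cfc1577d796a49ac")` should return `True` for an intact copy of the source distribution.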