Automated Audio Captioning datasets in PyTorch.

Automated Audio Captioning datasets for PyTorch

Unofficial source code for the Automated Audio Captioning (AAC) datasets AudioCaps [1], Clotho [2], and MACS [3], designed for PyTorch.

Installation

pip install git+https://github.com/Labbeti/aac_datasets

Or clone the repository and install it in editable mode:

git clone https://github.com/Labbeti/aac_datasets
pip install -e aac_datasets

Examples

Create Clotho dataset

from aac_datasets import Clotho

dataset = Clotho(root=".", subset="dev", download=True)
audio, captions, *_ = dataset[0]
# audio: Tensor of shape (n_channels=1, audio_max_size)
# captions: list of str captions

Build a PyTorch dataloader with MACS

from torch.utils.data import DataLoader
from aac_datasets import MACS
from aac_datasets.utils import BasicCollate

dataset = MACS(root=".", download=True)
dataloader = DataLoader(dataset, batch_size=4, collate_fn=BasicCollate())

for audio_batch, captions_batch in dataloader:
    # audio_batch: Tensor of shape (batch_size=4, n_channels=2, audio_max_size)
    # captions_batch: list of list of str captions
    ...
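
The comments above suggest that the collate function pads the audio of a batch to a common length and groups the captions per item. The following is a minimal sketch of that general idea, not the actual implementation of BasicCollate. It uses nested Python lists in place of tensors so that it runs without PyTorch; the function name `pad_collate`, the `pad_value` parameter, and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# A waveform stored as nested lists with shape (n_channels, n_samples).
Waveform = List[List[float]]


def pad_collate(
    batch: Sequence[Tuple[Waveform, List[str]]],
    pad_value: float = 0.0,
) -> Tuple[List[Waveform], List[List[str]]]:
    """Right-pad every waveform to the longest one and group the captions."""
    # Longest channel over all items of the batch.
    max_len = max(len(channel) for audio, _ in batch for channel in audio)
    audio_batch = [
        [channel + [pad_value] * (max_len - len(channel)) for channel in audio]
        for audio, _ in batch
    ]
    # Captions stay as a list of str per item (list of list of str overall).
    captions_batch = [list(captions) for _, captions in batch]
    return audio_batch, captions_batch


# Two toy stereo items of different lengths (hypothetical data).
items = [
    ([[0.1, 0.2, 0.3], [0.1, 0.2, 0.3]], ["a car passes by"]),
    ([[0.5], [0.6]], ["birds are singing", "a bird chirps"]),
]
audio_batch, captions_batch = pad_collate(items)
```

With real tensors, the same padding step would typically be followed by stacking the padded items into a single batch tensor.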

Datasets stats

Here are the statistics for each dataset:

	AudioCaps	Clotho	MACS
Subset(s)	train, val, test	dev, val, eval, test, analysis	full
Sample rate (Hz)	32000	44100	48000
Estimated size	43GB	27GB	13GB
Audio source	AudioSet (youtube)	Freesound	TAU Urban Acoustic Scenes 2019

Here are the statistics of the training subset for each dataset:

	AudioCaps/train	Clotho/dev	MACS/full
Nb audios	49838	3840	3930
Total audio duration	136.6h¹	24.0h	10.9h
Audio duration range	0.5-10s	15-30s	10s
Nb captions per audio	1	5	2-5
Nb captions	49838	19195	17275
Total nb words²	402482	217362	160006
Nb words range²	1-52	8-20	5-40

¹ This duration is extrapolated from the total duration (126.7h) of the 46230 files measured, out of 49838.
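
The AudioCaps estimate in footnote ¹ can be reproduced with a simple proportional extrapolation. This is a sketch that assumes the unmeasured files have the same mean duration as the measured ones; the variable names are illustrative.

```python
# Figures taken from the footnote above.
measured_hours = 126.7
measured_files = 46230
total_files = 49838

# Assume the unmeasured files share the mean duration of the measured ones.
mean_hours_per_file = measured_hours / measured_files
estimated_total_hours = mean_hours_per_file * total_files

print(f"Estimated total duration: {estimated_total_hours:.1f}h")
```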

² To count the words, the sentences are cleaned (lowercased, with punctuation removed) and tokenized with the spaCy tokenizer.
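
The word counting described in footnote ² can be approximated as follows. This is a minimal sketch, not the script used to produce the table: whitespace splitting stands in for the spaCy tokenizer, only ASCII punctuation is removed, and the function name and example captions are hypothetical.

```python
import string
from typing import List


def clean_and_tokenize(caption: str) -> List[str]:
    """Lowercase, remove ASCII punctuation, then split on whitespace."""
    # Whitespace splitting is a stand-in for the spaCy tokenizer.
    table = str.maketrans("", "", string.punctuation)
    return caption.lower().translate(table).split()


# Hypothetical captions, for illustration only.
captions = ["A dog barks, loudly!", "Rain falls on a roof."]
n_words = [len(clean_and_tokenize(caption)) for caption in captions]
total_words = sum(n_words)
```

Counts obtained this way may differ slightly from the table, since a real tokenizer handles contractions and hyphenated words differently.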

Requirements

Python packages

These requirements are installed automatically when the package is installed with pip.

torch >= 1.10.1
torchaudio >= 0.10.1
py7zr >= 0.17.2
pyyaml >= 6.0
tqdm >= 4.64.0

External requirements (AudioCaps only)

Downloading AudioCaps requires two external programs: ffmpeg and youtube-dl. On Ubuntu, both can be installed with sudo apt install ffmpeg youtube-dl.

You can also override their paths for AudioCaps:

from aac_datasets import AudioCaps
AudioCaps.FFMPEG_PATH = "/my/path/to/ffmpeg"
AudioCaps.YOUTUBE_DL_PATH = "/my/path/to/youtube_dl"
_ = AudioCaps(root=".", download=True)
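
Before starting an AudioCaps download, it can help to check that both programs are actually reachable. The sketch below uses only the standard library; the helper name `find_tools` is hypothetical and not part of aac_datasets.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each program name to its path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


tools = find_tools(["ffmpeg", "youtube-dl"])
missing = [name for name, path in tools.items() if path is None]
if missing:
    print("Missing external tools:", ", ".join(missing))
```

The paths found this way could then be assigned to AudioCaps.FFMPEG_PATH and AudioCaps.YOUTUBE_DL_PATH as in the example above.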

Command line download

To download a dataset, you can pass download=True to the dataset constructor. If you prefer to download datasets separately, you can use the following command:

python -m aac_datasets.download --root "./data" clotho --version "v2.1"
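
The same command can be assembled from a Python script, for example to download datasets as part of a setup step. This sketch only builds the argument list from the flags shown above; the helper name `build_download_command` is hypothetical, and whether other datasets accept a `--version` option is an assumption that is not verified here.

```python
import sys
from typing import List, Optional


def build_download_command(
    root: str, dataset: str, version: Optional[str] = None
) -> List[str]:
    """Build the argument list for the download command shown above."""
    cmd = [sys.executable, "-m", "aac_datasets.download", "--root", root, dataset]
    if version is not None:
        # Only the Clotho example above confirms this flag.
        cmd += ["--version", version]
    return cmd


cmd = build_download_command("./data", "clotho", version="v2.1")
# To run it (requires aac_datasets to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```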

References

[1] C. D. Kim, B. Kim, H. Lee, and G. Kim, “Audiocaps: Generating captions for audios in the wild,” in NAACL-HLT, 2019. Available: https://aclanthology.org/N19-1011/

[2] K. Drossos, S. Lipping, and T. Virtanen, “Clotho: An Audio Captioning Dataset,” arXiv:1910.09387 [cs, eess], Oct. 2019. Available: http://arxiv.org/abs/1910.09387

[3] F. Font, A. Mesaros, D. P. W. Ellis, E. Fonseca, M. Fuentes, and B. Elizalde, Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021). Barcelona, Spain: Music Technology Group - Universitat Pompeu Fabra, Nov. 2021. Available: https://doi.org/10.5281/zenodo.5770113

Release history

Version	Release date
0.5.2	Mar 23, 2024
0.5.1	Mar 5, 2024
0.5.0	Jan 5, 2024
0.4.1	Oct 25, 2023
0.4.0	Sep 25, 2023
0.3.3	May 11, 2023
0.3.2	Jan 30, 2023
0.3.1	Oct 31, 2022
0.3.0	Sep 28, 2022
0.2.0	Aug 30, 2022
0.1.1	Jun 10, 2022 (this version)

Download files

Source Distribution

aac_datasets-0.1.1.tar.gz (23.9 kB), uploaded Jun 10, 2022

Hashes for aac_datasets-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`caafd8b68a3701a5353530fc37aa3dfbd883234953f3d7d89a6038003707d79a`
MD5	`7b3a8129f7059e29cd3bf392f9697942`
BLAKE2b-256	`ad129cadd4f23a0472dc2636a07b5390f2bdc1ddac7fa7dcc62e6c738ca6d7db`