
torch-audiomentations


Audio data augmentation in PyTorch. Inspired by audiomentations.

✅ Supports CPU and GPU - speed is a priority
✅ Supports batches of multichannel (or mono) audio
✅ Transforms extend nn.Module, so they can be integrated as part of a PyTorch neural network model
✅ Most transforms are differentiable
✅ Three modes: per_batch, per_example and per_channel (see the sketch after this list)
✅ Cross-platform compatibility
✅ Permissive MIT license
✅ High test coverage
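
The three modes control how the randomization is shared across the batch. Below is a minimal sketch; it assumes each transform accepts a mode keyword argument, as implied by the feature list above (check the API for the exact parameter name):

import torch
from torch_audiomentations import Gain

# per_example: each audio snippet in the batch gets its own random gain
gain_per_example = Gain(min_gain_in_db=-12.0, max_gain_in_db=6.0, p=1.0, mode="per_example")

# per_batch: a single random gain is drawn and applied to the whole batch
gain_per_batch = Gain(min_gain_in_db=-12.0, max_gain_in_db=6.0, p=1.0, mode="per_batch")

# per_channel: every channel of every snippet gets an independent random gain
gain_per_channel = Gain(min_gain_in_db=-12.0, max_gain_in_db=6.0, p=1.0, mode="per_channel")

batch = torch.rand(4, 2, 16000) - 0.5  # shape: (batch, channels, samples)
gained = gain_per_channel(batch, sample_rate=16000)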

Setup


pip install torch-audiomentations

Usage example

import torch
from torch_audiomentations import Compose, Gain, PolarityInversion


# Initialize augmentation callable
apply_augmentation = Compose(
    transforms=[
        Gain(
            min_gain_in_db=-15.0,
            max_gain_in_db=5.0,
            p=0.5,
        ),
        PolarityInversion(p=0.5)
    ]
)

torch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Make an example tensor with white noise.
# This tensor represents 8 audio snippets with 2 channels (stereo) and 2 s of 16 kHz audio.
audio_samples = torch.rand(size=(8, 2, 32000), dtype=torch.float32, device=torch_device) - 0.5

# Apply augmentation. This varies the gain and polarity of (some of)
# the audio snippets in the batch independently.
perturbed_audio_samples = apply_augmentation(audio_samples, sample_rate=16000)

Contribute

Contributors are welcome! Join Asteroid's Slack to start discussing torch-audiomentations with us.

Motivation: Speed

We don't want data augmentation to be a bottleneck in model training speed. Here is a comparison of the time it takes to run 1D convolution:

[Figure: Convolve execution times]

Current state

torch-audiomentations is in an early development stage, so the APIs are subject to change.

Waveform transforms

ApplyBackgroundNoise

Not released yet

Add background noise to the input audio.

Gain

Added in v0.1.0

Multiply the audio by a random amplitude factor to reduce or increase the volume. This technique can help a model become somewhat invariant to the overall gain of the input audio.

Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping
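
For reference, a gain given in dB maps to a linear amplitude factor as factor = 10^(gain_db / 20). A minimal sketch of that relationship in plain PyTorch (the helper name is ours, not part of the library API):

import torch

def db_to_amplitude_factor(gain_db: float) -> float:
    # -15 dB is roughly a 0.178x amplitude factor; +5 dB is roughly 1.778x
    return 10.0 ** (gain_db / 20.0)

audio = torch.rand(1, 2, 16000) - 0.5
quieter = audio * db_to_amplitude_factor(-15.0)
louder = audio * db_to_amplitude_factor(5.0)  # may exceed the [-1, 1] range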

ApplyImpulseResponse

Not released yet

Convolve the given audio with impulse responses.
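
Conceptually, this is a convolution of the waveform with an impulse response. Since the transform is not released yet, here is a plain-PyTorch sketch of FFT-based convolution to illustrate the idea (not the library's implementation):

import torch

def fft_convolve(audio: torch.Tensor, impulse_response: torch.Tensor) -> torch.Tensor:
    # Linear convolution via the FFT, truncated to the original audio length.
    n = audio.shape[-1] + impulse_response.shape[-1] - 1
    audio_f = torch.fft.rfft(audio, n=n)
    ir_f = torch.fft.rfft(impulse_response, n=n)
    return torch.fft.irfft(audio_f * ir_f, n=n)[..., : audio.shape[-1]]

audio = torch.rand(1, 1, 16000) - 0.5  # dry audio
ir = torch.rand(1, 1, 4000) - 0.5      # an impulse response, e.g. room reverb
reverberant = fft_convolve(audio, ir)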

PeakNormalization

Added in v0.2.0

Apply a constant amount of gain, so that the highest signal level present in each audio snippet in the batch becomes 0 dBFS, i.e. the loudest level allowed if all samples must be between -1 and 1.

This transform has an alternative mode (apply_to="only_too_loud_sounds") where it only applies to audio snippets that have extreme values outside the [-1, 1] range. This is useful for avoiding digital clipping in audio that is too loud, while leaving other audio untouched.
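
A minimal usage sketch of that mode (the apply_to value comes from the description above; p=1.0 is assumed here so the transform is always considered):

import torch
from torch_audiomentations import PeakNormalization

# Rescale only snippets whose peaks fall outside [-1, 1]; others pass through.
normalize_too_loud = PeakNormalization(apply_to="only_too_loud_sounds", p=1.0)

audio = torch.rand(4, 1, 16000) * 3.0 - 1.5  # some peaks exceed 1.0
safe_audio = normalize_too_loud(audio, sample_rate=16000)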

PolarityInversion

Added in v0.1.0

Flip the audio samples upside-down, reversing their polarity. In other words, multiply the waveform by -1, so negative values become positive, and vice versa. The result sounds the same as the original when played back in isolation. When mixed with other audio sources, however, the result may be different. This waveform inversion technique is sometimes used for audio cancellation or for obtaining the difference between two waveforms. In the context of audio data augmentation, this transform can be useful when training phase-aware machine learning models.
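
Because the transform is simply multiplication by -1, its effect is easy to verify. A minimal check, assuming p=1.0 so the inversion is always applied (the module must be in train mode, since transforms pass audio through unchanged in eval mode):

import torch
from torch_audiomentations import PolarityInversion

invert = PolarityInversion(p=1.0)  # an nn.Module, in train mode by default

audio = torch.rand(2, 1, 16000) - 0.5
inverted = invert(audio, sample_rate=16000)

# The inverted waveform is the exact negation of the original.
assert torch.allclose(inverted, -audio)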

Version history

v0.4.0 (2020-11-10)

  • Implement Compose for applying multiple transforms
  • Implement utility functions from_dict and from_yaml for loading data augmentation configurations from a dict, JSON or YAML
  • Officially support differentiability in most transforms

v0.3.0 (2020-10-27)

  • Transforms now return the input unchanged when they are in eval mode
  • Add support for alternative modes per_batch and per_channel

v0.2.0 (2020-10-19)

  • Simplify API for using CUDA tensors. The device is now inferred from the input tensor.
  • Implement PeakNormalization
  • Expose convolve in the API

v0.1.0 (2020-10-12)

  • Initial release with Gain and PolarityInversion

Development

Setup

A GPU-enabled development environment for torch-audiomentations can be created with conda:

  • conda create --name torch-audiomentations python=3.7.3
  • conda activate torch-audiomentations
  • conda install pytorch cudatoolkit=10.1 -c pytorch
  • conda env update

Run tests

pytest

Conventions

Acknowledgements

The development of torch-audiomentations is kindly backed by Nomono.
