A PyTorch library for audio data augmentation. Inspired by audiomentations. Useful for deep learning.
torch-audiomentations
Audio data augmentation in PyTorch. Inspired by audiomentations.
Setup
pip install torch-audiomentations
Usage example
import torch
from torch_audiomentations import Gain
# Initialize augmentation callable
apply_gain_augmentation = Gain(
    min_gain_in_db=-15.0,
    max_gain_in_db=5.0,
    p=0.5,
)
# Note: torch-audiomentations can run on CPU or GPU
torch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Make an example tensor with white noise.
# This tensor represents 8 audio snippets with 2 channels (stereo) and 2 seconds of 16 kHz audio.
audio_samples = torch.rand(size=(8, 2, 32000), dtype=torch.float32, device=torch_device) - 0.5
# Apply gain augmentation. This varies the gain of (some of) the audio snippets in the batch independently.
perturbed_audio_samples = apply_gain_augmentation(audio_samples, sample_rate=16000)
Contribute
Contributors welcome!
Join Asteroid's Slack to start discussing torch-audiomentations with us.
Motivation: Speed
We don't want data augmentation to be a bottleneck in model training speed. Here is a comparison of the time it takes to run 1D convolution:
Current state
torch-audiomentations is in a very early development stage, so it's not ready for prime time yet. Meanwhile, star the repo and stay tuned!
Waveform transforms
ApplyBackgroundNoise
Not released yet
Add background noise to the input audio.
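A minimal sketch of how background noise could be mixed in at a random signal-to-noise ratio (SNR), assuming the noise is already available as a tensor with the same (batch_size, num_channels, num_samples) shape as the audio. The helper add_background_noise and its min_snr_in_db/max_snr_in_db parameters are hypothetical and are not the API of the unreleased ApplyBackgroundNoise transform; the default SNR range is an arbitrary choice for illustration.

```python
import torch


def add_background_noise(
    audio: torch.Tensor,
    noise: torch.Tensor,
    min_snr_in_db: float = 3.0,
    max_snr_in_db: float = 30.0,
) -> torch.Tensor:
    """Hypothetical helper (not the torch-audiomentations API).

    Mixes `noise` into `audio` at an SNR drawn independently per example.
    Both tensors have shape (batch_size, num_channels, num_samples).
    """
    batch_size = audio.shape[0]
    # One SNR value per example, drawn uniformly in dB
    snr_in_db = torch.empty(batch_size, 1, 1, device=audio.device).uniform_(
        min_snr_in_db, max_snr_in_db
    )
    # Root-mean-square level over channels and samples for each example
    audio_rms = audio.pow(2).mean(dim=(1, 2), keepdim=True).sqrt()
    noise_rms = noise.pow(2).mean(dim=(1, 2), keepdim=True).sqrt()
    # SNR = 20 * log10(audio_rms / scaled_noise_rms), solved for the noise level
    desired_noise_rms = audio_rms / (10 ** (snr_in_db / 20))
    noise_gain = desired_noise_rms / noise_rms.clamp(min=1e-10)
    return audio + noise_gain * noise
```

Scaling the noise rather than the audio keeps the level of the original snippet unchanged, which is one common design choice for this kind of augmentation.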
Gain
Added in v0.1.0
Multiply the audio by a random amplitude factor to reduce or increase the volume. This technique can help a model become somewhat invariant to the overall gain of the input audio.
Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping
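Gain is usually specified in decibels and converted to a linear amplitude factor as 10 ** (gain_in_db / 20), so -20 dB divides the amplitude by 10 and +6 dB roughly doubles it. The sketch below illustrates that idea for a (batch_size, num_channels, num_samples) tensor, drawing one gain per audio snippet and applying it with probability p. It is a hypothetical helper (random_gain) written for illustration, not the library's Gain implementation; it reuses the parameter names from the usage example only for readability.

```python
import torch


def random_gain(
    audio: torch.Tensor,
    min_gain_in_db: float = -15.0,
    max_gain_in_db: float = 5.0,
    p: float = 0.5,
) -> torch.Tensor:
    """Hypothetical helper (not the library's Gain class).

    Scales each example in a (batch, channels, samples) tensor by a random
    gain. Each example is selected with probability p; unselected examples
    keep a gain of 0 dB.
    """
    batch_size = audio.shape[0]
    gain_in_db = torch.empty(batch_size, device=audio.device).uniform_(
        min_gain_in_db, max_gain_in_db
    )
    # Examples that are not selected get 0 dB, i.e. an amplitude factor of 1
    selected = torch.rand(batch_size, device=audio.device) < p
    gain_in_db = torch.where(selected, gain_in_db, torch.zeros_like(gain_in_db))
    # Convert decibels to a linear amplitude factor: factor = 10 ** (dB / 20)
    amplitude_factor = 10 ** (gain_in_db / 20)
    # One factor per example, broadcast over channels and samples
    return audio * amplitude_factor.view(batch_size, 1, 1)
```

Because the factor can exceed 1 (up to about 1.78 for +5 dB), a snippet whose peak is already near 1 can end up outside [-1, 1], which is the clipping risk described in the warning.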
ApplyImpulseResponse
Not released yet
Convolve the given audio with impulse responses.
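Convolving with an impulse response is a linear convolution, which for long responses is commonly computed with the FFT. The following is a minimal sketch under two assumptions: the impulse response is a single 1D tensor shared by the whole batch, and the output is truncated to the input length so shapes are preserved. The function name convolve_with_impulse_response is hypothetical; it is not the convolve function that v0.2.0 exposes, whose signature is not shown here.

```python
import torch
import torch.fft  # Ensures torch.fft refers to the FFT module on older PyTorch releases


def convolve_with_impulse_response(
    audio: torch.Tensor, impulse_response: torch.Tensor
) -> torch.Tensor:
    """Hypothetical helper: convolve (batch, channels, samples) audio with a 1D IR.

    Uses FFT-based linear convolution and truncates the result to the input length.
    """
    num_samples = audio.shape[-1]
    # Zero-pad to the full linear-convolution length to avoid circular wrap-around
    n = num_samples + impulse_response.shape[-1] - 1
    audio_spectrum = torch.fft.rfft(audio, n=n)
    ir_spectrum = torch.fft.rfft(impulse_response, n=n)
    # Multiplication in the frequency domain equals convolution in the time domain;
    # the IR spectrum broadcasts over the batch and channel dimensions
    convolved = torch.fft.irfft(audio_spectrum * ir_spectrum, n=n)
    # Keep the first num_samples samples so the output shape matches the input
    return convolved[..., :num_samples]
```

Truncating drops the reverberant tail that extends past the end of the input; keeping the full length n would be the alternative if that tail matters.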
PeakNormalization
Added in v0.2.0
Apply a constant amount of gain, so that the highest signal level present in each audio snippet in the batch becomes 0 dBFS, i.e. the loudest level allowed if all samples must be between -1 and 1.
This transform has an alternative mode (apply_to="only_too_loud_sounds") where it only applies to audio snippets that have extreme values outside the [-1, 1] range. This is useful for avoiding digital clipping in audio that is too loud, while leaving other audio untouched.
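The per-snippet logic of both modes can be sketched as follows. This is a hypothetical peak_normalize helper, not the library's PeakNormalization class; the value "all" for the default mode is an assumed label, while "only_too_loud_sounds" comes from the description of the alternative mode. Silent snippets (peak of 0) are left unchanged to avoid division by zero.

```python
import torch


def peak_normalize(audio: torch.Tensor, apply_to: str = "all") -> torch.Tensor:
    """Hypothetical helper (not the library's PeakNormalization class).

    Scales each (channels, samples) example so its peak absolute value is 1 (0 dBFS).
    apply_to="all" (assumed label) normalizes every example;
    apply_to="only_too_loud_sounds" only rescales examples whose peak exceeds 1.
    """
    # Peak absolute amplitude per example, shape (batch, 1, 1)
    peaks = audio.abs().amax(dim=(1, 2), keepdim=True)
    if apply_to == "all":
        selected = peaks > 0  # leave all-zero (silent) examples untouched
    elif apply_to == "only_too_loud_sounds":
        selected = peaks > 1.0
    else:
        raise ValueError(f"Unknown apply_to mode: {apply_to}")
    # Selected examples are divided by their peak; others are divided by 1
    divisor = torch.where(selected, peaks, torch.ones_like(peaks))
    return audio / divisor
```

Dividing by the peak applies one constant gain per snippet, so the relative levels within a snippet (and between its channels) are preserved.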
PolarityInversion
Added in v0.1.0
Flip the audio samples upside-down, reversing their polarity. In other words, multiply the waveform by -1, so negative values become positive, and vice versa. When played back in isolation, the result sounds the same as the original. However, when mixed with other audio sources, the result may be different. This waveform inversion technique is sometimes used for audio cancellation or for obtaining the difference between two waveforms. In the context of audio data augmentation, this transform can be useful when training phase-aware machine learning models.
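Polarity inversion is a sign flip. The short sketch below uses a hypothetical invert_polarity helper (not the library's PolarityInversion class) and random tensors standing in for real recordings to check the properties described above: the inverted waveform has the same magnitude at every sample, adding it to the original cancels it completely, mixing it with another source gives a different mix, and adding it to a mix subtracts that source.

```python
import torch


def invert_polarity(audio: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: multiply the waveform by -1."""
    return -audio


torch.manual_seed(0)
# Two stand-in sources, shape (batch_size, num_channels, num_samples)
voice = torch.rand(1, 1, 16000) - 0.5
background = torch.rand(1, 1, 16000) - 0.5
inverted_voice = invert_polarity(voice)

# Played back alone, the inverted signal has the same magnitude at every sample
same_magnitude = torch.equal(inverted_voice.abs(), voice.abs())
# Mixed with the original, it cancels out completely
cancelled = voice + inverted_voice
# Mixed with another source, the result differs from the original mix
mix = voice + background
mix_with_inverted = inverted_voice + background
# Adding an inverted copy of one source to a mix subtracts that source
difference = mix + inverted_voice
```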
Version history
v0.3.0 (2020-10-27)
- Transforms now return the input unchanged when they are in eval mode
- Add support for alternative modes per_batch and per_channel
v0.2.0 (2020-10-19)
- Simplify API for using CUDA tensors. The device is now inferred from the input tensor.
- Implement PeakNormalization
- Expose convolve in the API
v0.1.0 (2020-10-12)
- Initial release with Gain and PolarityInversion
Development
Setup
A GPU-enabled development environment for torch-audiomentations can be created with conda:
conda create --name torch-audiomentations python=3.7.3
conda activate torch-audiomentations
conda install pytorch cudatoolkit=10.1 -c pytorch
conda env update
Run tests
pytest
Conventions
- Format Python code with black
- Use Google-style docstrings
- Use explicit relative imports, not absolute imports
Acknowledgements
The development of torch-audiomentations is kindly backed by Nomono.