Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition.
Wav2Vec2 STT Python
Beta Software
Requirements:
- Python 3.7+
- Platform: Linux x64 (Windows is a work in progress; macOS may work; PRs welcome)
- Python package requirements: cffi, numpy
- wav2vec 2.0 model (must be converted to a compatible format)
  - Several are available ready-to-go on this project's releases page and below.
  - You can convert your own models by following the instructions here.
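The requirements above can be checked before installing the package or loading a model. The following is a minimal sketch using only the standard library; `platform_problems` and `missing_packages` are hypothetical helper names and are not part of wav2vec2_stt.

```python
import importlib.util
import platform
import sys


def platform_problems(version=None, system=None, machine=None):
    """Return a list of human-readable problems with the environment.

    Arguments default to the running interpreter and machine; they are
    parameters only so the check can be tried with other values.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine

    problems = []
    if version < (3, 7):
        problems.append("Python 3.7+ is required")
    # Only Linux x64 is listed as supported; other platforms may or may not work.
    if system != "Linux" or machine not in ("x86_64", "AMD64"):
        problems.append("platform %s/%s is not Linux x64" % (system, machine))
    return problems


def missing_packages(names=("cffi", "numpy")):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `platform_problems()` and `missing_packages()` with no arguments inspects the current environment; an empty list from both suggests the listed requirements are met.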
Models:
Model | Download Size |
---|---|
Facebook Wav2Vec2 2.0 Base (960h) | 360 MB |
Facebook Wav2Vec2 2.0 Large (960h) | 1.18 GB |
Facebook Wav2Vec2 2.0 Large LV60 (960h) | 1.18 GB |
Facebook Wav2Vec2 2.0 Large LV60 Self (960h) | 1.18 GB |
Usage
import wave
from wav2vec2_stt import Wav2Vec2STT

# Load a converted wav2vec 2.0 model from its directory
decoder = Wav2Vec2STT('model_dir')

# Read the raw frames from the WAV file, closing it afterwards
with wave.open('tests/test.wav', 'rb') as wav_file:
    wav_samples = wav_file.readframes(wav_file.getnframes())

assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'
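The usage example passes the raw frames of a WAV file straight to `decode`. The Facebook 960h models listed above were trained on 16 kHz audio, so it can help to check the file format first. The sketch below assumes, without confirmation from this README, that `decode` expects 16-bit mono PCM at 16 kHz; `read_pcm16_mono` is a hypothetical helper, not part of the library.

```python
import wave


def read_pcm16_mono(path, expected_rate=16000):
    """Read all frames from a WAV file after checking its format.

    Assumption: the decoder wants 16-bit mono PCM at `expected_rate` Hz.
    """
    with wave.open(path, "rb") as wav_file:
        if wav_file.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav_file.getframerate() != expected_rate:
            raise ValueError("expected %d Hz audio" % expected_rate)
        return wav_file.readframes(wav_file.getnframes())
```

The returned bytes can then be passed to the decoder in place of the manual read, for example `decoder.decode(read_pcm16_mono('tests/test.wav'))`.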
The package also includes a simple CLI for recognizing WAV files:
$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...

positional arguments:
  {decode}    sub-command
    decode    decode one or more WAV files

optional arguments:
  -h, --help  show this help message and exit
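Decoding several files in one call, as the CLI does above, can also be done from Python. This is a sketch and not the CLI's actual implementation; `decode_files` is a hypothetical helper that accepts any object with a `decode(bytes)` method, such as the `Wav2Vec2STT` instance from the Usage section.

```python
import wave


def decode_files(decoder, wav_paths):
    """Yield (path, text) for each WAV file, decoding them in order."""
    for path in wav_paths:
        # Read the raw frames the same way the Usage example does.
        with wave.open(path, "rb") as wav_file:
            samples = wav_file.readframes(wav_file.getnframes())
        yield path, decoder.decode(samples).strip()
```

Because it is a generator, each transcript becomes available as soon as its file is decoded, and the model is loaded only once for the whole batch.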
Installation/Building
Recommended installation via wheel from pip (requires a recent version of pip):
python -m pip install wav2vec2_stt
See setup.py for more details on building it yourself.
Author
- David Zurow (@daanzu)
License
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.
Acknowledgments
- Contains and uses code from PyTorch and torchaudio, licensed under the BSD 2-Clause License.