Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition.

Project description

Wav2Vec2 STT Python

Beta Software

Requirements:

Python 3.7+
Platform: Linux x64 (Windows support is a work in progress; macOS may work; PRs welcome)
Python package requirements: cffi, numpy
A wav2vec 2.0 model (must be converted to a compatible format)
- Several ready-to-use models are available on this project's releases page and are listed below.
- You can convert your own models by following the instructions here.
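The requirements above can be checked from Python before installing. This is a minimal sketch using only the standard library. `check_requirements` is a hypothetical helper and is not part of this package. It covers only what is listed here: the Python version, Linux x64, and the presence of `cffi` and `numpy`. It does not check the model files. Because macOS may work and Windows support is in progress, it reports warnings instead of failing.

```python
import importlib.util
import platform
import sys
from typing import List


def check_requirements() -> List[str]:
    """Return human-readable problems; an empty list means every listed check passed."""
    problems = []
    # Python 3.7 or newer, per the stated requirements
    if sys.version_info < (3, 7):
        problems.append("Python 3.7+ required, found %d.%d" % sys.version_info[:2])
    # Linux x64 is the supported platform; others are reported, not rejected
    system = platform.system()
    machine = platform.machine().lower()
    if system != "Linux" or machine not in ("x86_64", "amd64"):
        problems.append("Linux x64 is the supported platform, found %s %s" % (system, machine))
    # Direct Python package dependencies
    for module in ("cffi", "numpy"):
        if importlib.util.find_spec(module) is None:
            problems.append("missing Python package: %s" % module)
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("All listed requirements appear to be satisfied.")
```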

Models:

Model	Download Size
Facebook Wav2Vec2 2.0 Base (960h)	360 MB
Facebook Wav2Vec2 2.0 Large (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 (960h)	1.18 GB
Facebook Wav2Vec2 2.0 Large LV60 Self (960h)	1.18 GB

Usage

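import wave

from wav2vec2_stt import Wav2Vec2STT

# Load a converted model from its directory
decoder = Wav2Vec2STT('model_dir')

# Read all raw audio frames (as bytes) from the WAV file
with wave.open('tests/test.wav', 'rb') as wav_file:
    wav_samples = wav_file.readframes(wav_file.getnframes())

# Decode the audio to text
assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'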
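The usage example passes the raw bytes from `readframes()` directly to `decode()`. To catch mismatched audio early, a helper like the one below can check the WAV header first. This sketch rests on an assumption the README does not state: that the model expects 16 kHz, mono, 16-bit PCM audio, which is the usual input for wav2vec 2.0 models. `read_wav_samples` is a hypothetical helper and is not part of `wav2vec2_stt`.

```python
import wave

# ASSUMPTION: 16 kHz, mono, 16-bit PCM is typical for wav2vec 2.0 models,
# but this README does not state the format the library expects.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def read_wav_samples(path):
    """Return the raw PCM frames of a WAV file as bytes, after checking its format."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        width = wav_file.getsampwidth()
        if (rate, channels, width) != (EXPECTED_RATE, EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH):
            raise ValueError(
                "%s: expected %d Hz, %d channel(s), %d-byte samples; got %d Hz, %d, %d"
                % (path, EXPECTED_RATE, EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH,
                   rate, channels, width)
            )
        return wav_file.readframes(wav_file.getnframes())
```

You would then call `decoder.decode(read_wav_samples('tests/test.wav'))`. Raising early produces a clear error message instead of passing unexpected audio to the model.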

The package also includes a simple command-line interface for recognizing WAV files:

$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...

positional arguments:
  {decode}    sub-command
    decode    decode one or more WAV files

optional arguments:
  -h, --help  show this help message and exit
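The help text above describes a single `decode` sub-command. It takes a model directory and one or more WAV files and prints one transcript per file. The sketch below shows one way to build that interface with `argparse`. It is a hypothetical reconstruction, not the package's actual `__main__` module. The argument names `model_dir` and `wav_files` and the `decoder_factory` hook are assumptions, chosen so the logic can be tested without a model.

```python
import argparse
import wave


def build_parser():
    """Build a parser with a single `decode` sub-command (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="python -m wav2vec2_stt")
    subparsers = parser.add_subparsers(dest="command", help="sub-command")
    decode = subparsers.add_parser("decode", help="decode one or more WAV files")
    # Hypothetical argument names: a model directory, then one or more WAV paths
    decode.add_argument("model_dir", help="directory containing a converted model")
    decode.add_argument("wav_files", nargs="+", help="WAV file(s) to recognize")
    return parser


def run(argv, decoder_factory):
    """Print and return one transcript per WAV file, in command-line order.

    `decoder_factory` is a hypothetical hook: any callable that takes a model
    directory and returns an object with a `decode(samples) -> str` method.
    """
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command != "decode":
        parser.print_help()
        return []
    decoder = decoder_factory(args.model_dir)  # load the model once for all files
    transcripts = []
    for path in args.wav_files:
        with wave.open(path, "rb") as wav_file:
            samples = wav_file.readframes(wav_file.getnframes())
        text = decoder.decode(samples)
        print(text)
        transcripts.append(text)
    return transcripts
```

With the real library, `run(sys.argv[1:], Wav2Vec2STT)` would load the model once and reuse it for every file, so the loading cost is not paid per file.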

Installation/Building

The recommended installation is via the binary wheel from pip (requires a recent version of pip):

python -m pip install wav2vec2_stt

See setup.py for more details on building it yourself.

Author

David Zurow (@daanzu)

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.

Acknowledgments

Contains and uses code from PyTorch and torchaudio, licensed under the BSD 2-Clause License.

Release history

0.2.0 (this version): Aug 16, 2021
0.1.0: Aug 14, 2021

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

wav2vec2_stt-0.2.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (101.1 MB)

Uploaded Aug 16, 2021. Tags: Python 2, Python 3; manylinux: glibc 2.17+ x86-64
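The wheel's filename encodes where it can be installed, following the wheel filename convention of PEP 427. It contains Python tags (`py2.py3`), an ABI tag (`none`, meaning no dependence on a specific CPython ABI), and two platform tags for x86-64 Linux with glibc 2.17 or newer. Below is a minimal parser written for illustration. `parse_wheel_filename` here is a hypothetical helper; the `packaging` project provides a more thorough function of the same name in `packaging.utils`.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components.

    Format: {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
    Each tag field may hold several dot-separated tags (a "compressed tag set").
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tags": py.split("."),
        "abi_tags": abi.split("."),
        "platform_tags": plat.split("."),
    }
```

Applied to the 0.2.0 wheel, this yields the platform tags `manylinux_2_17_x86_64` and `manylinux2014_x86_64`. Both describe the same glibc 2.17+ x86-64 baseline, so pip will not select this wheel on other platforms.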

Hashes for wav2vec2_stt-0.2.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm	Hash digest
SHA256	`231df1c52cb3aaf3e36edc4a2fd710e68b23d775b5d07e1d0562aca5a8aecfab`
MD5	`76372978cdba7115336969cdb4148b08`
BLAKE2b-256	`337534edab90ccc60170d7f522dabd20c33b42baec256d025aa03d06e0186e4a`
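To check a downloaded wheel against the digests above, hash the file locally and compare the results. This is a minimal sketch using `hashlib`, where BLAKE2b-256 corresponds to `hashlib.blake2b(digest_size=32)`. The helper names `file_digests` and `verify_sha256` are hypothetical, and the expected SHA256 value is copied from the table.

```python
import hashlib

# SHA256 digest copied from the table above for the 0.2.0 manylinux wheel
EXPECTED_SHA256 = "231df1c52cb3aaf3e36edc4a2fd710e68b23d775b5d07e1d0562aca5a8aecfab"


def file_digests(path, chunk_size=1 << 20):
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file in a single pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as f:
        # Read in chunks so a ~100 MB wheel is never held in memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return file_digests(path)["SHA256"] == expected.lower()
```

pip can also enforce this check itself. List `wav2vec2_stt==0.2.0 --hash=sha256:<digest>` in a requirements file and install with `python -m pip install --require-hashes -r requirements.txt`; pip will then refuse any file whose hash does not match.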