Speechbox

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

🤗 Speechbox offers a set of speech processing tools, such as punctuation restoration.

Installation

With pip (official package)

pip install speechbox

Contributing

We ❤️ contributions from the open-source community! If you want to contribute to this library, please check out our Contribution guide. You can also browse the open issues for something you'd like to tackle.

See Good first issues for general opportunities to contribute.
See New Task for more advanced contributions. Make sure you have read the Philosophy guide to successfully add a new task.

Also, say 👋 in our public Discord channel under ML for Audio and Speech. We discuss new trends in machine learning methods for speech, help each other with contributions and personal projects, or just hang out ☕.

Tasks

Task	Description	Author
Punctuation Restoration	Punctuation restoration allows one to predict capitalized words as well as punctuation by using Whisper.	Patrick von Platen

Punctuation Restoration

Punctuation restoration relies on the premise that Whisper can understand universal speech. The model is forced to predict the passed words, but is allowed to capitalize letters, remove or add blank spaces, and add punctuation. Punctuation is defined as the characters in Python's official string.punctuation constant.
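
The constraint described above can be illustrated with a small check. This is only a sketch and not part of speechbox: the helper names `normalize` and `same_words` are hypothetical, and the restored sentence is a hand-written example rather than model output. It compares two strings after lowercasing and dropping whitespace and `string.punctuation` characters, which are the only kinds of changes the restorer is allowed to make.

```python
import string

# Characters treated as punctuation: Python's string.punctuation.
PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation and whitespace (hypothetical helper)."""
    return "".join(
        ch for ch in text.lower() if ch not in PUNCTUATION and not ch.isspace()
    )


def same_words(original: str, restored: str) -> bool:
    """True if the two texts differ only in case, spacing, or punctuation."""
    return normalize(original) == normalize(restored)


original = "HE WAS IN A FEVERED STATE OF MIND"
restored = "He was in a fevered state of mind."  # hand-written, not model output
print(same_words(original, restored))  # True
```

A check like this could serve as a sanity test on restored text: if it returns False, the output changed something other than case, spacing, or punctuation.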

Note: For now this package has only been tested with:

and only on some 80 audio samples of patrickvonplaten/librispeech_asr_dummy.

See some transcribed results here.

Web Demo

To try out punctuation restoration in the browser, see the following 🚀 Spaces:

Example

In order to use the punctuation restoration task, you need to install Transformers:

pip install --upgrade transformers

For this example, we will additionally make use of datasets to load a sample audio file:

pip install --upgrade datasets

Now we stream a single audio sample, load the punctuation restoring class with "openai/whisper-tiny.en" and add punctuation to the transcription.

from speechbox import PunctuationRestorer
from datasets import load_dataset

streamed_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)

# get first sample
sample = next(iter(streamed_dataset))

# print out normalized transcript
print(sample["text"])
# => "HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE"

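# load the restoring class
restorer = PunctuationRestorer.from_pretrained("openai/whisper-tiny.en")
# move the model to the GPU (requires a CUDA device; skip this line to run on CPU)
restorer.to("cuda")

# pass the raw audio, the normalized transcript and the audio's sampling rate
restored_text, log_probs = restorer(
    sample["audio"]["array"],
    sample["text"],
    sampling_rate=sample["audio"]["sampling_rate"],
    num_beams=1,
)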

print("Restored text:\n", restored_text)

See examples/restore for more information.
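
To restore more than one sample, the call from the example can be wrapped in a small generator. This is a minimal sketch under stated assumptions: `restore_many` is a hypothetical helper, not a speechbox function, and it assumes each sample has the `text` and `audio` fields used in the example and that the restorer is called with the same arguments as shown there.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, Tuple


def restore_many(
    restorer: Callable[..., Tuple[str, Any]],
    samples: Iterable[dict],
    limit: int = 5,
    num_beams: int = 1,
) -> Iterator[Tuple[str, str]]:
    """Yield (normalized_text, restored_text) for the first `limit` samples.

    Hypothetical helper: `restorer` is any callable that accepts the same
    arguments as the PunctuationRestorer call in the example.
    """
    for sample in islice(samples, limit):
        audio = sample["audio"]
        restored_text, _log_probs = restorer(
            audio["array"],
            sample["text"],
            sampling_rate=audio["sampling_rate"],
            num_beams=num_beams,
        )
        yield sample["text"], restored_text
```

With the objects from the example, `for original, restored in restore_many(restorer, streamed_dataset): print(original, "->", restored)` would print the first five pairs. Using `islice` keeps the streamed dataset lazy, so only the requested samples are fetched.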

Release history

0.2.1 (Jan 27, 2023)
0.2.0 (Jan 27, 2023)
0.1.2 (Dec 29, 2022)
0.1.1 (Dec 28, 2022), this version
0.1.0 (Dec 28, 2022)
0.0.2 (Dec 27, 2022)
0.0.1 (Dec 26, 2022)

Download files

Source Distribution

speechbox-0.1.1.tar.gz (17.6 kB), uploaded Dec 28, 2022

Built Distribution

speechbox-0.1.1-py3-none-any.whl (15.5 kB, Python 3), uploaded Dec 28, 2022

Hashes for speechbox-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`ac1aaead34c503e618d8c676afbaaa8928dd2090efff228d7626cc06f2cd19b7`
MD5	`0181efb836769db04dd3e930929c8147`
BLAKE2b-256	`bb7c2f3b113a365015a9ebd4808e269e88dd6b9f98792d24597fb1a1dd60358d`

Hashes for speechbox-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8fa0ada369e67c6e3ee47bcd5f4347746fb5f32a8058a65d2f546595063cf8f3`
MD5	`b06badbbebb20f59c21f7c358b7d67df`
BLAKE2b-256	`62330ea4c0d15b772996103a4bb266f734b067773b808dd323578675547913b9`