ASRP: Automatic Speech Recognition Preprocessing Utility

ASRP is a Python package that provides a set of tools for preprocessing and evaluating ASR (Automatic Speech Recognition) text. It also includes speech-to-discrete-unit encoding, discrete-unit-to-speech synthesis, speech enhancement, live ASR, and speaker embedding extraction. The code is open source and can be installed with pip.

Key Features

Install

pip install asrp

Preprocess

ASRP offers an easy-to-use set of functions to preprocess ASR text data.
The input is a dictionary with the key 'sentence', and the output is the preprocessed text.
You can either call the fun_en function directly or load the preprocessor dynamically. Here's how to use it:

import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
asrp.fun_en(batch_data)

Dynamic loading

import asrp

batch_data = {
    'sentence': "I'm fine, thanks."
}
preprocessor = getattr(asrp, 'fun_en')
preprocessor(batch_data)
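
The dynamic-loading pattern is handy when the target language is only known at runtime. The sketch below builds the function name from a language code; it assumes ASRP exposes language-specific preprocessors named fun_<lang> (as with fun_en above), so check the package for the names actually available.

import asrp

# pick a preprocessor by language code at runtime
lang = 'en'
preprocessor = getattr(asrp, f'fun_{lang}', None)
if preprocessor is None:
    raise ValueError(f"no ASRP preprocessor found for language code '{lang}'")

batch_data = {
    'sentence': "I'm fine, thanks."
}
print(preprocessor(batch_data))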

Evaluation

ASRP provides functions to evaluate the output quality of ASR systems using
the Word Error Rate (WER) and Character Error Rate (CER) metrics.
Here's how to use them:

import asrp

targets = ['HuggingFace is great!', 'Love Transformers!', 'Let\'s wav2vec!']
preds = ['HuggingFace is awesome!', 'Transformers is powerful.', 'Let\'s finetune wav2vec!']
print("chunk size WER: {:2f}".format(100 * asrp.chunked_wer(targets, preds, chunk_size=None)))
print("chunk size CER: {:2f}".format(100 * asrp.chunked_cer(targets, preds, chunk_size=None)))

Speech to Discrete Unit

import asrp
import nlp2

# references:
# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
# https://github.com/facebookresearch/fairseq/tree/main/examples/textless_nlp/gslm/ulm

# download the k-means quantizer trained on mHuBERT layer-11 features (1000 units)
nlp2.download_file(
    'https://huggingface.co/voidful/mhubert-base/resolve/main/mhubert_base_vp_en_es_fr_it3_L11_km1000.bin', './')

# 11 = transformer layer to take hidden features from (matches the L11 k-means model);
# chunk_sec splits long audio into 30-second chunks, worker sets the number of workers
hc = asrp.HubertCode("voidful/mhubert-base", './mhubert_base_vp_en_es_fr_it3_L11_km1000.bin', 11,
                     chunk_sec=30,
                     worker=20)
hc('voice file path')  # encode an audio file into discrete units
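
Since hc takes a file path, encoding a whole folder is just a loop. A minimal sketch (the ./wavs directory and the returned unit format are assumptions, not part of the ASRP docs):

import os

unit_sequences = {}
for name in os.listdir('./wavs'):  # hypothetical folder of .wav files
    if name.endswith('.wav'):
        unit_sequences[name] = hc(os.path.join('./wavs', name))
print(len(unit_sequences), 'files encoded')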

Discrete Unit to Speech

import asrp

code = []  # discrete unit sequence, e.g. the output of HubertCode above
# https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/unit2speech
# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md
# unit-to-speech model checkpoint plus WaveGlow vocoder checkpoint (see links above)
cs = asrp.Code2Speech(tts_checkpoint='./tts_checkpoint_best.pt', waveglow_checkpint='waveglow_256channels_new.pt')
cs(code)

# play in a notebook
import IPython.display as ipd

ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
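
Outside a notebook, the synthesized waveform can be written to disk instead. A minimal sketch using the soundfile package (an added dependency, not required by ASRP itself), assuming cs(code) returns a 1-D waveform at cs.sample_rate as the IPython example above implies:

import soundfile as sf

wav = cs(code)  # synthesize the discrete unit sequence
sf.write('synthesized.wav', wav, cs.sample_rate)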

mHuBERT English HiFi-GAN vocoder example

import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# download the unit HiFi-GAN vocoder checkpoint released with fairseq's textless S2ST
nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')

# seq2seq model that maps text to mHuBERT discrete units
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

# generate unit tokens for the input text and parse the 'v_tok_*' ids into integers
inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
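
The same tokenizer, model, and vocoder can be reused to synthesize several sentences in a loop. A minimal sketch that writes one file per sentence (the file names and the soundfile dependency are assumptions for illustration):

import soundfile as sf

sentences = ["Hello world.", "Speech synthesis from discrete units."]
for idx, text in enumerate(sentences):
    inputs = tokenizer([text], return_tensors="pt")
    decoded = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]
    units = [int(i) for i in decoded.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
    sf.write(f"tts_{idx}.wav", cs(units), cs.sample_rate)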

Speech Enhancement

ASRP also provides a speech enhancement (noise reduction) tool, based on the denoiser from
https://github.com/facebookresearch/fairseq/tree/main/examples/speech_synthesis/preprocessing/denoiser

from asrp import SpeechEnhancer

ase = SpeechEnhancer()
print(ase('./test/xxx.wav'))
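
The same SpeechEnhancer instance can be reused across files. A minimal sketch over a folder of recordings (the ./test directory layout and the return format of ase() are assumptions):

import glob

for path in glob.glob('./test/*.wav'):
    enhanced = ase(path)  # whatever ase() returns for a single file
    print(path, type(enhanced))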

LiveASR - Hugging Face model

from asrp.live import LiveSpeech

english_model = "voidful/wav2vec2-xlsr-multilingual-56"
asr = LiveSpeech(english_model, device_name="default")
asr.start()  # start streaming from the default input device

try:
    while True:
        # latest transcript, audio length (s), and inference time (s)
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s"
              + f"\t{inference_time:.3f}s"
              + f"\t{text}")

except KeyboardInterrupt:
    asr.stop()

LiveASR - Whisper model

from asrp.live import LiveSpeech

whisper_model = "tiny"
# vad_mode controls voice-activity-detection sensitivity; language sets the
# Whisper decoding language (here Chinese, 'zh')
asr = LiveSpeech(whisper_model, vad_mode=2, language='zh')
asr.start()
while True:
    try:
        asr_text, sample_length, inference_time = asr.get_last_text()
        if len(asr_text) > 0:
            print(asr_text, sample_length, inference_time)
    except KeyboardInterrupt:
        asr.stop()
        break

Speaker Embedding Extraction - x vector

Based on SpeechBrain's Xvector model: https://speechbrain.readthedocs.io/en/latest/API/speechbrain.lobes.models.Xvector.html

from asrp.speaker_embedding import extract_x_vector

extract_x_vector('./test/xxx.wav')
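
A common use of these embeddings is a quick same-speaker check via cosine similarity. A hedged sketch, assuming extract_x_vector returns an embedding that can be converted to a 1-D numpy array (the two file paths are placeholders):

import numpy as np

from asrp.speaker_embedding import extract_x_vector

emb_a = np.asarray(extract_x_vector('./test/speaker_a.wav')).squeeze()
emb_b = np.asarray(extract_x_vector('./test/speaker_b.wav')).squeeze()
similarity = float(emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
print(f"cosine similarity: {similarity:.3f}")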

Speaker Embedding Extraction - d vector

Based on https://github.com/yistLin/dvector

from asrp.speaker_embedding import extract_d_vector

extract_d_vector('./test/xxx.wav')
