WhisperPlus: A Python library for the WhisperPlus API.

Project description

WhisperPlus: Advancing Speech-to-Text Processing 🚀

🛠️ Installation

pip install whisperplus

🤗 Model Hub

You can find the models on HuggingFace Spaces or on the HuggingFace Model Hub.

🎙️ Usage

To use the whisperplus library, follow the steps below for different tasks:

🎵 YouTube URL to Audio

from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3

# URL of the YouTube video to transcribe.
url = "https://www.youtube.com/watch?v=di3rHkEZuUw"

# Download the video's audio track and convert it to an MP3 file.
audio_path = download_and_convert_to_mp3(url)

# Initialize the speech-to-text pipeline with the specified model.
pipeline = SpeechToTextPipeline(model_id="openai/whisper-large-v3")

# Run the pipeline on the audio file.
transcript = pipeline(
    audio_path=audio_path, model_id="openai/whisper-large-v3", language="english"
)

# Print the transcript of the audio.
print(transcript)
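
If you want to reuse the transcript later, for example as the input file in the Chat with Video section, one option is to write it to disk. The following is a minimal sketch, assuming `transcript` is a plain string. The helper name `save_transcript` and the default file name are hypothetical and not part of whisperplus.

```python
from pathlib import Path


def save_transcript(text: str, path: str = "transcript.txt") -> Path:
    """Write the transcript text to a UTF-8 file and return its path."""
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out
```

Calling `save_transcript(transcript)` after the example above would write the text to `transcript.txt` in the current directory.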

Summarization

from whisperplus.pipelines.summarization import TextSummarizationPipeline

# Summarize the transcript produced in the previous example.
summarizer = TextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary = summarizer.summarize(transcript)
print(summary[0]["summary_text"])
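
Summarization models such as BART accept a limited input length, so a long transcript may need to be split before it is summarized. Whether `TextSummarizationPipeline` handles this internally is not stated here, so the sketch below shows one possible approach that splits on whitespace. `chunk_text` and `max_words` are hypothetical names, not part of whisperplus, and the default of 400 words is an arbitrary assumption.

```python
def chunk_text(text: str, max_words: int = 400) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk could then be passed to `summarizer.summarize(chunk)` and the partial summaries joined.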

Speaker Diarization

from whisperplus import (
    ASRDiarizationPipeline,
    download_and_convert_to_mp3,
    format_speech_to_dialogue,
)

audio_path = download_and_convert_to_mp3("https://www.youtube.com/watch?v=mRB14sFHw2E")

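device = "cuda"  # or "cpu" / "mps"
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    # The pyannote models are gated on the Hugging Face Hub; if loading fails,
    # you likely need to pass your access token here instead of False.
    use_auth_token=False,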
    chunk_length_s=30,
    device=device,
)

output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)
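
Rather than hard-coding the device, you may prefer to detect it at runtime. The sketch below is one way to do that with PyTorch's availability checks. The helper name `pick_device` is hypothetical and not part of whisperplus, and it falls back to `"cpu"` when PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator to query; fall back to CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

The result can be passed as the `device` argument in the example above.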

Chat with Video - RAG

pip install -r dev-requirements.txt

from whisperplus.pipelines.chatbot import ChatWithVideo

input_file = "transcript.txt"  # path to a saved transcript
llm_model_name = "TheBloke/Mistral-7B-v0.1-GGUF"
llm_model_file = "mistral-7b-v0.1.Q4_K_M.gguf"
llm_model_type = "mistral"
embedding_model_name = "sentence-transformers/all-MiniLM-L6-v2"
chat = ChatWithVideo(
    input_file, llm_model_name, llm_model_file, llm_model_type, embedding_model_name
)
query = "What is this video about?"
response = chat.run_query(query)
print(response)
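
To ask several questions in one go, a small loop over `run_query` is enough. The sketch below assumes only that the chat object exposes `run_query(query)` as in the example above. `ask_all` is a hypothetical helper, and `EchoChat` is a stand-in class used purely so the snippet runs without loading a model.

```python
def ask_all(chat, queries):
    """Run each query through chat.run_query and collect (query, response) pairs."""
    return [(q, chat.run_query(q)) for q in queries]


class EchoChat:
    """Stand-in used only to demonstrate the helper without loading a model."""

    def run_query(self, query):
        return f"echo: {query}"


for question, answer in ask_all(EchoChat(), ["What is this video about?"]):
    print(question, "->", answer)
```

With a real `ChatWithVideo` instance, `ask_all(chat, [...])` would be used the same way.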

Contributing

pip install -r dev-requirements.txt
pre-commit install
pre-commit run --all-files

📜 License

This project is licensed under the terms of the Apache License 2.0.

🤗 Acknowledgments

This project is based on the HuggingFace Transformers library.

🤗 Citation

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Release history

0.3.4

May 7, 2024

0.3.3

May 6, 2024

0.3.2

May 6, 2024

0.3.1

May 5, 2024

0.3.0

May 4, 2024

0.2.8.1

May 4, 2024

0.2.8

May 2, 2024

0.2.7.2.dev1 pre-release

May 3, 2024

0.2.7.1.dev1 pre-release

May 3, 2024

0.2.7

Jan 21, 2024

0.2.7.0.dev1 pre-release

May 3, 2024

0.2.6

Jan 21, 2024

0.2.5

Jan 11, 2024

0.2.4

Jan 11, 2024

0.2.3 (this version)

Jan 10, 2024

0.2.2

Jan 10, 2024

0.2.1

Jan 10, 2024

0.2.0

Jan 10, 2024

0.1.0

Dec 29, 2023

0.0.9

Nov 27, 2023

0.0.8

Nov 27, 2023

0.0.7

Nov 27, 2023

0.0.6

Nov 24, 2023

0.0.5

Nov 23, 2023

0.0.4

Nov 22, 2023

0.0.3

Nov 22, 2023

0.0.2

Nov 22, 2023

0.0.1

Nov 21, 2023

Download files

Source Distribution

whisperplus-0.2.3.tar.gz (17.9 kB)

Uploaded Jan 10, 2024 Source

Hashes for whisperplus-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`569ec4d67a3568a44582a881c89ba5b49fd3c54a06f2c64963cf69ab66e3e883`
MD5	`189bbce535454e5fc49edc9c90f5b25a`
BLAKE2b-256	`7cccea2fa764a2de21372a39900c8a96edaee6850355b987393686a66c248c97`
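
The digests above can be used to check a downloaded archive. The following is a minimal sketch using Python's standard `hashlib`. The function names `sha256_of_file` and `matches_expected` are hypothetical, and the archive path is whatever location you saved the file to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("whisperplus-0.2.3.tar.gz", "569ec4d67a3568a44582a881c89ba5b49fd3c54a06f2c64963cf69ab66e3e883")` should return `True` for an intact copy of the source distribution.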