whisperplus

WhisperPlus: A Python library for WhisperPlus API.

WhisperPlus: Advancing Speech2Text and Text2Speech Features 🚀

🛠️ Installation

pip install whisperplus
pip install flash-attn --no-build-isolation
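
After installing, it can help to confirm which versions ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of whisperplus.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used in the pip commands above.
    for name in ("whisperplus", "flash-attn"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```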

🤗 Model Hub

You can find the models on the Hugging Face Model Hub.

🎙️ Usage

To use the whisperplus library, follow the steps below for different tasks:

🎵 YouTube URL to Audio

from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3

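# Example video; replace with the URL you want to transcribe.
url = "https://www.youtube.com/watch?v=di3rHkEZuUw"

# Download the audio track and convert it to an MP3 file.
audio_path = download_and_convert_to_mp3(url)

# Load the Whisper model and transcribe the audio in English.
pipeline = SpeechToTextPipeline(model_id="openai/whisper-large-v3")
transcript = pipeline(audio_path, "openai/whisper-large-v3", "english")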

print(transcript)
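
The RAG example further down reads its input from a text file, so it may be convenient to write the transcript to disk. This is a minimal sketch, assuming the transcript is a plain string; `save_transcript` is a hypothetical helper, not a whisperplus function, and the demo text stands in for a real transcript.

```python
import tempfile
from pathlib import Path


def save_transcript(text: str, path: str) -> Path:
    """Write the transcript to a UTF-8 text file and return its path."""
    out = Path(path)
    # Create missing parent directories so nested output paths work.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text.strip() + "\n", encoding="utf-8")
    return out


# Demo with stand-in text, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_transcript(
        "Hello from the transcript.", str(Path(tmp_dir) / "transcript.txt")
    )
    print(saved.read_text(encoding="utf-8"))
```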

📰 Summarization

from whisperplus import TextSummarizationPipeline

summarizer = TextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary = summarizer.summarize(transcript)
print(summary[0]["summary_text"])

📰 Long Text Support Summarization

from whisperplus import LongTextSummarizationPipeline

summarizer = LongTextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary_text = summarizer.summarize(transcript)
print(summary_text)
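
Summarization models such as facebook/bart-large-cnn accept a limited input length, so long transcripts are commonly split into pieces that are summarized one at a time. The sketch below illustrates that general idea only; it is an assumption about the technique, not a description of how LongTextSummarizationPipeline works internally. The names `split_into_chunks` and `summarize_long_text` are hypothetical, and the word-count limit is a rough stand-in for a token limit.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long_text(
    text: str, summarize_chunk: Callable[[str], str], max_words: int = 400
) -> str:
    """Summarize each chunk with `summarize_chunk` and join the partial summaries."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(summarize_chunk(chunk) for chunk in chunks)


# Demo with a stand-in "summarizer" that keeps the first five words of a chunk.
def first_five_words(chunk: str) -> str:
    return " ".join(chunk.split()[:5])


long_text = "word " * 1000
print(summarize_long_text(long_text, first_five_words, max_words=400))
```

In practice, `summarize_chunk` would wrap a real model call, and the chunk size would be chosen from the model's tokenizer rather than a word count.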

💬 Speaker Diarization

from whisperplus import (
    ASRDiarizationPipeline,
    download_and_convert_to_mp3,
    format_speech_to_dialogue,
)

audio_path = download_and_convert_to_mp3("https://www.youtube.com/watch?v=mRB14sFHw2E")

device = "cuda"  # or "cpu" / "mps"
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    use_auth_token=False,
    chunk_length_s=30,
    device=device,
)

output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)
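
To make the dialogue step more concrete, here is a rough sketch of how diarized segments can be turned into speaker-labelled lines. The segment structure (a list of dicts with "speaker" and "text" keys) is an assumption for illustration, and `segments_to_dialogue` is a hypothetical helper; the actual output of the pipeline and the behavior of format_speech_to_dialogue may differ.

```python
from typing import Dict, List, Optional


def segments_to_dialogue(segments: List[Dict[str, str]]) -> str:
    """Render segments as 'SPEAKER: text' lines, merging consecutive turns."""
    lines: List[str] = []
    last_speaker: Optional[str] = None
    for segment in segments:
        speaker = segment["speaker"]
        text = segment["text"].strip()
        if not text:
            continue  # skip empty segments
        if speaker == last_speaker:
            # Same speaker keeps talking: extend the current line.
            lines[-1] += " " + text
        else:
            lines.append(f"{speaker}: {text}")
            last_speaker = speaker
    return "\n".join(lines)


# Demo with made-up segments (assumed structure, not real pipeline output).
example_segments = [
    {"speaker": "SPEAKER_00", "text": " Welcome to the show."},
    {"speaker": "SPEAKER_00", "text": " How are you today?"},
    {"speaker": "SPEAKER_01", "text": " Doing well, thanks."},
]
print(segments_to_dialogue(example_segments))
```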

⭐ RAG - Chat with Video (LanceDB)

from whisperplus.pipelines.chatbot import ChatWithVideo

chat = ChatWithVideo(
    input_file="transcript.txt",
    llm_model_name="TheBloke/Mistral-7B-v0.1-GGUF",
    llm_model_file="mistral-7b-v0.1.Q4_K_M.gguf",
    llm_model_type="mistral",
    embedding_model_name="sentence-transformers/all-MiniLM-L6-v2",
)

query = "What is this video about?"
response = chat.run_query(query)
print(response)

🌠 RAG - Chat with Video (AutoLLM)

from whisperplus import AutoLLMChatWithVideo

# service_context_params
system_prompt = """
You are a friendly AI assistant that helps users find the most relevant and accurate answers
to their questions based on the documents you have access to.
When answering the questions, mostly rely on the info in documents.
"""
query_wrapper_prompt = """
The document information is below.
---------------------
{context_str}
---------------------
Using the document information and mostly relying on it,
answer the query.
Query: {query_str}
Answer:
"""

chat = AutoLLMChatWithVideo(
    input_file="input_dir",  # path of mp3 file
    openai_key="YOUR_OPENAI_KEY",  # optional
    huggingface_key="YOUR_HUGGINGFACE_KEY",  # optional
    llm_model="gpt-3.5-turbo",
    llm_max_tokens="256",
    llm_temperature="0.1",
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    embed_model="huggingface/BAAI/bge-large-zh",  # "text-embedding-ada-002"
)

query = "What is this video about?"
response = chat.run_query(query)
print(response)
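
The query wrapper prompt above contains two placeholders, {context_str} and {query_str}. The sketch below shows how such a template could be filled with retrieved text and a user question using plain `str.format`. This is for illustration only: in the example above the substitution is handled by the underlying library, and `fill_prompt` is a hypothetical helper.

```python
from typing import Iterable

QUERY_WRAPPER_PROMPT = """
The document information is below.
---------------------
{context_str}
---------------------
Using the document information and mostly relying on it,
answer the query.
Query: {query_str}
Answer:
"""


def fill_prompt(template: str, context_chunks: Iterable[str], query: str) -> str:
    """Join the context chunks and substitute both placeholders in the template."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return template.format(context_str=context, query_str=query)


# Demo with made-up context passages standing in for retrieved transcript text.
prompt = fill_prompt(
    QUERY_WRAPPER_PROMPT,
    ["The video introduces a speech recognition library.", "It also covers summarization."],
    "What is this video about?",
)
print(prompt)
```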

🎙️ Text to Speech

from whisperplus import TextToSpeechPipeline

tts = TextToSpeechPipeline(model_id="suno/bark")
audio = tts(text="Hello World", voice_preset="v2/en_speaker_6")

🎥 AutoCaption

from whisperplus import WhisperAutoCaptionPipeline

caption = WhisperAutoCaptionPipeline(model_id="openai/whisper-large-v3")
caption(video_path="test.mp4", output_path="output.mp4", language="turkish")

😍 Contributing

pip install -r dev-requirements.txt
pre-commit install
pre-commit run --all-files

📜 License

This project is licensed under the terms of the Apache License 2.0.

🤗 Citation

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
