inaSpeechSegmenter

CNN-based audio segmentation toolkit. It detects speech, music, and speaker gender.

# inaSpeechSegmenter

inaSpeechSegmenter is a framework for speech segmentation in Python 3.
It provides speech/music segmentation methods that split an audio signal into homogeneous zones of speech and music.
It also provides speaker gender segmentation methods that split speech excerpts into male and female speech.

## Installation

inaSpeechSegmenter requires Python 3.
It can be installed using the following procedure:

### Prerequisites

inaSpeechSegmenter requires ffmpeg to decode media files, whatever their format.
On Ubuntu, ffmpeg can be installed with the following command:
```bash
$ sudo apt-get install ffmpeg
```
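
Before installing the Python package, it can help to confirm that ffmpeg is actually reachable. The check below is a minimal sketch and not part of inaSpeechSegmenter: the helper name `ffmpeg_available` is hypothetical, and it only verifies that an executable with the given name is on `PATH`, not that it can decode a particular format.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found: install it before using inaSpeechSegmenter")
```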

### Installing from source

```bash
# clone git repository
$ git clone https://github.com/ina-foss/inaSpeechSegmenter.git
# create a python 3 virtual environment and activate it
$ virtualenv -p python3 inaSpeechSegEnv
$ source inaSpeechSegEnv/bin/activate
# install a backend for keras (tensorflow, theano, cntk...)
$ pip install tensorflow-gpu # if you wish GPU implementation (recommended)
$ pip install tensorflow # for a CPU implementation
# install framework and dependencies
$ cd inaSpeechSegmenter
$ python setup.py install
```

### PIP installation
TODO: not yet managed
```bash
# create a python 3 virtual environment and activate it
$ virtualenv -p python3 inaSpeechSegEnv
$ source inaSpeechSegEnv/bin/activate
# install a backend for keras (tensorflow, theano, cntk...)
$ pip install tensorflow-gpu # if you wish GPU implementation (recommended)
$ pip install tensorflow # for a CPU implementation
# install framework and dependencies
$ pip install inaSpeechSegmenter
```

## Using inaSpeechSegmenter

### Speech Segmentation Program
The ina_speech_segmenter.py program can segment multimedia archives encoded in any format supported by ffmpeg. It takes one or more input media files and an output directory, and writes one CSV segmentation file per input. The resulting CSV files can be visualised with software such as Sonic Visualiser (https://www.sonicvisualiser.org/).
```bash
# get help
$ ina_speech_segmenter.py --help
usage: ina_speech_segmenter.py [-h] -i INPUT [INPUT ...] -o OUTPUT_DIRECTORY

Do Speech/Music and Male/Female segmentation. Store segmentations into CSV
files

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT [INPUT ...], --input INPUT [INPUT ...]
                        Input media to analyse. May be a full path to a media
                        (/home/david/test.mp3), a list of full paths
                        (/home/david/test.mp3 /tmp/mymedia.avi), or a regex
                        input pattern ("/home/david/myaudiobooks/*.mp3")
  -o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
                        Directory used to store segmentations. Resulting
                        segmentations have same base name as the corresponding
                        input media, with csv extension. Ex: mymedia.MPG will
                        result in mymedia.csv
```
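
The naming rule described in the help text (same base name, `csv` extension) can be sketched as follows. This illustrates the documented behaviour, not the program's actual source: `expand_inputs` and `output_csv_path` are hypothetical helpers, and treating the input pattern as a shell-style glob is an assumption based on the `*.mp3` example.

```python
import glob
import os


def expand_inputs(patterns):
    """Expand each input pattern into a list of matching paths.

    Assumption: patterns are shell-style globs such as "/tmp/*.mp3".
    A plain path to an existing file matches itself.
    """
    paths = []
    for pattern in patterns:
        paths.extend(sorted(glob.glob(pattern)))
    return paths


def output_csv_path(media_path, output_directory):
    """Build the CSV path: same base name as the media, csv extension."""
    base, _ext = os.path.splitext(os.path.basename(media_path))
    return os.path.join(output_directory, base + ".csv")


if __name__ == "__main__":
    # Path taken from the help text example (mymedia.MPG -> mymedia.csv).
    print(output_csv_path("/home/david/mymedia.MPG", "segmentations"))
```

Keeping the path logic in small functions like these makes it easy to check where each segmentation will be written before launching a long batch.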
### Using Speech Segmentation API

The inaSpeechSegmenter API is simple to use.
See the following notebook for a comprehensive example: [API Tutorial Here!](API_Tutorial.ipynb)
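
Once a segmentation is available, a typical next step is to total the time spent in each class, for instance to compare men's and women's speaking time. The sketch below does not call inaSpeechSegmenter. It assumes, as a hypothesis to check against the notebook, that a segmentation is a list of `(label, start, stop)` tuples with times in seconds. The label strings and the helper `duration_per_label` are illustrative and may not match what the library returns.

```python
from collections import defaultdict


def duration_per_label(segmentation):
    """Sum segment durations (stop - start) for each label.

    Assumption: segmentation is an iterable of (label, start, stop)
    tuples with start and stop expressed in seconds.
    """
    totals = defaultdict(float)
    for label, start, stop in segmentation:
        if stop < start:
            raise ValueError(f"segment ends before it starts: {label!r}")
        totals[label] += stop - start
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical segmentation; the labels are placeholders.
    example = [
        ("music", 0.0, 12.5),
        ("female", 12.5, 30.0),
        ("male", 30.0, 41.0),
        ("female", 41.0, 45.5),
    ]
    for label, seconds in sorted(duration_per_label(example).items()):
        print(f"{label}: {seconds:.1f} s")
```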

## Citing

inaSpeechSegmenter was presented at the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) in Calgary, Canada. If you use this toolbox in your research, you can cite the following work in your publications:

```bibtex
@inproceedings{ddoukhanicassp2018,
author = {Doukhan, David and Carrive, Jean and Vallet, Félicien and Larcher, Anthony and Meignier, Sylvain},
title = {An Open-Source Speaker Gender Detection Framework for Monitoring Gender Equality},
year = {2018},
organization={IEEE},
booktitle={Acoustics Speech and Signal Processing (ICASSP), 2018 IEEE International Conference on}
}
```

## Credits

This work was carried out within the MeMAD project (https://memad.eu/).
MeMAD is an EU-funded H2020 research project.
It has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780069.

