
xbob.spkrec 0.0.1a1

Speaker recognition toolkit


This is the speaker recognition toolkit, designed to run speaker verification/recognition experiments. It is originally based on the facereclib tool.

xbob.spkrec is designed in a way that it should be easily possible to execute experiments combining different mixtures of:

  • Speaker Recognition databases and their corresponding protocols
  • Voice activity detection
  • Feature extraction
  • Recognition/Verification tools

In any case, results of these experiments will directly be comparable when the same dataset is employed.

xbob.spkrec is adapted to run speaker verification/recognition experiments with the SGE grid infrastructure at Idiap.

If you use this package and/or its results, please cite the following publications:

  1. The original paper presented at the NIST SRE 2012 workshop:

       @inproceedings{Khoury_NISTSRE_2012,
         author = {Khoury, Elie and El Shafey, Laurent and Marcel, S{\'{e}}bastien},
         title = {The Idiap Speaker Recognition Evaluation System at NIST SRE 2012},
         booktitle = {NIST Speaker Recognition Conference},
         month = {dec},
         year = {2012},
         location = {Orlando, USA},
         organization = {NIST},
         pdf = {}
       }
  2. Bob as the core framework used to run the experiments:

       @inproceedings{Anjos_ACMMM_2012,
         author = {A. Anjos and L. El Shafey and R. Wallace and M. G\"unther and C. McCool and S. Marcel},
         title = {Bob: a free signal processing and machine learning toolbox for researchers},
         booktitle = {20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan},
         publisher = {ACM Press},
         month = {oct},
         year = {2012},
         url = {}
       }


Just download this package and decompress it locally:

$ wget
$ unzip
$ cd xbob.spkrec

xbob.spkrec is based on the BuildOut Python linking system. You only need to use buildout to bootstrap and obtain a working environment ready for experiments:

$ python bootstrap
$ ./bin/buildout

This also requires that bob (>= 1.2.0) is installed.

Running experiments

These two commands will automatically download all required packages (gridtk, pysox and xbob.db.verification.filelist) from GitHub or PyPI and generate some scripts in the bin directory, including the following scripts:

$ bin/
$ bin/
$ bin/
$ bin/
$ bin/
$ bin/

These scripts can be used to run different experiments. To use them, you have to specify at least four command line parameters (see also the --help option):

  • --database: The configuration file for the database
  • --preprocessing: The configuration file for Voice Activity Detection
  • --feature-extraction: The configuration file for feature extraction
  • --tool-chain: The configuration file for the speaker verification tool chain

If you are not at Idiap, please specify the TEMP and USER directories:

  • --temp-directory: This typically contains the features, the UBM model, the client models, etc.
  • --user-directory: This will contain the output scores (in text format)

If you want to run the experiments in the GRID at Idiap or any equivalent SGE, you can simply specify:

  • --grid: The configuration file for the grid setup.

If no grid configuration file is specified, the experiment is run sequentially on the local machine. xbob.spkrec provides configuration files for several datasets, feature types, recognition algorithms, and grid requirements. They are located in the config/… directories. It is also safe to design one experiment and re-use one configuration file for all options, as long as the configuration file includes all desired information:

  • The database: name, db, protocol; wav_input_dir, wav_input_ext;
  • The preprocessing: preprocessor = spkrec.preprocessing.<PREPROCESSOR>;
  • The feature extraction: extractor = spkrec.feature_extraction.<EXTRACTOR>;
  • The tool: tool =<TOOL>; plus configurations of the tool itself
  • Grid parameters: They help to fix which queues are used for each of the steps, how much files per job, etc.
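As a sketch, a database configuration file is simply a small Python module setting the entries listed above. All names and paths below are made up for illustration; consult the files shipped in config/database/ for the real entries:

```python
# Hypothetical database configuration module for xbob.spkrec.
# Every value below is illustrative -- the shipped configuration files
# also construct the actual database access object (the `db` entry),
# typically from an xbob.db.* package.
name = 'voxforge'                 # short name identifying the database
protocol = 'default'              # evaluation protocol of the database
wav_input_dir = '/path/to/audio'  # base directory of the .wav files
wav_input_ext = '.wav'            # extension of the audio files
# db = <database access object>, e.g. from an xbob.db.* package
```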

By default, the ZT score normalization is activated. To deactivate it, please add the -z option to the command line.

One way to compute the final result is to use the script from your Bob installation, e.g., by calling:

$ bin/ -d PATH/TO/USER/DIRECTORY/scores-dev -t PATH/TO/USER/DIRECTORY/scores-eval

Experiment design

To be very flexible, the tool chain in the xbob.spkrec is designed in several stages:

1. Signal Preprocessing (Voice Activity Detection)
2. Feature Extraction
3. Feature Projection
4. Model Enrollment
5. Scoring

Note that not all tools implement all of the stages.

Voice Activity Detection

This step aims to filter out the non-speech parts. Depending on the configuration file, several routines can be enabled or disabled.

  • Energy-based VAD
  • 4Hz Modulation energy based VAD
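The idea behind the energy-based variant can be sketched in a few lines: frames whose log-energy falls far below the loudest frame are treated as non-speech. This is only an illustration; the actual xbob.spkrec implementation and its parameter names differ:

```python
import numpy as np

def energy_vad(signal, rate, frame_ms=20, threshold_db=-30.0):
    """Minimal energy-based VAD sketch: keep frames whose log-energy is
    within `threshold_db` of the loudest frame (illustrative only)."""
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy > (energy.max() + threshold_db)  # boolean speech mask

# toy example: 1 s of near-silence followed by 1 s of a loud tone
np.random.seed(0)
rate = 8000
t = np.arange(rate) / rate
signal = np.concatenate([0.001 * np.random.randn(rate),
                         np.sin(2 * np.pi * 440 * t)])
mask = energy_vad(signal, rate)  # first half False, second half True
```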

Feature Extraction

This step aims to extract features. Depending on the configuration file, several routines can be enabled or disabled.

  • LFCC/MFCC feature extraction
  • Spectrogram extraction
  • Feature normalization
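A common instance of the feature-normalization step is cepstral mean and variance normalization over an utterance; the sketch below is illustrative and not the toolkit's own routine:

```python
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalization: shift each feature
    dimension to zero mean and scale it to unit variance (sketch)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / np.maximum(std, 1e-10)

np.random.seed(0)
feats = np.random.randn(200, 20) * 3.0 + 5.0  # fake matrix: 200 frames x 20 coeffs
normed = cmvn(feats)                          # zero-mean, unit-variance per dimension
```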

Feature Projection

Some provided tools need to process the features before they can be used for verification. In the xbob.spkrec, this step is referenced as the projection step. Again, the projection might require training, which is executed using the extracted features from the training set. Afterward, all features are projected (using the previously trained Projector).
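The train-then-project split can be illustrated with plain PCA (xbob.spkrec ships its own projectors; all function names here are made up):

```python
import numpy as np

def train_pca_projector(training_features, n_components):
    """Sketch of a projection-training step using plain PCA: compute the
    mean and the top principal directions of the training features."""
    mean = training_features.mean(axis=0)
    centered = training_features - mean
    # right singular vectors = principal directions, largest variance first
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(features, projector):
    """Apply the previously trained projector to any set of features."""
    mean, components = projector
    return (features - mean) @ components.T

np.random.seed(0)
train = np.random.randn(500, 60)             # training-set features
projector = train_pca_projector(train, 10)   # the "Projector" is trained once...
projected = project(np.random.randn(7, 60), projector)  # ...then applied everywhere
```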

Model Enrollment

Model enrollment defines the stage, where several (projected or unprojected) features of one identity are used to enroll the model for that identity. In the easiest case, the features are simply averaged, and the average feature is used as a model. More complex procedures, which again might require a model enrollment training stage, create models in a different way.
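The simplest enrollment strategy mentioned above, averaging the features of one identity, can be sketched as:

```python
import numpy as np

def enroll(enrollment_features):
    """Enroll a model by averaging the (projected or unprojected)
    features of one identity (simplest strategy; sketch only)."""
    return np.mean(enrollment_features, axis=0)

samples = np.stack([np.array([1.0, 2.0]), np.array([3.0, 4.0])])
model = enroll(samples)  # -> array([2., 3.])
```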


Scoring

In the final scoring stage, the models are compared to probe features and a similarity score is computed for each pair of model and probe. Some of the models (the so-called T-Norm-Models) and some of the probe features (so-called Z-Norm-probe-features) are split up, so they can be used to normalize the scores later on.
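The Z-norm half of that normalization can be sketched as follows; the full ZT-norm additionally applies a T-norm pass against the T-Norm-Models, and the names below are illustrative:

```python
import numpy as np

def znorm(raw_score, znorm_scores):
    """Z-norm sketch: normalize a raw model-vs-probe score by the mean and
    standard deviation of the same model scored against the Z-norm probe
    features (impostor scores)."""
    mu, sigma = np.mean(znorm_scores), np.std(znorm_scores)
    return (raw_score - mu) / sigma

impostor_scores = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # model vs Z-norm probes
normalized = znorm(6.0, impostor_scores)               # (6 - 2) / sqrt(2)
```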

Command line options

Additionally to the required command line options discussed above, there are several options to modify the behavior of the xbob.spkrec experiments. One set of command line options change the directory structure of the output:

  • --temp-directory: Base directory where to write temporary files into (the default is /idiap/temp/$USER/<DATABASE> when using the grid or /scratch/$USER/<DATABASE> when executing jobs locally)
  • --user-directory: Base directory where to write the results, default is /idiap/user/$USER/<DATABASE>
  • --sub-directory: sub-directory into <TEMP_DIR> and <USER_DIR> where the files generated by the experiment will be put
  • --score-sub-directory: name of the sub-directory in <USER_DIR>/<PROTOCOL> where the scores are put into

If you want to re-use parts of previous experiments, you can specify the directories (which are relative to the <TEMP_DIR>, but you can also specify absolute paths):

  • --preprocessed-image-directory
  • --features-directory
  • --projected-directory
  • --models-directories (one each for the Models and the T-Norm-Models)

or even trained Extractor, Projector, or Enroler (i.e., the results of the extraction, projection, or enrollment training):

  • --extractor-file
  • --projector-file
  • --enroler-file

For that purpose, it is also useful to skip parts of the tool chain. To do that you can use:

  • --skip-preprocessing
  • --skip-feature-extraction-training
  • --skip-feature-extraction
  • --skip-projection-training
  • --skip-projection
  • --skip-enroler-training
  • --skip-model-enrolment
  • --skip-score-computation
  • --skip-concatenation

although by default files that already exist are not re-created. To enforce the re-creation of the files, you can use the --force option, which of course can be combined with the --skip...-options (in which case the skip is preferred).

There are some more command line options that can be specified:

  • --no-zt-norm: Disables the computation of the ZT-Norm scores.
  • --groups: Limits the computation to the development (‘dev’) or test (‘eval’) group. By default, both groups are evaluated.
  • --preload-probes: Speeds up the score computation by loading all probe features (by default, they are loaded each time they are needed). Use this option only, when you are sure that all probe features fit into memory.
  • --dry-run: When the grid is enabled, only prints the tasks that would have been sent to the grid, without actually sending them. WARNING: this command line option is ignored when no --grid option is specified!


Databases

For the moment, there are 4 databases that are tested in xbob.spkrec. Their protocols are also shipped with the tool. You can use the script to compute the EER and HTER on DEV and EVAL as follows:

$ bin/ -d scores-dev -t scores-eval

By default, this script will also generate the DET curve in a PDF file.
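For reference, the EER computed by such a script can be sketched in a few lines. Bob provides exact routines for this; the threshold sweep below is only illustrative:

```python
import numpy as np

def eer(negatives, positives):
    """Sketch of an EER computation: sweep candidate thresholds and return
    the operating point where the false-acceptance rate (FAR) and
    false-rejection rate (FRR) are closest, plus that threshold."""
    thresholds = np.sort(np.concatenate([negatives, positives]))
    best = None
    for thr in thresholds:
        far = np.mean(negatives >= thr)   # impostors accepted
        frr = np.mean(positives < thr)    # genuine trials rejected
        if best is None or abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr, thr)
    far, frr, thr = best
    return (far + frr) / 2.0, thr

neg = np.array([-3.0, -2.0, -1.0, 0.5])   # impostor scores
pos = np.array([-0.5, 1.0, 2.0, 3.0])     # genuine scores
rate, threshold = eer(neg, pos)            # -> 0.25 at threshold 0.5
```

The HTER on EVAL is the same (FAR + FRR)/2 quantity, but computed with the threshold fixed on DEV.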

In this README, we give examples of different toolchains applied to different databases: Voxforge, BANCA, TIMIT, MOBIO, and NIST SRE 2012.

Voxforge dataset

Voxforge is a free database used in free speech recognition engines. We randomly selected a small part of the English corpus (< 1GB). It is used as a toy example for our speaker recognition tool, since experiments can easily be run on a local machine and the results can be obtained in a reasonable amount of time (< 2h).

Unlike TIMIT and BANCA, this dataset is completely free of charge.

More details about how to download the audio files used in our experiments, and how the data is split into Training, Development and Evaluation set can be found here:

One example of command line is:

$ ./bin/ -d config/database/ -p config/preprocessing/ -f config/features/ -t config/tools/ --user-directory PATH/TO/USER/DIR --temp-directory PATH/TO/TEMP/DIR -z -b ubm_gmm

In this example, we used the following configuration:

  • Energy-based VAD,
  • (19 MFCC features + Energy) + First and second derivatives,
  • UBM-GMM Modelling (with 256 Gaussians), the scoring is done using the linear approximation of the LLR.

The performance of the system on DEV and EVAL are:

  • DEV: EER = 2.00%
  • EVAL: HTER = 1.65%

Another example uses the ISV toolchain instead of UBM-GMM:

$ ./bin/ -d config/database/ -p config/preprocessing/ -f config/features/ -t config/tools/isv/  --user-directory PATH/TO/USER/DIR --temp-directory PATH/TO/TEMP/DIR  -z -b isv
  • DEV: EER = 1.41%
  • EVAL: HTER = 1.56%

Alternatively, the IVector toolchain can be used, where Whitening, L-Norm, LDA and WCCN are applied, as in this example where the score computation is done using the Cosine distance:

$ ./bin/ -d config/database/ -p config/preprocessing/ -f config/features/ -t config/tools/ivec/ --user-directory PATH/TO/USER/DIR --temp-directory PATH/TO/TEMP/DIR -z -b ivector_cosine
  • DEV: EER = 15.33%
  • EVAL: HTER = 15.78%
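The cosine scoring used in this i-vector example can be sketched as follows (illustrative; the enrolled and probe i-vectors here are toy values):

```python
import numpy as np

def cosine_score(model_ivector, probe_ivector):
    """Cosine-distance scoring for i-vectors: the score is the cosine of
    the angle between the enrolled model i-vector and the probe i-vector."""
    num = np.dot(model_ivector, probe_ivector)
    den = np.linalg.norm(model_ivector) * np.linalg.norm(probe_ivector)
    return num / den

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 0.0])
same = cosine_score(a, b)       # identical direction -> 1.0
different = cosine_score(a, c)  # orthogonal -> 0.0
```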

The scoring computation can also be done using PLDA:

$ ./bin/ -d config/database/ -p config/preprocessing/ -f config/features/ -t config/tools/ivec/ --user-directory PATH/TO/USER/DIR --temp-directory PATH/TO/TEMP/DIR -z -b ivector_plda
  • DEV: EER = 15.33%
  • EVAL: HTER = 16.93%

Note that in the previous examples, our goal is not to optimize the parameters on the DEV set but to provide examples of use.

BANCA dataset

BANCA is a simple bimodal database with relatively clean data. The results are already very good with a simple baseline UBM-GMM system. An example of use can be:

$ bin/ -d config/database/ -t config/tools/  -p config/preprocessing/ -f config/features/ --user-directory PATH/TO/USER/DIR --temp-directory PATH/TO/TEMP/DIR -z

The configuration in this example is similar to the previous one with the only difference of using the regular LLR instead of its linear approximation.

Here is the performance of this system:

  • DEV: EER = 1.66%
  • EVAL: EER = 0.69%

TIMIT dataset

TIMIT is one of the oldest databases (year 1993) used to evaluate speaker recognition systems. In the following example, the processing is done on the development set, and LFCC features are used:

$ ./bin/ -d config/database/ -t config/tools/ -p config/preprocessing/ -f config/features/ --user-directory PATH/TO/USER/DIR --temp-directory PATH/TO/TEMP/DIR -b lfcc -z --groups dev

Here is the performance of the system on the Development set:

  • DEV: EER = 2.68%

MOBIO dataset

This is a more challenging database. The noise and the short duration of the segments make the task of speaker recognition relatively difficult. The following experiment on the male group uses the 4Hz modulation energy based VAD, and the ISV (with dimU=50) modelling technique:

$ ./bin/ -d config/database/ -p config/preprocessing/ -f config/features/ -t config/tools/ --user-directory PATH/TO/USER/DIR --temp-directory PATH/TO/TEMP/DIR -z

Here is the performance of this system:

  • DEV: EER = 10.40%
  • EVAL: EER = 10.36%


NIST SRE 2012

We first invite you to read the paper describing our system submitted to the NIST SRE 2012 Evaluation. The protocols on the development set are the result of joint work by the I4U group. To reproduce the results, please check this dedicated package: