SAMuSA: Speech And Music Segmenter and Annotator

Target users and customers

The target users and customers are multimedia industry players and any academic or industrial laboratory interested in audio document processing.

Application sectors

  • Audio and multimedia document processing


As shown in the figure below, the SAMuSA module takes an audio file or stream as input and returns a text file listing the detected segments of speech, music, and silence.
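
To illustrate how such a text output might be consumed downstream, here is a minimal Python sketch. The actual SAMuSA output format is not specified here, so the layout below (one segment per line, written as `start end label` with times in seconds) is purely an assumption, and the names `Segment`, `parse_segments`, and `total_duration_by_label` are hypothetical helpers, not part of SAMuSA.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds (assumed unit)
    end: float    # segment end time in seconds (assumed unit)
    label: str    # "speech", "music" or "silence"

    @property
    def duration(self) -> float:
        return self.end - self.start


def parse_segments(text):
    """Parse lines of the ASSUMED form 'start end label' into Segment objects."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        start, end, label = line.split()
        segments.append(Segment(float(start), float(end), label))
    return segments


def total_duration_by_label(segments):
    """Sum the duration of all segments for each label."""
    totals = {}
    for seg in segments:
        totals[seg.label] = totals.get(seg.label, 0.0) + seg.duration
    return totals


# Made-up example content; real SAMuSA output may look different.
example = """\
0.00 12.50 speech
12.50 45.00 music
45.00 46.25 silence
46.25 60.00 speech
"""

segments = parse_segments(example)
totals = total_duration_by_label(segments)
for label, seconds in sorted(totals.items()):
    print(f"{label}: {seconds:.2f} s")
```

If the real file uses another layout (for instance different column order or time units), only `parse_segments` would need to change.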

To perform segmentation, SAMuSA uses audio class models as external resources. It also calls external tools for audio feature extraction (Spro software [1]) and for audio segmentation and classification (Audioseg software [2]). These tools are included in the SAMuSA package.

Trained on hours of varied TV and radio programs, the module gives accurate results: 95% of speech and 90% of music are correctly detected.

One hour of audio can be processed in approximately one minute on a standard computer.
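
As a rough planning aid, the stated rate (about one minute of processing per hour of audio) corresponds to roughly 60 times faster than real time. The sketch below simply applies that ratio; the function names are hypothetical, and actual throughput will depend on the hardware and the audio material.

```python
def estimated_processing_minutes(audio_hours, minutes_per_audio_hour=1.0):
    """Estimate processing time, assuming the approximate rate quoted above."""
    return audio_hours * minutes_per_audio_hour


def real_time_factor(minutes_per_audio_hour=1.0):
    """How many times faster than real time the processing runs."""
    return 60.0 / minutes_per_audio_hour


# Example: a 24-hour radio recording at the quoted rate.
print(estimated_processing_minutes(24))  # minutes of processing
print(real_time_factor())                # speed-up relative to real time
```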


SAMuSA was developed at Irisa/INRIA Rennes by the Metiss team.

  • The SAMuSA authors are: Frédéric Bimbot, Guillaume Gravier, Olivier Le Blouch.
  • The Spro author is: Guillaume Gravier
  • The Audioseg authors are: Mathieu Ben, Michaël Betser, Guillaume Gravier

Technical requirements:

  • PC with Unix/Linux OS

Conditions for access and use:

SAMuSA is software developed at Irisa in Rennes and is the property of CNRS and Inria.
SAMuSA is currently available as a prototype only. It can be released and supplied under license on a case-by-case basis.



  • Inria

Contact details:

General issues:
Patrick GROS

Technical issues:
Sebastien CAMPION

IRISA/Metiss Team
Campus de Beaulieu
35042 Rennes Cedex