The module automatically segments and clusters an input audio stream
according to speaker identity, using acoustic cues.
Typical applications are multimedia document indexing and archiving services.
Speaker diarization is the process of partitioning an input audio stream
into homogeneous segments according to their speaker identity. This partitioning is a useful preprocessing step for an automatic speech
transcription system, but it can also improve the readability of the
transcription by structuring the audio stream into speaker turns.
One of the major issues is that the number of speakers in the audio
stream is generally unknown a priori and needs to be automatically
determined.
Given samples of known speakers’ voices, speaker verification techniques
can then be applied to provide clusters of identified speakers.
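
As an illustration of structuring the audio stream into speaker turns, the following minimal sketch merges consecutive segments that carry the same cluster label. The `Segment` type, the `to_speaker_turns` function, the `max_gap` parameter and the labels `S1`/`S2` are hypothetical names chosen for this example; they are not part of the LIMSI module's interface.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One homogeneous segment (hypothetical structure for this sketch)."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # anonymous cluster label, e.g. "S1"


def to_speaker_turns(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Merge consecutive same-speaker segments into speaker turns.

    Two segments are merged when they carry the same label and the
    silence between them is at most `max_gap` seconds (an assumed value).
    """
    turns: List[Segment] = []
    for seg in sorted(segments):  # sorts by start time first
        last = turns[-1] if turns else None
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            # Extend the current turn instead of opening a new one.
            turns[-1] = last._replace(end=max(last.end, seg.end))
        else:
            turns.append(seg)
    return turns


segments = [
    Segment(0.0, 2.0, "S1"),
    Segment(2.0, 3.5, "S1"),
    Segment(3.5, 6.0, "S2"),
    Segment(6.2, 7.0, "S2"),
    Segment(9.0, 10.0, "S2"),
    Segment(10.0, 11.0, "S1"),
]
turns = to_speaker_turns(segments)
speakers = sorted({t.speaker for t in turns})
```

The labels are anonymous: the number of speakers is simply the number of distinct labels the clustering produced, and it is not known before the audio is processed.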
The LIMSI multi-stage speaker diarization system combines an
agglomerative clustering based on Bayesian information criterion (BIC)
with a second clustering stage using speaker identification (SID)
techniques with more complex models.
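
The first stage, agglomerative clustering driven by BIC, can be sketched as follows. This is a simplified illustration and not the LIMSI implementation: it assumes one diagonal-covariance Gaussian per cluster, a penalty computed from the size of the two clusters being compared, and a stop as soon as no merge lowers the BIC. All function names, the `penalty_weight` parameter and the toy features are assumptions made for this example. The second, SID-based stage is not shown.

```python
import math
from itertools import combinations


def log_det_diag(frames, floor=1e-6):
    """Log-determinant of the diagonal covariance estimated on `frames`."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for d in range(dim):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, floor))  # floor avoids log(0)
    return total


def delta_bic(a, b, penalty_weight=1.0):
    """BIC change when clusters a and b are merged; negative favours merging."""
    n_a, n_b = len(a), len(b)
    n, dim = n_a + n_b, len(a[0])
    gain = 0.5 * (n * log_det_diag(a + b)
                  - n_a * log_det_diag(a)
                  - n_b * log_det_diag(b))
    n_params = 2 * dim  # one mean and one variance per dimension
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    return gain - penalty


def bic_cluster(segments, penalty_weight=1.0):
    """Greedily merge the closest pair of clusters while delta BIC < 0."""
    clusters = [list(seg) for seg in segments]
    while len(clusters) > 1:
        (i, j), score = min(
            (((i, j), delta_bic(clusters[i], clusters[j], penalty_weight))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if score >= 0:  # no remaining merge improves the criterion
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


def toy_segment(center, n_frames, shift):
    """Deterministic 2-D toy features around `center` (illustration only)."""
    return [[center + ((k * 7 + shift) % 11 - 5) / 5.0,
             -center + ((k * 3 + shift) % 7 - 3) / 3.0]
            for k in range(n_frames)]


# Four segments from two well-separated synthetic "speakers".
toy_segments = [toy_segment(0.0, 55, 0), toy_segment(4.0, 55, 1),
                toy_segment(0.0, 55, 3), toy_segment(4.0, 55, 5)]
toy_clusters = bic_cluster(toy_segments)
```

Because merging stops when the criterion no longer improves, the number of clusters, and hence of speakers, is an output of the procedure rather than an input. `penalty_weight` controls how readily clusters are merged.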
This system participated in several evaluations of acoustic speaker
diarization: on US English Broadcast News for the NIST Rich Transcription
2004 Fall evaluation (NIST RT’04F), and on French broadcast radio and TV
news and conversations for the ESTER-1 and ESTER-2 evaluation campaigns,
achieving state-of-the-art performance.
Within the QUAERO program, LIMSI is developing improved speaker
diarization and speaker tracking systems for broadcast news but also for
more interactive data like talk shows.
This technology is a building block of the system submitted by QUAERO partners to the REPERE challenge on multimodal person identification.
A standard PC running the Linux operating system.
The technology developed at LIMSI-CNRS is available for licensing on a case-by-case basis.