IRINTS: Irisa News Topic Segmenter

Technicolor

Topic segmentation of automatic speech transcripts

Target users and customers

IRINTS targets multimedia industry players and any content or service provider that handles speech data.

Application sectors

  • Spoken document processing

Description:

IRINTS (Irisa News Topic Segmenter) was designed for topic segmentation of broadcast news transcripts.

  • The distribution includes a front-end script, ‘irints’, which is a thin wrapper around the main ‘topic-segmenter’ program, also included in the distribution (topic-segmenter, release 1.1 [1]).
  • The topic-segmenter program is dedicated to topic segmentation of texts and automatic transcripts. It relies mostly on lexical cohesion and implements (and extends) the method described in [2].
  • Several extensions were added, such as the use of alternative knowledge sources. For more details (in French), please refer to [3].
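
To illustrate the lexical-cohesion model of [2], here is a minimal Python sketch of Utiyama and Isahara's cost-based segmentation. It is not the topic-segmenter code and does not model the IRINTS extensions. Each segment's words are scored with Laplace-smoothed in-segment frequencies, (f + 1) / (n_i + k), where k is the vocabulary size. A prior n^-m penalizes the number of segments m, and dynamic programming finds the minimum-cost segmentation. The function names `segment_cost` and `segment` are hypothetical. Candidate boundaries are assumed to fall only between "units" (e.g. sentences or breath groups). Tokens are assumed to be already pre-processed (e.g. lemmatized, stop words removed).

```python
import math
from collections import Counter


def segment_cost(words, vocab_size, total_words):
    """Cost of one candidate segment under the Utiyama-Isahara model:
    -log P(W_i | S_i) plus this segment's log(n) share of the prior P(S) = n^-m."""
    n_i = len(words)
    cost = 0.0
    for f in Counter(words).values():
        # each of the f occurrences of a word contributes log((n_i + k) / (f + 1))
        cost += f * math.log((n_i + vocab_size) / (f + 1))
    return cost + math.log(total_words)


def segment(units):
    """Minimum-cost segmentation of `units` (hypothetical input: lists of tokens).

    Returns the exclusive end index of each segment; the last one is len(units).
    """
    m = len(units)
    all_words = [w for u in units for w in u]
    n, k = len(all_words), len(set(all_words))
    if n == 0:
        return [m] if m else []
    best = [math.inf] * (m + 1)  # best[j]: lowest cost for units[0:j]
    back = [0] * (m + 1)         # back[j]: start of the last segment in that solution
    best[0] = 0.0
    for j in range(1, m + 1):
        for i in range(j):
            seg = [w for u in units[i:j] for w in u]
            cost = best[i] + segment_cost(seg, k, n)
            if cost < best[j]:
                best[j], back[j] = cost, i
    # follow back-pointers from the end to recover the segment boundaries
    ends, j = [], m
    while j > 0:
        ends.append(j)
        j = back[j]
    return ends[::-1]


if __name__ == "__main__":
    units = [["stock", "market", "shares"], ["market", "shares", "stock"],
             ["football", "goal", "match"], ["goal", "match", "football"]]
    print(segment(units))  # [2, 4]: units 0-1 form one topic, units 2-3 another
```

This brute-force version re-counts words for every candidate segment, costing O(m²·n). A practical implementation would update counts incrementally or cap the segment length.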

As shown in Figure 1 below, the input to IRINTS is an automatic transcript (in Vecsys’s VOX format or IRISA’s SSD format). The output is an XML file in SSD format that specifies the topic segments.

[1] topic-segmenter project page, http://gforge.inria.fr/projects/topic-segmenter/
[2] M. Utiyama and H. Isahara, “A Statistical Model for Domain-Independent Text Segmentation”, in Proc. ACL, pp. 491–498, 2001.
[3] S. Huet, G. Gravier and P. Sébillot, “Un modèle multisources pour la segmentation en sujets de journaux radiophoniques” [A multi-source model for topic segmentation of radio broadcast news], in Proc. Traitement Automatique des Langues Naturelles, 2008.

IRINTS was developed at Irisa in Rennes by the Texmex and Metiss teams.
The IRINTS authors are Guillaume Gravier and Camille Guinaudeau.

Technical requirements:

  • PC running a Unix/Linux OS
  • A C compiler, Perl [1], the libxml2 library [2] and the TreeTagger software [3] must be installed on the system

[1] http://www.perl.org/
[2] http://xmlsoft.org/
[3] http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Conditions for access and use:

IRINTS was developed at Irisa in Rennes and is the property of CNRS (DI 03033-01) and Inria. Registration with the French Agency for Program Protection (APP) is currently in progress.
Licenses are available on request, on a case-by-case basis.


Partners:

  • Inria

Contact details:

General issues:
Patrick GROS
patrick.gros@irisa.fr

Technical issues:
Sebastien CAMPION
scampion@irisa.fr

IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex
France

http://www.irisa.fr/