Two Quaero corpora now available in the Linguistic Resources catalogue of ELRA
Two corpora from the Quaero program, a newspaper corpus and a TV broadcast news corpus, both annotated with structured named entities, are now available (free of charge for academic research):
- ELRA-W0073 Quaero Old Press Extended Named Entity Corpus
The Quaero Old Press Extended Named Entity Corpus consists of the manual annotation of 76 newspaper issues, published between 1890 and 1891 and provided by the National Library of France. Three newspaper titles are used (Le Temps, La Croix and Le Figaro), for a total of 295 pages.
The corpus is fully manually annotated according to the Quaero extended and structured named entity definition, which differentiates entity "types" and "components".
More information on this corpus: http://catalog.elra.info/product_info.php?products_id=1194
- ELRA-S0349 Quaero Broadcast News Extended Named Entity Corpus
The Quaero Broadcast News Extended Named Entity corpus consists of the
manual annotation of (i) the ESTER 2 corpus (see ELRA-S0338) and (ii)
the Quaero Speech Recognition Evaluation corpus (manual and automatic
transcriptions from three different ASR systems).
The corpus is fully manually annotated according to the Quaero extended
and structured named entity definition, which differentiates entity
"types" and "components".
More information on this corpus: http://catalog.elra.info/product_info.php?products_id=1195
These two corpora are described in:
S. Rosset, C. Grouin, K. Fort, O. Galibert, J. Kahn, P. Zweigenbaum. "Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers". In Proc. of LAW VI, 2012.
For more information on the catalogue, please contact Valérie Mapelli (email@example.com).
Online ELRA Catalogue: http://catalog.elra.info/index.php?language=en
Universal ELRA Catalogue: http://universal.elra.info
Archives of updates made to the ELRA Linguistic Resources Catalogue: http://www.elra.info/LRs-Announcements.html