ParlaMintCAT corpus

Parliamentary speeches are of interest to several research areas because they are publicly available transcriptions, produced under controlled and regulated procedures that supply reliable sociodemographic data about the speakers, such as gender and age. Moreover, the speeches cover a wide range of topics and domains, and they are public domain data, not subject to copyright restrictions. The project "ParlaMint: Towards Comparable Parliamentary Corpora" is developing a comparable and uniformly annotated multilingual corpus with data from 33 different parliaments in Europe. This paper describes the building of the ParlaMintCAT corpus, for which the transcriptions of the Catalan Parliament General Assembly sessions from 2015 to 2022 have been compiled, processed and annotated.

Identifier
DOI https://doi.org/10.34810/data1137
Metadata Access https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data1137
Provenance
Creator Pisani, Marilina; Zevallos Salazar, Rodolfo; Bel, Núria
Publisher CORA.Repositori de Dades de Recerca
Contributor Bel, Núria; Universitat Pompeu Fabra
Publication Year 2024
Rights CC BY 4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact Bel, Núria (Universitat Pompeu Fabra)
Representation
Resource Type Textual data; Dataset
Format application/x-compressed; text/plain
Size 70039316 bytes; 2688 bytes
Version 2.0
Discipline Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Humanities; Life Sciences; Linguistics; Social Sciences; Social and Behavioural Sciences; Soil Sciences
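
The Metadata Access link listed under Identifier is an OAI-PMH GetRecord request. As a minimal sketch, the URL can be rebuilt from the DOI as shown here; the function name `build_getrecord_url` and the constant `OAI_ENDPOINT` are illustrative choices, not part of the record, and the sketch only constructs the URL without contacting the repository.

```python
from urllib.parse import urlencode

# Base endpoint copied from the Metadata Access field of this record.
OAI_ENDPOINT = "https://dataverse.csuc.cat/oai"


def build_getrecord_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the DOI stays readable, as in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = build_getrecord_url("10.34810/data1137")
print(url)
```

Whether the repository also serves other metadata prefixes is not stated in this record; `oai_datacite` is the only one shown here.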