ParlaMintCAT corpus

Dataset

Parliamentary speeches are of interest to several research areas because they are publicly available transcriptions produced under controlled, regulated procedures and accompanied by reliable sociodemographic data about the speakers, such as gender and age. Moreover, the speeches cover a wide range of topics and domains, and they are public domain data, not subject to copyright restrictions. The ParlaMint project (ParlaMint: Towards Comparable Parliamentary Corpora) is developing a comparable, uniformly annotated multilingual corpus with data from 33 different parliaments in Europe. This paper describes how the ParlaMintCAT corpus was built: the transcriptions of the Catalan Parliament General Assembly sessions from 2015 to 2022 were compiled, processed and annotated.

Identifier
DOI	https://doi.org/10.34810/data1137
Metadata Access	https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data1137
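
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be rebuilt from the DOI and the record downloaded with Python's standard library. The function names below are hypothetical and chosen for illustration only; the endpoint, verb, metadata prefix and DOI are taken from this record.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the Metadata Access field of this record.
OAI_ENDPOINT = "https://dataverse.csuc.cat/oai"


def oai_getrecord_url(doi: str, prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": prefix,
            "identifier": f"doi:{doi}",
        },
        safe=":/",  # keep ':' and '/' unescaped, as in the link above
    )
    return f"{OAI_ENDPOINT}?{query}"


def fetch_record(doi: str) -> bytes:
    """Download the raw XML metadata record (needs network access)."""
    with urlopen(oai_getrecord_url(doi)) as response:
        return response.read()


if __name__ == "__main__":
    print(oai_getrecord_url("10.34810/data1137"))
```

Calling fetch_record("10.34810/data1137") should return the metadata as XML bytes, assuming the repository still serves the record at this address.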

Provenance
Creator	Pisani, Marilina; Zevallos Salazar, Rodolfo; Bel, Núria
Publisher	CORA.Repositori de Dades de Recerca
Contributor	Bel, Núria; Universitat Pompeu Fabra
Publication Year	2024
Rights	CC BY 4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess	true
Contact	Bel, Núria (Universitat Pompeu Fabra)

Representation
Resource Type	Textual data; Dataset
Format	application/x-compressed; text/plain
Size (bytes)	70039316; 2688
Version	2.0
Discipline	Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Humanities; Life Sciences; Linguistics; Social Sciences; Social and Behavioural Sciences; Soil Sciences