Annotated corpus of Serbian language-related news comments MetaLangNEWS-COMMENTS-Sr


A comprehensive corpus of user comments on online news articles about language, collected from major Serbian daily newspapers and news portals and published in the five-year period from January 1, 2015 to January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologies, language policy and planning, and contemporary debates on defining, naming, and standardising language, all from a bottom-up perspective. The corpus has been tagged using the CLASSLA-StanfordNLP models for morphosyntactic annotation and lemmatisation of non-standard Serbian. It is available in three versions: plain text, XML with full metadata, and tagged CoNLL-U format. This collection is complementary to the corpus of news articles MetaLangNEWS-Sr (http://hdl.handle.net/11356/1371). Parallel corpora for Croatia (http://hdl.handle.net/11356/1370) and Slovenia (http://hdl.handle.net/11356/1362) are also available.
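As a rough illustration of working with the tagged version, the sketch below reads CoNLL-U text using only the Python standard library. It relies on the general CoNLL-U layout (ten tab-separated columns per token, comment lines starting with `#`, blank lines between sentences). The function name `read_conllu` and the sample sentence are hypothetical and hand-written for this example; they are not taken from the corpus, and the actual files may carry additional comment lines or metadata.

```python
from __future__ import annotations

from typing import Iterator

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def read_conllu(text: str) -> Iterator[list[dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment (e.g. "# text = ..."); skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            # Skip malformed rows instead of guessing at their content.
            continue
        sentence.append(dict(zip(CONLLU_FIELDS, columns)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Hand-written sample (an assumption for illustration, not corpus data).
SAMPLE = (
    "# text = Jezik je živ.\n"
    "1\tJezik\tjezik\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "2\tje\tbiti\tAUX\t_\t_\t_\t_\t_\t_\n"
    "3\tživ\tživ\tADJ\t_\t_\t_\t_\t_\t_\n"
    "4\t.\t.\tPUNCT\t_\t_\t_\t_\t_\t_\n"
)

if __name__ == "__main__":
    for tokens in read_conllu(SAMPLE):
        print([(t["form"], t["lemma"], t["upos"]) for t in tokens])
```

For a real file, the same function could be applied to the decoded contents of one of the downloaded CoNLL-U files; a dedicated CoNLL-U library may be preferable when multiword tokens or dependency structure matter.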

Identifier
PID http://hdl.handle.net/11356/1372
Related Identifier https://ikss.zrc-sazu.si/en/programi-in-projekti/re-imagining-language-nation-and-collective-identity-in-the-21st-century#v
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1372
Provenance
Creator Bogetić, Ksenija; Batanović, Vuk
Publisher ZRC SAZU; Regional Linguistic Data Initiative Centre ReLDI
Publication Year 2020
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Serbian
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 3
Discipline Linguistics