- The CLASSLA-Stanza model for lemmatisation of non-standard Croatian 2.1
The model for lemmatisation of non-standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus...
- Croatian Twitter training corpus ReLDI-NormTagNER-hr 2.0
ReLDI-NormTagNER-hr 2.0 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...
- The CLASSLA-StanfordNLP model for lemmatisation of non-standard Serbian 1.1
The model for lemmatisation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR...
- The CLASSLA-StanfordNLP model for named entity recognition of non-standard Cr...
This model for named entity recognition of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...
- CMC training corpus Janes-Syn 1.0
Janes-Syn is a syntactically annotated corpus of Slovene tweets and is meant as a gold-standard training and testing dataset for syntactic annotation of Slovene...
- The CLASSLA-Stanza model for lemmatisation of non-standard Serbian 2.1
The model for lemmatisation of non-standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus...
- Tweet comma corpus Janes-Vejica 1.0
Janes-Vejica is a corpus of Slovene tweets where commas are annotated with the reason for their (in)correct use, according to the supplied typology. The corpus was sampled from...
- CMC training corpus Janes-Tag 2.0
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
- The CLASSLA-StanfordNLP model for named entity recognition of non-standard Sl...
This model for named entity recognition of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on...
- Annotated corpus of Slovenian language-related news comments MetaLangNEWS-COM...
A comprehensive corpus of user comments on online news articles on the topic of language from major Slovenian daily newspapers and news portals, published in the five-year...
- Dataset of normalised Slovene text KonvNormSl 1.0
Data used in the experiments described in: Nikola Ljubešić, Katja Zupan, Darja Fišer and Tomaž Erjavec: Normalising Slovene data: historical texts vs. user-generated content....
- Wikipedia talk corpus Janes-Wiki 1.0
Janes-Wiki is an annotated corpus of discussion pages from the Slovene Wikipedia from the period 2003-08 to 2017-06. The corpus contains page and user talks and is structured...
- CMC training corpus Janes-Norm 1.0
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation,...
- CMC training corpus Janes-Tag 1.0
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
- Serbian Twitter training corpus ReLDI-NormTag-sr 1.0
ReLDI-NormTag-sr 1.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word...
- Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.0
ReLDI-NormTagNER-sr 2.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...
- Croatian Twitter training corpus ReLDI-NormTagNER-hr 3.0
ReLDI-NormTagNER-hr 3.0 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...
- Use of Social Media and Online Applications in Science 2014
Use of social media and online-based applications in the context of scientific activities at German universities and research institutions. Topics: assessment of activities and...
- Use of Social Media and Online Applications in Science 2013
Use of social media and online-based applications in the context of scientific activities at German universities and research institutions. Topics: assessment of activities and...
- Replication data for: Slangs go online, or the rise and fall of the Olbanian ...
All the data were taken from the website udaff.com (the center of the padonki culture and one of the cradles of the Olbanian language), from the section kreativy ('creative...