- Character-level part-of-speech tagger of Slovene language
A part-of-speech tagger for the Slovene language, implemented using convolutional and LSTM neural networks. The tagger uses a character-level representation of sentences. The tagger has been...
- xLiMe Twitter Corpus XTC 1.0.1
The xLiMe Twitter Corpus contains tweets in German, Italian and Spanish manually annotated with part-of-speech, named entities, and message-level sentiment polarity. In total,...
- Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.1
ReLDI-NormTagNER-sr 2.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...
- Training corpus jos1M 1.2
The jos1M corpus contains 1 million words of sampled paragraphs from the Gigafida corpus. It is meant to serve as a training corpus for word-level tagging of Slovene. This...
- MULTEXT-East "1984" annotated corpus 4.0
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel and sentence aligned corpus contains the novel in the English original...
- Training corpus SUK 1.1
The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with...
- CMC training corpus Janes-Tag 2.1
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
- Training corpus hr500k 1.0
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and...
- The CLASSLA-StanfordNLP model for morphosyntactic annotation of non-standard ...
This model for morphosyntactic annotation of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on...
- The CLASSLA-Stanza model for morphosyntactic annotation of non-standard Serbi...
This model for morphosyntactic annotation of non-standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR...
- The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slov...
This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...
- The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Croa...
The model for morphosyntactic annotation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...
- Word embeddings CLARIN.SI-embed.hr 1.0
CLARIN.SI-embed.hr contains word embeddings induced from a large collection of Croatian texts composed of the Croatian web corpus hrWaC and a 400-million-token-heavy collection...
- Annotated Corpus of Pre-Standardized Balkan Slavic Literature 1.1
The corpus contains 23 linguistically annotated samples of "damaskini" and other Balkan Slavic manuscripts and print editions from the 15th-19th century, together with over 50...
- The CLASSLA-Stanza model for morphosyntactic annotation of non-standard Croat...
This model for morphosyntactic annotation of non-standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k...
- The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Serb...
The model for morphosyntactic annotation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...
- The Trankit model for linguistic processing of standard Slovenian
This is a retrained Slovenian standard model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization,...
- The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Mace...
This model for morphosyntactic annotation of standard Macedonian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...
- The CLASSLA-Stanza model for morphosyntactic annotation of standard Serbian 2.1
The model for morphosyntactic annotation of standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training...
- The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slov...
This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...