Dataset - B2FIND

Training corpus SUK 1.0

The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with...

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slov...

This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

The CLASSLA-Stanza model for morphosyntactic annotation of non-standard Slove...

This model for morphosyntactic annotation of non-standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training...

The CLASSLA-StanfordNLP model for morphosyntactic annotation of non-standard ...

This model for morphosyntactic annotation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on...

The Trankit model for linguistic processing of standard written Slovenian 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the reference SSJ...

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Serbian

The model for morphosyntactic annotation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

Serbian Twitter training corpus ReLDI-NormTagNER-sr 3.0

ReLDI-NormTagNER-sr 3.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Bulg...

This model for morphosyntactic annotation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Croa...

The model for morphosyntactic annotation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Mace...

This model for morphosyntactic annotation of standard Macedonian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

Trankit model for linguistic processing of spoken Slovenian

This is a retrained Slovenian spoken language model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization,...

Trankit model for SST 2.15 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank...

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Bulg...

This model for morphosyntactic annotation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

The Trankit model for linguistic processing of written and spoken Slovenian 1.2

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation...

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slov...

The model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

Trankit model for SST 2.15

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank...

Training corpus ssj500k 2.3

The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation....

The CLASSLA-StanfordNLP model for morphosyntactic annotation of non-standard ...

This model for morphosyntactic annotation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on...

Training corpus SETimes.SR 1.0

The SETimes.SR training corpus contains 86,726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic...

MorphoDiTa-based tagger for Polish language

A MorphoDiTa-based tagger for the Polish language. It is a tool for morphosyntactic unification according to the NKJP tagset.

64 datasets found