Dataset - B2FIND

Training corpus ssj500k 2.2

The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation....

The Trankit model for linguistic processing of spoken and written Slovenian 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation...

The CLASSLA-StanfordNLP model for JOS dependency parsing of standard Slovenia...

The model for JOS dependency parsing of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

Training corpus ssj500k 1.4

The ssj500k training corpus contains 500,000 words, manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, named...

The CLASSLA-StanfordNLP model for UD dependency parsing of standard Bulgarian...

The model for UD dependency parsing of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

The CLASSLA-StanfordNLP model for UD dependency parsing of standard Slovenian

The model for UD dependency parsing of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

ReLDI tag+lemma+parse web service for WebLicht

WebLicht (https://weblicht.sfs.uni-tuebingen.de/) registry entry for a web service comprising tokenisation, PoS tagging, lemmatisation and dependency parsing. Tool source files...

Training corpus SUK 1.1

The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with...

Training corpus hr500k 1.0

The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and...

The CLASSLA-Stanza model for UD dependency parsing of standard Serbian 2.1

The model for UD dependency parsing of standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training...

The Trankit model for linguistic processing of standard Slovenian

This is a retrained Slovenian standard model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization,...

The CLASSLA-StanfordNLP model for UD dependency parsing of standard Croatian

The model for UD dependency parsing of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

Training corpus SUK 1.0

The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with...

The CLASSLA-Stanza model for UD dependency parsing of standard Slovenian 2.0

This model for UD dependency parsing of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus...

The Trankit model for linguistic processing of standard written Slovenian 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the reference SSJ...

Trankit model for linguistic processing of spoken Slovenian

This is a retrained Slovenian spoken language model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization,...

The CLASSLA-Stanza model for JOS dependency parsing of standard Slovenian 2.0

This model for JOS dependency parsing of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus...

Trankit model for SST 2.15 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank...

The CLASSLA-Stanza model for UD dependency parsing of standard Croatian 2.1

The model for UD dependency parsing of standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the UD-parsed portion of the...

The Trankit model for linguistic processing of written and spoken Slovenian 1.2

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation...

45 datasets found