Dataset - B2FIND

tweeDe

A German UD Twitter treebank, with >12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework

ENIAMtoolkit (2017-03-06)

ENIAMtoolkit is a collection of libraries that: - perform tokenization, lemmatization, part of speech tagging; - detect MWE and abbreviations; - split text into sentences; - LCG...

The CLASSLA-Stanza model for UD dependency parsing of standard Bulgarian 2.1

The model for UD dependency parsing of standard Bulgarian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the UD-parsed portion of...

Training corpus SETimes.SR 1.0

The SETimes.SR training corpus contains 86 726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic...

Training corpus ssj500k 2.3

The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation....

Trankit model for SST 2.15

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank...

The Trankit model for linguistic processing of written and spoken Slovenian 1.2

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation...

The CLASSLA-Stanza model for UD dependency parsing of standard Croatian 2.1

The model for UD dependency parsing of standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the UD-parsed portion of the...

Trankit model for SST 2.15 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank...

The CLASSLA-Stanza model for JOS dependency parsing of standard Slovenian 2.0

This model for JOS dependency parsing of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus...

Trankit model for linguistic processing of spoken Slovenian

This is a retrained Slovenian spoken language model for Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization,...

The Trankit model for linguistic process of standard written Slovenian 1.1

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the reference SSJ...

The CLASSLA-Stanza model for UD dependency parsing of standard Slovenian 2.0

This model for UD dependency parsing of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus...

Training corpus SUK 1.0

The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with...

The CLASSLA-StanfordNLP model for UD dependency parsing of standard Croatian

The model for UD dependency parsing of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

The Trankit model for linguistic processing of standard Slovenian

This is a retrained Slovenian standard model for Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization,...

The CLASSLA-Stanza model for UD dependency parsing of standard Serbian 2.1

The model for UD dependency parsing of standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training...

Training corpus hr500k 1.0

The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and...

Training corpus SUK 1.1

The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with...

ReLDI tag+lemma+parse web service for WebLicht

WebLicht (https://weblicht.sfs.uni-tuebingen.de/) registry entry for webservice comprising tokenisation, PoS tagging, lemmatisation and dependency parsing. Tool source files...

45 datasets found