Dataset - B2FIND

tweeDe

A German UD Twitter treebank, with >12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework

Open SDP

The original SDP 2014 and 2015 data collections were made available under task-specific ‘evaluation’ licenses to registered SemEval participants. In mid-2016, all original data...

MSTperl delexicalized parser transfer scripts and configuration files

This is a set of MSTperl parser configuration files and scripts for delexicalized parser transfer. They were used in the work reported in arXiv:1506.04897...

Slavic Forest, Norwegian Wood (models)

Trained models for UDPipe used to produce our final submission to the Vardial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on...

CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies This package contains the test data in the form in which they ware presented to...

Open SDP 1.2

The original SDP 2014 and 2015 data collections were made available under task-specific ‘evaluation’ licenses to registered SemEval participants. In mid-2016, all original data...

MSTperl parser (2015-05-19)

MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). MST parser (Maximum Spanning Tree parser)...

Depfix: Automatic Post-editing of SMT

Depfix, a tool for Automatic Post-editing of SMT. See the project website for more information.

Slavic Forest, Norwegian Wood (scripts)

Tools and scripts used to create the cross-lingual parsing models submitted to VarDial 2017 shared task (https://bitbucket.org/hy-crossNLP/vardial2017), as described in the...

LiStr: Linguistic Structure Induction Tookit

This toolkit comprises the tools and supporting scripts for unsupervised induction of dependency trees from raw texts or texts with already assigned part-of-speech tags. There...

IWPT 2021 Shared Task Data and System Outputs

This package contains data used in the IWPT 2021 shared task. It contains training, development and test (evaluation) datasets. The data is based on a subset of Universal...

IWPT 2020 Shared Task Data and System Outputs

This package contains data used in the IWPT 2020 shared task. It contains training, development and test (evaluation) datasets. The data is based on a subset of Universal...

MSTperl parser

MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). MST parser (Maximum Spanning Tree parser)...

tweeDe

A German UD Twitter treebank, with >12,000 tokens from 519 tweets, annotated in the Universal Dependencies framework

Croatian linguistic training corpus hr500k 2.0

The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and...

Training corpus ssj500k 1.3

The ssj500k training corpus is based on two training corpora built within the JOS project (https://nl.ijs.si/jos/). It contains the jos100k corpus and additional material from...

Serbian linguistic training corpus SETimes.SR 2.0

The SETimes.SR training corpus contains around 100,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation,...

The CLASSLA-StanfordNLP model for UD dependency parsing of standard Serbian

The model for UD dependency parsing of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the...

Training corpus ssj500k 2.0

The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation....

Training corpus ssj500k 2.1

The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation....

45 datasets found