Dataset - B2FIND

Universal Dependencies 2.0

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Czech Legal Text Treebank

The Czech Legal Text Treebank (CLTT) is a collection of 1133 manually annotated dependency trees. CLTT consists of two legal documents: The Accounting Act (563/1991 Coll., as...

Universal Dependencies 2.8.1

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.12

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.1

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

PML Tree Query

System for querying annotated treebanks in PML format. The querying uses it own query language with graphical representation. It has two different implementations (SQL and Perl)...

Czech Legal Text Treebank 2.0

The Czech Legal Text Treebank 2.0 (CLTT 2.0) annotates the same texts as the CLTT 1.0. These texts come from the legal domain and they are manually syntactically annotated. The...

Czech-English Parallel Corpus 1.0 (CzEng 1.0)

CzEng 1.0 is the fourth release of a sentence-parallel Czech-English corpus compiled at the Institute of Formal and Applied Linguistics (ÚFAL) freely available for...

IWPT 2020 Shared Task Data and System Outputs

This package contains data used in the IWPT 2020 shared task. It contains training, development and test (evaluation) datasets. The data is based on a subset of Universal...

Prague Dependency Treebank 3.5

The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied...

Deep Sequoia corpus - PARSEME-FR corpus - FrSemCor

The Sequoia corpus is a set of 3,099 linguistically-annotated French sentences, originating from four sources (Europarl, European Agency Reports, French regional journal L'Est...

Prague Dependency Treebank 2.5

The Prague Dependency Treebank 2.5 annotates the same texts as the PDT 2.0. The annotation on the original four layers was fixed or improved in various aspects (see...

CoNLL 2017 Shared Task - UDPipe Baseline Models and Supplementary Materials

Baseline UDPipe models for CoNLL 2017 Shared Task in UD Parsing, and supplementary material. The models require UDPipe version at least 1.1 and are evaluated using the official...

Slovak Dependency Treebank

Slovak Dependency Treebank (Slovenský závislostný korpus) was created as part of the Slovak National Corpus at the Ľ. Štúr Institute of the Slovak Academy of Sciences. The...

Universal Dependencies 2.10

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)

A richly annotated and genre-diversified language resource, The Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, or PDT-C in short in the sequel) is a consolidated...

Prague Discourse Treebank 2.0

PDiT 2.0 is a new version of the Prague Discourse Treebank. It contains a complex annotation of discourse phenomena enriched by the annotation of secondary connectives.

Coreference in Universal Dependencies 1.1 (CorefUD 1.1)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Universal Dependencies 2.3

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

FAUST 0.5

Syntactic (including deep-syntactic - tectogrammatical) annotation of user-generated noisy sentences. The annotation was made on Czech-English and English-Czech Faust Dev/Test...

56 datasets found