Dataset - B2FIND

SemKpv - Semantic Database for Komi-Zyrian

This SQLite database contains Komi-Zyrian lemmas and their frequencies in a large corpus. The lemmas are linked to each other based on the syntactic relations they have had in the...

Finnish Semantic Relatedness Model

This semantic model captures the relatedness of Finnish words as word vectors. It can be used in various tasks such as metaphor interpretation. For...

SemMyv - Semantic Database for Erzya

This SQLite database contains Erzya lemmas and their frequencies in a large corpus. The lemmas are linked to each other based on the syntactic relations they have had in the...

SemSms - Semantic Database for Skolt Sami

This SQLite database contains Skolt Sami lemmas and their frequencies in a large corpus. The lemmas are linked to each other based on the syntactic relations they have had in the...

VALLEX 3.0

VALLEX 3.0 provides information on the valency structure (combinatorial potential) of verbs in their particular senses, which are characterized by glosses and examples. VALLEX...

Prague Dependency Treebank 3.5

The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied...

NomVallex I.

The NomVallex I. lexicon describes valency of Czech deverbal nouns belonging to three semantic classes, i.e. Communication (dotaz 'question'), Mental Action (plán 'plan') and...

PDT-Vallex: Czech Valency lexicon linked to treebanks

The valency lexicon PDT-Vallex has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague...

EngVallex - English Valency Lexicon 2.0

EngVallex 2.0 is a slightly updated version of EngVallex. It is the English counterpart of the PDT-Vallex valency lexicon, using the same view of valency, valency frames and the...

Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0)

The Prague Dependency Treebank of Spoken Czech 2.0 (PDTSC 2.0) is a corpus of spoken language, consisting of 742,316 tokens and 73,835 sentences, representing 7,324 minutes...

NomVallex 2.0

NomVallex 2.0 is a manually annotated valency lexicon of Czech nouns and adjectives, created in the theoretical framework of the Functional Generative Description and based on...

SIMPLE-LOD

This resource contains the "LODification" of the names contained in the PAROLE SIMPLE CLIPS (PSC) Italian lexicon (http://hdl.handle.net/20.500.11752/ILC-88). The resource is...

LexicO

LexicO is a resource deriving from Parole-Simple-Clips (http://hdl.handle.net/20.500.11752/ILC-88). This resource contains all four levels of linguistic information represented...

A syntax/semantic confusion amidst our increasing science data-to-knowledge s...

Computers have increased the binding between science data and knowledge. However, what current computers can and can’t do is often unclear. This distinction can be clarified by...
