2,651 datasets found

  • Manual Re-evaluation of Translation Quality of WMT 2018 English-Czech systems

    This data set contains four types of manual annotation of translation quality, focusing on the comparison of human and machine translation quality (aka human-parity). The...
  • Large-Scale Colloquial Persian 0.5

    "Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in asemantic taxonomy that focuses on multi-task informal Persian language understanding as a...
  • LatinISE corpus (version 4)

    The LatinISE corpus is a text corpus collected from the LacusCurtius, Intratext and Musisque Deoque websites. Corpus texts have rich metadata containing information as genre,...
  • SLäNDa

    SLäNDa, the Swedish literature corpus of narrative and dialogue, is a corpus made up of eight Swedish literary novels from the late 19th and early 20th centuries, manually...
  • Digital Humanities Courses at Czech Colleges 2017/2018

    Titles of courses possibly relevant to the Digital Humanities for 2017-2018, manually gathered from course catalogues of most Czech state colleges, including the names of the...
  • Czech image captioning, machine translation, sentiment analysis and summariza...

    This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving four NLP tasks: machine translation, image captioning, sentiment...
  • Database of speech corpora of Czech laryngectomy patients

    The corpus contains Czech speech of laryngectomy patients recorded before a surgery causing their voice to be lost in order to preserve the voice which can be later used for...
  • VIADAT (2019-12-31)

    This component integrates other VIADAT modules; together with VIADAT-REPO this composes the Virtual Assistant for accessing historical audiovisual data. The zip archive contains...
  • VIADAT-STAT (2019-12-31)

    A VIADAT module; the purpose of VIADAT-STAT is statistical analysis of recordings stored by the platform. Developed in cooperation with ÚSD AV ČR and NFA.
  • VIADAT-ANALYZE (2019-12-31)

    A VIADAT module; VIADAT-ANALYZE is an interactive environment that enables the end user to work with stored, annotated and indexed audio recordings. Allowing visualization and...
  • VIADAT-GIS (2019-12-31)

    A VIADAT module; VIADAT-GIS connects the platform with maps. Developed in cooperation with ÚSD AV ČR and NFA.
  • VIADAT-ANNOTATE (2019-12-31)

    A VIADAT module; VIADAT-ANNOTATE is an interactive annotation environment. Developed in cooperation with ÚSD AV ČR and NFA.
  • Universal Dependencies 2.5 Models for UDPipe (2019-12-06)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of Universal Depenencies 2.5 Treebanks, created solely using UD 2.5 data...
  • Large Corpus of Czech Parliament Plenary Hearings

    We present a large corpus of Czech parliament plenary sessions. The corpus consists of approximately 444 hours of speech data and corresponding text transcriptions. The whole...
  • SynSemClass 1.0

    The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language...
  • COSTRA 1.0: A Dataset of Complex Sentence Transformations

    COSTRA 1.0 is a dataset of Czech complex sentence transformations. The dataset is intended for the study of sentence-level embeddings beyond simple word alternations or standard...
  • English Model (CoNLL-2003) for NameTag

    English model for NameTag, a named entity recognition tool. The model is trained on CoNLL-2003 training data. Recognizes PER, ORG, LOC and MISC named entities. Achieves...
  • VIADAT-GIS

    A VIADAT module; VIADAT-GIS connects the platform with maps. Developed in cooperation with ÚSD AV ČR and NFA.
  • VIADAT-STAT

    A VIADAT module; the purpose of VIADAT-STAT is statistical analysis of recordings stored by the platform. Developed in cooperation with ÚSD AV ČR and NFA.
  • VIADAT-ANALYZE

    A VIADAT module; VIADAT-ANALYZE is an interactive environment that enables the end user to work with stored, annotated and indexed audio recordings. Allowing visualization and...