4,412 datasets found

  • LEKO v1.0

    The LEKO corpora LEKO_Kolipsi and LEKO_Merlin provide lexical annotations for phraseological elements in Italian L2 writing on the basis of a subset of the texts of the...
  • KoKo German L1 Learner Corpus v3

    The KoKo Corpus is an error-annotated learner corpus of L1 German speakers. It was created to investigate and describe the writing skills of German-speaking...
  • Kolipsi-1 Corpus v1.1

    The Kolipsi-1 L2 is a written learner corpus of German and Italian L2 speakers originating from South Tyrol (Italy). It has been developed as a by-product of the KOLIPSI project...
  • KrdWrd CANOLA Corpus 1.1

    The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate from unseen web pages. It was harvested, annotated and...
  • MERLIN Written Learner Corpus for Czech, German, Italian 1.0

    The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR)...
  • MERLIN Written Learner Corpus for Czech, German, Italian 1.1

    The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR)...
  • Beldeko Summary Corpus v1.0.0

    The Beldeko (Belgisches Deutschkorpus) Summary Corpus is a learner corpus that consists of summaries written by advanced L2 German learners (CEF...
  • AThEME Verona-Trento Corpus

    The AThEME Verona-Trento Corpus is a spoken corpus composed of data collected during the AThEME project in Work Package 2 ‘Regional Languages’ by the units of Verona and Trento...
  • DIDI - The DiDi Corpus of South Tyrolean CMC 1.0.0

    The DiDi corpus has an overall size of around 600,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook...
  • ASPAC – Swedish-Molise Slavic (2017-10-16)

    Part of The Amsterdam Slavic Parallel Aligned Corpus. The material is sentence-scrambled.
  • The Morphologically Annotated Part of BulTreeBank

    This distribution contains only the morphological information encoded in BulTreeBank, the HPSG-based Treebank of Bulgarian. It contains about 214,000 tokens. It was used for the...