32 datasets found

Keywords: Lithuanian

  • Lithuanian Treebank ALKSNIS (2019-10-24)

    ALKSNIS v3.0 consists of 3,643 syntactically annotated sentences in PML (Prague Markup Language) format. The format allows researchers to visualise and edit...
  • JABLONSKIS tagset v2

    JABLONSKIS version 2 is a standard Lithuanian morphological tagset based on the abbreviations of parts of speech and other grammatical categories commonly used in...
  • Language Technology Research Bibliography for Lithuanian 2016-2020

    A bibliography of language technology research on Lithuanian for the period 2016-2020. The resource is in BibTeX format and contains: 1) 91 references of research...
  • ORVELIT v3

    ORVELIT v3 (Lith. Originalios ir Vertimų Lietuvių Kalbos Tekstynas, "Corpus of Original and Translated Lithuanian") is a comparable monolingual corpus of original and translated Lithuanian consisting of four sub-corpora of...
  • Lithuanian keyboard for macOS users

    This keyboard driver provides easy access to Lithuanian letters via the conventional keyboard layout, a.k.a. „Lithuanian letters instead of numbers“. An essential new feature of this...
  • Corpus of the Contemporary Lithuanian Language

    The Corpus of the Contemporary Lithuanian Language, which comprises 208 million words, is a collection of texts designed to represent contemporary Lithuanian. The corpus has been...
  • Lemmatised Wordlist of 1 m. Corpus of Contemporary Lithuanian

    The lemmatised wordlist of a 1-million-word Lithuanian corpus. The tab-delimited text file (dazninis.txt) has the fields: Headword, Part of Speech, Wordform, Frequency of Occurrence. The...
  • Lithuanian speech-to-text Transcriber

    The automatic speech-to-text transcriber for Lithuanian is a containerized application implemented as 17 containers. It covers four areas: administrative, legal, medical and...
  • Corpus of Discourse on Crime

    The specialised "Corpus of Discourse on Crime" is a synchronic, monolingual, unannotated corpus consisting of two subcorpora. Subcorpus 1: all texts on crime published in criminal columns on...
  • Lithuanian Word embeddings

    GloVe-type word vectors (embeddings) for Lithuanian. The Delfi.lt corpus (~70 million words) and StanfordNLP were used for training. The training consisted of several stages: 1)...
  • Assessment Data of the Dictionary of Modern Lithuanian versus Joint Corpora

    The resource contains assessment data for The Dictionary of Modern Lithuanian, 6th edition (DML6) [1], with respect to its coverage in the Joint Corpus of Lithuanian...
  • Lithuanian Coreference Corpus

    The corpus consists of 100 articles from news portals focusing on political news, as such texts are rich in quotations and named entity...
  • DELFI.lt corpus

    DELFI.lt is a corpus of articles published by the news portal DELFI.lt from March 2014 to November 2016. Metadata was collected along with the articles: author, title, date,...
  • Lithuanian Spelling Checker V.1.0.45 for macOS

    Lithuanian spelling checker for macOS, version 1.0.45 (2020-04-10).
  • Lithuanian morphologically annotated corpus - MATAS v1.0

    MATAS corpus (version 1.0). Description: manually checked, morphologically annotated corpus MATAS. Formats: 1. CoNLL-U (CONLLU, conllu); 2. SketchEngine - tab-delimited word per...
  • Lithuanian 3-gram dataset

    Dataset of 3-grams with frequencies extracted from the Delfi.lt corpus (~70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then symbol...
  • Lithuanian 1-gram dataset

    Dataset of 1-grams with frequencies extracted from the Delfi.lt corpus (~70 million words, period: March 2014 - November 2016). First, the corpus was split into sentences, then...
  • Database of Lithuanian Multiword Expressions

    The Database of Lithuanian multiword expressions (MWE) contains bigram and trigram MWEs that occurred at least 10 times in the DELFI.lt corpus (http://tekstynas.mwe.lt/). In the...
  • Lithuanian Corpus of the EU Primary and Secondary Law Acts of the Period 2015...

    A 274,460-word corpus composed of selected primary and secondary law acts of the EU from the period 2015-2017. The corpus was compiled from documents containing words with the root...
  • Dual Pronoun Translation Concordances

    The resource offers two data sets: concordances of dual pronoun translations from Lithuanian into English (942 concordance lines) and translations of English pronouns into...
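The Lemmatised Wordlist entry describes dazninis.txt as a tab-delimited file with the fields Headword, Part of Speech, Wordform and Frequency of Occurrence. The sketch below shows one way to load such a file and sum wordform frequencies per lemma. It assumes (not confirmed by the truncated description) that there is no header row, that every non-blank line has exactly those four fields in that order, and that the frequency is an integer; the file encoding is also an assumption and may need changing. The names `read_wordlist`, `lemma_frequencies` and `WordlistEntry` are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, NamedTuple


class WordlistEntry(NamedTuple):
    """One row of the wordlist (hypothetical field names)."""
    headword: str
    pos: str
    wordform: str
    frequency: int


def read_wordlist(lines: Iterable[str]) -> list[WordlistEntry]:
    """Parse headword<TAB>pos<TAB>wordform<TAB>frequency lines.

    Assumption: no header row and exactly four fields per line.
    Blank lines are skipped; QUOTE_NONE keeps quote characters literal.
    """
    entries = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:  # csv.reader yields [] for an empty line
            continue
        headword, pos, wordform, freq = row
        entries.append(WordlistEntry(headword, pos, wordform, int(freq)))
    return entries


def lemma_frequencies(entries: Iterable[WordlistEntry]) -> Counter:
    """Sum the frequencies of all wordforms under each (headword, pos) pair."""
    totals: Counter = Counter()
    for e in entries:
        totals[(e.headword, e.pos)] += e.frequency
    return totals


if __name__ == "__main__":
    # Assumed file name and encoding; adjust to the actual download.
    with open("dazninis.txt", encoding="utf-8", newline="") as f:
        totals = lemma_frequencies(read_wordlist(f))
    for (lemma, pos), freq in totals.most_common(10):
        print(lemma, pos, freq)
```

Grouping by the (headword, part of speech) pair rather than the headword alone keeps homographs with different parts of speech apart.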
You can also access this registry using the API (see API Docs).
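The MATAS entry lists CoNLL-U among its formats. In the CoNLL-U standard, each token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), lines starting with `#` are comments, and a blank line ends a sentence. The minimal reader below follows those rules; it is a sketch, not MATAS's own tooling, and the function names are hypothetical. It skips multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `1.1`), and since MATAS is a morphologically annotated corpus, the syntactic columns may simply contain `_`.

```python
from typing import Iterable, Iterator

# Column names defined by the CoNLL-U format.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def read_conllu(lines: Iterable[str]) -> Iterator[list[dict[str, str]]]:
    """Yield sentences as lists of token dicts keyed by CONLLU_FIELDS."""
    sentence: list[dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line closes the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comment or metadata
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node
        sentence.append(dict(zip(CONLLU_FIELDS, cols)))
    if sentence:  # file may not end with a blank line
        yield sentence


def parse_feats(feats: str) -> dict[str, str]:
    """Split a FEATS value such as 'Case=Nom|Gender=Masc' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))
```

For example, `parse_feats("Case=Nom|Gender=Masc")` returns `{"Case": "Nom", "Gender": "Masc"}`, and iterating `read_conllu(open(path, encoding="utf-8"))` (path assumed) yields one token list per sentence.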