-
Finnish Words and their Concreteness Values
Context: This data has been produced for poem generation in Finnish. If you use this dataset in your publication, please cite: Hämäläinen, M., & Alnajjar, K. (2019). Let’s... -
Cases of Complements of Finnish Verbs
Context: Cases of the complements of Finnish verbs. The data is useful for natural language generation (NLG). The data is described in the following paper, which should also be... -
Annotated Route Description
This file set, consisting of a video stream, an audio stream and a multimodal annotation file, is frequently used as a showcase of how to do complex multimodal annotations with the... -
s.morfcorpus.6ec19594.20131227-2309
WMT 2013 Crawled News monolingual corpus, Czech, segmented by Morfessor -
Finnish Semantic Relatedness Model
This model is a semantic model that captures the relatedness of Finnish words as word vectors. This model can be used in various tasks such as metaphor interpretation. For... -
Creative Dialog Generation for Fallout 4
Mika Hämäläinen and Khalid Alnajjar. 2019. Creative contextual dialog adaptation in an open world RPG. In Proceedings of the 14th International Conference on the Foundations of... -
Replication of part of the IFA corpus
The IFA Spoken Language corpus is a free (GPL) database of hand-segmented Dutch speech. It was constructed with off-the-shelf software using speech from 8 speakers in a variety... -
Tagging and Analysis of B2 and C1 Level Exams
We have collected language learners' exams. They are B2 and C1 level tests of the European framework, 20 samples from each section. These were tagged, and then, using the assigned tags, analysis... -
Orthography-based dating and localisation of Middle Dutch charters
In this study we build models for the localisation and dating of Middle Dutch charters. First, we extract character trigrams and use these to train a machine learner (K Nearest... -
Sign Language Recording
This is a Sign Language Recording made for scientific purposes. -
UralicNLP - The NLP library for Uralic languages
UralicNLP is a natural language processing library targeted mainly for Uralic languages. UralicNLP can produce morphological analysis, generate morphological forms, lemmatize... -
SemMyv - Semantic Database for Erzya
This SQLite database contains Erzya lemmas and their frequencies in a big corpus. The lemmas are linked to each other based on the syntactic relations they have had in the... -
Psycholinguistic Experiment Video
This is a video recording that is being used in psycholinguistic experiments. -
Natas - Python 3 library for processing historical English
This library will have methods for processing historical English corpora, especially for studying neologisms. The first functionalities to be released relate to normalization of... -
TXM_0.7.7_Win64.exe
TXM 0.7.7 setup file for Windows 64-bit. TXM is a free and open-source (GPL v3) textual corpora analysis platform. It combines five key components: a) the ability to import and... -
Model for Normalizing Historical English
This is an OpenNMT-py model for normalizing historical English into modern spelling. For usage, please see: https://github.com/mikahama/natas This has been described in the... -
Skolt Sami - North Sami Cognates
A human-curated list of Skolt Sami (sms) - North Sami (sme) cognates found with an automatic method described in: Hämäläinen, M., & Rueter, J. (2019). Finding Sami Cognates... -
SemSms - Semantic Database for Skolt Sami
This SQLite database contains Skolt Sami lemmas and their frequencies in a big corpus. The lemmas are linked to each other based on the syntactic relations they have had in the... -
Comparison of the usage of nouns by female and male members of the Polish par...
Dataset based on the Polish Parliamentary Corpus: utterances from male and female Members of Parliament (MPs), extracted from the current (8th) term of the Sejm, between... -
FinMeter - Tools for assessing Finnish poetry
FinMeter is a library for analyzing poetry in Finnish. It handles typical rhyming such as alliteration, assonance and consonance, Japanese meters and Kalevala meter. It can also...