7,004 datasets found

  • Tagger SentiOne - version 1

    The SentiOne tagger is a tagger for the Polish language adapted to processing of user-generated content. It was trained on the Polish UGC-corpus (prepared within the same...
  • poznan lista 2

    Trial 2
  • probka lista poznan

    Trial test, Poznań
  • EWBST tests for english

    The submission contains tests generated for the EWBST test of English word embedding models. The tests were created with Princeton WordNet and plWN English synsets.
  • Potchefstroom demo 2.0

    Potchefstroom (North-West University)
  • Potchefstroom demo

    Test Corpus (North-West University, Potchefstroom, South Africa)
  • JSlisty

    prv
  • PolEval 2019 Task 1: Lemmatization of proper names and multi-word phrases — t...

    The task consists in developing a tool for lemmatization of proper names and multi-word phrases. The generated lemmas should follow the KPWr guidelines...
  • KPWr annotation guidelines - named entity and phrase lemmatization 2.0

    Guidelines for named entity and multi-word phrase lemmatization used in KPWr (Polish Corpus of Wrocław University of Technology).
  • Speech activity annotation for a subset of the Clarin-PL studio corpus

    This is a hand-checked annotation of speech activity within a subset of the Clarin-PL studio corpus, containing 20 sessions with 619 recordings. This submission does not contain...
  • Assamese POS-Tagged Text

    Assamese POS tagger is a CRF++ based POS Tagger. Raw text is given to this CRF++ based POS tagger to get POS tagged data. Standard POS tagset is used. These Assamese NLP...
  • Assamese POS Tagger

    Assamese POS tagger is a CRF++ based POS Tagger. CRF++ is a customizable open source implementation of Conditional Random Fields for tagging/labeling continuous text. CRF++ is implemented for...
  • Assamese Corpus

    Assamese Corpus was developed in the NLP Lab of Gauhati University. Total size of Assamese Corpus (in terms of words) is 1.6 million (1613551 words). The Corpus is prepared...
  • Assamese Root Words

    This list comprises Assamese root words. The size of the Assamese Root Word List is 15,750 words. These Assamese NLP resources including the Tools and Applications are...
  • Assamese-English Bilingual Dictionary

    The bilingual dictionary is created for Assamese-English. In the dictionary, the English meanings of Assamese words are given together with the POS of the words. These Assamese NLP...
  • Assamese Multi Word Expressions

    Multiword expressions are sequences of words, separated by a space (or any other delimiter), that together carry a single meaning distinct from the words' individual meanings. A list comprising...
  • Assamese Named Entities

    A list comprising 104138 Assamese named entities was developed. The list also includes NEs which are categorized as Organization (সদৌ অসম ছাত্ৰ সন্থা), Person...
  • Assamese Stopwords

    The most frequently occurring words in a context are the stopwords. They do not play an important role in retrieving information. As stopwords do not contribute any important...
  • Assamese spell variation list

    A spelling variant occurs when a word has more than one correct spelling. There are many different ways in which such a word can be spelled. A spell...
  • Assamese WSD List

    WSD (word sense disambiguation) is the process of identifying the proper sense of an ambiguous word depending on the particular context. The Assamese WSD list comprises more than 100 words with their...