1,108,827 datasets found

  • Żeromski

  • Stylo

    Stylometry, writers.
  • Open license texts sample

    Sample corpus of texts distributed under an open license. It consists of 20 documents in TXT, DOCX, DOC or ODT format.
  • Late 19th- and Early 20th-Century Polish Novels

    Corpus of late 19th- and early 20th-century literary texts intended as a benchmark collection for text categorization. It contains 100 Polish novels written by various authors....
  • The 8' th of March Corpus

    The articles are especially concerned with International Women's Day.
  • International Women's Day Corpus

    The corpus contains articles from the daily "Trybuna Ludu" from the years 1949-1956. The articles dealt with the situation of women; they were especially concerned with the...
  • Liner2.4

    A framework for multitask sequence labeling dedicated to natural language processing tasks.
  • Spokes search engine for Polish conversational data

    Spokes is an online service for conversational corpus data search and exploration as part of the Polish CLARIN infrastructure. The underlying corpus contains more than 2 million...
  • WCRFT Webservice (2014-10-24)

    Webservice for Weblicht
  • WCRFT Webservice

    Webservice for Weblicht
  • Spartan

    Keyword extraction
  • LexCSD

    Provides a common interface to several packages containing classifiers, including Weka, TiMBL, and probably also Orange and NLTK.
  • Questions and answers from the Wikipedia service "Czy wiesz", version 1.1

    “Czy wiesz” (Polish for “Did you know”) is a set of 4721 questions, each linked to a Wikipedia article that contains the answer. For 250 questions a detailed manual analysis has been...
  • WordnetLoom

    WordnetLoom is a wordnet editor application built to support the construction of plWordNet, the largest Polish wordnet. WordnetLoom provides two means of...
  • WCRFT2

    WCRFT is a morphosyntactic tagger for Polish. The tagger brings together Conditional Random Fields (CRF) and tiered tagging of plain text.
  • WCRFT

    WCRFT (Wrocław CRF Tagger) is a simple morpho-syntactic tagger for Polish producing state-of-the-art results. The tagger combines tiered tagging, conditional random fields (CRF)...
  • WCCL

    WCCL (Wrocław Corpus Constraint Language) is a formalism for writing functional expressions evaluated on morpho-syntactically annotated text. These expressions may be used...
  • Vector Extractor

    Collocations presented are based on co-occurrences of a selected noun with several features describing it and linked with it by syntactic dependencies. The recognised features...