CorpoGrabber
CorpoGrabber: The Toolchain to Automatic Acquiring and Extraction of the Website Content. Jan Kocoń, Wroclaw University of Technology. CorpoGrabber is a pipeline of tools to get... -
MWE Świętochowski
Aleksander Świętochowski -
AspectEmo 1.0: Multi-Domain Corpus of Consumer Reviews for Aspect-Based Senti...
AspectEmo 1.0 Corpus is an extended version of the publicly available PolEmo 2.0 corpus of Polish customer reviews, which was used in many projects on the use of different methods... -
Wikinews_luty_marzec_2020
Test corpus _ 3_03_20 -
Corpus of the Colloquial Polish Language
The Corpus of the Colloquial Polish Language (CCPL) is a UGC-based corpus tagged with morpho-syntactic features by a team of professional linguists from the Wrocław University... -
Polish Spatial Texts (PST) 2.0
The extended version of the Polish Spatial Text corpus. Texts derived from Polish travel blogs, manually annotated with spatial expressions. A spatial expression is a text fragment... -
MWE Wiek XX
berent_diogenes_1937.txt berent_kamienie_1918.txt berent_prochno_1903.txt dabrowska_nocednie1_1931.txt dabrowska_nocednie2_1932.txt dabrowska_nocednie3_1933.txt... -
WUT Relations Between Sentences Corpus
WUT Relations Between Sentences Corpus contains 2827 pairs of related sentences. Relationships are derived from Cross-document Structure Theory (CST), which enables... -
MWE Mostowicz
Tadeusz Dołęga-Mostowicz -
CEN
Corpus of Economic News (CEN) contains 797 documents from Polish Wikipedia annotated with 65 categories of proper names in ccl format.... -
MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of ...
MultiEmo is a new benchmark data set for the multilingual sentiment analysis task, covering 11 languages. The collection contains consumer reviews from four domains: medicine,... -
MWE 10 Największych
dabrowska_nocednie3_1933.txt prus_emancypantki_1894.txt sienkiewicz_ogniem_1884.txt kaczkowski_grob_1857.txt prus_faraon_1897.txt sienkiewicz_rodzina_1894.txt... -
MWE Sygietyński
Antoni Sygietyński -
Polish corpus of plWordNet usage examples
Corpus of 83k usage examples taken from plWordNet 3.0. Each example is annotated with a specific sense, and all are published under open licences. -
Big Data language model - subword - BPE - ARPA
Big data language model over subword units obtained with byte pair encoding (BPE), in ARPA format -
MWE Żeromski
Stefan Żeromski -
Cleaned Polish Oscar corpus (128M lines)
Cleaned Polish Oscar corpus (part: 128M lines, 3.53 GB). Data was prepared with a few cleaning heuristics: - remove sentences shorter than - remove non-Polish sentences... -
Big data language model stemmed with BPE in RAW format
Big data language model stemmed with BPE in RAW format -
MWE Dygasiński
Antoni Dygasiński -
MWE Wiek XIX
balucki_burmistrz_1887.txt balucki_murzyn_1875.txt balucki_przebudzeni_1864.txt beczkowska_bedzie_1897.txt beczkowska_droga_1898.txt beczkowska_gniezdzie_1899.txt...