Word representations for multiple languages


Dictionaries of word representations for several languages. The representations include Brown clusters of different sizes and morphological dictionaries extracted with different morphological analyzers. All representations cover the 250,000 most frequent word types in the Wikipedia edition of the respective language.

Analyzers used: MAGYARLANC (Hungarian, Zsibrita et al. (2013)), FREELING (English and Spanish, Padro and Stanilovsky (2012)), SMOR (German, Schmid et al. (2004)), a morphological analyzer from Charles University (Czech, Hajic (2001)) and LATMOR (Latin, Springmann et al. (2014)).

PID http://hdl.handle.net/11234/LRT-1483
Related Identifier http://cistern.cis.lmu.de/marmot/naacl2015/
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/LRT-1483
Creator Müller, Thomas; Schütze, Hinrich
Publisher Center for Information and Language Processing, University of Munich
Publication Year 2015
Rights Creative Commons - Attribution 3.0 Unported (CC BY 3.0); http://creativecommons.org/licenses/by/3.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Language English; German; Latin; Hungarian; Spanish; Castilian; Czech
Resource Type corpus
Format application/octet-stream; text/plain; charset=utf-8; downloadable_files_count: 36
Discipline Linguistics
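
The Metadata Access URL listed above is an OAI-PMH GetRecord request that asks for the record in Dublin Core (oai_dc). As a minimal sketch, the request URL can be rebuilt from its parts and a response parsed with the Python standard library. The function names build_get_record_url and dc_fields are hypothetical helpers introduced here for illustration, and the code assumes the response uses the standard Dublin Core element namespace; it is not part of the resource itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint and identifier copied from the "Metadata Access" field of this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"
RECORD_ID = "oai:lindat.mff.cuni.cz:11234/LRT-1483"

# Standard Dublin Core element namespace (assumed to be used in the response).
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_get_record_url(endpoint=OAI_ENDPOINT, identifier=RECORD_ID):
    """Assemble an OAI-PMH GetRecord URL requesting Dublin Core metadata."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": "oai_dc",
            "identifier": identifier,
        },
        safe=":/",  # keep ':' and '/' readable inside the identifier
    )
    return f"{endpoint}?{query}"


def dc_fields(xml_text):
    """Collect Dublin Core elements from an OAI-PMH response into a dict of lists."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields
```

To query the repository, the URL from build_get_record_url() could be passed to urllib.request.urlopen and the decoded body handed to dc_fields; whether the endpoint still answers at this address is not guaranteed here.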