Word representations for multiple languages

Dictionaries of word representations for several languages. The representations include Brown clusters of different sizes and morphological dictionaries extracted with different morphological analyzers. All representations cover the 250,000 most frequent word types in the Wikipedia of the respective language.

Analyzers used: MAGYARLANC (Hungarian, Zsibrita et al. (2013)), FREELING (English and Spanish, Padro and Stanilovsky (2012)), SMOR (German, Schmid et al. (2004)), a morphological analyzer from Charles University (Czech, Hajic (2001)), and LATMOR (Latin, Springmann et al. (2014)).

Identifier
PID http://hdl.handle.net/11234/LRT-1483
Related Identifier http://cistern.cis.lmu.de/marmot/naacl2015/
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/LRT-1483
Provenance
Creator Müller, Thomas; Schütze, Hinrich
Publisher Center for Information and Language Processing, University of Munich
Publication Year 2015
Rights Creative Commons - Attribution 3.0 Unported (CC BY 3.0); http://creativecommons.org/licenses/by/3.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language English; German; Latin; Hungarian; Spanish; Castilian; Czech
Resource Type corpus
Format application/octet-stream; text/plain; charset=utf-8; downloadable_files_count: 36
Discipline Linguistics