Universal Segmentations 1.0 (UniSegments 1.0)

Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations, harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that store a word and its morphological segmentation, together with information about the word and the segmented units, e.g., part-of-speech categories and types of morphs/morphemes. The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.
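
As a rough illustration of how a tab-separated segmentation file of this kind might be read, here is a minimal Python sketch. The column layout (word, part of speech, segmentation), the " + " morph separator, the `Entry` type, the `read_entries` helper, and the sample rows are all assumptions made for the example and are not taken from the dataset; the actual column order and contents should be checked against the UniSegments documentation.

```python
import csv
import io
from typing import Iterable, Iterator, List, NamedTuple


class Entry(NamedTuple):
    """One row of a hypothetical segmentation file."""
    word: str
    pos: str
    morphs: List[str]


def read_entries(lines: Iterable[str], morph_sep: str = " + ") -> Iterator[Entry]:
    """Yield entries from tab-separated lines.

    Assumed layout (hypothetical): word, part of speech, segmentation.
    Rows with fewer than three columns are skipped.
    """
    # QUOTE_NONE keeps quote characters in the data as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue
        word, pos, segmentation = row[0], row[1], row[2]
        yield Entry(word, pos, segmentation.split(morph_sep))


# Made-up sample rows for illustration only; not real UniSegments data.
sample = "unhappiness\tNOUN\tun + happi + ness\nwalked\tVERB\twalk + ed\n"
entries = list(read_entries(io.StringIO(sample)))
for entry in entries:
    print(entry.word, entry.pos, entry.morphs)
```

Since the record lists the download as gzip-compressed, the same function could be fed an already opened text stream, for example one returned by `gzip.open(path, "rt", encoding="utf-8")`.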

Identifier
PID http://hdl.handle.net/11234/1-4629
Related Identifier https://ufal.mff.cuni.cz/techrep/tr69.pdf
Related Identifier https://ufal.mff.cuni.cz/universal-segmentations
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-4629
Provenance
Creator Žabokrtský, Zdeněk; Bafna, Nyati; Bodnár, Jan; Kyjánek, Lukáš; Svoboda, Emil; Ševčíková, Magda; Vidra, Jonáš; Angle, Sachi; Ansari, Ebrahim; Arkhangelskiy, Timofey; Batsuren, Khuyagbaatar; Bella, Gábor; Bertinetto, Pier Marco; Bonami, Olivier; Celata, Chiara; Daniel, Michael; Fedorenko, Alexei; Filko, Matea; Giunchiglia, Fausto; Haghdoost, Hamid; Hathout, Nabil; Khomchenkova, Irina; Khurshudyan, Victoria; Levonian, Dmitri; Litta, Eleonora; Medvedeva, Maria; Muralikrishna, S. N.; Namer, Fiammetta; Nikravesh, Mahshid; Padó, Sebastian; Passarotti, Marco; Plungian, Vladimir; Polyakov, Alexey; Potapov, Mihail; Pruthwik, Mishra; Rao B, Ashwath; Rubakov, Sergei; Samar, Husain; Sharma, Dipti Misra; Šnajder, Jan; Šojat, Krešimir; Štefanec, Vanja; Talamo, Luigi; Tribout, Delphine; Vodolazsky, Daniil; Vydrin, Arseniy; Zakirova, Aigul; Zeller, Britta
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2022
Rights Universal Segmentations 1.0 License Terms; https://lindat.mff.cuni.cz/repository/xmlui/page/licence-unisegs-1.0; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech; Catalan; Valencian; German; English; Persian; Farsi; Finnish; French; Croatian; Hungarian; Italian; Latin; Moksha; Mari; Mongolian; Erzya; Polish; Portuguese; Russian; Spanish; Castilian; Swedish; Tajik; Udmurt; Armenian; Bengali; Bangla; Hindi; Malayalam; Marathi; Marāṭhī; Kannada
Resource Type lexicalConceptualResource
Format text/plain; charset=utf-8; application/x-gzip; downloadable_files_count: 1
Discipline Linguistics