The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.1


This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~97.06. The difference from the previous version of the model is that the whole XPOS tag is now predicted as a single label, rather than character by character as was the case in stanfordnlp. Character-level prediction resulted in illegal XPOS tags (and slightly decreased performance).
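
The contrast between character-level and whole-tag prediction can be illustrated with a minimal Python sketch. This is not the code of CLASSLA-StanfordNLP or stanfordnlp: the two-tag inventory, the scores and the function names are all hypothetical, and "illegal" here only means "not in the toy inventory".

```python
# Toy illustration only: the inventory and all scores are made up,
# not taken from the model, the tool, or the ssj500k corpus.
VALID_TAGS = ["Ncmsn", "Ncfsg"]  # hypothetical closed inventory of legal XPOS tags


def predict_per_character(position_scores):
    """Pick the best-scoring character independently at each tag position."""
    return "".join(max(scores, key=scores.get) for scores in position_scores)


def predict_whole_tag(tag_scores):
    """Pick the best-scoring tag from the closed inventory of legal tags."""
    return max(tag_scores, key=tag_scores.get)


# Hypothetical per-position scores: gender leans masculine, case leans genitive.
position_scores = [
    {"N": 0.9, "V": 0.1},
    {"c": 0.8, "p": 0.2},
    {"m": 0.6, "f": 0.4},
    {"s": 0.9, "p": 0.1},
    {"n": 0.4, "g": 0.6},
]

# Hypothetical scores over whole tags for the same token.
tag_scores = {"Ncmsn": 0.55, "Ncfsg": 0.45}

char_tag = predict_per_character(position_scores)
whole_tag = predict_whole_tag(tag_scores)

print(char_tag, char_tag in VALID_TAGS)    # Ncmsg False
print(whole_tag, whole_tag in VALID_TAGS)  # Ncmsn True
```

Because each position is decided independently, the character-level scheme can assemble a combination that no entry in the inventory licenses, whereas choosing among whole tags can only ever return a member of the inventory.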

Identifier
PID http://hdl.handle.net/11356/1312
Related Identifier https://www.aclweb.org/anthology/W19-3704/
Related Identifier http://hdl.handle.net/11356/1251
Related Identifier http://hdl.handle.net/11356/1391
Related Identifier https://github.com/clarinsi/classla-stanfordnlp
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1312
Provenance
Creator Ljubešić, Nikola
Publisher Jožef Stefan Institute
Publication Year 2020
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type toolService
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 2
Discipline Linguistics