The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Serbian 1.1

Dataset

The model for morphosyntactic annotation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is approximately 95.2.
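As a rough illustration of the three label types, the sketch below reads UPOS, XPOS and FEATS from a single token line. It assumes the annotations are serialized in the ten-column CoNLL-U format; the helper `parse_conllu_line`, the `Token` type and the sample line are hypothetical and are not part of the tool or output of the model.

```python
from typing import Dict, NamedTuple


class Token(NamedTuple):
    form: str
    upos: str              # Universal POS tag
    xpos: str              # language-specific tag (MULTEXT-East here)
    feats: Dict[str, str]  # universal morphological features


def parse_conllu_line(line: str) -> Token:
    """Extract FORM, UPOS, XPOS and FEATS from one CoNLL-U token line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
    # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    raw_feats = cols[5]
    feats = (
        {}
        if raw_feats == "_"
        else dict(pair.split("=", 1) for pair in raw_feats.split("|"))
    )
    return Token(form=cols[1], upos=cols[3], xpos=cols[4], feats=feats)


# Illustrative line written by hand, not produced by the model.
sample = "1\tKnjiga\tknjiga\tNOUN\tNcfsn\tCase=Nom|Gender=Fem|Number=Sing\t_\t_\t_\t_"
token = parse_conllu_line(sample)
print(token.upos, token.xpos, token.feats["Case"])
```

In this sample, the XPOS tag `Ncfsn` packs the same information into one positional string (common noun, feminine, singular, nominative) that FEATS spells out as attribute-value pairs.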

Unlike the previous version of the model, this version predicts the whole XPOS tag rather than its individual characters. Character-by-character prediction, as done in stanfordnlp, resulted in illegal XPOS tags (and slightly decreased performance).
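The toy sketch below shows why the two prediction strategies can differ. The tag inventory and all scores are invented for illustration, and the code does not reproduce the actual architecture of either tool: it only shows that choosing each character independently can assemble a string outside the tagset, while choosing among whole tags cannot.

```python
# Illustrative subset of MULTEXT-East-style tags; not the real Serbian tagset.
LEGAL_TAGS = ["Ncfsn", "Ncmsn", "Vmr3s"]

# Hypothetical per-position character scores for one token.
position_scores = [
    {"V": 0.6, "N": 0.4},
    {"c": 0.7, "m": 0.3},
    {"f": 0.5, "r": 0.3, "m": 0.2},
    {"s": 0.6, "3": 0.4},
    {"n": 0.7, "s": 0.3},
]

# Hypothetical scores over whole tags for the same token.
whole_tag_scores = {"Ncfsn": 0.5, "Ncmsn": 0.3, "Vmr3s": 0.2}

# Strategy 1: pick the best character at each position independently.
per_char_tag = "".join(max(scores, key=scores.get) for scores in position_scores)

# Strategy 2: pick the best tag from the closed inventory.
whole_tag = max(whole_tag_scores, key=whole_tag_scores.get)

print(per_char_tag, per_char_tag in LEGAL_TAGS)
print(whole_tag, whole_tag in LEGAL_TAGS)
```

With these made-up numbers, the per-character strategy combines a verb category with noun attributes, which yields a string that is not in the inventory, whereas the whole-tag strategy is legal by construction.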

Identifier
PID	http://hdl.handle.net/11356/1349
Related Identifier	http://dx.doi.org/10.18653/v1/W19-3704
Related Identifier	http://hdl.handle.net/11356/1253
Related Identifier	http://hdl.handle.net/11356/1392
Related Identifier	https://github.com/clarinsi/classla-stanfordnlp
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1349

Provenance
Creator	Ljubešić, Nikola
Publisher	Jožef Stefan Institute
Publication Year	2020
Rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Serbian
Resource Type	toolService
Format	text/plain; charset=utf-8; application/octet-stream; application/zip; downloadable_files_count: 2
Discipline	Linguistics