Word embeddings CLARIN.SI-embed.sr 1.0

CLARIN.SI-embed.sr contains word embeddings induced from the srWaC web corpus. The embeddings were trained with the fastText skip-gram model on 554,606,544 tokens of running text and cover two vocabularies: (1) 881,150 lowercased surface forms (e.g., "srbije") and (2) 599,416 lowercased lemmas with an appended part-of-speech tag (e.g., "srbija#Np").
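
The two key styles described above (plain surface forms and "lemma#POS" entries) could be read along the following lines. This is a minimal sketch, not the resource's documented usage: it assumes the downloadable files follow the usual fastText text layout (a "count dimension" header line, then one entry per line with space-separated values), which this record does not confirm. The helper names `read_vec`, `split_lemma_key` and `cosine` are hypothetical, and the three-dimensional vectors are toy values for illustration only, not taken from the resource.

```python
import io
import math
from typing import Dict, Iterable, List, Optional, Tuple


def read_vec(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse vectors in the assumed fastText text layout.

    Assumption: the first line is "<count> <dim>", and every following line
    is "<key> <v1> ... <v_dim>". The last <dim> fields are taken as values.
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or malformed lines
        key = " ".join(parts[:-dim])
        vectors[key] = [float(x) for x in parts[-dim:]]
    return vectors


def split_lemma_key(key: str) -> Tuple[str, Optional[str]]:
    """Split a "lemma#POS" key such as "srbija#Np" into (lemma, POS).

    Keys without "#" (surface forms such as "srbije") get POS = None.
    """
    lemma, sep, pos = key.rpartition("#")
    if not sep:
        return key, None
    return lemma, pos


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy stand-in for a lemma+POS file; the numbers are made up for illustration.
TOY = "2 3\nsrbija#Np 0.1 0.2 0.3\nbeograd#Np 0.2 0.1 0.3\n"
vecs = read_vec(io.StringIO(TOY))

for key in vecs:
    print(key, split_lemma_key(key))
print(cosine(vecs["srbija#Np"], vecs["beograd#Np"]))
```

With a real file, the `io.StringIO(TOY)` argument would be replaced by an open text handle on the decompressed download (UTF-8, per the Format field). At the vocabulary sizes listed above, an array-backed loader is likely a better fit than a dictionary of Python lists.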

Identifier
PID http://hdl.handle.net/11356/1206
Related Identifier http://hdl.handle.net/11356/1789
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1206
Provenance
Creator Ljubešić, Nikola
Publisher Jožef Stefan Institute
Publication Year 2018
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Serbian
Resource Type lexicalConceptualResource
Format application/octet-stream; application/gzip; text/plain; charset=utf-8; downloadable_files_count: 4
Discipline Linguistics