ELMo embeddings model, Slovenian


ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on the entire Gigafida 2.0 corpus (https://viri.cjvt.si/gigafida/System/Impressum) for 10 epochs. The 1,364,064 most common tokens were provided as the vocabulary during training. The model can also infer out-of-vocabulary (OOV) words, since the neural-network input is at the character level.
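The character-level input is what makes OOV inference possible: each token is fed to the network as a fixed-length sequence of character ids, so any string receives a representation regardless of whether it appears among the 1,364,064 vocabulary tokens. The sketch below illustrates that encoding idea only; the marker and padding values are illustrative assumptions, not the exact ids used by bilm-tf.

```python
# Sketch of character-level token encoding, as used by ELMo-style models.
# Any token -- in-vocabulary or not -- maps to the same fixed-width id
# sequence, which is why the model can embed OOV words.
# BOW/EOW/PAD values are assumed for illustration, not bilm-tf's exact ids.

MAX_CHARS = 50                 # fixed per-token width (bilm-tf also uses 50)
BOW, EOW, PAD = 256, 257, 258  # begin-of-word, end-of-word, padding markers

def encode_token(token: str) -> list[int]:
    """Map a token to a fixed-length list of character ids (UTF-8 bytes)."""
    body = list(token.encode("utf-8"))[: MAX_CHARS - 2]
    ids = [BOW] + body + [EOW]
    return ids + [PAD] * (MAX_CHARS - len(ids))

# Works identically for frequent and unseen Slovenian tokens:
print(len(encode_token("Ljubljana")))  # 50
print(encode_token("župan")[:4])       # BOW marker followed by UTF-8 byte ids
```

In the actual model this id sequence feeds a character CNN whose output replaces a fixed word-embedding lookup, so no token is ever mapped to an unknown-word placeholder.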

Identifier
PID http://hdl.handle.net/11356/1257
Related Identifier http://hdl.handle.net/11356/1277
Related Identifier http://embeddia.eu
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1257
Provenance
Creator Ulčar, Matej
Publisher Faculty of Computer and Information Science, University of Ljubljana
Publication Year 2019
Funding Reference info:eu-repo/grantAgreement/EC/H2020/825153
Rights Apache License 2.0; PUB; https://opensource.org/licenses/Apache-2.0
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type toolService
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 2
Discipline Linguistics