ELMo embeddings model, Slovenian


ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on the entire Gigafida 2.0 corpus (https://viri.cjvt.si/gigafida/System/Impressum) for 10 epochs. The 1,364,064 most common tokens were provided as the vocabulary during training. The model can also infer out-of-vocabulary (OOV) words, since the neural-network input is at the character level.
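The character-level input is what makes OOV inference possible: each token is fed to the network as a fixed-length sequence of character ids, so any string receives a representation regardless of whether it appears among the 1,364,064 vocabulary tokens. The sketch below illustrates that encoding idea only; the marker and padding values are illustrative assumptions, not the exact ids used by bilm-tf.

```python
# Sketch of character-level token encoding, as used by ELMo-style models.
# Any token -- in-vocabulary or not -- maps to the same fixed-width id
# sequence, which is why the model can embed OOV words.
# BOW/EOW/PAD values are assumed for illustration, not bilm-tf's exact ids.

MAX_CHARS = 50                 # fixed per-token width (bilm-tf also uses 50)
BOW, EOW, PAD = 256, 257, 258  # begin-of-word, end-of-word, padding markers

def encode_token(token: str) -> list[int]:
    """Map a token to a fixed-length list of character ids (UTF-8 bytes)."""
    body = list(token.encode("utf-8"))[: MAX_CHARS - 2]
    ids = [BOW] + body + [EOW]
    return ids + [PAD] * (MAX_CHARS - len(ids))

# Works identically for frequent and unseen Slovenian tokens:
print(len(encode_token("Ljubljana")))  # 50
print(encode_token("župan")[:4])       # BOW marker followed by UTF-8 byte ids
```

In the actual model this id sequence feeds a character CNN whose output replaces a fixed word-embedding lookup, so no token is ever mapped to an unknown-word placeholder.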

Identifier
PID http://hdl.handle.net/11356/1257
Related Identifier http://hdl.handle.net/11356/1277
Related Identifier http://embeddia.eu
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1257
Provenance
Creator Ulčar, Matej
Publisher Faculty of Computer and Information Science, University of Ljubljana
Publication Year 2019
Funding Reference info:eu-repo/grantAgreement/EC/H2020/825153
Rights Apache License 2.0; PUB; https://opensource.org/licenses/Apache-2.0
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type toolService
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 2
Discipline Linguistics