CroSloEngual BERT 1.1

Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. A state-of-the-art tool that represents words/tokens as contextually dependent word embeddings and is used for various NLP classification tasks by fine-tuning the model end-to-end. CroSloEngual BERT consists of neural network weights and configuration files in PyTorch format (i.e., to be used with the PyTorch library).

Changes in version 1.1: fixed the vocab.txt file, as the previous version contained an error that caused very poor results during fine-tuning and/or evaluation.
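As a minimal sketch of the usage described above, the released PyTorch files can be loaded to produce contextual token embeddings. This assumes the Hugging Face transformers library is used and that the model is reachable either from a local directory holding the downloaded files or from a Hub mirror; the model identifier below is an assumption, not part of this record.

```python
# Minimal sketch: contextual embeddings from CroSloEngual BERT.
# Assumes the released weights/config/vocab are loadable via the Hugging Face
# `transformers` library; the model id below (a Hub mirror) is an assumption,
# and a local path to the downloaded files would work the same way.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "EMBEDDIA/crosloengual-bert"  # or a local path to the downloaded files

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

# One sentence per language covered by the model.
sentences = [
    "Ljubljana je glavno mesto Slovenije.",
    "Zagreb je glavni grad Hrvatske.",
    "London is the capital of England.",
]

with torch.no_grad():
    for sent in sentences:
        inputs = tokenizer(sent, return_tensors="pt")
        outputs = model(**inputs)
        # Each token is represented as a contextually dependent vector.
        emb = outputs.last_hidden_state  # shape: (1, num_tokens, hidden_size)
        print(sent, tuple(emb.shape))
```

For classification tasks, the same files would instead be loaded through a task head (e.g. AutoModelForSequenceClassification) and fine-tuned end-to-end, as the description notes.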

Identifier
PID http://hdl.handle.net/11356/1330
Related Identifier https://arxiv.org/abs/2006.07890
Related Identifier http://hdl.handle.net/11356/1317
Related Identifier http://embeddia.eu
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1330
Provenance
Creator Ulčar, Matej; Robnik-Šikonja, Marko
Publisher Faculty of Computer and Information Science, University of Ljubljana
Publication Year 2020
Funding Reference info:eu-repo/grantAgreement/EC/H2020/825153
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Croatian; Slovenian; English
Resource Type toolService
Format application/octet-stream; text/plain; text/plain; charset=utf-8; downloadable_files_count: 3
Discipline Linguistics