CroSloEngual BERT 1.1

Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. A state-of-the-art tool that represents words/tokens as contextually dependent word embeddings and is used for various NLP classification tasks by fine-tuning the model end-to-end. CroSloEngual BERT consists of neural network weights and configuration files in PyTorch format (i.e., to be used with the PyTorch library).

Changes in version 1.1: fixed the vocab.txt file, as the previous version contained an error that caused very poor results during fine-tuning and/or evaluation.
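As a minimal sketch of the usage described above, the released PyTorch files can be loaded to produce contextual token embeddings. This assumes the Hugging Face transformers library is used and that the model is reachable either from a local directory holding the downloaded files or from a Hub mirror; the model identifier below is an assumption, not part of this record.

```python
# Minimal sketch: contextual embeddings from CroSloEngual BERT.
# Assumes the released weights/config/vocab are loadable via the Hugging Face
# `transformers` library; the model id below (a Hub mirror) is an assumption,
# and a local path to the downloaded files would work the same way.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "EMBEDDIA/crosloengual-bert"  # or a local path to the downloaded files

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

# One sentence per language covered by the model.
sentences = [
    "Ljubljana je glavno mesto Slovenije.",
    "Zagreb je glavni grad Hrvatske.",
    "London is the capital of England.",
]

with torch.no_grad():
    for sent in sentences:
        inputs = tokenizer(sent, return_tensors="pt")
        outputs = model(**inputs)
        # Each token is represented as a contextually dependent vector.
        emb = outputs.last_hidden_state  # shape: (1, num_tokens, hidden_size)
        print(sent, tuple(emb.shape))
```

For classification tasks, the same files would instead be loaded through a task head (e.g. AutoModelForSequenceClassification) and fine-tuned end-to-end, as the description notes.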

Identifier
PID http://hdl.handle.net/11356/1330
Related Identifier https://arxiv.org/abs/2006.07890
Related Identifier http://hdl.handle.net/11356/1317
Related Identifier http://embeddia.eu
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1330
Provenance
Creator Ulčar, Matej; Robnik-Šikonja, Marko
Publisher Faculty of Computer and Information Science, University of Ljubljana
Publication Year 2020
Funding Reference info:eu-repo/grantAgreement/EC/H2020/825153
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Croatian; Slovenian; English
Resource Type toolService
Format application/octet-stream; text/plain; text/plain; charset=utf-8; downloadable_files_count: 3
Discipline Linguistics