Neural Machine Translation model for Slovene-English language pair RSDO-DS4-NMT 1.2.6

This neural machine translation model for the Slovene-English language pair was trained following the NVIDIA NeMo NMT AAYN ("Attention Is All You Need") recipe (for details, see the official NVIDIA NeMo NMT documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/machine_translation/machine_translation.html, and the NVIDIA NeMo GitHub repository, https://github.com/NVIDIA/NeMo). It translates text from Slovene to English and from English to Slovene.

The training corpus was built from publicly available datasets, including Parallel corpus EN-SL RSDO4 1.0 (https://www.clarin.si/repository/xmlui/handle/11356/1457), as well as a small portion of proprietary data. In total, the training corpus consisted of 32,638,758 translation pairs and the validation corpus of 8,163 translation pairs. The model was trained on 64 GPUs. On the validation corpus it reached a SacreBLEU score of 48.3191 (at epoch 37) for translation from Slovene to English and a SacreBLEU score of 53.8191 (at epoch 47) for translation from English to Slovene.
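
For readers unfamiliar with the metric behind the scores above, the following is a minimal sketch of a corpus-level BLEU computation in plain Python. It is not SacreBLEU's implementation: it assumes whitespace tokenization, a single reference per sentence, and no smoothing, whereas SacreBLEU applies its own standardized tokenization, so the numbers it produces are not comparable to the scores reported here. The function names `ngram_counts` and `corpus_bleu` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_order=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (not taken from the model's evaluation setup):
    whitespace tokenization, one reference per hypothesis, no smoothing.
    """
    matches = [0] * max_order  # clipped n-gram matches per order
    totals = [0] * max_order   # hypothesis n-grams per order
    hyp_len = 0
    ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_order + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Counter intersection keeps the minimum count per n-gram,
            # so a hypothesis n-gram is credited at most as often as it
            # occurs in the reference (clipping).
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(totals) == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    # Brevity penalty: penalize output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu(model_outputs, reference_translations)` with two equally long lists of sentences returns a single score for the whole set, which is how a validation-corpus score such as those above is aggregated: n-gram statistics are pooled over all sentence pairs before the precisions are combined, rather than averaging per-sentence scores.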

Identifier
PID http://hdl.handle.net/11356/1736
Related Identifier https://github.com/clarinsi/Slovene_NMT
Related Identifier https://rsdo.slovenscina.eu/en/machine-translation
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1736
Provenance
Creator Lebar Bajec, Iztok; Repar, Andraž; Demšar, Jure; Bajec, Žan; Rizvič, Mitja; Kumperščak, Borut; Bajec, Marko
Publisher Faculty of Computer and Information Science, University of Ljubljana
Publication Year 2022
Rights Apache License 2.0; https://opensource.org/licenses/Apache-2.0; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene; English
Resource Type toolService
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 2
Discipline Linguistics