X-SRL Dataset and mBERT Word Aligner


This code implements a method for automatically aligning words in parallel sentences using pre-trained multilingual BERT (mBERT) embeddings. The alignment can be used to transfer source-side annotations (for example, labels on an English sentence) to the target side (for example, a German translation of that sentence) by assigning each label to the best-aligned target word. The resulting labeled data can then be used to train multilingual state-of-the-art (SOTA) models, which can improve performance, especially for lower-resource languages.
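As a rough illustration of the align-then-transfer idea described above, the sketch below greedily aligns each source word to the target word with the highest cosine similarity and copies the source label onto it. It is not the released aligner. It assumes word-level vectors have already been extracted from mBERT (for example by pooling subword embeddings, which is itself an assumption), and the toy three-dimensional vectors, the `"O"` empty label, and the function names `align` and `transfer_labels` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(src_vecs: Sequence[Vector], tgt_vecs: Sequence[Vector]) -> List[int]:
    """For each source word, return the index of its most similar target word."""
    alignment = []
    for s in src_vecs:
        scores = [cosine(s, t) for t in tgt_vecs]
        alignment.append(max(range(len(scores)), key=scores.__getitem__))
    return alignment


def transfer_labels(src_labels: Sequence[str], alignment: Sequence[int],
                    n_tgt: int, empty: str = "O") -> List[str]:
    """Copy each non-empty source label onto its aligned target word.

    If two labeled source words align to the same target word, the later
    one overwrites the earlier one (a simplification in this sketch).
    """
    tgt_labels = [empty] * n_tgt
    for i, label in enumerate(src_labels):
        if label != empty:
            tgt_labels[alignment[i]] = label
    return tgt_labels


# Hypothetical toy vectors standing in for mBERT word embeddings.
src_words = ["The", "cat", "sleeps"]
src_labels = ["O", "A0", "V"]
src_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

tgt_words = ["Die", "Katze", "schläft"]
tgt_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]

alignment = align(src_vecs, tgt_vecs)
tgt_labels = transfer_labels(src_labels, alignment, len(tgt_words))
print(alignment)   # [0, 1, 2]
print(tgt_labels)  # ['O', 'A0', 'V']
```

With real mBERT vectors the same greedy argmax applies. A more conservative variant would keep only pairs that are each other's best match in both directions to reduce noisy links; whether this aligner does so is not stated in this record.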

DOI: https://doi.org/10.11588/data/HVXXIJ
Creator: Daza, Angel
Publisher: heiDATA
Contributor: Daza, Angel
Publication Year: 2021
Contact: Daza, Angel (Leibniz Institute for the German Language / Department of Computational Linguistics, Heidelberg University)
Version: 1.0
