The COPA-MK dataset (Choice of Plausible Alternatives in Macedonian) is a translation of the English COPA dataset, produced by following the XCOPA dataset translation methodology.
The dataset consists of 1,000 premises (My body cast a shadow over the grass), each paired with a question (What is the cause? / What happened as a result?) and two choices (The sun was rising; The grass was cut), with a label encoding which of the choices the annotator or translator judged more plausible (The sun was rising).
The dataset follows the same format as the Croatian COPA-HR dataset. It is split into training (400 instances), validation (100 instances) and test (500 instances) JSONL files.
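As a rough illustration of working with JSONL files like these, the sketch below parses one record and looks up the more plausible choice. The field names (premise, choice1, choice2, question, label, idx) and the label encoding (0 for the first choice, 1 for the second) are assumptions borrowed from the English COPA convention, not confirmed by this description, and the sample record is hypothetical. Check them against the actual files before relying on them.

```python
import json

# Hypothetical record. Field names and label encoding are ASSUMED to follow
# the English COPA convention (label 0 -> choice1, label 1 -> choice2).
SAMPLE_JSONL = (
    '{"premise": "My body cast a shadow over the grass.", '
    '"choice1": "The sun was rising.", '
    '"choice2": "The grass was cut.", '
    '"question": "cause", "label": 0, "idx": 0}\n'
)


def parse_jsonl(text):
    """Parse JSONL text (one JSON object per line) into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def plausible_choice(record):
    """Return the text of the choice that the label marks as more plausible."""
    return record["choice1"] if record["label"] == 0 else record["choice2"]


records = parse_jsonl(SAMPLE_JSONL)
for rec in records:
    print(rec["premise"], "->", plausible_choice(rec))
```

To read an actual split, pass the contents of the corresponding file to `parse_jsonl`, for example `parse_jsonl(open(path, encoding="utf-8").read())`, where `path` points to whichever JSONL file you downloaded.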
Translation quality was ensured with the help of the ReLDI Centre Belgrade.