Choice of plausible alternatives dataset in Serbian COPA-SR

The COPA-SR dataset (Choice of plausible alternatives in Serbian) is a translation of the English COPA dataset (https://people.ict.usc.edu/~gordon/copa.html), produced following the XCOPA dataset translation methodology (https://arxiv.org/abs/2005.00333).

The dataset consists of 1,000 premises (My body cast a shadow over the grass), each paired with a question (What is the cause? / What happened as a result?) and two choices (The sun was rising; The grass was cut), with a label encoding which of the choices is more plausible according to the annotator or translator (The sun was rising).
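
To make the instance structure concrete, here is a minimal Python sketch of one instance, using the English example above. The field names (premise, choice1, choice2, question, label), the "cause"/"effect" question values, and the convention that label 0 points to the first choice are assumptions borrowed from the XCOPA convention. This record does not state them, so check them against the actual files.

```python
from dataclasses import dataclass


@dataclass
class CopaInstance:
    """One COPA-style instance (field names are assumed, not confirmed)."""

    premise: str
    choice1: str
    choice2: str
    question: str  # assumed values: "cause" or "effect"
    label: int     # assumed encoding: 0 -> choice1, 1 -> choice2

    def plausible_choice(self) -> str:
        """Return the choice that the label marks as more plausible."""
        return self.choice1 if self.label == 0 else self.choice2


# The example from the description above, in English for readability.
example = CopaInstance(
    premise="My body cast a shadow over the grass",
    choice1="The sun was rising",
    choice2="The grass was cut",
    question="cause",
    label=0,
)

print(example.plausible_choice())
```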

The dataset follows the same format as the Croatian COPA-HR dataset (http://hdl.handle.net/11356/1404) and Macedonian COPA-MK dataset (http://hdl.handle.net/11356/1687). It is split into training (400 instances), validation (100 instances) and test (500 instances) JSONL files.
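
The three splits could be read and sanity-checked with a short Python sketch like the one below. The file names (train.jsonl, val.jsonl, test.jsonl) and the helper names are hypothetical, since this record only says that three JSONL files exist. Only the expected split sizes come from the description above.

```python
import json
from pathlib import Path

# Split sizes as stated in the dataset description.
EXPECTED_SIZES = {"train": 400, "val": 100, "test": 500}


def read_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_splits(data_dir, file_pattern="{split}.jsonl"):
    """Load all splits; the file naming pattern is an assumption."""
    data_dir = Path(data_dir)
    return {
        split: read_jsonl(data_dir / file_pattern.format(split=split))
        for split in EXPECTED_SIZES
    }


def check_sizes(splits):
    """Map each split name to whether its size matches the description."""
    return {
        name: len(rows) == EXPECTED_SIZES[name]
        for name, rows in splits.items()
    }
```

Calling check_sizes(load_splits("path/to/copa-sr")) would then report, per split, whether the number of instances matches the documented 400/100/500 split. Adjust file_pattern if the downloaded files are named differently.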

Translation of the dataset was performed by the ReLDI Centre Belgrade (https://reldi.spur.uzh.ch/).

Identifier
PID http://hdl.handle.net/11356/1708
Related Identifier https://www.clarin.si/info/k-centre/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1708
Provenance
Creator Ljubešić, Nikola; Starović, Mirjana; Kuzman, Taja; Samardžić, Tanja
Publisher Jožef Stefan Institute
Publication Year 2022
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Serbian
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 3
Discipline Linguistics