Legal CaPER Benchmark

The Legal Case Passage Extraction and Retrieval (CaPER) benchmark is an information retrieval test collection for court case passage retrieval. Specifically, it is designed for evaluating Cited Case Passage Retrieval (CCPR) and contains case passages from the Austrian building regulations domain (source: RIS). The dataset includes the following files:

full_collection.tsv: A tab-separated file containing the passage texts of court cases from the building regulations domain. Column 1 contains the passage ID, column 2 the passage text, and column 3 the case ID (Geschäftszahl) of the case from which the passage originates.

queries.tsv: A tab-separated file containing the queries (topics) for which relevance assessments exist in this collection. Column 1 contains the query ID, column 2 the query passage text, and column 3 the case ID (Geschäftszahl) of the cited case. For the CCPR task, results are intended to be additionally filtered by exact match on the case ID: for each query, relevance assessments exist only for passages whose case ID matches the one in column 3.
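The column layout and the case ID filter described above can be sketched in Python as follows. This is a minimal illustration, not the authors' tooling: it assumes the files have no header row and use no quoting, and the sample rows (including the case IDs) are invented stand-ins for full_collection.tsv and queries.tsv. The helper names read_tsv and candidates_for_query are hypothetical.

```python
import csv
import io

def read_tsv(handle):
    """Read a three-column TSV into a list of (id, text, case_id) tuples."""
    # Assumption: no header row and no quoted fields.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1], row[2]) for row in reader if len(row) >= 3]

def candidates_for_query(query, passages):
    """Keep only passages whose case ID exactly matches the query's cited case ID."""
    _, _, cited_case = query
    return [p for p in passages if p[2] == cited_case]

# Invented sample data; for the real files, pass e.g.
# open("full_collection.tsv", encoding="utf-8", newline="") instead.
collection_tsv = (
    "p1\tErster Absatz\t2010/05/0001\n"
    "p2\tZweiter Absatz\t2011/06/0002\n"
    "p3\tDritter Absatz\t2010/05/0001\n"
)
queries_tsv = "q1\tAnfragetext\t2010/05/0001\n"

passages = read_tsv(io.StringIO(collection_tsv))
queries = read_tsv(io.StringIO(queries_tsv))

# Restrict the candidate set to the cited case before ranking.
filtered = candidates_for_query(queries[0], passages)
print([p[0] for p in filtered])  # ['p1', 'p3']
```

Filtering before ranking keeps the candidate set aligned with the passages for which relevance assessments exist.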

qrel.json: Contains the relevance assessments for each query as a nested dictionary. A passage from the full collection is relevant for a query if qrel[query_id][passage_id] == 1. If a passage ID is not in qrel[query_id], the passage is not relevant. Relevance assessments exist only for full collection passages that match the case ID of the query.
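The lookup rule above might be implemented roughly as follows. This sketch assumes the file is a dictionary keyed first by query ID and then by passage ID, and that relevance labels are stored as integers; the sample data and the helper name is_relevant are made up for illustration.

```python
import json

def is_relevant(qrel, query_id, passage_id):
    """Return True only if the passage is explicitly labelled 1 for the query."""
    # Missing queries or passages count as not relevant.
    return qrel.get(query_id, {}).get(passage_id, 0) == 1

# Invented sample; for the real file, use
# json.load(open("qrel.json", encoding="utf-8")).
qrel = json.loads('{"q1": {"p1": 1, "p3": 0}}')

print(is_relevant(qrel, "q1", "p1"))  # True
print(is_relevant(qrel, "q1", "p2"))  # False: p2 is not listed for q1
```

Note that JSON object keys are always strings, so query and passage IDs read from the TSV files should be compared as strings.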

qrel.json.txt: A conversion of qrel.json to a format compatible with trec_eval.
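For orientation, trec_eval reads qrels as whitespace-separated lines of the form "query_id iteration document_id relevance", where the iteration field is conventionally 0. The sketch below shows how a nested relevance dictionary could be converted into such lines. It is not necessarily how qrel.json.txt was produced; the function name to_trec_qrels and the sample data are hypothetical.

```python
def to_trec_qrels(qrel):
    """Convert {query_id: {passage_id: label}} into trec_eval qrels lines."""
    lines = []
    for query_id in sorted(qrel):
        for passage_id in sorted(qrel[query_id]):
            label = int(qrel[query_id][passage_id])
            # Second field is the iteration, which trec_eval ignores.
            lines.append(f"{query_id} 0 {passage_id} {label}")
    return lines

# Invented sample in the assumed shape of qrel.json.
sample = {"q1": {"p1": 1, "p3": 0}}
for line in to_trec_qrels(sample):
    print(line)
```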

Identifier
DOI	https://doi.org/10.48436/5caar-3r468
Related Identifier	IsVersionOf https://doi.org/10.48436/eyjff-bv654
Metadata Access	https://researchdata.tuwien.ac.at/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:researchdata.tuwien.ac.at:5caar-3r468

Provenance
Creator	Fink, Tobias
Publisher	TU Wien
Contributor	Ehmer, Judith; Feurer, Silvana; Piroi, Florina
Publication Year	2023
Rights	Creative Commons Attribution 4.0 International; https://creativecommons.org/licenses/by/4.0/legalcode
OpenAccess	true
Contact	tudata(at)tuwien.ac.at

Representation
Language	German
Resource Type	Dataset
Version	1.0.0
Discipline	Other