The Legal Case Passage Extraction and Retrieval benchmark is an information retrieval benchmark collection for court case passage retrieval. Specifically, it is a collection for evaluating Cited Case Passage Retrieval (CCPR) and contains case passages from the Austrian building regulations domain (Source: RIS). The following files are included in the dataset:
full_collection.tsv
A tab-separated file containing the passage texts of court cases from the building regulations domain. Column 1 contains the passage ID, column 2 the passage text, and column 3 the case ID (Geschäftszahl) of the case the passage originates from.
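As an illustration of the column layout, here is a minimal Python sketch for reading such a file. It assumes there is no header row and that fields are not quoted; neither is stated in this description. The function name `load_passages` and the sample rows are made up for the example.

```python
import csv
import io


def load_passages(handle):
    """Read three-column TSV rows into {passage_id: (text, case_id)}."""
    passages = {}
    # QUOTE_NONE keeps quotation marks inside legal text untouched (assumption:
    # the file does not use CSV-style quoting).
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip empty or malformed lines
        passage_id, text, case_id = row
        passages[passage_id] = (text, case_id)
    return passages


# Made-up sample rows standing in for full_collection.tsv.
sample = io.StringIO(
    "p1\tErster Beispieltext\t2010/05/0001\n"
    "p2\tZweiter Beispieltext\t2012/06/0042\n"
)
passages = load_passages(sample)
print(passages["p1"][1])  # prints 2010/05/0001
```

For the real file, the same function could be given a handle such as open("full_collection.tsv", encoding="utf-8", newline=""), where the UTF-8 encoding is also an assumption.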
queries.tsv
A tab-separated file containing the queries / topics for which relevance assessments exist in this collection. Column 1 contains the query ID, column 2 the query passage text, and column 3 the case ID (Geschäftszahl) of the cited case. For the CCPR task, results are intended to be additionally filtered to passages whose case ID exactly matches that of the query. For each query, relevance assessments exist only for passages whose case ID matches column 3.
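The case ID filter can be sketched as follows in Python. This is only one possible way to apply the exact-match restriction; the helper names `index_by_case` and `candidates` and all sample rows are hypothetical, not part of the dataset.

```python
from collections import defaultdict

# Made-up rows in the column order described above: (ID, text, case ID).
collection = [
    ("p1", "Passage text A", "2010/05/0001"),
    ("p2", "Passage text B", "2010/05/0001"),
    ("p3", "Passage text C", "2012/06/0042"),
]
queries = [("q1", "Query passage text", "2010/05/0001")]


def index_by_case(rows):
    """Group passage IDs by the case ID in column 3."""
    by_case = defaultdict(list)
    for passage_id, _text, case_id in rows:
        by_case[case_id].append(passage_id)
    return by_case


def candidates(query_row, by_case):
    """Return only the passages whose case ID exactly matches the query's."""
    _query_id, _text, case_id = query_row
    return by_case.get(case_id, [])


by_case = index_by_case(collection)
result = {q[0]: candidates(q, by_case) for q in queries}
print(result)  # prints {'q1': ['p1', 'p2']}
```

A ranking model would then only need to score the returned candidates, which are also the only passages for which assessments exist.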
qrel.json
Contains the relevance assessments for each query as a nested dictionary. A passage from the full collection is relevant for a query if qrel[query_id][passage_id] == 1. If a passage ID is not in qrel[query_id], the passage is not relevant. Relevance assessments exist only for full collection passages whose case ID matches that of the query.
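A relevance lookup could look like the Python sketch below. It assumes the file is a two-level dictionary keyed first by query ID and then by passage ID; the sample content and the function name `is_relevant` are made up. Whether non-relevant passages are stored with a value of 0 or simply omitted is not specified here, so the lookup treats both cases as not relevant.

```python
import json

# Made-up content standing in for qrel.json (assumed nested layout).
qrel = json.loads('{"q1": {"p1": 1, "p2": 0}}')


def is_relevant(qrel, query_id, passage_id):
    """True only if the passage is explicitly assessed with 1 for the query."""
    # Missing query IDs and missing passage IDs both count as not relevant.
    return qrel.get(query_id, {}).get(passage_id, 0) == 1


print(is_relevant(qrel, "q1", "p1"))  # prints True
print(is_relevant(qrel, "q1", "p9"))  # prints False
```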
qrel.json.txt
A conversion of the qrel.json file into a format compatible with trec_eval.
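The exact layout of the provided text file is not described here. As a rough sketch, the standard trec_eval qrels format has four whitespace-separated columns per line: query ID, an iteration field (conventionally 0), document ID, and relevance. The following Python snippet shows how a nested dictionary could be turned into such lines; the function name `to_trec_qrels` and the sample data are hypothetical.

```python
def to_trec_qrels(qrel):
    """Turn {query_id: {passage_id: relevance}} into trec_eval qrels lines."""
    lines = []
    for query_id in sorted(qrel):
        for passage_id in sorted(qrel[query_id]):
            relevance = int(qrel[query_id][passage_id])
            # The second column is the iteration field, conventionally 0.
            lines.append(f"{query_id} 0 {passage_id} {relevance}")
    return lines


# Made-up assessments in the assumed nested layout.
qrel = {"q1": {"p1": 1, "p2": 0}}
trec_lines = to_trec_qrels(qrel)
print("\n".join(trec_lines))
```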