Structure of lexicon in the automatic acquisition of lexical information [research data]

Research data from the thesis "Structure of lexicon in the automatic acquisition of lexical information". The data are organized in two folders: (i) MONO-DATASETS, which contains the single-sense English data sets extracted for each lexical semantic class studied in the thesis (COMMUNICATION_OBJECT, EVENT, HUMAN, LOCATION and ORGANIZATION), where each data set is a list of nouns with their class membership information; and (ii) LEXICO-SYNTACTIC-REs, which contains the linguistic cues and corresponding lexical-syntactic patterns, formalized as Regular Expressions, for each of the same single-sense English lexical semantic classes. The second folder also contains a file of the unmarked contexts that were identified, together with their corresponding lexical-syntactic patterns formalized as Regular Expressions.

The MONO-DATASETS folder contains 5 files, one for each lexical-semantic class studied in this thesis. Each file contains two columns: Column 1 contains a target noun and Column 2 contains the ClassID of that noun.

List of files contained in the folder:
1) COMM_GS.txt
2) EVENT_GS.txt
3) HUMAN_GS.txt
4) LOC_GS.txt
5) ORG_GS.txt

Example:

NOUN	CLASSID

0 = not class member; 1 = class member

HUMAN-CLASS
villager	1
volume	0
volunteer	1
warranty	0

The LEXICO-SYNTACTIC-REs folder contains 6 files.

5 files contain the Regular Expressions of the class-indicative patterns that we have identified for each class. 1 file contains the Regular Expressions for the unmarked contexts that we identified and studied.

All of the Regular Expressions have been written to match the Penn Treebank Part-of-Speech Tagset (reference: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html).

Target nouns are indicated as: /#/#N[A-Z]+//[a-z]+/#/#/
Pattern begins after numerical marker: [1-9]+&>&

List of files contained in the folder:
1) COMM-CUES.txt
2) EVENT-CUES.txt
3) HUMAN-CUES.txt
4) LOC-CUES.txt
5) ORG-CUES.txt
6) UNMARKED-CUES.txt
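
The two-column layout of the MONO-DATASETS files (target noun, then a ClassID of 0 or 1) can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset itself: the function name parse_gold_standard is hypothetical, the columns are assumed to be separated by whitespace (for example a tab), and any line that does not end in a 0 or 1 field, such as a header, is assumed to be skippable.

```python
from typing import Dict, Iterable


def parse_gold_standard(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each target noun to True (class member) or False (not a member).

    Assumption: columns are whitespace-separated and Column 2 is "0" or "1".
    """
    membership: Dict[str, bool] = {}
    for raw in lines:
        parts = raw.split()
        # Skip blank lines, headers, and anything that is not "noun 0|1".
        if len(parts) != 2 or parts[1] not in ("0", "1"):
            continue
        noun, class_id = parts
        membership[noun] = class_id == "1"
    return membership


# Sample rows taken from the HUMAN example in the dataset description.
sample = [
    "HUMAN-CLASS",
    "villager\t1",
    "volume\t0",
    "volunteer\t1",
    "warranty\t0",
]
labels = parse_gold_standard(sample)
members = sorted(noun for noun, is_member in labels.items() if is_member)
print(members)  # ['villager', 'volunteer']
```

Because the function accepts any iterable of lines, an open file handle for one of the listed files (for example HUMAN_GS.txt) could be passed in place of the sample list, provided the file follows the assumed layout.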

Identifier
DOI https://doi.org/10.34810/data260
Metadata Access https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data260
Provenance
Creator Romeo, Lauren Michele
Publisher CORA.Repositori de Dades de Recerca
Publication Year 2023
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Representation
Resource Type Textual data; Dataset
Format text/plain
Size 1951; 6340; 854; 3433; 5928; 3807; 5467; 3626; 3488; 2966; 3105; 2364; 1928
Version 1.0
Discipline Other