Research data from the thesis "Structure of lexicon in the automatic acquisition of lexical information". The data are organized in two folders: (i) MONO-DATASETS, which contains the single-sense English data sets extracted for each lexical semantic class studied in the thesis (COMMUNICATION_OBJECT, EVENT, HUMAN, LOCATION and ORGANIZATION), each data set being a list of nouns with their class membership information; and (ii) LEXICO-SYNTACTIC-REs, which contains the linguistic cues and corresponding lexico-syntactic patterns, formalized as Regular Expressions, for each of the same single-sense English lexical semantic classes. The LEXICO-SYNTACTIC-REs folder also contains a file of the unmarked contexts that were identified, together with their corresponding lexico-syntactic patterns formalized as Regular Expressions.
The MONO-DATASETS folder contains 5 files, one for each lexical-semantic class studied in this thesis. Each file contains two columns: Column 1 contains a target noun and Column 2 contains the ClassID of that noun.

List of files contained in the folder:
1) COMM_GS.txt
2) EVENT_GS.txt
3) HUMAN_GS.txt
4) LOC_GS.txt
5) ORG_GS.txt

Example:

NOUN        CLASSID

0 = not class member; 1 = class member

HUMAN-CLASS
villager    1
volume      0
volunteer   1
warranty    0

The LEXICO-SYNTACTIC-REs folder contains 6 files.

5 files contain the Regular Expressions of the class-indicative patterns that we have identified for each class. 1 file contains the Regular Expressions for the unmarked contexts that we identified and studied.

All of the Regular Expressions have been written to match the Penn Treebank Part-of-Speech Tagset (reference: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html).

Target nouns are indicated as: /#/#N[A-Z]+//[a-z]+/#/#/
Pattern begins after numerical marker: [1-9]+&>&

List of files contained in the folder:
1) COMM-CUES.txt
2) EVENT-CUES.txt
3) HUMAN-CUES.txt
4) LOC-CUES.txt
5) ORG-CUES.txt
6) UNMARKED-CUES.txt
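
As an illustration only, the two file layouts described above could be read along the following lines. This is a minimal sketch, not the code used in the thesis. It assumes that each data set row is a noun followed by a final 0 or 1 (with or without whitespace between the columns), and that each cue line starts with the numerical marker [1-9]+&>& described above. The function names parse_gold_standard and split_cue_line and the pattern body "foo" are hypothetical.

```python
import re

# Assumed row layout: a noun, optional whitespace, then a final 0 or 1.
ROW = re.compile(r"^(?P<noun>.+?)\s*(?P<label>[01])$")

# Marker as documented: one or more digits 1-9, then the literal "&>&".
# Assumption: the marker sits at the very start of the line.
MARKER = re.compile(r"^(?P<id>[1-9]+)&>&(?P<pattern>.*)$")


def parse_gold_standard(lines):
    """Return {noun: is_member} for rows such as 'villager 1'."""
    labels = {}
    for raw in lines:
        match = ROW.match(raw.strip())
        if match:  # blank lines and header lines do not match and are skipped
            labels[match.group("noun")] = match.group("label") == "1"
    return labels


def split_cue_line(line):
    """Split a cue line into (marker, pattern); None if there is no marker."""
    # Only the newline is removed, since trailing spaces may belong to the pattern.
    match = MARKER.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group("id"), match.group("pattern")


# Rows taken from the HUMAN-CLASS example above.
sample = ["villager\t1", "volume\t0", "volunteer\t1", "warranty\t0"]
labels = parse_gold_standard(sample)
members = sorted(noun for noun, is_member in labels.items() if is_member)
print(members)  # ['villager', 'volunteer']

# "foo" is a made-up pattern body, used only to show the split.
print(split_cue_line("12&>&foo"))  # ('12', 'foo')
```

To read a real file, the lines of an opened file object (for example HUMAN_GS.txt) can be passed to parse_gold_standard in place of the sample list. Note that the marker, as documented, uses only the digits 1 to 9, so a line whose number contains a 0 would not be split by this sketch.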