Background data for: Some obstacles to replication in corpus linguistics

This dataset contains tabular files recording the occurrences and frequencies of modal verbs in the Brown family of corpora. It covers nine modal verbs (can, could, may, might, must, shall, should, will, would) and six corpora (Brown, LOB, Frown, FLOB, BE06, AmE06). Tokens were retrieved using the CQPweb interface provided by Lancaster University, and the tables include several text-level variables (text length, broad genre, text category, corpus, time period, variety). The data are provided in two formats: (i) case form, where each token (77,872 in total) is listed separately together with its context of occurrence (10 words to the left and 10 to the right); and (ii) frequency form, which records how often each modal verb appears in every text, with one row per text-modal combination (27,000 in total: 6 corpora × 500 texts × 9 modals).
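
The relationship between the two formats can be illustrated with a minimal sketch. It is not the author's processing script (the dataset lists R as the software used), and the column names text_id, modal and n, as well as the toy rows, are hypothetical and not taken from the actual files. The row count of 27,000 equals the full cross of 6 × 500 texts with 9 modals, which suggests that text-modal combinations with no occurrences appear as rows with a count of zero; the sketch assumes this.

```python
from collections import Counter
from itertools import product

MODALS = ["can", "could", "may", "might", "must",
          "shall", "should", "will", "would"]


def to_frequency_form(tokens, text_ids, modals=MODALS):
    """Aggregate case-form rows into one row per text-modal combination.

    tokens:   iterable of dicts with the hypothetical keys "text_id" and "modal"
    text_ids: every text in the corpus, so that combinations without any
              token still receive a row with a count of 0
    """
    counts = Counter((row["text_id"], row["modal"]) for row in tokens)
    return [
        {"text_id": text_id, "modal": modal, "n": counts[(text_id, modal)]}
        for text_id, modal in product(text_ids, modals)
    ]


# Toy case-form data (invented for illustration): one row per token.
case_form = [
    {"text_id": "A01", "modal": "can"},
    {"text_id": "A01", "modal": "can"},
    {"text_id": "A01", "modal": "would"},
    {"text_id": "A02", "modal": "must"},
]

freq_form = to_frequency_form(case_form, text_ids=["A01", "A02"])
print(len(freq_form))  # 2 texts x 9 modals
```

Passing the complete list of text identifiers separately, rather than deriving it from the tokens, is what keeps texts that contain none of the nine modals in the output.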

CQPweb, 3.3.18

R, 4.2.1

Identifier
DOI https://doi.org/10.18710/7LNWJX
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/7LNWJX
Provenance
Creator Sönning, Lukas (ORCID: 0000-0002-2705-395X)
Publisher DataverseNO
Contributor Sönning, Lukas; University of Bamberg; The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year 2024
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Sönning, Lukas (University of Bamberg)
Representation
Resource Type corpus data; Dataset
Format text/plain; text/tsv; application/octet-stream
Size 27320; 13334484; 1517836; 51468
Version 1.0
Discipline Design; Fine Arts, Music, Theatre and Media Studies; Humanities; Linguistics
Spatial Coverage Bamberg, Germany