Replication data for: Slangs go online, or the rise and fall of the Olbanian language

All data were taken from the website udaff.com (the center of padonki culture and one of the cradles of the Olbanian language), from the section kreativy ('creative stories'), where users upload their own short stories. This is one of the oldest and most important sections of the website, and its name is a symbol of padonki culture. It was chosen as the largest and most diachronically representative collection of texts that a) contain a large number of erratic spellings and b) are written by people who identify themselves as padonki, i.e. "native speakers" of Olbanian.

Texts were selected from 975 webpages covering the period from January 2001 to December 2011. One text was selected randomly from each page (each page contained 50 texts), and a random fragment of 100 words was extracted from it for analysis. If a text was for some reason unsuitable for analysis (e.g. it was shorter than 100 words), another random text was selected. This resulted in 975 100-word fragments produced by 729 authors (156 authors produced more than one text; the largest number of texts per author was nine, and the mean was 1.34). No adjustment was made for the fact that some authors had more than one fragment in the sample: while this gives their idiolects additional chances to contribute to the observed variation, it must mirror the actual situation.

For every word, the number of deviations from the norm that it contained was noted. All kinds of deviations were counted, and not all of them are strictly Olbanian. However, the analysis of the distribution of deviations across different types shows that the number of indisputably non-Olbanian deviations is relatively small and constant and does not distort the general picture.
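The sampling procedure described above (one random text per page, a fallback to another random text when the first is unsuitable, and one random 100-word fragment per chosen text) can be sketched roughly as follows. This is only an illustration, not the authors' actual script: the function names `pick_fragment` and `sample_corpus` are hypothetical, and the sketch assumes whitespace tokenization, a contiguous fragment, and text length as the only suitability criterion, none of which is specified in the description.

```python
import random

FRAGMENT_LEN = 100  # words per fragment, as stated in the description


def pick_fragment(page_texts, rng, fragment_len=FRAGMENT_LEN):
    """Return one random fragment from a random suitable text, or None.

    Hypothetical helper: 'suitable' is assumed to mean only
    'at least fragment_len words long'.
    """
    candidates = list(page_texts)
    rng.shuffle(candidates)  # random order gives a random pick with fallback
    for text in candidates:
        words = text.split()  # assumption: whitespace tokenization
        if len(words) < fragment_len:
            continue  # too short: try another random text from the same page
        # assumption: the fragment is a contiguous run of words
        start = rng.randrange(len(words) - fragment_len + 1)
        return words[start:start + fragment_len]
    return None  # no suitable text on this page


def sample_corpus(pages, seed=0):
    """Draw at most one fragment per page; pages is a list of lists of texts."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    fragments = []
    for page_texts in pages:
        fragment = pick_fragment(page_texts, rng)
        if fragment is not None:
            fragments.append(fragment)
    return fragments
```

Applied to 975 pages of 50 texts each, a procedure of this shape would yield up to 975 fragments of 100 words, one per page.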

Identifier
DOI https://doi.org/10.18710/2NKJPG
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/2NKJPG
Provenance
Creator Berdicevskis, Aleksandrs; Zvereva, Vera
Publisher DataverseNO
Contributor Berdicevskis, Aleksandrs; University of Bergen; The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year 2014
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact Berdicevskis, Aleksandrs (UiT The Arctic University of Norway)
Representation
Resource Type corpus; Dataset
Format text/plain; charset=US-ASCII
Size 23303
Version 1.3
Discipline Humanities