Replication data for: Slangs go online, or the rise and fall of the Olbanian language

Dataset

All the data were taken from the website udaff.com (the center of the padonki culture and one of the cradles of the Olbanian language), from the section kreativy ('creative stories'), where users upload their own short stories. This is one of the oldest and most important sections of the website, and its name is a symbol of padonki culture. It was chosen as the largest and most diachronically representative collection of texts a) with a large number of erratic spellings; b) written by people who identify themselves as padonki, i.e. "native speakers" of Olbanian. Texts were selected from 975 webpages covering the period from January 2001 to December 2011. One text was selected randomly from each page (each page contained 50 texts), and a random fragment of 100 words was extracted from it for analysis. If a text was for some reason not suitable for analysis (e.g. it was shorter than 100 words), another random text was selected from the same page. This resulted in 975 100-word fragments produced by 729 authors (156 authors produced more than one text; the largest number of texts per author was nine, and the mean was 1.34). No adjustment was made for the fact that some authors had more than one fragment included in the sample: while this gives their idiolect additional chances to contribute to the observed variation, that must mirror the actual situation. For every word, the number of deviations from the norm it contained was noted. All kinds of deviations were counted, and not all of them are strictly Olbanian. However, the analysis of the distribution of deviations across different types shows that the number of indisputably non-Olbanian deviations is relatively small and constant and does not distort the general picture.
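
The sampling procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' actual script: the function name `sample_fragment` is hypothetical, words are assumed to be whitespace-separated tokens, the fragment is assumed to be a contiguous run of 100 words, and "another random text" is modelled as drawing from the same page without replacement.

```python
import random

FRAGMENT_LEN = 100  # fragment length in words, as stated in the description


def sample_fragment(page_texts, rng, fragment_len=FRAGMENT_LEN):
    """Pick one random text from a page and cut a random fragment from it.

    Hypothetical helper: tokenization by whitespace and a contiguous
    fragment are assumptions, not details given in the description.
    """
    candidates = list(page_texts)
    rng.shuffle(candidates)  # random order = random draws without replacement
    for text in candidates:
        words = text.split()
        if len(words) < fragment_len:
            continue  # text too short to be suitable, try another one
        start = rng.randrange(len(words) - fragment_len + 1)
        return words[start:start + fragment_len]
    return None  # no suitable text on this page


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed only to make the demo repeatable
    demo_page = [
        "too short to use",
        " ".join(f"word{i}" for i in range(300)),
    ]
    fragment = sample_fragment(demo_page, rng)
    print(len(fragment))
```

Applied once to each of the 975 pages, a procedure of this kind yields one 100-word fragment per page, which is consistent with the 975 fragments reported.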

Identifier
DOI	https://doi.org/10.18710/2NKJPG
Metadata Access	https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/2NKJPG
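
The metadata access link above has the shape of an OAI-PMH `GetRecord` request. As a minimal sketch, the same URL can be assembled and fetched with the Python standard library; the helper names `build_record_url` and `fetch_record` are hypothetical, and the sketch assumes the endpoint is still reachable and returns XML.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_ENDPOINT = "https://dataverse.no/oai"  # base URL taken from the record above


def build_record_url(doi, prefix="oai_datacite"):
    """Assemble the GetRecord URL for a DOI (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": prefix,
            "identifier": f"doi:{doi}",
        },
        safe=":/",  # keep ':' and '/' unescaped, as in the link listed above
    )
    return f"{OAI_ENDPOINT}?{query}"


def fetch_record(doi):
    """Download the raw metadata as text (requires network access).

    Assumes the response is UTF-8 encoded XML.
    """
    with urlopen(build_record_url(doi)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(build_record_url("10.18710/2NKJPG"))
```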

Provenance
Creator	Berdicevskis, Aleksandrs; Zvereva, Vera
Publisher	DataverseNO
Contributor	Berdicevskis, Aleksandrs; University of Bergen; The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year	2014
Rights	CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess	true
Contact	Berdicevskis, Aleksandrs (UiT The Arctic University of Norway)

Representation
Resource Type	corpus; Dataset
Format	text/plain; charset=US-ASCII
Size	23303
Version	1.3
Discipline	Humanities