The Sarajevo Corpus of SMS Messages in Bosnian 1.0

This corpus is specialized, static (i.e., no future growth is planned) and diachronic; it covers the period from 2002 to 2022.

The SMS messages included in this corpus were obtained from voluntary donors (informants). Both senders and recipients of the messages are Bosnian speakers who differ in age, education and occupation, place of origin and countries of long-term residence.

The Sarajevo Corpus of SMS Messages in Bosnian was originally published by the University of Sarajevo – Faculty of Philosophy as an electronic book. In the second phase of the work, the SMS messages were compiled into a corpus and linguistically annotated with the CLASSLA package (https://github.com/clarinsi/classla), version 2.1, using language = Serbian and type = nonstandard for tokenization, lemmatization and morpho-syntactic tagging (both MULTEXT-East and Universal Dependencies).
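
The annotation setup described above can be approximated as follows. This is a minimal sketch, not the corpus authors' actual script: it assumes the `classla` Python package is installed, that `sr` is the model code corresponding to language = Serbian, and that the processor list below covers the three annotation steps named in the description. The helper name `annotate` is hypothetical.

```python
# Settings taken from the corpus description. The model code "sr" and the
# processor list are assumptions of this sketch, not documented by the corpus.
LANG = "sr"
MODEL_TYPE = "nonstandard"
PROCESSORS = "tokenize,pos,lemma"


def annotate(text):
    """Return CoNLL-U formatted annotation for `text` (hypothetical helper)."""
    # Imported inside the function so the settings above can be inspected
    # on a machine where classla is not installed.
    import classla

    # Fetch the nonstandard Serbian models before building the pipeline.
    classla.download(LANG, type=MODEL_TYPE)
    nlp = classla.Pipeline(LANG, type=MODEL_TYPE, processors=PROCESSORS)
    return nlp(text).to_conll()
```

Calling `annotate()` on a single message string would return one CoNLL-U block with lemmas and morpho-syntactic tags for each token. The first call needs network access to fetch the models.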

Identifier
PID http://hdl.handle.net/11356/1913
Related Identifier http://hdl.handle.net/11356/1956
Related Identifier https://www.ff.unsa.ba/index.php/bs/projekti-centra-za-b-h-s-jezik/18335-sarajevski-korpus-sms-poruka-na-bosanskom-jeziku
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1913
Provenance
Creator Wasserscheidt, Philipp; Bulić, Halid; Durmišević, Elma; Hodžić-Čavkić, Azra; Bajraktarević, Enisa; Ahmetspahić-Peljto, Azra; Šabić, Belmin
Publisher University of Sarajevo – Faculty of Philosophy
Publication Year 2024
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Bosnian
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline Linguistics
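
The Metadata Access URL listed above is an OAI-PMH GetRecord request for Dublin Core (`oai_dc`) metadata. The following sketch, using only the Python standard library, rebuilds that URL and collects the Dublin Core fields from a response. The function names are hypothetical, and which fields the server actually returns, or whether the endpoint is still reachable, is an assumption.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://www.clarin.si/repository/oai/request"
DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def record_url(identifier="oai:www.clarin.si:11356/1913"):
    """Build the GetRecord URL for one repository item."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": "oai_dc",
        "identifier": identifier,
    }
    # Keep ":" and "/" unescaped so the URL matches the listed form.
    return BASE_URL + "?" + urllib.parse.urlencode(query, safe=":/")


def dublin_core_fields(xml_data):
    """Map each Dublin Core element name to the list of its text values."""
    fields = {}
    for element in ET.fromstring(xml_data).iter():
        if element.tag.startswith(DC_NS) and element.text:
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append(element.text.strip())
    return fields


def fetch_record(url=None):
    """Download one record and return its Dublin Core fields (needs network)."""
    with urllib.request.urlopen(url or record_url(), timeout=30) as response:
        return dublin_core_fields(response.read())
```

Keeping the parsing step separate from the download makes it possible to check `dublin_core_fields` against a saved response without contacting the repository.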