Hamburg Corpus of Argentinean Spanish (HaCASpa)

Dataset

Audio and video recordings of experimental/read and spontaneous speech from adult speakers of Porteño Spanish in Argentina. Speakers are 18-69 years old and from two geographic areas. For the intonational experiments, there are audio recordings only, whereas some of the free interviews and map tasks feature video recordings. The material used as stimuli in the experiments is available with references encoded in the transcriptions.

The Hamburg Corpus of Argentinean Spanish (HaCASpa) was compiled in December 2008 and November/December 2009 within the context of the research project "The intonation of Spanish in Argentina" (H9, director: Christoph Gabriel), part of the Collaborative Research Centre "Multilingualism", funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) and hosted by the University of Hamburg. It comprises data from two varieties of Argentinean Spanish, i.e. a) the dialect spoken in the capital, Buenos Aires (also called Porteño, derived from puerto 'harbor'), and b) the variety of the Neuquén/Comahue area (Northern Patagonia). The seven parts of HaCASpa correspond to the seven tasks described in more detail below.

Five experiments were carried out in order to elicit specific data for research in prosody, with a main focus on intonation (Tasks 1–5); in addition, several speakers took part in a free interview (Task 6) and a map task experiment (Task 7). The Task is encoded as a metadata attribute for each communication. HaCASpa comprises three different types of spoken data, depending on the Task, i.e. spontaneous, semi-spontaneous, and scripted speech. This information corresponds to the metadata attribute Speech type.

The regional dimension of the corpus is represented through the attribute Area (i.e. Buenos Aires or Neuquén/Comahue), its diachronic dimension through the attribute Age group (i.e. Under 25/Over 25). The subjects are 60 native speakers of the relevant variety of Argentinean Spanish, i.e. Buenos Aires (Porteño) or Neuquén/Comahue Spanish. For each speaker, the following information is available: Age, Education, Occupation, Year of school enrollment, Year of school graduation and Parents' mother tongue.
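As an illustration of how the Area and Age group attributes partition the speakers, the following minimal Python sketch filters a small set of invented speaker records. The Speaker class, the speaker codes and the ages are hypothetical and not taken from the corpus; the description also does not say which group a speaker aged exactly 25 belongs to, so the boundary used here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    code: str   # hypothetical speaker code, not a real corpus identifier
    area: str   # "Buenos Aires" or "Neuquén/Comahue"
    age: int


def age_group(age: int) -> str:
    # Assumption: a speaker aged exactly 25 is counted as "Over 25".
    return "Under 25" if age < 25 else "Over 25"


def select(speakers: list[Speaker], area: str, group: str) -> list[Speaker]:
    """Return the speakers that match both the Area and the Age group."""
    return [s for s in speakers if s.area == area and age_group(s.age) == group]


# Invented example records, for illustration only.
speakers = [
    Speaker("S01", "Buenos Aires", 22),
    Speaker("S02", "Buenos Aires", 47),
    Speaker("S03", "Neuquén/Comahue", 19),
    Speaker("S04", "Neuquén/Comahue", 63),
]

young_portenos = select(speakers, "Buenos Aires", "Under 25")
print([s.code for s in young_portenos])
```

The same pattern extends to the other speaker attributes (Education, Occupation, and so on) once they are added as fields.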

The current version 0.2 contains mainly orthographic transcriptions of verbal behaviour (141,000 transcribed words) and codes that relate utterances to the materials used for the experimental tasks.

Experimental design

Task (1) consists of two subparts: reading a story (1a) and retelling it (1b). For (1a), the subjects were asked to read the short story "The North Wind and the Sun", which was presented on a computer screen, twice. The fable is well known for its use in phonetic descriptions of different languages (see Handbook of the International Phonetic Association, International Phonetic Association. Cambridge: Cambridge University Press, 2005); the Latin American version we used in our data stems from the Dialectoteca del español (coordination: C.-E. Piñeros). For (1b), the speakers were instructed to retell the story in their own words without being able to consult the text. With the help of these two parts, data of scripted (part 1a) as well as of semi-spontaneous speech (part 1b) could be collected.

Task (2) was designed to collect data of semi-spontaneous speech by asking the subjects to answer questions pertaining to a given picture story. In a first step, the speakers were familiarized with the story, which was presented as two pictures displayed on a computer screen. In a second step, they were asked to answer specific questions about the story. The questions were also presented on the computer screen and varied in their design in order to elicit answers with different information-structural readings (such as broad vs. narrow focus or different focus types). In general, the speakers were free to answer as they wished. However, in order to avoid single-word answers, they were asked to utter complete sentences.

Task (3) consisted of reading question-answer pairs, the content of which was based on the picture stories already familiar from task (2). The answers were given together with the questions on the computer screen (i.e. one question / one answer) and the speakers simply had to read both the question and the answer.

Task (4) was a reading task in which the subjects were asked to utter 10 simple subject-verb-object (SVO) sentences, presented on a computer screen. The speakers were instructed to read them at both a normal and a fast speech rate. Along the lines proposed in D'Imperio et al. 2005 ("Intonational Phrasing in Romance: The Role of Syntactic and Prosodic Structure", in: Prosodies: With Special Reference to Iberian Languages, ed. by Frota, S. et al., Berlin: Mouton de Gruyter, 59-97), the subject and object constituents differed in their syntactic and prosodic complexity (e.g. determiner plus noun or determiner plus noun plus adjective and one or three prosodic words, respectively). The participants were instructed to read the sentences as if they contained new information. The complete experiment design is described in Gabriel, C. et al. 2011 ("Prosodic phrasing in Porteño Spanish", in: Intonational Phrasing in Romance and Germanic: Cross-Linguistic and Bilingual Studies, ed. by Gabriel, C. & Lleó, C., Amsterdam: Benjamins, 153-182).

Task (5), the so-called intonation survey, consisted of 48 situations designed to elicit various intonational contours with specific pragmatic meanings. In this inductive method, the researcher confronts the speaker with a series of hypothetical situations to which he or she is supposed to react verbally. In the Argentinean version of the questionnaire, the hypothetical situations were illustrated by appropriate pictures. The experimental design is described in more detail in Prieto, P. & Roseano, P. 2010 (eds). Transcription of Intonation of the Spanish Language. Munich: Lincom; see also the Interactive atlas of Spanish intonation (coordination: P. Prieto & P. Roseano).

Task (6) collected spontaneous speech data through free interviews. In this task, the subjects were asked to tell the interviewer something about a past experience, be it a vacation or memories of Argentina as it was decades ago. Even though the interviewer was still part of the conversation, it was mainly the subjects who spoke during the recordings.

Task (7) consists of Map Task dialogs. Map Task is a technique employed to collect data of spontaneous speech in which two subjects cooperate to complete a specified task. It is designed to lead the subjects to produce particular interrogative patterns. Each of the two subjects receives a map of an imaginary town marked with buildings and other specific elements. A route is marked on the map of one of the two participants, who assumes the role of the instruction-giver. The version of the same map given to the other participant, who assumes the role of the instruction-follower, differs from that of the instruction-giver in that it does not show the route to be followed. The instruction-follower therefore must ask the instruction-giver questions in order to be able to reproduce the same route on his or her own map (see also the Interactive atlas of Spanish intonation).
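The relation between Task and Speech type can be summarized as a lookup table. The Python sketch below records only the speech types that the task descriptions above name explicitly; for Tasks 3 to 5 the descriptions do not name a speech type, so these entries are left empty rather than guessed. The names SPEECH_TYPE_BY_TASK and tasks_with_speech_type are illustrative and not part of the corpus or its tooling.

```python
from typing import Dict, List, Optional

# Speech type per task, restricted to what the task descriptions state outright.
# None means the description does not name a speech type for that task.
SPEECH_TYPE_BY_TASK: Dict[str, Optional[str]] = {
    "1a": "scripted",          # reading "The North Wind and the Sun"
    "1b": "semi-spontaneous",  # retelling the story
    "2": "semi-spontaneous",   # answering questions about a picture story
    "3": None,                 # reading question-answer pairs
    "4": None,                 # reading SVO sentences
    "5": None,                 # intonation survey
    "6": "spontaneous",        # free interview
    "7": "spontaneous",        # map task
}


def tasks_with_speech_type(speech_type: str) -> List[str]:
    """Return the tasks whose description names the given speech type."""
    return [task for task, st in SPEECH_TYPE_BY_TASK.items() if st == speech_type]


print(tasks_with_speech_type("spontaneous"))
```

For the authoritative assignment, including Tasks 3 to 5, the Speech type metadata attribute of each communication should be consulted.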

CLARIN Metadata summary for Hamburg Corpus of Argentinean Spanish (HaCASpa) (CMDI-based)

Title: Hamburg Corpus of Argentinean Spanish (HaCASpa)
Description: Audio and video recordings of experimental/read and spontaneous speech from adult speakers of Porteño Spanish in Argentina. Speakers are 18-69 years old and from two geographic areas. For the intonational experiments, there are audio recordings only, whereas some of the free interviews and map tasks feature video recordings. The material used as stimuli in the experiments is available with references encoded in the transcriptions.
Publication date: 2011-06-30
Data owner: Christoph Gabriel, Institut für Romanistik / Von-Melle-Park 6 / D-20146 Hamburg, christoph.gabriel@uni-hamburg.de
Contributors: Christoph Gabriel, Institut für Romanistik / Von-Melle-Park 6 / D-20146 Hamburg, christoph.gabriel@uni-hamburg.de (compiler)
Project: H9 "The intonation of Spanish in Argentina", German Research Foundation (DFG)
Keywords: contact variety, cross-sectional data, regional variety, language contact, EXMARaLDA
Language: Spanish (spa)
Size: 63 speakers (39 female, 24 male), 259 communications, 261 recordings, 1119 minutes, 261 transcriptions, 141321 words
Annotation types: transcription (manual): mainly orthographic, project-specific conventions, code: reference to underlying prompts
Temporal Coverage: 2008-11-01/2009-12-01
Spatial Coverage: Buenos Aires, AR; Neuquén/Comahue, AR
Genre: discourse
Modality: spoken

Identifier
DOI	https://doi.org/10.25592/uhhfdm.1438
Related Identifier	https://doi.org/10.25592/uhhfdm.1437
Metadata Access	https://www.fdr.uni-hamburg.de/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:fdr.uni-hamburg.de:1438
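
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the following Python snippet shows how such a URL is assembled from its parts (endpoint, verb, metadataPrefix, identifier). It only builds the string and does not contact the server; the function name build_get_record_url is illustrative.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access link of this record.
OAI_ENDPOINT = "https://www.fdr.uni-hamburg.de/oai2d"


def build_get_record_url(identifier: str, metadata_prefix: str = "oai_datacite") -> str:
    """Assemble an OAI-PMH GetRecord URL for the given record identifier."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":",  # keep the colons of the OAI identifier unescaped, as in the link above
    )
    return f"{OAI_ENDPOINT}?{query}"


url = build_get_record_url("oai:fdr.uni-hamburg.de:1438")
print(url)
```

Since access to the dataset itself is restricted, this request concerns only the metadata record, not the recordings or transcriptions.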

Provenance
Creator	Gabriel, Christoph
Publisher	Universität Hamburg
Publication Year	2011
Rights	Restricted Access; info:eu-repo/semantics/restrictedAccess
OpenAccess	false

Representation
Resource Type	Dataset
Version	0.2
Discipline	Humanities; Linguistics