Interference in spoken communication: Evaluating the corrupting and disrupting effects of other voices, 2016-2019


In everyday life, talking with other people is important not only for sharing knowledge and ideas, but also for maintaining a sense of belonging to a community. Most people take it for granted that they can converse with others with little or no effort. Successful communication involves understanding what is being said and being understood, but it is quite rare to hear the speech of a particular talker in isolation. Speech is typically heard in the presence of interfering sounds, which are often the voices of other talkers. The human auditory system, which is responsible for our sense of hearing, therefore faces the challenge of identifying which parts of the sounds reaching our ears have come from which talker. Solving this "auditory scene analysis" problem involves separating those sound elements arising from one source (e.g., the voice of the talker to whom you are attending) from those arising from other sources, so that the identity and meaning of the target source can be interpreted by higher-level processes in the brain. Over the course of evolution, humans have been exposed to a variety of complex listening environments, and so we are generally very successful at understanding the speech of one person in the presence of other talkers. This contrasts with attempts to develop listening machines, which often fail when confronted with adverse conditions, such as automatic transcription of a conversation in an open-plan office. Human listeners with hearing impairment often find these environments especially difficult, even when using the latest developments in hearing-aid or cochlear-implant design, and so can struggle to communicate effectively in such conditions.

Much of the information necessary to understand speech (acoustic-phonetic information) is carried by the changes in frequency over time of a few broad peaks in the frequency spectrum of the speech signal, known as formants. The project aims to investigate how listeners presented with mixtures of target speech and interfering formants are able to group together the appropriate formants, and to reject others, such that the speech of the talker we want to listen to can be understood. Interfering sounds can have two kinds of effect - energetic masking, in which the neural response of the ear to the target is swamped by the response to the masker, and informational masking, in which the "auditory brain" fails to separate readily detectable parts of the target from the masker. The project will explore the informational masking component of interference - often the primary factor limiting speech intelligibility - using stimulus configurations that eliminate energetic masking. We will do so using perceptual experiments in which we measure how our ability to understand speech (e.g., the number of words reported correctly) changes under a variety of conditions. The project will examine how acoustic-phonetic information is combined across formants. It will also explore how a speech-like interferer affects intelligibility, distinguishing the circumstances in which the interferer takes up some of the available perceptual processing capacity from those in which specific properties of the interferer intrude into the perception of the target speech. Our approach is to use artificial speech-like stimuli with precisely controlled properties, to mix target speech with carefully designed interferers that offer alternative grouping possibilities, and to measure how manipulating the properties of these interferers affects listeners' abilities to recognise the target speech in the mixture. The results will improve our understanding of how human listeners separate speech from interfering sounds and the constraints on that separation, helping to refine computational models of listening. Such refinements will in turn provide ways of improving the performance of devices such as hearing aids and automatic speech recognisers when they operate in adverse listening conditions.
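As an illustration of the intelligibility measure described above (the number of words reported correctly), the following Python sketch scores a typed transcription against a set of target keywords. The function names, scoring rules, and example sentence are assumptions for illustration only; they are not taken from the deposited datasets or the project's own analysis code.

```python
# Minimal sketch of a keywords-correct intelligibility score, assuming scoring
# counts how many target keywords appear in a typed transcription. Names and
# example data are hypothetical, not drawn from the deposited datasets.
import re


def tokenize(text):
    """Lower-case a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keywords_correct(target_keywords, transcription):
    """Count target keywords reported in the transcription (each response word used once)."""
    remaining = tokenize(transcription)
    hits = 0
    for keyword in target_keywords:
        if keyword in remaining:
            hits += 1
            remaining.remove(keyword)  # prevent one response word matching twice
    return hits


# Example: 2 of 3 keywords reported correctly.
score = keywords_correct(["glove", "lay", "table"], "the dove lay on the table")
print(score / 3)  # proportion of keywords correct for this trial
```

Scores of this kind, averaged over trials within each condition, would give the per-condition intelligibility estimates referred to above.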

The datasets comprise behavioural responses to speech stimuli. These stimuli are either simplified analogues of spoken sentence-length utterances or syllables (for datasets 1-4 and 6) or signal-processed natural syllables (for dataset 5). For the utterances, the responses are the transcriptions entered by the participant using a keyboard. For the syllables, the responses are key presses indicating the perceived identity of the initial consonant. All volunteers have English as their first language.
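For the syllable datasets, a common way to summarise key-press responses of this kind is a confusion matrix of presented versus reported initial consonants. The sketch below shows one way to build such a table; the consonant labels and trial lists are hypothetical examples, not values taken from the datasets.

```python
# Minimal sketch of tabulating consonant-identification key presses into a
# confusion matrix (presented consonant x reported consonant). Labels and
# trial data below are hypothetical, not drawn from the deposited datasets.
from collections import Counter


def confusion_matrix(presented, reported, labels):
    """Return {presented_label: {reported_label: count}} over paired trials."""
    counts = {p: Counter() for p in labels}
    for p, r in zip(presented, reported):
        counts[p][r] += 1
    return {p: {r: counts[p][r] for r in labels} for p in labels}


labels = ["b", "d", "g"]                    # hypothetical response alternatives
presented = ["b", "b", "d", "g", "d", "g"]  # consonant presented on each trial
reported = ["b", "d", "d", "g", "d", "b"]   # key press recorded on each trial

for p, row in confusion_matrix(presented, reported, labels).items():
    print(p, row)
```

The diagonal of such a table gives the identification accuracy for each consonant, and the off-diagonal cells show which consonants are confused with one another.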

Identifier
DOI https://doi.org/10.5255/UKDA-SN-854052
Metadata Access https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=92b0e7336318b948d6dc2b18a7c68f72bab11308075bdb58471d04ca09a3ef82
Provenance
Creator Roberts, B, Aston University; Summers, R, Aston University
Publisher UK Data Service
Publication Year 2020
Funding Reference Economic and Social Research Council
Rights Brian Roberts, Aston University; Robert J Summers, Aston University. The Data Collection is available to any user without the requirement for registration for download/access.
OpenAccess true
Representation
Resource Type Numeric
Discipline Psychology; Social and Behavioural Sciences
Spatial Coverage United Kingdom