Assuming identities online: Experimental chatlogs

Dataset

Research taking a computational approach to the analysis of online communications has thus far focused overwhelmingly on the structural elements of Computer Mediated Discourse (CMD), such as typography, orthography and other low-level features, with little to no attention paid to the socially situated discourses in which these features are embedded. The Centre for Forensic Linguistics (CFL), a research centre within Aston University combining leading-edge research and investigative forensic practice, and Lexegesys, a consultancy and technology company specialising in developing and implementing data analysis solutions, recently collaborated on a project that succeeded in automating the identification and extraction of low-level features for attributing authorship of unknown texts on Twitter. Yet CMD is widely recognised to operate on a number of linguistic levels, such as those of meaning, of interaction, and of social practice. Outside the field of computational linguistics, the characteristic features of CMD are understood as resources that users draw on in constructing identities in particular contexts, and CMD is seen as constituting social practice in and of itself rather than simply being shaped by social variables.

This data collection consists of transcripts of Instant Messaging conversations between a 'Judge' and an 'Interlocutor', the latter being replaced at some point by an 'Impersonator'. Each file contains three 15-minute chats, representing three conditions of preparation for the Impersonator: No Preparation, Over the Shoulder preparation, and Homework preparation. The transcripts correspond to postgraduate students (files 1-12) and undergraduate students (files 13-30). Judges were asked to record when they thought a switch had taken place, what linguistic criteria led them to think this, and how confident they were in their decision. Information on when switches actually occurred was also collected and cross-referenced with these judgements.

Preventive policing of serious crime sometimes involves deception and disguise. A case in point is the prevention of abuse arising from paedophile grooming and from peer-to-peer networks where abuse images of children are discussed and exchanged. The preventive techniques used by police investigators include assuming the identities of existing community members, and of children, so that interventions and arrests can be made. There are often tight time constraints on this process: investigators have only a small window in which to learn and assume the identity in question before arousing suspicion in their target(s). The training that undercover online investigators currently receive, although broadly informed by linguistic theory, is in need of development. Furthermore, the time constraints mean that a semi-automated system to assist in identity assumption would be a crucial contribution to the investigative toolkit.

This research takes an inductive approach, in which the phenomena of interest, rather than a specific theoretical paradigm, are primary. It aims to bridge the gap between complex theories of the discursive construction of online identities on the one hand, and computational approaches to analysing online communications on the other. A small-scale study in which CFL and Lexegesys are currently engaged addresses the challenges of automation at the pragmatic and interactional levels, working towards the semi-automated identification of phenomena such as indirect speech acts and topic management. The work is extremely practical and is informed by real-world police investigations. A partner in the project, the West Midlands Police Technical Intelligence Development Unit, is committed to providing data and operational insights.

In addition to empirical applied linguistics, the project conducts proof-of-concept work for software that will assist in the ethical use of assumed identities in policing. It will also assess the ethical and policy implications, for policing and security, of complexity in online identity performance.

These chatlogs were collected from undergraduate and postgraduate research participants engaging in Instant Messaging under experimental conditions. The participants were selected via convenience sampling. The postgraduate group were following an MA in Forensic Linguistics, while the undergraduate group comprised first- and second-year students studying English Language, either as BSc Single Honours or as part of a Joint Honours programme.

Identifier
DOI	https://doi.org/10.5255/UKDA-SN-852099
Metadata Access	https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=edfe018d28e451e1d8059a1a07e243a88d1f852fac1d7d71224c7e859359400d

Provenance
Creator	Grant, T, Aston University
Publisher	UK Data Service
Publication Year	2016
Funding Reference	Economic and Social Research Council
Rights	Timothy David Grant, Aston University
OpenAccess	true

Representation
Language	English
Resource Type	Text
Discipline	Jurisprudence; Law; Social and Behavioural Sciences
Spatial Coverage	Aston University; United Kingdom