Corpus Linguistics – Sign Language - Dataset

Sign language corpora are rather exotic within corpus linguistics as they deal with language data for languages having no written form and consequently no orthography. Much of the effort in current sign language corpus projects goes into segmentation and lemmatization, as these need to be done manually, while they are relatively straightforward and in many cases automatic steps for other languages. The lack of large-scale lexical databases for sign languages implies that many decisions to be taken in these steps are preliminary and subject to later revision. Therefore, it is of utmost importance to always have access to the original data. We present our approach that takes these requirements into account and provides multiple views on the data in order to support data quality assurance even if independent double-transcription is often not an option due to the immense cost.

To illustrate the approach, we present data from the map task as used in the Dicta-Sign project that collected data from four sign languages based on the same elicitation setting. This example also demonstrates where one can expect some parallels between sign and spoken language corpora.

Identifier
DOI	https://doi.org/10.25592/uhhfdm.8356
Related Identifier	https://doi.org/10.25592/uhhfdm.8355
Metadata Access	https://www.fdr.uni-hamburg.de/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:fdr.uni-hamburg.de:8356

Provenance
Creator	Hanke, Thomas
Publisher	Universität Hamburg
Contributor	European Commission
Publication Year	2013
Funding Reference	European Commission info:eu-repo/grantAgreement/EC/FP7/231135/
Rights	Creative Commons Attribution 4.0 International; Open Access; https://creativecommons.org/licenses/by/4.0/legalcode; info:eu-repo/semantics/openAccess
OpenAccess	true

Representation
Resource Type	Presentation; Text
Discipline	Other