A harmonised testsuite for social media POS tagging (DE)

A harmonised POS testsuite of web data, computer-mediated communication (CMC) and Twitter microtext, annotated with word forms and STTS POS tags (plus some additional CMC-specific tags). UD POS tags were derived automatically from the STTS POS tags. The data does not contain (manually corrected) lemma information. The original data comes from three sources: a Twitter dataset with 21,181 tokens and two datasets from the EmpiriST 2015 shared task, covering web data (12,718 tokens) and computer-mediated communication (10,505 tokens).
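
As a rough illustration of how such data could be read, the following minimal sketch parses CoNLL-U text and counts STTS tags. It assumes the files follow the standard ten-column CoNLL-U layout, with the word form in column 2, the UD tag (UPOS) in column 4 and the STTS tag (XPOS) in column 5; this has not been checked against the archive itself. The function name `read_conllu` and the sample sentence are made up for the example and are not taken from the dataset.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str, str]  # (form, UD tag, STTS tag)


def read_conllu(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of (form, upos, xpos) triples.

    Assumes standard CoNLL-U columns: FORM is column 2, UPOS column 4,
    XPOS (here assumed to hold the STTS tag) column 5.
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment line
        cols = line.split("\t")
        if len(cols) < 5:
            continue  # malformed line, skipped in this sketch
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword token range or empty node
        sentence.append((cols[1], cols[3], cols[4]))
    if sentence:
        yield sentence


# Made-up sample in CoNLL-U layout; the lemma column is left as "_".
SAMPLE = (
    "# hypothetical example, not from the dataset\n"
    "1\tIch\t_\tPRON\tPPER\t_\t_\t_\t_\t_\n"
    "2\tlache\t_\tVERB\tVVFIN\t_\t_\t_\t_\t_\n"
    "3\t:)\t_\tSYM\tEMOASC\t_\t_\t_\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
stts_counts = Counter(xpos for sent in sentences for _, _, xpos in sent)
```

To process a real file, the lines of an opened file object (UTF-8 encoded) can be passed to `read_conllu` in place of `SAMPLE.splitlines()`.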

Identifier
DOI https://doi.org/10.11588/data/KXLMHN
Metadata Access https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/data/KXLMHN
Provenance
Creator Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor
Publisher heiDATA
Contributor Rehbein, Ines
Publication Year 2020
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Rehbein, Ines (Leibniz Institute for the German Language)
Representation
Resource Type archived tab-separated format (CoNLL-U); Dataset
Format application/octet-stream
Size 3072878 bytes
Version 1.0
Discipline Humanities
Spatial Coverage Leibniz Institute for the German Language
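
The Metadata Access entry above is an OAI-PMH `GetRecord` request. As a small sketch, the same URL can be assembled from the DOI with the Python standard library. The helper name `oai_get_record_url` is hypothetical; the endpoint, verb, metadata prefix and identifier scheme are copied from the URL listed in this record.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access URL in this record.
OAI_ENDPOINT = "https://heidata.uni-heidelberg.de/oai"


def oai_get_record_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the listed URL.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


record_url = oai_get_record_url("10.11588/data/KXLMHN")
```

The resulting string can be passed to any HTTP client to retrieve the DataCite metadata as XML; no request is made in the sketch itself.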