Nottinghamer Korpus Deutscher YouTube-Sprache (The NottDeuYTSch Corpus)

The NottDeuYTSch corpus contains over 33 million words taken from approximately 3 million YouTube comments on videos published between 2008 and 2018 and targeted at a young, German-speaking demographic. It represents an authentic snapshot of the language of young German speakers. The corpus was sampled proportionally by video category and year from a database of 112 popular German-speaking YouTube channels in the DACH region (Germany, Austria, and Switzerland) to achieve representativeness and balance. Each comment carries a considerable amount of associated metadata, which enables further longitudinal cross-sectional analyses.

Identifier
PID http://hdl.handle.net/11372/LRT-4779
Related Identifier http://hdl.handle.net/11372/LRT-4806
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11372/LRT-4779
Provenance
Creator Cotgrove, Louis Alexander
Publisher University of Nottingham
Publication Year 2018
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language German; English; Russian; Turkish
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; application/zip; downloadable_files_count: 2
Discipline Linguistics