This dataset is an archive of reader comments on the Ekspress Meedia news site from 2009 to 2019. It contains approximately 31 million comments, mostly in Estonian, with some in Russian.
Description of the Datasets.
There are 11 CSV files:
comments_2009.csv contains 2 898 438 comments from the year 2009
comments_2010.csv contains 2 377 591 comments from the year 2010
comments_2011.csv contains 2 729 389 comments from the year 2011
comments_2012.csv contains 3 372 776 comments from the year 2012
comments_2013.csv contains 3 289 393 comments from the year 2013
comments_2014.csv contains 3 195 502 comments from the year 2014
comments_2015.csv contains 3 202 592 comments from the year 2015
comments_2016.csv contains 2 848 624 comments from the year 2016
comments_2017.csv contains 2 838 075 comments from the year 2017
comments_2018.csv contains 3 194 597 comments from the year 2018
comments_2019.csv contains 1 526 755 comments from the year 2019 (up to May)
In total: 31 473 732 comments
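The per-year figures can be checked against the total, and against a local copy of the files, with a short Python sketch. It assumes that each file is UTF-8 encoded, comma-separated and starts with a header row, none of which is stated above. The names `EXPECTED_COUNTS`, `count_rows` and `check_file` are hypothetical. Rows are counted with `csv.reader` rather than by counting lines, because a plain line count would overcount if some comments contain quoted line breaks.

```python
import csv
from typing import Iterable

# Per-year comment counts as listed above (2019 covers data up to May).
EXPECTED_COUNTS = {
    2009: 2_898_438,
    2010: 2_377_591,
    2011: 2_729_389,
    2012: 3_372_776,
    2013: 3_289_393,
    2014: 3_195_502,
    2015: 3_202_592,
    2016: 2_848_624,
    2017: 2_838_075,
    2018: 3_194_597,
    2019: 1_526_755,
}

# Sum of the eleven yearly files.
TOTAL = sum(EXPECTED_COUNTS.values())


def count_rows(lines: Iterable[str]) -> int:
    """Count data rows, skipping one header row (assumed to be present).

    csv.reader keeps a quoted field that spans several lines inside a single
    row, so a multi-line comment is counted once.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)


def check_file(path: str, year: int) -> bool:
    """Compare a local file's row count with the published figure.

    Assumption: the file is UTF-8 encoded; newline="" lets the csv module
    handle line breaks inside quoted fields itself.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return count_rows(f) == EXPECTED_COUNTS[year]


if __name__ == "__main__":
    # Print the total with spaces as thousands separators, as in this document.
    print(f"{TOTAL:,}".replace(",", " "))
```

For example, `check_file("comments_2009.csv", 2009)` returns `True` only if the local file contains exactly 2 898 438 data rows.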
Columns:
comment_id (string) - the ID of the written comment
article_id (string) - the ID of the article for which the comment was written
created_time (string) - the date and time the comment was created
subject (string) - the title of the comment
reply_to_comment_id (string) - the ID of the parent comment (the comment being replied to)
content (string) - the comment itself
is_anonymous (string) -
1 if the comment was published anonymously
0 if the comment was published by a registered user
is_enabled (string) -
1 if the comment was published (online)
0 if it wasn’t published
Caveat: this field is of questionable reliability, as not all comments have been manually moderated
No additional information is available from the moderators
channel_language (string) - the language of the channel: 'nat' for Estonian, 'rus' for Russian
create_user_id (string) - the user ID of the commenter; '0' for all blocked comments
moderated_by (string) - the ID of the moderator
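A minimal Python sketch of reading one file into typed records and rebuilding reply threads from reply_to_comment_id. It assumes a header row with the column names listed above (so `csv.DictReader` does not depend on column order), UTF-8 encoding, and that a top-level comment has an empty or "0" reply_to_comment_id; all three are unverified assumptions, as are the hypothetical names `Comment`, `read_comments` and `replies_by_parent`. Every field except the two flags is kept as a string, since the format of created_time is not documented.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Comment:
    comment_id: str
    article_id: str
    created_time: str         # kept as text: the timestamp format is not documented
    subject: str
    parent_id: Optional[str]  # from reply_to_comment_id; None for top-level comments
    content: str
    is_anonymous: bool        # "1" -> anonymous, "0" -> registered user
    is_enabled: bool          # "1" -> published online; unreliable (see caveat above)
    language: str             # channel_language: "nat" (Estonian) or "rus" (Russian)
    user_id: str              # create_user_id; "0" for all blocked comments
    moderated_by: str


# Assumption: top-level comments carry an empty or "0" reply_to_comment_id.
_NO_PARENT = {"", "0"}


def read_comments(lines: Iterable[str]) -> Iterator[Comment]:
    """Yield Comment records from CSV text that starts with a header row (assumed)."""
    for row in csv.DictReader(lines):
        parent = row["reply_to_comment_id"].strip()
        yield Comment(
            comment_id=row["comment_id"],
            article_id=row["article_id"],
            created_time=row["created_time"],
            subject=row["subject"],
            parent_id=None if parent in _NO_PARENT else parent,
            content=row["content"],
            is_anonymous=row["is_anonymous"] == "1",
            is_enabled=row["is_enabled"] == "1",
            language=row["channel_language"],
            user_id=row["create_user_id"],
            moderated_by=row["moderated_by"],
        )


def replies_by_parent(comments: Iterable[Comment]) -> dict[str, list[str]]:
    """Map each parent comment ID to the IDs of its direct replies, in file order."""
    children: dict[str, list[str]] = defaultdict(list)
    for c in comments:
        if c.parent_id is not None:
            children[c.parent_id].append(c.comment_id)
    return dict(children)
```

Files should be opened with `newline=""` so that multi-line comments stay intact. For example, `with open("comments_2015.csv", newline="", encoding="utf-8") as f: russian = [c for c in read_comments(f) if c.language == "rus"]` collects the Russian-channel comments from one year.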