Russian state institutions full-text datasets

This is a collection of full-text datasets based on content extracted from the websites of Russian state institutions.

None of the datasets include items published after 31 December 2023.

These datasets were introduced in the following book chapter, which offers additional context:

> Comai, Giorgio (2025, forthcoming), "Text-mining on-line sources from Russia openly", in Autocracy, Influence, War: Russian Propaganda Today, edited by Paul Goode

The name of each corpus is composed of the bare domain name, a two-letter code for the main language of the contents, and the year of release of the dataset, separated by underscores, e.g. kremlin.ru_ru_2024 for the Russian-language version of Kremlin.ru.

This release includes the following websites:
- Russia’s president, kremlin.ru, in English, filename: kremlin.ru_en_2024, from 1999-12-31 to 2023-12-31. Items included: 33 165
- Russia’s president, kremlin.ru, in Russian, filename: kremlin.ru_ru_2024, from 1999-12-31 to 2023-12-31. Items included: 45 538
- Russia’s MFA, mid.ru, in English, filename: mid.ru_en_2024, from 2003-01-04 to 2023-12-31. Items included: 25 943
- Russia’s MFA, mid.ru, in Russian, filename: mid.ru_ru_2024, from 2003-01-02 to 2023-12-31. Items included: 56 203
- Russia’s government, government.ru, in Russian, filename: government.ru_ru_2024, from 2006-06-22 to 2023-12-30. Items included: 17 135
- Russia’s government (archived version), archive.government.ru, in Russian, filename: archive.government.ru_ru_2024, from 2008-05-07 to 2013-05-21. Items included: 7 103
- Russia’s prime minister (archived version), archive.premier.gov.ru, in Russian, filename: archive.premier.gov.ru_ru_2024, from 2008-05-07 to 2012-05-07. Items included: 3 323
- Russia’s Duma, duma.gov.ru, in Russian, filename: duma.gov.ru_ru_2024, from 2006-04-05 to 2023-12-30. Items included: 29 094
- Russia’s Duma (transcripts), transcript.duma.gov.ru, in Russian, filename: transcript.duma.gov.ru_ru_2024, from 1994-01-11 to 2023-12-15. Items included: 6 032

File formats: compressed CSV files (.csv.gz); OpenDocument Spreadsheets (.ods)

A web version of the documentation accompanying this release is available online:
https://tadadit.xyz/datasets/2024/russian_institutions_2024/

Explore through a basic web interface:
https://explore.tadadit.xyz/2024/ru_institutions_2024/
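
The naming convention described above (domain, language code, release year, joined by underscores) can be split mechanically. The following is a minimal sketch, not part of the release: `parse_corpus_name` and `read_corpus` are hypothetical helper names, and the loader assumes that each .csv.gz file is UTF-8 encoded and starts with a header row, which this record does not state.

```python
import csv
import gzip
from typing import NamedTuple


class CorpusName(NamedTuple):
    domain: str    # bare domain name, e.g. "kremlin.ru"
    language: str  # two-letter language code, e.g. "ru"
    year: int      # year of release of the dataset, e.g. 2024


def parse_corpus_name(name: str) -> CorpusName:
    """Split a corpus name such as 'kremlin.ru_ru_2024' into its parts."""
    # Split from the right so the domain stays in one piece.
    domain, language, year = name.rsplit("_", 2)
    if len(language) != 2 or not year.isdigit():
        raise ValueError(f"unexpected corpus name: {name!r}")
    return CorpusName(domain, language, int(year))


def read_corpus(path: str) -> list[dict[str, str]]:
    """Read a compressed CSV file into a list of row dictionaries.

    Assumption: the file is UTF-8 encoded and its first row holds
    the column names; check the dataset documentation before relying on it.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `parse_corpus_name("mid.ru_en_2024")` yields the domain `mid.ru`, the language `en`, and the year `2024`. The path passed to `read_corpus` depends on where the downloaded files are stored locally.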

Identifier
DOI https://doi.org/10.20375/0000-0013-BE18-B
Metadata Access https://repository.de.dariah.eu/1.0/oaipmh/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=hdl:21.11113/0000-0013-BE18-B
Provenance
Creator Giorgio Comai
Publisher DARIAH-DE
Contributor giocomai(at)dariah.eu
Publication Year 2024
Rights Attribution: Open Data Commons Attribution License (ODC-By) v1.0; info:eu-repo/semantics/openAccess
OpenAccess true
Representation
Resource Type text/vnd.dariah.dhrep.collection+turtle; Dataset
Format text/vnd.dariah.dhrep.collection+turtle
Size 1786 Bytes
Version 2024-10-21T12:55:16.580+02:00
Discipline Humanities