Simulate Gene Expression Profiles from a Linear Combination of Motif Activity Scores and Various Forms of Noise

Cellular identity and behavior are controlled by complex gene regulatory networks. Transcription factors (TFs) bind to specific DNA sequences to regulate the transcription of their target genes. On the basis of these TF motifs in cis-regulatory elements, we can model the influence of TFs on gene expression. Such models of TF motif activity usually assume a linear relationship between motif activity and gene expression level. A commonly used method to model motif influence is based on Ridge Regression. One important assumption of linear regression is independence between samples. However, if samples are generated from the same cell line, tissue, or other biological source, this assumption may be invalid. The same assumption of independence is also applied to different, yet similar, experimental conditions, where it may likewise be inappropriate. In theory, wrongly assuming independence between samples could reduce the ability to detect signal. Here we investigate whether a Bayesian model that allows for correlations results in more accurate inference of motif activities.
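As an illustration of the linear model described above, the following is a minimal sketch of a ridge estimate of motif activities. It is not taken from the deposited code: the function name `ridge_motif_activity`, the variable names, the matrix sizes, and the penalty value `lam` are all assumptions chosen for the example. It uses the standard closed-form ridge solution on toy data.

```python
import numpy as np

def ridge_motif_activity(motif_scores, expression, lam=1.0):
    """Closed-form ridge estimate A_hat = (M^T M + lam * I)^{-1} M^T Y.

    motif_scores: (genes x motifs) matrix M of motif scores per gene.
    expression:   (genes x samples) matrix Y of expression levels.
    Returns a (motifs x samples) matrix of estimated motif activities.
    """
    n_motifs = motif_scores.shape[1]
    gram = motif_scores.T @ motif_scores + lam * np.eye(n_motifs)
    return np.linalg.solve(gram, motif_scores.T @ expression)

# Toy data (hypothetical sizes): 200 genes, 5 motifs, 3 samples.
rng = np.random.default_rng(0)
M = rng.normal(size=(200, 5))
A_true = rng.normal(size=(5, 3))
Y = M @ A_true + 0.1 * rng.normal(size=(200, 3))

A_hat = ridge_motif_activity(M, Y, lam=1.0)
print(np.abs(A_hat - A_true).max())  # largest estimation error on the toy data
```

Each column of `Y` is fitted separately here, which is the independence assumption between samples that the text questions.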

We extend Ridge Regression to a Bayesian Linear Mixed Model, which allows us to model dependence between different samples. This dataset provides the code to simulate gene expression data as a linear sum of motif influences, with different degrees of interaction assumed. The code was used in a simulation study in "Investigating the Effect of Dependence Between Conditions with Bayesian Linear Mixed Models for Motif Activity Analysis" (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0231824) by Simone Lederer, Tom Heskes, Simon J van Heeringen and Cornelis A Albers. This research is presented in Chapter 4 of the PhD thesis titled "Drug-Drug Interaction Models - from Gene Expression to Phenotype" by Simone Lederer. The code is written in the Python programming language and is also available on GitHub (https://github.com/Sim19/SimGEXPwMotifs).
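The kind of simulation described above can be sketched as follows. This is a hypothetical example, not the code in the repository: the function `simulate_expression`, its parameters, and the use of a single correlation parameter `rho` (equal correlation between every pair of samples) are assumptions made for illustration. Expression is generated as motif scores times motif activities plus Gaussian noise, with activities correlated across samples.

```python
import numpy as np

def simulate_expression(n_genes=500, n_motifs=10, n_samples=4,
                        rho=0.5, noise_sd=0.5, seed=0):
    """Simulate expression Y = M @ A + noise with sample-correlated activities.

    rho is an assumed pairwise correlation between samples (0 <= rho < 1).
    Returns (motif_scores, activities, expression).
    """
    rng = np.random.default_rng(seed)
    # Motif scores per gene (genes x motifs), drawn at random for the sketch.
    motif_scores = rng.normal(size=(n_genes, n_motifs))
    # Covariance between samples: 1 on the diagonal, rho elsewhere.
    sample_cov = (1 - rho) * np.eye(n_samples) + rho * np.ones((n_samples, n_samples))
    # One activity vector per motif (motifs x samples), correlated across samples.
    activities = rng.multivariate_normal(np.zeros(n_samples), sample_cov, size=n_motifs)
    # Independent Gaussian noise per gene and sample.
    noise = noise_sd * rng.normal(size=(n_genes, n_samples))
    expression = motif_scores @ activities + noise
    return motif_scores, activities, expression

motif_scores, activities, expression = simulate_expression(rho=0.8)
print(expression.shape)  # (500, 4)
```

Setting `rho=0` gives activities that are independent across samples, which corresponds to the assumption behind per-sample Ridge Regression; larger values of `rho` produce increasingly dependent samples.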

Identifier
DOI https://doi.org/10.17026/dans-xyx-3ceu
PID https://nbn-resolving.org/urn:nbn:nl:ui:13-38-hl8y
Related Identifier https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0231824
Metadata Access https://easy.dans.knaw.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:easy.dans.knaw.nl:easy-dataset:197459
Provenance
Creator Lederer, SI; Heskes, T; van Heeringen, SJ; Albers, CA
Publisher Data Archiving and Networked Services (DANS)
Contributor Lederer, SI
Publication Year 2021
Rights info:eu-repo/semantics/restrictedAccess; License: http://dans.knaw.nl/en/about/organisation-and-policy/legal-information/DANSLicence.pdf
OpenAccess false
Representation
Language English
Resource Type Software
Format application/x-cmdi+xml
Discipline Other