BACKGROUND: Genome-wide libraries of yeast deletion strains have been used to screen for genes that drive phenotypes such as stress response. A surprising observation emerging from these studies is that the genes with the largest changes in mRNA expression during a state transition are not those that drive that transition. Here, we show that integrating gene expression data with context-independent protein interaction networks can help prioritize master regulators that drive biological phenotypes.
RESULTS: Genes essential for survival had previously been shown to exhibit high centrality in protein interaction networks. However, the set of genes that drive growth in any specific condition is highly context-dependent. We inferred regulatory networks from gene expression data and transcription factor binding motifs in Saccharomyces cerevisiae, and found that high-degree nodes in regulatory networks are enriched for transcription factors that drive the corresponding phenotypes. We then found that a metric combining protein interaction and transcriptional networks improved the enrichment for drivers in many of the contexts we examined. We applied this principle to a dataset of gene expression in normal human fibroblasts expressing a panel of viral oncogenes. We integrated regulatory interactions inferred from these data with a database of yeast two-hybrid protein interactions and ranked 571 human transcription factors by their combined network score. The ranked list was significantly enriched in known cancer genes that could not be found by standard differential expression or enrichment analyses.
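The idea of ranking transcription factors by a combined network score can be illustrated with a minimal sketch. The abstract does not specify the combined metric, so the example below makes its own assumptions: each network is treated as an undirected edge list, the per-network score is node degree, degrees are rank-normalized to [0, 1], and the combined score is the mean of the two normalized ranks. All function names, node names, and the toy edge lists are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def degree(edges):
    """Count the distinct neighbors of each node in an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def rank_normalize(scores, nodes):
    """Map each node's score to [0, 1] by rank; tied nodes share the average rank.

    Nodes missing from `scores` are assigned a score of 0.
    """
    values = {n: scores.get(n, 0) for n in nodes}
    ordered = sorted(nodes, key=lambda n: values[n])
    total = len(ordered)
    if total == 1:
        return {ordered[0]: 1.0}
    ranks = {}
    i = 0
    while i < total:
        j = i
        # Extend j over the run of nodes tied with position i.
        while j + 1 < total and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank / (total - 1)
        i = j + 1
    return ranks


def combined_ranking(regulatory_edges, ppi_edges, candidates):
    """Rank candidates by the mean of their rank-normalized degrees in both networks.

    This averaging rule is an assumption of the sketch, not the study's metric.
    """
    reg = rank_normalize(degree(regulatory_edges), candidates)
    ppi = rank_normalize(degree(ppi_edges), candidates)
    combined = {tf: (reg[tf] + ppi[tf]) / 2 for tf in candidates}
    ranking = sorted(candidates, key=lambda tf: (-combined[tf], tf))
    return ranking, combined


# Toy, made-up networks: TF_* are candidate regulators, g* targets, p* partners.
regulatory_edges = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"),
                    ("TF_B", "g1"), ("TF_C", "g2"), ("TF_C", "g3")]
ppi_edges = [("TF_A", "p1"), ("TF_B", "p1"), ("TF_B", "p2"),
             ("TF_B", "p3"), ("TF_C", "p2")]
candidates = ["TF_A", "TF_B", "TF_C"]

ranking, combined = combined_ranking(regulatory_edges, ppi_edges, candidates)
print(ranking)
```

In this toy example no single network decides the order: TF_A has the highest regulatory degree and TF_B the highest protein interaction degree, and the averaged rank places TF_A first. Other choices (a different centrality measure, weighted combination, or directed out-degree in the regulatory network) would be equally valid instances of the same idea.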
CONCLUSIONS: There has been increasing recognition that network-based approaches can provide insight into critical cellular elements that help define phenotypic state. Our analysis suggests that no one network, based on a single data type, captures the full spectrum of interactions. Greater insight can instead be gained by exploring multiple independent networks and by choosing an appropriate metric on each network. Moreover, we can improve our ability to rank phenotypic drivers by combining the information from individual networks. We propose that such integrative network analysis could be used to combine clinical gene expression data with interaction databases to prioritize patient- and disease-specific therapeutic targets.