Abstract
Gene regulation explains most variation in biological phenotype and remains a crucial topic in biology. Conventionally, gene regulation is inferred by manipulating gene sequences, for example by knockout, but such inferences suffer from several pitfalls, such as transcript compensation [1], that lead to biased results. Unbiased regulation has rarely been characterized. Here, we develop a software tool, FINET [2], to infer unbiased regulatory networks from massive data, including all human RNA-seq data publicly available from the Sequence Read Archive (SRA, 274469 samples) and The Cancer Genome Atlas (TCGA, 11574 samples), and uncover the general regulatory rules of the normal and cancer genomes, as deposited [3]. In general, the genome is positively regulated. Regulators primarily regulate targets within their own annotated category, such as processed pseudogenes regulating processed pseudogenes. In the normal genome, ribosomal proteins drive the regulatory network, and proteins tightly control the genome, primarily regulating remote proteins across chromosomes and rarely regulating local targets (<1 Mbp). In cancer, by contrast, noncoding RNAs, especially pseudogenes, strongly activate the genome and induce local targets, including both noncoding RNAs and proteins. As a result, the whole regulatory regime switches from a normal domain controlled by remote proteins to a cancerous niche activated by local noncoding RNAs. This parallels our recent discovery from clinical data that noncoding RNAs, rather than proteins as conventionally thought, are the deadliest drivers of cancer [4], and it revises the fundamental basis of cancer research and therapy. Overall, our findings provide a systems-level view of the natural regulatory regime in the human genome, which helps to correct biased notions in the current literature.