1 Department of Mathematics and Computer Science (IMADA), Faculty of Science, SDU
2 Computer Science, Department of Mathematics and Computer Science (IMADA), Faculty of Science, SDU
3 unknown
4 Department of Mathematics and Computer Science (IMADA), Faculty of Science, SDU
The NCBI recently announced the availability of whole genome sequences for more than one thousand species, and the number of sequenced individual organisms is growing. Ongoing improvement of DNA sequencing technology will further contribute to this growth, enabling large-scale evolution and population genetics studies. However, the availability of sequence information is only the first step in understanding how cells survive, reproduce, and adjust their behavior. The genetic control behind the organized development and adaptation of complex organisms remains largely undetermined. One major molecular control mechanism is transcriptional gene regulation. The contrast between the total number of sequenced species and the handful of model organisms with known regulatory interactions is surprising. Here, we investigate how little we know even about these model organisms. We aim to predict the sizes of the whole-organism regulatory networks of seven species. In particular, we provide statistical lower bounds for the expected number of regulatory interactions. We estimate that at most 37% of the expected gene regulatory interactions have already been discovered for Escherichia coli, 24% for Bacillus subtilis, and less than 3% for human. We conclude that even for our best-researched model organisms we still lack a substantial understanding of fundamental molecular control mechanisms, at least on a large scale.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2012, Vol. 9, Issue 5, pp. 1293-1300