Further study of the human genome has revealed a massive library of digitally coded information in the ‘junk DNA’ said to be left over from millions of years of evolution! Although touched on in another post, enough cannot be said of the discoveries being made by the ENCODE project, a molecular study of the genome, the chemical blueprint for life. ENCODE stands for the Encyclopedia of DNA Elements. So startling are the findings that most workers on the ENCODE project believe that the entire human genome has biological function. The well-known 1% of the genome codes for the 20,000 protein genes that give rise to the 100,000 proteins that do the biological work of cellular life. This is only a fraction of the information the cell requires to manage metabolism, cell signalling, development and differentiation. The expression of those proteins, and the control of their function, concentrations and activities at any moment in the cell, is managed by the information coded in the DNA molecule. This is amazing to biologists who had gotten used to the idea that protein genes were all there was and that the other 99% of DNA was evolutionary leftovers, or “junk”. Evolutionists claimed that this fit the evolutionary paradigm very well: it was expected, even predictable, after millions of years of mutation and natural selection.
In a brief description of these new findings, Brendan Maher wrote in the journal Nature:
“Ewan Birney would like to create a printout of all the genomic data that he and his collaborators have been collecting for the past five years as part of ENCODE, the Encyclopedia of DNA Elements. Finding a place to put it would be a challenge, however. Even if it contained 1,000 base pairs per square centimetre, the printout would stretch 16 metres high and at least 30 kilometres long.”
We are not told how much overlap exists in the total of the genomic data, but what he describes here is a sheet of paper about 52 feet high that runs for 18.6 miles. Cut into a strip only 6 feet tall, it would run for more than 160 miles. This is a lot of paper but, more thrilling, it is a lot of information. The blueprints for living keep growing larger.
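As a rough check on the scale of those figures, the arithmetic can be sketched in a few lines of Python. The height, length and printing density come from the quotation above, and the 3 billion letters come from the article's own description of the genome. The conversion factors, the function name `printout_summary` and the dictionary keys are my own additions for illustration only.

```python
# Figures taken from the Nature quotation.
BASE_PAIRS_PER_CM2 = 1_000       # printing density
HEIGHT_M = 16                    # height of the printout
LENGTH_KM = 30                   # minimum length of the printout
GENOME_LETTERS = 3_000_000_000   # "3 billion letters" in the human genome

# Standard unit conversions (not from the article).
FEET_PER_METRE = 3.28084
MILES_PER_KM = 0.621371


def printout_summary(height_m=HEIGHT_M, length_km=LENGTH_KM,
                     density=BASE_PAIRS_PER_CM2):
    """Return the printout's size in imperial units and the data it holds."""
    height_cm = height_m * 100          # 1 m = 100 cm
    length_cm = length_km * 100_000     # 1 km = 100,000 cm
    area_cm2 = height_cm * length_cm
    base_pairs = area_cm2 * density
    return {
        "height_ft": height_m * FEET_PER_METRE,
        "length_miles": length_km * MILES_PER_KM,
        "area_cm2": area_cm2,
        "base_pairs": base_pairs,
        # How many times the genome's letter count fits in the data set.
        "genome_equivalents": base_pairs / GENOME_LETTERS,
    }


summary = printout_summary()
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

On these assumptions the printout covers 4.8 billion square centimetres and holds about 4.8 trillion base pairs of data. That is roughly 1,600 times the letter count of the genome itself, which suggests how many separate measurements are being made across the same sequence.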
“ENCODE was designed to pick up where the Human Genome Project left off. Although that massive effort revealed the blueprint of human biology, it quickly became clear that the instruction manual for reading the blueprint was sketchy at best. Researchers could identify in its 3 billion letters many of the regions that code for proteins, but those make up little more than 1% of the genome, contained in around 20,000 genes — a few familiar objects in an otherwise stark and unrecognizable landscape. Many biologists suspected that the information responsible for the wondrous complexity of humans lay somewhere in the ‘deserts’ between the genes. ENCODE, which started in 2003, is a massive data-collection effort designed to populate this terrain. The aim is to catalogue the ‘functional’ DNA sequences that lurk there, learn when and in which cells they are active and trace their effects on how the genome is packaged, regulated and read.
After an initial pilot phase, ENCODE scientists started applying their methods to the entire genome in 2007. Now that phase has come to a close, signalled by the publication of 30 papers, in Nature, Genome Research and Genome Biology. The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes (see page 57)1. But the job is far from done, says Birney, a computational biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, who coordinated the data analysis for ENCODE. He says that some of the mapping efforts are about halfway to completion, and that deeper characterization of everything the genome is doing is probably only 10% finished. A third phase, now getting under way, will fill out the human instruction manual and provide much more detail.
Yet some researchers wonder at what point enough will be enough. “I don’t see the runaway train stopping soon,” says Chris Ponting, a computational biologist at the University of Oxford, UK. Although Ponting is supportive of the project’s goals, he does question whether some aspects of ENCODE will provide a return on the investment, which is estimated to have exceeded US$185 million. But Job Dekker, an ENCODE group leader at the University of Massachusetts Medical School in Worcester, says that realizing ENCODE’s potential will require some patience. “It sometimes takes you a long time to know how much can you learn from any given data set,” he says.
The 32 groups, including more than 440 scientists, focused on 24 standard types of experiments (see ‘Making a genome manual’). They isolated and sequenced the RNA transcribed from the genome, and identified the DNA binding sites for about 120 transcription factors. They mapped the regions of the genome that were carpeted by methyl chemical groups, which generally indicate areas in which genes are silent. They examined patterns of chemical modifications made to histone proteins, which help to package DNA into chromosomes and can signal regions where gene expression is boosted or suppressed. And even though the genome is the same in most human cells, how it is used is not. So the teams did these experiments on multiple cell types — at least 147 — resulting in the 1,648 experiments that ENCODE reports on this week1, 4–8.”
Obviously this discovery is a big deal. But take a moment and think of what is revealed here. The human genome alone, not to mention the 8 million other genomes on the planet, possesses volumes of information, not just to make proteins but to coordinate development, growth, and the cellular differentiation and maintenance of nearly 300 cell types in the human body, let alone other body types. Literally hundreds of thousands of functional DNA elements now populate what was once thought to be useless junk. This library of millions of unique genetic regulatory sequences must also have a network of interacting protein factors that determine the range and accessibility of genomic data for each cell type, tissue type and organ system. Different combinations of transcription factors must regulate specific tracts of DNA among the 46 chromosomes for each cell type. It is like a code within a code within a code: proteins have no physiological meaning unless they are produced in the context of specified metabolic pathways, and these are controlled by the regulatory sequences of RNA, which are themselves subject to the correct combination of protein transcription factors. Who would have imagined that the blueprints for living would be so complex, so complete and so incredibly dense with layers of information?
These new data improve the prospects for understanding the causes of disease more completely. The article goes on to say:
The data, which have been released throughout the project, are already helping researchers to make sense of disease genetics. Since 2005, genome-wide association studies (GWAS) have spat out thousands of points on the genome in which a single-letter difference, or variant, seems to be associated with disease risk. But almost 90% of these variants fall outside protein-coding genes, so researchers have little clue as to how they might cause or influence disease.
So massive is the amount of information in the genome that real experiments to isolate all of its functional entities are going to fall by the wayside as science relies more heavily on imputation – ascribing functions to recognizable information. Otherwise there will be no end to the research needed to discern the meaning of the information encoded by the DNA molecule. Researchers realize that the work will be endless unless they make the transition to intelligent inference. The article complains:
But unlike the genome project, which had a clear endpoint, critics say that ENCODE could continue to expand and is essentially unfinishable. (None of the scientists would comment on the record, however, for fear that it would affect their funding or that of their postdocs and graduate students.) … After all, says Gerstein, it took more than half a century to get from the realization that DNA is the hereditary material of life to the sequence of the human genome. “You could almost imagine that the scientific programme for the next century is really understanding that sequence.”
Given that the current paradigm for the development of genetic information is molecular evolution, scientists will need to address the reality of both the time limitations and the mode of creative operation required for natural selection to have brought about the codes for life. No doubt no funded scientist will admit this on record either, “for fear that it would affect their funding”! Randomness is not an answer; it is an excuse for ignoring what is so very apparent in these new data. Life had its origin in something living, and that living thing was creative and intelligent beyond our imaginations. It has been a long time in coming, but we can go no deeper than the facts: life has been precisely designed. Maybe the next big question is, “For what purpose?”