Did you know that our DNA carries many copies of some gene sequences, and that this matters? Not long ago, and perhaps even now, neither did many geneticists. It has long been known in a general way that there is a lot of redundancy in the human genome. It has even been called ‘a repetitive landscape.’ Some biologists considered the repetition either superfluous or a sort of ‘backup supply’ of DNA. Some of the repetition could indeed be ‘extra DNA’, but new research indicates that (one) there are many more sequence copies than suspected and (two) they play a variety of roles, some of them very significant.
Part of the problem for scientists has been the difficulty of locating and counting duplicated copies of DNA sequences. In fact, the more often a sequence is duplicated, the harder it is to tell all the copies apart. While the analytical machinery for reading the human genome has improved vastly in little more than two decades, the problem of multicopy genes has persisted. The human genome contains roughly 3 billion DNA base pairs (pairs of bonded nucleotides containing the bases adenine, guanine, thymine, and cytosine). Searching for repeated sequences in this haystack isn’t like looking for a needle. It’s like looking through the haystack for straws that are almost the same as, but slightly different from, other straws.
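To make the ‘slightly different straws’ problem concrete, here is a minimal Python sketch. It is only a toy illustration, not the method used in the study: the sequences, the function names and the mismatch threshold are all made up for this example. It scans a tiny ‘genome’ for places where a query sequence matches exactly, and then for places where it matches with one letter changed.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_near_copies(genome, query, max_mismatches):
    """Return start positions where query matches genome with at most
    max_mismatches substitutions (hypothetical helper, brute-force scan)."""
    k = len(query)
    hits = []
    for i in range(len(genome) - k + 1):
        if hamming(genome[i:i + k], query) <= max_mismatches:
            hits.append(i)
    return hits


# Two made-up 'copies' of the same sequence that differ by a single base.
copy_a = "ACGTTGCAAGCT"
copy_b = "ACGTTGCTAGCT"  # same as copy_a except for one letter

# A made-up toy genome containing both copies, separated by filler.
toy_genome = "GGGG" + copy_a + "CCCCCC" + copy_b + "GGGG"

exact_hits = find_near_copies(toy_genome, copy_a, 0)  # only the identical copy
near_hits = find_near_copies(toy_genome, copy_a, 1)   # both copies

print("exact matches at:", exact_hits)
print("matches allowing one mismatch:", near_hits)
```

An exact search finds only one of the two copies, while a search that tolerates a single difference finds both but can no longer say which copy a fragment came from. Real genome analysis faces the same trade-off across billions of bases and, for some gene families, hundreds of near-identical copies.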
As part of the massive 1000 Genomes Project, researchers at the University of Washington (Seattle, USA) teamed with Agilent Technologies (a maker of genome assay equipment) to develop a battery of new genome sequencing and computational techniques that greatly improve the ability to isolate and identify duplicated genetic sequences. The results have been eye-opening.
According to the report in Science magazine, October 29, 2010 [Diversity of Human Copy Number Variation and Multicopy Genes], most of our genes come in the standard two copies. About 7 to 9 percent of human genes depart from that standard, with several copies or none at all (copy number variations). Of these, roughly 80 percent have between 0 and 5 copies. However, the study also identified 56 gene families with an ‘extreme’ number of copies, ranging from 5 to 368.
“These genes were dramatically enriched for segmental duplication,” the researchers noted. Segmental duplications are regions that were originally identified in the Human Genome Project as long, repeated blocks of the genome.
The researchers report discovering about 44 “hidden” members of duplicated gene families never before identified in the reference model of the human genome.
While duplications of segments of the genome appear to have led to many of the qualities that distinguish human beings from other primate species, areas of the genome in which duplications promote recurrent rearrangements have also been associated with debilitating diseases like intellectual disability, schizophrenia and autism.
These are sweeping associations – “…distinguish human beings from other primate species…”, “…associated with debilitating diseases….” This is actually like saying, “Look, we found more copy number variations than previously thought. Some of them are from gene families associated with both important positive and negative genetic effects. We think this is significant…very significant, but there’s still a lot of work to be done.”
They could have added, “We have little or no idea how these associations work out at the molecular level.” However, the researchers did say there are 28 large regions of the human genome so extraordinarily complex that they can’t interpret them. That is, these regions contain copy sequences so intricate that they are beyond the sensitivity of current equipment and techniques.
The study of DNA and genetics is beginning to resemble particle physics. Scientists continually find new layers of organization and ever more detailed relationships. However, particle physicists can’t claim to be working on a cure for cancer.