Scientifically Speaking | 20 years later, what we know about the human genome
Twenty years ago, the first draft of the human genome was published in Nature and Science. The Human Genome Project was the most costly and ambitious biological enterprise in history. Astoundingly, it came in under the budget of $3 billion allocated by the United States Congress in 1990.
In 2003, a more complete genome was publicly released, but gaps remained. At the time, eight per cent of the 3.057 billion chemical letters of DNA that make up the human genome, the most challenging and repetitive parts, remained unmapped. Those gaps were finally sequenced and posted this May to much less fanfare.
In the early days of the Human Genome Project, DNA sequences were handwritten on notebook pages and faxed between groups. Keeping pace with the early history of the Internet, the Human Genome Project sparked collaboration and open sharing of data, and made bioinformatics mainstream for biologists.
However, the race to sequence the human genome didn’t simply catalyse the creation of the infrastructure and tools needed to handle large amounts of data. It also accelerated the development of new fields such as genomics, systems biology, and computational biology. Today, sequencing genomes is a million times cheaper than it was two decades ago. Consequently, millions of people have had their genomes sequenced.
Genomes have helped in ancestry analyses and in identifying risk factors for diseases. With faster and cheaper DNA sequencing, we have entered the era of personalised medicine, which allows for individualised therapies that target molecular signatures of diseases that vary from person to person. Next-generation sequencing has also allowed us to design molecular vaccines, and to track mutations in viruses and variants during the current pandemic.
Genes are the functional units of the genome that contain instructions for how to make proteins. Scientists initially thought that the human genome would contain 50,000 to 100,000 genes. It came as a surprise to us that our genomes are nowhere close to being the largest, nor do they contain the most genes. With a genome of 43 billion base pairs (14 times larger than the human genome), the Australian lungfish — an air-breathing distant relative of the first fish that walked on land 380 million years ago — holds the distinction of the largest animal genome sequenced.
Humans have around 20,000 to 30,000 genes (depending on how a gene is actually defined) and each gene gives rise to three proteins on average. But even at the higher end of the range, this means that genes make up only one per cent of our genomes.
We made pivotal discoveries in the first decade of the draft human genome. We know that the 99 per cent of the genome that doesn’t code for proteins is not fluff. Parts of it act as dials controlling the activity of genes. We also know there are switches that aren’t embedded in DNA which can respond to environmental signals to change the fate of cells. This extra layer on top of our genetics is spawning research in the field of epigenetics.
But in my view, the biggest development in genomics came in the second decade of the century from outside the field, with the discovery of the tools to edit the genome itself. This genome editing technology, called CRISPR, which won its discoverers the Nobel Prize in Chemistry last year, allows us to edit any part of the human genome.
Earlier this year, the New England Journal of Medicine published landmark research on two patients who received CRISPR-based gene-editing therapy for sickle-cell disease and beta thalassemia. Both patients seem to have been cured of these severely debilitating genetic disorders, a truly monumental breakthrough. Doctors removed stem cells from the patients’ bone marrow and edited a faulty gene using CRISPR. Billions of gene-edited cells were then introduced back into the patients’ bodies.
On Saturday, the same journal published interim results on the treatment of amyloidosis with CRISPR. With the ability to edit genes inside the body directly, we have entered the genome editing era.
We know we can edit human genomes. But we do not know enough about the effects of making gene changes for complex diseases. Most diseases are not like sickle-cell disease and beta thalassemia: they do not have a clear relationship between one gene and its effects. Instead, most diseases progress through the effects of multiple genes and environmental factors.
Right now, many of the medical applications of genomics are geared towards genes of known function. But there are many genes whose function we do not yet know. An ambitious goal for the next decade would be to find out what the remaining genes actually do. An even more ambitious (and likely unachievable) goal would be to map the network of how genes interact with one another and with the rest of our cells.
In addition, some of the unbridled optimism that many diseases would be cured easily once the genome had been sequenced is gone, firmly replaced by the understanding that human biology is more complex and messy than we had realised back then.
Finally, we cannot lose sight of concerns in science that correlate with inequities in broader society. Who benefits from discoveries made from genomics? People of African ancestry, for example, are the most genetically diverse people on the planet. The rest of us are descendants of small populations that survived the journey out of Africa around 60,000 years ago. Yet people of African ancestry are underrepresented in genomic databases, which contain a disproportionate number of sequenced genomes of people of European ancestry. Just as vaccines are a common resource for all humans, so should genomes be.
Anirban Mahapatra, a microbiologist by training, is the author of COVID-19: Separating Fact From Fiction.
The views expressed are personal