There are more viruses on the planet than any other biological entity. We don’t have an accurate handle on how many exist, but it is estimated that there are more viruses on Earth than there are stars in the universe. Suffice it to say that most viruses have not yet been discovered. Viruses truly are the “dark matter” of biology. By applying computing power to very large datasets, researchers have recently been able to accelerate the discovery of unknown viruses. Today, a convergence of advances in genetic sequencing technology, computational analysis, and virological detective work is leading to the discovery of a whopping number of viruses.

About a year ago, in research published in the scientific journal Cell, scientists found over 142,000 viruses (around half of which were unknown at the time) that reside inside the human gut. Last week, a research article in the journal Nature described around 132,000 RNA viruses, including nine new coronaviruses. It seems that every time scientists analyse large datasets of “metagenomes”, the pooled genetic material sequenced from biological samples, they find an enormous number of new viruses.

Because next-generation sequencing is fast and efficient, biological samples from many diverse sources are now collected and sequenced. But researchers can collect data much faster than they can sort through it manually, so much of this sequencing data simply sits in public databases.

The study on viruses in the gut published last year was led by researchers at the European Molecular Biology Laboratory’s Bioinformatics Institute and the Wellcome Sanger Institute. Researchers examined over 28,000 gut microbiomes collected globally and found a plethora of new viruses in healthy people.

Fortunately, these viruses don’t infect human cells, but rather infect bacteria and single-celled organisms called archaea that are present in the gut. Among these viruses are thousands of newly discovered ones that belong to a wonderfully named new category – Gubaphage.

So, in short, there are viruses that infect different microbes in the gut. Under certain conditions, some of those microbes attack intestinal cells. The gut really is a microbial zoo!

In work published last week, an international team of researchers set their sights on RNA viruses. These viruses are in the limelight for a good reason. RNA viruses include the coronavirus responsible for the Covid-19 pandemic, as well as viruses that cause influenza, polio, Ebola, hepatitis C, and the common cold. Unlike living organisms, which use DNA as their genetic material, these viruses use RNA.

In cells, RNA is relegated to other functions, and cells have no enzyme of their own for copying RNA from an RNA template. Because of this biological difference, most RNA viruses must encode their own gene for an RNA copier enzyme. Sars-CoV-2 has this gene. In fact, finding this gene in a sequence is a telltale sign of an RNA virus.

The team used the copier-enzyme sequence as “bait” to go fishing in a sea of genetic sequences to see how many RNA viruses they could catch. New RNA viruses were identified based on whether they possessed the copier gene and how much they varied from known viruses.
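The “bait” idea can be illustrated with a toy sketch. This is not the team’s actual pipeline: the three-letter marker `GDD`, the 0.9 similarity cutoff, and the function names are all assumptions made for illustration, and the real work relied on far more sensitive sequence alignment than a simple substring search.

```python
from difflib import SequenceMatcher

# Assumed stand-in for the copier-enzyme signature (illustrative only).
MARKER = "GDD"
# Assumed cutoff: a hit at least this similar to a known virus counts as known.
KNOWN_THRESHOLD = 0.9


def similarity(a: str, b: str) -> float:
    """Return a rough 0..1 similarity score between two sequences."""
    return SequenceMatcher(None, a, b).ratio()


def screen(candidates: dict, known: list,
           marker: str = MARKER, threshold: float = KNOWN_THRESHOLD) -> dict:
    """Keep sequences that carry the marker, then label each 'known' or 'novel'."""
    hits = {}
    for name, seq in candidates.items():
        if marker not in seq:
            continue  # no copier-enzyme signature: not caught by the bait
        # Compare the hit against every known virus and keep the best score.
        best = max((similarity(seq, k) for k in known), default=0.0)
        hits[name] = "known" if best >= threshold else "novel"
    return hits
```

Calling `screen` with a dictionary of named sequences and a list of already catalogued ones returns only the sequences that took the bait, each labelled by how far it sits from anything seen before.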

The researchers developed a cloud computing infrastructure that helped them to efficiently trawl through 20 million gigabytes of genetic data from millions of diverse biological samples. A typical supercomputer would’ve required a year, but with the support of a public-private partnership between the University of British Columbia and Amazon Web Services, they were able to finish the task in only 11 days.
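To get a feel for these numbers, here is a back-of-the-envelope calculation. It uses only the figures quoted above and assumes “a year” means 365 days and that the data was processed at a steady rate, which is a simplification.

```python
DATA_GB = 20_000_000        # 20 million gigabytes, as quoted in the article
CLOUD_DAYS = 11             # time taken on the cloud infrastructure
SUPERCOMPUTER_DAYS = 365    # assumed length of "a year"

SECONDS_PER_DAY = 24 * 60 * 60


def throughput_gb_per_s(data_gb: float, days: float) -> float:
    """Average data processed per second, assuming a steady rate."""
    return data_gb / (days * SECONDS_PER_DAY)


def speedup(baseline_days: float, actual_days: float) -> float:
    """How many times faster the actual run was than the baseline."""
    return baseline_days / actual_days


if __name__ == "__main__":
    rate = throughput_gb_per_s(DATA_GB, CLOUD_DAYS)
    factor = speedup(SUPERCOMPUTER_DAYS, CLOUD_DAYS)
    print(f"Average throughput: {rate:.1f} GB per second")
    print(f"Speed-up over the one-year estimate: {factor:.1f}x")
```

Under these assumptions, the cloud run averaged roughly 21 gigabytes per second and was about 33 times faster than the one-year estimate.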

One way to think of the process is like quickly sorting through materials with a magnet and then examining what sticks. In this case, the computational firepower was like having an incredibly strong magnet.

The researchers found around 132,000 RNA viruses, of which only around 15,000 had been known before. Because we’re in a pandemic caused by a coronavirus, the nine newly identified coronaviruses will naturally raise the most eyebrows. But their reservoir hosts are not humans, and indeed these nine viruses probably don’t infect humans at all.

And while the total number of viruses is mind-bogglingly large, we must put both of these discoveries in perspective. Most viruses that enter our bodies are incompatible with our cells and unable to cause disease. Some of them infect microbes in our gut, while others pass through harmlessly.

Nonetheless, because some viruses do pose a risk to human health, we need to know what we are dealing with. These two landmark studies go a long way towards filling major gaps in our understanding of the diversity of viruses inside us and around us.

Anirban Mahapatra, a scientist by training, is the author of COVID-19: Separating Fact From Fiction

The views expressed are personal