A virus infected a microbe on a Tibetan glacier 40,000 years ago, then froze in place. When scientists thawed the ice in 2024, they sequenced that viral genome along with 1,704 others, and most bore no resemblance to known viruses. These ancient genetic archives are just one example of how metagenomics is reshaping our understanding of the viral world.

Metagenomics is trending

Metagenomics is trending in virology. But what is it, and how is it changing the way scientists view the viral world? In this post, we’ll look at how metagenomics is transforming our understanding of viral diversity in all corners of the planet, from glacier ice cores to human gut ecosystems.

What is metagenomics?

Metagenomics is the study of genetic material recovered directly from environmental samples (such as soil, water, sewage, or clinical specimens) without the need to isolate or culture organisms. Instead of focusing on a single species, metagenomics sequences all the DNA (and/or RNA) present, providing an unbiased view of entire microbial communities.

Finding viruses we weren’t looking for

Traditional virology relies on isolating viruses in culture or detecting them via PCR or serological tests, methods that rely on prior knowledge of the virus or its host. Culturing requires suitable host cells and often misses viruses that cannot replicate in standard lab conditions. PCR and serology depend on known genetic sequences or antigens, making them blind to novel or highly divergent viruses.

Enter shotgun metagenomic sequencing, an unbiased method that identifies all genetic material in a sample without the use of specific primers or culture conditions. The untargeted and comprehensive approach can detect both known and unknown viruses, allowing the discovery of entirely new viral families that traditional techniques would never identify. Only by analysing all the genetic material in a range of environments can the true biodiversity of our world be uncovered.

Tracking outbreaks in real time

Metagenomics has become a powerful tool for identifying emerging pathogens and tracking their spread, transforming approaches to infectious disease diagnosis and treatment. Unlike targeted tests, metagenomic sequencing can detect both known and novel pathogens directly from clinical samples, providing rapid insights during outbreaks. A clear example is COVID-19. Sequencing clinical samples from early patients revealed a novel coronavirus (SARS-CoV-2) without prior knowledge of the virus, and the same approach was later used to monitor its mutations and global transmission.

Beyond COVID-19, metagenomics has illuminated the complexity of viral encephalitis cases, where unbiased sequencing has identified rare pathogens such as astroviruses and novel herpesviruses that standard PCR panels missed. These capabilities accelerate pathogen discovery and inform real-time surveillance and therapeutic strategies. The role of metagenomics as a cornerstone in modern infectious disease management is increasingly clear.

Viruses everywhere: deep vents to human guts

Metagenomic research has revealed viruses in some of the most inhospitable environments on Earth. Hydrothermal vents host viruses adapted to high temperatures and pressures, many of which propagate through lysogeny and carry genes that support sulfur metabolism. Brine pools in the Red Sea harbour highly specialized viral communities that reflect the unique chemistry of these stratified ecosystems.

Even ancient environments yield surprises. Metagenomic analysis of glacier ice cores has revealed viral genomes tens of thousands of years old, including many with no close relatives in modern databases. These findings expand our view of historical viral diversity and raise questions about the potential for ancient viruses to re-emerge as ice melts under climate change.

Closer to home, metagenomics is reshaping our understanding of human-associated viromes. The gut, skin, and wastewater have all been probed, leading to discoveries like crAssphage and driving new approaches to pathogen surveillance.

crAssphage: abundant but invisible

The bacterial virus crAssphage was identified by metagenomics in 2014. Researchers assembled its genome from multiple human fecal metagenomes, revealing a 97 kb circular sequence that was unlike any previously known phage. Of its 80 predicted proteins, fewer than half had even distant similarity to known sequences, and only a handful could be assigned clear functions, such as structural components or helicases. What made crAssphage astonishing was its abundance – it appeared across hundreds of human gut metagenomes and, on average, was more common than all other known phages combined. Later analyses predicted its bacterial host to be within the Bacteroides genus, a dominant member of the gut microbiome. The finding highlights how metagenomics can uncover highly prevalent yet completely overlooked viruses that traditional methods fail to detect.

Viral dark matter

While the discovery of crAssphage underscores the hidden complexity of the human gut virome, it only scratches the surface. Metagenomic studies consistently reveal that a vast proportion of sequences don’t match any known virus. These novel sequences, referred to as “viral dark matter”, hint at a vast universe of undiscovered viruses. For instance, the Global Ocean Viromes 2.0 (GOV 2.0) dataset – compiled from multiple ocean sampling expeditions – identified nearly 200,000 viral populations, around 12 times more than in earlier datasets. Deep-sea expeditions to the South China Sea uncovered ~30,000 viral Operational Taxonomic Units (vOTUs), with over 99% lacking close relatives amongst cultivated reference viruses.
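
To make the “dark matter” idea concrete, here is a minimal Python sketch of how the unmatched share of a dataset could be tallied once every assembled sequence has been searched against a reference database. The function name, the contig names, and the `example_hits` table are all hypothetical illustration data, not results from GOV 2.0 or any other study mentioned here.

```python
from typing import Optional


def dark_matter_fraction(best_hits: dict[str, Optional[str]]) -> float:
    """Return the share of sequences with no match in the reference database.

    best_hits maps a sequence ID to the name of its best reference match,
    or to None when the search found nothing similar.
    """
    if not best_hits:
        raise ValueError("no sequences supplied")
    unmatched = sum(1 for hit in best_hits.values() if hit is None)
    return unmatched / len(best_hits)


# Hypothetical toy input: four assembled contigs, only one with a known relative.
example_hits = {
    "contig_1": "crAssphage",
    "contig_2": None,
    "contig_3": None,
    "contig_4": None,
}

fraction = dark_matter_fraction(example_hits)
print(f"{fraction:.0%} of contigs have no known relative")
```

In practice the hard part is deciding what counts as a “match”, since similarity thresholds vary between studies. The sketch assumes that decision has already been made upstream.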

Auxiliary metabolic genes: viral genes that upgrade bacteria

Metagenomic analysis of viral genomes has uncovered auxiliary metabolic genes (AMGs) that allow viruses to influence the metabolism of their hosts. A striking example comes from deep-sea hydrothermal vent environments, where viruses carry genes involved in sulfur cycling, amino acid metabolism, and energy conservation processes such as sulfide production and sulfate reduction. These AMGs can stabilize host tRNA, enhancing the resilience of microbial hosts to extreme conditions.

Viruses sometimes acquire AMGs from their hosts or other organisms in the environment through horizontal gene transfer, revealing a deep evolutionary relationship between these partners. Such findings challenge the traditional view of viruses as mere genetic parasites, highlighting instead their ability to reprogram hosts and exert ecosystem-scale impact.

Tools fuelling discovery

The explosion of viral metagenomics has been driven by advances in sequencing technologies, bioinformatics, and computational tools. Below are some of the influential resources shaping the field:

1. Sequencing technologies

  • Next-Generation Sequencing (NGS): Illumina platforms (e.g., MiSeq, NovaSeq) provide high-accuracy short reads, while Oxford Nanopore and PacBio enable long-read sequencing, resolving complex or highly repetitive viral genomes.
  • Shotgun metagenomics: The primary approach for unbiased sequencing of all nucleic acids in a sample, capturing both known and novel viruses.
  • Long-read metagenomics: Emerging for near-complete viral genome assembly from environmental samples.

2. Bioinformatics and analysis

  • Assembly: metaSPAdes and MEGAHIT are used to assemble fragmented viral genomes from complex metagenomes.
  • Viral identification:
    • VirSorter2 and DeepVirFinder use machine learning to detect viral sequences, including novel ones.
    • Kraken2 and Kaiju classify reads taxonomically.
  • Databases and annotation: Databases such as IMG/VR, RefSeq, and RVDB support taxonomic and functional annotation, aided by tools like Prokka and InterProScan.
  • Phylogenetics: IQ-TREE and RAxML explore viral evolution and relationships.
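
As a rough illustration of what classifying reads taxonomically involves, the Python sketch below assigns each read to the reference genome that shares the most short subsequences (k-mers) with it. This is a toy version of the general k-mer matching idea only. It is not how Kraken2 or Kaiju are actually implemented, and the reference sequences, read sequences, and names such as `virus_A` are invented for the example.

```python
from collections import Counter
from typing import Optional


def kmers(sequence: str, k: int) -> set[str]:
    """Return every substring of length k found in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of reference names that contain it."""
    index: dict[str, set[str]] = {}
    for name, sequence in references.items():
        for kmer in kmers(sequence, k):
            index.setdefault(kmer, set()).add(name)
    return index


def classify(read: str, index: dict[str, set[str]], k: int) -> Optional[str]:
    """Assign the read to the reference sharing the most k-mers, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return None  # no shared k-mers: the read stays unclassified
    return votes.most_common(1)[0][0]


# Hypothetical reference "genomes" and reads, far shorter than real ones.
K = 4
references = {
    "virus_A": "ATGCGTACGTTAGC",
    "virus_B": "TTGACCGGTAAACC",
}
index = build_index(references, K)

for read in ("CGTACGTT", "CCGGTAAA", "GGGGGGGG"):
    print(read, "->", classify(read, index, K))
```

The last read shares no k-mers with either reference, so it comes back unclassified. Real classifiers also have to cope with sequencing errors, reverse-complement strands, and k-mers shared between related taxa, none of which this sketch attempts.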

3. Platforms and emerging trends

  • MetaVir, VirSorter, and iVirus streamline viral metagenomic workflows.
  • Cloud-based systems (e.g., AWS, Google Cloud) and platforms like Galaxy support large-scale data analysis.
  • Machine learning tools (e.g., VIBRANT, MARVEL) and new approaches like CRISPR-based detection and single-virus genomics are pushing boundaries in sensitivity and resolution.

The outer limits

Despite remarkable progress, viral metagenomics faces key challenges. A vast proportion of viral sequences remains unclassified, the so-called “viral dark matter”. This reflects incomplete reference databases and the underrepresentation of RNA viruses due to technical hurdles. Sequence contamination from host or environmental DNA can complicate assembly, while taxonomy lags behind the pace of discovery, even with improved frameworks. Finally, predicting the function of novel viral genes remains a major bottleneck, requiring experimental validation to move beyond sequence-based inference. Addressing these issues will be critical for unlocking the full potential of metagenomics in viral ecology and public health.

What comes next

We can expect global sampling campaigns to continue across every kind of environment. In the process, the accumulation of diverse viral genomes in databases will reduce the proportion of “dark” reads. Combining metagenomics with proteomics, metabolomics, and host ecology will help yield multi-layered viral insight. Lab-based experiments, such as those carried out by the team at Virology Research Services, will be vital for understanding newly discovered viruses and validating the functions of their genes. Harnessing novel viral enzymes and metabolic genes for industrial or therapeutic applications will drive biotechnology. Learning how to progress from new insights and technologies to tangible improvements in quality of life, such as scaling wastewater surveillance into public health infrastructure, will be crucial for maximising the benefits of metagenomics.

As metagenomics redraws the virosphere, incorporating the invisible viral majority into a visible ecosystem, it is redefining how we view microbiomes, pathogens, and life itself.

Blog by Farrell MacKenzie


Supported by Reckon Better
