Tracking the SARS-CoV-2 outbreak in the UK

After a year of relatively uneventful evolution, the emergence and global spread of variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) signal the urgent need for better genomic surveillance (1). The United Kingdom (UK) has emerged as a leader in this field. An investment of £20 million in March 2020 established the COVID-19 Genomics UK (COG-UK) Consortium (2), which has produced >200,000 SARS-CoV-2 genomes, more than double the number produced by any other country. This large volume of data provides an unprecedented opportunity to track which human activities drive epidemic growth during a rapidly changing pandemic, but it also introduces several bioinformatics challenges. On page 708 of this issue, du Plessis et al. (3) describe a new hybrid phylogenetic approach that integrates genetic data with epidemiological and travel data to uncover the roots of the severe UK spring epidemic. Notably, they found that the UK epidemic resulted from more than 1000 transmission lineages seeded by travelers from Europe.

The study shows how last winter’s control efforts were consistently one step behind the virus, allowing SARS-CoV-2 to penetrate national borders. Their analysis of ∼26,000 UK sequences from January to June 2020, the largest study of its kind, reveals that the UK epidemic was brought to the country mainly by travelers from European neighbors: first Italy, then Spain and France. The peak of viral flow into the UK occurred in March, when the virus was spreading across Western Europe, but lags in surveillance meant that restrictions were still focused on travelers arriving from Asia. By capturing a large number of small transmission lineages that would not be detected at lower levels of virological surveillance, as well as >1600 singleton viruses with no observed progeny, the authors found an unprecedented amount of cross-border virus traffic. Genetic patterns mirrored the patterns of human movement: the number of viruses entering the United Kingdom rose and then fell after international travel plummeted in March.


Multiple introductions of SARS-CoV-2 by travelers from Italy, Spain, and France, but not from Asia, seeded the epidemic in the UK between January and June 2020.

PHOTO: TOBY MELVILLE / REUTERS

The United Kingdom is not the only country whose initial focus on Asia as the epicenter of the pandemic allowed viruses from European sources to enter. Genetic data have also traced the origins of epidemics in Brazil (4), Boston (5), and New York City (6) back to Europe. Travel restrictions can be highly effective when rigorously implemented, but these studies collectively highlight the ease with which SARS-CoV-2 can slip through even small lapses in border control, including the repatriation of Americans from Asia at the beginning of the pandemic (7).

There is no magic formula for balancing scalability, speed, and statistical rigor when the volume of genomic data exceeds the capacity of existing platforms. Du Plessis et al. confronted the methodological challenges encountered in previous evolutionary analyses of SARS-CoV-2 (8), amplified in this case by a substantially larger data set. These challenges include a low phylogenetic signal among genetically similar viruses, data volumes that exceed the capacity of standard phylogenetic software, and biases that arise when countries sequence different numbers of viruses relative to their national case counts. The authors devised a new approach that uses genetic data to infer the timing and number of virus introductions but uses epidemiological metadata to infer the country of origin. Better integration of genomic and epidemiological data will continue to improve outbreak responses, but this can be difficult without open-access data repositories, for example for fluctuating volumes of global air travel. Epidemiologists increasingly use digital and cellular data to track human movements and patterns of social contact (9).

Contact tracing has been effective in controlling early outbreaks of COVID-19, such as Europe's first outbreak in Munich, Germany (10), and has provided important insights into community transmission and the role of superspreading (11). But contact tracing is laborious and is often abandoned as epidemics grow. Genetic data can add a new dimension to these efforts by efficiently determining whether two cases belong to the same transmission lineage, despite sampling gaps between individuals in the chain. Du Plessis et al. did not explore heterogeneities in city-level transmission (5), but their observations reveal the size-dependent growth and extinction of hundreds of co-circulating lineages as the national epidemic was brought under control by non-pharmaceutical interventions (NPIs).

The study by du Plessis et al. made use of only a fraction of the UK sequences generated so far. The risk of new variants emerging increases as SARS-CoV-2 populations expand globally, spreading to immunocompromised, chronically infected, or even non-human hosts, where they encounter different selection pressures. As SARS-CoV-2 becomes more dynamic from an evolutionary point of view, well-sampled data from the UK provide a resource for the global community. Denmark, Australia, and other countries also have intensive SARS-CoV-2 sequencing operations. But the United Kingdom is currently the only country with more than 1 million COVID-19 cases that sequences viruses from more than 1% of them (the United Kingdom sequences ∼5%).

The most difficult evolutionary questions require extensive population-level analyses based on continuous, representative national sampling, with random selection of viruses to be sequenced (12). A centrally coordinated sampling strategy is a highly advantageous feature of the United Kingdom’s virological monitoring program, even if it is less quantifiable than speed or volume (2). The United States has generated the second largest number of SARS-CoV-2 genomes, but the proportion of cases sequenced varies markedly between cities and states because of differences in resources. Large-scale studies become methodologically challenging when data sets are assembled from smaller studies originally designed to address other research questions, which introduces biases. At times, it has been difficult to assess intriguing hypotheses, such as whether SARS-CoV-2 carrying the D614G spike protein mutation spread globally because of a fitness advantage or by random chance (13).

Variants that arise in one country quickly become a threat to its neighbors. Countries must rely on each other’s virological monitoring efforts in a rapidly changing global viral landscape. The UK’s ARTIC network actively shares resources and protocols for sequencing SARS-CoV-2. Nextstrain provides a user-friendly visual platform to track the evolution of SARS-CoV-2 in near real time. Numerous open-access bioinformatics tools have been developed to analyze SARS-CoV-2 sequences (14). But a lesson from the UK is the importance of sustained government investment in scalable national infrastructure. Intrepid academic researchers may build popular tools but struggle to scale them up as the amount of genomic data explodes. Global coordination would also be useful, including the universal adoption of a single nomenclature for SARS-CoV-2 lineages.

The COVID-19 pandemic has galvanized long-awaited investments in promising research areas at the frontiers of technology and big data. Over the past two decades, faster, cheaper, and more portable sequencing technologies and flexible bioinformatics platforms have laid the foundation for real-time genomic epidemiology. Leaps forward tend to be spurred by public health crises, including outbreaks of influenza, Ebola, and Zika (15). The COVID-19 leap has begun.

Acknowledgments: The content does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does it imply endorsement by the United States Government.
