COVID-19 infections in the USA almost three times higher than reported, model estimates

Newswise – DALLAS – February 8, 2021 – World health experts have long suspected that the incidence of COVID-19 has been higher than reported. Now, a machine learning algorithm developed at UT Southwestern estimates that the number of COVID-19 cases in the United States since the beginning of the pandemic is almost three times the number of confirmed cases.

The algorithm, described in a study published today in PLOS ONE, provides daily updated estimates of total infections to date, as well as how many people are currently infected in the United States and in the 50 countries most affected by the pandemic.

As of February 4, according to the model’s calculations, more than 71 million people in the United States – 21.5% of Americans – had contracted COVID-19. That compares with the far smaller publicly reported figure of 26.7 million confirmed cases, says Jungsik Noh, Ph.D., assistant professor in the Lyda Hill Department of Bioinformatics at UT Southwestern and the study’s first author.

Of those 71 million Americans estimated to have had COVID-19, 7 million (2.1 percent of the US population) had current infections and were potentially contagious on February 4, according to the algorithm.

Noh’s published study is based on calculations completed in September. At that time, he reports, the number of actual cumulative cases in 25 of the 50 hardest-hit countries was five to 20 times greater than the confirmed case counts suggested.

According to the current estimates from the online algorithm, the figures are now closer to the reported numbers, but still much higher. As of February 4, Brazil had more than 36 million cumulative cases estimated by the algorithm, almost four times the 9.4 million confirmed cases reported. France had 14 million versus the 3.2 million reported. And the UK had almost 25 million instead of about 4 million, more than six times as many. Mexico, an outlier, had almost 15 times the number of reported cases: 27.6 million instead of 1.9 million confirmed cases.
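The multiples quoted above follow from dividing each estimated total by the confirmed total. A minimal sketch of that arithmetic, using the rounded figures from this article (the name `underreporting_factor` is made up for illustration and is not part of the published model):

```python
# Rounded figures from the article, in millions of cumulative cases as of February 4.
# The article gives some of these only approximately ("more than 36 million",
# "almost 25 million", "about 4 million"), so the resulting factors are approximate too.
FIGURES_MILLIONS = {
    "Brazil": {"estimated": 36.0, "confirmed": 9.4},
    "France": {"estimated": 14.0, "confirmed": 3.2},
    "UK": {"estimated": 25.0, "confirmed": 4.0},
    "Mexico": {"estimated": 27.6, "confirmed": 1.9},
}


def underreporting_factor(estimated: float, confirmed: float) -> float:
    """Return how many times larger the estimate is than the confirmed count."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return estimated / confirmed


factors = {
    country: underreporting_factor(v["estimated"], v["confirmed"])
    for country, v in FIGURES_MILLIONS.items()
}

for country, factor in factors.items():
    print(f"{country}: about {factor:.1f} times the confirmed count")
```

Because the inputs are rounded, the factors should be read as rough multiples rather than precise values.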

“Estimates of actual infections reveal for the first time the true severity of COVID-19 in the United States and in countries around the world,” says Noh.

The algorithm uses the number of reported deaths – considered more accurate and complete than the number of laboratory-confirmed cases – as the basis for its calculations. It then assumes an infection fatality rate of 0.66 percent, based on a previous study of the pandemic in China, and takes into account other factors, such as the average number of days from onset of symptoms to death or recovery. It also compares its estimate with the number of confirmed cases to calculate the ratio of confirmed to estimated infections.
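The core idea of working backward from deaths can be sketched in a few lines. This is a deliberately simplified illustration, not the published model: it divides cumulative deaths by the assumed 0.66 percent fatality rate and ignores the time lag between symptom onset and death or recovery that the actual algorithm accounts for. The function names and the example inputs are hypothetical.

```python
# Infection fatality rate of 0.66 percent, the value the article says the model assumes.
INFECTION_FATALITY_RATE = 0.0066


def estimate_cumulative_infections(cumulative_deaths: float,
                                   ifr: float = INFECTION_FATALITY_RATE) -> float:
    """Back-calculate total infections from deaths under a fixed fatality rate.

    Simplification: no adjustment for the lag between infection and death.
    """
    if not 0 < ifr <= 1:
        raise ValueError("ifr must be in the interval (0, 1]")
    if cumulative_deaths < 0:
        raise ValueError("cumulative_deaths must be non-negative")
    return cumulative_deaths / ifr


def confirmed_to_estimated_ratio(confirmed_cases: float,
                                 estimated_infections: float) -> float:
    """Return the share of estimated infections that were confirmed."""
    if estimated_infections <= 0:
        raise ValueError("estimated_infections must be positive")
    return confirmed_cases / estimated_infections


# Hypothetical region, with numbers chosen only to make the arithmetic easy to follow.
example_deaths = 660
example_confirmed = 25_000

example_estimate = estimate_cumulative_infections(example_deaths)
example_ratio = confirmed_to_estimated_ratio(example_confirmed, example_estimate)

print(f"Estimated infections: {example_estimate:,.0f}")
print(f"Share of infections confirmed: {example_ratio:.0%}")
```

Under these assumptions, the estimate is very sensitive to the fatality rate: halving the assumed rate doubles the estimated number of infections.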

Much is still uncertain about COVID-19 – particularly the death rate – and the estimates are therefore rough, says Noh. But he believes that the model’s estimates are more accurate, and miss fewer cases, than the confirmed counts currently used to guide public health policies. Having a more comprehensive estimate of the prevalence of the disease is important, adds Noh.

“These are critical statistics about the severity of COVID-19 in each region. Knowing the true severity in different regions will help us to fight effectively against the spread of the virus,” he explains. “The population currently infected is the cause of future infections and deaths. Its actual size in a region is a crucial variable necessary to determine the severity of COVID-19 and to build strategies against regional outbreaks.”

In the USA, infection rates vary widely from state to state. California has had nearly 7 million infections since the start of the pandemic, compared with New York’s 5.7 million, according to the algorithm’s estimates for February 4. In addition, the model estimated that California had 1.3 million active cases on that date, affecting 3.4 percent of the state’s population.

Other model estimates for February 4: In Pennsylvania, 11.2% of the population had current infections – the highest rate in any state, compared with a low of 0.15% in Minnesota; in New York, an early hot spot, 528,000 people had active infections, or about 2.7% of its population. Meanwhile, in Texas, 2.3% had current infections.

Noh says he developed the algorithm last summer while trying to decide whether to send his sixth-grade daughter back to school in person. He could not find the data he needed to assess the safety of doing so, he says.

After building the machine learning algorithm, he found that the area where he lived had a current infection rate of about 1%. His daughter went back to school.

Noh verified his findings by comparing his results with the prevalence rates found in several existing studies that used blood tests to check for antibodies to SARS-CoV-2, the virus that causes COVID-19. For most areas tested, the algorithm’s infection estimates closely matched the percentage of people who tested positive for antibodies, according to the PLOS ONE study.

The online model uses COVID-19 death data from Johns Hopkins University and The COVID Tracking Project, a volunteer organization founded to help track COVID-19, to perform its daily updates. However, the estimates published in the PLOS ONE study date to September 3. At that time, about 10 percent of the United States population had been infected with COVID-19, according to Noh’s algorithm.

Gaudenz Danuser, Ph.D., chair of the Lyda Hill Department of Bioinformatics and professor of cell biology, was the senior author of the study. He also holds the Patrick E. Haggerty Distinguished Chair in Basic Biomedical Sciences.

Funding came from Lyda Hill Philanthropies.
