The first major set of genetic associations found in long COVID

PrecisionLife’s Dr Sayoni Das, a computational biologist who leads the research and development of bioinformatics pipelines that generate biological insights from PrecisionLife’s core technology and support drug discovery programmes, details a new study. Using combinatorial analysis, the study identified genetic variants associated with long COVID and found that TLR4 antagonists may be potential candidates for repurposing as long COVID treatments.


Why has it been challenging to identify genetic risk factors for long COVID?

There is an extensive array of symptoms associated with long COVID, the most common being fatigue and post-exertional malaise, cognitive dysfunction, mood disturbances and respiratory problems. This range likely reflects the heterogeneous nature of the disorder, and it is this complexity and diversity of clinical presentation, with effects across multiple organ systems, that has made efforts to identify genetic risk factors using traditional genomic analysis approaches extremely challenging.

Although many studies have investigated the genetic risks underlying long COVID, only one genome-wide association study (GWAS) has identified a risk locus: a single locus around the lead variant in the FOXP4 gene. However, studies that use combinatorial analytical approaches to identify genetic risk factors in similarly heterogeneous populations, for example in severe COVID-19 and ME/CFS, have been more successful.

Can you explain the combinatorial analysis method used in the study and how it helped identify genetic variants associated with long COVID?

Combinatorial analytics approaches identify combinations of features that together are associated with the disease phenotype in patient subgroups, capturing the non-linear effects of interactions between multiple genes. These signals are distinct from, and complementary to, the monogenic, linear additive associations of single SNPs found by GWAS.
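To make the idea of a non-linear combinatorial signal concrete, here is a minimal toy sketch. The cohort counts and the SNP labels `A` and `B` are invented for illustration and are not study data, and the plain odds-ratio comparison is a generic textbook calculation, not PrecisionLife's method. By construction, neither SNP shows any association on its own, yet carrying both is strongly enriched in cases.

```python
# Hypothetical toy cohort (not study data):
# (carries_A, carries_B) -> (number of cases, number of controls)
COUNTS = {
    (0, 0): (40, 10),
    (0, 1): (10, 40),
    (1, 0): (10, 40),
    (1, 1): (40, 10),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table: exposed cases a, exposed controls b,
    unexposed cases c, unexposed controls d."""
    return (a * d) / (b * c)


def two_by_two(is_exposed):
    """Collapse COUNTS into a 2x2 table, where 'exposed' means is_exposed(genotype) is True."""
    a = b = c = d = 0
    for genotype, (cases, controls) in COUNTS.items():
        if is_exposed(genotype):
            a += cases
            b += controls
        else:
            c += cases
            d += controls
    return a, b, c, d


or_a = odds_ratio(*two_by_two(lambda g: g[0] == 1))     # single-SNP test for A
or_b = odds_ratio(*two_by_two(lambda g: g[1] == 1))     # single-SNP test for B
or_ab = odds_ratio(*two_by_two(lambda g: g == (1, 1)))  # A and B carried together

print(f"SNP A alone: OR = {or_a:.2f}")        # → SNP A alone: OR = 1.00
print(f"SNP B alone: OR = {or_b:.2f}")        # → SNP B alone: OR = 1.00
print(f"A and B together: OR = {or_ab:.2f}")  # → A and B together: OR = 6.00
```

A single-SNP test on this toy cohort would report nothing, which is the kind of interaction signal that an additive, one-variant-at-a-time model can miss.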

The PrecisionLife combinatorial analysis platform enables hypothesis-free identification of combinatorial features, known as disease signatures, which may include multiple SNP genotypes and/or other multi-modal features in combination. These disease signatures capture both the linear and non-linear effects of genetic and molecular interaction networks, and they enable the identification of associations with disease risk, prognosis and/or therapy response, including associations that are relevant only to a subgroup of patients.
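The sketch below illustrates, under heavy simplification, what a disease signature is and why it can describe a patient subgroup. It represents a signature as a set of `(snp, genotype)` pairs that a patient must all carry, and exhaustively enumerates pairs of genotypes seen in cases, keeping those carried by at least `min_cases` cases and by no controls. The SNP identifiers, genotypes, patients and function names are hypothetical, and this brute-force filter is a stand-in chosen for clarity; it is not the algorithm used by the PrecisionLife platform, which is not described here and would also have to assess statistical significance at scale.

```python
from itertools import combinations

# Hypothetical cohort: patient id -> (genotypes, is_case).
# Genotypes count minor-allele copies (0, 1 or 2); all values are illustrative.
PATIENTS = {
    "p1": ({"rs_a": 2, "rs_b": 0, "rs_c": 1}, True),
    "p2": ({"rs_a": 2, "rs_b": 0, "rs_c": 0}, True),
    "p3": ({"rs_a": 1, "rs_b": 1, "rs_c": 2}, True),
    "p4": ({"rs_a": 0, "rs_b": 2, "rs_c": 1}, True),
    "p5": ({"rs_a": 2, "rs_b": 1, "rs_c": 1}, False),
    "p6": ({"rs_a": 0, "rs_b": 0, "rs_c": 1}, False),
    "p7": ({"rs_a": 1, "rs_b": 2, "rs_c": 0}, False),
    "p8": ({"rs_a": 1, "rs_b": 1, "rs_c": 1}, False),
}


def carries(genotypes, signature):
    """True if the patient matches every (snp, genotype) pair in the signature."""
    return all(genotypes.get(snp) == g for snp, g in signature)


def find_signatures(patients, order=2, min_cases=2):
    """Return (signature, n_cases) for case-only genotype combinations of size `order`."""
    # Candidate signatures: every combination of `order` genotypes observed in a case.
    candidates = set()
    for genotypes, is_case in patients.values():
        if is_case:
            for snps in combinations(sorted(genotypes), order):
                candidates.add(frozenset((s, genotypes[s]) for s in snps))

    hits = []
    for sig in candidates:
        n_cases = sum(1 for g, case in patients.values() if case and carries(g, sig))
        n_controls = sum(1 for g, case in patients.values() if not case and carries(g, sig))
        # Toy criterion: shared by several cases and absent from all controls.
        if n_cases >= min_cases and n_controls == 0:
            hits.append((sorted(sig), n_cases))
    return sorted(hits)


n_total_cases = sum(1 for _, case in PATIENTS.values() if case)
for sig, n in find_signatures(PATIENTS):
    label = " AND ".join(f"{snp}={g}" for snp, g in sig)
    print(f"{label}: carried by {n}/{n_total_cases} cases, 0 controls")
# → rs_a=2 AND rs_b=0: carried by 2/4 cases, 0 controls
```

Half of the toy cases carry the signature and half do not, which is the sense in which a signature can be relevant to only a subgroup of patients.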

The combinatorial approach is considerably more sensitive than GWAS and requires much smaller patient populations. Because it can detect genetic associations and mechanisms that may be relevant only to a subgroup of patients, it yields more novel associations than GWAS when analysing the same datasets.

In complex heterogeneous diseases, such as long COVID, CNS disorders, and autoimmune, cardiovascular, respiratory and metabolic diseases, these non-linear combinatorial signals and disease signatures are significantly more important for understanding causative disease biology than they are in relatively monogenic disorders.

What were the key findings regarding the genetic signatures of long COVID patients?

Using combinatorial analytics, we identified genetic disease signatures (ie, combinations of genetic variants significantly associated with the development of long COVID) in two subpopulations of long COVID patients who had experienced either severe disease or a fatigue dominant phenotype.

We identified 73 genes linked to long COVID, of which nine genes have prior associations with acute COVID-19, and 14 were differentially expressed in a transcriptomic analysis of long COVID patients. Comparison of the long COVID analysis with our previous combinatorial analysis of ME/CFS patients from UK Biobank identified nine genes in common.

Pathway enrichment analyses revealed that the biological pathways most significantly associated with the 73 long COVID genes were mainly aligned with neurological and cardiometabolic diseases. The genes unique to the Severe long COVID cohort were largely associated with immune pathways, such as myeloid differentiation and macrophage foam cells, while genes unique to the Fatigue Dominant cohort were enriched in metabolic pathways and processes, such as MAPK/JNK signalling and cellular respiration.
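The enrichment statistics behind these findings are not spelled out here, so the following is only a sketch of one common approach: an over-representation test that asks how surprising it is to see at least `k` of the `n` study genes in a pathway of `K` genes, drawn from a background of `N` genes, using the upper tail of the hypergeometric distribution. The function name and all numbers in the usage line are hypothetical apart from the 73 genes mentioned above; real analyses also correct for testing many pathways at once.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K pathway genes, n study genes).

    A small p-value means the pathway contains more study genes than
    expected by chance.
    """
    upper = min(K, n)
    # Sum exact integer numerators first, then divide once by C(N, n).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical example: 73 study genes, a 200-gene pathway,
# a 20,000-gene background and 6 study genes falling in the pathway.
p = hypergeom_enrichment_p(N=20_000, K=200, n=73, k=6)
print(f"enrichment p-value: {p:.3g}")
```

Using `math.comb` keeps the sketch dependency-free; in practice a statistics library's hypergeometric survival function, followed by multiple-testing correction, would typically be used.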

We generated strong mechanism-of-action hypotheses for the role of these genes in the development of long COVID. Additionally, expanded genotype analysis of the disease signatures generated causal insights into the specific effects of key SNPs/genes on disease biology. Generating such insights at scale using a hypothesis-free approach is a unique capability of PrecisionLife’s platform.

What are the specific genes or genetic variants associated with severe long COVID, and how are they related to immune pathways and other biological mechanisms?

Forty-three genes were strongly associated with the Severe long COVID cohort, the patients who reported the greatest degree of symptoms. The genes unique to this cohort were associated with immune pathways such as myeloid differentiation, macrophage foam cells and lipid signalling. The greater number of immune-response genes in the Severe cohort could indicate a more severe form of acute SARS-CoV-2 infection. This may arise because these patients experienced higher-than-average viral loads, as we identified four genes that have been functionally linked to SARS-CoV-2 host response and/or acute severe COVID-19.

The study mentions the overlap in genes associated with Fatigue Dominant long COVID and ME/CFS, including those involved in circadian rhythm regulation and insulin regulation. How do these genetic similarities provide insights into the commonalities between these conditions and their biological mechanisms?

We identified five genes that were strongly associated with the risk of developing long COVID in both the Severe and the Fatigue Dominant cohorts using our hypothesis-free approach. In addition, 23 genes identified in the Severe cohort were significantly associated with disease in the Fatigue Dominant cohort using hypothesis-driven analyses. To understand the biological differences underpinning these two clinical manifestations, we analysed the differences in pathways between the genes uniquely associated with the Severe and Fatigue Dominant cohorts.
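Separating shared from cohort-specific genes before comparing pathways is essentially set arithmetic. A minimal sketch, using placeholder gene names rather than the study's actual gene lists:

```python
# Placeholder gene sets for two cohorts; names are hypothetical, not study results.
severe_genes = {"GENE1", "GENE2", "GENE3", "GENE4", "GENE5"}
fatigue_genes = {"GENE4", "GENE5", "GENE6", "GENE7"}

shared = severe_genes & fatigue_genes        # associated in both cohorts
severe_only = severe_genes - fatigue_genes   # unique to the Severe cohort
fatigue_only = fatigue_genes - severe_genes  # unique to the Fatigue Dominant cohort

print(sorted(shared))        # → ['GENE4', 'GENE5']
print(sorted(severe_only))   # → ['GENE1', 'GENE2', 'GENE3']
print(sorted(fatigue_only))  # → ['GENE6', 'GENE7']
```

Running pathway analysis separately on each cohort-specific set keeps the genes the two presentations share from masking the differences between them.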

When we evaluated the degree of similarity between the genes associated with ME/CFS and long COVID, we identified nine genes that were previously associated with ME/CFS. One of these is CLOCK, an important regulator of circadian rhythm; circadian disruption has been associated with impaired mitochondrial function and pain, among other effects. Dysregulated mitochondrial function results in an inability to meet energy demands in response to stressors such as exercise, and can result in the post-exertional malaise that is a hallmark of both ME/CFS and Fatigue Dominant long COVID. We also identified in long COVID the genes ATP9A and INSR, which we had previously hypothesised contribute to dysregulated insulin signalling in subgroups of ME/CFS patients. Type 2 diabetes-related signalling pathways and insulin resistance were also a key theme among the genes associated with long COVID.

The article mentions that TLR4 antagonists have been identified as potential candidates for repurposing as long COVID treatments. Can you elaborate on how these antagonists may help with long COVID?

Forty-two genes were found to be potentially tractable for novel drug discovery approaches for long COVID; of these, 13 have drugs in clinical development pipelines. We are currently evaluating these repurposing opportunities for treating long COVID and/or ME/CFS.

We identified the TLR4 gene as an attractive repurposing candidate with the potential to protect against the long-term cognitive impairment pathology caused by SARS-CoV-2. Our analysis indicated that disease signatures linked to TLR4 were strongly associated with the development of long COVID in 52 percent of the Severe cohort. There is additional supporting evidence that inhibition of TLR4 in a mouse model prevents long-term cognitive pathology caused by SARS-CoV-2, and some clinical studies have already shown that antagonising TLR4 signalling inhibits inflammatory cytokine storms and reduces mortality rates in hospitalised COVID-19 patients.
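A figure such as "52 percent of the Severe cohort" can be read as a coverage statistic: the share of cases carrying at least one disease signature linked to the gene. The sketch below shows one plausible way to compute such a figure, taking the union of carriers so that a patient with several TLR4-linked signatures is counted only once. The signatures, case IDs and the helper `gene_case_coverage` are hypothetical toy data and names, not the study's data, and the exact definition used in the analysis may differ.

```python
# Toy data only: each hypothetical signature lists the genes its SNPs map to
# and the IDs of the cases that carry it.
SIGNATURES = [
    {"genes": {"TLR4", "GENE_X"}, "cases": {"c1", "c2", "c3"}},
    {"genes": {"TLR4"}, "cases": {"c3", "c4"}},
    {"genes": {"GENE_Y"}, "cases": {"c5", "c6"}},
]
ALL_CASES = {f"c{i}" for i in range(1, 9)}  # eight hypothetical cases


def gene_case_coverage(signatures, gene, all_cases):
    """Fraction of cases carrying at least one signature that involves `gene`."""
    covered = set()
    for sig in signatures:
        if gene in sig["genes"]:
            covered |= sig["cases"]  # union: each patient is counted once
    return len(covered & all_cases) / len(all_cases)


print(f"{gene_case_coverage(SIGNATURES, 'TLR4', ALL_CASES):.0%}")  # → 50%
```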


About the author

Dr Sayoni Das

SVP Bioinformatics at PrecisionLife 

Sayoni leads the research and development of bioinformatics pipelines that generate biological insights from PrecisionLife’s core technology and support drug discovery programmes.

She is a computational biologist with a background in bioprocess engineering and biotechnology. Sayoni received a PhD in Computational Biology from University College London where she developed a protein function prediction method that was ranked among the top methods in two consecutive protein-function prediction challenges.

Prior to joining PrecisionLife, Sayoni developed tools for interpretation of genetic variants as part of her post-doctoral research and served as project coordinator for a large academic structural bioinformatics project.
