Disclaimer: Early release articles are not considered as final versions. Any changes will be reflected in the online version in the month the article is officially released.
Author affiliation: European Programme for Public Health Microbiology Training, European Centre for Disease Prevention and Control, Solna, Sweden (D. Espadinha); National Reference Laboratory for STEC at Public Health Laboratory Health Service Executive, Cherry Orchard Hospital, Dublin, Ireland (D. Espadinha, A. Carroll, E. McNamara); European Programme for Intervention Epidemiology Training, European Centre for Disease Prevention and Control, Solna (M. Brady); Health Service Executive Health Protection Surveillance Centre, Dublin (M. Brady, C. Brehony, S. Cotter, P. Garvey); Health Service Executive National Social Inclusion Office, Dublin (D. Hamilton); Health Service Executive Public Health, Dr. Steevens’ Hospital, Dublin (L. O’Connor); Children’s Health Ireland at Temple Street, Dublin (R. Cunney); Royal College of Surgeons in Ireland, Dublin (R. Cunney); Trinity College Dublin School of Medicine and Saint James’s Hospital, Dublin (E. McNamara)
Shiga toxin–producing Escherichia coli (STEC) are a major cause of gastroenteritis worldwide. Transmission routes include person-to-person spread, animal contact, ingestion of untreated water, and consumption of contaminated food, including minced beef products and fresh produce such as lettuce and spinach (1). Symptoms range in severity from diarrhea and bloody diarrhea to the potentially fatal condition hemolytic uremic syndrome (HUS), which is characterized by microangiopathic hemolytic anemia, thrombocytopenia, and acute kidney injury (2). A combination of host, environmental, and bacterial factors has been identified as contributing to HUS, including young age, bloody diarrhea and vomiting, antimicrobial drug treatment, and presence of specific Shiga toxin stx genes, the intimin eae gene, and the enterohemolysin ehxA and α-hemolysin hlyA genes (3–6).
STEC has long been a public health problem in Ireland, which has reported the highest incidence rate among European Union Member States for many years; in 2018, the crude rate was 20.0 cases/100,000 population, nearly 10 times the average for Europe (7). In 2017, a total of 2.9% (n = 27) of reported STEC cases in Ireland led to HUS (1).
Despite past research and increased availability of microbial genomic information resulting from a rise in the application of molecular-based approaches to diagnose STEC infections (8), identification of factors that place patients at higher risk of HUS remains difficult. To gain new insights into factors potentially associated with HUS, we conducted a case–control study linking epidemiologic data reported on Ireland’s Computerised Infectious Disease Reporting (CIDR) system to complete pathogen molecular characterization data. Our investigation included a genomewide association study (GWAS) to identify novel genes associated with HUS in STEC isolate genomes.
Study Design and Record Linkage
In this retrospective case–control study, we selected patients from a national cohort of 3,735 persons notified as having STEC infection to Ireland’s Health Protection Surveillance Centre via CIDR during January 1, 2017–December 31, 2020. We linked epidemiologic and laboratory data from CIDR to laboratory records from the National Reference Laboratory (NRL) for STEC at the Public Health Laboratory HSE Dublin. In total, 3,486 (93%) CIDR notifications could be linked to a laboratory record: 1,457 (39%) by using the laboratory specimen identifier and 2,029 (54%) by using a combination of variables (date of birth, sex, county of residence, specimen collection date, and report date). We validated linkage with Regional Departments of Public Health, which have responsibility for notifying STEC infections and related HUS and STEC outbreaks, according to a standard surveillance case definition (9). In line with the surveillance definitions of the European Union, we defined an HUS patient as an STEC patient who had acute renal failure and microangiopathic hemolytic anemia, thrombocytopenia, or both (10).
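The two-step linkage described above can be sketched as follows. This is only an illustration under stated assumptions: the field names (specimen_id, date_of_birth, and so on) and the rule of accepting a composite match only when it is unique are hypothetical, because the article does not describe the CIDR or laboratory record schemas or how ambiguous matches were handled.

```python
from typing import Optional

# Hypothetical field names standing in for the five linkage variables
# named in the text; the real schemas are not described in the article.
COMPOSITE_KEY = ("date_of_birth", "sex", "county", "specimen_date", "report_date")


def link_notification(notification: dict, lab_records: list[dict]) -> Optional[tuple[dict, str]]:
    """Return (lab_record, method) for a linked notification, or None."""
    # Pass 1: exact match on the laboratory specimen identifier.
    spec_id = notification.get("specimen_id")
    if spec_id:
        for rec in lab_records:
            if rec.get("specimen_id") == spec_id:
                return rec, "specimen_id"

    # Pass 2: match on the combination of variables; every field must be present.
    key = tuple(notification.get(f) for f in COMPOSITE_KEY)
    if all(v is not None for v in key):
        matches = [r for r in lab_records
                   if tuple(r.get(f) for f in COMPOSITE_KEY) == key]
        # Assumption: accept only a unique match; ambiguous ones need manual review.
        if len(matches) == 1:
            return matches[0], "composite"
    return None


# Made-up records for illustration only.
lab = [
    {"specimen_id": "S1", "date_of_birth": "2015-03-01", "sex": "F",
     "county": "Dublin", "specimen_date": "2018-06-02", "report_date": "2018-06-05"},
    {"specimen_id": "S2", "date_of_birth": "1980-11-20", "sex": "M",
     "county": "Cork", "specimen_date": "2019-08-10", "report_date": "2019-08-14"},
]
by_id = link_notification({"specimen_id": "S1"}, lab)
by_key = link_notification(
    {"specimen_id": None, "date_of_birth": "1980-11-20", "sex": "M",
     "county": "Cork", "specimen_date": "2019-08-10", "report_date": "2019-08-14"}, lab)
unlinked = link_notification({"specimen_id": "S9"}, lab)
```

Recording which pass produced each link makes it possible to report the two linkage routes separately, as the article does.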
Whole-genome sequencing (WGS) results were available for 2,911 (84%) linked records. We selected patients from among those that met the inclusion criteria (n = 2,296 [66%]): having available WGS data and either having a sporadic infection (not outbreak associated) or being part of an outbreak. Only 1 patient from each outbreak was included, to mitigate potential bias from including the same strain multiple times and because of the lower threshold for testing during outbreak investigations. Case-patients were those who were notified as having STEC infection and who had related HUS. Controls were defined as patients who were notified as having STEC infection but who did not have HUS. Patients who had a clinical diagnosis but no laboratory sample could not be included.
Sample Size Estimation
We applied Fleiss formulas for unmatched case–control studies with continuity correction to estimate the minimum sample size for case-patients (n = 16) and controls (n = 64), given a power of 0.8, a significance level of p = 0.05, a case-to-control ratio of 1:4, and a target odds ratio (OR) of >2.0. We determined the probability of exposure (0.9 in case-patients and 0.5 in controls) on the basis of results of stx2 in a multivariable analysis of risk factors for STEC-related HUS conducted by other researchers (11). The final sample size was 524 patients, comprising all 108 case-patients who met the inclusion criteria (representing 82% of STEC patients in whom HUS developed during the study period) and 416 unmatched controls.
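The minimum sample sizes quoted above can be reproduced with a short calculation. The sketch below assumes the common textbook form of the Fleiss formula with continuity correction for unmatched case–control designs; the function name and the choice to round up the continuity-corrected value are assumptions, not details taken from the article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def fleiss_cc_sample_size(p_cases, p_controls, ratio, alpha=0.05, power=0.8):
    """Minimum (cases, controls) for an unmatched case-control study.

    p_cases / p_controls: probability of exposure in each group.
    ratio: number of controls per case.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_cases + ratio * p_controls) / (ratio + 1)
    q_bar = 1 - p_bar
    diff = abs(p_cases - p_controls)

    # Uncorrected number of cases.
    n = (z_alpha * sqrt((ratio + 1) * p_bar * q_bar)
         + z_beta * sqrt(ratio * p_cases * (1 - p_cases)
                         + p_controls * (1 - p_controls))) ** 2 / (ratio * diff ** 2)

    # Continuity correction.
    n_cc = n / 4 * (1 + sqrt(1 + 2 * (ratio + 1) / (n * ratio * diff))) ** 2

    cases = ceil(n_cc)
    return cases, ceil(ratio * cases)


cases, controls = fleiss_cc_sample_size(p_cases=0.9, p_controls=0.5, ratio=4)
print(cases, controls)  # 16 64
```

With the exposure probabilities of 0.9 and 0.5 stated in the text, the calculation returns 16 case-patients and 64 controls, matching the reported minimums.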
Variables
We included epidemiologic variables routinely collected by standardized questionnaire (12). Those variables were age (categorized as 0–9 years or ≥10 years), sex, notification date, residence status in Ireland, public health administrative region (within Ireland) (13), outbreak association, reported vomiting, reported bloody diarrhea, residence in an urban or rural location (urban location defined as a settlement of ≥1,500 people), travel abroad within 10 days before illness onset, type of home drinking water (public or private), reported consumption of unpasteurized cheese or milk in the 10 days before illness onset, risk group (child attending crèche, childcare worker, or food handler), recent (within 10 days before illness onset) outdoor activities or recreational farmland contact (hillwalking, camping, swimming in lakes, water sports, or going to a beach), contact with farm animals or their feces, and HUS. An outbreak was defined as the occurrence of ≥2 cases that shared an epidemiologic link (a potential common source) or a situation in which the observed number of cases exceeded the expected number. We extracted the following genomic variables from isolates recovered from patients by the NRL: serogroup, stx genes or subtypes, eae genes or subtypes, ehxA gene, and genes with significant associations with HUS in the GWAS (Appendix 1).
WGS
The study dataset included genomes of 531 STEC isolates from 524 patients. All microbial culture and PCR testing at the NRL was ISO 15189 accredited. We excluded isolates from repeated sampling of the same patient (within the same episode of infection) unless the serogroup was different. We considered an episode of infection resolved if a patient had 2 negative stool samples 48 hours apart. Seven patients had isolates from 2 different episodes of infection; we included isolates from both episodes in the analysis.
The number of isolates per year increased from 99 in 2017 to 154 in 2018, decreased to 135 in 2019, and increased again to 141 in 2020. Isolates collected in 2017 were sequenced at the UK Health Security Agency Gastrointestinal Bacteria Reference Unit. From 2018 onward, all isolates were sequenced at the NRL. In brief, bacterial genomic DNA was extracted by using a MagNA Pure 96 automated station (Roche Diagnostics, https://www.roche.com), according to the manufacturer’s instructions. DNA libraries were prepared by using Nextera chemistry and sequenced on the MiSeq platform (paired-end reads, read length 300 bp) (Illumina, https://www.illumina.com). The paired-end reads were imported into BioNumerics version 8.1 (bioMérieux, https://www.biomerieux.com), quality control and trimming were performed according to default settings, and genomes were assembled de novo with SPAdes (https://github.com/ablab/spades).
In Silico Virulence and Serogroup Analysis
Serogroup, stx subtype, and presence of eae and ehxA genes were detected through BioNumerics’ built-in search functions. The eae gene subtypes were determined using a BLAST search (https://blast.ncbi.nlm.nih.gov) of a manually curated in-house database established in the BioNumerics platform by collecting the nucleotide sequences of eae subtypes described in the literature (14–16).
Pangenome and Genomewide Association Studies
We performed further bioinformatic analyses by using tools available on the Galaxy Europe Server platform (https://usegalaxy.eu) (17). We annotated draft genomes by using Prokka Galaxy version 1.14.6+galaxy1 (18) with the E. coli genus BLAST database. We then used Roary Galaxy version 3.13.0+galaxy2 (19) to create the pangenome, with loci defined by alleles with a minimum of 95% blastp identity and split paralogs enabled. We defined core genes as genes present in >99% of the genomes; the remaining genes were defined as accessory. We used Scoary Galaxy version 1.6.16+galaxy0 (20) to determine significant associations between accessory genes and HUS status. To control the false discovery rate associated with multiple comparisons, we considered genes positively associated if the OR was >1 and the Benjamini-Hochberg p value was <0.05. We used pairwise comparisons with p<0.05 as a threshold to minimize the lineage confounding effect. We explored the putative function of genes annotated as hypothetical proteins by performing a BLAST search of the consensus sequence against other databases, such as UniProt (21) and STRING (22).
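The false discovery rate filter described above can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment. This is not Scoary's code; the function names, gene labels, ORs, and p values below are invented for illustration, and only the selection rule (OR >1 and adjusted p <0.05) comes from the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values never decrease as the raw p values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def positively_associated(genes, alpha=0.05):
    """genes: list of (name, odds_ratio, raw_p). Keep OR > 1 and adjusted p < alpha."""
    adj = benjamini_hochberg([p for _, _, p in genes])
    return [name for (name, odds_ratio, _), q in zip(genes, adj)
            if odds_ratio > 1 and q < alpha]


# Invented example values, not results from the study.
example = [("geneA", 3.1, 0.01), ("geneB", 2.0, 0.04),
           ("geneC", 0.4, 0.03), ("geneD", 1.2, 0.20)]
adjusted = benjamini_hochberg([p for _, _, p in example])
selected = positively_associated(example)
```

In this toy input only geneA passes both conditions: geneB has a raw p value below 0.05 but an adjusted value above it, which is exactly the kind of result the correction is meant to screen out.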
Phylogenomic Analysis
Figure. Maximum-likelihood phylogenetic tree of HUS and non-HUS STEC isolates from study of HUS among patients with STEC, Ireland, 2017–2020.
We generated a maximum-likelihood tree by using RAxML (23) on the basis of a multi-FASTA alignment of the core genes of the 531 STEC isolates (Figure). We annotated and visualized the final tree by using iTOL version 6.8.1 (https://itol.embl.de) (24).
Statistical Analyses
We performed statistical analyses by using the glm function in R version 4.0.3 (The R Project for Statistical Computing, https://www.r-project.org) and the car (26) and generalhoslem (27) packages. We first explored the relationship between case-patients and controls by using the χ2 test of proportions. We added variables that differed significantly (p<0.05) to univariate logistic regression to calculate ORs with 95% CIs and p values to assess the associations between the variables and HUS. We included the variables age, source of drinking water, and region of residence in stratified analysis to explore potential confounders and effect modifiers. We conducted multivariable logistic regression analysis (MVA) to control for negative and positive confounding and to calculate adjusted ORs (aORs). All p values correspond to a 2-tailed test. To reduce omitted-variable bias, we added predictor variables with a significance level of p<0.2 (rather than p<0.05) in the univariate analyses to the initial MVA model, an approach that is supported in the literature (28,29). We used forward stepwise techniques to identify variables suited or unsuited to the model and excluded variables on the basis of model efficiency, as indicated by the Akaike information criterion (AIC), in combination with other statistical tests.
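For a single binary exposure, the unadjusted OR and its Wald 95% CI can be computed directly from a 2 × 2 table, which approximates what a univariate logistic regression returns. The sketch below is an illustration rather than the authors' R code. It uses the ehxA counts given in the results (95 of 108 case isolates and 360 of 416 control isolates carried the gene); the function name is hypothetical, and R may report a slightly different interval if profile-likelihood limits are used.

```python
from math import exp, log, sqrt


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Crude OR with Wald CI from a 2 x 2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# ehxA: 95 of 108 cases and 360 of 416 controls carried the gene,
# so 13 cases and 56 controls did not.
or_ehxa, lo_ehxa, hi_ehxa = odds_ratio_wald(95, 108 - 95, 360, 416 - 360)
print(f"OR {or_ehxa:.2f} (95% CI {lo_ehxa:.2f}-{hi_ehxa:.2f})")
```

This is a crude estimate only; the adjusted ORs in the multivariable model account for the other covariates and will differ.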
We used multiple models to explore potential gene dependencies, variance inflation factor to explore collinearity, and the Hosmer-Lemeshow test to assess goodness of fit. We noted no evidence of poor fitting; the χ2 statistic for the final model was 11.9, d.f. = 8, and p = 0.156. We deemed variables with a significance level of p<0.05 in the final MVA model to be independently associated with HUS.
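The reported Hosmer-Lemeshow p value can be checked from the χ2 statistic and degrees of freedom alone. The sketch below uses the closed-form upper tail of the χ2 distribution, which holds only for an even number of degrees of freedom; the function name is an assumption, and the calculation is an arithmetic check rather than the generalhoslem implementation.

```python
from math import exp, factorial


def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of d.f."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    # For df = 2m the tail equals exp(-x/2) * sum_{k<m} (x/2)^k / k!
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


p_hl = chi2_upper_tail_even_df(11.9, 8)
print(round(p_hl, 3))  # 0.156
```

A p value well above 0.05 gives no evidence that predicted and observed HUS frequencies diverge across risk groups, which is the sense in which the text reports no evidence of poor fit.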
Ethics Approval
Formal consent was not required from patients in this study. STEC is a notifiable disease in Ireland, and formal consent is not required from patients to collect their data. CIDR data are collected as part of routine surveillance procedures, and laboratory testing records are collected as part of routine diagnostic and confirmatory testing. Approval was granted from the CIDR National Peer Review Committee to use CIDR data for the purposes of this study.
Patient Demographic Data
Among 524 patients, 233 (44%) were 0–9 years of age and 291 (56%) were ≥10 years of age (Table 1); 53% were female and 47% male. The highest proportion of patients was in the South (20%; n = 104), followed by the Southeast and East (each 16%; n = 82). Ninety-three (18%) patients had outbreak-associated infection (Table 2).
Patient Isolate Genomic Data
Overall, the stx subtypes most commonly found in patients’ isolates were stx2a alone (27%; n = 144), stx1a alone (20%; n = 105), or both stx1a and stx2a (29%; n = 154) (Table 3). The most common subtypes among case isolates were stx2a alone (52%; n = 56) or stx1a and stx2a (38%; n = 41). Four (4%) cases had stx1a alone. Isolates from 419 (80%) patients contained eae genes, among which the β1 (38%; n = 198) and γ1 (31%; n = 161) subtypes predominated. Similarly, isolates from 102 (94%) HUS case-patients contained eae genes, and γ1 (45%; n = 49) and β1 (38%; n = 41) were again the predominant subtypes. Ninety-five (88%) case and 360 (87%) control isolates contained the ehxA gene. Isolates from 187 (36%) patients were serogroup O26, and isolates from 122 (23%) patients were serogroup O157.
Genomewide Association Study on Microbial Genomic Factors Associated with HUS
The pangenome for the 531 STEC isolates contained 63,763 genes, of which 1,246 were defined as core genes (present in >99% of the isolates). Twenty-six accessory genes had statistically significant associations with HUS (Table 4). Of those, 7 genes encoded hypothetical proteins with unknown function; the other 19 genes were functionally annotated and predicted to be involved in different processes, such as toxin production (stx2B), phage life cycle (ybcQ_1, ydfU_1), transcriptional regulation (group_5720, yiaU, yedW_10), transport (fieF, purP, sbp), sugar (pfkA, tpiA) and lipid (cdh) metabolism, detoxification (sodA, yiiM, rsxG), and stress response (uspD, cpxA and cpxP, ygiW_2).
Results of Multivariable Analysis
We assessed 47 variables in the MVA, including the patient characteristics, epidemiologic factors, virulence genes, serogroups (Tables 5,6,7), and all 26 genes that had statistically significant associations in the GWAS (Appendix 2 Table 1). Variables in the final MVA model were age, region, outbreak association, stx subtypes, eae subtypes, and ehxA, pfkA, fieF, ygiW_2, and group_5720. We observed potential dependencies or synergies between ygiW_2 and group_5720 and between pfkA and fieF during development of the MVA model. To resolve that issue, we created 2 composite variables, ygiW_2/group_5720, and pfkA/fieF. Variables that we assessed but that did not remain in the final MVA model were season of STEC diagnosis; reported vomiting and reported bloody diarrhea; risk group (child in crèche, recent outdoor recreational activities or recreational farmland contact, contact of an unknown nature with farm animals or their feces); eae subtypes γ1, β1, and θ1; ehxA; and all genes from the GWAS except for pfkA, fieF, ygiW_2, and group_5720.
In MVA, younger patients (0–9 years of age) had >3-fold odds of HUS compared with those ≥10 years of age (aOR 3.3 [95% CI 1.7–6.4]). Patients residing in regions other than the East had lower odds of developing HUS compared with those resident in the East (Northeast aOR 0.2 [95% CI 0.0–0.6], Midlands aOR 0.2 [95% CI 0.1–0.7], Midwest aOR 0.3 [95% CI 0.1–0.7], West aOR 0.2 [95% CI 0.1–0.7], and Southeast aOR 0.1 [95% CI 0.0–0.4]). Persons with outbreak-associated infection had >3-fold odds of HUS compared with persons whose infection was not outbreak-associated (aOR 3.5 [95% CI 1.8–7.2]). Compared with patients who had stx1a alone, the odds of HUS were higher among patients with stx2a alone (aOR 154.3 [95% CI 27.1–1,567.3]), both stx1a and stx2a (aOR 36.7 [95% CI 7.3–358.4]), or both stx1a and stx2c (aOR 31.3 [95% CI 2.9–447.4]).
The inclusion of the genes ygiW_2 (aOR 3.2 [95% CI 1.2–9.1]) or group_5720 (aOR 2.6 [95% CI 1.3–5.3]) had positive associations with HUS in forward stepwise regression, but only group_5720 remained statistically significant when we made attempts to incorporate both genes as independent variables. A combined ygiW_2/group_5720 variable had increased odds (aOR 5.4 [95% CI 1.8–18.6]) and provided a better model fit.
Similarly, when the genes were assessed independently, pfkA (aOR 2.0 [95% CI 1.1–2.7]) showed a positive association with HUS, whereas fieF (aOR 0.03 [95% CI 0.0–0.92]) showed a negative association. The odds for pfkA increased considerably when fieF was added to the model (aOR 58.05 [95% CI 1.9–1,104.7]). A combined pfkA/fieF variable had an overall positive association (aOR 1.8 [95% CI 1.0–3.3]) and provided a better model fit.
Phylogeny of HUS and Non-HUS STEC Isolates
HUS cases were distributed across several serogroups: O26 (36%), O157 (26%), O145 (14%), O55 (5.6%), O103 (4.6%), and O111 (2.8%) (Figure).
Consistent with the findings of previous studies, we found that young age, outbreak-associated infection, and region of residence in Ireland were associated with HUS developing in STEC patients during the study period (4,30–32). The higher odds of HUS among patients residing in the East of Ireland (likely representing a more urban environment) might be because patients in more rural environments are protected by repeated previous STEC exposures, although we cannot confirm that hypothesis. Another possible reason is the higher density of childcare facilities in the East region; children are more likely to be associated with an STEC outbreak in a childcare setting in the East and therefore may have a higher risk for HUS. Being part of an STEC outbreak was associated with HUS, possibly because of increased virulence of pathogenic strains linked to outbreaks. Other factors that were associated with HUS in previous studies were season of infection and having reported bloody diarrhea and vomiting (4,30–32), factors that were significant in our univariate analysis but not in our MVA. Even though bloody diarrhea and vomiting were not significant, it is arguable that in the absence of information on symptom onset date, as in our study, those factors should not be included because of potential for causal confounding.
Also consistent with the findings of previous studies, we found that the presence of stx2 genes was independently associated with HUS (4,33). We demonstrated that the subtype stx2a alone had a stronger association with HUS compared with presence of stx1a alone or stx1 and stx2 subtype combinations (34). We further found that the combined presence of stx1a and stx2a was independently associated with HUS.
The presence of eae genes, described elsewhere as being associated with HUS (5,11,35–37), was not significantly associated with HUS in our study. That difference may be because of the collinearity we observed between stx and eae. Other genes involved in adherence, such as tir, toxB, and the sfp and lpf gene clusters, were not associated with HUS in our study (38). We excluded serogroup from MVA because of known collinearity with stx genes. The non–locus of enterocyte effacement–encoded immune system modulator nleH1–2 has been reported to be associated with HUS (30) but was not identified in our GWAS.
The application of GWAS methodology to public health research on STEC infections is relatively uncommon. GWASs of STEC in other countries have focused on different outcomes (e.g., bloody diarrhea) or have been limited in sample size (34,37). Using GWAS, we identified 26 putative genes that were significantly associated with HUS but whose definitive role in HUS pathogenesis remains to be elucidated. Functional annotation suggests their involvement in processes such as toxin production, phage life cycle, transcriptional regulation, transport, and stress response.
Only the 2 composite gene pairs pfkA/fieF and ygiW_2/group_5720 were significantly associated with HUS in MVA. Of note, pfkA and fieF are contiguous in the genome and have the same presence/absence pattern, supporting the theory of gene dependency. The fieF gene was negatively associated with HUS when added to the model as an independent variable but, when coupled with pfkA, it was positively associated and improved the model fit. Information on the potential role of those genes is limited. The pfkA gene product is a phosphofructokinase and a key component in the glycolytic pathway, enabling E. coli to utilize glucose as a carbon source (39), whereas the fieF gene encodes an iron/zinc/cadmium efflux transporter that forms part of a detoxification mechanism (40–42). Previous studies describe a role for ygiW in tolerance to cadmium, oxidative stress (43,44), and biofilm growth (45), whereas the group_5720 gene product appears to be similar to mokC (through functional annotation), a mediator in plasmid stabilization (46). Further research is warranted to explore how those genes could be interacting, and how they modulate STEC virulence and potentially contribute to HUS development.
One strength of our study is that we used data from a full national cohort of notified infections, minimizing potential bias where possible through study design. In contrast to prior studies on STEC-associated HUS, we had a large number of HUS cases (5,11,30) and used complete molecular data from the national strain collection.
For the novel gene associations, our findings should be interpreted cautiously. The ORs for gene pairs ygiW_2/group_5720 and pfkA/fieF were modest, and the role of those genes in pathogenesis needs to be further elucidated. For example, we did not measure the level of gene expression and regulation, which plays a fundamental role in virulence. We did explore potential gene interdependencies and interactions by using forward and backward stepwise regression techniques, but even though we identified interactions between 2 gene pairs, more may exist.
Regarding limitations of our study, we took measures to mitigate potential biases resulting from screening policies for STEC outbreaks in Ireland. Unknown biases might have resulted from exclusion of patients that did not yield culture-positive isolates and either could not be linked to a laboratory record or were linked but did not have associated isolate genomes. Whereas stx2 is more often associated with high-risk STEC isolates, isolates for 4 (4%) HUS cases were detected with stx1 only, even though we made every effort to find a co-infecting stx2-producing strain through exhaustive accredited laboratory methods. We cannot exclude the possibility that a co-infecting stx2-producing STEC was present at some point between illness onset date and sample collection, which ranged up to several weeks, and was not detectable in the sample. Recall bias was not possible in MVA, since the variables included were based on factual information. The R2 value of the MVA model suggested that 35% of the outcome (HUS) could be explained by the independent variables, indicating that other factors influence HUS development. Relevant data on volume of drinking water, underlying medical conditions, and other host factors; clinical management, including antimicrobial drug treatment; and longer-term data on outcomes were not available because those data are not collected in routine STEC surveillance in Ireland. In addition, variables of interest collected in routine surveillance, including recent outdoor recreational activities or recreational farmland contact, contact with farm animals or their feces, and residence in an urban or rural location, had a high number of missing observations, reducing the precision of results. We instead determined residence distribution on the basis of the administrative region. Furthermore, incomplete data for other variables may have negatively impacted their suitability to MVA. 
The type of GWAS carried out in this study also has limitations, assessing only presence or absence of accessory genes, omitting important genetic variation caused by single-nucleotide polymorphisms and insertions or deletions that could be explored through other GWAS methodologies (47–50).
In conclusion, this study benefitted from the use of a full national cohort of notified infections with complete molecular data and is another step toward clarifying the factors influencing HUS development among STEC patients. The roles of genes and their dependencies and synergies in STEC pathogenesis should be further investigated, particularly the role of the novel genes identified using GWAS. Our findings, particularly if validated by further studies, could improve early identification of higher-risk STEC infection and help guide enhanced surveillance and public health management.
Dr. Espadinha is a public health microbiologist at the Public Health Laboratory HSE Dublin. Her research interests include the molecular epidemiology and surveillance of bacterial pathogens and antimicrobial resistance monitoring. Ms. Brady is a senior epidemiologist at the Health Service Executive Health Protection Surveillance Centre in Ireland. Her research interests include the epidemiology of infectious diseases and enhancing public health surveillance.