From searches to sneezes: Evaluating digital indicators for allergic rhinitis surveillance in the United Kingdom

IEEE BigData 2024

Forthcoming
Evaluation of digital surveillance indicators—search trends and social media data—for allergic rhinitis monitoring in the United Kingdom. Presented at IEEE BigData 2024.
Authors

Manohara, K.

Jankin, S.

Roa, J.

Béchara, H.

Bousquet, J.

Garcia-Corral, P.

Sousa-Pinto, B.

Published

December 15, 2024

DOI

10.1109/BigData62323.2024.10825514

From Searches to Sneezes: Digital Surveillance for Allergic Rhinitis

Manohara, K., Jankin, S., Roa, J., Béchara, H., Bousquet, J., Garcia-Corral, P., & Sousa-Pinto, B.

J.M.W. Turner, The Burning of the Houses of Lords and Commons (1834). Cleveland Museum of Art, Cleveland. Parliament consumed by fire while crowds watch from the Thames: a catastrophe unfolding in full public view, visible from every bridge and bank. Turner painted the spectacle of collapse: what a city sees when its own systems fail, and everyone watches in real time.

The Problem

Allergic rhinitis affects up to 40% of some populations and creates cyclical pressures on healthcare systems every pollen season. Traditional surveillance relies on clinical data that lags behind real-time disease activity, hindering timely interventions. Can digital traces – what people search, tweet, and report on health apps – serve as early warning signals?

The Approach

We analyzed eight years of weekly data from the United Kingdom (January 2016 – January 2024), integrating three digital data sources with clinical allergic rhinitis (AR) incidence from GP surveillance networks. We collected Google Trends (GT) data for “allergic rhinitis”, “pollen”, and “hay fever”; scraped tweets from X (formerly Twitter) containing the same AR-related terms to build a Twitter Frequency indicator; and incorporated self-reported medication use from the MASK-air mHealth app. To assess each source’s predictive value, we used Spearman correlations, Granger causality tests, SARIMAX time-series models, linear regression, and Random Forest regression.
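As a rough illustration of the first of these methods, the sketch below computes a Spearman rank correlation in plain Python. The two weekly series are made-up values, not study data, and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical; in practice a library routine such as `scipy.stats.spearmanr` would typically be used instead.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative weekly values only (NOT data from the study).
search_interest = [12, 15, 30, 55, 80, 95, 70, 40, 20, 14]
ar_incidence = [1.0, 1.2, 2.5, 4.0, 6.5, 7.0, 6.8, 3.0, 1.5, 1.1]

rho = spearman(search_interest, ar_incidence)
print(f"Spearman rho on the toy series: {rho:.3f}")
```

Because the statistic depends only on ranks, it is insensitive to the different scales of search indices and incidence rates, which is one reason it suits comparisons between heterogeneous indicators.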

What We Found

Spearman correlation (Google Trends vs. AR incidence): 0.73
Granger causality score (Google Trends): 10.33
Years of weekly surveillance data: 8

Google Trends (GT) dominated at the national level, showing the strongest correlation with AR incidence (0.73) and the highest Granger causality score (10.33), which indicates that past search activity helped predict subsequent clinical cases. Regression models using GT data alone performed nearly as well as combined models. Our scraped Twitter data provided slight but meaningful improvements in test-set performance and, crucially, offers something Google Trends cannot: geographical specificity. While GT data is limited to national-level trends, geotagged tweets can capture local AR activity, making social media scraping a valuable complement for regional surveillance. Self-reported medication data from the MASK-air app showed weak correlations and did not significantly improve predictions at the national level.
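To make the idea behind a Granger causality score concrete, here is a minimal sketch that compares two least-squares models of a target series: one using only its own past values, and one that also includes past values of a candidate predictor. It assumes the score is an F-type statistic, which this summary does not state; the function names, the lag of one week, and the synthetic data are illustrative assumptions, not the study's configuration. A full analysis would more likely rely on an established routine such as `statsmodels.tsa.stattools.grangercausalitytests`.

```python
import numpy as np


def residual_sum_of_squares(design: np.ndarray, target: np.ndarray) -> float:
    """Fit ordinary least squares and return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(cause, effect, lags: int = 1) -> float:
    """F statistic for 'lagged cause improves prediction of effect'."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    total = len(effect)
    n = total - lags
    target = effect[lags:]
    ones = np.ones((n, 1))
    # Column k holds the series shifted back by k steps.
    own_lags = np.column_stack(
        [effect[lags - k: total - k] for k in range(1, lags + 1)]
    )
    cause_lags = np.column_stack(
        [cause[lags - k: total - k] for k in range(1, lags + 1)]
    )
    restricted = np.hstack([ones, own_lags])                # own history only
    unrestricted = np.hstack([ones, own_lags, cause_lags])  # plus predictor
    rss_r = residual_sum_of_squares(restricted, target)
    rss_u = residual_sum_of_squares(unrestricted, target)
    dof = n - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic example (NOT study data): cases follow searches by one week.
rng = np.random.default_rng(0)
searches = rng.normal(size=200)
cases = np.empty(200)
cases[0] = 0.0
cases[1:] = 0.8 * searches[:-1] + 0.1 * rng.normal(size=199)

f_searches_to_cases = granger_f(searches, cases)
f_cases_to_searches = granger_f(cases, searches)
print(f"searches -> cases: F = {f_searches_to_cases:.1f}")
print(f"cases -> searches: F = {f_cases_to_searches:.1f}")
```

In this toy setup the statistic is large in the searches-to-cases direction and small in the reverse direction, which is the asymmetry a Granger test looks for. It measures predictive precedence, not causation in the mechanistic sense.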

Left: Normalized weekly time series of all digital indicators and clinical AR incidence over 8 years (2016–2024), showing the strong seasonal co-movement of Google Trends with AR incidence. Right: Random Forest regression predictions vs. observed AR incidence on the test set.

Digital activity indicators evaluated against allergic rhinitis surveillance

Why It Matters

Google Trends data can serve as a powerful, real-time proxy for allergic rhinitis surveillance in the UK – enabling earlier public health responses and better resource planning before clinical data becomes available. Integrating search-based digital indicators into surveillance systems could transform how we monitor and manage seasonal allergic conditions.

Citation

Manohara, K., Jankin, S., Roa, J., Béchara, H., Bousquet, J., Garcia-Corral, P., & Sousa-Pinto, B. (2024). From searches to sneezes: Evaluating digital indicators for allergic rhinitis surveillance in the United Kingdom. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, D.C., USA. DOI: 10.1109/BigData62323.2024.10825514

BibTeX citation:
@inproceedings{k.2024,
  author = {Manohara, K. and Jankin, S. and Roa, J. and Béchara, H.
    and Bousquet, J. and Garcia-Corral, P. and Sousa-Pinto, B.},
  title = {From Searches to Sneezes: {Evaluating} Digital Indicators for
    Allergic Rhinitis Surveillance in the {United} {Kingdom}},
  booktitle = {IEEE International Conference on Big Data (BigData) 2024},
  date = {2024-12-15},
  url = {https://jorgeroac.com/publications/papers/forthcoming/searches-to-sneezes/},
  doi = {10.1109/BigData62323.2024.10825514},
  langid = {en}
}
For attribution, please cite this work as:
Manohara, K., S. Jankin, J. Roa, et al. 2024. “From Searches to Sneezes: Evaluating Digital Indicators for Allergic Rhinitis Surveillance in the United Kingdom.” IEEE International Conference on Big Data (BigData) 2024, December 15. https://doi.org/10.1109/BigData62323.2024.10825514.