June 22-July 31, 2020
This week we delved into data. Primed by Alessandro Vespignani’s video presentation, we addressed the main topic of the summer school: integrating observational data into mathematical models through Data Assimilation (DA).
Data Assimilation
Our leader Chris started the week with an introduction to DA that emphasized three perspectives: DA as an application of optimization (variational methods), DA as an interpolation scheme (Kalman filters), and DA as an application of probability theory (Bayesian inference).
In true story-telling style, Chris whetted our appetites with some success stories of DA: improved numerical weather prediction, explanations of the cause of the fish kill in Lake Kinneret in Israel, and use of satellite data to visualize the ozone hole around the South Pole. The introduction was interwoven with exercises in breakout groups using Google Jamboard and frames.
Participants spent Tuesday morning practicing DA on a susceptible-infectious-recovered (SIR) model using a prepared MATLAB code. Getting the code to run was more of a challenge than we anticipated, but most groups were successful in the end. Students spent the afternoon revisiting the questions they wanted to present to the experts, who returned on Thursday.
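For readers who have not met the SIR model, the following is a minimal Python sketch of the kind of simulation such an exercise starts from. It is not the MATLAB code used at the school. The function names, the forward-Euler time stepping, and the parameter values (`beta`, `gamma`, `i0`) are all assumptions chosen purely for illustration.

```python
def sir_step(s, i, r, beta, gamma, dt):
    """Advance the S, I, R population fractions by one forward-Euler step."""
    new_infections = beta * s * i * dt   # flow from S to I
    new_recoveries = gamma * i * dt      # flow from I to R
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


def simulate_sir(beta=0.3, gamma=0.1, i0=0.001, days=160, dt=0.1):
    """Return a list of (time, S, I, R) tuples; all parameter values are illustrative."""
    s, i, r = 1.0 - i0, i0, 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        s, i, r = sir_step(s, i, r, beta, gamma, dt)
        trajectory.append((k * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    traj = simulate_sir()
    # Locate the time at which the infectious fraction is largest.
    t_peak, _, i_peak, _ = max(traj, key=lambda row: row[2])
    print(f"peak infectious fraction {i_peak:.3f} on day {t_peak:.1f}")
```

In a DA setting, parameters such as `beta` and `gamma` are treated as uncertain and adjusted so that the simulated curve stays consistent with the observed data.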
Part of Tuesday evening’s homework assignment asked participants to read through the preprint “An international assessment of the COVID-19 pandemic using ensemble data assimilation” by Geir Evensen and his collaborators, in preparation for Wednesday morning’s lecture by Evensen.
The paper provides an overview of the ensemble Kalman filter (EnKF) technique for parameter estimation, with applications to the susceptible-exposed-infectious-recovered (SEIR) model for COVID-19. The SEIR model is simple to understand and formulate, and tends to model epidemics successfully. The code EnKF_seir (available on GitHub) was developed as a tool for decision-making in Norway, especially with regard to school reopening plans. It implements an age-structured model with 11 age groups and uses data from hospitals and care homes. Evensen’s paper includes results for many countries and states.
In his talk on Wednesday morning, Evensen summarized the technique and illustrated the results for Norway, the Netherlands, and four states in the US.
That afternoon, students worked to implement EnKF in the MATLAB and Python codes.
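To give a flavor of what an EnKF analysis step does, here is a small Python sketch. It is not taken from EnKF_seir or from the codes used at the school. It assumes a single scalar observation, the stochastic (perturbed-observation) variant of the filter, and an augmented state `[I, beta]` so that the transmission rate is estimated alongside the infectious fraction. The forecast uses a simple early-epidemic exponential-growth approximation, and every name and number below is hypothetical.

```python
import math
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def enkf_update(ensemble, observe, y_obs, obs_var, rng):
    """One stochastic EnKF analysis step for a single scalar observation.

    ensemble: list of state vectors (lists of floats), one per member
    observe:  function mapping a state vector to its predicted observation
    """
    n, dim = len(ensemble), len(ensemble[0])
    predicted = [observe(x) for x in ensemble]
    mean_pred = mean(predicted)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted) / (n - 1)
    means = [mean(x[j] for x in ensemble) for j in range(dim)]
    # Sample cross-covariance between each state component and the prediction.
    cross = [
        sum((x[j] - means[j]) * (p - mean_pred)
            for x, p in zip(ensemble, predicted)) / (n - 1)
        for j in range(dim)
    ]
    # Kalman gain for a scalar observation: cov(x, Hx) / (var(Hx) + R).
    gain = [c / (var_pred + obs_var) for c in cross]
    updated = []
    for x, p in zip(ensemble, predicted):
        # Each member is nudged toward its own noisy copy of the observation.
        innovation = y_obs + rng.gauss(0.0, math.sqrt(obs_var)) - p
        updated.append([x[j] + gain[j] * innovation for j in range(dim)])
    return updated


def demo(seed=0, members=200):
    """Return (prior mean, posterior mean) of beta; all values are illustrative."""
    rng = random.Random(seed)
    gamma, i0, t_obs, true_beta = 0.1, 0.001, 10.0, 0.3

    def forecast(beta):
        # Early-epidemic approximation: I grows at rate beta - gamma while S is near 1.
        return i0 * math.exp((beta - gamma) * t_obs)

    # Prior ensemble: uncertain beta, each paired with its own forecast of I.
    prior = [[forecast(b), b] for b in (rng.gauss(0.2, 0.05) for _ in range(members))]
    y_obs = forecast(true_beta)          # synthetic, noise-free observation of I
    posterior = enkf_update(prior, lambda x: x[0], y_obs, 0.0005 ** 2, rng)
    return mean(x[1] for x in prior), mean(x[1] for x in posterior)


if __name__ == "__main__":
    prior_beta, posterior_beta = demo()
    print(f"mean beta before update: {prior_beta:.3f}, after: {posterior_beta:.3f}")
```

Only the infectious fraction is observed, yet the update also moves `beta`. This is because the ensemble supplies a sample covariance between the observed quantity and the unobserved parameter, which is the mechanism that makes ensemble methods usable for parameter estimation.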
Data Fusion and Forecasting
Homework for Wednesday evening included watching a recording of a plenary talk by Sara Del Valle (Los Alamos National Laboratory) at the virtual 2020 SIAM Conference on the Life Sciences. Del Valle spoke about “Real-time Data Fusion to Guide Disease Forecasting Models” and offered an overview of Los Alamos’ work on real-time forecasting of epidemics. The work was inspired by a challenge issued by the Centers for Disease Control and Prevention (CDC) to forecast the onset, peak timing, and peak intensity of flu epidemics, based on influenza-like illness data from the CDC and other data streams that are made available on a weekly basis. The basic model is an SIR model combined with various statistical analysis techniques. Del Valle also shared results for dengue fever in Brazil and COVID-19 in the U.S. Participants discussed her talk on Thursday morning.
Interaction with the Experts
On Thursday afternoon, the school welcomed back the eight experts who had visited during the first week: Aziz Yakubu (Howard University), Cordelia McGehee (University of Minnesota), Jack O’Brian (Bowdoin College), Linda Allen (Texas Tech University), Pauline Van den Driessche (University of Victoria), Jianhong Wu (York University), Andrew Roberts (Cerner Corporation), and Nick Ma (Cerner Corporation). They listened to presentations of 17 different projects, each neatly summarized in a single slide. The experts selected topics that matched their expertise and discussed the individual projects with the proposers.
One more project was added to the list overnight, bringing the total to 18 projects with expert opinions and suggestions for possible further research.
Pivoting to the Real Task
Friday was the pivot point of the summer school. We had heard and read about the many questions that surround the novel coronavirus and the associated COVID-19 disease, become familiar with the various mathematical models of epidemiology, explored the numerous datasets available online and elsewhere about the current pandemic, and tentatively identified 18 projects of interest for further research. Now it was time to develop a plan.
As a first step, the leaders had identified five themes or “umbrellas” and categorized each project under one of the following umbrellas:
- Social justice
- Impacts of human behavior
- Diseases and the environment
- Resource allocation
- Incorporation of data
While the proposed projects reflected the interests of the participants, they did not imply a commitment to participate in any particular scheme. It was then time for students to select a theme and make a commitment to work on one or more projects under the corresponding umbrella. We were also reminded that we should focus specifically on the mathematics. In other words, the moment of truth had arrived.
By the end of the day, everyone had found a “home” under one of the umbrellas. We used the weekend to reflect and prepare ourselves for the real task at hand.