Contact tracing apps won’t elicit honest symptom reporting: Harvard epidemiologist
Caroline Buckee is a top epidemiologist at Harvard University’s TH Chan School of Public Health who pioneered something that has now become ubiquitous the world over in the fight against Covid-19: the use of data from mobile phones. Buckee, the associate director of the School of Public Health’s Center for Communicable Disease Dynamics, came up with the idea of using location data to study the spread of malaria while doing fieldwork in Kenya in 2011. She led a team of researchers who tracked more than 15 million cellphones for a year to map the way human mobility contributed to the spread of the disease.
This year, as Covid-19 began to spread out from its epicentre in China to become a global pandemic, Buckee put her expertise to use to form the Covid-19 Mobility Data Network, connecting epidemiologists from around the world to analyze massive tranches of mobile phone location data to track the efficacy of social-distancing measures and the spread of the disease. In this email/phone interview with Rudraneil Sengupta, Buckee talks about the intersection of big data and disease control, contact tracing apps and privacy issues, and why predicting the spread of Covid-19 has been so fraught with uncertainties.
Has there ever been more interest in aggregating data on a large scale to fight a disease than what we are seeing now?
For about 10 years now I’ve been working with mobile phone data in the context of epidemic modelling and trying to understand the spatial spread of human pathogens via human mobility. It’s been slow progress because of the regulatory frameworks and the privacy frameworks—there is reticence to share data and scepticism about its utility. But in the last four months, since the beginning of the pandemic, it’s become abundantly clear to the world that actually this data is incredibly useful. So we have seen an outpouring of data from different kinds of providers—mobile phone companies, but also Facebook and Google and others. It has led to a proliferation of work that builds on a lot of what we have been doing for a long time, but it’s happening in a somewhat unregulated way. It’s exciting what’s happening, but also there is some cause for concern that we are putting the cart before the horse. We really need to take care in the interpretation of these data sets and ensure that they are actually answering policy-relevant questions. We also need to make sure that we are aggregating this data in a way that protects privacy.
What are the correct privacy protocols?
I should distinguish between contact tracing apps and what we are doing. What we are doing is always aggregated in time and space so you can never identify an individual in that data. There are different ways of doing that; you can use differential privacy, which basically means adding a layer of noise, so it jitters the positions—the signal is fine, but the details can’t be figured out. Also you never go down to the spatial resolution where there are too few devices in a particular area. A lot of times we aggregate in a way that’s memory-less—you can’t tell that this individual went here and then there—you can only tell that X number of people moved between city A and city B on this day.
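The aggregation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Covid-19 Mobility Data Network’s actual pipeline: the record layout `(device_id, day, origin, destination)`, the function name `aggregate_flows`, and the values of `epsilon` and `min_count` are all assumptions made for the example. Thresholding on the raw count and letting one device appear in several cells also mean the sketch does not by itself deliver a formal differential privacy guarantee.

```python
import random
from collections import Counter


def aggregate_flows(records, epsilon=1.0, min_count=3, seed=None):
    """Turn per-device trips into noisy daily origin-destination counts.

    records: iterable of (device_id, day, origin, destination) tuples
             (a hypothetical layout chosen for this sketch).
    epsilon: privacy parameter; smaller values add more noise.
    min_count: cells with fewer devices than this are not released.
    """
    rng = random.Random(seed)

    # Count each device at most once per (day, origin, destination) cell.
    unique_trips = {(dev, day, o, d) for dev, day, o, d in records}

    # Memory-less aggregation: device ids are dropped here, so the output
    # cannot say that one individual went here and then there.
    counts = Counter((day, o, d) for _, day, o, d in unique_trips)

    released = {}
    for cell in sorted(counts):
        n = counts[cell]
        if n < min_count:
            continue  # too few devices in this cell: suppress it entirely
        # The difference of two exponential draws with rate epsilon is a
        # Laplace(0, 1/epsilon) variate: the "layer of noise" on the count.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        released[cell] = max(0, round(n + noise))
    return released


if __name__ == "__main__":
    trips = [(f"dev{k}", "2020-05-01", "city A", "city B") for k in range(40)]
    trips.append(("dev99", "2020-05-01", "city A", "city C"))  # a lone traveller
    for cell, count in aggregate_flows(trips, seed=1).items():
        print(cell, count)
```

In this toy run the single trip from city A to city C is withheld, while the flow between city A and city B is released as a slightly perturbed count.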
For the contact tracing apps, obviously when you consent to take part in it, you are saying you are giving up your data to be analysed.
You could argue that boots-on-the-ground contact tracing is also invasive and also involves collection of individual data, but the problem with digital contact tracing is that it’s more opaque. Who has access to the data? Who controls the data? How is it being used? Is there a deadline for destroying it? Contact tracing apps open up a whole new realm of questions as to what’s appropriate and in what context it will be effective. It remains to be seen what the efficacy of these apps will be in places where the health system is not very strong. I am not sure they will be effective, and I think the risk of further undermining public trust, the potential for misuse of the data, and spurious false positives with poor sensitivity and specificity are all problematic.
Like the app we are using here, Aarogya Setu, where you self-declare symptoms... the feeling is that a lot of people will not comply.
You need high compliance for these apps to work, and if the incentives are not there and the trust is not there then you won’t get sufficient compliance. If you try to make it mandatory then you will certainly not get honest reporting of symptoms. The second thing is that Bluetooth devices are not perfect, nor have they been tested yet. We don’t know the extent of false positives: the app may think you are close to somebody, but actually you are in an office building and behind a wall. These apps are going to have very variable false positive rates, with a lot of spurious pings, as well as false negatives because not everybody is compliant, not everyone has a smartphone, and so on. Also, can the health system even respond in a meaningful way to whatever data is coming from these apps?
What about people who cannot afford social distancing?
Strict quarantine and social distancing may not be appropriate everywhere. There’s also this issue of how to even think about isolating people in the context of an urban slum, which is a very difficult problem. There is a set of approaches that has been tried in refugee camps and crowded low-income settings. They involve simple things like separating infected people from susceptible people, masks, and this concept of immune shielding, though we still don’t know the extent of immunity or protection.
Usually, if you have some vulnerable groups, our tendency is to try and put them all in one place and then protect that place. But there is this idea that once you have people who have already had Covid-19, you should actually mix them in with the vulnerable population, and they will act as immune shields. For example, people who are already immune are given the task of taking care of elderly people.
What are the biggest challenges you face when you try to model the way the disease may behave?
The biggest one is not knowing how many cases there are. That affects all models. If you are not testing enough, or if people have mild or no symptoms, then we don’t know where the cases are. Without that underlying data, all the models become uncertain; we just don’t know where we are on the epidemic curve.
For longer-term forecasts, we don’t know about immunity, so we don’t know how protective it is or how long that lasts, and that has an impact on whether we can expect seasonal outbreaks in the future.
There’s a big question about how infectious children are and how much they are contributing. There is some data that suggests that children may be less infectious, but it’s still a big uncertainty.
All of these epidemiological models also make some assumption about the contact rate: how many people do you come in close contact with every day? That’s a very hard thing to measure or make sense of given the various interventions. We don’t really know what impact these interventions have had on the number of exposures people have every day, so that’s a major source of uncertainty.
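To make the role of the contact rate concrete, here is a minimal sketch of a textbook SIR (susceptible, infected, recovered) model in Python. It is not a model Buckee describes in the interview, and every number in it (the per-contact transmission probability, the recovery time, the population size) is a made-up assumption for illustration only. The point is simply that the transmission rate is the product of daily contacts and per-contact risk, so any error in the assumed contact rate carries straight through to the projected curve.

```python
def simulate_sir(contacts_per_day, p_transmit=0.03, recovery_days=7.0,
                 population=1_000_000, initial_infected=10, days=365):
    """Daily-step SIR model; returns the number infected on each day.

    All default parameter values are hypothetical, not estimates for Covid-19.
    """
    beta = contacts_per_day * p_transmit   # transmission rate per day
    gamma = 1.0 / recovery_days            # recovery rate per day

    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    infected = [i]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        infected.append(i)
    return infected


if __name__ == "__main__":
    # Same disease, two different guesses about daily close contacts.
    for contacts in (10, 5):
        peak = max(simulate_sir(contacts))
        print(f"{contacts} contacts/day -> peak infected ~ {peak:,.0f}")
```

Halving the assumed number of daily contacts in this toy setup changes the projected peak dramatically, which is why not knowing how interventions have altered real contact patterns leaves forecasts so uncertain.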
Where do we stand right now?
The answer depends on whether we will see seasonal effects. For most countries, we are well into the first wave of the epidemic, and the lockdowns that have happened have in many places flattened the curve by reducing the contact rate. The big question is the extent to which we will have massive resurgence as we reopen, and when the highest risk for that resurgence is. The northe