Finessing forecasts of Indian elections
Polling agencies can use a Bayesian model to predict the number of seats a party is expected to win
Updated: Apr 13, 2019 20:29 IST
The first phase of the Lok Sabha elections was held on April 11, setting off the process for electing the 17th Lok Sabha.
The last phase of polling is scheduled for May 19. The evening of that day will also see the release of a rash of exit polls. We have already seen, before the beginning of the first phase, a clutch of opinion polls on the number of seats various political parties and groupings can be expected to get.
Opinion polls and exit polls in India get a bad rap, and that’s not entirely unwarranted. Most exit polls, with some exceptions, are based on fairly simplistic statistical techniques. Using these, they arrive at a vote share and then, using a black box (that is, an undisclosed algorithm or formula), convert this vote share into the number of seats a party can hope to win. In a few cases, it’s entirely possible that the black box is replaced by opinions and judgements.
The perils of following this approach in a first-past-the-post system are all too evident. For instance, in such a system, where the candidate who secures the most votes wins, irrespective of the vote share, it becomes very difficult to convert votes into seats, especially in multi-cornered contests.
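To see why the same vote share can translate into very different seat tallies, consider a minimal sketch. The constituencies, party names and vote shares below are entirely hypothetical and chosen only to illustrate the first-past-the-post rule.

```python
def seats_won(constituencies, party):
    """Count seats where `party` has the highest vote share (first past the post)."""
    return sum(
        1 for shares in constituencies
        if max(shares, key=shares.get) == party
    )

# Two hypothetical three-seat states. Party "A" polls 40% in every seat of both.
two_way = [{"A": 40, "B": 60}] * 3             # straight contests
three_way = [{"A": 40, "B": 35, "C": 25}] * 3  # opposition vote is split

print(seats_won(two_way, "A"))    # 0
print(seats_won(three_way, "A"))  # 3
```

With an identical 40% vote share, party A wins no seats in the straight contests and every seat in the three-cornered ones, which is why a vote share alone says little about seats.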
A far better approach would be to predict the number of seats a party can be expected to win using a Bayesian model. Complex as this may sound (and it is, to some extent), all it means is factoring probability into the entire polling exercise. For instance, ahead of last year’s assembly elections in the Hindi heartland states of Chhattisgarh, Madhya Pradesh, and Rajasthan, this column said that there was a probability (however small) that the Congress would win all three states. A Bayesian model would have assigned this event a probability, based on science and judgement (the best ways to assign probabilities for elections).
Such a model would probably throw up numbers and probabilities, which can then be aggregated into one single number (or a range) using straightforward mathematics. Not too many polling companies in India do this.
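The "straightforward mathematics" can be sketched as follows. This is an illustration, not any agency's method: the per-seat win probabilities are made up, and the function names `expected_seats` and `seat_range` are hypothetical. It also assumes that seats are independent of one another, which a real model would probably relax.

```python
import random

def expected_seats(win_probs):
    """Single number: the expected seat count is the sum of per-seat win probabilities."""
    return sum(win_probs)

def seat_range(win_probs, n_sims=10000, seed=0, lower=0.05, upper=0.95):
    """Range: simulate many elections and read off percentiles of the seat total.

    Assumes each seat is decided independently (a simplification).
    """
    rng = random.Random(seed)
    totals = sorted(
        sum(1 for p in win_probs if rng.random() < p)
        for _ in range(n_sims)
    )
    return totals[int(lower * (n_sims - 1))], totals[int(upper * (n_sims - 1))]

# Hypothetical win probabilities for one party in five seats
probs = [0.9, 0.6, 0.5, 0.3, 0.1]
print(expected_seats(probs))  # roughly 2.4 seats
print(seat_range(probs))      # a (low, high) band around that figure
```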
How does this work?
Information on how a particular population has historically voted is easily available in India. This could include information for just the previous election or several previous elections. This information can be used to construct what is called a prior distribution. But what happened last time need not necessarily happen again: there could be a new alliance, or something that wasn’t a big election issue last time (say, the agrarian crisis, unemployment or national security) may have become one now. This means the distribution needs updating. This updating can be done using a scientific opinion poll on a small sample. The updated distribution is called a posterior distribution.
What this helps us do is to calculate what is called the posterior probability. This is done by revising the prior probability with a factor called “likelihood”, which is based on new information (in math, we simply say it has been updated using Bayes’ theorem).
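One simple way to picture the prior-to-posterior update is a Beta-Binomial model of a party's vote share in a single constituency. This is only a sketch under stated assumptions: the column does not specify a model, and the historical share (40%), the weight given to history (`strength`), and the poll figures are all invented for illustration.

```python
def beta_prior_from_history(past_share, strength):
    """Encode a historical vote share as Beta(a, b) pseudo-counts.

    `strength` is an assumed weight: how many poll respondents the
    historical record is judged to be worth.
    """
    return past_share * strength, (1.0 - past_share) * strength

def update_with_poll(prior, supporters, sample_size):
    """Bayes' theorem for a Beta prior and binomial poll data:
    the posterior is Beta(a + supporters, b + non-supporters)."""
    a, b = prior
    return a + supporters, b + (sample_size - supporters)

def beta_mean(params):
    """Mean of a Beta(a, b) distribution."""
    a, b = params
    return a / (a + b)

# Hypothetical: the party took 40% last time; a new poll of 500 voters finds 225 supporters (45%).
prior = beta_prior_from_history(past_share=0.40, strength=200)
posterior = update_with_poll(prior, supporters=225, sample_size=500)

print(beta_mean(prior))      # prior estimate of vote share
print(beta_mean(posterior))  # posterior estimate, between the 40% history and the 45% poll
```

The larger the poll relative to `strength`, the further the posterior moves away from history and toward the fresh data.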
Given that there is a probabilistic function influencing the conversion of vote share into seat share, it’s easy to see how using Bayesian forecasting techniques can help improve the quality of forecasts. While no Indian agency has so far claimed to be using such a model — most forecasting agencies, and the media companies they work for, are notoriously reticent about sharing the material aspects of the methodology — Karthik Shashidhar, a quantitative specialist, has previously said in a column in Mint that some could be. Specifically, he named Today’s Chanakya for some of its 2013 and 2014 work and said it looked like the agency was using a Bayesian model.
Nate Silver, who correctly called 49 of the 50 US states in the 2008 US presidential election, is a big proponent of Bayesian models.
Can they be adapted to as vast and diverse a country as India? Yes, and they will perhaps cost less than surveys that depend on brute sample size for accuracy. It is surprising, then, that no one has done so. After all, Bayesian models have been used in politics, sport, weather, even to predict the winner of Big Brother (although the last was more in the nature of an academic exercise).
To be sure, Bayesian models are not without their failings. Since the prior model is built based on past information and assumptions, it may be wrong in itself. And clearly, a posterior probability calculated on the basis of an incorrect prior model is not going to be accurate.
India’s parliamentary elections are gigantic affairs, complex in terms of their underlying competitive factors, and hugely expensive, to contest as well as conduct. It would be appropriate for the world’s biggest exercise in democracy to have sophisticated predicting techniques — even if they go wrong in the end.
First Published: Apr 13, 2019 20:24 IST