by Adrian Worton
Our General Election model is based on the odds given by bookmakers (namely Ladbrokes) for each constituency, which are turned into percentages and used as the basis for random simulations.
This is not the first election for which these odds have been available; Ladbrokes first offered constituency-level odds in the run-up to the 2010 election. Therefore, having obtained some of those past odds, we can look to see what can be learned about our current model.
How is the model currently doing?
The graph above shows the expected seat predictions made in the March update of our simulator (shown in green) for the seven debate parties, compared with a number of other predictions from the same time. It should be noted that Plaid Cymru predictions are absent for May2015 and the Guardian - they do not necessarily predict zero seats. We have subtracted 200 seats from the Conservative and Labour totals to avoid stretching the graph too far.
We can see that the TGIAF predictions for each party are either entirely above or entirely below the other predictions. This is not necessarily a major concern, as most other methods simply look at their favourite for a seat and say they will win it, whereas we award each party its probability of winning. So, for example, if a seat has a 60% chance of being won by Party A, and a 40% chance of being won by Party B, then 0.6 and 0.4 are added to our expected seat totals for Parties A & B respectively. This is different to the others, so a little difference is to be expected.
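The expected-seats arithmetic just described can be sketched as follows (the party names and probabilities here are invented for illustration, not our real data):

```python
from collections import defaultdict

# Hypothetical seats: each maps party -> probability of winning that seat
# (in the real model these come from the bookies' odds).
seats = [
    {"A": 0.6, "B": 0.4},
    {"A": 0.3, "B": 0.5, "C": 0.2},
]

# Each party is awarded its probability of winning rather than a whole
# seat, so expected totals can be fractional.
expected = defaultdict(float)
for seat in seats:
    for party, prob in seat.items():
        expected[party] += prob

for party, total in sorted(expected.items()):
    print(f"Party {party}: expected seats {total:.1f}")
```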
However, looking at the seat predictions for UKIP, the TGIAF value of 29.2 is vastly greater than those of the others, the second-highest being the Guardian's prediction of 4. It seems that our method is over-rewarding the parties in second place in each constituency, at the expense of the favourites. This would explain why the Liberal Democrats are also a bit higher than expected, and why the SNP (who are favourites in the majority of Scottish seats) are undervalued, along with Labour and the Conservatives.
Therefore, we need to look at how the predicted probabilities from the 2010 election transferred into actual likelihoods of a side winning a seat, in order to see how our model can be transformed.
2010 Results
Unfortunately, the full list of odds ahead of the 2010 vote isn't available (if you do know of a source, please feel free to get in touch). However, we were lucky enough to find a list of the top 200 target seats for the Conservatives, with their respective odds (see Sources).
However, this gives just one set of odds for a given seat, whereas our method relies on knowing all the odds for a constituency in order to work out probabilities. Fortunately, we were able to use our 2015 data to turn these individual odds into percentages; the method behind this is explained in the Appendix.
We can then group these percentages into ten bands - those with a 0-10% chance of winning, those with a 10-20% chance, and so on. Within each band we see what proportion of the Conservative candidates actually won. We can see the results on the graph below:
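The banding check can be sketched like this (the candidate probabilities and outcomes below are invented; the real version uses the 200 Conservative target seats):

```python
# Hypothetical candidates: (predicted win probability, won the seat?)
candidates = [
    (0.05, False), (0.12, False), (0.18, True),
    (0.35, False), (0.44, True), (0.57, True),
    (0.63, False), (0.71, True), (0.86, True), (0.95, True),
]

# Group predictions into ten bands: 0-10%, 10-20%, ..., 90-100%.
bands = {}
for prob, won in candidates:
    band = min(int(prob * 10), 9)  # band 0 is 0-10%, band 9 is 90-100%
    bands.setdefault(band, []).append(won)

# Within each band, what proportion of candidates actually won?
for band in sorted(bands):
    results = bands[band]
    rate = sum(results) / len(results)
    print(f"{band * 10}-{band * 10 + 10}%: {rate:.0%} won ({len(results)} candidates)")
```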
We can see that the relationship is not linear. The favourites seem to win more than expected, and the outsiders win less than expected. This backs up our earlier idea that the model was over-representing the chances of the non-favourite parties in each seat.
Now we need to work out how we can correct this within our model.
Transforming the data
Our current method simply involves flipping the odds upside-down into their pay-offs using the following formula:
1/(odds + 1)
These values are then re-scaled to add up to one. An explanation of this method is in our article on our first Premiership simulator. We need to transform the pay-offs such that the larger ones grow at an exaggerated rate compared with the smaller ones. To do this we will use exponentials, with the following formula:
[1/(odds+1)]^ϕ
This is to say, the whole pay-off gets raised by an exponent ϕ (phi). The next step is to find the appropriate value for ϕ.
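The whole pipeline from odds to probabilities can be sketched as below (the seat and its odds are invented; fractional odds are written as decimals, so 6/4 becomes 1.5):

```python
def win_probabilities(fractional_odds, phi=1.0):
    """Turn a seat's bookies' odds into win probabilities.

    fractional_odds maps party -> odds as a decimal (e.g. 5/2 -> 2.5).
    Each odds value is flipped into a pay-off via 1/(odds + 1), raised
    to the exponent phi, then the results are re-scaled to sum to one.
    """
    payoffs = {p: (1 / (odds + 1)) ** phi for p, odds in fractional_odds.items()}
    total = sum(payoffs.values())
    return {p: v / total for p, v in payoffs.items()}

# Hypothetical seat: favourite at 1/2, challenger at 6/4, outsider at 10/1.
seat = {"favourite": 0.5, "challenger": 1.5, "outsider": 10.0}
print(win_probabilities(seat, phi=1.0))  # raw implied probabilities
print(win_probabilities(seat, phi=1.8))  # favourite's share exaggerated
```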
Phinding Phi
We cannot use the 2010 Conservative data as a cast-iron measure of how the relationship works, as it is too small a sample. But we know that the values should follow some relationship where the favourites become more, for want of a better word, favouritier, and so on. You can use the widget below to experiment with different values yourself, in order to see the result on the graph. A value of ϕ=1 will return the original values, anything lower will start to level the playing field, whilst a negative value will reverse the chances of victory in favour of the outsiders. Values too high will just polarise the results such that all favourites have a 100% chance of winning. Therefore, we recommend using a range for ϕ between 1.0 and 3.0.
It is a subjective process to judge which value is best, therefore we will see which values of ϕ we need in order to bring our predictions in line with those we showed earlier. To do this we can compare the average difference between our prediction and those of the others. As we increase ϕ the graph below shows how the average difference changes:
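That tuning loop can be sketched as follows (the seat odds and the rival forecasters' averaged predictions are invented for illustration):

```python
# Hypothetical seats: party -> fractional odds as a decimal.
seats = [
    {"A": 0.5, "B": 1.5},
    {"A": 0.4, "B": 2.0},
    {"B": 0.3, "A": 3.0},
]
# Invented average of the other forecasters' seat predictions.
others_avg = {"A": 2.2, "B": 0.8}

def expected_seats(seats, phi):
    """Expected seat totals after applying the exponent phi."""
    totals = {}
    for seat in seats:
        payoffs = {p: (1 / (o + 1)) ** phi for p, o in seat.items()}
        norm = sum(payoffs.values())
        for p, v in payoffs.items():
            totals[p] = totals.get(p, 0.0) + v / norm
    return totals

# Scan phi and measure the average distance from the other predictions.
for phi in [1.0, 1.4, 1.8, 2.2]:
    ours = expected_seats(seats, phi)
    avg_diff = sum(abs(ours[p] - others_avg[p]) for p in others_avg) / len(others_avg)
    print(f"phi={phi}: average difference {avg_diff:.2f} seats")
```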
We could endlessly increase ϕ and get gradually closer to the other predictions, but this would lose the essence of our modelling approach. So we need to pick a value high enough to bring our predictions a bit more into line with those elsewhere, but not so high as to reduce the model to one where the favourites always win. Therefore, we have subjectively chosen 1.8 as the appropriate value.

Conclusion
Whilst our fundamental model is incredibly simplistic, if we want to fine-tune it to produce more conventional results, an awful lot of tinkering needs to take place.
If we had the full set of odds for 2010, we would be able to get a much better estimate of how to alter our model. Instead, we have to rely on subjective impressions.
Next time we will be using this altered model to give our latest predictions for each party.

Sources
May2015 - used for its listings of various seat predictions.
Political Betting - for its list of 200 Conservative seat odds from 2010.

Appendix - turning individual odds into percentages
Our challenge is to turn a single set of odds into the relevant percentage. For example, the odds of the Conservatives winning in Bath were given as evens, or 1/1. If bookies did not shorten their odds to guarantee a profit, this would equal a 50% chance (think of it like a game where you toss a coin: if it's heads you win £1, if it's tails you lose £1). However, we know they do alter odds, so we need to find a way to turn this into its "true" percentage.
All we do is look through our vast data for the current election and find the average win likelihood of all candidates with the same original odds. For a runner with odds of 1/1, this percentage happens to be 45.7%. The relationship between odds and the likelihood of winning is shown below.
We can also take this opportunity to compare these likelihoods with the probability of winning if we did assume the odds were unaltered, to get an idea of how much of a margin the bookies create for themselves. This relationship is shown below:
We can see that, unsurprisingly, the bookies give odds that overestimate a candidate's chances of winning, to reduce the pay-outs they have to award. What is interesting to note is that the discrepancy between the fair and the actual percentages is largest for the biggest favourites, presumably because these are the ones the greatest proportion of people bet on.
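That comparison can be sketched like this (only the evens figure of 45.7% comes from our data; the other empirical win rates below are invented placeholders):

```python
# Empirical win rates by original odds, estimated from the 2015 data.
# Only the 1/1 figure is from the article; the rest are invented.
empirical_win_rate = {
    "1/2": 0.62,
    "1/1": 0.457,  # evens: 45.7% in our 2015 data
    "2/1": 0.27,
    "5/1": 0.11,
}

def implied_probability(fractional):
    """Naive 'fair' probability if the bookies took no margin: 1/(odds + 1)."""
    num, den = map(int, fractional.split("/"))
    return den / (num + den)

# Compare the fair percentage with the empirical one for each price.
for odds, actual in empirical_win_rate.items():
    fair = implied_probability(odds)
    print(f"{odds}: fair {fair:.1%} vs empirical {actual:.1%} "
          f"(bookies' margin {fair - actual:+.1%})")
```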
General Election articles
Previous: Target Seats - Greens
Next: General Election Update - Easter Update