With the Ashes approaching, I have finally got around to writing a new article for this site. As I will be attending all five days of the first Test at Trent Bridge, I also plan to update the site with daily reports, probably of varying length. But first, a preview, to decide whether the pre-series predictions of a nailed-on England win are justified.
By looking at the build-up to the last 10 Ashes (including the 2013 series) stretching back to 1994/95, we can see how Australia and England are shaping up compared to previous series.
The time range chosen to represent the build-up to each series is taken from the 1st of January two years before the series begins, up until the last Test before the Ashes begin.
So, for example, the 1994/95 series in Australia takes data from the 1st of January 1992 to the 24th of November 1994, and the 1997 series in England takes data from the 1st of January 1995 to the 4th of June 1997.
Whilst this is slightly imbalanced, in that it allows a few extra months of data before a series in Australia, it shouldn't make a huge difference, and it was substantially easier to work with when compiling the data.
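As a rough illustration of the window described above, here is a minimal Python sketch. The function name `build_up_window` is my own, the cut-off is assumed to be the day before the first Ashes Test, and the first-Test dates passed in are assumptions supplied for the example rather than figures taken from the data set.

```python
from datetime import date, timedelta


def build_up_window(first_test_day):
    """Return (start, end) of the build-up period for one series.

    start: 1 January two calendar years before the series begins.
    end:   the day before the first Ashes Test (an assumed cut-off).
    """
    start = date(first_test_day.year - 2, 1, 1)
    end = first_test_day - timedelta(days=1)
    return start, end


# First-Test dates are assumptions used only for illustration.
print(build_up_window(date(1994, 11, 25)))  # 1994/95 series in Australia
print(build_up_window(date(1997, 6, 5)))    # 1997 series in England
```

Because the start is pinned to the 1st of January, a series that begins in November picks up several more months of matches than one that begins in June, which is the imbalance mentioned above.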
The first aspect we will look at is the win ratio of each side leading up to each Ashes. For this we simply take the percentage of Test matches won by each team in the build-up to each series, the results of which are shown below:
It doesn't seem that form will give us a great indication of which side will come out on top this summer, since the form of the two sides is very close. That said, on the only two previous occasions when the sides have been particularly close, 2005 and 2010/11, England have come out on top. So, the next thing to check is whether batting or bowling form is a good indicator of likelihood to win.
For the full details of the teams' win ratios, see the Appendix.
For batting and bowling form, we decided to look, for each of the 9 previous Ashes, at the form of the 7 batsmen and 5 bowlers who played the most during that series, with 1 all-rounder selected for each side, whose batting and bowling would both be counted.
If it was a close call, for example two bowlers who each played 2 Tests in an Ashes, then the one with the most experience was chosen, in order to glean the most data.
For the upcoming 2013 series, a team was chosen based on previous selections and media predictions. Below you can see the 20 teams selected:
For each team's 7 batsmen, we took their batting averages, but rather than just averaging those seven figures, we weighted them by the number of innings each batsman had played. For example, before the 2005 Ashes, Ian Bell averaged 297.00, because he had only had three innings. Taking a weighted average stops huge outliers like this from skewing the data.
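To show why the weighting matters, here is a small Python sketch. Bell's 297.00 from three innings is the figure quoted above; the second player is a made-up placeholder, not real data, and the function names are my own.

```python
def weighted_average(players):
    """Innings-weighted mean of batting averages.

    players: list of (batting_average, innings) pairs.
    """
    total_innings = sum(innings for _, innings in players)
    if total_innings == 0:
        raise ValueError("no innings to weight by")
    return sum(avg * innings for avg, innings in players) / total_innings


def simple_average(players):
    """Unweighted mean of the batting averages, for comparison."""
    return sum(avg for avg, _ in players) / len(players)


# (297.00, 3) is Bell's pre-2005 figure from the article.
# (40.00, 97) is a hypothetical team-mate used only for illustration.
players = [(297.00, 3), (40.00, 97)]

print(round(simple_average(players), 2))    # 168.5
print(round(weighted_average(players), 2))  # 47.71
```

With only three innings behind it, the 297.00 barely moves the weighted figure, whereas it dominates the simple mean.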
This year seems to be the first in which England lead Australia in batting form by a clear margin. The gap between the two sides this year (9.50) is larger than any previous gap that the trailing team has overcome to win. The largest such gap came in 2005, when the eventual winners went into the series 5.88 behind their opponents. So whilst Australia can take heart from the England performance in 2005, their task this summer looks considerably harder, with a gap more than one and a half times as large, not to mention that they will be on foreign soil, whereas England were the home side in 2005. The 1997 series saw an Australian side overcome worse batting form to beat England, but their average was only 1.29 behind, and as the overall form showed, the Australian team was in far better form than England.
The weak Australian batting is one of the main reasons they are being written off, with a lot riding on the shoulders of captain Michael Clarke. In fact, take away Clarke and Australia only average 31.21, lower than any other average in the sample, just below England's 31.26 in the build-up to the 2001 Ashes. And previous series seem to indicate that Australia's poor batting form could be a decisive factor in the 2013 series.
See the Appendix for full details on batting form.
As with the batting, bowling averages were compared, weighted by wickets taken. This time, of course, a lower average is better. This led to the following results:
What is perhaps more surprising is the low bowling average for England from 1997 to 2001, despite the side's generally poor results. And despite the current England side featuring some of the world's best bowlers, specifically Jimmy Anderson and Graeme Swann, the team's average (and, in fact, the averages of both Anderson and Swann) is just below 30.00.
Like overall form, the bowling form doesn't seem to give us a clear idea of which side will do better in the 2013 Ashes, since they're so close. What it does show is that Mickey Arthur's claim that Australia had the best bowling attack in the world was baseless.
See the Appendix for more on bowling form.
Which Indicator is the Strongest?
We can find the correlation between the actual result in the 9 previous Ashes series and our different indicators to see which one is best at predicting the result of a series.
The different indicators we used were:
Win percentage - the side which has won the most of their games before a series should be favourites in the series.
Loss percentage - if a side has barely lost in the previous 2 years, then it should be hard for their opponents to make them start losing.
Batting averages - a side with a higher batting average should be in greater touch than their opponents.
Bowling averages - a side with a lower bowling average should be harder to score runs against.
Innings - if a side's batsmen have played more innings in the build-up to the series than their opponents, they should be more used to Test cricket.
Wickets - if a side's bowlers have taken more wickets (probably through playing more matches) then they should have the experience advantage against their opponents.
Below are the graphs of each of the variables against result:
However, we can get a more formal measure of which indicator is the most important by finding r, known as the correlation coefficient. This is a number between -1 and 1 that shows the strength of a relationship. If it is close to 1, there is a strong positive relationship, which means that the higher one measure is, the higher the other tends to be (for example, age at the birth of a first child and years of education are positively correlated). If it is close to -1, there is a strong negative relationship, where increasing one factor tends to decrease the other (for example, mean class size and reading scores seem to be negatively correlated). And if the number is close to 0, there is very little relationship (for example, there is very little correlation between how much TV someone watches and the size of their TV).
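For anyone who wants to check the arithmetic, here is a minimal Python sketch of the standard Pearson formula for r. The function name `pearson_r` is my own, and the numbers in the example are toy values chosen to make the result easy to verify by hand, not Ashes data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Toy values only: a partly increasing relationship.
print(pearson_r([1, 2, 3], [1, 3, 2]))  # 0.5
```

In our case, xs would be one indicator (say, the win percentage in each build-up) and ys the corresponding series results.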
So by putting the numbers into the formula for finding a coefficient, we get the following result:
The other four factors all have relationships with the series result, but they are all quite weak, the strongest being the bowling average.
So using the 2013 data, can we predict what the series result will be? We can, by using a technique called regression. By choosing predictors we can get a formula which predicts Ashes results as closely as possible. So by entering the three strongest variables, plus an extra three for the opponent (i.e. "Opponent's win percentage", "Opponent's batting average" and "Opponent's bowling average"), we get the following formula:
England (Opp): Win% = 0.4643, BatAv = 48.46, BowAv = 28.68
There is a 7% chance the value is above 0.5 (which is when an Australian win becomes more likely than a series draw), which roughly equates to a 7% chance of an Australian win. And there is a 25% chance the value is above -0.5, which similarly equates to a 75% chance of an English win.
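Percentages like these can be obtained by treating the prediction as normally distributed around its expected value and reading off the tail areas. The sketch below assumes exactly that; the function name `prob_above` and the `mean` and `sd` values in the example are placeholders of my own, not the fitted values from the regression.

```python
from math import erf, sqrt


def prob_above(threshold, mean, sd):
    """P(X > threshold) for X ~ Normal(mean, sd)."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (threshold - mean) / sd
    # Upper tail of the standard normal, via the error function.
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


# Placeholder prediction and spread, NOT the article's fitted values.
mean, sd = -1.0, 1.0

p_australia = prob_above(0.5, mean, sd)        # value above 0.5
p_england = 1.0 - prob_above(-0.5, mean, sd)   # value below -0.5
p_draw = 1.0 - p_australia - p_england         # value in between

print(p_australia, p_draw, p_england)
```

The two thresholds, 0.5 and -0.5, are the ones used above; everything between them is read as a drawn series.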
If we remove Michael Clarke from the Australian side, which, as mentioned earlier, reduces the batting average to 31.21, then the expected series outcome becomes -2.362, which means that without Clarke Australia are likely to lose a whole extra Test.
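To give a feel for how a fitted formula responds when one input changes, here is a deliberately simplified Python sketch. It fits a straight line with a single predictor by least squares, rather than the six predictors used above, and every number in it is a made-up placeholder, so it will not reproduce the figures quoted here. The names `fit_line` and `predict` are my own.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Evaluate the fitted line at x."""
    intercept, slope = model
    return intercept + slope * x


# PLACEHOLDER data: batting averages and series outcomes invented for illustration.
batting_average = [32.0, 36.0, 40.0, 44.0, 48.0]
series_outcome = [-3.0, -1.0, 0.0, 1.0, 3.0]

model = fit_line(batting_average, series_outcome)
print(predict(model, 39.0))  # prediction with one batting average
print(predict(model, 31.0))  # prediction after lowering that one input
```

Lowering the batting average moves the prediction by the slope times the change, which is the same mechanism that shifts the expected outcome when Clarke is removed.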
If the outlook seemed bleak for Australia in the build-up to this series, then I doubt the numbers given will have changed any of that.
These figures aren't infallible - the statistical methods used to work out the regression analysis don't know that the outcome of the series is made up of 5 discrete events - but they do give us a very good idea nonetheless.
An alternative method for more accurate future research could be to work out the probability per Test of an Australian win, an English win or a draw, and use combinatorics to work out the overall probability of an Australian series win.
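A minimal Python sketch of that alternative is below. It assumes the five Tests are independent and share the same per-Test probabilities, which is a simplification, and the probabilities in the example are placeholders rather than estimates. The function name `series_probabilities` is my own.

```python
from math import factorial


def series_probabilities(p_aus, p_eng, p_draw, tests=5):
    """Return (P(Australia win series), P(England win series), P(series drawn)).

    Assumes every Test is independent with the same three probabilities.
    """
    if abs(p_aus + p_eng + p_draw - 1.0) > 1e-9:
        raise ValueError("per-Test probabilities must sum to 1")
    aus_series = eng_series = drawn_series = 0.0
    # Enumerate every split of the Tests into Australian wins (a),
    # English wins (e) and draws (d), weighting by the multinomial count.
    for a in range(tests + 1):
        for e in range(tests - a + 1):
            d = tests - a - e
            ways = factorial(tests) // (factorial(a) * factorial(e) * factorial(d))
            prob = ways * (p_aus ** a) * (p_eng ** e) * (p_draw ** d)
            if a > e:
                aus_series += prob
            elif e > a:
                eng_series += prob
            else:
                drawn_series += prob
    return aus_series, eng_series, drawn_series


# Placeholder per-Test probabilities, for illustration only.
print(series_probabilities(0.2, 0.5, 0.3))
```

Unlike the regression, this approach respects the fact that a series is made up of a fixed number of discrete matches.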
Hopefully, however, the result of this model is accurate, and by the end of the summer we will be enjoying a third consecutive Ashes victory for England!
Below is the data for overall form, batting averages and bowling averages: