by Adrian Worton

**Introduction**

With the Ashes approaching, I have finally got around to writing a new article for this site. As I will be at all five days of the first Test at Trent Bridge, I also plan to update the site with daily reports, probably of varying length. But first, a preview, to decide whether the pre-series predictions of a nailed-on England win are justified.

By looking at the build-up to the last 10 Ashes (including the 2013 series), stretching back to 1994/95, we can see how Australia and England are shaping up compared to previous series.

Of course, in sport, and famously in cricket, nothing is ever certain until it has happened. But what we can establish is whether an Australian win would require an upset greater than anything we have seen in the past 20 years. To do this, we can check the two sides' overall form leading into each Ashes, as well as their batting and bowling form. As well as giving us an idea of what to expect this summer, this will also let us see which of these three parts of a side's performance is the best predictor of the series result.

**The Build-Up**

The time range chosen to represent the build-up to each series runs from the 1st of January two years before the series begins up until the last Test before the Ashes begin.

So, for example, the 1994/95 series in Australia takes data from the 1st of January 1992 to the 24th of November 1994, and the 1997 series in England takes data from the 1st of January 1995 to the 4th of June 1997.

Whilst this is slightly imbalanced, in that it allows a few extra months before a series in Australia, it shouldn't make a huge difference, and it was substantially easier to use when compiling the data.
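As a rough illustration of the window rule above, here is a minimal Python sketch. The function name `build_up_window` and its arguments are my own invention for this example, and the date of the last Test before each Ashes has to be looked up and supplied by hand.

```python
from datetime import date


def build_up_window(series_start_year: int, last_test_before_ashes: date) -> tuple[date, date]:
    """Return (start, end) of the build-up period for one Ashes series.

    The start is the 1st of January two years before the year in which
    the series begins; the end is the supplied date of the last Test
    before the Ashes.
    """
    start = date(series_start_year - 2, 1, 1)
    if last_test_before_ashes < start:
        raise ValueError("the last Test must fall inside the build-up window")
    return start, last_test_before_ashes


# The two examples given in the text.
print(build_up_window(1994, date(1994, 11, 24)))  # 1994/95 series in Australia
print(build_up_window(1997, date(1997, 6, 4)))    # 1997 series in England
```

Because the start is pinned to the 1st of January, a series that begins in November naturally gets a longer window than one that begins in June, which is the imbalance mentioned above.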

**Overall Form**

The first aspect we will look at is the win ratio of each side leading into each Ashes. For this we simply take the percentage of Test matches won by each team in the build-up to each series, the results of which are shown below:

The circles show which side won the Ashes that particular year. What is perhaps most surprising about this is that England haven't been ahead of Australia in terms of Tests won in the build-up to any of the last 10 Ashes, which is a bit of a shock given how in recent years England have topped the world rankings, whilst Australia have begun to slip down. However, using win percentages isn't a great indicator in that one team could have been facing weaker nations over the two-and-a-bit years before the Ashes. Since on average each side played around 29 Tests in each period, one or two Test series against, say, Bangladesh would certainly skew the data.

It doesn't seem that form will give us a great indication of which side will come out on top this summer, since the form of the two sides is very close, although on the only two previous occasions when the sides have been particularly close, 2005 and 2010/11, England have come out on top. So, the next thing to check is whether batting or bowling form is a good indicator of likelihood to win.

For the full details of the teams' win ratios, see the *Appendix*.

**The Teams**

For batting and bowling form in the 9 previous Ashes, we decided to look at the 7 batsmen and 5 bowlers who played the most during each series, with 1 all-rounder selected for each side, whose batting and bowling were both counted.

If it was a close call, for example if two bowlers both played 2 Tests in an Ashes, then the one with the most experience was chosen, in order to glean the most data.

For the upcoming 2013 series, a team was chosen based on previous selections and media predictions. Below you can see the 20 teams selected:

Where Bt = Batsman, Bw = Bowler and AR = All-Rounder.

**Batting Form**

For each team's 7 batsmen, we took their batting averages, but rather than just averaging those seven figures, we weighted them by the number of innings each batsman had played. For example, before the 2005 Ashes, Ian Bell averaged 297.00. This was because he had only had three innings. Taking a weighted average stops huge outliers like this from skewing the data.
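To make the weighting concrete, here is a small Python sketch. Only Bell's 297.00 from three innings comes from the article. The other six batsmen and their figures are made up purely for illustration, and the helper name `weighted_average` is my own.

```python
def weighted_average(averages, weights):
    """Average of `averages`, with each value weighted by the matching weight."""
    if len(averages) != len(weights):
        raise ValueError("averages and weights must be the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(a * w for a, w in zip(averages, weights)) / total_weight


# Hypothetical seven-man batting line-up. The first entry mirrors the
# Ian Bell example (297.00 from 3 innings); the rest are invented.
batting_averages = [297.00, 45.0, 40.0, 38.0, 42.0, 35.0, 50.0]
innings_played = [3, 60, 55, 70, 40, 65, 50]

simple = sum(batting_averages) / len(batting_averages)
weighted = weighted_average(batting_averages, innings_played)

print(f"simple mean:   {simple:.2f}")
print(f"weighted mean: {weighted:.2f}")
```

With these invented figures the simple mean is dragged well above every established batsman's average by the one three-innings outlier, while the weighted mean stays in the low-to-mid 40s. The same helper would work for bowling averages if the weights were wickets taken rather than innings.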

Again, the circles represent that year's winners.

This year seems to be the first year when England lead Australia in batting form by a clear margin. The gap between the two sides this year (9.50) is larger than any previous gap that the trailing team has managed to overcome. The 2005 Ashes was the previous series in which the eventual winners trailed their opponents by a clear margin, in that case 5.88. So whilst Australia can take heart from the England performance in 2005, their task this summer looks to be almost twice as difficult, not to mention that they will be on foreign soil, whereas England were the home side in 2005. The 1997 series saw an Australian side overcome worse batting form to beat England, but their average was only 1.29 behind, and as the overall form showed, the Australian team was in far better form than England.

The weak Australian batting is one of the main reasons they are being written off, with a lot riding on the shoulders of captain Michael Clarke. In fact, take away Clarke and Australia only average 31.21, lower than any other average in the sample, just below England's 31.26 in the build-up to the 2001 Ashes. And previous series seem to indicate that Australia's poor batting form could be a decisive factor in the 2013 series.

See the *Appendix* for full details on batting form.

**Bowling Form**

As with the batting, bowling averages were compared, weighted by wickets taken, although this time a lower average is better. This led to the following results:

Australia's exceptionally low bowling averages from 1997 to 2006/07 owe a lot to Glenn McGrath, who played in every series in this period and none outside it. McGrath's own bowling average ranged from 18.90 to 21.27, and he took over a hundred wickets in all but one of the build-ups to the Ashes he was involved in.

What is perhaps more surprising is the low bowling average for England from 1997 to 2001, despite the side's generally poor results. And despite the current England side featuring some of the world's best bowlers, specifically Jimmy Anderson and Graeme Swann, the team's average (and in fact, both Anderson's and Swann's averages) is just below 30.00.

Like overall form, the bowling form doesn't seem to give us a clear idea of which side will do better in the 2013 Ashes, since they're so close. What it does do is show how baseless Mickey Arthur's claim that Australia had the best bowling attack in the world was.

See the *Appendix* for more on bowling form.

**Which Indicator is the Strongest?**

We can find the correlation (explained below) between the actual result in the 9 previous Ashes series and our different indicators to see which one is best at predicting the result of a series.

The different indicators we used were:

- **Win percentage** - the side which has won the most of their games before a series should be favourites in the series.
- **Loss percentage** - if a side has barely lost in the previous 2 years, then it should be hard for their opponents to make them start losing.
- **Batting averages** - a side with a higher batting average should be in greater touch than their opponents.
- **Bowling averages** - a side with a lower bowling average should be harder to score runs against.
- **Innings** - if a side's batsmen have played more innings in the build-up to the series than their opponents, they should be more used to Test cricket.
- **Wickets** - if a side's bowlers have taken more wickets (probably through playing more matches) then they should have the experience advantage against their opponents.

**Results**

Below are the graphs of each of the variables against result:

The steepness of the trend line indicates how strong the relationship is. From looking at the graphs, the strongest indicators seem to be batting and bowling average, win percentage and loss percentage. The graphs also seem to demonstrate what a freakish result the 2006/07 series was for England, as their indicators for that series (seen at -5.0 on the graphs) are always well away from the trend line.

However, we can get a more formal measure of which indicator is the most important by finding *r*, known as the correlation coefficient. This is a measure between -1 and 1 which shows the strength of a relationship. If it is close to 1, then it is a strong positive relationship, which means the higher one measure is, the higher the other one will be (for example, age at the birth of a first child and years of education are positively correlated). If it is close to -1, then it is a strong negative correlation, where increasing one factor tends to decrease the other (for example, mean class size and reading scores seem to be negatively correlated). And if the number is close to 0, then there is very little relation (for example, there is very little correlation between watching TV and TV size).

So by putting the numbers into the formula for finding a coefficient, we get the following result:

| Indicator | Correlation (r) |
| --- | --- |
| *Win percentage* | *0.473* |
| Loss percentage | -0.279 |
| *Batting average* | *0.500* |
| Bowling average | -0.351 |
| Innings | 0.199 |
| Wickets | 0.266 |

So we can now see that the strongest indicator of an Ashes result is the batting average of a team in the build-up, followed by its win percentage. These are the two most important factors, and are in italics because they are the ones which have a p value below 0.05, which means that a correlation this strong would be expected less than 5% of the time if the factor had no real effect on results (known as being *statistically significant*).

The other four factors all do have relationships with the series result, but they're all quite weak, with the strongest being the bowling average.
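For readers who want to compute a correlation coefficient like the ones above for themselves, the sketch below implements the standard Pearson formula in plain Python. The data in the example is invented purely to show the call, since the article's underlying figures are not reproduced here, and the function name `pearson_r` is my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented illustration: build-up batting averages against series margins
# (positive margin = series won by that many Tests).
batting_avg = [35.0, 38.5, 41.0, 44.0, 47.5]
series_margin = [-3, -1, 0, 1, 2]

print(round(pearson_r(batting_avg, series_margin), 3))
```

A value near 1 for this made-up data simply reflects that it was constructed to rise together. Real Ashes data is far noisier, as the coefficients in the table show.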

**Ashes 2013**

So using the 2013 data, can we predict what the series result will be? We can, by using a technique called regression. By choosing predictors we can get a formula which predicts Ashes results as closely as possible. So by entering the three strongest variables, plus an extra three for the opponent (i.e. "Opponent's win percentage", "Opponent's batting average" and "Opponent's bowling average"), we get the following formula:

*Res = - 0.00 + 3.13 Win% + 0.138 BatAv - 0.061 BowAv - 3.13 OppWin% - 0.138 OppBatAv + 0.061 OppBowAv*

This looks incredibly daunting. However, it makes sense once we substitute some numbers into it. So to get the predicted series result for Australia, we need to substitute in the following numbers:

*Australia:* Win% = 0.4800, BatAv = 38.96, BowAv = 29.22

*England (Opp):* Win% = 0.4643, BatAv = 48.46, BowAv = 28.68

This gives us a final result of -1.2948. This means that Australia are expected to lose the series by either 1 or 2 matches. The 95% confidence interval for this value is (-3.781, 1.194), which means there's a 95% chance the result will be between an England win by 3 and an Australian win by 1.

There is a 7% chance the value is above 0.5 (which is when an Australian win becomes more likely than a series draw), which roughly equates to a 7% chance of an Australian win. And there is a 25% chance the value is above -0.5, which similarly equates to a 75% chance of an English win.

If we remove Michael Clarke from the Australian side, which, as mentioned earlier, reduces the batting average to 31.21, then the expected series outcome becomes -2.362, which means that without Clarke Australia are likely to lose a whole extra Test.
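The substitution into the regression formula can be checked with a few lines of Python. This is only a sketch using the rounded coefficients printed in the formula (the intercept of -0.00 is dropped), so the last decimal place can differ slightly from figures produced with the unrounded model. The function and argument names are my own.

```python
def predicted_result(win_pct, bat_av, bow_av, opp_win_pct, opp_bat_av, opp_bow_av):
    """Predicted series margin from the published (rounded) coefficients.

    Positive values favour the side whose figures are passed first.
    """
    return (
        3.13 * win_pct + 0.138 * bat_av - 0.061 * bow_av
        - 3.13 * opp_win_pct - 0.138 * opp_bat_av + 0.061 * opp_bow_av
    )


# Figures quoted in the article: Australia first, England as the opponent.
with_clarke = predicted_result(0.4800, 38.96, 29.22, 0.4643, 48.46, 28.68)

# Same again with Australia's batting average reduced to 31.21.
without_clarke = predicted_result(0.4800, 31.21, 29.22, 0.4643, 48.46, 28.68)

print(round(with_clarke, 4))     # -1.2948
print(round(without_clarke, 2))  # -2.36
```

Because each coefficient appears once with a plus sign and once with a minus sign, the formula is really working on the differences between the two sides, which is why swapping Australia and England simply flips the sign of the prediction.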

**Conclusion**

If the outlook seemed bleak for Australia in the build-up to this series, then I doubt the numbers given will have changed any of that.

These figures aren't infallible - the statistical methods used for the regression analysis don't know that the outcome of the series is made up of 5 discrete events - but they do give us a very good idea nonetheless.

An alternative method for more accurate future research could be to work out the probability per Test of an Australian win, an English win or a draw, and use combinatorics to work out the overall probability of an Australian series win.
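As a sketch of how that alternative might look, the Python below enumerates every possible split of a five-Test series into Australian wins, English wins and draws, and adds up the probability of each series outcome. It assumes every Test is independent and has the same three probabilities, which is a simplification, and the probabilities in the example are placeholders rather than estimates.

```python
from math import factorial


def series_outcome_probabilities(p_aus, p_eng, p_draw, n_tests=5):
    """Probability of each series outcome from per-Test probabilities.

    Assumes independent Tests with identical probabilities (a simplification).
    """
    if abs(p_aus + p_eng + p_draw - 1.0) > 1e-9:
        raise ValueError("per-Test probabilities must sum to 1")
    totals = {"australia": 0.0, "england": 0.0, "drawn": 0.0}
    for aus_wins in range(n_tests + 1):
        for eng_wins in range(n_tests - aus_wins + 1):
            draws = n_tests - aus_wins - eng_wins
            # Multinomial coefficient: number of orderings of this split.
            ways = factorial(n_tests) // (
                factorial(aus_wins) * factorial(eng_wins) * factorial(draws)
            )
            prob = ways * p_aus**aus_wins * p_eng**eng_wins * p_draw**draws
            if aus_wins > eng_wins:
                totals["australia"] += prob
            elif eng_wins > aus_wins:
                totals["england"] += prob
            else:
                totals["drawn"] += prob
    return totals


# Placeholder per-Test probabilities, not estimates from the article's data.
print(series_outcome_probabilities(p_aus=0.2, p_eng=0.5, p_draw=0.3))
```

Unlike the regression, this approach can only ever produce whole-number series scorelines, which addresses the discreteness problem mentioned above.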

Hopefully the result of this model is accurate, however, and by the end of the summer we will be enjoying a third consecutive Ashes victory for England!

**Appendix**

Below is the data for overall form, batting averages and bowling averages: