Friday, July 17, 2015

Comparing Sports Legends

When Novak Djokovic beat Roger Federer to win the Wimbledon men’s singles championship on July 12th, he gave his supporters fresh ammunition to argue that he is playing better tennis than anyone in history. It was his 14th victory in his past 21 matches against the Swiss maestro.
Younger fans might presume that only Mr. Federer’s superlative run from 2004 to 2009 could compete with Mr. Djokovic’s dominance. But those with longer memories could make a compelling claim for Rod Laver, who won a record 200 tournaments between 1956 and 1976, or even Bill Tilden, who dominated the 1920s. Mr. Federer’s oft-cited status as the best player ever, and Mr. Djokovic’s as the heir apparent, rest on a widely held but hard-to-prove assumption: because the quality of play has increased so much over time, today’s finest sportsmen must be superior to their predecessors.



Cross-era comparisons are easiest in sports like running, jumping and weightlifting, which are measured in units like time, distance or mass. In general, performance in such contests has improved substantially over the years: the average top-ten finisher in the men’s 100-metre sprint has cut his time from 11.2 seconds in 1900 to just under ten now, and in the marathon from around two hours and 35 minutes in 1939 to two hours and five minutes today. The gains have been greater still in events that require complex equipment or techniques: the current pole-vault world record, at 6.16 metres, is over 50% higher than the best height a century ago.
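The gains quoted above are simple percentage changes. Here is a minimal Python sketch that uses only the figures in the paragraph. The 100-metre time is given only as "just under ten" seconds, so 10.0 s is used as an upper bound, which makes the computed sprint gain a lower bound. The helper name `pct_change` is ours.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0

# 100 m: 11.2 s in 1900 vs "just under ten" now. 10.0 s is an assumed upper bound,
# so the size of this reduction is a lower bound.
sprint = pct_change(11.2, 10.0)

# Marathon: about 2h35m (155 min) in 1939 vs 2h05m (125 min) today.
marathon = pct_change(155, 125)

# Pole vault: a 6.16 m record that is "over 50% higher" than the best a century ago
# implies that the old best was below 6.16 / 1.5 metres.
old_vault_ceiling = 6.16 / 1.5

print(f"100 m: {sprint:.1f}%  marathon: {marathon:.1f}%  old vault < {old_vault_ceiling:.2f} m")
```

With these inputs the sprint time falls by at least 10.7% and the marathon time by about 19.4%, and the best vault a century ago must have been below about 4.11 m.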



However, the pace of progress has tended to slow. Most events—with the men’s 100 metres an exception—have settled into a plateau, where new world records are set less often and surpass the old marks by smaller margins. For example, the best men’s 800-metre time has shrunk by a mere 0.82 seconds since 1981, versus almost four seconds in the 26 years before that. And in a few disciplines, improvement has ground to a halt completely. The average times for female short- and middle-distance runners have not budged in 30 years (though some 1980s records by eastern European competitors may have been aided by performance-enhancing drugs). Some “speed limit” is inevitable—humans will never run as fast as an aeroplane, or jump into outer space—and athletes may be approaching it much faster than is widely believed. Mark Denny of Stanford University calculates that most human race times are within 3% of their potential best.
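The 800-metre slowdown is easier to see as a yearly rate. This is a rough sketch. It assumes two things that are ours, not the article's: the recent window runs from 1981 to 2015, the date of this post (34 years), and "almost four seconds" is rounded up to 4.0, which slightly overstates the earlier rate.

```python
# Average yearly improvement in the best men's 800-metre time, from the figures above.
recent_gain_s, recent_years = 0.82, 2015 - 1981    # assumed window: 1981 to 2015
earlier_gain_s, earlier_years = 4.0, 26            # "almost four seconds", rounded up

recent_rate = recent_gain_s / recent_years     # seconds per year since 1981
earlier_rate = earlier_gain_s / earlier_years  # seconds per year in the 26 years before

print(f"since 1981: {recent_rate:.3f} s/yr; before: {earlier_rate:.3f} s/yr; "
      f"slowdown: {earlier_rate / recent_rate:.1f}x")
```

On these assumptions the record improved about six times faster per year before 1981 than after.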



Outside athletics, performance is harder to measure. In bowling, for example, the number of perfect 300 games per year in America rose nearly 40-fold during the 30 years to 1999. But connoisseurs attribute much of this to strategically oiled lanes that guide the ball towards its target, rather than any broad-based gain in skill. Golf has demonstrated the opposite pattern: in response to better players wielding better clubs, designers have built longer golf courses with more hazards, such as lakes and bunkers.
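A 40-fold rise over 30 years can be restated as a compound annual growth rate. This short sketch uses exactly 40 (our rounding of "nearly 40"), so the result is an upper bound on the yearly rate.

```python
# Implied compound annual growth in perfect 300 games: growth_factor ** (1 / years) - 1.
growth_factor = 40   # "nearly 40 times", rounded up
years = 30
annual_rate = growth_factor ** (1 / years) - 1
print(f"about {annual_rate:.1%} more perfect games per year")
```

That works out to roughly 13% more perfect games each year, sustained for three decades.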



Yet even these measurement difficulties pale in comparison with those in interactive sports, in which opponents affect each other. If all players improve at the same rate, scoring levels will remain flat, so results alone cannot reveal the gain. The challenge of comparing players from different eras in games like football—Pelé, Maradona or Messi?—has fuelled many a bar-room brawl. But analysts have devised a few statistical methods to resolve these debates, and estimate how the greats of the past might fare against modern competition.
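Why does uniform improvement hide progress? Under a common modelling choice, a logistic Bradley-Terry or Elo-style model (our assumption; the article names no model), the chance that one player beats another depends only on the difference in their ratings. A minimal sketch with hypothetical ratings:

```python
import math

def win_probability(rating_a: float, rating_b: float) -> float:
    """Logistic (Bradley-Terry style) chance that A beats B; depends only on A - B."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))

# Two hypothetical players, before and after both improve by the same amount.
before = win_probability(1.0, 0.0)
after = win_probability(1.0 + 5.0, 0.0 + 5.0)

# Identical head-to-head odds: results alone cannot reveal the shared improvement.
print(round(before, 4), round(after, 4))
```

Both calls return the same probability, which is why interactive sports need indirect methods like those below.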



In a 1985 essay Stephen Jay Gould, a Harvard biologist, proposed using variance among athletes to measure quality of play. If a sport draws on a small population of potential players, mediocre ones will be able to get jobs. Facing inconsistent opposition, the best will produce outstanding results. In contrast, in a sport with a large talent pool, everyone who plays professionally will be reasonably excellent. As a result, the best players will be closer to the average. Gould concluded that the more individual performances in a league differ from each other, the weaker it is likely to be. 
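Gould's argument can be illustrated with a toy simulation. It assumes that talent is standard-normal in the population and that a league simply hires its most talented players. Both assumptions are ours, not Gould's. The sketch compares the spread of talent inside a league drawn from a small pool with one drawn from a large pool. The function name `league_spread` is hypothetical.

```python
import heapq
import random
import statistics

def league_spread(pool_size: int, league_size: int, seed: int = 0) -> float:
    """Standard deviation of talent among the best `league_size` of `pool_size` players.

    Toy assumptions (ours): talent ~ N(0, 1) in the population, and the league
    hires the top `league_size` players from the pool.
    """
    rng = random.Random(seed)
    pool = (rng.gauss(0.0, 1.0) for _ in range(pool_size))
    league = heapq.nlargest(league_size, pool)
    return statistics.stdev(league)

# Same league size; only the pool of potential players differs.
small = league_spread(pool_size=1_000, league_size=500)
large = league_spread(pool_size=500_000, league_size=500)
print(f"small pool: {small:.2f}  large pool: {large:.2f}")
```

In the small pool, half of all candidates make the league, so mediocre players get jobs and the spread is wide. In the large pool only the extreme tail is hired, and the players are bunched much more tightly.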



This principle underlies a study by Charles Davis, an Australian researcher. He calculated the standard deviation—a measure of how closely clumped together or spread out players’ performances are—of cricketers’ batting averages in different time periods. He found that the spread among batsmen was indeed about 25% lower in 2000 than during the 1930s, when Don Bradman, widely regarded as the sport’s greatest player, was at his peak. However, Bradman exceeded the average of his peers by an unparalleled 4.4 standard deviations, making him a one-in-100,000 outlier. That suggests that he would still be in a class of his own today, though his Test-match average might be in the 70s or 80s rather than his actual 99.94.
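Davis's study does not specify here how the rarity figure was derived. If batting averages are roughly normally distributed (our assumption), a 4.4-standard-deviation margin corresponds to an upper-tail probability of about 5 in a million. That is in line with the one-in-100,000 figure, and in fact somewhat rarer. The hypothetical `rescale` helper shows how a z-score would be re-expressed in another era's units. The article does not give the modern mean and standard deviation, so no values are assumed for them.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def rescale(z: float, era_mean: float, era_sd: float) -> float:
    """Translate a z-score into the raw units of another era (hypothetical helper)."""
    return era_mean + z * era_sd

p = upper_tail(4.4)  # Bradman's margin over his peers, in standard deviations
print(f"P(Z > 4.4) = {p:.2e}, about 1 in {1 / p:,.0f}")
```

To place Bradman in a modern setting, one would pass today's mean and standard deviation of Test batting averages to `rescale` along with z = 4.4.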



Another approach is to look for natural experiments buried within interactive sports. Perhaps the best one can be found in baseball. During World War II, most of the best baseball players were sent off to fight, and teams had to hire new players to replenish their rosters. These replacements were much worse than the players they replaced at the skills they were hired to perform: the replacement hitters were much worse at hitting than the hitters who went off to war, and the replacement pitchers were much worse at pitching than the pitchers who went off to war.



However, pitchers are selected only for their ability to pitch, not for their ability to hit. As a result, the replacement pitchers were probably just as good at *hitting* as the pitchers who went off to war.



This creates a natural experiment. The pitchers who played in MLB during the war were, as a group, equivalent hitters to the pitchers who played in MLB before and after the war. But they were batting against pitchers who were much worse at pitching than those who played before and after the war. We would therefore expect wartime pitchers to produce better hitting numbers than pre-war and post-war pitchers, because they were equally good hitters facing far weaker competition. Sure enough, that's exactly what happened: pitchers hit about 60% as well as non-pitchers in 1942 and 1946, but 65-66% as well from 1943 to 1945. This is fairly strong evidence that hitting by pitchers is a good measure of overall quality of play: during the wartime years, when we *know* there was a sharp decline in quality of play, pitchers hit noticeably better than they did in the surrounding seasons.

With the war years supporting the idea that pitchers' hitting tracks quality of play, we can zoom out and examine how pitchers have hit across the whole of baseball history. As the graph shows, their hitting has gotten steadily worse over time. In the early 1870s, when the game was in its infancy, they hit 90% as well as non-pitchers. By 2015, they hit only about 45% as well, a decline of 45 percentage points. That suggests that a league-average hitter in 1871 would hit about 55% as well as an average hitter today.



Next, we apply this method to Babe Ruth. In the 1920s, pitchers hit about 65% as well as non-pitchers; today, they hit about 45% as well. So we can subtract 20 percentage points from Ruth to estimate how well he would perform today. He hit about 45% better than average in his time, which equates to about 25% better than average today. The best modern players, like Mike Trout or Miguel Cabrera, hit roughly 25% better than average.
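The subtraction used for both the 1871 comparison and Ruth can be written as one small function. This sketch works in whole percentage points. The function name `era_adjust` is ours, and treating the shift as strictly additive follows the post's method rather than any established model.

```python
def era_adjust(relative_pct: int, pitcher_ratio_then: int, pitcher_ratio_now: int) -> int:
    """Shift a hitter's relative performance by the change in pitchers' relative hitting.

    All values are whole percentages. This mirrors the post's method: the drop in
    pitchers' hitting between two eras is subtracted point for point.
    """
    return relative_pct - (pitcher_ratio_then - pitcher_ratio_now)

# A league-average hitter (100%) in 1871, when pitchers hit 90% as well as non-pitchers,
# compared with 2015, when they hit about 45% as well.
print(era_adjust(100, 90, 45))   # 55

# Babe Ruth: about 145% of average in the 1920s (pitchers at 65%), compared with today (45%).
print(era_adjust(145, 65, 45))   # 125
```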



Finally, we can use this measure to estimate an overall effect on wins and losses. A league-average team will by definition win half of its games. If you subtract 20 percentage points of offense (so that the team scores 80% as many runs as average while allowing an average total), it would be expected to win about 38% of its games.
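The post does not say how it arrived at 38%. One standard tool is Bill James's Pythagorean expectation, RS^k / (RS^k + RA^k). This sketch assumes the common exponent of 2, and the 1.83 variant is shown for comparison. With exponent 2, a team that scores 80% of average runs while allowing an average total projects to about a .390 winning percentage, which is close to the post's figure. The choice of model and exponent is ours.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Bill James's Pythagorean expectation: RS^k / (RS^k + RA^k)."""
    rs_k = runs_scored ** exponent
    return rs_k / (rs_k + runs_allowed ** exponent)

# Runs expressed relative to league average (1.0 = average).
print(f"{pythagorean_win_pct(1.0, 1.0):.3f}")   # 0.500
print(f"{pythagorean_win_pct(0.8, 1.0):.3f}")   # 0.390
print(f"{pythagorean_win_pct(0.8, 1.0, exponent=1.83):.3f}")
```

Because only the ratio of runs scored to runs allowed matters, the inputs can be given relative to league average rather than as raw run totals.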



Fortunately for fans of Mr. Djokovic, tennis seems to have improved faster than bat-and-ball games. In 2014 Jeff Sackmann, a statistical analyst, examined the performances of players since 1970 who were ranked in the top 50 for two consecutive years. He found that they scored an average of 2.2% fewer return points against other top-50 opponents in the second season than the first, because the players who entered the group in the second year were better than the ones they had replaced. Compounded over 44 years, that pace of improvement suggests that Mr. Laver would struggle to win a single game, let alone a set or match, against Mr. Federer or almost any other modern opponent. And unlike the plateaus seen in many forms of racing, the rate of progress has slowed only modestly to 1.5% in recent years. Even Mr. Djokovic will probably pale in comparison to future talent.
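Here is a naive reading of "compounded". It assumes that the 2.2% yearly decline applies multiplicatively and uniformly over the 44 years from 1970 to 2014. That is our simplification, and Sackmann's own method may differ. Under it, a 1970 player facing today's field would keep only about 38% of the return points he won against his own peers. The slower recent rate of 1.5% is shown for comparison. The function name `compounded` is hypothetical.

```python
def compounded(rate_per_year: float, years: int) -> float:
    """Fraction of return points retained after `years` of a constant yearly decline."""
    return (1.0 - rate_per_year) ** years

# 2.2% fewer return points per year against the improving top 50, over 1970-2014.
print(f"{compounded(0.022, 44):.2f}")   # 0.38
# The same span at the more recent rate of 1.5% per year.
print(f"{compounded(0.015, 44):.2f}")   # 0.51
```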
