Wednesday, August 8, 2007

Are the yanks surprising?

"Derek, are we BAD AT BASEBALL? or just unlucky"
Clearly I have been completely fixated on the Pythagorean expectation and on figuring out whether a team's below-expected performance is due to random chance or to something else (insert favorite explanation here). I decided to get away from all of the statistics and harness the power of computing to replay the entire Yankee season, to see whether where we are now could be expected randomly or if something else is going on.

Here is how I replayed the season:

(1) Imagine that the number of runs scored by the Yankees in a given game is entirely independent of the number of runs scored against the Yankees. This assumption holds up pretty well, as the correlation between those two numbers is 0.014 (zero is no correlation; 1 and -1 are perfect correlation).

(2) We now have two lists of numbers, Yankees scores and other team scores. Instead of the outcome we observed, we can randomly select a different Yankee score to match to the other team score. From those new scores, we can figure out which team won our pretend game (with ties split in half).
Imagine the Yankees played 3 games with scores of 2-1, 5-3 and 3-4, for 2 wins.
One replayed season might have the scores 3-1, 2-3, and 5-4, again 2 wins.
Another might have the scores 2-3, 5-1, and 3-4, which would be only 1 win.
Yet another might have the scores 3-3, 5-4 and 2-1, which would be 2 wins and 1 tie.
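
For anyone who wants to try this at home, here is a rough Python sketch of the replay idea. It is not my actual code; the function names are made up for illustration, and it assumes a "replay" means shuffling the team's scores against the opponents' scores (each real score used exactly once), with a tie counting as half a win.

```python
import random

def replay_season(team_runs, opp_runs, rng):
    """Shuffle the team's scores against the opponents' scores and count wins.

    Assumption: each real team score is reused exactly once (a permutation),
    and a tie counts as half a win.
    """
    shuffled = list(team_runs)
    rng.shuffle(shuffled)
    wins = 0.0
    for ours, theirs in zip(shuffled, opp_runs):
        if ours > theirs:
            wins += 1.0
        elif ours == theirs:
            wins += 0.5
    return wins

def simulate(team_runs, opp_runs, n_seasons=10_000, seed=0):
    """Return the win total of each replayed season."""
    rng = random.Random(seed)
    return [replay_season(team_runs, opp_runs, rng) for _ in range(n_seasons)]

# The toy three-game season from above: 2-1, 5-3 and 3-4.
team = [2, 5, 3]
opp = [1, 3, 4]
totals = simulate(team, opp, n_seasons=1000)
```

With a real season you would pass in the full lists of runs scored and runs allowed, one entry per game, and then look at how the win totals are distributed.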

Results:
I did this rerandomization 10,000 times with the scores from Yankee games this year.
Currently, they are 63-50. The most common record in our 10,000 fake seasons was 70-43. This is almost dead on the Pythagorean expectation with the best possible exponent (if you didn't follow that, don't worry about it). But the likelihood of them winning 63 games or fewer in these random seasons is only 1.8%. That seems surprising. In fact, calculating a p-value to determine the likelihood that we would see performance that deviates so far from the mean, we find p = 0.04. Traditionally, in cogsci we reject the null hypothesis (here, that performance is just a random assignment of the Yankees' scores to opponents' scores) when p < 0.05. So we can conclude that there is something wrong with the Yanks (insert favorite theory here).
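
If you want to get the same two kinds of numbers out of your own fake seasons, here is one plausible way to do it in Python. This is only a sketch: the helper names are hypothetical, the two-sided definition (counting seasons at least as far from the simulated mean as the real one, in either direction) is an assumption rather than a record of exactly what I ran, and the ten win totals below are invented toy data, not Yankee results.

```python
def tail_probability(simulated_wins, actual_wins):
    """Fraction of simulated seasons with at most `actual_wins` wins."""
    hits = sum(1 for w in simulated_wins if w <= actual_wins)
    return hits / len(simulated_wins)

def two_sided_p(simulated_wins, actual_wins):
    """Fraction of simulated seasons at least as far from the simulated
    mean as the actual season, in either direction (assumed definition)."""
    mean = sum(simulated_wins) / len(simulated_wins)
    gap = abs(actual_wins - mean)
    extreme = sum(1 for w in simulated_wins if abs(w - mean) >= gap)
    return extreme / len(simulated_wins)

# Invented toy win totals for ten fake seasons (NOT real results).
toy_wins = [68, 70, 70, 72, 63, 77, 70, 69, 71, 70]
one_sided = tail_probability(toy_wins, 63)
two_sided = two_sided_p(toy_wins, 63)
```

The one-sided number answers "how often did the fake team do this badly or worse?", while the two-sided number also counts fake seasons that were equally surprising in the good direction.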

"For some reason - we are awesome"
Arizona Diamondbacks
Actual: 63-51
Correlation btwn Snakes scores and Opponents scores = -0.1
Expected by best Pythagorean Estimate: 53.4 - 60.6
Most often seen in random seasons: 55-59, 56-58 (tied)
Likelihood of Actual given random seasons: ~1.1%
p-value = 0.02

Zona is better this season than we might expect by chance.

Underperformance makes CC cry
2006 Cleveland Indians
Actual: 78-84
Correlation btwn Tribe scores and Opponents scores = -0.014
Expected by best Pythagorean Estimate: 88.8 - 73.2
Most often seen in random seasons: 88-74
Likelihood of Actual given random seasons: ~0.5%
p-value = 0.01
The Indians really screwed up bad last year.


Finally, (sort of) my Orioles
Who is this handsome man?

Actual: 52-58
Correlation btwn Birds scores and Opponents scores = +0.10
Expected by best Pythagorean Estimate: 55-55
Most often seen in random seasons: 56-54
Likelihood of Actual given random seasons: ~10%
p-value = 0.22

Maybe we shouldn't be so surprised by the Os...
