Under the Knife: The effects of Tommy John surgery
Naturally, we wanted to see whether surgery had any impact on key pitching statistics, but the data set we found contained only the list of pitchers who had the surgery, not their stats. However, I noticed that it included a column called mlbamid, which corresponds to the player_id used on mlb.com.
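As a rough illustration of that pairing step, here is a minimal Python sketch. Only the mlbamid column comes from the data set described above; the function name, the shape of the stats lookup, and the other field names are assumptions made for the example, not the code actually used.

```python
def pair_surgeries_with_stats(surgeries, stats_by_player_id):
    """Attach stats to each surgery row by matching mlbamid to player_id.

    `surgeries` is a list of dicts with an "mlbamid" key (from the data set).
    `stats_by_player_id` is a hypothetical dict keyed by mlb.com player_id.
    Returns (paired_rows, missing_ids) so unmatched pitchers are visible.
    """
    paired, missing = [], []
    for row in surgeries:
        # Normalize to a string so "150359" and 150359 match the same key.
        player_id = str(row["mlbamid"])
        stats = stats_by_player_id.get(player_id)
        if stats is None:
            missing.append(player_id)
        else:
            paired.append({**row, "stats": stats})
    return paired, missing
```

Returning the unmatched ids separately makes it easy to spot pitchers whose stats still need to be fetched.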
Now that I had a way to pair the list of pitchers who had surgery with their stats, I used import.io to scrape the data. import.io makes scraping straightforward by letting you extract table data from a page. I pointed import.io at mlb.com, used the player id to retrieve each player's page, and extracted the table data.
This worked great except for one issue: the API that import.io generated could not follow the redirect that mlb.com issues for current players. If you visit the URL for an inactive player, there is no redirect, and import.io handles this perfectly. For active players like A.J. Burnett, however, mlb.com redirects to pittsburgh.pirates.mlb.com, and the import.io API does not handle this properly. To follow the redirect and capture the final URL, I built a script that runs the following cURL command:
curl -w "%{url_effective}\n" -L -s -S "http://mlb.com/team/player.jsp?player_id=150359"
This captures the redirected URL. Once I had run it for all active pitchers, I was able to run the resulting URLs through import.io's batch search process and complete the data set.
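For anyone who would rather do this step in Python than in a shell script, the same idea can be sketched with the standard library. This is not the script I used. The URL pattern is taken from the cURL command above, while the function names and the timeout value are assumptions for illustration, and the sketch makes no promise that mlb.com still serves that URL.

```python
import urllib.request

# URL pattern copied from the cURL command; player_id is filled in per pitcher.
PLAYER_URL = "http://mlb.com/team/player.jsp?player_id={player_id}"


def build_player_url(player_id):
    """Return the mlb.com player page URL for one player id."""
    return PLAYER_URL.format(player_id=player_id)


def resolve_final_url(url, timeout=10):
    """Follow any redirects and return the URL that was finally served.

    urlopen follows HTTP redirects by default, and geturl() reports the
    final location, much like curl -L with -w "%{url_effective}".
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def resolve_all(player_ids):
    """Map each player id to its final URL, or None if the request failed."""
    resolved = {}
    for player_id in player_ids:
        try:
            resolved[player_id] = resolve_final_url(build_player_url(player_id))
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            resolved[player_id] = None
    return resolved
```

Calling resolve_all with the list of active pitchers' ids would give a dictionary of redirected URLs ready to feed into a batch process.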
With a complete data set combining all pitchers who have had Tommy John surgery with their career stats, we were finally able to start visualizing the data. After looking at several pitching statistics, we decided that walks plus hits per inning pitched (WHIP) was the fairest one, since it counts only the walks and hits a pitcher gives up per inning. Earned run average (ERA) is not always a fair statistic because it depends heavily on the defense behind the pitcher as well as the size of the pitcher's home stadium.
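For reference, both statistics are simple ratios: WHIP is walks plus hits divided by innings pitched, and ERA is earned runs scaled to nine innings. A minimal Python sketch follows; the function and parameter names are mine, and it assumes innings pitched is already a true decimal value rather than box-score notation such as 6.1 for six and one-third innings.

```python
def whip(walks, hits, innings_pitched):
    """Walks plus hits per inning pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return (walks + hits) / innings_pitched


def era(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched
```

Lower is better for both numbers, which matters when reading the before and after comparison.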
We wanted to focus on building a visualization that easily lets a user compare an individual pitcher to the rest of the cohort and also see the overall differences in the statistics before and after surgery. We worked with Dave Lee on the design, and he built an awesome graphic for us to build the viz around.
As it turns out, both WHIP and ERA are higher (that is, worse) after the surgery, which suggests that there is no statistical advantage to having the surgery. The surgery simply allows pitchers to come back from an injury that was once a career-ender.
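The before and after comparison itself boils down to averaging each pitcher's pair of values. Here is a hedged Python sketch of that calculation, assuming each pitcher has already been reduced to one (before, after) pair for a statistic such as WHIP. The function name and input shape are assumptions, and no real figures from our data set are shown.

```python
from statistics import mean


def summarize_before_after(pairs):
    """Summarize a list of (before, after) values for one statistic.

    Returns (mean_before, mean_after, mean_change), where a positive
    mean_change means the statistic rose after surgery.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    before = [b for b, _ in pairs]
    after = [a for _, a in pairs]
    changes = [a - b for b, a in pairs]
    return mean(before), mean(after), mean(changes)
```

An individual pitcher can then be compared to the cohort by setting his own change against the mean change.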