Saturday, July 31, 2010

A first look at shot location visualizations

On the subject of investigating play-by-play data for the first time, Ryan J. Parker over at Basketball Geek has provided the NBA stats community with great NBA play-by-play data for the 2006-2010 seasons. I downloaded that data this past week for the first time (even though I've known about it for a while now), and I've become inspired to take a deeper look at the entire dataset.

Using a macro that I found via Google called "Merge CSV files," I was able to combine all of the play-by-play data into single spreadsheets, one for each of the four seasons that Basketball Geek has available.
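For anyone who would rather script this step than rely on an Excel macro, here is a minimal Python sketch of the same idea. It is not the macro I used; it assumes every CSV file in a season shares the same header row, and the function name `merge_csv_files` is just an illustrative choice.

```python
import csv


def merge_csv_files(paths, out_path):
    """Concatenate CSV files that share a header into one file.

    Assumption: every input file has the same header row.
    Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    # Write the header once, taken from the first file.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling it once per season folder (for example with a sorted list of that season's game files) would give one combined file per season.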

I then filtered each of the spreadsheets by etype, choosing shot, to return all plays in each season that were shots. I then combined these filtered datasets into a fifth Excel file listing all shots from the past four NBA regular seasons (763,444 shots in total, which unfortunately does not agree with's 796,617 shots, a discrepancy I will ignore for now given the sheer number of entries).
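The filter-then-combine step can also be sketched in Python. The `etype` column and its `shot` value come from the data as described above; the function name `collect_shots` is hypothetical, and the sketch assumes each season file is a CSV with the same columns.

```python
import csv


def collect_shots(season_paths, out_path, etype_column="etype", shot_value="shot"):
    """Copy only the shot rows from each season file into one output file.

    Assumption: all season files share the same columns, including `etype`.
    Returns the number of shot rows written.
    """
    kept = 0
    writer = None
    with open(out_path, "w", newline="") as out_file:
        for path in season_paths:
            with open(path, newline="") as in_file:
                reader = csv.DictReader(in_file)
                if reader.fieldnames is None:
                    continue  # skip empty files
                if writer is None:
                    # Set up the output columns from the first non-empty file.
                    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    if row[etype_column] == shot_value:
                        writer.writerow(row)
                        kept += 1
    return kept
```

The return value is a quick sanity check: it should match the row count of the combined shots file.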

This shot data has everything: the players on the court at the time, who got the assist, who blocked the shot (if it was blocked), the result (made or missed), the type of shot (ranging from 3pt to driving layup to pullup jumper to running bank shot), and, get this, the X and Y coordinates of each shot. With a general knowledge of filters, pivot tables, and the like, I've come up with a lot of interesting findings.
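As a rough illustration of what a pivot table does with this kind of data, here is a small Python sketch that counts attempts and makes per shot type. The column names `type` and `result` and the value `made` are assumptions for illustration; the actual headers in the dataset may differ.

```python
from collections import defaultdict


def shooting_by_type(rows, type_key="type", result_key="result"):
    """Summarize attempts, makes, and make rate for each shot type.

    `rows` is an iterable of dicts (e.g. from csv.DictReader).
    Assumption: a made shot has the value "made" in the result column.
    """
    totals = defaultdict(lambda: {"attempts": 0, "made": 0})
    for row in rows:
        bucket = totals[row[type_key]]
        bucket["attempts"] += 1
        if row[result_key] == "made":
            bucket["made"] += 1
    # Add the make rate as a fraction between 0 and 1.
    return {
        shot_type: {**counts, "pct": counts["made"] / counts["attempts"]}
        for shot_type, counts in totals.items()
    }
```

This is the same grouping a pivot table performs with shot type as the row label and result as the counted value.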

Thursday, July 29, 2010

More graphs: Play call% by yard line for each down

Yesterday, I looked at pass locations, run directions, field goal distances, and FGA% vs. Punt% by yard line. The following graphs, however, are far more interesting. They show the play call %s on each yard line, and there's a graph each for 1st down, 2nd down, etc.
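The numbers behind graphs like these amount to a grouped percentage: for each down and yard line, what share of plays were of each type. Here is a minimal Python sketch of that calculation, not the spreadsheet work I actually did. It assumes each play has already been reduced to a `(down, yard_line, play_type)` tuple, which is an illustrative layout rather than the dataset's own.

```python
from collections import Counter, defaultdict


def play_call_pct(plays):
    """Return play-call percentages keyed by (down, yard_line).

    `plays` is an iterable of (down, yard_line, play_type) tuples.
    Each value maps play_type to its share of plays, in percent.
    """
    counts = defaultdict(Counter)
    for down, yard_line, play_type in plays:
        counts[(down, yard_line)][play_type] += 1

    result = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        result[key] = {
            play_type: 100.0 * n / total for play_type, n in counter.items()
        }
    return result
```

Plotting one line per play type against yard line, with a separate chart per down, reproduces the layout described above.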

Wednesday, July 28, 2010

First graphs: Pass locations, rush directions, field goals, and punting

As promised, here's a first look at some of the graphs from Burke's PBP dataset, specifically the 2008 spreadsheet. The first graph here looks at the distribution of pass locations in the 2008 NFL season. Passes up the middle, short or long, occur less frequently than passes to the left or right. Passes to the right occur more frequently than passes to the left, because most quarterbacks are right-handed. Throwing over your shoulder requires more arm strength and gives the defense more time to adjust and get a good look to get a pick. In short, throwing left is common but is a slightly riskier play for a right-handed quarterback.

Tuesday, July 27, 2010

A first look at NFL PBP data

I will use this blog to present, discuss, and archive some of my thoughts on research into sports, particularly baseball, basketball, and football.

Thanks to Brian Burke of Advanced NFL Stats, we now have freely available play-by-play data for NFL seasons 2002-2009. I've taken the 2008 data set and added columns to capture other characteristics of each play, categorizing things like pass/rush plays, fumbles, types of penalties, types of scores, and even run direction, pass location, and intended receiver on complete/incomplete/intercepted passes. This was possible thanks to the valuable comments left by contributors at Burke's website.
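To give a flavor of how extra columns like these can be derived, here is a rough Python sketch that pulls pass location and run direction out of a play description. It is not the method I used in the spreadsheet, and the phrases it looks for ("pass short left", "right tackle", "up the middle") are assumptions about how the description text is worded; the function name and output keys are hypothetical too.

```python
import re

# Assumed wording, e.g. "pass short left" or "pass incomplete deep right".
PASS_RE = re.compile(r"\bpass (?:incomplete )?(short|deep) (left|middle|right)\b")
# Assumed wording, e.g. "left end", "right tackle", "up the middle".
RUN_RE = re.compile(r"\b(?:left|right) (?:end|tackle|guard)\b|\bup the middle\b")


def categorize_play(description):
    """Derive play type, pass depth/location, and run direction from text."""
    text = description.lower()
    columns = {
        "play_type": "other",
        "pass_depth": None,
        "pass_location": None,
        "run_direction": None,
    }
    pass_match = PASS_RE.search(text)
    if pass_match:
        columns["play_type"] = "pass"
        columns["pass_depth"] = pass_match.group(1)
        columns["pass_location"] = pass_match.group(2)
        return columns
    run_match = RUN_RE.search(text)
    if run_match:
        columns["play_type"] = "rush"
        columns["run_direction"] = run_match.group(0)
    return columns
```

Anything the patterns do not recognize (sacks, kneel-downs, passes with no location listed) falls into "other", so a real version would need more rules and some manual checking.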

Here are the original columns in the 2008 spreadsheet: