Monday, August 16, 2010

The Anatomy of a Block: Points Created (Part 5)

In case you missed it, check out Parts 1, 2, 3, and 4 in this series on "The Anatomy of a Block": Introduction, By Shot Location, By Shot Type, and Repeatable Skill.

This is a long post, but I hope you read it to the end, as I believe it contains the most interesting findings of this study so far. In my last post, I said I would summarize my findings and conclude with improvements and possible ideas for future study on the value of a block. It turns out that there is a lot more to work on and a lot more ahead, but for the time being, this will be the last post in this series on "The Anatomy of a Block," which I will almost certainly resume sometime in the future.

It's amazing how much social media and the strong online basketball community have helped me understand the merits of this study, and even more so its limitations and areas for improvement. In the end, I believe the analysis of the value of blocks based on shot location and shot type may be worthwhile, but it is limited in assessing the defensive value of players, and even in assessing the value of a block itself. A host of factors besides shot location and shot type go into determining the quantitative value of a block and its effects on the game. I suppose this is true of every area of research: an examination of one part of a problem will never be complete and always has room for improvement.

Before I go into the other components to take into account when evaluating the value of a block, let me first address a few problems from my previous posts, thanks to the critical evaluations of readers. My initial idea of assigning a number to each block based on shot location and shot type produced an average value per block for each shot-blocker, so my main analysis revolved around measures in units of PPS (points per shot). To recall, I looked at points saved per block by shot location and points saved per block by shot type. When discussing the PPS values, I did not give credit for the sheer quantity of blocks amassed by the Marcus Cambys and the Dwight Howards, and I realize that I should not have neglected it. Even if Dwight Howard does not get as high a value per block (based on location or type) as Amare Stoudemire (both have played over 10,000 minutes since 2007), it would be misguided to suggest that Stoudemire was more valuable as a shot-blocker than Howard, when Howard's 791 blocks since 2007 (2.43 per game) far outnumber Stoudemire's 413 (1.40 per game).
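One simple way to fold volume back in is to multiply the average value per block by the number of blocks and scale by minutes. Here's a minimal Python sketch, assuming "points saved per 36 minutes" is computed as (points saved per block x blocks) / minutes x 36, with the 200-block and 1.00-blocks-per-game cutoff applied before ranking. The `ShotBlocker` class, the helper names, and the players and numbers are all hypothetical placeholders, not the actual figures behind the table.

```python
from dataclasses import dataclass


@dataclass
class ShotBlocker:
    name: str
    blocks: int
    games: int
    minutes: float
    saved_per_block: float  # average points saved per block (location or type model)


def points_saved_per_36(p: ShotBlocker) -> float:
    """Total points saved (value per block x number of blocks), scaled to 36 minutes."""
    total_saved = p.saved_per_block * p.blocks
    return total_saved / p.minutes * 36


def eligible(p: ShotBlocker, min_blocks: int = 200, min_bpg: float = 1.00) -> bool:
    """Cutoff for the top/bottom lists: at least 200 blocks and 1.00 blocks per game."""
    return p.blocks >= min_blocks and p.blocks / p.games >= min_bpg


# Hypothetical placeholder players, not real data from this study.
players = [
    ShotBlocker("Player A", blocks=800, games=325, minutes=11000, saved_per_block=1.05),
    ShotBlocker("Player B", blocks=400, games=295, minutes=10000, saved_per_block=1.12),
    ShotBlocker("Player C", blocks=150, games=200, minutes=4000, saved_per_block=1.20),
]

ranked = sorted(filter(eligible, players), key=points_saved_per_36, reverse=True)
for p in ranked:
    print(f"{p.name}: {points_saved_per_36(p):.2f} points saved per 36 minutes")
```

In this toy example, Player A ranks first despite a lower value per block, simply because of volume, which is the same point as the Howard vs. Stoudemire comparison.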

Here's a reproduction of the table of the top 25 shot-blockers in total blocks since 2007, now with additional columns for points saved per 36 minutes played (by shot location and by shot type), along with the top and bottom shot-blockers in those categories (minimum of 200 blocks and 1.00 blocks per game since 2007):

Friday, August 13, 2010

The Anatomy of a Block: Repeatable Skill? (Part 4)

In case you missed it, check out Parts 1, 2, and 3 in this series on "The Anatomy of a Block": Introduction, By Shot Location, and By Shot Type.

In this post, I'll look at whether the value of blocks, as measured by shot location and by shot type, is a repeatable skill. The premise is that if a skill in sports is measured effectively, the statistic measuring it should stay reasonably consistent from year to year, making the skill repeatable. One test of a statistic's value in objective evaluation is how much it varies over time for each player. Essentially, if a set of numbers fluctuates from year to year for not just one but most players, that is evidence that the underlying skill (such as blocking certain shots based on location or type) may not be repeatable. This kind of analysis, which correlates what a player does one year with what he does the next, is a common first step in studies of the hot hand, clutch hitting in baseball, and how much control a pitcher has over the number of hits he allows.
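The year-to-year test boils down to pairing each player's value in one season with his value in the next and computing a correlation across all those pairs. Here's a minimal Python sketch using the Pearson correlation coefficient; the players and values are hypothetical, and the helper names are mine. An r near 1 would suggest a repeatable skill, while an r near 0 would suggest mostly noise.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if either list is constant


def year_to_year_pairs(values_by_player):
    """Pair each player's value in season t with his value in season t + 1."""
    this_year, next_year = [], []
    for seasons in values_by_player.values():
        for season in sorted(seasons):
            if season + 1 in seasons:
                this_year.append(seasons[season])
                next_year.append(seasons[season + 1])
    return this_year, next_year


# Hypothetical points-saved-per-block values keyed by season.
values = {
    "Player A": {2007: 1.02, 2008: 1.04, 2009: 1.03},
    "Player B": {2007: 1.10, 2008: 1.12, 2009: 1.11},
    "Player C": {2008: 0.98, 2009: 1.00},
}
xs, ys = year_to_year_pairs(values)
r = pearson(xs, ys)
print(f"{len(xs)} season pairs, year-to-year r = {r:.2f}")
```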

Here's a look at the block value by shot location of the top 25 shot-blockers in total blocks since 2007 along with their season-by-season values to see if they fluctuate:

Wednesday, August 11, 2010

The Anatomy of a Block: By Shot Type (Part 3)

In case you missed it, check out Parts 1 and 2 in this series on "The Anatomy of a Block": Introduction and By Shot Location.

In this post, I'll look at the value of a blocked shot based on shot type. The first thing to figure out is which shot types are recorded in the PbP dataset provided by Basketball Geek (I really can't stress enough how thankful I am that Ryan J. Parker provided this data). I grouped every shot with a recorded shot type from the 2007-2010 seasons and found the number of shots taken and the total points scored for each type in order to compute points per shot by shot type (there are 63 different shot types in the PbP data). Let's look at three lists of shot types: 1) most shots taken, 2) highest points per shot, and 3) lowest points per shot.
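The grouping step is just two running totals per shot type, attempts and points, followed by a division. Here's a minimal Python sketch; the `(shot_type, points)` record format and the shot type names are hypothetical stand-ins for the actual PbP fields, and misses are assumed to score 0.

```python
from collections import defaultdict


def pps_by_shot_type(shots):
    """shots: iterable of (shot_type, points_scored) pairs; misses score 0.
    Returns {shot_type: (attempts, points_per_shot)}."""
    attempts = defaultdict(int)
    points = defaultdict(int)
    for shot_type, pts in shots:
        attempts[shot_type] += 1
        points[shot_type] += pts
    return {t: (attempts[t], points[t] / attempts[t]) for t in attempts}


# Hypothetical shots; the real data has 63 shot types.
shots = [
    ("Jump Shot", 2), ("Jump Shot", 0), ("Jump Shot", 0), ("Jump Shot", 0),
    ("Slam Dunk", 2), ("Slam Dunk", 2),
    ("Layup", 2), ("Layup", 0),
    ("3pt Jump Shot", 3), ("3pt Jump Shot", 0), ("3pt Jump Shot", 0),
]
stats = pps_by_shot_type(shots)
most_taken = sorted(stats, key=lambda t: stats[t][0], reverse=True)
highest_pps = sorted(stats, key=lambda t: stats[t][1], reverse=True)
lowest_pps = sorted(stats, key=lambda t: stats[t][1])
print(most_taken[0], highest_pps[0], lowest_pps[0])
```

The three sorted lists correspond to the three lists of shot types described above.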

Tuesday, August 10, 2010

The Anatomy of a Block: By Shot Location (Part 2)

In case you missed it, check out Part 1 in this series, The Anatomy of a Block: Introduction.

In this post, I'll look at the value of a blocked shot based on shot location. The conventional wisdom is that big men find plenty of shot-blocking opportunities down low in the painted area, while forwards and more athletic guards may get blocks at the 3-point line and in jump-shot range. Each location on a grid of the basketball court can be assigned a point value equal to the expected points of a shot from that specific location. These values can then be weighted by the number of blocks a player recorded at each location and averaged to produce "points saved per block by shot location" for each player.
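Weighting each location's value by the number of blocks there and averaging is the same as averaging the expected value over every individual block. Here's a minimal Python sketch of that, assuming a lookup from integer (x, y) coordinates to league-average points per shot; the `points_saved_per_block` helper, the player names, and the values are hypothetical.

```python
from collections import defaultdict


def points_saved_per_block(blocks, expected_points):
    """blocks: iterable of (blocker, x, y) for each blocked shot.
    expected_points: {(x, y): league-average points per shot at that location}.
    Returns {blocker: average expected points of the shots he blocked}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for blocker, x, y in blocks:
        value = expected_points.get((x, y))
        if value is None:  # skip blocks with no usable location
            continue
        totals[blocker] += value
        counts[blocker] += 1
    return {b: totals[b] / counts[b] for b in totals}


# Hypothetical expected points per shot at three grid locations.
expected_points = {(25, 5): 1.20, (25, 15): 0.80, (5, 2): 1.10}
blocks = [
    ("Center A", 25, 5), ("Center A", 25, 5), ("Center A", 25, 15),
    ("Forward B", 5, 2), ("Forward B", 25, 15),
    ("Guard C", 40, 30),  # location not in the lookup, so it is skipped
]
saved = points_saved_per_block(blocks, expected_points)
print(saved)
```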

To do this, I looked at four seasons' worth of PbP data, with more than 750,000 shots and the X,Y coordinates of their locations. Imagine yourself standing behind the offense's basket: the X-axis runs from left to right along the baseline (X ranges from 0 to 50, or 51 possible values), and the Y-axis runs from the baseline out toward and beyond the 3-point line (Y ranges from 1 to 35, or 35 possible values). Together these form a half court, with the center of the hoop located at (25, 5.25).

For each of the 51*35 = 1,785 shot location coordinates, I tallied the total number of shots taken there over the past four seasons and stored the counts in a matrix. Here's what the NBA's shot location frequency for that period looks like (excuse the color scheme, and note that approximately 28% of the shots were taken at the rim):
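Storing the counts is a matter of mapping each (X, Y) pair to a cell in a 51 x 35 matrix, with Y shifted down by one because it starts at 1. Here's a minimal Python sketch, assuming integer coordinates and a hypothetical `(x, y, points)` shot record; it also keeps a points matrix so that league-average points per shot can be read off for each coordinate, which is the kind of expected value a blocked shot's location can be assigned.

```python
X_VALUES = 51  # x runs 0..50 along the baseline
Y_VALUES = 35  # y runs 1..35 away from the baseline


def shot_grids(shots):
    """shots: iterable of (x, y, points) with integer coordinates.
    Returns (freq, pps): 51 x 35 matrices indexed [x][y - 1]; pps is None where no shots."""
    freq = [[0] * Y_VALUES for _ in range(X_VALUES)]
    pts = [[0] * Y_VALUES for _ in range(X_VALUES)]
    for x, y, points in shots:
        if 0 <= x <= 50 and 1 <= y <= 35:  # ignore coordinates outside the half-court grid
            freq[x][y - 1] += 1
            pts[x][y - 1] += points
    pps = [
        [pts[i][j] / freq[i][j] if freq[i][j] else None for j in range(Y_VALUES)]
        for i in range(X_VALUES)
    ]
    return freq, pps


# Hypothetical shots; the last one falls outside the grid and is dropped.
shots = [(25, 5, 2), (25, 5, 0), (25, 5, 2), (0, 1, 3), (50, 35, 0), (60, 10, 2)]
freq, pps = shot_grids(shots)
print(sum(map(sum, freq)), pps[25][4])
```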

Monday, August 9, 2010

The Anatomy of a Block: Introduction (Part 1)

Blocks are a fundamental statistic in basketball. Along with steals, blocks are often recorded and cited by fans and writers to evaluate a player's defense. Generally, fans attribute steals to small, fast guards with quick hands, while blocks are credited to tall, high-flying centers and power forwards who can get off the ground quickly. No doubt such qualities are assets on the defensive end, and racking up steals and blocks forces the worst kinds of turnovers for the opposing offense, since many of them lead to fast-break opportunities for the defense in transition.

Yet the number of blocks a player gets is simply a count of how often he uses a general defensive weapon; it says nothing about the value each block gave the defense by preventing a scoring opportunity on a shot attempt. Sure, players like Hakeem Olajuwon racked up hundreds, even thousands, of blocks in their careers, making cases for themselves as some of the best defensive centers in NBA history. This study is not trying to take anything away from those exceptional centers, who gained so much for their teams by swatting away multiple shots on a nightly basis.

Holy Morrow

So Brandon Morrow struck out 17 Rays yesterday and had his no-hitter broken up with two outs in the 9th inning. Hopefully the novelty of striking out 17 hitters doesn't wear off, and the near no-hitter doesn't take away from the fact that... well, Brandon Morrow has some sick stuff.

I'm still figuring out how to sync my pitch database for the 2010 season, which means I can't look at my own PITCHf/x plots of Morrow's game just yet. Fortunately, there are some awesome PITCHf/x tools on the web, such as Joe Lefkowitz's website, Brooks Baseball, and TexasLeaguers.com. I'll use those sites in the meantime for current-season pitch data, and I'd like to give them lots of credit for the great work they've done for the baseball community (especially for those of us who are not as well-versed in SQL and Perl).

Anyway, I pulled Morrow's 17-strikeout game from Brooks Baseball to see how he did. He threw 137 pitches, 97 of them for strikes. By type, they were 63 four-seam fastballs, 34 splitters, 38 sliders, and 2 curveballs.

Sunday, August 8, 2010

Rivera's Fastballs

So, Mariano Rivera is pretty good. I thought I'd take a look at some of his PITCHf/x plots from 2007-2009. Before we go on, keep in mind that the pitch database I have does not necessarily categorize every fastball accurately as a cutter, sinker, four-seam fastball, etc. It is well known that Rivera throws only two pitches, well, three I suppose: the cutter, and occasionally a four-seam fastball and a two-seam fastball. In the PITCHf/x database, the codes FA, FC, and FF come up, which denote generic fastballs, cutters, and four-seam fastballs, respectively. Rivera's pitches from 2007-2009 are categorized as follows:

FA: 897 pitches (28.7%)
FC: 1907 pitches (61.0%)
FF: 320 pitches (10.2%)
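Each share is just a count divided by the total number of classified pitches. Here's a quick Python check of that arithmetic, rounding to one decimal place; the counts are the ones listed above.

```python
# Rivera's 2007-2009 pitch counts by PITCHf/x type code, as listed above.
counts = {"FA": 897, "FC": 1907, "FF": 320}
total = sum(counts.values())
shares = {code: round(100 * n / total, 1) for code, n in counts.items()}
print(total, shares)
```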

So bear in mind that some of his cutters that didn't have much movement (intended or unintended) may have been categorized as generic fastballs, and some of the pitches labeled as generic fastballs may actually be two-seam or four-seam fastballs. Now let's look at some of his PITCHf/x plots from 2007-2009, split up by fastball type and batter handedness. As before, this is the catcher's POV, with right-handed hitters standing on the left and left-handed hitters on the right:

Carmelo's Shots Blocked

I've been out of town for the weekend. It's amazing what a few days cut off from the Internet leaves in your Google Reader and RSS feeds, including the breaking news that Kendrick Perkins signed with Boston for less than $800K.

What's also amazing is how a largely objective article can cause readers to scream for a writer's firing as well as a boycott of "the Worldwide Leader in Sports." Tom Haberstroh over at ESPN Insider had a great article a few days ago arguing that Carmelo Anthony is an inefficient offensive player and not worthy of a max contract (in a Joe Johnson-less world). He presented pretty strong evidence that Carmelo is at least not one of the top 5 current players in the NBA once you take into account his offensive ratings, the Nuggets' pace factor, and the sheer number of shots he takes (wastes?). Sure, Haberstroh's article may come across as belittling the conventional statistics Carmelo Anthony has piled up since 2003, but he isn't claiming that Carmelo should be riding the bench, just that he isn't as elite as fans and the media make him out to be. It's pretty disconcerting how easily so many readers are offended by a statistical look at things, but that is a barrier we have to break through to help more of the sports world see the usefulness of new, perceptive statistics that take context into account. Such statistics add to our understanding of what we watch with our own eyes; they don't replace it.

One of the things that Haberstroh mentioned about Carmelo was that he "got his shot blocked a whopping 109 times last season, which ranks as the second-highest total in the league, according to Hoopdata.com." I've been looking at blocks data in quite a bit of detail, and I thought I'd take a look at Carmelo's blocked shots on the offensive end.
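Looking at blocks from the offensive end just means grouping blocked shots by the shooter instead of the blocker. Here's a minimal Python sketch; the event dictionaries and their keys (`type`, `shooter`, `blocker`) are hypothetical, not the actual PbP layout, and the player names are placeholders.

```python
from collections import Counter


def times_blocked(events):
    """events: dicts with hypothetical keys 'type', 'shooter', and 'blocker'
    (None when the shot was not blocked). Returns a Counter of blocks suffered."""
    return Counter(
        e["shooter"] for e in events if e["type"] == "shot" and e["blocker"] is not None
    )


# Hypothetical events for illustration only.
events = [
    {"type": "shot", "shooter": "Player X", "blocker": "Center A"},
    {"type": "shot", "shooter": "Player X", "blocker": "Center B"},
    {"type": "shot", "shooter": "Player X", "blocker": None},
    {"type": "shot", "shooter": "Player Y", "blocker": "Center A"},
    {"type": "rebound", "shooter": None, "blocker": None},
]
print(times_blocked(events).most_common(2))
```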

Thursday, August 5, 2010

Looking at Pitch Movements

Here's a look at horizontal pitch movement plotted against vertical pitch movement. The units are inches, and each movement is measured relative to a "theoretical pitch thrown at the same speed with no spin-induced movement." And since I'm on the subject of swinging third strikes, here are the movements of four-seam fastballs, changeups, curveballs, and sliders that produced swinging third strikes between 2007-2009:

Wednesday, August 4, 2010

Update on Value of Blocked Shots

I do have an update on my research on the value of blocked shots. I made two preliminary ranking lists of the value of blocks by the NBA's leading shot-blockers over the 2006-2010 seasons. For the first list, I calculated points saved per block based on shot location. I found that Brendan Haywood and Andris Biedrins (394 and 370 total blocks in 2006-2010, respectively) saved 0.987 and 0.982 points per block, while Andrew Bogut (382), Chris Andersen (322), and Paul Millsap (319) saved 1.152, 1.130, and 1.131 points per block, respectively, based solely on the location of the shot (so in this model, blocking a running slam dunk is worth the same as blocking a reverse layup from the same spot).

For the second list, I calculated points saved per block based on shot type. This is where I ran into problems: although every shot is categorized with a specific shot type (no blanks or uncategorized shots), there are many idiosyncratic shot types, such as running finger roll layups vs. driving finger roll layups. I've kept these for now (they should have a negligible effect on the final values) and have not yet generalized the categories (say, into 3-pointers, dunks, layups, jump shots, etc.), but there is no question that dunks produce the most points per shot (PPS) of all the generic shot types, ranging from putback dunks at 1.81 PPS to running slam dunks at 1.97 PPS. Blocking dunks appears to be the most valuable kind of block (all while earning a spot on SportsCenter's Top 10 Plays), while "risky" jump shots such as the running jump shot, driving jump shot, and turnaround jump shot drop to around 1.06 PPS.
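Generalizing the idiosyncratic shot types could be done with simple keyword rules that collapse each detailed type into a broad category before computing PPS. Here's a minimal Python sketch of that idea; the keyword rules, category names, and example shot type strings are hypothetical and would need checking against the 63 actual shot types.

```python
from collections import defaultdict

# Hypothetical keyword rules; order matters, so "Jump Hook" maps to hook shot, not jump shot.
GENERIC_RULES = [
    ("dunk", "dunk"),
    ("layup", "layup"),
    ("finger roll", "layup"),
    ("hook", "hook shot"),
    ("jump", "jump shot"),
]


def generic_type(shot_type):
    """Map a detailed PbP shot type to a broad category by keyword."""
    s = shot_type.lower()
    for keyword, category in GENERIC_RULES:
        if keyword in s:
            return category
    return "other"


def pps_by_generic_type(shots):
    """shots: iterable of (detailed_shot_type, points). Returns {category: PPS}."""
    attempts = defaultdict(int)
    points = defaultdict(int)
    for shot_type, pts in shots:
        category = generic_type(shot_type)
        attempts[category] += 1
        points[category] += pts
    return {c: points[c] / attempts[c] for c in attempts}


# Hypothetical shots for illustration only.
shots = [
    ("Running Slam Dunk", 2), ("Putback Dunk", 2),
    ("Running Finger Roll Layup", 2), ("Reverse Layup", 0),
    ("Running Jump Shot", 0), ("Turnaround Jump Shot", 2),
]
print(pps_by_generic_type(shots))
```

Whether collapsing the categories this way changes the block values much is an open question; it mostly makes the shot-type lists easier to read.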

Called Strike Three

Yesterday, I looked at swinging strike threes on four-seam fastballs, changeups, curveballs, and sliders. To summarize, I found what most baseball fans would expect: many swinging third strikes come on high pitches for four-seamers, low pitches for breaking balls, and low and away, sometimes just outside the zone, for sliders. These patterns held, as expected, for all four combinations of pitcher and batter handedness.

Today, I'll look at called strike threes. Here's what I've got:

Tuesday, August 3, 2010

Spray Charts and Defensive Shifts

Here's my first attempt at plotting spray charts. I plotted every batted ball from 2007-2009 for a few of the more famous pull-heavy left-handed hitters, the usual candidates for defensive shifts. The cloud of hits obscures the baseball diamond behind it, but if you look closely, you can roughly determine where the infielders should ideally set up to field the most grounders:
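Eyeballing where infielders should stand could be made a bit more concrete by converting each grounder's location to a spray angle and counting grounders in angular sectors between the foul lines. Here's a minimal Python sketch, assuming the hit coordinates have already been translated so that home plate sits at the origin with +y toward center field and +x toward the first-base side (that translation is not shown and is an assumption). The helper names, the 15-degree sector width, and the batted balls are hypothetical.

```python
from collections import Counter
from math import atan2, degrees


def spray_angle(x, y):
    """Degrees from straightaway center field: negative toward left field,
    positive toward right field. Assumes home plate at (0, 0), +y toward
    center field, +x toward the first-base side."""
    return degrees(atan2(x, y))


def grounder_sectors(balls, width=15):
    """Count fair grounders by angular sector between the foul lines (-45 to +45).
    Keys are each sector's starting angle; width should divide 90 evenly."""
    n_sectors = 90 // width
    sectors = Counter()
    for x, y in balls:
        angle = spray_angle(x, y)
        if -45 <= angle <= 45:
            idx = min(int((angle + 45) // width), n_sectors - 1)  # +45 falls in the last sector
            sectors[-45 + idx * width] += 1
    return sectors


# Hypothetical grounders for a pull-heavy left-handed hitter (pulled balls go to the right side).
balls = [(49, 50), (30, 60), (10, 80), (0, 90), (-39, 40), (-100, -5)]
print(sorted(grounder_sectors(balls).items()))
```

The busiest sectors would be the natural places to put infielders in a shift.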

Strike Three Swinging

It took a while, but I'm finally going to post about baseball (it is, after all, by far my favorite sport). I was able to read values from the PITCHf/x database and plot pitch charts directly in R. Reading articles about PITCHf/x these past few years, though exciting, is somehow not the same as doing my own PITCHf/x analysis, so it's definitely great to finally get into the game.

Batters getting fooled on a swinging strike 3 seem to appear on SportsCenter all the time, whether it's on a high fastball out of the zone or a slider low and away. Whatever the case, a swinging strike on strike 3 is one of the most exciting plays in baseball, and I wanted to look at all of the swinging strike 3 pitches (strike 3 swinging pitches?) on four-seam fastballs, changeups, curveballs, and sliders. Let's look at all swinging strike 3 pitches from 2007-2009, categorized by pitch type and by handedness (from the catcher's perspective):
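Pulling these pitches out of the database amounts to keeping pitches thrown with two strikes that were swung on and missed, then grouping by pitch type and handedness. Here's a minimal Python sketch; FF, CH, CU, and SL are the PITCHf/x codes for four-seamers, changeups, curveballs, and sliders, but the record layout and field names (`pitch_type`, `des`, `strikes`, `p_throws`, `stand`) are assumptions for illustration, and strikeouts on foul tips are not counted here.

```python
from collections import Counter

PITCH_TYPES = {"FF", "CH", "CU", "SL"}  # four-seamer, changeup, curveball, slider
SWINGING = {"Swinging Strike", "Swinging Strike (Blocked)"}


def swinging_strike_threes(pitches):
    """pitches: dicts with hypothetical keys pitch_type, des, strikes (count before
    the pitch), p_throws, and stand. Counts swinging third strikes by
    (pitch type, pitcher hand, batter side)."""
    return Counter(
        (p["pitch_type"], p["p_throws"], p["stand"])
        for p in pitches
        if p["strikes"] == 2 and p["des"] in SWINGING and p["pitch_type"] in PITCH_TYPES
    )


def pitch(pitch_type, des, strikes, p_throws, stand):
    return {"pitch_type": pitch_type, "des": des, "strikes": strikes,
            "p_throws": p_throws, "stand": stand}


# Hypothetical pitches for illustration only.
pitches = [
    pitch("FF", "Swinging Strike", 2, "R", "L"),
    pitch("SL", "Swinging Strike (Blocked)", 2, "R", "R"),
    pitch("SL", "Called Strike", 2, "R", "R"),    # called, not swinging
    pitch("CH", "Swinging Strike", 1, "L", "R"),  # only strike two
    pitch("FC", "Swinging Strike", 2, "R", "L"),  # pitch type not in the study
    pitch("FF", "Swinging Strike", 2, "R", "L"),
]
print(swinging_strike_threes(pitches))
```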

Monday, August 2, 2010

Time Splits, FG%, and eFG%

I came up with a couple more basketball shot location visualizations. I can see more and more how this data can be used beyond visualizing hot zones for players, teams, etc., but it's always fun to see more basketball graphics.

It's definitely been an exciting week taking a more in-depth look at visualizing some of the data out there. All that, and I've only just scratched the surface of the PITCHf/x data. I hope to do much more PITCHf/x visualization and analysis in the future, perhaps after I've exhausted all my questions about the NBA and NFL PbP data.

Something to look for in the future: I've been thinking more and more about John Huizinga's paper/presentation at the MIT Sloan Sports Analytics Conference on the value of a blocked shot. Sebastian Pruiti's summary of Huizinga's work was very helpful for those of us who weren't fortunate enough to attend the conference back in March. You can check it out over at his acclaimed blog, NBA Playbook. Anyway, I've been looking at using Basketball Geek's data (courtesy of Ryan J. Parker) to see how I could use points per shot by location to assign a value to each block between 2006-2010, since most blocks are recorded along with their corresponding shot locations. My preliminary calculations range from Brendan Haywood saving 0.987 points per block to Andrew Bogut saving 1.152 points per block, which seems to agree with Huizinga's conclusion that Haywood is not as valuable a shot-blocker as traditional numbers indicate. My preliminary estimates still aren't that promising, but I'll continue to dig into it. Huizinga did have access to more crucial data (such as turnovers leading to blocks), but I believe he did not use shot locations to determine the expected values of blocked shots. Instead, he used pre-block situations and shot types, which is probably more effective. Anyway, I'm going to continue working on this, and hopefully I'll publish my findings on Think Blue Crew soon (along with a look at the players who were blocked the most).

Here are the time splits of NBA shot location frequencies for the 1st, 2nd, 3rd, and 4th quarters, the first two minutes of the 1st/2nd/3rd/4th quarters, and the last two minutes of the 4th quarter plus overtime. And here... we... go:
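Each split is just a filter on the period and the game clock, and a single shot can land in more than one split (a shot 30 seconds into the 1st quarter counts toward both "1st quarter" and "first two minutes"). Here's a minimal Python sketch, assuming each shot carries its period (5 and up meaning overtime) and the seconds remaining in that period; the record format, helper name, and split labels are hypothetical. Each split's list of locations could then be tallied into its own frequency grid.

```python
from collections import defaultdict

QUARTER_SECONDS = 12 * 60  # NBA quarters are 12 minutes


def time_splits(period, seconds_left):
    """Return every split a shot belongs to, given its period (5+ = overtime)
    and the seconds remaining in that period."""
    splits = []
    if period <= 4:
        splits.append(f"Q{period}")
        if seconds_left >= QUARTER_SECONDS - 120:  # first two minutes of a quarter
            splits.append("first 2 min of a quarter")
    if (period == 4 and seconds_left <= 120) or period >= 5:
        splits.append("last 2 min of Q4 + OT")
    return splits


# Hypothetical shots as (period, seconds_left, x, y).
shots = [(1, 700, 25, 5), (2, 300, 10, 20), (4, 60, 25, 6), (5, 200, 40, 25)]
by_split = defaultdict(list)
for period, secs, x, y in shots:
    for split in time_splits(period, secs):
        by_split[split].append((x, y))
print(dict(by_split))
```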

Sunday, August 1, 2010

Scat backs vs. power backs, inside run vs. outside run

In football, there is a great diversity of running backs. Many of the most exciting players are smaller, faster backs who use their speed and agility to elude defenders and make outside runs while dodging tacklers (Barry Sanders comes to mind). These are known as "scat backs." At the other extreme are bigger, stronger backs who are usually (but not always) slower than the smaller, more agile scat backs. These are called "power backs."

To gauge which running backs fit which type, and to what degree, I used the run direction data for running backs from the 2007-2009 seasons to roughly figure out which players were the "most extreme scat backs" and which were the "most extreme power backs" (not necessarily the best or most successful). Here's what I came up with:
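One way to turn run direction into a scat-back/power-back scale is the share of each back's carries that went outside, ranked from highest (most scat-back-like) to lowest (most power-back-like). Here's a minimal Python sketch under a loud assumption: only runs labeled "left end" or "right end" count as outside, and everything else counts as inside. The direction labels, the minimum-carries cutoff, the helper name, and the backs are all hypothetical.

```python
from collections import defaultdict

# Assumption: only runs to the left or right end count as "outside" runs.
OUTSIDE = {"left end", "right end"}


def outside_run_share(runs, min_carries=100):
    """runs: iterable of (back, direction). Returns {back: share of outside runs}
    for backs with at least min_carries carries."""
    carries = defaultdict(int)
    outside = defaultdict(int)
    for back, direction in runs:
        carries[back] += 1
        if direction in OUTSIDE:
            outside[back] += 1
    return {b: outside[b] / carries[b] for b in carries if carries[b] >= min_carries}


# Hypothetical carries for illustration only.
runs = [
    ("Back A", "left end"), ("Back A", "right end"), ("Back A", "middle"),
    ("Back B", "middle"), ("Back B", "left guard"), ("Back B", "right end"), ("Back B", "middle"),
]
share = outside_run_share(runs, min_carries=3)
ranking = sorted(share, key=share.get, reverse=True)  # most "scat back" first
print(ranking, share)
```

Where to draw the inside/outside line (for example, whether tackle runs count as outside) would change the rankings, so that choice is worth testing.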

Assists, blocks, and team shot location frequencies

Since the play-by-play data records which shots are assisted and blocked, I filtered the 2006-2010 data in order to see where shots are assisted (locations of the shot made after an assist) and where shots are blocked:
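The filtering step keeps two separate location tallies: one for made shots that were assisted and one for shots that were blocked. Here's a minimal Python sketch; the shot record keys (`x`, `y`, `made`, `assisted`, `blocked`) are hypothetical stand-ins for the actual PbP fields.

```python
from collections import Counter


def assist_and_block_locations(shots):
    """shots: dicts with hypothetical keys x, y, made, assisted, blocked.
    Returns (assisted, blocked): Counters of shot counts keyed by (x, y)."""
    assisted, blocked = Counter(), Counter()
    for s in shots:
        loc = (s["x"], s["y"])
        if s["made"] and s["assisted"]:
            assisted[loc] += 1
        if s["blocked"]:
            blocked[loc] += 1
    return assisted, blocked


def shot(x, y, made, assisted=False, blocked=False):
    return {"x": x, "y": y, "made": made, "assisted": assisted, "blocked": blocked}


# Hypothetical shots for illustration only.
shots = [
    shot(25, 5, True, assisted=True),
    shot(25, 5, True),
    shot(25, 5, False, blocked=True),
    shot(5, 2, True, assisted=True),
    shot(25, 6, False, blocked=True),
]
assisted, blocked = assist_and_block_locations(shots)
print(assisted, blocked)
```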