A Statistical Model to Predict the 2018-19 NHL Season


Last year around this time, I wrote about Player Overall Value, a stat I derived to try to estimate the value an individual hockey player adds to his team. It was based on a number of stats you would generally find in a box score, with regressions to determine how much each of them contributed to winning games. I then used it to run a projection for the 2017-18 season, which you can read about in last year's projection post (shoutout to Vegas for messing things up).

This year, forget everything you read about that.

I am only partially kidding. This year, I wanted to take a different look at things, primarily to add a new dimension to the projection: point projections for each individual skater I expect to be in the regular lineup of each NHL team. To get these, I used two different methods, one to project goals and one to project assists.

Goals

To determine how many goals each player was expected to score, I looked at shot rates for every player over their last three seasons. From those numbers, I computed a three-year weighted average (for players with fewer than three seasons of data, the average was regressed toward the league mean). That weighted average was then adjusted for team effects: if a player was on a better team last season than he will be this year, his shot rate benefited from it, and vice versa. Once the per-minute shot rates were compiled, they were multiplied by projected time on ice to get the number of shots each player would take over the course of the season.
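As a rough sketch of the shot projection described above (not the exact code behind this model), the steps might look like the following. The season weights, the way short histories are padded with the league mean, and the multiplicative `team_factor` are all assumptions made for illustration; the post does not specify them.

```python
def projected_shot_rate(season_rates, league_mean, weights=(0.5, 0.3, 0.2)):
    """Weighted average of per-minute shot rates, most recent season first.

    The weights are hypothetical. Seasons a player has not played are
    filled with the league mean, which is one simple way to regress
    short-history players toward the league average.
    """
    recent = list(season_rates)[:len(weights)]
    padded = recent + [league_mean] * (len(weights) - len(recent))
    return sum(w * r for w, r in zip(weights, padded)) / sum(weights)


def projected_shots(rate_per_minute, team_factor, toi_minutes):
    """Scale the shot rate by an assumed team-effect multiplier
    (1.0 means no change in team quality) and by projected ice time."""
    return rate_per_minute * team_factor * toi_minutes


# Made-up example: a second-year player with 0.14 and 0.12 shots per minute
rate = projected_shot_rate([0.14, 0.12], league_mean=0.10)
season_shots = projected_shots(rate, team_factor=0.97, toi_minutes=1400)
```

A player with only one or two seasons is pulled toward `league_mean` in proportion to the weight of the missing seasons, so a rookie's projection leans on the league average more than a veteran's does.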

After compiling projected shot totals, I looked at shooting percentages. The projected shooting percentage for this season was another three-year weighted average, with the same regression toward the league mean applied to players with fewer than three years of data.

This procedure was done separately for even strength and the power play. Shot totals for the season were then multiplied by shooting percentage, which gave the projected goals for the year. Below are the top 20 projected goal scorers for this year among forwards and the top ten among defensemen.
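To make the arithmetic concrete, here is a minimal sketch of combining the two game states into a goal projection. The function name and the example numbers are made up for illustration.

```python
def projected_goals(es_shots, es_sh_pct, pp_shots, pp_sh_pct):
    """Projected goals = shots x shooting percentage, computed separately
    for even strength (es) and the power play (pp), then summed.
    Shooting percentages are fractions, e.g. 0.10 for 10%."""
    return es_shots * es_sh_pct + pp_shots * pp_sh_pct


# Made-up player: 200 even-strength shots at 10%, 60 power-play shots at 15%
goals = projected_goals(200, 0.10, 60, 0.15)
print(round(goals, 1))
```

Keeping the two game states separate matters because power-play shooting percentages tend to run higher than even-strength ones, so a single blended rate would misstate players with unusual power-play usage.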

Assists

For assists, an entirely different method was used. Since there is no standout predictor of assists, I used multiple regression with a few statistics that I thought would be predictive: Corsi For (a possession metric), team shooting percentage, and individual shots. All three proved to be predictive, with individual shots having a negative relationship with assists. I projected each of these stats with a method similar to the one described above for shot rates, then plugged the projections into the linear formula derived from the regression to get an assist total for each player. Below is the leaderboard of the top 20 players ranked by assists.
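For readers curious what a regression like this looks like in code, here is a small self-contained sketch of ordinary least squares with three predictors. It is an illustration of the general technique, not the fit used for this model: the helper names are hypothetical, and any numbers you feed it are your own.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows, e.g. (corsi_for, team_sh_pct, shots);
    an intercept column is added automatically.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict_assists(coefs, features):
    """Apply the fitted linear formula to one player's projected stats."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], features))
```

In practice a library routine such as `numpy.linalg.lstsq` would be the usual choice; the hand-rolled solver is only here to keep the example dependency-free. A negative coefficient on individual shots would show up as a negative third slope in the returned list.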

And finally, we have the top 20 players by total points. 

Now, a few things should jump out to anyone who follows hockey. Connor McDavid being anywhere but number one is usually the sign of a bad model, so why is that the case here? McDavid gets the vast majority of his points from assists, as he is not one of the top shooters in the league. And assists, in this model, depend largely on the strength of the team around you. Unfortunately for Connor McDavid, he plays for the Edmonton Oilers, a team that does not have much around him. We can safely expect McDavid to finish with many more points than the 75 this model projects.

Second, the assist totals are largely bunched together. Again, this is not something we would expect to happen over the course of a season. However, as discussed with McDavid, assists in this model depend largely on team success, and as we will see in the next section, the teams in this model are pretty bunched up too.

Season Simulation

Once the goal numbers were derived, the next step was to simulate the season to see where teams would end up. First, though, team strength had to be assessed, and that meant projecting goaltending. A lot of the necessary work was already done in the previous step. To predict the number of goals each goalie would give up, I took the number of shots I projected each team to allow and multiplied it by one minus the goalie's projected save percentage (again, based on a three-year weighted average).
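Because save percentage is the share of shots a goalie stops, goals allowed come from the share he does not stop. A minimal sketch, with a made-up function name and made-up numbers:

```python
def projected_goals_against(shots_against, save_pct):
    """Goals allowed = shots faced x (1 - save percentage).
    save_pct is a fraction, e.g. 0.915 for a .915 goalie."""
    if not 0.0 <= save_pct <= 1.0:
        raise ValueError("save_pct must be a fraction between 0 and 1")
    return shots_against * (1.0 - save_pct)


# Made-up team: 2,500 shots allowed in front of a .915 goalie
goals_against = projected_goals_against(2500, 0.915)
print(round(goals_against, 1))
```

Small changes in save percentage move this number a lot: over the same 2,500 shots, each point of save percentage (.001) is worth 2.5 goals.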

With goals for and goals against in hand for each team, every component of the season simulation was in place. To determine the strength of each team, I looked at Pythagorean wins. The Pythagorean expectation was first developed for baseball by the statistician Bill James as a way to estimate how many games a team should win based on its runs scored and runs allowed. It was later applied to hockey. The formula for expected winning percentage is:
(Goals Scored + 0.5)^2.1 / ((Goals Scored + 0.5)^2.1 + (Goals Allowed + 0.5)^2.1)
Using this, I was able to find the number of points each team should have over the course of the year, as well as the playoff probability for each team. Both are shown below:
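The Pythagorean formula above translates directly into a few lines of code. This sketch only returns the expected winning percentage; how that is turned into standings points is not spelled out in this post, so it is left out here.

```python
def pythagorean_win_pct(goals_for, goals_against, exponent=2.1, offset=0.5):
    """Expected winning percentage from goals scored and allowed,
    using the 2.1 exponent and 0.5 offset from the formula above."""
    gf = (goals_for + offset) ** exponent
    ga = (goals_against + offset) ** exponent
    return gf / (gf + ga)


# Made-up team: 250 goals for, 230 against, over an 82-game season
win_pct = pythagorean_win_pct(250, 230)
expected_wins = win_pct * 82
```

A team that scores exactly as many goals as it allows comes out at .500, and swapping a team's goals for and goals against gives the complementary percentage.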

Now, again, there are problems with this season simulation, which I will attempt to point out:
  • The playoff odds do not add up to 800%, or 8 playoff teams. This is because each team's simulation is run independently of the others, instead of simulating the entire league at once. While the second way is preferred, I barely made it through the basic computer programming course that I took, and so this limitation is evident.
  • Certain teams have a higher playoff probability than teams projected for more points. Why? The point projections were determined entirely from Pythagorean wins, the formula discussed above, while the playoff probabilities were determined by simulating the season 1,000 times, which causes the slight disparity. Other factors in this are strength of schedule and days of rest between games.
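For anyone who wants to try the league-wide approach mentioned in the first point, here is a deliberately simplified sketch. It is not the simulation used for this post: it assumes each team's games are independent coin flips at a fixed win probability, ignores head-to-head schedules, overtime points, rest days, and the real divisional playoff format, and simply takes the top `playoff_spots` teams by wins in each simulated season.

```python
import random


def simulate_playoff_odds(win_pcts, playoff_spots, n_sims=1000, games=82, seed=0):
    """Estimate playoff odds by simulating the whole league at once.

    win_pcts maps team name -> per-game win probability (for example,
    a Pythagorean winning percentage). All names here are hypothetical.
    """
    rng = random.Random(seed)
    made = {team: 0 for team in win_pcts}
    for _ in range(n_sims):
        # Draw every team's win total within the same simulated season
        wins = {team: sum(rng.random() < p for _ in range(games))
                for team, p in win_pcts.items()}
        # Rank the league together; ties are broken at random
        ranked = sorted(wins, key=lambda t: (wins[t], rng.random()), reverse=True)
        for team in ranked[:playoff_spots]:
            made[team] += 1
    return {team: made[team] / n_sims for team in win_pcts}


# Made-up four-team league with two playoff spots
odds = simulate_playoff_odds({"A": 0.60, "B": 0.52, "C": 0.48, "D": 0.40}, 2)
```

Because exactly `playoff_spots` teams qualify in every simulated season, the odds always sum to the number of spots, which is the property the independent per-team runs cannot guarantee.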

Ultimately, I feel that this model does a much better job of projecting points than the previous one. Here, we can better see the effects that individual players have on the team as a whole, and I believe the simulation results are pretty consistent with what is expected to happen over the course of the season. However, hockey is nearly impossible to predict, which is something every model struggles with, so it will be interesting to see which team is this year's Vegas. I know that I, for one, am ready to welcome our new Montreal Canadiens overlords.
