8 Indicators and statistics
8.1 Expected sideout rate
It is common to see passer performance reported as a single number based on pass ratings, using a weighting scheme to combine them. For example, a perfect pass might be worth 3 points, an “OK” pass 2 points, a poor pass 1 point, and an error 0 points. A passer who made one perfect pass, one poor pass, and one error would then have a performance rating of (3 + 1 + 0)/3 = 1.333. However, the weightings used in this type of approach are often arbitrary (why should a perfect pass be 3 times as valuable as a poor pass?).
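To make the arithmetic concrete, here is a minimal sketch of that weighted-rating approach in R (the weights and rating labels are just the illustrative ones from the example above, not a recommended scheme):
## illustrative, arbitrary weights for each pass rating
pass_weights <- c("Perfect pass" = 3, "OK pass" = 2, "Poor pass" = 1, "Error" = 0)
## a passer who made one perfect pass, one poor pass, and one error
mean(pass_weights[c("Perfect pass", "Poor pass", "Error")])
#> [1] 1.333333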
Expected sideout rate uses pass ratings to evaluate passing performance, but with a more principled approach to assigning the value of the different pass outcomes. It weights each pass rating according to the league-average sideout rate associated with it. For example, if the league-average sideout rate on a perfect pass is 75%, and the league-average sideout rate on a poor pass is 50%, then a perfect pass should be worth 75/50 = 1.5
times as much as a poor pass.
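As a toy illustration of where this leads, a passer who made one perfect pass and one poor pass (using those hypothetical league-average rates of 0.75 and 0.50) would be credited with the average of the rates attached to their passes:
## hypothetical league-average sideout rates from the example above
mean(c(0.75, 0.50))
#> [1] 0.625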
We can implement this by first calculating the league-average sideout rate on each pass rating, using a reference data set which is usually a whole-league data set or similar (note that in this example, purely for convenience, we are using data px
from a single match as its own reference. This is generally not a good idea, as noted below):
lso <- px %>% dplyr::filter(skill == "Reception") %>% group_by(.data$evaluation) %>%
    dplyr::summarize(expected_sideout_rate = mean(.data$team == .data$point_won_by, na.rm = TRUE)) %>% ungroup
This tells us the relative value of each pass rating:
lso
#> # A tibble: 6 × 2
#> evaluation expected_sideout_rate
#> <chr> <dbl>
#> 1 Error 0
#> 2 Negative, limited attack 0.512
#> 3 OK, no first tempo possible 0.444
#> 4 Perfect pass 0.808
#> 5 Poor, no attack 0
#> 6 Positive, attack 0.688
(As a side note — you can see that in this case an “OK” pass (e.g. a pass on the 3m line) has a value of 0.44, which is lower than the value of a negative pass (a poorer pass than an “OK” one — value 0.51). This is because we are using only a single match as our reference data set, and it just so happens that in this particular match the sideout rate on negative passes was better than on OK passes. With a larger reference data set from many matches, these types of inconsistencies will be greatly reduced.)
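In practice, then, lso would be calculated from a reference data set spanning many matches. A minimal sketch, assuming the scout files have been collected in a folder and that the datavolley package is being used to read them (the folder path here is hypothetical):
## read every DataVolley file in a (hypothetical) folder and stack the play-by-play data
ref_files <- dir("path/to/league/files", pattern = "\\.dvw$", full.names = TRUE)
ref_px <- do.call(rbind, lapply(ref_files, function(f) datavolley::plays(datavolley::dv_read(f))))
## lso can then be calculated from ref_px exactly as above, and joined to the target px data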
Then we can join our lso
data back to our target px
data set, creating an expected_sideout_rate
value associated with each pass. The overall expected sideout rate for a given player or team is then just the average of the expected_sideout_rate
values of all of their passes:
%>% dplyr::filter(.data$skill == "Reception") %>% left_join(lso, by = "evaluation") %>%
px group_by(.data$player_id, .data$player_name) %>%
::summarize(n_receptions = n(),
dplyrexpected_sideout_rate = mean(.data$expected_sideout_rate, na.rm = TRUE)) %>%
ungroup#> # A tibble: 9 × 4
#> player_id player_name n_receptions expected_sideout_rate
#> <chr> <chr> <int> <dbl>
#> 1 162 Jakub Peszko 3 0.647
#> 2 164 Bartosz Mariański 27 0.545
#> 3 231 Adrian Buchowski 28 0.536
#> 4 30341 Tomas Rousseaux 21 0.543
#> 5 30511 Jake Langlois 29 0.470
#> 6 420 Lukas Tichacek 2 0.512
#> 7 456 Rafał Sobański 25 0.549
#> 8 561 Marcin Komenda 1 0
#> 9 656 Michał Potera 24 0.649
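The same join can equally be summarized at the team level; a sketch, identical to the code above except for the grouping variables:
px %>% dplyr::filter(.data$skill == "Reception") %>% left_join(lso, by = "evaluation") %>%
    group_by(.data$team) %>%
    dplyr::summarize(n_receptions = n(),
                     expected_sideout_rate = mean(.data$expected_sideout_rate, na.rm = TRUE)) %>%
    ungroup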
8.2 Expected breakpoint rate
An analogous approach can be used to calculate expected breakpoint rate, as a measure of serving performance.
<- px %>% dplyr::filter(skill == "Serve") %>% group_by(.data$evaluation) %>%
lbp ::summarize(expected_breakpoint_rate = mean(.data$team == .data$point_won_by, na.rm = TRUE)) %>% ungroup dplyr
And then, as before:
%>% dplyr::filter(.data$skill == "Serve") %>% left_join(lbp, by = "evaluation") %>%
px group_by(.data$player_id, .data$player_name) %>%
::summarize(n_serves = n(),
dplyrexpected_breakpoint_rate = mean(.data$expected_breakpoint_rate, na.rm = TRUE)) %>%
ungroup#> # A tibble: 19 × 4
#> player_id player_name n_serves expected_breakpoint_rate
#> <chr> <chr> <int> <dbl>
#> 1 162 Jakub Peszko 1 0
#> 2 172 Wojciech Sobala 2 0.270
#> 3 22529 Rafał Faryna 11 0.364
#> 4 22531 Bartłomiej Grzechnik 13 0.319
#> 5 22706 Maciej Fijałek 1 0.270
#> 6 231 Adrian Buchowski 15 0.480
#> 7 235 Tomasz Kowalski 2 0.244
#> 8 29752 Dawid Woch 2 0.488
#> 9 29886 Emanuel Kohut 15 0.313
#> 10 30341 Tomas Rousseaux 16 0.374
#> 11 30511 Jake Langlois 11 0.264
#> 12 420 Lukas Tichacek 22 0.392
#> 13 433 Bartosz Krzysiek 7 0.377
#> 14 450 Artur Ratajczak 13 0.461
#> 15 456 Rafał Sobański 13 0.399
#> 16 488 Karol Butryn 15 0.376
#> 17 516 Bartłomiej Krulicki 20 0.437
#> 18 561 Marcin Komenda 11 0.363
#> 19 632 Jan Fornal 3 0.325
8.3 Set assist rate
The assist rate is the proportion of sets that yield an attack kill. The lead
function from the dplyr package helps here, allowing us to augment the “set” data rows with the outcome of the associated attack (which will be in the data row following the set data row):
## first add a variable indicating whether a set was followed by a kill by the same team
px %>% mutate(set_had_attack_kill = .data$skill == "Set" & lead(.data$skill) == "Attack" &
                  lead(.data$evaluation) == "Winning attack" & lead(.data$team) == .data$team) %>%
    ## then filter to just set rows
    filter(.data$skill == "Set") %>%
    ## and summarize as desired
    group_by(.data$team, .data$phase) %>%
    dplyr::summarize(assist_rate = sum(set_had_attack_kill, na.rm = TRUE) / n())
#> # A tibble: 4 × 3
#> # Groups: team [2]
#> team phase assist_rate
#> <chr> <chr> <dbl>
#> 1 GKS Katowice Reception 0.431
#> 2 GKS Katowice Transition 0.478
#> 3 MKS Będzin Reception 0.429
#> 4 MKS Będzin Transition 0.355
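The same calculation can be grouped by player rather than by team and phase, to look at individual setters; a sketch, again differing from the code above only in the grouping variables:
px %>% mutate(set_had_attack_kill = .data$skill == "Set" & lead(.data$skill) == "Attack" &
                  lead(.data$evaluation) == "Winning attack" & lead(.data$team) == .data$team) %>%
    filter(.data$skill == "Set") %>%
    group_by(.data$player_id, .data$player_name) %>%
    dplyr::summarize(n_sets = n(),
                     assist_rate = sum(set_had_attack_kill, na.rm = TRUE) / n())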
Note that this relies on all set and attack actions being scouted (i.e. there is a row in our px data frame for every set, as well as for every attack). Some scouts do not record all ball touches: digs and sets are the most commonly-omitted skills.
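A rough way to check that assumption for a given file (a suggested sanity check, not part of the workflow above) is to compare the number of scouted sets and attacks per team; a large mismatch suggests that not every set was recorded:
## count scouted sets and attacks per team
px %>% dplyr::filter(.data$skill %in% c("Set", "Attack")) %>%
    dplyr::count(.data$team, .data$skill)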