6 Data augmentation
Some examples of adding extra columns to our play-by-play data in order to support particular analyses, or to make other data-wrangling tasks easier.
6.1 The setter of a given attack
Aim: Identify the player who made the set associated with each attack (noting that some files might not have the setting action coded for all attacks, or even coded at all).
<- px %>% mutate(set_player_id = case_when(skill == "Attack" & lag(skill) == "Set" & team == lag(team) ~ lag(player_id)) px
(The set_player_id
column will be NA
when there is no matching set coded before the attack.)
6.2 Setter player IDs
Aim: identify the player_id
of the setter on court for each data row.
The play-by-play data frame has home_player_id1
, home_player_id2
, …, home_player_id6
, which give the player_id
of the home team player currently in position 1, 2, …, and 6. It also has a home_setter_position
column, which tells us which position the home team setter is in (1–6). So to create a home_setter_id
column we simply need to extract the value in the home_player_idX
column for each row of the data frame, where X
is the value in the home_setter_position
at each row.
One way to do this is:
<- px %>% mutate(home_setter_id = case_when(home_setter_position == 1 ~ home_player_id1,
px == 2 ~ home_player_id2,
home_setter_position == 3 ~ home_player_id3,
home_setter_position == 4 ~ home_player_id4,
home_setter_position == 5 ~ home_player_id5,
home_setter_position == 6 ~ home_player_id6)) home_setter_position
But that is rather cumbersome and inelegant. A more concise method is:
<- px %>% rowwise() %>% mutate(home_setter_id = cur_data()[[paste0("home_player_id", home_setter_position)]]) %>% ungroup px
6.3 Reception quality
Aim: add a column that tells us the reception quality associated with each rally.
The reception quality for a given rally is found in the evaluation
column of the row that has the skill
value of “Reception”. We just need to propagate this value to all other rows associated with that particular rally.
First we extract the reception quality for each rally (remember from the Identifiers section that the point_id
value identifies the rally, but needs to be combined with match_id
to be globally unique):
<- px %>% dplyr::filter(skill == "Reception") %>% group_by(match_id, point_id) %>%
rq ::summarize(reception_quality = if (n() == 1) .data$evaluation else NA_character_) %>% ungroup dplyr
So rq
is a data.frame with the reception quality for each match_id
and point_id
combination. reception_quality
will be NA
if there was no reception (or more than one reception — perhaps due to a scouting error?) involved in that point. Now join rq
back to our full plays dataframe:
<- px %>% left_join(rq, by = c("match_id", "point_id")) px
So now we have reception_quality
for all rows of the data frame.