3 Reading files

3.1 DataVolley

library(datavolley)
filename <- "c:/my/filename.dvw"
x <- dv_read(filename)

The dv_read function has a number of optional parameters. The most important are probably:

  • insert_technical_timeouts. By default, technical timeouts will be inserted at points 8 and 16 of sets 1–4 (for indoor files) or when the team scores sum to 21 in sets 1–2 (beach). You can avoid inserting technical timeouts by setting this to FALSE, or change the scores at which TTs are inserted (see dv_read function help)

  • skill_evaluation_decode. By default, dv_read uses the standard DataVolley scouting conventions. This controls the interpretation of the evaluation codes (e.g. B/ is a block invasion (net touch or other violation)). However, not all scouts use these conventions. VolleyMetrics, for example, use B/ to mean a poor block that the opposition can replay (amongst other convention differences). In Germany, B/ is usually used to indicate a block tool (attack off the block for a kill) and B= is used to indicate an invasion. You can tell the dv_read function to follow these conventions by dv_read(..., skill_evaluation_decode = "volleymetrics") or dv_read(..., skill_evaluation_decode = "german"). If your files use other scouting conventions, you can write your own decoder (see dv_read function help)

  • date_format. Dates can be ambiguous in DataVolley files, and sometimes they will be parsed incorrectly (e.g. swapping month and day). If the dates in your files are being read incorrectly you can set the expected format, e.g. dv_read(..., date_format = "dmy") or dv_read(..., date_format = "mdy").

  • encoding specifies the text encoding of the file. By default this will be guessed, but if the text encoding is guessed incorrectly then player/team names might appear garbled (e.g. accented characters wrong) and in extreme cases the file might refuse to read altogether. You can get an idea of what is going on with text encoding by asking for verbose output: dv_read(..., verbose = TRUE). You can set the text encoding by e.g. dv_read(..., encoding = "windows-1252")

3.2 VBStats

library(peranavolley)
x <- pv_read("c:/my/filename.psvb")

3.3 Reading multiple files

You might want to read multiple files in and analyze them all together. First find all of the DataVolley files in the target directory:

d <- dir("c:/data", pattern = "dvw$", full.names = TRUE)
## if your files are in nested directories, add 'recursive = TRUE' to the arguments

Read all of those files in a loop, extract the play-by-play component from each, and then join of those all together:

lx <- list()
## read each file
for (fi in seq_along(d)) lx[[fi]] <- dv_read(d[fi], insert_technical_timeouts = FALSE)
## now extract the play-by-play component from each and bind them together
px <- list()
for (fi in seq_along(lx)) px[[fi]] <- plays(lx)
px <- do.call(rbind, px)

Note, the idiomatic R way to do this would be to use lapply instead of for loops:

lx <- lapply(d, dv_read, insert_technical_timeouts = FALSE)
px <- do.call(rbind, lapply(lx, plays))

It achieves the same thing. Use whichever you prefer. Similarly, you could also use dplyr’s bind_rows function instead of do.call(rbind, ...):

library(dplyr)
px <- bind_rows(lapply(lx, plays))

After these operations, we have lx, which is a list containing the full contents of every match file (including the match and team metadata), and px, which is just the play-by-play component of each (but all joined together, which makes it easy to analyze multiple matches at once).