Find duplicates in pre/post data

find_dupes(data, vars)

Arguments

data: A data frame containing at least one key ID variable that can be used to identify duplicate records
vars: The name of the ID variable, in quotes. If more than one variable is needed to identify unique records, pass a character vector, e.g. `c("a", "b")`

Value

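A tibble with one row per unique combination of the ID variables and a column `n` giving the number of records found for that combination. Combinations with `n` greater than 1 are duplicates.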
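
The source of `find_dupes()` is not shown here. As a rough illustration of the kind of result described above, the following sketch counts records per key combination with dplyr. The name `find_dupes_sketch` is hypothetical, and the use of `count()` with `across(all_of())` is an assumption about one way to get this shape of output, not the package's actual implementation.

```r
suppressMessages(library(dplyr))

# Hypothetical stand-in for find_dupes(); not the package's actual source.
find_dupes_sketch <- function(data, vars) {
  data %>%
    as_tibble() %>%
    count(across(all_of(vars)))  # one row per key combination, count in `n`
}

# Same test data as in the Examples section
testdata <- datasets::sleep %>%
  rbind(c(.4, 1, 1)) %>%
  filter(extra != 3.7)

res <- find_dupes_sketch(testdata, "ID")
```

Under these assumptions, `res` has one row per `ID`, and passing `c("ID", "group")` instead would count records per ID and group pair.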

Examples

suppressMessages(library(dplyr))
# Build example data: add an extra record for ID 1 in group 1
# and drop the group 1 record for ID 7
testdata <- datasets::sleep %>%
  rbind(c(.4, 1, 1)) %>%
  filter(extra != 3.7)
find_dupes(testdata, "ID")
#> # A tibble: 10 × 2
#>    ID        n
#>    <fct> <int>
#>  1 1         3
#>  2 2         2
#>  3 3         2
#>  4 4         2
#>  5 5         2
#>  6 6         2
#>  7 7         1
#>  8 8         2
#>  9 9         2
#> 10 10        2

# For more than one key variable
find_dupes(testdata, c("ID", "group"))
#> # A tibble: 19 × 3
#>    ID    group     n
#>    <fct> <fct> <int>
#>  1 1     1         2
#>  2 1     2         1
#>  3 2     1         1
#>  4 2     2         1
#>  5 3     1         1
#>  6 3     2         1
#>  7 4     1         1
#>  8 4     2         1
#>  9 5     1         1
#> 10 5     2         1
#> 11 6     1         1
#> 12 6     2         1
#> 13 7     2         1
#> 14 8     1         1
#> 15 8     2         1
#> 16 9     1         1
#> 17 9     2         1
#> 18 10    1         1
#> 19 10    2         1
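
Because the result lists every key combination, including those that occur only once, a common follow-up is to keep only the keys with `n` greater than 1 and pull the matching records for inspection. The sketch below shows one possible way to do that with dplyr. To stay self-contained it uses `count()` as a stand-in for the output of `find_dupes(testdata, c("ID", "group"))`, and the names `key_counts`, `dupe_keys`, and `dupe_rows` are hypothetical.

```r
suppressMessages(library(dplyr))

# Same test data as in the examples above
testdata <- datasets::sleep %>%
  rbind(c(.4, 1, 1)) %>%
  filter(extra != 3.7)

# Stand-in for find_dupes(testdata, c("ID", "group")):
# one row per key combination, with the count in `n`
key_counts <- count(testdata, ID, group)

# Keep only the key combinations that occur more than once
dupe_keys <- filter(key_counts, n > 1)

# Pull the full records behind those keys for inspection
dupe_rows <- semi_join(testdata, dupe_keys, by = c("ID", "group"))
```

With this test data, only ID 1 in group 1 is duplicated, so `dupe_rows` holds the two records sharing that key.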