Apply either count_na() or dplyr::n_distinct() to every column of a data frame and return, for each column, the count and its share of total values (the proportion missing or the proportion distinct).
var_missing(df)
var_distinct(df)
A tibble of statistics with one row per column of the data frame, returned invisibly: the column name (var), its class, the count (n), and the proportion (p).
var_missing(dplyr::storms)
#> # A tibble: 13 × 4
#>    var                          class     n     p
#>    <chr>                        <chr> <int> <dbl>
#>  1 name                         <chr>     0 0
#>  2 year                         <dbl>     0 0
#>  3 month                        <dbl>     0 0
#>  4 day                          <int>     0 0
#>  5 hour                         <dbl>     0 0
#>  6 lat                          <dbl>     0 0
#>  7 long                         <dbl>     0 0
#>  8 status                       <fct>     0 0
#>  9 category                     <dbl> 14734 0.754
#> 10 wind                         <int>     0 0
#> 11 pressure                     <int>     0 0
#> 12 tropicalstorm_force_diameter <int>  9512 0.487
#> 13 hurricane_force_diameter     <int>  9512 0.487
var_distinct(dplyr::storms)
#> # A tibble: 13 × 4
#>    var                          class     n        p
#>    <chr>                        <chr> <int>    <dbl>
#>  1 name                         <chr>   260 0.0133
#>  2 year                         <dbl>    48 0.00246
#>  3 month                        <dbl>    10 0.000512
#>  4 day                          <int>    31 0.00159
#>  5 hour                         <dbl>    24 0.00123
#>  6 lat                          <dbl>   550 0.0282
#>  7 long                         <dbl>  1022 0.0523
#>  8 status                       <fct>     9 0.000461
#>  9 category                     <dbl>     6 0.000307
#> 10 wind                         <int>    32 0.00164
#> 11 pressure                     <int>   129 0.00660
#> 12 tropicalstorm_force_diameter <int>   142 0.00727
#> 13 hurricane_force_diameter     <int>    43 0.00220
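For readers who want to see roughly what these tables contain, the per-column computation can be sketched in base R. This is not the package's implementation: col_stats() and the toy data frame are hypothetical names introduced here. It assumes p is n divided by the number of rows, which is consistent with the output above, and it counts NA as a distinct value, which is what dplyr::n_distinct() does by default. It also reports the first element of class() rather than the abbreviated type shown in the class column above.

```r
# Hypothetical helper for illustration only; not the package's source code.
# Applies `fun` to every column and returns the count and its share of rows.
col_stats <- function(df, fun) {
  n <- vapply(df, fun, integer(1), USE.NAMES = FALSE)
  data.frame(
    var = names(df),
    class = vapply(df, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    n = n,
    p = n / nrow(df)  # assumption: the share is taken over the number of rows
  )
}

# Toy data (made up for this sketch): one NA in x, two NAs in y.
toy <- data.frame(x = c(1, NA, 3, 3), y = c("a", "b", NA, NA))

# Missing values per column: n = 1, 2 and p = 0.25, 0.50.
miss_tbl <- col_stats(toy, function(x) sum(is.na(x)))

# Distinct values per column, with NA counted as a value: n = 3, 3 and p = 0.75, 0.75.
dist_tbl <- col_stats(toy, function(x) length(unique(x)))

print(miss_tbl)
print(dist_tbl)
```

Unlike this sketch, var_missing() and var_distinct() return their result invisibly, so assign the result to a name if the table is needed for further work.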