Calculates summary statistics from outputs of generate() or hypothesize(). Learn more in vignette("infer").
    calculate(
      x,
      stat = c("mean", "median", "sum", "sd", "prop", "count", "diff in means",
        "diff in medians", "diff in props", "Chisq", "F", "slope", "correlation",
        "t", "z", "ratio of props", "odds ratio"),
      order = NULL,
      ...
    )
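Argument | Description |
---|---|
x | The output from generate() or hypothesize(). |
stat | A string giving the type of the statistic to calculate. Current options include "mean", "median", "sum", "sd", "prop", "count", "diff in means", "diff in medians", "diff in props", "Chisq", "F", "slope", "correlation", "t", "z", "ratio of props", and "odds ratio". |
order | A string vector specifying the order in which the levels of the explanatory variable should be ordered for subtraction, where order = c("first", "second") means ("first" - "second"). |
... | To pass options like na.rm = TRUE into functions like mean(), sd(), etc. |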
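
The effect of order on a difference statistic can be illustrated with a small base R sketch. This is only the arithmetic behind a difference in means, not infer's internal code; the toy data and the diff_in_means() helper are hypothetical and exist only for this illustration.

```r
# Toy data (hypothetical): a numeric response and a two-level explanatory variable.
age <- c(30, 40, 50, 20, 30, 40)
college <- c("degree", "degree", "degree", "no degree", "no degree", "no degree")

# Hypothetical helper: mean of the first level named in `order`
# minus the mean of the second level.
diff_in_means <- function(response, explanatory, order) {
  mean(response[explanatory == order[1]]) - mean(response[explanatory == order[2]])
}

# Swapping the two levels flips the sign of the statistic.
d_forward <- diff_in_means(age, college, order = c("degree", "no degree"))
d_reverse <- diff_in_means(age, college, order = c("no degree", "degree"))

print(c(forward = d_forward, reverse = d_reverse))
```

Because the sign of the statistic depends on order, it is worth using the same order for the observed statistic and for the null distribution it is compared against.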
A tibble containing a stat column of calculated statistics.
In some cases, when bootstrapping with small samples, some generated bootstrap samples will have only one level of the explanatory variable present. For some test statistics, the calculated statistic in these cases will be NaN. The package will omit non-finite values from visualizations (with a warning) and raise an error in p-value calculations.
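
A minimal base R sketch of how such a NaN can arise, assuming a difference in means as the statistic. The data and the hand-picked resample indices are made up for illustration; this is not infer's implementation.

```r
# Toy sample (hypothetical): group "b" has a single observation,
# so a bootstrap resample can easily miss it.
response <- c(5, 6, 7, 8, 20)
group <- c("a", "a", "a", "a", "b")

# A resample that happens to draw only rows from group "a"
# (indices chosen by hand to force the situation).
idx <- c(1, 2, 2, 4, 3)
boot_response <- response[idx]
boot_group <- group[idx]

# mean() of an empty vector is NaN, so the difference is NaN as well.
stat <- mean(boot_response[boot_group == "a"]) -
  mean(boot_response[boot_group == "b"])

stat_is_nan <- is.nan(stat)
stat_is_finite <- is.finite(stat)
print(c(is_nan = stat_is_nan, is_finite = stat_is_finite))
```

Checking the stat column with is.finite() before further analysis is one way to see how many replicates were affected.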
    # calculate a null distribution of hours worked per week under
    # the null hypothesis that the mean is 40
    gss %>%
      specify(response = hours) %>%
      hypothesize(null = "point", mu = 40) %>%
      generate(reps = 200, type = "bootstrap") %>%
      calculate(stat = "mean")
    #> # A tibble: 200 x 2
    #>    replicate  stat
    #>        <int> <dbl>
    #>  1         1  39.2
    #>  2         2  39.4
    #>  3         3  40.1
    #>  4         4  39.6
    #>  5         5  40.8
    #>  6         6  39.9
    #>  7         7  39.9
    #>  8         8  40.8
    #>  9         9  39.6
    #> 10        10  41.0
    #> # … with 190 more rows

    # calculate a null distribution assuming independence between age
    # of respondent and whether they have a college degree
    gss %>%
      specify(age ~ college) %>%
      hypothesize(null = "independence") %>%
      generate(reps = 200, type = "permute") %>%
      calculate("diff in means", order = c("degree", "no degree"))
    #> # A tibble: 200 x 2
    #>    replicate     stat
    #>        <int>    <dbl>
    #>  1         1 -2.48
    #>  2         2 -0.699
    #>  3         3 -0.0113
    #>  4         4  0.579
    #>  5         5  0.553
    #>  6         6  1.84
    #>  7         7 -2.31
    #>  8         8 -0.320
    #>  9         9 -0.00250
    #> 10        10 -1.78
    #> # … with 190 more rows
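
For readers who want to see what a bootstrap null distribution of a mean looks like without the package, here is a rough base R sketch. It assumes one common approach, shifting the sample so that its mean equals the hypothesized value and then resampling with replacement; whether infer does exactly this internally is an assumption, and the simulated hours vector is made up rather than taken from gss.

```r
set.seed(1)

# Made-up sample of weekly hours (hypothetical data, not the gss dataset).
hours <- rnorm(50, mean = 42, sd = 10)
mu <- 40     # hypothesized mean under the null
reps <- 200  # number of bootstrap replicates

# Shift the sample so that its mean equals mu, then resample with replacement.
shifted <- hours - mean(hours) + mu

null_dist <- data.frame(
  replicate = seq_len(reps),
  stat = vapply(
    seq_len(reps),
    function(i) mean(sample(shifted, replace = TRUE)),
    numeric(1)
  )
)

# One row per replicate, with the bootstrap mean in the stat column.
print(head(null_dist))
```

The result has the same shape as the tibbles shown above: one row per replicate and a stat column holding the calculated statistic.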