P-value function: Independence Treatment Distance Test
This function accepts a data frame and, optionally, test-specific arguments (such as whether the test is asymptotic or simulation based). It returns a p-value.
Usage
pIndepDist(
dat,
fmla = YcontNorm ~ trtF | blockF,
simthresh = 20,
sims = 1000,
parallel = "yes",
ncpu = NULL,
distfn = fast_dists_and_trans_hybrid
)
Arguments
- dat
An object inheriting from class data.frame
- fmla
A formula appropriate to the function. Here it should be something like outcome ~ treatment | block
- simthresh
The size of the data below which direct permutations are used to compute p-values
- sims
Either NULL (meaning use an asymptotic reference distribution) or a number (meaning sample that many times from the randomization distribution implied by the formula)
- parallel
If "no", parallelization is not used; otherwise it is "multicore" or "snow" in the call to coin::independence_test() (see help for coin::approximate()). Also, if parallel is not "no" and adaptive_dist_function is TRUE, then an OpenMP version of the distance creation function is called using ncpu threads (or parallel::detectCores(logical=FALSE) cores).
- ncpu
The number of CPUs to use for parallel operation.
- distfn
A function that produces one or more vectors (as a data frame or matrix) with the same number of rows as dat
Details
For now, this function does an omnibus-style chi-square test using (1) the ratio of distances to controls to distances to treated observations within block; (2) the rank of distances to controls for each unit; and (3) the raw outcome.
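To make the three ingredients concrete, here is a minimal base R sketch for a single toy block. It is not the package's implementation: the data are made up, the helper mean_dist_to() is hypothetical, and the use of mean absolute outcome distances (excluding each unit's distance to itself) is an assumption about how "distance to controls" and "distance to treated" might be summarized.

```r
# Toy block: 3 treated and 3 control units (made-up values, illustration only).
y   <- c(1, 2, 4, 3, 5, 9)
trt <- c(1, 1, 1, 0, 0, 0)

# Pairwise absolute distances between outcomes.
d <- as.matrix(dist(y))

# Hypothetical helper: mean distance from unit i to the units in `group`,
# leaving out unit i itself (an assumption, not necessarily what distfn does).
mean_dist_to <- function(i, group) {
  others <- setdiff(which(trt == group), i)
  mean(d[i, others])
}

n <- length(y)
dist_ctrl <- vapply(seq_len(n), mean_dist_to, numeric(1), group = 0)
dist_trt  <- vapply(seq_len(n), mean_dist_to, numeric(1), group = 1)

# One row per unit: (3) raw outcome, (1) ratio of distances, (2) rank of
# distance to controls.
stats <- data.frame(
  y              = y,
  ratio_ctrl_trt = dist_ctrl / dist_trt,
  rank_dist_ctrl = rank(dist_ctrl)
)
print(stats)
```

Columns like these could then be tested jointly against treatment, for example as a multi-outcome formula passed to coin::independence_test() with teststat = "quadratic"; whether pIndepDist() does exactly this internally is not confirmed here.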
Although the distances are calculated by block, our profiling suggests that
it is better to parallelize the distance creation function distfn (done here
in C++ in the fastfns.cpp file) rather than use the data.table approach of
setDTthreads(). So, here we assume that data.table is using a single thread.
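A caller who wants to be sure this single-thread assumption holds could set it explicitly around the call. The sketch below only uses data.table's setDTthreads() and getDTthreads(); whether pIndepDist() also sets the thread count itself is not stated here, so treat this as a precaution rather than a requirement.

```r
library(data.table)

# setDTthreads() returns the previous setting, so it can be restored later.
old_threads <- setDTthreads(1)
stopifnot(getDTthreads() == 1)

# ... call pIndepDist(dat, fmla, ...) here ...

# Restore the earlier data.table thread setting.
setDTthreads(old_threads)
```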
Examples
# \donttest{
# Example using distance-based independence test
data(example_dat, package = "manytestsr")
library(data.table)
# Test for treatment effect using distance-based approach
single_block <- as.data.table(subset(example_dat, blockF == "B080"))
p_val <- pIndepDist(single_block, Y1 ~ trtF | blockF, parallel = "no")
print(p_val)
#> [1] 0.5394499
# Test with different outcome variable
p_val2 <- pIndepDist(single_block, Y2 ~ trtF | blockF, parallel = "no")
print(p_val2)
#> [1] 0.7764711
# }