P-value function: Independence Treatment Distance Test

This function accepts a data frame and, optionally, test-specific arguments (such as whether the test is asymptotic or simulation-based). It returns a p-value.

Usage

pIndepDist(
  dat,
  fmla = YcontNorm ~ trtF | blockF,
  simthresh = 20,
  sims = 1000,
  parallel = "yes",
  ncpu = NULL,
  groups = NULL,
  distfn = dists_and_trans,
  adaptive_dist_function = TRUE
)

Arguments

dat: An object inheriting from class data.frame
fmla: A formula appropriate to the function. Here it should be something like outcome ~ treatment | block.
simthresh: is the size of the data (number of observations) below which we use direct permutations for p-values
sims: Either NULL (meaning use an asymptotic reference distribution) or a number of simulations (meaning sample from the randomization distribution implied by the formula)
parallel: if "no", then no parallelization is used; otherwise "multicore" or "snow" is used in the call to coin::independence_test() (see help for coin::approximate()). Also, if parallel is not "no" and adaptive_dist_function is TRUE, then an OpenMP version of the distance creation function is called using ncpu threads (or parallel::detectCores(logical=FALSE) cores).
ncpu: is the number of CPUs to be used for parallel operation.
groups: is a vector defining the groups within which the inter-unit distances are calculated. Not used here.
distfn: is a function that produces one or more vectors (as a data frame or matrix) with the same number of rows as dat
adaptive_dist_function: is TRUE if the distance calculation function should be chosen using previous benchmarks. See the code.
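
To make the sims argument more concrete, the following is a minimal base R sketch of what "sampling from the randomization distribution implied by the formula" can look like for outcome ~ treatment | block: treatment labels are shuffled within each block and a statistic is recomputed each time. This is not the code used by pIndepDist(), which relies on coin::independence_test(); the toy data, the helper names mean_diff and permute_within_blocks, and the difference-in-means statistic are all assumptions chosen for illustration.

```r
# Hypothetical toy data; column names mirror the default formula.
set.seed(1)
dat <- data.frame(
  YcontNorm = rnorm(24),
  trtF = factor(rep(c(0, 1), times = 12)),
  blockF = factor(rep(1:4, each = 6))
)

# Illustrative statistic (an assumption): difference in mean outcomes.
mean_diff <- function(y, z) mean(y[z == "1"]) - mean(y[z == "0"])

# Return row indices that are shuffled separately inside each block.
permute_within_blocks <- function(block) {
  ave(seq_along(block), block,
      FUN = function(idx) idx[sample.int(length(idx))])
}

observed <- mean_diff(dat$YcontNorm, dat$trtF)

sims <- 1000
null_stats <- replicate(sims, {
  z_star <- dat$trtF[permute_within_blocks(dat$blockF)]
  mean_diff(dat$YcontNorm, z_star)
})

# Two-sided simulation-based p-value.
p_value <- mean(abs(null_stats) >= abs(observed))
p_value
```

Because the shuffling happens inside each block, every simulated assignment keeps the same number of treated and control units per block as the observed data.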

Value

A p-value

Details

For now, this function does an omnibus-style chi-square test using (1) the ratio of distances to controls to distances to treated observations within block; (2) the rank of distances to controls for each unit; and (3) the raw outcome.
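
The three per-unit quantities listed above can be sketched in base R as follows. This is only an illustration under stated assumptions: distance is taken to be the absolute difference in outcomes, "distances to controls" is summarized as a mean over the other control units in the same block, and the names unit_scores, ratio_ctrl_trt, rank_ctrl, and raw_y are hypothetical. The actual distfn (dists_and_trans by default) may compute different or additional quantities.

```r
# Hypothetical toy data: two blocks of six units, three treated in each.
set.seed(3)
dat <- data.frame(
  y = c(rnorm(6, 0), rnorm(6, 1)),
  trt = rep(c(0, 1), times = 6),
  block = rep(c("a", "b"), each = 6)
)

# Compute the three illustrative scores for the units of one block.
unit_scores <- function(d) {
  # Pairwise absolute outcome differences (an assumed distance).
  dist_mat <- abs(outer(d$y, d$y, "-"))
  diag(dist_mat) <- NA  # ignore each unit's distance to itself
  is_ctrl <- d$trt == 0
  mean_to_ctrl <- rowMeans(dist_mat[, is_ctrl, drop = FALSE], na.rm = TRUE)
  mean_to_trt <- rowMeans(dist_mat[, !is_ctrl, drop = FALSE], na.rm = TRUE)
  data.frame(
    ratio_ctrl_trt = mean_to_ctrl / mean_to_trt,  # (1) ratio of distances
    rank_ctrl = rank(mean_to_ctrl),               # (2) rank of distances to controls
    raw_y = d$y                                   # (3) raw outcome
  )
}

# Apply block by block and stack the results in the original row order.
scores <- do.call(rbind, lapply(split(dat, dat$block), unit_scores))
head(scores)
```

Each column of scores has one value per row of dat, which matches the requirement that distfn return vectors with the same number of rows as the data.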

Although the distances are calculated by block, our profiling suggests that it is better to parallelize the distance creation function distfn (implemented here in C++ in the fastfns.cpp file) rather than use the data.table approach of setDTthreads(). We therefore assume that data.table is using a single thread.
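
A caller who wants to match this assumption could set things up along the following lines. This is a hedged sketch, not code from the package: it picks a thread count the way the parallel argument describes (falling back to the number of physical cores) and restricts data.table to one thread if data.table is installed. The fallback to 1 when the core count cannot be detected is an assumption added here.

```r
# Choose a thread count for the distance function.
ncpu <- NULL
if (is.null(ncpu)) {
  ncpu <- parallel::detectCores(logical = FALSE)
}
# detectCores() can return NA on some platforms; fall back to one thread.
if (is.na(ncpu) || ncpu < 1) {
  ncpu <- 1L
}

# Keep data.table single-threaded so that it does not compete with the
# threads used by the distance function.
if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::setDTthreads(1)
}

ncpu
```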