Overview

This document explains how our team, the Office of Evaluation Sciences (OES) in the General Services Administration, tends to do statistical analysis of randomized experiments. It also explains why we do what we do.¹ The research integrity processes OES follows are already documented on our Evaluation Resources web page; for example, that page provides templates for our research design and analysis pre-registration process. Here, we instead get into the nitty gritty of our statistical work.

Purposes of this document

First, this document educates new team members about the decisions past team members have made regarding research design and analysis. It also serves as a place to record decisions for our own future selves. That is, current and past team members have made decisions about how to approach statistical analyses that may differ from those that are common in any given academic discipline. This document helps explain why we have landed on those decisions (for now), and also illustrates how to implement them.

Second, this document records decisions that we have made in the absence of pre-analysis plans, or in the context of circumstances unforeseen by our pre-analysis planning. Projects will sometimes encounter good reasons to make different decisions than those we describe here. But the SOP represents our methodological thinking in “all else equal” situations.

Third, on a related note, this document should help us write better analysis plans and speed our practice of re-analysis. (Our team insists on a blind re-analysis of every study as a quality control for our results before they are reported to our agency partners.)

Fourth, and finally, this document will hopefully help other teams working to learn about the causal impacts of policy interventions.

Nature and limitations of this document

We (mostly) focus on randomized field experiments.

This document focuses on the design and analysis of randomized field experiments. Although we may include some discussion of non-randomized studies (often known as observational studies), our team has until now focused primarily on randomized field experiments. We plan to include more discussion of observational studies as we pursue more of them in the future.

We (mostly) present examples using R.

We use the R statistical analysis language in this document because it is (a) one of the two industry standards in the field of data science (along with Python), (b) free, open source, and multiplatform, and (c) a locus of development for many of the latest statistical techniques for social and behavioral scientists.

Of course, members of our team also use other software like Stata, SAS, SPSS, and Python. To provide better guidance for Stata users in particular, almost all of the R code in this SOP is accompanied by code showing how the same task, or something similar, could be accomplished in Stata. Reported results and figures, however, are generated only from the R code.
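
As a minimal illustration of this convention, the sketch below estimates a difference in means with lm_robust() from the estimatr package (listed under Technical details below), which reports heteroskedasticity-robust (HC2) standard errors by default. The variable names and simulated data here are hypothetical, not drawn from any OES study; a Stata analogue would be something like regress y z, vce(hc2).

    # A minimal, hypothetical example of the R code style used in this SOP
    library(estimatr)  # lm_robust() uses HC2 robust standard errors by default

    set.seed(123)              # make the simulated data reproducible
    n <- 100
    z <- rbinom(n, 1, 0.5)     # hypothetical 0/1 treatment indicator
    y <- 0.5 * z + rnorm(n)    # simulated outcome with a true effect of 0.5

    # Difference in means estimated by OLS with robust standard errors
    fit <- lm_robust(y ~ z)
    summary(fit)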

Structure

This page provides a basic introduction to what the OES SOP hopes to accomplish. The rest of the document can be thought of as consisting of two parts:

  • For policymakers and agency partners - Chapters 1-3 provide a high-level overview of how statistical tests can inform policy learning, our priorities when designing tests, and how we prefer to justify the tests we use. This provides more context for how OES makes analysis decisions and what we aim to learn from impact evaluations. Chapter 3 goes into more technical detail than Chapters 1 and 2, overlapping with the next part of this document.

  • For OES team members - Chapters 3 and onward are intended to serve as a reference for design and analysis decisions that need to be made at different stages of our project process. Chapter 3 reviews a randomization-based framework for statistical decision-making that motivates many of the recommendations in later chapters. Chapter 4 provides guidance on how to randomly assign treatment and how to check that randomization occurred as planned (a brief sketch follows this list). Chapter 5 provides guidance about how to analyze data after we collect it. Finally, Chapter 6 provides guidance on performing ex ante power simulations before making design choices.
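
As a taste of the randomization guidance in Chapter 4, the sketch below uses the randomizr package (listed under Technical details below) to perform complete and block random assignment. The sample size, block variable, and seed are hypothetical choices for illustration only.

    # Hypothetical random assignment sketches with the randomizr package
    library(randomizr)

    set.seed(456)   # reproducibility for this illustration
    n <- 500

    # Complete random assignment: exactly 250 of 500 units are treated
    z_complete <- complete_ra(N = n, m = 250)
    table(z_complete)

    # Block random assignment: half of the units treated within each region
    region <- sample(c("east", "west"), size = n, replace = TRUE)
    z_block <- block_ra(blocks = region)
    table(region, z_block)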

Help us improve our work!

Since we hope to improve our analytic workflow with every project, this document should be seen as provisional: a record of, and a guide for, our continuous learning and improvement. We invite comments in the form of submissions to our Google Form (for OES team members only), or as Issues or pull requests on the SOP's GitHub repository.

The corresponding author and current maintainer is Bill Schultz; feel free to reach out to him with questions. Since taking responsibility for this project, Bill has updated the back-end code, helped to rewrite the text, and revised code examples throughout. This includes drafting new sections of this document on balance testing and multiple testing corrections, as well as writing all of the parallel Stata examples.

Additionally, special thanks are owed to:

  • Jake Bowers, Ryan Moore, Lula Chen, Paul Testa, and Nate Higgins for drafting the first edition of this SOP

  • Miles Williams for helping to update the back-end code that allows this document to work

  • Many other OES team members for their thoughts and contributions to this SOP over time, including Oliver McClellan and Tyler Simko.

Technical details

This book was written in bookdown. The complete source is available from GitHub. This version of the book was built with R version 4.5.0 (2025-04-11) and the following packages.

package          version      source
blockTools       0.6.6        CRAN (R 4.5.0)
bookdown         0.43         CRAN (R 4.5.0)
coin             1.4-3        RSPM
DeclareDesign    1.0.10       CRAN (R 4.5.0)
devtools         2.4.5        CRAN (R 4.5.0)
estimatr         1.0.6        CRAN (R 4.5.0)
fabricatr        1.0.2        CRAN (R 4.5.0)
foreach          1.5.2        CRAN (R 4.5.0)
future           1.58.0       CRAN (R 4.5.0)
future.apply     1.20.0       CRAN (R 4.5.0)
here             1.0.1        CRAN (R 4.5.0)
ICC              2.4.0        RSPM (R 4.5.0)
kableExtra       1.4.0        CRAN (R 4.5.0)
katex            1.5.0        RSPM
klippy           0.0.0.9500   Github (rlesur/klippy@378c247fbbc76ec662f6c1ed1103121b87091be4)
knitr            1.50         CRAN (R 4.5.0)
lmtest           0.9-40       RSPM
multcomp         1.4-28       RSPM
nbpMatching      1.5.6        RSPM
quickblock       0.2.2        RSPM
randomizr        1.0.0        CRAN (R 4.5.0)
remotes          2.5.0        RSPM
ri2              0.4.0        CRAN (R 4.5.0)
sandwich         3.1-1        RSPM
tidyverse        2.0.0        CRAN (R 4.5.0)
V8               6.0.4        RSPM
withr            3.0.2        CRAN (R 4.5.0)
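
For readers who want to run the examples in this document, one way to install the packages above is sketched below; exact installed versions may differ from those listed, and klippy is installed from GitHub rather than CRAN.

    # One way to install the packages listed above (installed versions may differ)
    install.packages(c(
      "blockTools", "bookdown", "coin", "DeclareDesign", "devtools", "estimatr",
      "fabricatr", "foreach", "future", "future.apply", "here", "ICC",
      "kableExtra", "katex", "knitr", "lmtest", "multcomp", "nbpMatching",
      "quickblock", "randomizr", "remotes", "ri2", "sandwich", "tidyverse",
      "V8", "withr"
    ))

    # klippy is only available from GitHub
    remotes::install_github("rlesur/klippy")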

  1. We call this document a standard operating procedure (SOP) because we are inspired by the Green, Lin and Coppock SOP.