This vignette provides more specific details of how
causal_tbl objects work and how to extend them. Most users
won’t need to know much about
causal_tbls except that
they’re (1) extensions of
tibbles and (2) they rely on a
causal_cols attribute that makes things “just work”. The
causal_cols attribute records which columns hold the causal variables
that play an important role in an analysis, such as the outcome and
treatment. The package provides various getter and setter functions for these.
This vignette covers:
1. How causal_tbls work internally.
2. How to extend the type if your model needs shiny new causal_cols.
Like in the README, here we use a simple difference-in-differences example: 8 observations for 2 units, across 4 years.
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)
Here, when we create the causal_tbl, we can specify the outcome and
treatment directly via the .outcome and .treatment arguments:
did <- causal_tbl(df, .outcome = y, .treatment = trt)
All causal attributes can be recovered with causal_cols():
causal_cols(did)
#> $outcomes
#> [1] "y"
#>
#> $treatments
#>     y
#> "trt"
Each of these elements is a character vector, with each element being a name of a column in the data frame. For some variables, this vector should be of length 1, but for other variables, there may be multiple columns of that type.
In our case, the causal_cols() are the outcome and the treatment. The
outcome has no name, i.e., it’s just "y". The treatments entry shows
that trt is automatically associated with the outcome y; this is
indicated by the name attached to "trt". More generally, the names()
of the columns within a particular causal_cols entry convey information
on any associated variable. For example, the treatment variable is by
default associated with a particular outcome, and a propensity score or
outcome model is associated with a particular treatment or outcome variable.
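As a minimal sketch of how these associations can be inspected, the snippet below rebuilds the did object and looks at the names() of each entry. It uses only causal_tbl() and causal_cols() as they appear in this vignette; the variable cols is just a local helper.

```r
library(causaltbl)

df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)
did <- causal_tbl(df, .outcome = y, .treatment = trt)

cols <- causal_cols(did)

# Column holding the outcome; this entry carries no names
cols$outcomes
names(cols$outcomes)

# Column holding the treatment, with the association name stripped
unname(cols$treatments)

# The name records which outcome the treatment is associated with
names(cols$treatments)
```

Reading the association from names() rather than from a separate lookup table keeps each entry self-describing.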
However, you are not limited to one treatment or one outcome. For
example, if a package author were developing methods for causal inference
with multiple continuous treatments, the treatments element of
causal_cols could have one entry per treatment.
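To make the multiple-treatment case concrete, here is a sketch using a plain base R list that mirrors the printed structure of causal_cols. The column names dose_a and dose_b are hypothetical, and nothing here calls into the package; how a given method consumes several treatments is up to its author.

```r
# A plain list mirroring the shape of causal_cols (not created by the package)
cols <- list(
  outcomes = "y",
  # Two hypothetical continuous treatments, each associated with outcome "y"
  treatments = c(y = "dose_a", y = "dose_b")
)

# One entry per treatment column
length(cols$treatments)

# All treatment columns associated with the outcome "y"
unname(cols$treatments[names(cols$treatments) == "y"])
```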
Once set, these column names within causal_cols are
automatically updated if columns are renamed, or set to
NULL if columns are dropped. This reassignment happens
silently in all cases.
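The sketch below shows one way to observe this bookkeeping. It assumes dplyr is installed and that its rename() and select() verbs are among the operations covered; the new column name outcome is arbitrary. No output is shown because the exact printed form is not given in this vignette.

```r
library(causaltbl)
library(dplyr)  # assumption: dplyr verbs are used for renaming and dropping

df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)
did <- causal_tbl(df, .outcome = y, .treatment = trt)

# Rename the outcome column, then inspect the tracked columns
renamed <- rename(did, outcome = y)
causal_cols(renamed)

# Drop the treatment column, then inspect again
dropped <- select(did, -trt)
causal_cols(dropped)
```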
Now, if you need something fancy, odds are you should implement a new
causal_cols entry. As we saw before,
causal_cols attributes can be retrieved via
causal_cols(). They can be set using
causal_cols() <- ....
Each new entry to
causal_cols should be a named list, where:
- the name of the list denotes, in short form, what the thing is
(e.g., propensity scores would get a short name of their own)
- each entry in the list denotes one of those things
- each name of each entry indicates what that entry corresponds to
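
Following the three rules above, a new entry might be registered as sketched here. The column ps, its placeholder values, and the entry name pscores are all hypothetical choices for illustration, not names confirmed by the package. The entry is written as a named character vector, mirroring how treatments is printed earlier, and the sketch assumes the causal_cols() setter accepts the full modified list.

```r
library(causaltbl)

df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5),
  # Hypothetical placeholder propensity scores, not estimated from the data
  ps = c(0.2, 0.2, 0.2, 0.2, 0.6, 0.6, 0.6, 0.6)
)
did <- causal_tbl(df, .outcome = y, .treatment = trt)

cols <- causal_cols(did)

# "pscores" is a hypothetical short name for what the entry is;
# the value "ps" is the column, and its name "trt" is the treatment
# that the propensity score corresponds to
cols$pscores <- c(trt = "ps")

# Assumption: the setter takes the whole updated list
causal_cols(did) <- cols

causal_cols(did)$pscores
```

Modifying a copy of the list and assigning it back in one step leaves the existing outcomes and treatments entries untouched.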