Tidy Causal Data Frames and Tools • causaltbl

This package provides a causal_tbl class for causal inference. A causal_tbl is a subclass of tibble which keeps track of information on the roles of variables like treatment and outcome, and provides functionality to store models and their fitted values as columns in a data frame.

Installation

You can install the development version of causaltbl from GitHub with:

# install.packages("remotes")
remotes::install_github("CoryMcCartan/causaltbl")

Using `causaltbl`

A causal tibble, causal_tbl, is a data frame with attributes identifying which columns correspond to common inputs in causal inference analyses. At the most basic level, you can indicate the outcome and treatment columns. For more involved analyses, causal_tbls can keep track of additional columns including multiple outcomes and multiple treatments.

The primary entry point to causaltbl is causal_tbl(), which creates a causal_tbl directly.

Suppose we have data from a very simple difference-in-differences design. Our data look like this:

df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)

There are two units (id), a and b. We have 4 yearly observations from 2015 to 2018 (year) for each unit. a is never treated and b is treated in 2017 and 2018 (trt). Some outcome (y) is measured yearly.
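To make the design concrete, the following base R sketch computes the textbook two-group, two-period difference-in-differences contrast on these data by hand. It is not part of causaltbl; the helper `cell_mean()` and the variables `post`, `ever_treated`, and `did_estimate` are names introduced here purely for illustration, and the split into "pre" (2015 to 2016) and "post" (2017 to 2018) periods is an assumption based on when b is treated.

```r
df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)

# Unit b is treated from 2017 onward, so 2017-2018 is the "post" period.
post <- df$year >= 2017
ever_treated <- df$id == "b"

# Hypothetical helper: mean outcome within one group-by-period cell.
cell_mean <- function(group, period) mean(df$y[group & period])

# Change over time for the treated unit and for the never-treated unit.
change_treated <- cell_mean(ever_treated, post) - cell_mean(ever_treated, !post)
change_control <- cell_mean(!ever_treated, post) - cell_mean(!ever_treated, !post)

# Difference-in-differences: (4.5 - 3) - (2.5 - 2)
did_estimate <- change_treated - change_control
did_estimate
#> [1] 1
```

This is only a descriptive calculation on eight rows; it says nothing about how causaltbl or downstream packages estimate effects.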

We can first make a causal_tbl by passing df to causal_tbl(). We don’t need to specify any options.

library(causaltbl)
did <- causal_tbl(df)

Now did is a causal_tbl version of df.

did
#> # A <causal_tbl> [8 × 4]
#>                          
#>   id     year   trt     y
#>   <chr> <int> <dbl> <dbl>
#> 1 a      2015     0     1
#> 2 a      2016     0     3
#> 3 a      2017     0     2
#> 4 a      2018     0     3
#> 5 b      2015     0     2
#> 6 b      2016     0     4
#> 7 b      2017     1     4
#> 8 b      2018     1     5
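Since a causal_tbl is described as a subclass of tibble, its class membership can be checked with base R. The sketch below assumes the S3 class is named "causal_tbl", as the printed header suggests, and uses only `inherits()` rather than any package-specific predicate.

```r
library(causaltbl)

df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)
did <- causal_tbl(df)

# Assumed class name "causal_tbl"; "tbl_df" is the tibble class.
inherits(did, "causal_tbl")
inherits(did, "tbl_df")
inherits(did, "data.frame")

# The conversion should leave the rows and columns of df unchanged.
identical(dim(did), dim(df))
```

If these checks hold, functions written for tibbles or data frames should generally accept a causal_tbl as well.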

To set the outcome, we can use the corresponding function, set_outcome(). causal_tbl uses tidy evaluation, so we can use the bare column name.

did <- did |>
    set_outcome(outcome = y)
did
#> # A <causal_tbl> [8 × 4]
#>                     [out]
#>   id     year   trt     y
#>   <chr> <int> <dbl> <dbl>
#> 1 a      2015     0     1
#> 2 a      2016     0     3
#> 3 a      2017     0     2
#> 4 a      2018     0     3
#> 5 b      2015     0     2
#> 6 b      2016     0     4
#> 7 b      2017     1     4
#> 8 b      2018     1     5

Similarly, we can indicate that did has a treatment column, trt, and a panel structure with one row per id-year, using the corresponding set_treatment() and set_panel() functions.

did <- did |>
    set_treatment(treatment = trt) |>
    set_panel(unit = id, time = year)
did
#> # A <causal_tbl> [8 × 4]
#>   [unit] [time] [trt] [out]
#>   id       year   trt     y
#>   <chr>   <int> <dbl> <dbl>
#> 1 a        2015     0     1
#> 2 a        2016     0     3
#> 3 a        2017     0     2
#> 4 a        2018     0     3
#> 5 b        2015     0     2
#> 6 b        2016     0     4
#> 7 b        2017     1     4
#> 8 b        2018     1     5

These calls set attributes that downstream packages can use. We can retrieve them with the matching getters. For the outcome, use get_outcome():

get_outcome(did)
#> [1] "y"

For the treatment, get_treatment():

get_treatment(did)
#>     y 
#> "trt"

And for the panel structure, get_panel():

get_panel(did)
#> $unit
#> [1] "id"
#> 
#> $time
#> [1] "year"
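As a rough illustration of how a downstream function might use these getters, the sketch below looks up the outcome and treatment columns from the stored roles instead of taking column names as arguments. The function `naive_diff()` is hypothetical and not part of causaltbl; it computes a raw difference in mean outcomes between treated and untreated rows, which is a descriptive summary rather than a causal estimate.

```r
library(causaltbl)

df <- data.frame(
  id = c("a", "a", "a", "a", "b", "b", "b", "b"),
  year = rep(2015:2018, 2),
  trt = c(0, 0, 0, 0, 0, 0, 1, 1),
  y = c(1, 3, 2, 3, 2, 4, 4, 5)
)

did <- causal_tbl(df) |>
    set_outcome(outcome = y) |>
    set_treatment(treatment = trt)

# Hypothetical helper (not part of causaltbl): raw difference in mean
# outcomes between treated and untreated rows, using the stored roles.
naive_diff <- function(data) {
    # get_treatment() returns a named vector, so drop the names before indexing.
    out_col <- unname(get_outcome(data))
    trt_col <- unname(get_treatment(data))
    if (length(out_col) != 1 || length(trt_col) != 1) {
        stop("`data` needs exactly one outcome and one treatment column.")
    }
    y <- data[[out_col]]
    trt <- data[[trt_col]]
    mean(y[trt == 1]) - mean(y[trt == 0])
}

# By hand: mean(4, 5) - mean(1, 3, 2, 3, 2, 4) = 4.5 - 2.5
naive_diff(did)
```

Reading the roles from the data keeps the caller's code free of repeated column-name arguments, which appears to be the convenience the getters are meant to provide.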

For more information on using causal_tbls or designing functions that use causal_tbls, see the Advanced causal_tbl vignette.
