Build a causal data frame

A causal_tbl is a tibble with additional attribute information stored in causal_cols. See the 'Internal structure' section below for details on how this attribute is organized.

Usage

causal_tbl(..., .outcome = NULL, .treatment = NULL)

new_causal_tbl(..., .outcome = NULL, .treatment = NULL)

as_causal_tbl(x)

is_causal_tbl(x)

Arguments

...: passed on to vctrs::df_list() (causal_tbl() only), then to vctrs::new_data_frame() (both constructors).
.outcome: the column containing the outcome variable (tidy-selected). Can be set later with set_outcome().
.treatment: the column containing the treatment variable (tidy-selected). Can be set later with set_treatment().
x: A data frame to be checked or coerced

Value

A causal_tbl object, except for is_causal_tbl(), which returns TRUE or FALSE.

Details

At its core, a causal_tbl is just a tibble, and it should behave like a tibble in every meaningful way. What sets a causal_tbl apart is that it keeps track of causal columns: variables or objects which play a particular causal role. These can be accessed with the various getter and setter functions included in this package, like get_outcome() and set_outcome().

Functions

new_causal_tbl(): Construct a causal_tbl with no checks
as_causal_tbl(): Coerce a data frame to a causal_tbl
is_causal_tbl(): Return TRUE if a data frame is a causal_tbl

Internal structure

The causal_cols attribute is considered mostly internal, and end users do not need to worry about its structure. However, for those developing packages built on causal_tbl, it is useful to understand how causal_cols is organized.

The causal_cols attribute is a named list, with each element corresponding to a type of causal variable or object: outcomes, treatments, panel_unit, but potentially also pscore, matches, model, etc. Each element is a character vector of column names from the data frame. Some roles should hold exactly one column, while others may hold several. For example, a package author developing methods for causal inference with multiple continuous treatments could store one entry per treatment column in the treatment element of causal_cols.
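To make this shape concrete, here is a base-R mock of a causal_cols attribute attached by hand to a plain data frame. It is only an illustration: it bypasses causal_tbl() entirely, and the element names outcome and treatment, as well as the extra treatment column dose, are assumptions rather than the exact names the package is guaranteed to use.

```r
# Illustrative mock only: causal_tbl() normally manages this attribute.
# The element names (outcome, treatment) and the column `dose` are assumptions.
df <- data.frame(
  milk_first = c(0, 1, 0, 1),
  dose       = c(0.5, 1.0, 0.0, 2.0),  # hypothetical second treatment
  guess      = c(0, 1, 0, 1)
)

causal_cols <- list(
  outcome   = "guess",                 # a role holding a single column
  treatment = c("milk_first", "dose")  # a role holding several columns
)
attr(df, "causal_cols") <- causal_cols

# Every stored entry should name an existing column of the data frame
all(unlist(attr(df, "causal_cols")) %in% names(df))
#> [1] TRUE

# Number of columns recorded for each role
lengths(attr(df, "causal_cols"))
```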

The optional names() of the entries within an element of causal_cols record any associated variable. For example, by default a treatment variable is associated with a particular outcome, and a propensity score or outcome model is associated with a particular treatment or outcome variable.
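A minimal sketch of how such associations could be encoded and read back with plain names(), again using a hand-built list. The element names (outcome, treatment, pscore) and the column name .pscore are illustrative assumptions, not a description of the package's exact internals.

```r
# Illustrative mock only: element names and column names are assumptions.
causal_cols <- list(
  outcome   = "guess",
  # the name records which outcome this treatment is associated with
  treatment = c(guess = "milk_first"),
  # the name records which treatment this propensity score was fitted for
  pscore    = c(milk_first = ".pscore")
)

# Look up the outcome associated with the treatment column "milk_first"
trt <- causal_cols$treatment
assoc_outcome <- names(trt)[trt == "milk_first"]
assoc_outcome
#> [1] "guess"

# Look up the propensity score column tied to that treatment
causal_cols$pscore[["milk_first"]]
#> [1] ".pscore"
```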

The column names stored anywhere in causal_cols are updated when columns are renamed and set to NULL when columns are dropped. This happens silently in all cases, so it is the responsibility of implementers of particular methods to check that a causal_tbl has the columns they need, using helpers like has_treatment(), has_outcome(), etc.
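The bookkeeping described above can be sketched on a bare list. The helper update_causal_cols() below is hypothetical and not part of the causal package; it simply mimics the documented behaviour: renamed columns are updated everywhere (including association names), and roles whose columns have all been dropped are removed, as when a list element is set to NULL.

```r
# Hypothetical helper, NOT part of the causal package.
# `renames` maps old names to new names, e.g. c(guess = "response").
# `kept` lists the columns that still exist, using the NEW names.
update_causal_cols <- function(causal_cols, renames = character(), kept) {
  # Replace entries found in the renaming map; names(x) are left untouched
  rename_vec <- function(x) {
    hit <- x %in% names(renames)
    x[hit] <- unname(renames[x[hit]])
    x
  }
  out <- lapply(causal_cols, function(cols) {
    cols <- rename_vec(cols)
    # Association names refer to columns too, so rename them as well
    if (!is.null(names(cols))) names(cols) <- rename_vec(names(cols))
    # Keep only columns that still exist in the data frame
    cols <- cols[cols %in% kept]
    if (length(cols) == 0) NULL else cols
  })
  # Remove roles left empty, like assigning NULL to a list element
  Filter(Negate(is.null), out)
}

cc <- list(outcome = "guess", treatment = c(guess = "milk_first"))

# Rename guess -> response: the value and the association name both follow
renamed <- update_causal_cols(cc, renames = c(guess = "response"),
                              kept = c("milk_first", "response"))
str(renamed)

# Drop the treatment column: its role disappears from the list
dropped <- update_causal_cols(cc, kept = "guess")
names(dropped)
#> [1] "outcome"
```

Silently pruning empty roles is exactly why methods should call checks like has_treatment() before relying on a column.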

Examples

data <- causal_tbl(
  milk_first = c(0, 1, 0, 1, 1, 0, 0, 1),
  guess = c(0, 1, 0, 1, 1, 0, 0, 1)
)
is_causal_tbl(data)
#> [1] TRUE
print(data)
#> # A <causal_tbl> [8 × 2]
#>                   
#>   milk_first guess
#>        <dbl> <dbl>
#> 1          0     0
#> 2          1     1
#> 3          0     0
#> 4          1     1
#> 5          1     1
#> 6          0     0
#> 7          0     0
#> 8          1     1