Fits one of two possible Bayesian Instrumental Regression for Disparity Estimation (BIRDiE) models to BISG probabilities and covariates. The simpler Multinomial-Dirichlet model ("dir") is appropriate when there are no covariates or when all covariates are discrete and fully interacted with one another. The more general Multinomial mixed-effects model ("mmm") supports any number of fixed effects and up to one random intercept.
Usage
birdie(
  r_probs,
  formula,
  data = NULL,
  model = c("auto", "dir", "mmm"),
  prior = NULL,
  prefix = "pr_",
  se_boot = 0,
  ctrl = birdie.ctrl()
)
Arguments
- r_probs
A data frame or matrix of BISG probabilities, with one row per individual. The output of bisg() can be used directly here.
- formula
A two-sided formula object describing the model structure. The left-hand side is the outcome variable, which must be discrete. A single random intercept term, denoted with a vertical bar ("(1 | <term>)"), is supported on the right-hand side.
- data
An optional data frame containing the variables named in formula.
- model
A string specifying the type of model to fit: either "dir" for the Multinomial-Dirichlet model or "mmm" for the Multinomial mixed-effects model. The default, "auto", will select the most computationally efficient model available: "dir" if formula has no covariates or a fully interacted structure, and "mmm" otherwise. More details on the model specifications can be found in the "Details" section below.
- prior
A list with entries specifying the model prior. When model="dir", the only entry is alpha, which should be a matrix of Dirichlet hyperparameters. The matrix should have one row for every level of the outcome variable and one column for every racial group. The default prior is a matrix with all entries set to \(1+\epsilon\). When model="mmm", the prior list should contain two scalar entries: scale_beta, the standard deviation of the Normal prior on the fixed effects, and scale_sigma, the prior mean of the standard deviation of the random intercepts.
- prefix
If r_probs is a data frame, the columns containing racial probabilities will be selected as those with names starting with prefix. The default will work with the output of bisg().
- se_boot
The number of bootstrap replicates to use to compute approximate standard errors for the main model estimates. Only available when model="dir". When there are fewer than 1,000 individuals or 100 or fewer replicates, a Bayesian bootstrap is used instead (i.e., weights are drawn from a \(\text{Dirichlet}(1, 1, ..., 1)\) distribution), which produces more reliable estimates.
- ctrl
A list containing control parameters for the EM algorithm and optimization routines. A list in the proper format can be made using birdie.ctrl().
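To make the shape of the prior argument concrete, here is a minimal base-R sketch of how the two kinds of prior list might be built. The outcome levels ("no", "yes"), the six race names, the value 2 for every alpha entry, and the use of dimnames are all illustrative assumptions; the documentation above only fixes the dimensions of alpha and the names of the list entries.

```r
# Hypothetical setup: a binary outcome and the six racial groups that appear
# in the example output below. Adjust both to match your own data.
outcome_levels <- c("no", "yes")
races <- c("white", "black", "hisp", "asian", "aian", "other")

# model = "dir": alpha has one row per outcome level and one column per race.
# Every entry is set to 2 here purely for illustration (a mild pull toward
# equal outcome probabilities); dimnames are added only for readability.
alpha <- matrix(2, nrow = length(outcome_levels), ncol = length(races),
                dimnames = list(outcome_levels, races))
prior_dir <- list(alpha = alpha)

# model = "mmm": two scalar entries. These values match the defaults reported
# in the example output (1.0 and 0.20).
prior_mmm <- list(scale_beta = 1, scale_sigma = 0.2)

# These lists would then be passed through the prior argument, e.g.
# birdie(r_probs, turnout ~ 1, data = pseudo_vf, model = "dir", prior = prior_dir)
# birdie(r_probs, turnout ~ (1 | zip), data = pseudo_vf, model = "mmm", prior = prior_mmm)
```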
Value
An object of class birdie, for which many methods are available. The model estimates may be accessed with coef.birdie(), and updated BISG probabilities (conditioning on the outcome) may be accessed with fitted.birdie().
Details
birdie() uses an expectation-maximization (EM) routine to find the maximum a posteriori (MAP) estimate for the specified model. Asymptotic variance-covariance matrices for the MAP estimate are available for the Multinomial-Dirichlet model via bootstrapping (se_boot).
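The Bayesian bootstrap mentioned under se_boot replaces resampling with random observation weights drawn from a flat Dirichlet distribution. The base-R sketch below illustrates that weighting scheme on a simple weighted mean; it is not the package's internal code, and the helper names (rdirichlet_flat, bayes_boot_se) and the simulated outcome are hypothetical.

```r
set.seed(42)

# One draw of Bayesian bootstrap weights: a Dirichlet(1, ..., 1) vector,
# obtained by normalizing independent Exponential(1) draws.
rdirichlet_flat <- function(n) {
  g <- rexp(n, rate = 1)
  g / sum(g)
}

# Bayesian bootstrap standard error of a weighted statistic.
# `stat(x, w)` must accept the data and a weight vector that sums to 1.
bayes_boot_se <- function(x, stat, reps = 200) {
  draws <- replicate(reps, stat(x, rdirichlet_flat(length(x))))
  sd(draws)
}

# Hypothetical data: 500 binary outcomes (e.g., turnout indicators).
y <- rbinom(500, size = 1, prob = 0.6)
weighted_mean <- function(x, w) sum(w * x)

se <- bayes_boot_se(y, weighted_mean, reps = 200)
se  # should be close to sqrt(mean(y) * (1 - mean(y)) / 500)
```

Because every observation keeps a strictly positive weight, no replicate can drop an individual entirely, which is one reason this scheme tends to behave better than the ordinary bootstrap in small samples.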
The Multinomial-Dirichlet model is specified as follows:
$$
Y_i \mid R_i, X_i, \Theta \sim \text{Categorical}(\theta_{R_iX_i}) \\
\theta_{rx} \sim \text{Dirichlet}(\alpha_r),
$$
where \(Y\) is the outcome variable, \(R\) is race, \(X\) denotes the covariates (fixed effects), and \(\theta_{rx}\) and \(\alpha_r\) are vectors with length matching the number of levels of the outcome variable (\(\alpha_r\) is the column of the prior matrix alpha for race \(r\)). There is one vector \(\theta_{rx}\) for every combination of race and covariates, hence the need for formula to have either no covariates or a fully interacted structure.
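For the case with no covariates, an EM routine for this model can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation. It assumes that the outcome is conditionally independent of the BISG inputs (names and geography) given race, so that the E-step simply reweights each row of BISG probabilities by \(\theta_{r y_i}\), and that every entry of alpha is at least 1 so the Dirichlet posterior mode is well defined. The function name em_dir_map and the simulated data are hypothetical.

```r
# Minimal EM sketch for the no-covariate Multinomial-Dirichlet model.
#   p:     n x R matrix of BISG probabilities (each row sums to 1)
#   y:     integer vector of outcome levels, coded 1..K
#   alpha: K x R matrix of Dirichlet hyperparameters (rows = outcome levels,
#          columns = races); all entries assumed >= 1
em_dir_map <- function(p, y, alpha, iters = 1000, tol = 1e-8) {
  K <- nrow(alpha)
  Y <- diag(K)[y, , drop = FALSE]        # n x K one-hot outcome matrix
  theta <- matrix(1 / K, K, ncol(p))     # start from a uniform Pr(Y | R)
  for (it in seq_len(iters)) {
    # E-step: Pr(R_i = r | Y_i, BISG) is proportional to p[i, r] * theta[y_i, r]
    q <- p * theta[y, , drop = FALSE]
    q <- q / rowSums(q)
    # M-step: posterior mode of each column theta[, r] under Dirichlet(alpha[, r])
    num <- crossprod(Y, q) + alpha - 1   # expected counts plus prior pseudo-counts
    theta_new <- sweep(num, 2, colSums(num), "/")
    converged <- max(abs(theta_new - theta)) < tol
    theta <- theta_new
    if (converged) break
  }
  theta
}

# Simulated check with two races and a binary outcome (1 = "no", 2 = "yes").
set.seed(1)
n <- 20000
true_theta <- matrix(c(0.3, 0.7,            # race 1: Pr(no), Pr(yes)
                       0.6, 0.4), nrow = 2) # race 2
p1 <- rbeta(n, 0.5, 0.5)                 # calibrated "BISG" probability of race 1
race <- ifelse(runif(n) < p1, 1L, 2L)    # true race drawn from those probabilities
y <- ifelse(runif(n) < true_theta[1, race], 1L, 2L)
p <- cbind(p1, 1 - p1)

est <- em_dir_map(p, y, alpha = matrix(1.01, 2, 2))
round(est, 2)  # should be close to true_theta
```

With discrete, fully interacted covariates, the same routine could be run separately within each covariate cell, which is one way to see why the "dir" model requires that structure.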
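The Multinomial mixed-effects model is specified as follows:
$$
Y_i \mid R_i, X_i, \Theta \sim \text{Categorical}(g^{-1}(\mu_{R_iX_i})) \\
\mu_{rxy} = W\beta_{ry} + Zu_{ry} \\
u_{ry} \mid \sigma^2_{ry} \sim \mathcal{N}(0, \sigma^2_{ry}) \\
\beta_{ry} \sim \mathcal{N}(0, s_\beta) \\
\sigma_{ry} \sim \text{Gamma}(2, 2/s_\sigma),
$$
where \(\beta_{ry}\) are the fixed effects, \(u_{ry}\) is the random intercept, \(W\) and \(Z\) are the corresponding design matrices, \(s_\beta\) and \(s_\sigma\) are the scale_beta and scale_sigma entries of prior, and \(g\) is the softmax (multinomial logit) link function, so that \(g^{-1}\) maps the vector \(\mu_{R_iX_i}\), with one entry per outcome level, to Categorical probabilities.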
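The inverse link \(g^{-1}\) turns a vector of linear predictors (one per outcome level) into probabilities. The base-R sketch below shows a numerically stable softmax applied to a fixed-effect part plus a random intercept for one individual. The values are hypothetical, and fixing the "no" entry at 0 is only an illustrative reference-level convention, not necessarily what birdie does internally.

```r
# Numerically stable softmax: maps a vector of linear predictors
# (one per outcome level) to Categorical probabilities.
softmax <- function(mu) {
  z <- exp(mu - max(mu))   # subtracting the max avoids overflow in exp()
  z / sum(z)
}

# Hypothetical linear predictors for one race r and one individual:
# a fixed-effect contribution plus that individual's group random intercept.
fixed  <- c(no = 0, yes = 0.5)
random <- c(no = 0, yes = log(3) - 0.5)

probs <- softmax(fixed + random)
probs  # approximately c(no = 0.25, yes = 0.75)
```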
More details on the models and their properties may be found in the paper referenced below.
References
McCartan, C., Goldin, J., Ho, D.E., & Imai, K. (2022). Estimating Racial Disparities when Race is Not Observed. Available at https://arxiv.org/abs/2303.02580.
Examples
data(pseudo_vf)
r_probs = bisg(~ nm(last_name) + zip(zip), data=pseudo_vf)
# Process zip codes to remove missing values
pseudo_vf$zip = proc_zip(pseudo_vf$zip)
birdie(r_probs, turnout ~ 1, data=pseudo_vf)
#> Using c(1+ε, 1+ε, ..., 1+ε) prior for Pr(X | R)
#> This message is displayed once per session.
#> Multinomial-Dirichlet BIRDiE model
#> Formula: turnout ~ 1
#> Data: pseudo_vf
#> Number of obs: 5,000; groups: 1
#> Estimated distribution:
#> white black hisp asian aian other
#> no 0.301 0.358 0.392 0.613 0.778 0.254
#> yes 0.699 0.642 0.608 0.387 0.222 0.746
birdie(r_probs, turnout ~ zip, data=pseudo_vf)
#> Multinomial-Dirichlet BIRDiE model
#> Formula: turnout ~ zip
#> Data: pseudo_vf
#> Number of obs: 5,000; groups: 618
#> Estimated distribution:
#> white black hisp asian aian other
#> no 0.286 0.358 0.376 0.55 0.644 0.534
#> yes 0.714 0.642 0.624 0.45 0.356 0.466
fit = birdie(r_probs, turnout ~ (1 | zip), data=pseudo_vf,
             ctrl=birdie.ctrl(abstol=1e-3))
#> Using default prior for Pr(X | R):
#> → Prior scale on fixed effects coefficients: 1.0
#> → Prior mean of random effects standard deviation: 0.20
#> This message is displayed once per session.
summary(fit)
#> Multinomial mixed-effects BIRDiE model
#> Formula: turnout ~ (1 | zip)
#> Data: pseudo_vf
#>
#> 6 iterations and 0.29 secs to convergence
#>
#> Number of observations: 5,000
#> Number of groups: 618
#>
#> Entropy decrease from marginal race distribution:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.4887 0.1460 0.3991 0.4117 0.7144 1.0568
#> Entropy decrease from BISG probabilities:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.68767 -0.01394 0.03638 0.04472 0.09864 0.58286
#>
#> Estimated outcome-by-race distribution:
#> white black hisp asian aian other
#> no 0.301 0.364 0.371 0.603 0.701 0.266
#> yes 0.699 0.636 0.629 0.397 0.299 0.734
coef(fit)
#> white black hisp asian aian other
#> no 0.3014152 0.3639849 0.3714893 0.603295 0.7005173 0.265966
#> yes 0.6985848 0.6360151 0.6285107 0.396705 0.2994827 0.734034
fitted(fit)
#> # A tibble: 5,000 × 6
#> pr_white pr_black pr_hisp pr_asian pr_aian pr_other
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.962 0.00332 0.0103 0.000372 0.00399 0.0202
#> 2 0.0156 0.962 0.00685 0.00100 0.000964 0.0132
#> 3 0.941 0.00571 0.0173 0.00804 0.000239 0.0278
#> 4 0.579 0.364 0.0201 0.000657 0.000510 0.0355
#> 5 0.967 0.00152 0.0129 0.00328 0.00292 0.0126
#> 6 0.560 0.295 0.0909 0.00270 0.000940 0.0500
#> 7 0.134 0.749 0.0637 0.00244 0.000730 0.0495
#> 8 0.965 0.00418 0.0311 0 0 0
#> 9 0.756 0.212 0.00696 0.000334 0.000321 0.0243
#> 10 0.886 0.0864 0.0103 0.000410 0.000550 0.0162
#> # … with 4,990 more rows