Fits one of two possible Bayesian Instrumental Regression for Disparity Estimation (BIRDiE) models to BISG probabilities and covariates. The simpler Multinomial-Dirichlet model (dir) is appropriate when there are no covariates or when all covariates are discrete and fully interacted with one another. The more general Multinomial mixed-effects model (mmm) supports any number of fixed effects and up to one random intercept.

## Usage

birdie(
  r_probs,
  formula,
  data = NULL,
  model = c("auto", "dir", "mmm"),
  prior = NULL,
  prefix = "pr_",
  se_boot = 0,
  ctrl = birdie.ctrl()
)

## Arguments

r_probs

A data frame or matrix of BISG probabilities, with one row per individual. The output of bisg() can be used directly here.

formula

A two-sided formula object describing the model structure. The left-hand side is the outcome variable, which must be discrete. A single random intercept term, denoted with a vertical bar ("(1 | <term>)"), is supported on the right-hand side.

data

An optional data frame containing the variables named in formula.

model

A string specifying the type of model to fit: either "dir" for the Multinomial-Dirichlet model or "mmm" for the Multinomial mixed-effects model. The default, "auto", will select the most computationally efficient model available: "dir" if formula has no covariates or a fully-interacted structure, and "mmm" otherwise. More details on the model specifications can be found in the "Details" section below.

prior

A list with entries specifying the model prior. When model="dir" the only entry is alpha, which should be a matrix of Dirichlet hyperparameters. The matrix should have one row for every level of the outcome variable and one column for every racial group. The default prior is a matrix with all entries set to $$1+\epsilon$$. When model="mmm", the prior list should contain two scalar entries: scale_beta, the standard deviation on the Normal prior for the fixed effects, and scale_sigma, the prior mean of the standard deviation of the random intercepts.
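
As a rough illustration of the two prior formats described above, the following base-R sketch builds a list for each model. The outcome levels and racial groups mirror the examples on this page; the value of `eps` is an assumption made only for illustration, since the exact $$\epsilon$$ used by the default prior is not stated here.

```r
# Outcome levels and racial groups, as in the examples on this page
outcome_levels <- c("no", "yes")
race_groups <- c("white", "black", "hisp", "asian", "aian", "other")

eps <- 1e-4  # ASSUMPTION: illustrative value only

# model = "dir": one row per outcome level, one column per racial group
alpha <- matrix(1 + eps,
                nrow = length(outcome_levels),
                ncol = length(race_groups),
                dimnames = list(outcome_levels, race_groups))
prior_dir <- list(alpha = alpha)

# model = "mmm": two scalar entries
prior_mmm <- list(scale_beta = 1.0, scale_sigma = 0.2)

# These lists would then be supplied through the `prior` argument, e.g.
# birdie(r_probs, turnout ~ 1, data = pseudo_vf, model = "dir", prior = prior_dir)
```

The scalar values chosen for `prior_mmm` match the defaults reported in the example output further down this page.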

prefix

If r_probs is a data frame, the columns containing racial probabilities will be selected as those with names starting with prefix. The default will work with the output of bisg().
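
The prefix-based selection can be pictured with a small base-R sketch. The data frame below is a hypothetical stand-in for bisg() output, and the code only illustrates the idea; it is not the package's internal implementation.

```r
# Hypothetical stand-in for the output of bisg(), plus an extra id column
r_probs <- data.frame(
  id = 1:3,
  pr_white = c(0.7, 0.2, 0.5),
  pr_black = c(0.2, 0.7, 0.3),
  pr_other = c(0.1, 0.1, 0.2)
)

prefix <- "pr_"

# Keep only the columns whose names start with the prefix
is_prob <- startsWith(names(r_probs), prefix)
p_mat <- as.matrix(r_probs[, is_prob])

colnames(p_mat)
```

Here the `id` column is ignored and only the three `pr_` columns are treated as racial probabilities.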

se_boot

The number of bootstrap replicates to use to compute approximate standard errors for the main model estimates. Only available when model="dir". When there are fewer than 1,000 individuals or 100 or fewer replicates, a Bayesian bootstrap is used instead (i.e., weights are drawn from a $$\text{Dirichlet}(1, 1, ..., 1)$$ distribution), which produces more reliable estimates.
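
To make the Bayesian bootstrap concrete, here is a minimal base-R sketch on simulated data. It uses the standard fact that normalized i.i.d. Exp(1) draws follow a Dirichlet(1, ..., 1) distribution. The data, the variable names, and the weighted-mean estimator are all hypothetical; the package applies the weights to the BIRDiE model fit rather than to a simple mean.

```r
set.seed(42)

n <- 200
x <- rbinom(n, 1, 0.6)  # hypothetical binary outcome
n_boot <- 100           # number of bootstrap replicates

boot_est <- replicate(n_boot, {
  w <- rexp(n)          # i.i.d. Exp(1) draws ...
  w <- w / sum(w)       # ... normalized to give Dirichlet(1, ..., 1) weights
  sum(w * x)            # weighted estimate for this replicate
})

# Approximate standard error: spread of the estimate across replicates
se_hat <- sd(boot_est)
```

Unlike the ordinary bootstrap, every observation receives a strictly positive weight in every replicate, so no replicate drops a small group entirely.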

ctrl

A list containing control parameters for the EM algorithm and optimization routines. A list in the proper format can be made using birdie.ctrl().

## Value

An object of class birdie, for which many methods are available. The model estimates may be accessed with coef.birdie(), and updated BISG probabilities (conditioning on the outcome) may be accessed with fitted.birdie().

## Details

birdie() uses an expectation-maximization (EM) routine to find the maximum a posteriori (MAP) estimate for the specified model. Asymptotic variance-covariance matrices for the MAP estimate are available for the Multinomial-Dirichlet model via bootstrapping (se_boot).

The Multinomial-Dirichlet model is specified as follows: $$Y_i \mid R_i, X_i, \Theta \sim \text{Categorical}(\theta_{R_iX_i}) \\ \theta_{rx} \sim \text{Dirichlet}(\alpha_r),$$ where $$Y$$ is the outcome variable, $$R$$ is race, $$X$$ are covariates (fixed effects), and $$\theta_{rx}$$ and $$\alpha_r$$ are vectors with length matching the number of levels of the outcome variable. There is one vector $$\theta_{rx}$$ for every combination of race and covariates, hence the need for formula to either have no covariates or a fully interacted structure.
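
For intuition, the sketch below runs a simple EM loop for the no-covariate Multinomial-Dirichlet model on simulated data. It is an illustrative assumption about how such a routine can be written, not the package's implementation: the E-step reweights each individual's prior race probabilities by the current outcome probabilities, and the M-step takes the Dirichlet-multinomial MAP from the resulting expected counts. All variable names and simulated values are hypothetical.

```r
set.seed(1)
n <- 500; K <- 2; R <- 3  # individuals, outcome levels, racial groups

# Simulated prior race probabilities (standing in for BISG output)
p <- matrix(rexp(n * R), n, R)
p <- p / rowSums(p)

# Simulated outcome whose distribution depends on (unobserved) race
theta_true <- rbind(c(0.7, 0.5, 0.3), c(0.3, 0.5, 0.7))  # K x R
r_true <- apply(p, 1, function(pr) sample.int(R, 1, prob = pr))
y <- 1 + rbinom(n, 1, theta_true[2, r_true])

alpha <- matrix(1.01, K, R)   # Dirichlet hyperparameters (outcome x race)
theta <- matrix(1 / K, K, R)  # start: each column is Pr(Y | R = r)

for (iter in 1:500) {
  # E-step: Pr(R_i = r | Y_i) is proportional to p[i, r] * theta[y_i, r]
  post <- p * theta[y, , drop = FALSE]
  post <- post / rowSums(post)
  # M-step: MAP under the Dirichlet prior, from expected counts
  theta_new <- rowsum(post, y) + alpha - 1
  theta_new <- sweep(theta_new, 2, colSums(theta_new), "/")
  converged <- max(abs(theta_new - theta)) < 1e-8
  theta <- theta_new
  if (converged) break
}
```

After the loop, each column of `theta` is an estimated outcome distribution for one racial group, and `post` holds race probabilities updated to condition on the outcome.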

The Multinomial mixed-effects model is specified as follows: $$Y_i \mid R_i, X_i, \Theta \sim \text{Categorical}(g^{-1}(\mu_{R_iX_i})) \\ \mu_{rxy} = W\beta_{ry} + Zu_{ry} \\ u_{ry} \mid \sigma^2_{ry} \sim \mathcal{N}(0, \sigma^2_{ry}) \\ \beta_{ry} \sim \mathcal{N}(0, s^2_\beta) \\ \sigma_{ry} \sim \text{Gamma}(2, 2/s_\sigma),$$ where $$\beta_{ry}$$ are the fixed effects, $$u_{ry}$$ is the random intercept, $$W$$ and $$Z$$ are the corresponding fixed-effects and random-intercept design matrices, $$s_\beta$$ and $$s_\sigma$$ are the prior scales set by scale_beta and scale_sigma, and $$g^{-1}$$ is the softmax function (the inverse of the link $$g$$).
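
Two pieces of this specification can be checked with a few lines of base R. The sketch below defines a softmax function that maps a vector of linear predictors to category probabilities, and confirms that a Gamma distribution with shape 2 and rate $$2/s_\sigma$$ has mean $$s_\sigma$$. The shape-rate reading of the Gamma prior is an assumption, chosen because it agrees with the description of scale_sigma as a prior mean; the linear predictor values are hypothetical.

```r
# Softmax: maps linear predictors to probabilities that sum to one
softmax <- function(mu) {
  z <- exp(mu - max(mu))  # subtract the max for numerical stability
  z / sum(z)
}

mu <- c(no = 0, yes = 0.85)  # hypothetical linear predictor
probs <- softmax(mu)

# ASSUMPTION: Gamma(shape, rate), so the mean is shape / rate
s_sigma <- 0.2
prior_mean <- 2 / (2 / s_sigma)  # equals s_sigma
```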

More details on the models and their properties may be found in the paper referenced below.

## References

McCartan, C., Goldin, J., Ho, D.E., & Imai, K. (2023). Estimating Racial Disparities when Race is Not Observed. Available at https://arxiv.org/abs/2303.02580.

## Examples

data(pseudo_vf)

r_probs = bisg(~ nm(last_name) + zip(zip), data=pseudo_vf)

# Process zip codes to remove missing values
pseudo_vf$zip = proc_zip(pseudo_vf$zip)

birdie(r_probs, turnout ~ 1, data=pseudo_vf)
#> Using c(1+ε, 1+ε, ..., 1+ε) prior for Pr(X | R)
#> This message is displayed once per session.
#> Multinomial-Dirichlet BIRDiE model
#> Formula: turnout ~ 1
#>    Data: pseudo_vf
#> Number of obs: 5,000; groups: 1
#> Estimated distribution:
#>     white black  hisp asian  aian other
#> no  0.301 0.358 0.392 0.613 0.778 0.254
#> yes 0.699 0.642 0.608 0.387 0.222 0.746

birdie(r_probs, turnout ~ zip, data=pseudo_vf)
#> Multinomial-Dirichlet BIRDiE model
#> Formula: turnout ~ zip
#>    Data: pseudo_vf
#> Number of obs: 5,000; groups: 618
#> Estimated distribution:
#>     white black  hisp asian  aian other
#> no  0.286 0.358 0.376  0.55 0.644 0.534
#> yes 0.714 0.642 0.624  0.45 0.356 0.466

fit = birdie(r_probs, turnout ~ (1 | zip), data=pseudo_vf,
             ctrl=birdie.ctrl(abstol=1e-3))
#> Using default prior for Pr(X | R):
#> → Prior scale on fixed effects coefficients: 1.0
#> → Prior mean of random effects standard deviation: 0.20
#> This message is displayed once per session.

summary(fit)
#> Multinomial mixed-effects BIRDiE model
#> Formula: turnout ~ (1 | zip)
#>    Data: pseudo_vf
#>
#> 6 iterations and 0.29 secs to convergence
#>
#> Number of observations: 5,000
#> Number of groups: 618
#>
#> Entropy decrease from marginal race distribution:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#> -0.4887  0.1460  0.3991  0.4117  0.7144  1.0568
#> Entropy decrease from BISG probabilities:
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#> -0.68767 -0.01394  0.03638  0.04472  0.09864  0.58286
#>
#> Estimated outcome-by-race distribution:
#>     white black  hisp asian  aian other
#> no  0.301 0.364 0.371 0.603 0.701 0.266
#> yes 0.699 0.636 0.629 0.397 0.299 0.734
coef(fit)
#>         white     black      hisp    asian      aian    other
#> no  0.3014152 0.3639849 0.3714893 0.603295 0.7005173 0.265966
#> yes 0.6985848 0.6360151 0.6285107 0.396705 0.2994827 0.734034
fitted(fit)
#> # A tibble: 5,000 × 6
#>    pr_white pr_black pr_hisp pr_asian  pr_aian pr_other
#>       <dbl>    <dbl>   <dbl>    <dbl>    <dbl>    <dbl>
#>  1   0.962   0.00332 0.0103  0.000372 0.00399    0.0202
#>  2   0.0156  0.962   0.00685 0.00100  0.000964   0.0132
#>  3   0.941   0.00571 0.0173  0.00804  0.000239   0.0278
#>  4   0.579   0.364   0.0201  0.000657 0.000510   0.0355
#>  5   0.967   0.00152 0.0129  0.00328  0.00292    0.0126
#>  6   0.560   0.295   0.0909  0.00270  0.000940   0.0500
#>  7   0.134   0.749   0.0637  0.00244  0.000730   0.0495
#>  8   0.965   0.00418 0.0311  0        0          0
#>  9   0.756   0.212   0.00696 0.000334 0.000321   0.0243
#> 10   0.886   0.0864  0.0103  0.000410 0.000550   0.0162
#> # … with 4,990 more rows