Fits a penalized regression model for ecological inference, allowing for
overall and unit-level estimates of conditional means using ei_est()
.
Usage
ei_ridge(x, ...)
# S3 method for class 'formula'
ei_ridge(
formula,
data,
weights,
bounds = FALSE,
penalty = NULL,
scale = TRUE,
vcov = TRUE,
...
)
# S3 method for class 'ei_spec'
ei_ridge(
x,
weights,
bounds = FALSE,
penalty = NULL,
scale = TRUE,
vcov = TRUE,
...
)
# S3 method for class 'data.frame'
ei_ridge(
x,
y,
z,
weights,
bounds = FALSE,
penalty = NULL,
scale = TRUE,
vcov = TRUE,
...
)
# S3 method for class 'matrix'
ei_ridge(
x,
y,
z,
weights,
bounds = FALSE,
penalty = NULL,
scale = TRUE,
vcov = TRUE,
...
)
# Default S3 method
ei_ridge(x, ...)
Arguments
- x
Depending on the context:
A data frame of predictors.
A matrix of predictors.
An ei_spec object containing the outcome, predictor, and covariates.
Predictors must be proportions that sum to 1 across rows. You can use
ei_proportions()
to assist in preparing predictor variables. Covariates in an ei_spec object are shifted to have mean zero. Ifscale=TRUE
(the default), they are also scaled to have unit variance.- ...
Not currently used, but required for extensibility.
- formula
A formula such as
y ~ x0 + x1 | z
specifying the outcomey
regressed on the predictors of interestx
and any covariatesz
. The predictors should form a partition, that is,x0 + x1 = 1
for each observation. Users can be include more than two predictors as well, e.g.pct_white + pct_black + pct_hisp + pct_other
. If there are just two predictors, it is acceptable to only include one in the formula; the other will be formed as 1 minus the provided predictor. Include additional covariates separated by a vertical bar|
. These covariates are strongly recommended for reliable ecological inference. Covariates are shifted to have mean zero. Ifscale=TRUE
(the default), they are also scaled to have unit variance.- data
When a formula is used,
data
is a data frame containing both the predictors and the outcome.- weights
<
data-masking
> A vector of unit weights for estimation. These may be the same or different from the total number of observations in each aggregate unit (see thetotal
argument toei_spec()
). See the discussion below under 'Weights' for choosing this parameter. The default, uniform weights, makes a slightly stronger-than-necessary assumption about the relationship between the unit totals and the unknown data.- bounds
A vector
c(min, max)
of bounds for the outcome. Ifbounds = NULL
, they will be inferred from the outcome variable: if it is contained within \([0, 1]\), for instance, then the bounds will bec(0, 1)
. The defaultbounds = FALSE
uses an unbounded outcome.- penalty
The ridge penalty (a non-negative scalar). Set to
NULL
to automatically determine the penalty which minimizes mean-square error, via an efficient leave-one-out cross validation procedure. The ridge regression solution is $$\hat\beta = (X^\top X + \lambda I)^{-1}X^\top y,$$ where \(\lambda\) is the value ofpenalty
. One can equivalently think of the penalty as imposing a \(\mathcal{N}(0, \sigma^2/\lambda^2)\) prior on the \(\beta\). Keep in mind when choosingpenalty
manually that covariates inz
are scaled to have mean zero and unit variance before fitting.- scale
If
TRUE
, scale covariatesz
to have unit variance.- vcov
If
TRUE
, calculate and return the covariance matrix of the estimated coefficients. Ignored whenbounds
are provided.- y
When
x
is a data frame or matrix,y
is the outcome specified as:A data frame with numeric columns.
A matrix
A numeric vector.
When the outcome is a proportion, you can use
ei_proportions()
to assist in preparing it.- z
When
x
is a data frame or matrix,w
are any covariates, specified as:A data frame with numeric columns.
A matrix
These are shifted to have mean zero. If
scale=TRUE
(the default), they are also scaled to have unit variance.
Value
An ei_ridge
object, which supports various ridge-methods.
Details
The regression is calculated using the singular value decomposition, which
allows for efficient recalculation under different penalty
values as part
of leave-one-out cross-validation.
When bounds
are provided, the regression is calculated via quadratic
programming, as there is no closed-form solution. The unbounded regression
is run to select the penalty
automatically in this case, if it is not
provided. Estimation is still efficient, though somewhat slower than in the
unbounded case. The covariance matrix of the estimates is not available when
bounds are applied.
Weights
The weakest identification result for ecological inference makes no assumption about the number of observations per aggregate unit (the totals). It requires, however, weighting the estimation steps according to the totals. This may reduce efficiency when the totals are variable and a slightly stronger condition holds.
Specifically, if the totals are conditionally mean-independent of the missing data (the aggregation-unit level means of the outcome within each predictor level), given covariates, then it is appropriate to use uniform weights in estimation, or any fixed set of weights.
In general, estimation efficiency is improved when units with larger variance
in the outcome receive less weight. Various bulit-in options are provided by
the helper functions in ei_wgt()
.
Examples
data(elec_1968)
spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs,
total = pres_total, covariates = c(pop_urban, farm))
ei_ridge(spec)
#> An ecological inference model with 4 outcomes, 3 groups, and 1143 observations
#> Fit with penalty = 1.01453e-08
ei_ridge(pres_dem_hum + pres_rep_nix + pres_ind_wal + pres_abs ~
vap_white + vap_black + vap_other | pop_urban + farm, data = elec_1968)
#> An ecological inference model with 4 outcomes, 3 groups, and 1143 observations
#> Fit with penalty = 1.01453e-08
# bounds inferred
all.equal(
fitted(ei_ridge(spec, bounds = NULL)),
fitted(ei_ridge(spec, bounds = 0:1))
)
#> [1] TRUE
# bounds enforced
min(fitted(ei_ridge(spec)))
#> [1] -0.1029559
min(fitted(ei_ridge(spec, bounds = 0:1)))
#> [1] -3.377815e-19