Fits a penalized Riesz regression for ecological inference, allowing for
overall estimates of conditional means using ei_est()
.
Usage
ei_riesz(x, ...)
# S3 method for class 'formula'
ei_riesz(formula, data, total, weights, penalty, scale = TRUE, ...)
# S3 method for class 'ei_spec'
ei_riesz(x, weights, penalty, scale = TRUE, ...)
# S3 method for class 'data.frame'
ei_riesz(x, z, total, weights, penalty, scale = TRUE, ...)
# S3 method for class 'matrix'
ei_riesz(x, z, total, weights, penalty, scale = TRUE, ...)
# Default S3 method
ei_riesz(x, ...)
Arguments
- x
Depending on the context:
A data frame of predictors.
A matrix of predictors.
An ei_spec object containing the outcome, predictor, and covariates.
Predictors must be proportions that sum to 1 across rows. You can use
ei_proportions()
to assist in preparing predictor variables. Covariates in an ei_spec object are shifted to have mean zero. Ifscale=TRUE
(the default), they are also scaled to have unit variance.- ...
Not currently used, but required for extensibility.
- formula
A formula such as
y ~ x0 + x1 | z
specifying the outcomey
regressed on the predictors of interestx
and any covariatesz
. The predictors should form a partition, that is,x0 + x1 = 1
for each observation. Users can be include more than two predictors as well, e.g.pct_white + pct_black + pct_hisp + pct_other
. If there are just two predictors, it is acceptable to only include one in the formula; the other will be formed as 1 minus the provided predictor. Include additional covariates separated by a vertical bar|
. These covariates are strongly recommended for reliable ecological inference. Covariates are shifted to have mean zero. Ifscale=TRUE
(the default), they are also scaled to have unit variance.- data
When a formula is used,
data
is a data frame containing both the predictors and the outcome.- total
<
tidy-select
> A variable containing the total number of observations in each aggregate unit. For example, the column containing the total number of voters. Required by default.- weights
<
data-masking
> A vector of unit weights for estimation. These may be the same or different from the total number of observations in each aggregate unit (see thetotal
argument toei_spec()
). See the discussion below under 'Weights' for choosing this parameter. The default, uniform weights, makes a slightly stronger-than-necessary assumption about the relationship between the unit totals and the unknown data.- penalty
The ridge penalty (a non-negative scalar), which must be specified. Recommended value is the same penalty used in
ei_ridge()
, which is stored in thepenalty
entry of the fitted model object.- scale
If
TRUE
, scale covariatesz
to have unit variance.- z
When
x
is a data frame or matrix,w
are any covariates, specified as:A data frame with numeric columns.
A matrix
These are shifted to have mean zero. If
scale=TRUE
(the default), they are also scaled to have unit variance.
Weights
The weakest identification result for ecological inference makes no assumption about the number of observations per aggregate unit (the totals). It requires, however, weighting the estimation steps according to the totals. This may reduce efficiency when the totals are variable and a slightly stronger condition holds.
Specifically, if the totals are conditionally mean-independent of the missing data (the aggregation-unit level means of the outcome within each predictor level), given covariates, then it is appropriate to use uniform weights in estimation, or any fixed set of weights.
In general, estimation efficiency is improved when units with larger variance
in the outcome receive less weight. Various bulit-in options are provided by
the helper functions in ei_wgt()
.
Examples
data(elec_1968)
# Recommended: get ridge penalty from ei_ridge()
spec = ei_spec(elec_1968, vap_white:vap_other, pres_dem_hum:pres_abs,
total = pres_total, covariates = c(pop_urban, farm))
m = ei_ridge(spec)
ei_riesz(spec, penalty = m$penalty)
#> A Riesz representer with 3 groups and 1143 observations
#> Fit with penalty = 1.01453e-08
rr = ei_riesz(~ vap_white + vap_black + vap_other | pop_urban + farm,
data = elec_1968, total = pres_total, penalty = m$penalty)
summary(rr)
#> Second moment of representer:
#> vap_white vap_black vap_other
#> 4.461456 48.486776 31468.779992
#>
#> Second moment of representer (leave-one-out):
#> vap_white vap_black vap_other
#> 4.98174 51.34839 42521.67381
# Examine the weights and check they have mean 1
head(weights(rr, group = "vap_black"))
#> [1] 3.3584139 -0.9872354 8.7084517 4.6428913 -4.6285576 6.0225661
colMeans(weights(rr))
#> vap_white vap_black vap_other
#> 1.0000000 1.0000000 0.9999999
mean(elec_1968$pres_ind_wal * weights(rr, "vap_white"))
#> [1] 0.3633165