## How the odds have changed

The model is continuously updated with new polling data and approval ratings. As new information comes in, the odds may shift towards one party or another. Additionally, as we get closer to the election, there is less uncertainty in the result, since there is less time for a large shift in public opinion. This also changes the odds over time.

The chart below plots the changing odds of each party winning control. The estimates are plotted on a log-odds scale, instead of a traditional probability scale ranging from 0 to 100%. This is to better reflect changes in probability—a shift from a 90% to 95% chance is much more consequential than a change from 50 to 55%.
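
As a small illustration of why the log-odds scale treats these two shifts differently, the sketch below converts each probability to log-odds and compares the size of the two moves. The helper name `logit` is chosen here for illustration and is not taken from the model's code.

```python
import math

def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))

# Size of each shift on the log-odds scale.
shift_high = logit(0.95) - logit(0.90)  # 90% -> 95%
shift_mid = logit(0.55) - logit(0.50)   # 50% -> 55%

print(f"90% -> 95%: {shift_high:.2f} log-odds")
print(f"50% -> 55%: {shift_mid:.2f} log-odds")
```

The first shift is several times larger than the second on this scale, even though both are five percentage points.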

## Distribution of outcomes

The model doesn't produce a single estimate of the number of seats each party will win. Rather, it estimates a probability for each possible arrangement of seats. The model's overall guess is the median of the distribution of seats—where it is just as likely that the Democrats will win more seats as fewer seats.
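
A minimal sketch of how a median seat count and a majority probability could be read off a set of simulated outcomes. The simulated draws below are a made-up stand-in (a rounded normal distribution), not output from the model, and `summarize_seats` is a hypothetical helper.

```python
import numpy as np

def summarize_seats(seat_draws, majority=218):
    """Return the median seat count and the share of draws at or above a majority."""
    draws = np.asarray(seat_draws)
    median_seats = float(np.median(draws))
    p_majority = float(np.mean(draws >= majority))
    return median_seats, p_majority

# Hypothetical stand-in for simulated Democratic seat totals (435 House seats).
rng = np.random.default_rng(0)
seat_draws = np.clip(np.rint(rng.normal(220, 15, size=10_000)), 0, 435).astype(int)

median_seats, p_majority = summarize_seats(seat_draws)
print(median_seats, round(p_majority, 3))
```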

The table below summarizes the relative chances of each possible distribution of seats. The current distribution and the median outcome are highlighted. 218 seats are needed to control the House.

Dem. Seats | Rep. Seats | Majority | Dem. Gain | Cuml. Likelihood* |
---|---|---|---|---|

## National Polling

The basis of the model is the generic congressional ballot, which asks survey respondents which party's candidate they plan on supporting in their local House race. The model estimates the true support for each party over time, and forecasts this estimate forwards to Election Day. The chart below plots these estimates, along with an accompanying error bar indicating the uncertainty in the estimates. Notice how the uncertainty increases dramatically from today towards Election Day.

## Model

The model operates in two broad stages: first, it estimates national voter intent on a weekly basis and forecasts this estimate forwards to Election Day; second, it uses its voter intent estimate to predict the number of seats won by each party, using past elections as a guide. The models for each of these stages are described in more detail below.

Each model is fully Bayesian and is estimated using Hamiltonian Monte Carlo (through the statistical package Stan). Weakly informative priors are used on all parameters. The voter intent model is re-estimated with each new set of polling data. The final results model, in contrast, is fit once on past election data, and the posterior samples are used to provide predictions for the 2018 elections.

### Voter Intent Model

Voter intent is estimated from generic ballot polls conducted by various polling firms. These ask survey respondents which party's candidate they plan on supporting in their local House race. By aggregating poll results we can arrive at an estimate of national voter intent.

Voter intent is estimated on a weekly basis, from the first week of
polling through Election Day. It is measured on a log-odds scale
and is denoted $\mu_t$, where $t$ is the week in question. Intent
is assumed to evolve as an AR(1) process, or mean-reverting random
walk:
$$
\mu_t \sim \mathrm{t}_{\nu}(\rho\cdot\mu_{t-1}, \sigma^2_w),
$$
where $\rho$ is a parameter representing the strength of the mean
reversion (restricted to be between 0 and 1), $\sigma^2_w$ is the
variance of the random walk, and $\nu$ is the degrees of freedom of
the *t*-distribution. The heavier tails of the
*t*-distribution allow more extreme shifts in public opinion
to occur, which may happen after a significant political
development. While a more accurate generative model might be a
finite mixture model in which public opinion is static most of the
time, but able to change dramatically after a major shock, the
added complexity of such a model would be unlikely to provide a
commensurate increase in accuracy. The priors for the parameters
are:
$$ \begin{align}
\mu_0 &\sim \mathrm{t}_{\nu}(0, 10^2) \\
\nu &\sim \mathrm{Gamma}(2, 0.1) \\
\sigma_w &\sim \mathrm{Cauchy}^+(0, 10) \\
\rho &\sim \mathrm{Beta}(2, 1)
\end{align}$$
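
The AR(1) process with *t*-distributed innovations can be simulated forward in a few lines. This is only a sketch under stated assumptions: it treats $\sigma_w$ as the scale of the *t*-distribution, the function name `simulate_intent` is hypothetical, and the parameter values in the usage lines are invented for illustration rather than taken from the fitted model.

```python
import numpy as np

def simulate_intent(mu0, rho, sigma_w, nu, n_weeks, rng):
    """Simulate weekly voter intent on the log-odds scale.

    Each week is centered on rho * (previous week), with a Student-t
    innovation of scale sigma_w and nu degrees of freedom.
    """
    mu = np.empty(n_weeks + 1)
    mu[0] = mu0
    for t in range(1, n_weeks + 1):
        mu[t] = rho * mu[t - 1] + sigma_w * rng.standard_t(nu)
    return mu

# Illustrative values only (assumptions, not model estimates).
rng = np.random.default_rng(1)
path = simulate_intent(mu0=0.1, rho=0.95, sigma_w=0.02, nu=4, n_weeks=30, rng=rng)
print(path[:5])
```

With $\rho < 1$ each step pulls the series back towards zero on the log-odds scale, which corresponds to an even split between the parties.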

Polling results are derived from national voter intent. For each poll $i$ we record the sample size, $n_i$; the week it was conducted, $t_i$; the polling firm that conducted it, $f_i$; the number of respondents picking either major party (as opposed to “undecided”), $n^s_i$; and the number picking the Democratic party, $n^d_i$. Both $n^s_i$ and $n^d_i$ are drawn from binomial distributions,
$$ \begin{align}
n^s_i &\sim \mathrm{Binom}(n_i, \pi_u) \\
n^d_i &\sim \mathrm{Binom}(n^s_i, \pi_i)
\end{align}$$
where $\pi_u$ is the proportion of undecided voters nationally, and $\pi_i$ is the level of support for the Democrats in the given poll. $\pi_i$ differs from national voter intent $\mu_{t_i}$, since polling firms sample differently and have different methodologies.

$\pi_i$ is modelled on a log-odds scale as a linear function of national voter intent, polling firm bias, and poll-specific (sampling) error:
$$
\mathrm{logit}(\pi_i) = \mu_{t_i} + \alpha_{f_i} + \epsilon_i,
$$
where $\alpha_{f_i}$ is the bias of the polling firm $f_i$ and $\epsilon_i$ is the poll-specific error. Both $\alpha_{f_i}$ and $\epsilon_i$ are assumed to be drawn from larger distributions of all possible firm and poll errors:
$$ \begin{align}
\alpha &\sim \mathrm{t}_2(\mu_f, \sigma^2_f) \\
\epsilon &\sim \mathrm{t}_2(0, \sigma^2_\epsilon)
\end{align}$$
The priors for the hyperparameters are
$$ \begin{align}
\mu_f &\sim \mathcal{N}(0, 0.02) \\
\sigma_f &\sim \mathrm{t}^+_4(0, 1) \\
\sigma_\epsilon &\sim \mathrm{t}^+_4(0, 1)
\end{align}$$
Polling firm biases are not constrained to have a mean of zero because, in a given election year, the firms collectively exhibit an aggregate polling error. This error is generally within two percentage points, which motivates the prior on $\mu_f$.
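
To make the poll-level equation concrete, the sketch below turns a voter intent value, a firm bias, and a poll-specific error into an expected Democratic share, then draws a Democratic count among decided respondents. It covers only the second binomial stage. The function names and all numeric inputs are assumptions chosen for illustration.

```python
import numpy as np

def inv_logit(x):
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + np.exp(-x))

def simulate_poll(mu_t, alpha_f, eps_i, n_decided, rng):
    """Simulate one poll given intent, firm bias, and poll error (all log-odds)."""
    pi_i = inv_logit(mu_t + alpha_f + eps_i)   # logit(pi_i) = mu + alpha + eps
    n_dem = rng.binomial(n_decided, pi_i)      # n^d_i ~ Binom(n^s_i, pi_i)
    return float(pi_i), int(n_dem)

# Hypothetical inputs: a slight Democratic lead, a small firm bias, no poll error.
rng = np.random.default_rng(2)
pi_i, n_dem = simulate_poll(mu_t=0.12, alpha_f=-0.03, eps_i=0.0, n_decided=850, rng=rng)
print(round(pi_i, 3), n_dem)
```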

### Final Results Model

The final results model is essentially a linear model regressing the change in Democratic seats on national voter intent and several other structural and economic covariates. Importantly, measurement error in voter intent is explicitly modelled.

The model is fit using data from the 1974, 1982, 1994, 1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012, and 2014 House elections, which are all of the election years for which generic ballot polling data were available.

The number of seats won by the Democratic party in election $j$ is denoted $s_j$. Voter intent at Election Day, measured on a log-odds scale, is denoted $\mu$, and all other covariates are included in a vector $X$. Then the model specification is
$$
s_j-s_{j-1} \sim \mathcal{N}(\beta_0 + \beta_1\mu + X\vec\beta, \sigma),
$$
where $\sigma$ is the scale of the residual error. However, we cannot observe $\mu$ directly; we can only obtain an estimate of it, $\tilde\mu$, with some uncertainty $\sigma^2_\mu$ (both of which are taken from the posterior distribution of the voter intent model, which is very close to normal):
$$
\tilde\mu \sim \mathcal{N}(\mu, \sigma^2_\mu).
$$
The priors for the parameters are
$$ \begin{align}
\mu &\sim \mathcal{N}(0, 1) \\
\beta_0 &\sim \mathrm{t}_3(0, 10) \\
\beta_1 &\sim \mathrm{t}_3(0, 10) \\
\vec\beta &\sim \mathrm{t}_3(0, 10) \\
\sigma &\sim \mathrm{Cauchy}^+(0, 20)
\end{align}$$
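
One way to see how uncertainty in voter intent carries through to the seat forecast is to simulate it. The sketch below is a simplification under several assumptions: it approximates the uncertainty in $\mu$ as normal around $\tilde\mu$ with standard deviation $\sigma_\mu$, it uses a single fixed set of coefficients rather than posterior samples, and every name and number in it is hypothetical.

```python
import numpy as np

def predict_seat_change(mu_tilde, sigma_mu, beta0, beta1, x, beta, sigma, n_draws, rng):
    """Draw simulated changes in Democratic seats.

    Voter intent is drawn around its estimate first, so its measurement
    error widens the spread of the predicted seat change.
    """
    mu = rng.normal(mu_tilde, sigma_mu, size=n_draws)   # uncertain voter intent
    mean = beta0 + beta1 * mu + np.dot(x, beta)         # linear predictor
    return rng.normal(mean, sigma)                      # residual error

# Hypothetical coefficients and covariates, for illustration only.
rng = np.random.default_rng(3)
draws = predict_seat_change(
    mu_tilde=0.15, sigma_mu=0.05, beta0=0.0, beta1=100.0,
    x=np.array([1.0, -1.0]), beta=np.array([-7.0, 1.0]),
    sigma=10.0, n_draws=5_000, rng=rng,
)
print(np.median(draws), np.percentile(draws, [5, 95]))
```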

The model forecasts the change in Democratic seats, which is highly dependent on whether or not there is a Democratic president. Consequently, most of the terms in the model are interacted with an indicator representing the incumbent party in the White House (1 if a Democrat, –1 if a Republican). The full set of additional model covariates is described in the following table:

Term | Expected effect this election | Notes |
---|---|---|
$midterm$ | –7 seats | The midterm indicator takes on a value of 1 in midterm elections and 0 otherwise. Regardless of the party in power, Democrats generally underperform in midterm elections due to low turnout. |
$pres$ | +1 seat | The White House control indicator. |
$midterm\times pres$ | +7 seats | This is one of the most important terms in the model, since it captures the tendency of midterm elections to swing dramatically against the party of the incumbent president. |
$appr\times pres$ | +0.3 seats | $appr$ is the incumbent president’s approval rating. A popular president helps down-ballot races. |
$earn\times pres$ | +0 seats | $earn$ is production and nonsupervisory hourly earnings growth over the previous year. This term and the next capture voters’ perception of the state of the economy, which is important in judging the president’s party. |
$unemp\times pres$ | +0 seats | $unemp$ is the current unemployment rate. |
$before - 218$ | +1 seat | $before$ is the number of Democratic seats going into the election, so $before - 218$ is the current Democratic surplus or deficit of seats. |
$midterm\times(before - 218)$ | +17 seats | |
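
The interaction terms in the table can be assembled from a handful of raw inputs, as in the sketch below. The function name, dictionary keys, and input values are hypothetical. The source does not state the units of the approval, earnings, and unemployment inputs, so they are treated here as plain numbers.

```python
def build_covariates(midterm, pres, appr, earn, unemp, before):
    """Build the covariate terms listed in the table.

    midterm: 1 in a midterm election, 0 otherwise
    pres:    1 if the incumbent president is a Democrat, -1 if a Republican
    before:  Democratic seats held going into the election
    """
    surplus = before - 218  # Democratic surplus (+) or deficit (-) of seats
    return {
        "midterm": midterm,
        "pres": pres,
        "midterm_x_pres": midterm * pres,
        "appr_x_pres": appr * pres,
        "earn_x_pres": earn * pres,
        "unemp_x_pres": unemp * pres,
        "surplus": surplus,
        "midterm_x_surplus": midterm * surplus,
    }

# Hypothetical inputs: a midterm under a Republican president.
cov = build_covariates(midterm=1, pres=-1, appr=0.40, earn=0.03, unemp=0.04, before=200)
print(cov)
```

Because `pres` is coded as 1 or –1, each interacted term flips sign with the party in the White House, so a single coefficient can describe an effect that helps one party and hurts the other.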

Model data and code