Election Day is just one week away. Polling data suggest a close race for control of the House and the Senate. RealClearPolitics and FiveThirtyEight, two well-followed websites that use polling data to forecast the outcome of the election, are in broad agreement, though their specifics are not completely consistent.
Yet, as was seen in the 2016 Presidential election, polls can be wrong.
What makes polling so challenging, and election forecasting so difficult?
For polls to be useful, they must capture a representative sample of those who will cast votes in the election. For example, if a pollster oversamples Republican-leaning voters in a Democrat-leaning district, the resulting poll may not provide useful forecasting information.
The issue of registered voters versus likely voters versus those who actually vote further clouds the value of polling data. Even when people indicate that they are likely to vote, what they actually do may be different. This means that polling data may contain information about people who never cast a ballot.
How polling data are collected can bias the results. Some pollsters contact target voters by landlines, others by cell phones and yet others via the internet. Live callers versus robocalls also elicit different responses. Each technique carries with it hidden biases. For example, older Americans are more likely to have a landline than younger Americans. This means that their political leanings may be over-represented if pollsters primarily target landline owners in their polling and do not appropriately account for this bias.
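One common way to account for such a bias is to reweight respondents so that each group counts in proportion to its share of the expected electorate. The sketch below is a minimal illustration under stated assumptions: the two age groups, their shares and their support levels are hypothetical numbers chosen for the example, not figures from any actual poll.

```python
# Hypothetical numbers for illustration only; none come from a real poll.
# Share of each age group among poll respondents (landline-heavy sample).
sample_share = {"older": 0.70, "younger": 0.30}
# Assumed share of each age group among people who will actually vote.
electorate_share = {"older": 0.40, "younger": 0.60}
# Assumed support for a candidate within each group, as measured by the poll.
support = {"older": 0.55, "younger": 0.40}


def weighted_support(group_share, group_support):
    """Overall support when each group counts in proportion to group_share."""
    return sum(group_share[g] * group_support[g] for g in group_share)


# Unweighted result: older respondents dominate because they dominate the sample.
raw_estimate = weighted_support(sample_share, support)
# Reweighted result: each group counts according to its assumed electorate share.
adjusted_estimate = weighted_support(electorate_share, support)

print(f"Unweighted: {raw_estimate:.1%}, reweighted: {adjusted_estimate:.1%}")
```

With these made-up inputs, the unweighted poll shows the candidate slightly above 50 percent, while the reweighted estimate is about 46 percent. The adjustment is only as good as the assumed electorate shares, which is itself a judgment call.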
Some may believe that poll sizes are an issue of concern. This is one of the less significant problems with polls. A poll size of 400 can yield a forecast whose margin of error is around plus or minus 5 percent with 95 percent confidence. Up the poll size to 1,000 and the margin of error narrows to around plus or minus 3 percent. Of course, many races, particularly Senate races like those in Nevada, Arizona and Georgia, are running within just 1 or 2 percentage points, well inside that margin, so they are labeled toss-ups with no clear favorite.
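These figures follow from the standard formula for the margin of error of a proportion. The sketch below assumes a simple random sample and the worst case of an evenly split race (p = 0.5); the function name is just an illustrative choice.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    n: number of respondents
    p: assumed true proportion (0.5 is the worst case)
    z: critical value; 1.96 corresponds to 95 percent confidence
    """
    return z * math.sqrt(p * (1 - p) / n)


moe_400 = margin_of_error(400)
moe_1000 = margin_of_error(1000)

print(f"n=400:  +/- {moe_400:.1%}")
print(f"n=1000: +/- {moe_1000:.1%}")
```

The results are roughly 4.9 percent for 400 respondents and 3.1 percent for 1,000. Because the margin shrinks with the square root of the sample size, more than doubling the sample buys less than two points of added precision.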
One way to reduce this margin of error is by aggregating polls. There are risks with doing this, since each pollster has its own methodology. Mixing data can either increase or decrease the margin of error in unpredictable ways. However, the general rule of thumb is that larger sample sizes are likely to overcome any such biases, so aggregating generally yields more reliable forecasts.
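The effect of aggregation on sampling error can be sketched by pooling several polls as if they were one large sample. The three polls below are hypothetical, and the calculation assumes they are independent simple random samples of the same electorate. That assumption is exactly what differing methodologies can violate, and pooling does nothing to remove an error that all of the polls share.

```python
import math

# Hypothetical polls for illustration: (candidate's share, number of respondents).
polls = [(0.48, 400), (0.51, 600), (0.49, 1000)]

total_n = sum(n for _, n in polls)

# Pool the polls by weighting each share by its sample size.
pooled_share = sum(share * n for share, n in polls) / total_n

# Sampling margin of error of the pooled estimate at 95 percent confidence.
pooled_moe = 1.96 * math.sqrt(pooled_share * (1 - pooled_share) / total_n)

print(f"Pooled share: {pooled_share:.1%} from {total_n} respondents")
print(f"Pooled margin of error: +/- {pooled_moe:.1%}")
```

Under these assumptions, the pooled margin of error is a little over 2 percent, smaller than that of any single poll in the list.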
Some people who are polled may also wish to conceal their voting intentions and indicate a candidate that they will not vote for. There is no way to protect against such misinformation.
The impact former President Donald Trump has on polling, known as the “Trump effect,” is difficult to account for this year. Several candidates that Trump supported, like Mehmet Oz in the Pennsylvania Senate race, have been competitive but behind in most polls. How well Trump supporters show up to vote will determine their final outcomes.
Methodologies used to transform polling data into forecasts are highly variable. RealClearPolitics is a poll aggregator, taking a simple average of the most recent polls, without adjusting for poll size. FiveThirtyEight uses more sophisticated models that appear to incorporate Bayesian statistics, combining prior information about races, expert opinion and polling data. This means that some of its models make assumptions that bring a human element of judgment into its forecasts, something that RealClearPolitics is not subject to. If such judgment is spot on, FiveThirtyEight will be more reliable. The impact of such judgment is likely mitigated as Election Day approaches, with polling data dominating its forecasts.
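The difference between the two approaches can be illustrated with a toy example. This is not either website's actual model: the simple average mirrors the aggregation described above, while the Bayesian part is a textbook beta-binomial update, with the prior standing in for whatever judgment a forecaster brings. All poll numbers and prior values are hypothetical.

```python
# Hypothetical polls for illustration: (candidate's share, number of respondents).
polls = [(0.48, 400), (0.51, 600), (0.49, 1000)]

# Simple average of poll shares, ignoring poll size.
simple_average = sum(share for share, _ in polls) / len(polls)


def posterior_mean(prior_a, prior_b, poll_list):
    """Posterior mean of a Beta(prior_a, prior_b) prior updated with poll counts.

    The prior behaves like prior_a + prior_b earlier respondents, of whom
    prior_a backed the candidate. This is a textbook model, not any
    forecaster's real method.
    """
    supporters = sum(share * n for share, n in poll_list)
    respondents = sum(n for _, n in poll_list)
    return (prior_a + supporters) / (prior_a + prior_b + respondents)


# Assumed prior: judgment worth 100 respondents, 52 of them for the candidate.
bayes_all_polls = posterior_mean(52, 48, polls)
# Same prior with only the first poll available, as if earlier in the campaign.
bayes_one_poll = posterior_mean(52, 48, polls[:1])

print(f"Simple average:            {simple_average:.1%}")
print(f"Bayesian, one poll:        {bayes_one_poll:.1%}")
print(f"Bayesian, all three polls: {bayes_all_polls:.1%}")
```

With only one poll, the assumed prior pulls the estimate noticeably toward 52 percent. With all three polls, the data dominate and the Bayesian estimate sits close to the simple average, which is the sense in which judgment matters less as polling accumulates.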
At present, both websites forecast similar outcomes for the House and Senate, though individual race forecasts vary. Of course, new House maps add yet another wrinkle of uncertainty, particularly for House seats that represent completely new areas in a state and involve highly competitive races.
The ultimate polls occur on Election Day when votes are cast and counted. With midterm elections attracting around 40 percent of eligible voters, a minority of eligible voters will likely determine who will control Congress for the next two years. Given that the president’s party typically loses seats in Congress during the midterms, the headwinds for Democrats are obvious.
Polling is easy. Accurate polling is difficult. Coalescing all polling data into useful forecasts is the ultimate challenge. In one week, we will know how well everyone did.
Sheldon H. Jacobson, Ph.D., is a professor of Computer Science at the University of Illinois at Urbana-Champaign. A data scientist, he applies his expertise in data-driven risk-based assessment to evaluate and inform public policy.