# Adjusting biased samples

Nate Cohn at The New York Times reports on how one 19-year-old black man is having an outsized impact on the USC/LAT panel’s estimates of support for Clinton in the U.S. presidential election. It happens that the sample doesn’t have enough other people with similar demographics and voting history (covariates) to this panelist, so he is getting a large weight in computing the overall averages for the populations of interest, such as likely voters:

There is a 19-year-old black man in Illinois who has no idea of the role he is playing in this election.

He is sure he is going to vote for Donald J. Trump.

And he has been held up as proof by conservatives — including outlets like Breitbart News and The New York Post — that Mr. Trump is excelling among black voters. He has even played a modest role in shifting entire polling aggregates, like the Real Clear Politics average, toward Mr. Trump.

As usual, Andrew Gelman suggests that the solution to this problem is a technique he calls “Mr. P” (multilevel regression and post-stratification). I wanted to comment on some practical tradeoffs among common methods. Maybe these are useful notes, which can be read alongside another nice piece by Nate Cohn on how different adjustment methods can yield very different polling results.

### Post-stratification

Complete post-stratification is when you compute the mean outcome (e.g., support for Clinton) for each stratum of people, such as 18-24-year-old black men, defined by the covariates *X*. Then you combine these weighting by the size of each group in the population of interest. This really only works when you have a lot of data compared with the number of strata — and the number of strata grows very fast in the number of covariates you want to adjust for.

### Modeling sample inclusion and weighting

When people talk about survey weighting, often what they mean is weighting by inverse of the estimated probability of inclusion in the sample. You model selection into the survey *S* using, e.g., logistic regression on the covariates *X* and some interactions. This can be done with regularization (i.e., priors, shrinkage) since many of the terms in the model might be estimated with very few observations. Especially without enough regularization, this can result in very large weights when you don’t have enough of some particular type in your sample.

### Modeling the outcome and integrating

You fit a model predicting the response (e.g., support for Clinton) *Y* with the covariates *X.* You regularize this model in some way so that the estimate for each person is going to “borrow strength” from other people with similar *X*s. So now you have a fitted responses *Yhat* for each unique *X. *To get an estimate for a particular population of interest, integrate out over the distribution of *X* in that population. Gelman’s preferred version “Mr. P” uses a multilevel (aka hierarchical Bayes, random effects) model for the outcome, but other regularization methods may often be appealing.

This is nice because there can be some substantial efficiency gains (i.e. more precision) by making use of the outcome information. But there are also some practical issues. First, you need a model for each outcome in your analysis, rather than just having weights you could use for all outcomes and all recodings of outcomes. Second, the implicit weights that this process puts on each observation can vary from outcome to outcome — or even for different codings (i.e. a dichotomization of answers on a numeric scale) of the same outcome. In a reply to his post, Gelman notes that you would need a different model for each outcome, but that some joint model for all outcomes would be ideal. Of course, the latter joint modeling approach, while appealing in some ways (many statisticians love having one model that subsumes everything…) means that adding a new outcome to analysis would change all prior results.

Side note: Other methods, not described here, also work towards the aim of matching characteristics of the population distribution (e.g., iterative proportional fitting / raking). They strike me as overly specialized and not easy to adapt and extend.