Adjusting biased samples

Nate Cohn at The New York Times reports on how one 19-year-old black man is having an outsized impact on the USC/LAT panel’s estimates of support for Clinton in the U.S. presidential election. It happens that the sample doesn’t have enough other people with demographics and voting history (covariates) similar to this panelist’s, so he is getting a large weight in computing the overall averages for the populations of interest, such as likely voters:

There is a 19-year-old black man in Illinois who has no idea of the role he is playing in this election.

He is sure he is going to vote for Donald J. Trump.

And he has been held up as proof by conservatives — including outlets like Breitbart News and The New York Post — that Mr. Trump is excelling among black voters. He has even played a modest role in shifting entire polling aggregates, like the Real Clear Politics average, toward Mr. Trump.

As usual, Andrew Gelman suggests that the solution to this problem is a technique he calls “Mr. P” (multilevel regression and post-stratification). I wanted to comment on some practical tradeoffs among common methods. Maybe these are useful notes, which can be read alongside another nice piece by Nate Cohn on how different adjustment methods can yield very different polling results.


Complete post-stratification

Complete post-stratification is when you compute the mean outcome (e.g., support for Clinton) for each stratum of people, such as 18-24-year-old black men, defined by the covariates X. Then you combine these, weighting by the size of each group in the population of interest. This really only works when you have a lot of data compared with the number of strata — and the number of strata grows very fast in the number of covariates you want to adjust for.
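
To make this concrete, here is a minimal sketch in R with a toy survey data frame and made-up population counts (all variable names and numbers here are hypothetical, not from Cohn’s or Gelman’s data): compute the mean outcome within each stratum, then average those stratum means weighted by population size.

```r
# Toy illustration of complete post-stratification (all data simulated).
set.seed(1)
sample_df <- data.frame(
  age_group = sample(c("18-24", "25-44", "45+"), 500, replace = TRUE),
  race      = sample(c("black", "white", "other"), 500, replace = TRUE)
)
sample_df$y <- rbinom(500, 1, ifelse(sample_df$age_group == "18-24", 0.65, 0.5))  # e.g., 1 = supports Clinton

pop_df <- expand.grid(
  age_group = c("18-24", "25-44", "45+"),
  race      = c("black", "white", "other"),
  stringsAsFactors = FALSE
)
pop_df$N <- sample(1000:5000, nrow(pop_df))   # population count in each stratum

# Mean outcome within each stratum of the sample
strata_means <- aggregate(y ~ age_group + race, data = sample_df, FUN = mean)

# Combine the stratum means, weighting by population shares
merged <- merge(strata_means, pop_df, by = c("age_group", "race"))
with(merged, sum(y * N) / sum(N))
```

Even in this toy example, making the strata finer (say, single years of age crossed with state) would quickly leave many cells empty or nearly empty.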

Modeling sample inclusion and weighting

When people talk about survey weighting, often what they mean is weighting by the inverse of the estimated probability of inclusion in the sample. You model selection into the survey S using, e.g., logistic regression on the covariates X and some interactions. This can be done with regularization (i.e., priors, shrinkage), since many of the terms in the model might be estimated with very few observations. Especially without enough regularization, this can result in very large weights when you don’t have enough of some particular type of person in your sample.
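
As a rough sketch, reusing the toy sample_df and pop_df from above and assuming we also have a reference sample believed to represent the population (e.g., census microdata), one common construction stacks the survey on top of the reference sample and models which rows came from the survey. With this construction the fitted odds, rather than the raw fitted probability, are proportional to the inverse inclusion probability; the exact setup varies across applications.

```r
# Sketch of inverse-probability-of-inclusion weighting (toy data from above).
# ref_df stands in for a reference sample representing the population.
ref_df <- pop_df[rep(seq_len(nrow(pop_df)), times = round(pop_df$N / 100)),
                 c("age_group", "race")]
combined <- rbind(
  data.frame(ref_df, in_survey = 0),
  data.frame(sample_df[, c("age_group", "race")], in_survey = 1)
)

# Model selection into the survey; in practice you would add more covariates,
# interactions, and regularization (priors, shrinkage) for sparse cells.
incl_model <- glm(in_survey ~ age_group * race, family = binomial, data = combined)

# With the stacked construction, (1 - p) / p is proportional to 1 / P(inclusion).
p <- predict(incl_model, newdata = sample_df, type = "response")
w <- (1 - p) / p
w <- w / mean(w)            # normalize; a few rare covariate profiles can still dominate
weighted.mean(sample_df$y, w)
```

Without regularization, a respondent whose covariate profile is rare in the sample relative to the population gets an enormous weight, which is essentially the situation of the panelist in Cohn’s story.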

Modeling the outcome and integrating

You fit a model predicting the response (e.g., support for Clinton) Y with the covariates X. You regularize this model in some way so that the estimate for each person is going to “borrow strength” from other people with similar Xs. So now you have a fitted response Yhat for each unique X. To get an estimate for a particular population of interest, integrate over the distribution of X in that population. Gelman’s preferred version, “Mr. P”, uses a multilevel (aka hierarchical Bayes, random effects) model for the outcome, but other regularization methods may often be appealing.
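
A bare-bones version of this, again with the toy data above and using lme4 as one of many reasonable choices (a fully Bayesian multilevel model would be closer to what Gelman advocates), might look like:

```r
# Sketch of the outcome-modeling ("Mister P" flavored) approach, toy data as above.
library(lme4)

# Partial pooling: estimates for sparse strata are shrunk toward the overall
# mean rather than being determined by one or two respondents.
fit <- glmer(y ~ (1 | age_group) + (1 | race), family = binomial, data = sample_df)

# Predict support for every covariate profile in the population table, then
# integrate (here: a weighted average) over the population distribution of X.
pop_df$yhat <- predict(fit, newdata = pop_df, type = "response")
with(pop_df, sum(yhat * N) / sum(N))
```

A real application would include many more covariates and group-level predictors, and, as discussed next, a separate model (or a joint model) for each outcome.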

This is nice because there can be some substantial efficiency gains (i.e., more precision) from making use of the outcome information. But there are also some practical issues. First, you need a model for each outcome in your analysis, rather than just having weights you could use for all outcomes and all recodings of outcomes. Second, the implicit weights that this process puts on each observation can vary from outcome to outcome — or even for different codings (e.g., a dichotomization of answers on a numeric scale) of the same outcome. In a reply to his post, Gelman notes that you would need a different model for each outcome, but that some joint model for all outcomes would be ideal. Of course, the latter joint modeling approach, while appealing in some ways (many statisticians love having one model that subsumes everything…), means that adding a new outcome to the analysis would change all prior results.


Side note: Other methods, not described here, also work towards the aim of matching characteristics of the population distribution (e.g., iterative proportional fitting / raking). They strike me as overly specialized and not easy to adapt and extend.

It’s better for older workers to go a little fast: DocSend in Snow Crash

My friends at DocSend have just done their public launch (article, TechCrunch Disrupt presentation). DocSend provides easy ways to get analytics for documents (e.g., proposals, pitch decks, reports, memos) you send out, answering questions like: Who actually viewed the document? Which pages did they view? How much time did they spend on each page? The most common use cases for DocSend’s current customers involve sales, marketing, and startup fundraising — mainly sending documents to people outside an organization.

Ever since Russ, Dave, and Tony started floating these ideas, I’ve pointed out the similarity with an often forgotten scene1 in Snow Crash, in which a character — Y.T.’s mom — is tracked by her employer (the Federal Government, actually) as she reads a memo on a cost-saving program. Here’s an excerpt from Chapter 37:

Y.T.’s mom pulls up the new memo, checks the time, and starts reading it. The estimated reading time is 15.62 minutes. Later, when Marietta [her boss] does her end-of-day statistical roundup, sitting in her private office at 9:00 P.M., she will see the name of each employee and next to it, the amount of time spent reading this memo, and her reaction, based on the time spent, will go something like this:

• Less than 10 min.: Time for an employee conference and possible attitude counseling.
• 10-14 min.: Keep an eye on this employee; may be developing slipshod attitude.
• 14-15.61 min.: Employee is an efficient worker, may sometimes miss important details.
• Exactly 15.62 min.: Smartass. Needs attitude counseling.
• 15.63-16 min.: Asswipe. Not to be trusted.
• 16-18 min.: Employee is a methodical worker, may sometimes get hung up on minor details.
• More than 18 min.: Check the security videotape, see just what this employee was up to (e.g., possible unauthorized restroom break).

Y.T.’s mom decides to spend between fourteen and fifteen minutes reading the memo. It’s better for younger workers to spend too long, to show that they’re careful, not cocky. It’s better for older workers to go a little fast, to show good management potential. She’s pushing forty. She scans through the memo, hitting the Page Down button at reasonably regular intervals, occasionally paging back up to pretend to reread some earlier section. The computer is going to notice all this. It approves of rereading. It’s a small thing, but over a decade or so this stuff really shows up on your work-habits summary.

This is pretty much what DocSend provides. And, despite the emphasis on sales etc., some of their customers are using this for internal HR training — which shifts the power asymmetry in how this technology is used from salespeople selling to companies (which can choose not to buy, etc.) to employers tracking their employees.2

To conclude, it’s worth noting that, at least for a time, product managers at Facebook — Russ’ job before starting DocSend — were required to read Snow Crash as part of their internal training. Though I don’t think the folks running PM bootcamp actually tracked whether their subordinates looked at each page.

  1. I know it’s often forgotten because I’ve tried referring to the scene with many people who have read Snow Crash — or at least claim to have read it…
  2. Of course, there are some products that do this kind of thing. What distinguishes DocSend is how easy it makes it to add such personalized tracking to simple documents and that this is the primary focus of the product, unlike larger sales tool sets like ClearSlide.

Exploratory data analysis: Our free online course

Moira Burke, Solomon Messing, Chris Saden, and I have created a new online course on exploratory data analysis (EDA) as part of Udacity’s “Data Science” track. It is designed to teach students how to explore data sets using R and the visualization package ggplot2.

We emphasize the value of EDA for building and testing intuitions about a data set, identifying problems or surprises in data, summarizing variables and relationships, and supporting other data analysis tasks. The course materials are all free, and you can also sign up for tutoring, grading (especially useful for the final project), and certification.

Between providing general advice on data analysis and visualization, stepping students through exactly how to produce particular plots, and reasoning about how the data can answer questions of interest, the course includes interviews with four of our amazing colleagues on the Facebook Data Science team.

One unique feature of this course is that one of the data sets we use is a “pseudo-Facebook” data set that Moira and I created to share many features with real Facebook data, but not to describe any particular real Facebook users or to reveal certain kinds of information about aggregate behavior. Other data sets used in the course include two different sets of diamond sale prices and panel “scanner” data describing yogurt purchases.
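
For a flavor of the kind of plot students build up in the course, here is a small example using the diamonds data that ships with ggplot2; whether that bundled data set is exactly one of the two used in the course is my assumption, and the particular aesthetic choices below are just illustrative.

```r
# Price vs. carat for diamonds, on log scales, colored by clarity.
library(ggplot2)

ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.2) +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Diamond price by carat and clarity")
```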

It was a fascinating and novel process putting together this course. We scripted almost everything in detail in advance — before any filming started — first as outlines, then as drafts in Markdown with R code via knitr, and then as more detailed scripts with Udacity-specific notation for all the different shots and interspersed quizzes. I think this is part of what leads Kaiser Fung to write:

The course is designed from the ground up for online instruction, and it shows. If you have tried other online courses, you will immediately notice the difference in quality.

Check out the course and let me know what you think — we’re still incorporating feedback.

Interpreting discrete-choice models

Are individuals random-utility maximizers? Or do individuals have private knowledge of shocks to their utility?

“McFadden (1974) observed that the logit, probit, and similar discrete-choice models have two interpretations. The first interpretation is that of individual random utility. A decisionmaker draws a utility function at random to evaluate a choice situation. The distribution of choices then reflects the distribution of utility, which is the object of econometric investigation. The second interpretation is that of a population of decision makers. Each individual in the population has a deterministic utility function. The distribution of choices in the population reflects the population distribution of preferences. … One interpretation of this game theoretic approach is that the econometrician confronts a population of random-utility maximizers whose decisions are coupled. These models extend the notion of Nash equilibrium to random-utility choice. The other interpretation views an individual’s shock as known to the individual but not to others in the population (or to the econometrician). In this interpretation, the Brock-Durlauf model is a Bayes-Nash equilibrium of a game with independent types, where the type of individual i is the pair (x_i, e_i). Information is such that the first component of each player i’s type is common knowledge, while the second is known only to player i.” — Blume, Brock, Durlauf & Ioannides. 2011. Identification of Social Interactions. Handbook of Social Economics, Volume 1B.
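
As a quick illustration of the first (random-utility) interpretation, and this is my own sketch rather than anything from the quoted text: maximizing deterministic utilities plus i.i.d. Gumbel shocks reproduces the familiar logit choice probabilities.

```r
# With i.i.d. Gumbel shocks, random-utility maximization yields logit choice shares.
set.seed(2)
v <- c(A = 0.5, B = 0.0, C = -1.0)            # deterministic utilities of three options
n <- 1e5

rgumbel <- function(n) -log(-log(runif(n)))   # standard Gumbel draws

# Each "individual" draws a shock for every option and picks the max utility.
shocks  <- matrix(rgumbel(n * length(v)), nrow = n)
choices <- apply(sweep(shocks, 2, v, `+`), 1, which.max)

rbind(
  simulated = as.numeric(table(factor(choices, levels = 1:3))) / n,
  logit     = exp(v) / sum(exp(v))
)
```

The same simulation can be read either way: the shock is either the analyst’s randomness over utility functions or the individual’s private information, unknown to others and to the econometrician.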

Do what the virtuous person would do?

In the film The Descendants, George Clooney’s character Matt King wrestles — sometimes comically — with new and old choices involving his family and Hawaii. In one case, King decides he wants to meet a rival, both just to meet him and to give him some news; that is, he (at least explicitly) has generally good reason to meet him. Perhaps he even ought to meet him. When he actually does meet him, he cannot just do these things, he also argues with his rival, etc. King’s unplanned behaviors end up causing his rival considerable trouble.1

This struck me as related to some challenges in formulating what one should do — that is, in the “practical reasoning” side of ethics.

One way of getting practical advice out of virtue ethics is to say that one should do what the virtuous person would do in this situation. On its face, this seems right. But there are also some apparent counterexamples. Consider a short-tempered tennis player who has just lost a match.2 In this situation, the virtuous person would walk over to his opponent, shake his hand, and say something like “Good match.” But if this player does that, he is likely to become enraged and even assault his victorious opponent. So it seems better for him to walk off the court without attempting any of this — even though this is clearly rude.

The simple advice to do what the virtuous person would do in the present situation is, then, either not right or not so simple. It might be right, but not so simple to implement, if part of “the present situation” is one’s own psychological weaknesses. Aspects of the agent’s psychology — including character flaws — seem to license bad behavior and to remove reasons for taking the “best” actions.

King and other characters in The Descendants face this problem, both in the example above and at some other points in the movie. He begins a course of action (at least in part) because this is what the virtuous person would do. But then he is unable to really follow through because he lacks the necessary virtues.3 We might take this as a reminder of the ethical value of being humble — of accounting for our faults — when reasoning about what we ought to do.4 It is also a reminder of how frustrating this can be, especially when one can imagine (and might actually be able to) following through on doing what the virtuous person would do.

One way to cope with these weaknesses is to leverage other aspects of one’s situation. We can make public commitments to do the virtuous thing. We can change our environment, sometimes by binding our future selves, like Ulysses, so that we cannot act on our vices once we’ve begun our (hopefully) virtuous course of action. Perhaps new mobile technologies will be a substantial help here — helping us intervene in our own lives in this way.

  1. Perhaps deserved trouble. But this certainly didn’t play a stated role in the reasoning justifying King’s decision to meet him.
  2. This example is first used by Gary Watson (“Free Agency”, 1975) and put to this use by Michael Smith in his “Internalism” (1995). Smith introduces it as a clear problem for the “example” model of how what a virtuous person would do matters for what we should each do.
  3. Another reading of some of these events in The Descendants is that these characters actually want to do the “bad behaviors”, and they (perhaps unconsciously) use their good intentions to justify the course of action that leads to the bad behavior.
  4. Of course, the other side of such humility is being short on self-efficacy.