Political arithmetic: The Joy of Stats

The Joy of Stats with Hans Rosling is quite engaging — and worth watching. I really enjoyed the historical threads running through the piece. I think he’s right to emphasize how data collection by states — to understand and control their populations — is at the origin of statistics. With increasing data collection today, this is a powerful and necessary reminder of the range of ends to which data analysis can be put.

Like others, I found the scenes with Rosling behind a bubble plot made difficult by the distracting lights and windows in the background. And the ending — with analyzing “what it means to be human” — was a bit much for me. But a small complaint about a compelling view.

Ideas behind their time: formal causal inference?

Alex Tabarrok at Marginal Revolution blogs about how some ideas seem notably behind their time:

We are all familiar with ideas said to be ahead of their time, Babbage’s analytical engine and da Vinci’s helicopter are classic examples. We are also familiar with ideas “of their time,” ideas that were “in the air” and thus were often simultaneously discovered such as the telephone, calculus, evolution, and color photography. What is less commented on is the third possibility, ideas that could have been discovered much earlier but which were not, ideas behind their time.

In comparing ideas behind and ahead of their times, it’s worth considering the processes that identify them as such.

In the case of ideas ahead of their time, we rely on records and other evidence of their genesis (e.g., accounts of the use of flamethrowers at sea by the Byzantines ). Later users and re-discoverers of these ideas are then in a position to marvel at their early genesis. In trying to see whether some idea qualifies as ahead of its time, this early genesis, lack or use or underuse, followed by extensive use and development together serve as evidence for “ahead of its time” status.

On the other hand, in identifying ideas behind their time, it seems that we need different sorts of evidence. Taborrok uses the standard of whether their fruits could have been produced a long time earlier (“A lot of the papers in say experimental social psychology published today could have been written a thousand years ago so psychology is behind its time”). We need evidence that people in a previous time had all the intellectual resources to generate and see the use of the idea. Perhaps this makes identifying ideas behind their time harder or more contentious.

P(y | do(x))

Perhaps formal causal inference — and some kind of corresponding new notation, such as Pearl’s do(x) operator — is an idea behind its time.1 Judea Pearl’s account of the history of structural equation modeling seems to suggest just this: exactly what the early developers of path models (Wright, Haavelmo, Simon) needed was new notation that would have allowed them to distinguish what they were doing (making causal claims with their models) from what others were already doing (making statistical claims).2

In fact, in his recent talk at Stanford, Pearl suggested just this — that if the, say, the equality operator = had been replaced with some kind of assignment operator (say, :=), formal causal inference might have developed much earlier. We might be a lot further along in social science and applied evaluation of interventions if this had happened.

This example raises some questions about the criterion for ideas behind their time that “people in a previous time had all the intellectual resources to generate and see the use of the idea” (above). Pearl is a computer scientist by training and credits this background with his approach to causality as a problem of getting the formal language right — or moving between multiple formal languages. So we may owe this recent development to comfort with creating and evaluating the qualities of formal languages for practical purposes — a comfort found among computer scientists. Of course, e.g., philosophers and logicians also have been long comfortable with generating new formalisms. I think of Frege here.

So I’m not sure whether formal causal inference is an idea behind its time (or, if so, how far behind). But I’m glad we have it now.

  1. Of course, there is also Rubin’s formalism of potential outcomes. I have found that this becomes very awkward as the causal models get at all complex. []
  2. See chapter 5 of Pearl, J. (2009). Causality: Models, Reasoning and Inference. 2nd Ed. Cambridge University Press. []

Ambiguous signals: “the Facebook”

When Facebook was sweeping Stanford in Spring 2004, it wasn’t yet just Facebook — it was [thefacebook.com]. Many of my friends who were undergrads at Stanford around that time (and shortly after) will still refer to it as “The Facebook” or “the facebook dot com”. This usage can be a jokey signal to members of the in-group that one was an early user. This also may signal attendance at one of the universities Facebook was available at early on (e.g., Harvard, Stanford, Yale, Columbia).1

Of course, this signal can fail for various reasons. The audience may not understand — may see “the Facebook” as a grammatical error. Or widespread attention to Facebook’s history (say, via a fictionalized movie) may put many people in possession of the ability to use this signal, even though they weren’t early users and are not alumni at the appropriate universities.

Worse still, for some audiences, this usage might seem to put the speaker in a late-adopting category, rather than an early-adopting one! For example, in President G. W. Bush’s visit to Facebook today, he said he is now on “the Facebook”. So to many ears, “the Facebook” does exactly the opposite of the effects described above.

In fact, at least one friend has had just this experience: she used “the Facebook” and got a “are you a luddite?” kind of response. To avoid ambiguity (but also subtlety), “the facebook dot com” is still available.

  1. Though it is worth noting that by the time of the domain-name change, many more schools had access to Facebook. But I would guess the likelihood of adoption and attachment to the name is lower. Update: see this more detailed timeline of Facebook university launches. []

Academia vs. industry: Harvard CS vs. Google edition

Matt Welsh, a professor in the Harvard CS department, has decided to leave Harvard to continue his post-tenure leave working at Google. Welsh is obviously leaving a sweet job. In fact, it was not long ago that he was writing about how difficult it is to get tenure at Harvard.

So why is he leaving? Well, CS folks doing research in large distributed systems are in a tricky place, since the really big systems are all in industry. And instead of legions of experienced engineers to help build and study these systems, they have a bunch of lazy grad students! One might think, then, that this kind of (tenured) professor to industry move is limited to people creating and studying large deployments of computer systems.

There is a broader pull, I think. For researchers studying many central topics in the social sciences (e.g., social influence), there is a big draw to industry, since it is corporations that are collecting broad and deep data sets describing human behavior. To some extent, this is also a case of industry being appealing for people studying deployment of large deployments of computer systems — but it applies even to those who don’t care much about the “computer” part. In further parallels to the case with CS systems researchers, in industry they have talented database and machine learning experts ready to help, rather than social science grad students who are (like the faculty) too often afraid of math.

The “friendly world syndrome” induced by simple filtering rules

I’ve written previously about how filtered activity streams can lead to biased views of behaviors in our social neighborhoods. Recent conversations with two people writing popular-press books on related topics have helped me clarify these ideas. Here I reprise previous comments on filtered activity streams, aiming to highlight how they apply even in the case of simple and transparent personalization rules, such as those used by Twitter.

Birds of a feather flock together. Once flying together, a flock is also subject to the same causes (e.g., storms, pests, prey). Our friends, family, neighbors, and colleagues are more similar to us for similar reasons (and others). So we should have no illusions that the behaviors, attitudes, outcomes, and beliefs of our social neighborhood are good indicators of those of other populations — like U.S. adults, Internet users, or homo sapiens of the past, present, or future. The apocryphal Pauline Kael quote “How could Nixon win? No one I know voted for him” suggests both the ease and error of this kind of inference. I take it as a given that people’s estimates of larger populations’ behaviors and beliefs are often biased in the direction of the behaviors and beliefs in their social neighborhoods. This is the case with and without “social media” and filtered activity streams — and even mediated communication in general.

That is, even without media, our personal experiences are not “representative” of the American experience, human experience, etc., but we do (and must) rely on it anyway. One simple cognitive tool here is using “ease of retrieval” to estimate how common or likely some event is: we can estimate how common something is based on how easy it is to think of. So if something prompts someone to consider how common a type of event is, they will (on average) estimate the event as more common if it is more easy to think of an example of the event, imagine the event, etc. And our personal experiences provide these examples and determine how easy they are to bring to mind. Both prompts and immediately prior experience can thus affect these frequency judgments via ease of retrieval effects.

Now this is not to say that we should think as ease of retrieval heuristics as biases per se. Large classes and frequent occurrences are often more available to mind than those that are smaller or less frequent. It is just that this is also often not the case, especially when there is great diversity in frequency among physical and social neighborhoods. But certainly we can see some cases where these heuristics fail.

Media are powerful sources of experiences that can make availability and actual frequency diverge, whether by increasing the biases in the direction of projecting our social neighborhoods onto larger population or in other, perhaps unexpected directions. In a classic and controversial line of research in the 1970s and 80s, Gerbner and colleagues argued that increased television-watching produces a “mean world syndrome” such that watching more TV causes people to increasingly overestimate, e.g., the fraction of adult U.S. men employed in law enforcement and the probability of being a victim of violent crime. Their work did not focus on investigating heuristics producing these effects, but others have suggested the availability heuristic (and related ease of retrieval effects) as at work. So even if my social neighborhood has fewer cops or victims of violent crime than the national average, media consumption and the availability heuristic can lead me to overestimate both.

Personalized and filtered activity streams certainly also affect us through some of the same psychological processes, leading to biases in users’ estimates of population-wide frequencies. They can aIso bias inference about our own social neighborhoods. If I try to estimate how likely a Facebook status update by a friend is to receive a comment, this estimate will be affected by the status updates I have seen recently. And if content with comments is more likely to be shown to me in my personalized filtered activity stream (a simple rule for selecting more interesting content, when there is too much for me to consume it all), then it will be easier for me to think of cases in which status updates by my friends do receive comments.

In my previous posts on these ideas, I have mainly focused on effects on beliefs about my social neighborhood and specifically behaviors and outcomes specific to the service providing the activity stream (e.g., receiving comments). But similar effects apply for beliefs about other behaviors, opinions, and outcomes. In particular, filtered activity streams can increase the sense that my social neighborhood (and perhaps the world) agrees with me. Say that content produced by my Facebook friends with comments and interaction from mutual friends is more likely to be shown in my filtered activity streams. Also assume that people are more likely to express their agreement in such a way than substantial disagreement. As long as I am likely to agree with most of my friends, then this simple rule for filtering produces an activity stream with content I agree with more than an unfiltered stream would. Thus, even if I have a substantial minority of friends with whom I disagree on politics, this filtering rule would likely make me see less of their content, since it is less likely to receive (approving) comments from mutual friends.

I’ve been casually calling this larger family of effects this the “friendly world syndrome” induced by filtered activity streams. Like the mean world syndrome of the television cultivation research described above, this picks out a family of unintentional effects of media. Unlike the mean world syndrome, the friendly world syndrome includes such results as overestimating how many friends I have in common with my friends, how much positive and accomplishment-reporting content my friends produce, and (as described) how much I agree with my friends.1

Even though the filtering rules I’ve described so far are quite simple and appealing, they still are more consistent with versions of activity streams that are filtered by fancy relevance models, which are often quite opaque to users. Facebook News Feed — and “Top News” in particular — is the standard example here. On the other hand, one might think that these arguments do not apply to Twitter, which does not apply any kind of machine learning model estimating relevance to filtering users’ streams. But Twitter actually does implement a filtering rule with important similarities to the “comments from mutual friends” rule described above. Twitter only shows “@replies” to a user on their home page when that user is following both the poster of the reply and the person being replied to.2 This rule makes a lot of sense, as a reply is often quite difficult to understand without the original tweet. Thus, I am much more likely to see people I follow replying to people I follow than to others (since the latter replies are encountered only from browsing away from the home page. I think this illustrates how even a straightforward, transparent rule for filtering content can magnify false consensus effects.

One aim in writing this is to clarify that a move from filtering activity streams using opaque machine learning models of relevance to filtering them with simple, transparent, user-configurable rules will likely be insufficient to prevent the friendly world syndrome. This change might have many positive effects and even reduce some of these effects by making people mindful of the filtering.3 But I don’t think these effects are so easily avoided in any media environment that includes sensible personalization for increased relevance and engagement.

  1. This might suggest that some of the false consensus effects observed in recent work using data collected about Facebook friends could be endogenous to Facebook. See Goel, S., Mason, W., & Watts, D. J. (2010). Real and perceived attitude agreement in social networks. Journal of Personality and Social Psychology, 99(4), 611-621. doi:10.1037/a0020697 []
  2. Twitter offers the option to see all @replies written by people one is following, but 98% of users use the default option. Some users were unhappy with an earlier temporary removal of this feature. My sense is that the biggest complaint was that removing this feature removed a valuable means for discovering new people to follow. []
  3. We are investigating this in ongoing experimental research. Also see Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., & Simons, A. (1991). Ease of retrieval as information: Another look at the availability heuristic. Journal of Personality and Social Psychology, 61(2), 195-202. doi:10.1037/0022-3514.61.2.195 []