Traits, adaptive systems & dimensionality reduction

Psychologists have posited numerous psychological traits and described causal roles they ought to play in determining human behavior. Most often, the canonical measure of a trait is a questionnaire. Investigators obtain this measure for some people and analyze how their scores predict some outcomes of interest. For example, many people have been interested in how psychological traits affect persuasion processes. Traits like need for cognition (NFC) have been posited and questionnaire items developed to measure them. Among other things, NFC affects how people respond to messages with arguments of varying quality.

How useful are these traits for explanation, prediction, and adaptive interaction? I can’t address all of this here, but I want to sketch an argument for their irrelevance to adaptive interaction — and then offer a tentative rejoinder.

Interactive technologies can tailor their messages to the tastes and susceptibilities of the people interacting with and through them. It might seem that these traits should figure in the statistical models used to make these adaptive selections. After all, some of the possible messages fit for, e.g., coaching a person to meet their exercise goals are more likely to be effective for low NFC people than high NFC people, and vice versa. However, the standard questionnaire measures of NFC usually cannot be obtained for most users: certainly not in commerce settings, and even people signing up for a mobile coaching service likely don’t want to answer pages of questions. On the other hand, some Internet and mobile services have other abundant data available about their users, which could perhaps be used to construct an alternative measure of these traits. The trait-based-adaptation recipe is:

  1. obtain the questionnaire measure of the trait for a sample,
  2. predict this measure with data available for many individuals (e.g., log data),
  3. use this model to construct a measure for out-of-sample individuals.

This new measure could then be used to personalize the interactive experience based on this trait: if a version performs well (or poorly) for people with a particular score on the trait, the system uses (or avoids) that version for people with similar scores.
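The three steps of the recipe can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated, the five log-derived features and their weights are made up, and ordinary least squares stands in for whatever predictive model one would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 log-derived features for 2,000 users, with the
# questionnaire score (e.g., NFC) observed only for the first 200.
n_users, n_surveyed, n_features = 2000, 200, 5
log_features = rng.normal(size=(n_users, n_features))
true_weights = np.array([0.8, -0.5, 0.0, 0.3, 0.0])  # assumed, simulation only
trait = log_features @ true_weights + rng.normal(size=n_users)


def add_intercept(X):
    """Prepend a column of ones to a feature matrix."""
    return np.column_stack([np.ones(len(X)), X])


# Step 1: obtain the questionnaire measure for a sample.
X_surveyed = add_intercept(log_features[:n_surveyed])
y_surveyed = trait[:n_surveyed]

# Step 2: predict that measure from the log data (here: least squares).
coef, *_ = np.linalg.lstsq(X_surveyed, y_surveyed, rcond=None)

# Step 3: construct the measure for out-of-sample individuals.
X_rest = add_intercept(log_features[n_surveyed:])
trait_proxy = X_rest @ coef

# Only possible in a simulation: compare the proxy to the unobserved scores.
proxy_corr = np.corrcoef(trait_proxy, trait[n_surveyed:])[0, 1]
print(f"correlation between proxy and true score: {proxy_corr:.2f}")
```

In a real service the last check is unavailable, since the questionnaire scores of the remaining users are exactly what is missing; held-out surveyed users would have to play that role.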

But why involve the trait at all? Why not just personalize the interactive experience based on the responses of similar others? Since the new measure of the trait is just based on the available behavioral, demographic, and other logged data, one could simply predict responses based on those measures. Put in geometric terms, if the goal is to project the effects of different messages onto available log data, why should one project the questionnaire measure of the trait onto the available log data and then project the effects onto this projection? This seems especially unappealing if one doesn’t fully trust the questionnaire measure to be accurate or one can’t be sure which traits make a (substantial) difference.
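The geometric point can be checked numerically. The sketch below uses simulated data with made-up coefficients and assumes both projections are ordinary least-squares fits. Because the fitted trait is itself a linear combination of the log features, projecting the response onto it can never fit the observed responses better than projecting onto the features directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 6

# Hypothetical log features (with an intercept column), trait, and response.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
trait = X @ rng.normal(size=p + 1) + rng.normal(size=n)
effect = 0.5 * trait + X[:, 1] + rng.normal(size=n)


def fitted(design, y):
    """Least-squares projection of y onto the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


# Two-stage route: project the trait onto X, then the response onto that.
trait_hat = fitted(X, trait)
two_stage = fitted(np.column_stack([np.ones(n), trait_hat]), effect)

# Direct route: project the response onto X.
direct = fitted(X, effect)

rss_two_stage = np.sum((effect - two_stage) ** 2)
rss_direct = np.sum((effect - direct) ** 2)
print(f"RSS via trait proxy: {rss_two_stage:.1f}, RSS direct: {rss_direct:.1f}")
```

Note that this comparison is in-sample only: it says nothing about which route predicts better for new users.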

I find this argument quite intuitively appealing, and it seems to resonate with others.1 But I think there are some reasons the recipe above could still be appealing.

One way to think about this recipe is as dimensionality reduction guided by theory about psychological traits. Available log data can often be used to construct countless predictors (or “features”, as the machine learning people call them). So one can very quickly get into a situation where the effective number of parameters for a full model predicting the effects of different messages is very large and will make for poor predictions. Nothing — no, not penalized regression, not even a support vector machine — makes this problem go away. Instead, one has to rely on the domain knowledge of the person constructing the predictors (i.e., doing the “feature engineering”) to pick some good ones.

So the tentative rejoinder is this: established psychological traits might often make good dimensions with which to predict the effects of different versions of a message, intervention, or experience. And they may “come with” suggestions about what kinds of log data might serve as measures of them. They would also be expected to be reusable across settings. Thus, I think this recipe nonetheless deserves serious attention.
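To make the dimensionality-reduction reading concrete, here is a simulation sketch. It is built on a strong assumption that favors the recipe: the message's effect runs entirely through the trait. All sizes and weights are made up. With 60 features and only 80 users who have seen the message, a full least-squares model is badly over-parameterized, while a single trait proxy learned from 300 surveyed users needs only two further parameters.

```python
import numpy as np

rng = np.random.default_rng(5)
n_features = 60
weights = rng.normal(scale=0.3, size=n_features)  # assumed trait-feature link


def draw(n):
    """Simulate log features, a trait score, and the response to a message."""
    X = rng.normal(size=(n, n_features))
    trait = X @ weights + rng.normal(size=n)
    response = trait + rng.normal(size=n)  # assumption: effect via trait only
    return X, trait, response


def design(*columns):
    """Stack an intercept column with the given columns or matrices."""
    return np.column_stack([np.ones(len(columns[0])), *columns])


def ols(A, y):
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


X_survey, trait_survey, _ = draw(300)  # users who answered the questionnaire
X_train, _, y_train = draw(80)         # users who saw the message
X_test, _, y_test = draw(5000)         # new users to predict for

# Full model: predict the response from all 60 features directly.
full_coef = ols(design(X_train), y_train)
mse_full = np.mean((y_test - design(X_test) @ full_coef) ** 2)

# Trait-guided model: reduce the features to one proxy score first.
trait_coef = ols(design(X_survey), trait_survey)
proxy_train = design(X_train) @ trait_coef
proxy_test = design(X_test) @ trait_coef
proxy_coef = ols(design(proxy_train), y_train)
mse_proxy = np.mean((y_test - design(proxy_test) @ proxy_coef) ** 2)

print(f"test MSE, full model: {mse_full:.2f}; trait proxy: {mse_proxy:.2f}")
```

If the effect instead depended on features unrelated to the trait, the comparison could easily go the other way; the sketch only shows when the theory-guided reduction pays off.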

  1. I owe some clarity on this to some conversations with Mike Nowak and Maurits Kaptein.

Applying social psychology

Some reflections on how “quantitative” social psychology is and how this matters for its application to design and decision-making — especially in industries touched by the Internet.

In many ways, contemporary social psychology is dogmatically quantitative. Investigators run experiments, measure quantitative outcomes (even coding free responses to make them amenable to analysis), and use statistics to characterize the collected data. On the other hand, social psychology’s processes of stating and integrating its conclusions remain largely qualitative. Many hypotheses in social psychology state that some factor affects a process or outcome in one direction (i.e., “call” either beta > 0 or beta < 0). Reviews of research in social psychology often start with a simple effect and then note how many other variables moderate this effect.

This is all quite fitting with the dominance of null-hypothesis significance testing (NHST) in much of psychology: rather than producing point estimates or confidence intervals for causal effects, it is enough to simply see how likely the observed data is given that there is no effect.1 Of course, there have been many efforts to change this. Many journals require reporting effect sizes. This is a good thing, but these effect sizes are rarely predicted by social psychological theory. Rather, they are reported to aid judgments of whether a finding is not only statistically significant but substantively or practically significant, while the theory predicts only the direction of the effect.

Not only is this process of reporting and combining results not quantitative in many ways, but it requires substantial inference from the particular settings of conducted experiments to the present settings. This actually helps to make sense of the practices described above: many social psychology experiments are conducted in conditions and with populations that are so different from those in which people would like to apply the resulting theories that expecting consistency of effect sizes is implausible.2 This is not to say that these studies cannot tell us a good deal about how people will behave in many circumstances. It’s just that figuring out what they predict and whether these predictions are reliable is a very messy, qualitative process.

Thus, when it comes to making decisions (about a policy, intervention, or service) based on social-psychological research, this process is largely qualitative. Decision-makers can ask, which effects are in play? What is their direction? With interventions and measurement that are very likely different from the present case, how large were the effects?3
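The contrast between testing and estimation is easy to see on one simulated two-group experiment. The sketch below assumes a made-up true effect of 0.2 standard deviations and uses a large-sample normal approximation rather than a t-test. The same data yield both a p-value, which only supports a "call" about direction, and a point estimate with a confidence interval, which says how large the effect plausibly is.

```python
import random
from statistics import NormalDist, mean, variance

random.seed(2)

# Hypothetical experiment: 400 people per condition, true effect of 0.2 SD.
control = [random.gauss(0.0, 1.0) for _ in range(400)]
treated = [random.gauss(0.2, 1.0) for _ in range(400)]

# Point estimate of the effect and its standard error.
diff = mean(treated) - mean(control)
se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5

# NHST summary: how surprising is this difference if there is no effect?
z = diff / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Estimation summary: a 95% confidence interval for the effect size.
z_crit = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z_crit * se, diff + z_crit * se

print(f"p = {p_value:.4f}")
print(f"estimate = {diff:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```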

Sometimes this is the best that social science can provide. And such answers can be quite useful in design. The results of psychology experiments can often be very effective when used generatively. For example, designers can use taxonomies of persuasive strategies to dream up some ways of producing desired behavior change.

Nonetheless, I think all this can be contrasted with some alternative practices that are both more quantitative and require less of this uneasy generalization. First, social scientists can give much more attention to point estimates of parameters. While not without its (other) flaws, the economics literature on financial returns to education has aimed to provide, criticize, and refine estimates of just how much wages increase (on average) with more education.4

Second, researchers can avoid much of the messiest kinds of generalization altogether. Within the Internet industry, product optimization experiments are ubiquitous. Google, Yahoo, Facebook, Microsoft, and many others are running hundreds to thousands of simultaneous experiments with parts of their services. This greatly simplifies generalization: the exact intervention under consideration has just been tried with a random sample from the very population it will be applied to. If someone wants to tweak the intervention, just try it again before launching. This process still involves human judgment about how to react to these results.5 An even more extreme alternative is when machine learning is used to fine-tune, e.g., recommendations without direct involvement (or understanding) by humans.

So am I saying that social psychology, at least as an enterprise that is useful to designers and decision-makers, is going to be replaced by simple “bake-off” experiments and machine learning? Not quite. Unlike product managers at Google, many decision-makers don’t have the ability to cheaply test a proposed intervention on their population of interest.6 Even at Google, many changes (or new products) under consideration are too difficult to build for all of them to be tried: one has to decide among an overabundance of options before the most directly applicable data could be available. This is consistent with my note above that social-psychological findings can make excellent inspiration during idea generation and early evaluation.

  1. To parrot Andrew Gelman, in social phenomena, everything affects everything else. There are no betas that are exactly zero.
  2. It’s also often implausible that the direction of the effect must be preserved.
  3. Major figures in social psychology, such as Lee Ross, have worked on trying to better anticipate the effects of social interventions from theory. It isn’t easy.
  4. The diversity of the manipulations used by social psychologists ostensibly studying the same thing can make this more difficult.
  5. Generalization is not avoided. In particular, decision-makers often have to consider what would happen if an intervention tested with 1% of the population is launched for the whole population. There are all kinds of issues relating to peer influence, network effects, congestion, etc., here that don’t allow for simple extrapolation from the treatment effects identified by the experiment. Nonetheless, these challenges obviously apply to most research that aims to predict the effects of causes.
  6. However, Internet services play a more and more central role in many parts of our life, so this doesn’t just have to be limited to the Internet industry itself.

Political arithmetic: The Joy of Stats

The Joy of Stats with Hans Rosling is quite engaging — and worth watching. I really enjoyed the historical threads running through the piece. I think he’s right to emphasize how data collection by states — to understand and control their populations — is at the origin of statistics. With increasing data collection today, this is a powerful and necessary reminder of the range of ends to which data analysis can be put.

Like others, I found the scenes with Rosling behind a bubble plot made difficult by the distracting lights and windows in the background. And the ending — with analyzing “what it means to be human” — was a bit much for me. But a small complaint about a compelling view.

Ideas behind their time: formal causal inference?

Alex Tabarrok at Marginal Revolution blogs about how some ideas seem notably behind their time:

We are all familiar with ideas said to be ahead of their time, Babbage’s analytical engine and da Vinci’s helicopter are classic examples. We are also familiar with ideas “of their time,” ideas that were “in the air” and thus were often simultaneously discovered such as the telephone, calculus, evolution, and color photography. What is less commented on is the third possibility, ideas that could have been discovered much earlier but which were not, ideas behind their time.

In comparing ideas behind and ahead of their times, it’s worth considering the processes that identify them as such.

In the case of ideas ahead of their time, we rely on records and other evidence of their genesis (e.g., accounts of the use of flamethrowers at sea by the Byzantines). Later users and re-discoverers of these ideas are then in a position to marvel at their early genesis. In trying to see whether some idea qualifies as ahead of its time, this early genesis, lack of use or underuse, followed by extensive use and development together serve as evidence for “ahead of its time” status.

On the other hand, in identifying ideas behind their time, it seems that we need different sorts of evidence. Tabarrok uses the standard of whether their fruits could have been produced a long time earlier (“A lot of the papers in say experimental social psychology published today could have been written a thousand years ago so psychology is behind its time”). We need evidence that people in a previous time had all the intellectual resources to generate and see the use of the idea. Perhaps this makes identifying ideas behind their time harder or more contentious.

Y(X = x) and P(Y | do(x))

Perhaps formal causal inference — and some kind of corresponding new notation, such as Pearl’s do(x) operator or potential outcomes — is an idea behind its time.1 Judea Pearl’s account of the history of structural equation modeling seems to suggest just this: exactly what the early developers of path models (Wright, Haavelmo, Simon) needed was new notation that would have allowed them to distinguish what they were doing (making causal claims with their models) from what others were already doing (making statistical claims).2

In fact, in his recent talk at Stanford, Pearl suggested just this: that if, say, the equality operator = had been replaced with some kind of assignment operator (say, :=), formal causal inference might have developed much earlier. We might be a lot further along in social science and applied evaluation of interventions if this had happened.
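Programming languages already distinguish assignment from equality, so a small simulation can show what the notation buys. The toy structural model below is my own illustration, not an example from Pearl; every probability in it is an assumption. Each variable is assigned from its causes, and do(X = x) is implemented by overwriting the assignment for X while leaving the others alone. Because Z causes both X and Y, conditioning on X = 1 and setting X to 1 give different answers (analytically 0.70 versus 0.55 under these assumed parameters).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000


def simulate(do_x=None):
    """Draw from a toy structural model; each line is an assignment."""
    z = rng.binomial(1, 0.5, size=n)              # Z := U_z (common cause)
    if do_x is None:
        x = rng.binomial(1, 0.2 + 0.6 * z)        # X := f_x(Z, U_x)
    else:
        x = np.full(n, do_x)                      # do(X = x): replace f_x
    y = rng.binomial(1, 0.1 + 0.2 * x + 0.5 * z)  # Y := f_y(X, Z, U_y)
    return z, x, y


# Statistical claim: P(Y = 1 | X = 1), from passively observed data.
_, x_obs, y_obs = simulate()
observational = y_obs[x_obs == 1].mean()

# Causal claim: P(Y = 1 | do(X = 1)), from the intervened model.
_, _, y_do = simulate(do_x=1)
interventional = y_do.mean()

print(f"P(Y=1 | X=1)     is about {observational:.3f}")
print(f"P(Y=1 | do(X=1)) is about {interventional:.3f}")
```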

This example raises some questions about the criterion for ideas behind their time that “people in a previous time had all the intellectual resources to generate and see the use of the idea” (above). Pearl is a computer scientist by training and credits this background with his approach to causality as a problem of getting the formal language right, or of moving between multiple formal languages. So we may owe this recent development to comfort with creating and evaluating the qualities of formal languages for practical purposes, a comfort found among computer scientists. Of course, philosophers and logicians, for example, have also long been comfortable with generating new formalisms. I think of Frege here.

So I’m not sure whether formal causal inference is an idea behind its time (or, if so, how far behind). But I’m glad we have it now.

  1. There is a “lively” debate about the relative value of these formalisms. For many of the dense causal models applicable to the social sciences (everything is potentially a confounder), potential outcomes seem like a good fit. But they can become awkward as the causal models get complex, with many exclusion restrictions (i.e., missing edges).
  2. See chapter 5 of Pearl, J. (2009). Causality: Models, Reasoning and Inference. 2nd Ed. Cambridge University Press.

Economic imperialism and causal inference

And I, for one, welcome our new economist overlords…

Readers not in academic social science may take the title of this post as indicating I’m writing about the use of economic might to imperialist ends.1 Rather, economic imperialism is a practice of economists (and acolytes) in which they invade research territories that traditionally “belong” to other social scientific disciplines.2 See this comic for one way you can react to this.3

Economists bring their theoretical, statistical, and research-funding resources to bear on problems that might not be considered economics. For example, freakonomists like Levitt study sumo wrestlers and the effects of the legalization of abortion on crime. But, hey, if the Commerce Clause means that Congress can legislate everything, then, for the same reasons, economists can — no, must — study everything.

I am not an economist by training, but I have recently had reason to read quite a bit in econometrics. Overall, I’m impressed.4 Economists have recently taken causal inference — learning about cause and effect relationships, often from observational data — quite seriously. In the eyes of some, this has precipitated a “credibility revolution” in economics. Certainly, papers in economics and (especially) econometrics journals consider threats to the validity of causal inference at length.

On the other hand, causal inference in the rest of the social sciences is simultaneously over-inhibited and under-inhibited. As Judea Pearl observes in his book Causality, lack of clarity about statistical models (that social scientists often don’t understand) and causality has induced confusion about distinctions between statistical and causal issues (i.e., between estimation methods and identification).5

So, on the one hand, many psychologists stick to experiments. Randomized experiments are, generally, the gold standard for investigating cause–effect relationships, so this can and often does go well. However, social psychologists have recently been obsessed with using “mediation analysis” to investigate the mechanisms by which causes they can manipulate produce effects of interest. Investigators often manipulate some factors experimentally and then measure one or more variables they believe fully or partially mediate the effect of those factors on their outcome. Then, under the standard Baron & Kenny approach, psychologists fit a few regression models, including regressing the outcome on both the experimentally manipulated variables and the simply measured (mediating) variables. The assumptions required for this analysis to identify any effects of interest are rarely satisfied (e.g., that effects on individuals are homogeneous).6 So psychologists are often over-inhibited (experiments only please!) and under-inhibited (mediation analysis).
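A simulation sketch of one way such an analysis can go wrong. It illustrates a different failure from the homogeneity example just mentioned: an unmeasured common cause of the mediator and the outcome, which randomizing the treatment does nothing to remove. All coefficients are assumptions chosen for illustration. The regression on treatment alone recovers the total effect, but the Baron & Kenny style regression on treatment and mediator misstates both the mediator's effect and the direct effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
a, b, c_direct = 1.0, 0.5, 0.3  # assumed true path coefficients

x = rng.binomial(1, 0.5, size=n).astype(float)  # randomized treatment
u = rng.normal(size=n)                          # unmeasured cause of M and Y
m = a * x + u + rng.normal(size=n)              # mediator: measured only
y = c_direct * x + b * m + u + rng.normal(size=n)


def ols(outcome, *columns):
    """Least-squares slopes of outcome on the columns (intercept dropped)."""
    design = np.column_stack([np.ones(len(outcome)), *columns])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1:]


# Total effect: identified by randomization alone.
(total_hat,) = ols(y, x)

# Outcome regressed on treatment and the measured mediator.
direct_hat, b_hat = ols(y, x, m)

print(f"total effect:  true {c_direct + a * b:.2f}, estimated {total_hat:.2f}")
print(f"M -> Y path:   true {b:.2f}, estimated {b_hat:.2f}")
print(f"direct effect: true {c_direct:.2f}, estimated {direct_hat:.2f}")
```

Under these assumed parameters the large-sample estimates are about 1.0 for the mediator path and about -0.2 for the direct effect, so the estimated direct effect even has the wrong sign.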

Likewise, in more observational studies (in psychology, sociology, education, etc.), investigators are sometimes wary of making explicit causal claims. So instead of carefully stating the causal assumptions that would justify different causal conclusions, readers are left with phrases like “suggests” and “is consistent with” followed by causal claims. Authors then recommend that further research be conducted to better support these causal conclusions. With these kinds of recommendations awaiting, no wonder that economists find the territory ready for taking: they can just show up with econometrics tools and get to work on hard-won questions that “rightly belong to others”.

  1. Well, if economists have better funding sources, this might apply in some sense.
  2. For arguments in favor of economic imperialism, see Lazear, E.P. (1999). Economic imperialism. NBER Working Paper No. 7300.
  3. Or see this comic for imperialism by physicists.
  4. At least by the contemporary literature on the topics I’ve been reading about: IVs, encouragement designs, endogenous interactions, matching estimators. But it is true that in some of these areas econometrics has been able to fruitfully borrow from work on potential outcomes in statistics and epidemiology.
  5. Econometricians have made similar observations.
  6. For a bit on this topic, see the discussion and links to papers here.