Some reflections on how “quantitative” social psychology is and how this matters for its application to design and decision-making — especially in industries touched by the Internet.
In many ways, contemporary social psychology is dogmatically quantitative. Investigators run experiments, measure quantitative outcomes (even coding free responses to make them amenable to analysis), and use statistics to characterize the collected data. On the other hand, social psychology’s processes of stating and integrating its conclusions remain largely qualitative. Many hypotheses in social psychology state that some factor affects a process or outcome in one direction (i.e., they “call” either beta > 0 or beta < 0). Reviews of research in social psychology often start with a simple effect and then note how many other variables moderate this effect. This all fits quite well with the dominance of null-hypothesis significance testing (NHST) in much of psychology: rather than producing point estimates or confidence intervals for causal effects, it is enough to see how likely the observed data would be if there were no effect.1 Of course, there have been many efforts to change this. Many journals require reporting effect sizes. This is a good thing, but these effect sizes are rarely predicted by social-psychological theory. Rather, they are reported to aid judgments of whether a finding is not only statistically significant but also substantively or practically significant, while the theory predicts only the direction of the effect. Not only is this process of reporting and combining results not quantitative in many ways, but it also requires substantial inference from the particular settings of the conducted experiments to the present settings. This actually helps make sense of the practices described above: many social psychology experiments are conducted in conditions and with populations so different from those to which people would like to apply the resulting theories that expecting consistency of effect sizes is implausible.2 This is not to say that these studies cannot tell us a good deal about how people will behave in many circumstances.
It's just that figuring out what they predict and whether these predictions are reliable is a very messy, qualitative process. Thus, when it comes to making decisions (about a policy, intervention, or service) based on social-psychological research, this process is largely qualitative. Decision-makers can ask: Which effects are in play? What is their direction? How large were the effects, given interventions and measurements that are very likely different from the present case?3
Sometimes this is the best that social science can provide. And such answers can be quite useful in design. The results of psychology experiments can often be very effective when used generatively. For example, designers can use taxonomies of persuasive strategies to dream up some ways of producing desired behavior change.
Nonetheless, I think all this can be contrasted with some alternative practices that are both more quantitative and require less of this uneasy generalization. First, social scientists can give much more attention to point estimates of parameters. While not without its (other) flaws, the economics literature on financial returns to education has aimed to provide, criticize, and refine estimates of just how much wages increase (on average) with more education.4
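To make the contrast with directional “calls” concrete, here is a minimal Python sketch (standard library only) that reports a point estimate and an interval for a two-group comparison alongside the usual p-value. The function name `summarize_difference` and the example numbers are hypothetical, chosen only for illustration, and the interval uses a large-sample normal approximation rather than any particular method from the literatures discussed here.

```python
import math
from statistics import mean, variance


def summarize_difference(treated, control, z=1.96):
    """Summarize a two-group comparison by its magnitude, not just its sign.

    Returns (estimate, (ci_low, ci_high), p_value), using a normal
    approximation that is only reasonable for fairly large samples.
    """
    estimate = mean(treated) - mean(control)
    # Unpooled (Welch-style) standard error of the difference in means.
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    ci = (estimate - z * se, estimate + z * se)
    # Two-sided p-value against a null of no difference: P(|Z| >= |estimate / se|).
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2))
    return estimate, ci, p_value


# Two hypothetical studies (made-up numbers) that make the same directional
# "call" (beta > 0, p < .05) but imply very different magnitudes.
small = summarize_difference([10.2, 10.4, 10.3, 10.5] * 25, [10.0, 10.2, 10.1, 10.3] * 25)
large = summarize_difference([13, 15, 11, 14] * 5, [10, 12, 9, 11] * 5)
for name, (estimate, (lo, hi), p) in [("small", small), ("large", large)]:
    print(f"{name}: estimate={estimate:.2f}, 95% CI=({lo:.2f}, {hi:.2f}), p={p:.2g}")
```

Both hypothetical studies would be reported as “significant and positive,” yet one estimate is more than ten times the other; only the point estimates and intervals carry that information.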
Second, researchers can avoid many of the messiest kinds of generalization altogether. Within the Internet industry, product optimization experiments are ubiquitous. Google, Yahoo, Facebook, Microsoft, and many others are running hundreds to thousands of simultaneous experiments with parts of their services. This greatly simplifies generalization: the exact intervention under consideration has just been tried with a random sample from the very population it will be applied to. If someone wants to tweak the intervention, they can just try it again before launching. This process still involves human judgment about how to react to these results.5 An even more extreme alternative is using machine learning to fine-tune, for example, recommendations without direct involvement (or understanding) by humans.
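Randomly assigning a small share of users (say, 1%) to a variant is often implemented by hashing user IDs. The Python sketch below assumes that approach; it is not a description of how Google, Facebook, or any other company actually assigns users, and the function name `in_treatment`, the bucket count, and the experiment name are all hypothetical.

```python
import hashlib


def in_treatment(user_id: str, experiment: str, percent: float) -> bool:
    """Return True if this user falls in the treatment group of an experiment.

    Hashing the experiment name together with the user ID gives a stable
    assignment (the same user always gets the same answer) that is
    approximately independent across experiments, so many can run at once.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    # Map the first 8 bytes of a SHA-256 digest to one of 10,000 buckets.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 10_000
    # percent=1.0 selects buckets 0..99, i.e. roughly 1% of users.
    return bucket < percent * 100


# Hypothetical user IDs and experiment name, for illustration only.
users = [f"user{i}" for i in range(100_000)]
share = sum(in_treatment(u, "new-ranking", 1.0) for u in users) / len(users)
print(f"share in treatment: {share:.4f}")  # close to 0.01, not exactly
```

Because assignment depends only on the experiment name and user ID, no assignment table needs to be stored, and re-running a tweaked variant under a new experiment name draws a fresh random sample.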
So am I saying that social psychology, at least as an enterprise that is useful to designers and decision-makers, is going to be replaced by simple “bake-off” experiments and machine learning? Not quite. Unlike product managers at Google, many decision-makers don’t have the ability to cheaply test a proposed intervention on their population of interest.6 Even at Google, many changes (or new products) under consideration are too difficult to build for all of them to be tested: one has to decide among an overabundance of options before the most directly applicable data could be available. This is consistent with my note above that social-psychological findings can make excellent inspiration during idea generation and early evaluation.
- To parrot Andrew Gelman, in social phenomena, everything affects everything else. There are no betas that are exactly zero. [↩]
- It's also often implausible that the direction of the effect must be preserved. [↩]
- Major figures in social psychology, such as Lee Ross, have worked on trying to better anticipate the effects of social interventions from theory. It isn’t easy. [↩]
- The diversity of the manipulations used by social psychologists ostensibly studying the same thing can make this more difficult. [↩]
- Generalization is not avoided. In particular, decision-makers often have to consider what would happen if an intervention tested with 1% of the population is launched for the whole population. There are all kinds of issues relating to peer influence, network effects, congestion, etc., here that don’t allow for simple extrapolation from the treatment effects identified by the experiment. Nonetheless, these challenges obviously apply to most research that aims to predict the effects of causes. [↩]
- However, Internet services play a more and more central role in many parts of our life, so this doesn’t just have to be limited to the Internet industry itself. [↩]
Alex Tabarrok at Marginal Revolution blogs about how some ideas seem notably behind their time:
We are all familiar with ideas said to be ahead of their time, Babbage’s analytical engine and da Vinci’s helicopter are classic examples. We are also familiar with ideas “of their time,” ideas that were “in the air” and thus were often simultaneously discovered such as the telephone, calculus, evolution, and color photography. What is less commented on is the third possibility, ideas that could have been discovered much earlier but which were not, ideas behind their time.
In comparing ideas behind and ahead of their times, it’s worth considering the processes that identify them as such.
In the case of ideas ahead of their time, we rely on records and other evidence of their genesis (e.g., accounts of the use of flamethrowers at sea by the Byzantines). Later users and re-discoverers of these ideas are then in a position to marvel at their early genesis. In trying to see whether some idea qualifies as ahead of its time, this early genesis, lack of use or underuse, followed by extensive use and development together serve as evidence for “ahead of its time” status.
On the other hand, in identifying ideas behind their time, it seems that we need different sorts of evidence. Tabarrok uses the standard of whether their fruits could have been produced a long time earlier (“A lot of the papers in say experimental social psychology published today could have been written a thousand years ago so psychology is behind its time”). We need evidence that people in a previous time had all the intellectual resources to generate and see the use of the idea. Perhaps this makes identifying ideas behind their time harder or more contentious.
Y(X = x) and P(Y | do(x))
Perhaps formal causal inference — and some kind of corresponding new notation, such as Pearl’s do(x) operator or potential outcomes — is an idea behind its time.1 Judea Pearl’s account of the history of structural equation modeling seems to suggest just this: exactly what the early developers of path models (Wright, Haavelmo, Simon) needed was new notation that would have allowed them to distinguish what they were doing (making causal claims with their models) from what others were already doing (making statistical claims).2
In fact, in his recent talk at Stanford, Pearl suggested just this: that if the equality operator = in structural equations had been replaced with some kind of assignment operator (say, :=), formal causal inference might have developed much earlier. We might be a lot further along in social science and applied evaluation of interventions if this had happened.
This example raises some questions about the criterion for ideas behind their time that “people in a previous time had all the intellectual resources to generate and see the use of the idea” (above). Pearl is a computer scientist by training and credits this background with his approach to causality as a problem of getting the formal language right, or of moving between multiple formal languages. So we may owe this recent development to comfort with creating and evaluating the qualities of formal languages for practical purposes, a comfort found among computer scientists. Of course, philosophers and logicians have also long been comfortable generating new formalisms; I think of Frege here.
So I’m not sure whether formal causal inference is an idea behind its time (or, if so, how far behind). But I’m glad we have it now.
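Pearl’s point about = versus := can be illustrated with a small simulation. The Python sketch below writes each structural equation as an assignment and contrasts conditioning on X with intervening on it, as in do(x). The model (a binary confounder U, the probabilities 0.8 and 0.2, and the coefficients 1 and 2) is an arbitrary toy assumption for illustration, not an example taken from Pearl.

```python
import random


def simulate(n, do_x=None, seed=0):
    """Draw n (x, y) pairs from a toy structural causal model.

    Each step is an assignment (:=), not a symmetric equation. Passing do_x
    replaces X's own assignment with a constant, which is what do(X = x) means.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random() < 0.5                          # U := unobserved confounder
        if do_x is None:
            x = int(rng.random() < (0.8 if u else 0.2))  # X := f(U, noise)
        else:
            x = do_x                                    # do(X = x): ignore U entirely
        y = 1.0 * x + 2.0 * u + rng.gauss(0, 1)         # Y := X + 2U + noise
        rows.append((x, y))
    return rows


def mean_y(rows, x=None):
    """Average Y, optionally only over rows where X == x (conditioning)."""
    ys = [y for xi, y in rows if x is None or xi == x]
    return sum(ys) / len(ys)


n = 100_000
observed = simulate(n)
# Conditioning: compare units that happened to have X = 1 vs. X = 0.
naive = mean_y(observed, 1) - mean_y(observed, 0)
# Intervening: set X for everyone, then compare whole populations.
causal = mean_y(simulate(n, do_x=1)) - mean_y(simulate(n, do_x=0))
print(f"E[Y | X=1] - E[Y | X=0]         = {naive:.2f}")
print(f"E[Y | do(X=1)] - E[Y | do(X=0)] = {causal:.2f}")
```

In this toy model the conditional contrast comes out near 2.2, because units with X = 1 are disproportionately those with U = 1 (P(U = 1 | X = 1) = 0.8 versus 0.2 for X = 0, so the gap is 1 + 2 × 0.6), while the interventional contrast recovers the coefficient of 1 on X. Reading the model’s lines as symmetric equations gives no way to express that difference.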
- There is a “lively” debate about the relative value of these formalisms. For many of the dense causal models applicable to the social sciences (everything is potentially a confounder), potential outcomes seem like a good fit. But they can become awkward as the causal models get complex, with many exclusion restrictions (i.e. missing edges). [↩]
- See chapter 5 of Pearl, J. (2009). Causality: Models, Reasoning and Inference. 2nd Ed. Cambridge University Press. [↩]
Much of current human-computer interaction (HCI) research focuses on novice users in “walk-up and use” scenarios. I can think of three major causes for this:
- A general shift from examining non-discretionary use to discretionary use
- How much easier it is to find (and not train) study participants unfamiliar with a system than experts (especially with a system that is only a prototype)
- The push from practitioners in this direction, especially since the advent of the Web, where new users just show up at your site, often deep-linked
This focus sometimes comes in for criticism, especially when the second cause is taken as a main reason for the choice.
On the other hand, some research threads in HCI continue to focus on expert use. As I’ve been reading a lot of research on both human performance modeling and situated & embodied approaches to HCI, it has been interesting to note that both have a comparatively much bigger focus on the performance and experience of expert and skilled use.
Grudin’s “Three Faces of Human-Computer Interaction” does a good job of explaining the human performance modeling (HPM) side of this. HPM owes a lot to human factors historically, and while The Psychology of Human-Computer Interaction successfully brought engineering-oriented cognitive psychology to the field, it was human factors, said Stuart Card, “that we were trying to improve” (Grudin 2005, p. 7). And the focus of human factors, which arose from maximizing productivity in industrial settings like factories, has been non-discretionary use. Fundamentally, it is hard for HPM to exist without a focus on expert use because many of the differences — and thus research contributions through new interaction techniques — can only be identified and are only important for use by experts or at least trained users. Grudin notes:
A leading modeler discouraged publication of a 1984 study of a repetitive task that showed people preferred a pleasant but slower interaction technique—a result significant for discretionary use, but not for modeling aimed at maximizing performance.
Situated action and embodied interaction approaches to HCI, which Harrison, Tatar, and Sengers (2007) have called the “third paradigm of HCI”, are a somewhat different story. While HPM research, like a good amount of traditional cognitive science generally, contributes to science and design by assimilating people to information processors with actuators, situated and embodied interaction research borrows a fundamental concern of ethnomethodology, focusing on how people actively make behaviors intelligible by assimilating them to social and rational action.
There are at least three ways this motivates the study of skilled and expert users:
- Along with this research topic comes a methodological concern for studying behavior in context with the people who really do it. For example, to study publishing systems and technology, the existing practices of people working in such a setting of interest are of critical importance.
- These approaches emphasize the skills we all have and the value of drawing on them for design. For example, Dourish (2001) emphasizes the skills with which we all navigate the physical and social world as a resource for design. This is not unrelated to the first way.
- These approaches, like the participatory design movement and partly through their relationships to it, have a political, social, and ethical interest in empowering those who will be impacted by technology, especially when its design (and the decision to adopt it) would otherwise be out of their control. Non-discretionary use in institutions is the paradigmatic situation prompting this concern.
I don’t have a broad conclusion to make. Rather, I just find it of note and interesting that these two very different threads in HCI research stand out from much other work as similar in this regard. Some of my current research is connecting these two threads, so expect more on their relationship.
Dourish, P. (2001). Where the Action Is: The Foundations of Embodied Interaction. MIT Press.
Grudin, J. (2005). Three Faces of Human-Computer Interaction. IEEE Ann. Hist. Comput. 27, 4 (Oct. 2005), 46-62.
Harrison, S., Tatar, D., and Sengers, P. (2007). The Three Paradigms of HCI. Extended Abstracts CHI 2007.
Two personal-professional narratives that I’ve been somewhat familiar with for a while have recently highlighted for me the significance of riskful decisions and thinking in academia. I think the stories are interesting on their own, but they also emphasize some questions and concerns for the functioning of scholarly inquiry.
The first is about the American philosopher Donald Davidson, whose work has long been of great interest to me (and was the topic of my undergraduate Honors thesis). The second is about Cliff Nass (Clifford Nass), Professor of Communication at Stanford, an advisor and collaborator. The major published source I draw on for each of these narratives is an interview: for Davidson’s story, it is an interview by Ernest Lepore (2004), a critic and expositor of Davidson’s philosophy; for Cliff Nass, it is an interview by Tamara Adlin (2007). After sharing these stories, I’ll note some similarities and briefly discuss risk-taking in decisions and thinking.
Donald Davidson is considered one of the most important and influential philosophers of the past 60 years, and he is my personal favorite. Davidson is often described as a highly systematic philosopher — uncharacteristically so for 20th century philosophy, in that his contributions to several areas of philosophy (philosophy of language, mind, and action, semantics, and epistemology) are deeply connected in their method and the proposed theories. He is the paradigmatic programmatic philosopher of the 20th century.
Despite this, Davidson’s philosophical program did not emerge until relatively late in his career. The same is true of his publications in general. Only after accepting a tenure-track position at Stanford in 1951 (Stanford was then still up-and-coming in philosophy, though rising quickly) did he begin to publish; nothing was even in the “pipeline” before this. This began under the wing of the younger Patrick Suppes, with whom Davidson co-authored a book (1957) on decision theory. His first philosophical article appeared in 1963 (he was its sole author only because of an unexpected death). As Davidson puts it in an interview with Ernest Lepore, “I was very inhibited so far as publication was concerned” and was worried “that the minute I actually published something, everyone was going to jump on me” (Lepore 2004).
Then Davidson published “Actions, Reasons and Causes” (1963), twelve years after joining the Stanford faculty. It argues against the late-Wittgensteinian dogma that reasons are not also causes. Only with this paper did Davidson publish something that drew significant attention from the community, beginning with a presentation of the paper at a meeting of the American Philosophical Association. This paper has been hugely influential and alone identified Davidson as an important thinker in the field, though he was surprised that the reception was less overwhelming than he had expected: “I didn’t realize that if you publish, as far as I can tell, no one was going to pay any attention.” Many responses, both positive and critical, did eventually come, and Davidson went on to publish many highly influential papers, reaching the height of his immense scholarly influence in the 1970s and 1980s.
Clifford Nass is a widely known researcher in the psychology of human-computer interaction (HCI). With Byron Reeves, he wrote The Media Equation (1996), which presents research carried out at Stanford University on how people respond in mediated interactions (e.g., with computers and televisions) by overextending social rules normally applied to other people. This hints at the (here simplified) straight, bold line of Nass’s research program: take a finding from social psychology, replace the second human with a computer, and see if you get the same results. This strategy has since been modified and expanded, but the general consistency of Nass’s program over many years is striking for HCI: unlike in psychology, for example, in HCI there are many investigators seeking low-hanging fruit and quickly moving on to new projects.
Nass likes to refer to his “accidental PhD”, as he hadn’t intended to get a PhD in sociology. After working for a year at Intel, he was planning to matriculate in an electrical engineering PhD program, but an unexpected death postponed that. “[J]ust to bide my time and to have some flexibility, I ended up doing a sociology degree,” says Nass. He did his dissertation on the role of pre-processing jobs in labor, taking an approach that was radical in its elimination of a role for people and that connected with contemporary research by social science outsiders doing “sociocybernetics”. With such a dissertation topic (and the dissertation itself unfinished), finding a job did not seem easy at the outset: “It’s a nutty topic. I was going to be in trouble getting jobs. I had published stuff and was doing work and all that, but my dissertation was so weird” (Adlin 2007).
There was, however, a bit of luck, of which Nass took good advantage: the Stanford Communication Department was under construction and looking to hire some folks doing weird work. So when Nass interviewed, impressing both that department and the Sociology Department, he got the job, despite knowing nothing about Communication as a discipline and having been to no conferences in the field. After starting at Stanford, Nass went looking for a new research program, since something was clearly wrong with his previous work, at least when it came to getting it accepted for academic publication: “I was having a terrible time getting my work accepted. In fact, to this day I’ve still never published anything off my dissertation, 20-odd years later. Because again, no field could figure out who owned the material. I got reviews like, ‘This work is offensive.’”
But Nass couldn’t settle on any normal research program. He wanted to examine how people might treat computers socially. Getting funding for this work wouldn’t have been easy, but he got a grant that the grant administrator described as the one of the 35 awarded that they chose to give to the “weirdest project that was proposed”. It wasn’t all easy from there, of course. For example, it took some time to design and carry out successful experiments in this program, and even longer to get the results published. But the risk taken in awarding this grant helped the work continue.
Cliff Nass is very clear about the role riskful decisions, in admissions, hiring, and funding, played in his success:
I was very lucky. I fear that those times are gone. I really do fear to a tremendous degree that the risk-taking these people were willing to do for me, to give me an opportunity, are gone. I try to remember that. […]
I benefited from the willingness of people to say, “We’re just going to roll the dice here.”
Of course, it isn’t just Cliff who got lucky; in a big sense we all did. His work has been an important influence in HCI and has contributed to our stores of both generalizable knowledge and new lenses for approaching how we get on in the world.
What does it mean for academic research, and science generally, if the choice and ability to take these risks evaporate? There is incredible competition for academic positions now, more so in some fields than others. And the best tool for getting a job is a long list of publications accepted in important, mainstream journals in the field. There is a lot written about the competition for academic jobs and about the criteria for wading through applicants, criteria that can steer hiring toward a safe option. There are case studies of families of disciplines; for example, a study of the biosciences argues that market forces are failing to create sufficient job prospects for young investigators (Freeman et al. 2001).
I won’t review them all here. Instead, I suggest an article for general readers from The New York Times about state and regional colleges’ use of non-tenure-track positions, which has an impact on the institutions’ bottom line and flexibility (Finder 2007). This is part of a wider trend in how tenure is used that also impacts the academic freedom and resources that scholars have to pursue new research (Richardson 1999).
Enabling riskful thinking
Hans Ulrich Gumbrecht argues that “riskful thinking” is central to the value of the humanities and arts in academia. He defines riskful thinking as investigation that can’t be expected to produce results interpretable as easy answers, but that instead is likely to produce or highlight complex and confusing phenomena and problems. But I think that this is more broadly true. Riskful thinking is critical to interdisciplinary and pre-paradigmatic sciences, or disciplines long doing normal science but in need of a shake-up. These are situations where compelling phenomena can become paradigmatic cases for study and powerful vocabularies can allow formulating new problems and theories.
What threatens riskful thinking, and how can we enable it? What is so great about riskful thinking anyway, and what makes some riskful thinking so successful, while much of it is likely to fail? At Nokia Research Center in Palo Alto, our lab head John Shen champions the importance of risk taking in industry research, but also argues that risk-taking is often misunderstood and that it is only some kinds of risk-taking that are most important to cultivate in industry research.
Finally, a list of Davidson–Nass similarities, just for fun:
- Both were hired to tenure track positions at Stanford, where they first did and published highly influential work
- Both are easily and widely seen as highly programmatic, having defined a clear research program that challenged currently popular approaches and beliefs in their fields
- Both had great difficulty finding early, publishable success with their research programs, even after ceasing their early work (Davidson: Plato, empirical decision theory; Nass: information processing models of the labor force)
- Both had other draws and distractions (Davidson: business school, teaching plane identification in WWII; Nass: being a professional magician, working at Intel)
- Both produced dissertations viewed by others in the discipline as odd (Davidson: Quine “was a little mystified by my writing on this. He never talked to me about it.”; Nass: “my PhD thesis was so bizarre”)
Finder, A. (2007, November 20). Decline of the Tenure Track Raises Concerns. The New York Times.
Freeman, R., Weinstein, E., Marincola, E., Rosenbaum, J., & Solomon, F. (2001). Careers: Competition and Careers in Biosciences. Science, 294(5550), 2293-2294.
Lepore, E. (2004). Interview with Donald Davidson. In D. Davidson, Problems of Rationality (pp. 231-266). Oxford University Press.
Nass, C., Steuer, J., & Tauber, E. R. (1994). Computers are social actors. In Proc. of CHI 1994. ACM Press.
Reeves, B., & Nass, C. (1996). The media equation: how people treat computers, television, and new media like real people and places. Cambridge University Press.
Richardson, J. T. (1999). Tenure in the New Millennium. National Forum, 79(1), 19-23.
Sanford, J. (2000, November 17). ‘Elementary pleasures’ and ‘riskful thinking’ matter to Gumbrecht. Stanford Report.