“Discovering Supertaskers”: Challenges in identifying individual differences from behavior
Some new research from the University of Utah suggests that a small fraction of the population consists of “supertaskers” whose performance is not reduced by multitasking, such as when completing tasks on a mobile phone while driving.
“Supertaskers did a phenomenal job of performing several different tasks at once,” Watson says. “We’d all like to think we could do the same, but the odds are overwhelmingly against it.” (Wired News & Science News)
The researchers, Watson and Strayer, argue that they have good evidence for the existence of this individual variation. One can find many media reports of this “discovery” of “supertaskers” (e.g., Psychology Today). I do not think this conclusion is well justified.
First, let’s consider the methods used in this research. 100 college students each completed driving tasks and an auditory task on a mobile phone, separately and in combination, over a single 1.5-hour session. The auditory task is designed to measure differences in executive attention by requiring participants to hold past items in memory while completing math tasks. The researchers identified “supertaskers” as those participants who met the following “stringent” requirements: they were both (a) in the top 25% of participants in single-task performance and (b) not worse in their dual-task performance, by more than the standard error, on at least three of the four measures. Since two of the four measures are associated with each of the two tasks (driving: brake reaction time, following distance; mobile phone task: memory performance, math performance), this requires that “supertaskers” do as well on both measures of either the driving or the mobile phone task and on at least one measure of the other task.
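To make the criteria concrete, here is a minimal sketch of this classification rule in Python, using simulated data. The data, the scoring direction, and my reading of “the standard error” (between-participant standard error of each single-task measure) are all my own assumptions, not the authors’ code:

```python
import random
import statistics

random.seed(0)
N = 100       # participants
MEASURES = 4  # brake RT, following distance, memory, math (coded so higher = better)

# Hypothetical data: single[i][j] and dual[i][j] are participant i's scores
# on measure j in the single-task and dual-task conditions.
single = [[random.gauss(0, 1) for _ in range(MEASURES)] for _ in range(N)]
dual = [[s - abs(random.gauss(0.5, 0.5)) for s in row] for row in single]

# One plausible reading of "the standard error": the between-participant
# standard error of each single-task measure.
se = [statistics.stdev([row[j] for row in single]) / N ** 0.5
      for j in range(MEASURES)]

# Cutoff for criterion (a): top 25% of mean single-task performance.
single_means = sorted(statistics.mean(row) for row in single)
cutoff = single_means[int(0.75 * N)]

def is_supertasker(i):
    """Criteria (a) and (b) as described above (my reconstruction)."""
    top_quartile = statistics.mean(single[i]) >= cutoff
    no_cost = sum(abs(dual[i][j] - single[i][j]) <= se[j]
                  for j in range(MEASURES))
    return top_quartile and no_cost >= 3

supertaskers = [i for i in range(N) if is_supertasker(i)]
```

In the actual paper the direction of “better” differs across measures (e.g., lower brake reaction time is better), so a real implementation would need to code each measure consistently before applying the rule.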
There may be many issues with the validity of the inference in this work. I want to focus on one in particular: the inference from the observation of differences between participants’ performance in a single 1.5 hour session to the conclusion that there are stable, “trait” differences among participants, such that some are “supertaskers”. This conclusion is simply not justified. To illustrate this, let’s consider how the methods of this study differ from those usually (and reasonably) used by psychologists to reach such conclusions.
Psychologists often study individual differences using the following approach. First, identify some plausible trait of individuals. Second, construct a questionnaire or other (perhaps behavioral) test that measures that trait. Third, demonstrate that this test has high reliability: that is, that the differences between people are much larger than the differences between the same person taking the test at different times. Fourth, use this test to measure the trait and see if it predicts differences in some experiment. A key point here is that in order to conclude that the test measures a stable individual difference (i.e., a trait), researchers need to establish high test-retest reliability; otherwise, the test might just be measuring differences in temporary mood.
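As a toy illustration of why this matters, here is a small simulation (not from the study; the variance numbers are arbitrary) in which each person’s score is a stable trait plus session-specific noise. Test-retest reliability is just the correlation between the two sessions:

```python
import random
import statistics

random.seed(1)
N = 100

# Each person's score = stable trait + independent session-to-session noise.
trait = [random.gauss(0, 1) for _ in range(N)]
session1 = [t + random.gauss(0, 0.4) for t in trait]
session2 = [t + random.gauss(0, 0.4) for t in trait]

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Test-retest reliability. With trait variance 1 and noise variance 0.16,
# the expected correlation is about 1 / (1 + 0.16), i.e. roughly 0.86.
# If scores were mostly transient state, r would be near zero.
r = pearson_r(session1, session2)
```

If the noise standard deviation were raised to, say, 2, the same people would look very different across sessions even though their traits never changed; that is exactly the ambiguity a single-session study cannot resolve.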
Returning to Watson and Strayer’s research, it is easy to see the problem: we have no idea whether the variation observed should be attributed to stable individual differences (i.e., being a “supertasker”) or to unstable differences. That is, if we brought those same “supertasker” participants back into the lab and they did another session, would they still exhibit the same lack of performance difference between the single- and dual-task conditions? This research gives us no reason to expect that they would.
Watson and Strayer do some additional analysis with the aim of ruling out the possibility that their observations are a fluke. One might think this addresses my criticism, but it does not. They
performed a Monte Carlo simulation in which randomly selected single-dual task pairs of variables from the existing data set were obtained for each of the 4 dependent measures and then subjected to the same algorithm that was used to classify the supertaskers.
That is, they broke apart the single-task and dual-task data for each participant and created new simulated participants by randomly sampling pairs of single- and dual-task data. They found that this analysis yields only 1/15th as many “supertaskers” as observed. This is a good analysis to do. However, it just demonstrates that being labeled a “supertasker” is likely caused by the single- and dual-task data being generated by the same person in the same session. This still leaves it quite open (and more plausible to me) that participants were in varying states for the session, and that this explains their (temporary) “supertasking”. It also leaves open the possibility that the greater frequency of “supertaskers” is due to participants who do well in whatever task they are given first being more likely to do well in subsequent tasks.
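To see what the permutation shows, and what it does not, here is a sketch of the whole pipeline on simulated data in which five participants are built to be genuine “supertaskers”. The data-generating choices and the shuffling step are my reconstruction of the described Monte Carlo, not the authors’ code:

```python
import random
import statistics

random.seed(2)
N, MEASURES, SIMS = 100, 4, 500

# Simulated data: participants 0-4 are built-in "supertaskers" (high
# single-task scores, negligible dual-task cost); everyone else pays a cost.
single = [[random.gauss(1.0 if i < 5 else 0.0, 1) for _ in range(MEASURES)]
          for i in range(N)]
dual = [[s + random.gauss(0, 0.05) for s in row] if i < 5
        else [s - abs(random.gauss(0.5, 0.5)) for s in row]
        for i, row in enumerate(single)]

# Same classification rule as before: top quartile on single-task means,
# and dual-task within one standard error on at least 3 of 4 measures.
se = [statistics.stdev([row[j] for row in single]) / N ** 0.5
      for j in range(MEASURES)]
cutoff = sorted(statistics.mean(row) for row in single)[int(0.75 * N)]

def count_supertaskers(dual_data):
    count = 0
    for i in range(N):
        no_cost = sum(abs(dual_data[i][j] - single[i][j]) <= se[j]
                      for j in range(MEASURES))
        if statistics.mean(single[i]) >= cutoff and no_cost >= 3:
            count += 1
    return count

observed = count_supertaskers(dual)

# Monte Carlo null: break the within-person pairing by shuffling each
# dual-task measure independently across participants, then reclassify.
null_counts = []
for _ in range(SIMS):
    cols = [random.sample([row[j] for row in dual], N) for j in range(MEASURES)]
    shuffled = [[cols[j][i] for j in range(MEASURES)] for i in range(N)]
    null_counts.append(count_supertaskers(shuffled))
mean_null = statistics.mean(null_counts)
```

In this simulation `observed` should come out well above `mean_null`, just as in the paper. But note that the permutation would give the same result if the five special participants were merely having a good session: nothing in it distinguishes a person who is always immune to dual-task costs from one who happened to be that day.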
My aim in this post is to suggest some challenges that this kind of approach has to face. Part of my interest in this is that I’m quite sympathetic to identifying stable, observed differences in behavior and then “working backwards” to characterizing the traits that explain these downstream differences. This is exactly the approach that Maurits Kaptein and I are taking in our work on persuasion profiling: we observe how individuals respond to the use of different influence strategies and use this to (a) construct a “persuasion profile” for that individual and (b) characterize how much variation in the effects of these strategies there is in the population.
However, a critical step in this process is ruling out the alternative explanation that the observed differences are primarily due to differences in, e.g., mood, rather than stable individual differences. One way to do this is to observe the behavior in multiple sessions and multiple contexts. Another is to observe a complex pattern of behavioral differences that previous work suggests could not result from temporary, unstable differences, or at least is more easily explained by previous theories about the relevant traits. That is, I’m enthusiastic about identifying stable, observed differences in behavior, but I don’t want to see researchers abandon the careful methods that have been used in the past to make the case for a new individual difference.
Watson, Strayer, and colleagues have apparently begun doing work that could be used to show the stability of the observed differences. The discussion section of their paper refers to additional unpublished research in which they invited the “supertaskers” from this study and another study back into the lab and had them complete similar tasks measuring executive attention (but not driving) while in an fMRI machine. They report greater “coherence” between performance in this second study and the previous one for “supertaskers” than for control participants, and better “supertasker” performance on dual N-back tasks. But this falls short of showing high test-retest reliability.
Since little is said about this work, I hesitate to conclude anything from it or criticize it. I’ve contacted the authors with the hope of learning more. My current sense is that Watson and Strayer’s entire case for “supertaskers” hinges on research of this kind.
References
Watson, J. M., & Strayer, D. L. (2010). Supertaskers: Profiles in Extraordinary Multi-tasking Ability. Psychonomic Bulletin & Review. Forthcoming. Retrieved from http://www.psych.utah.edu/lab/appliedcognition/publications/supertaskers.pdf