<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ready-to-hand &#187; research methods</title>
	<atom:link href="http://www.deaneckles.com/blog/category/research-methods/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.deaneckles.com/blog</link>
	<description>Dean Eckles on people, technology &#38; inference</description>
	<lastBuildDate>Wed, 11 Jan 2012 01:51:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>A deluge of experiments</title>
		<link>http://www.deaneckles.com/blog/632_a-deluge-of-experiments/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-deluge-of-experiments</link>
		<comments>http://www.deaneckles.com/blog/632_a-deluge-of-experiments/#comments</comments>
		<pubDate>Thu, 24 Nov 2011 07:43:51 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[causal inference]]></category>
		<category><![CDATA[data collection]]></category>
		<category><![CDATA[econometrics]]></category>
		<category><![CDATA[experiments]]></category>
		<category><![CDATA[HCI]]></category>
		<category><![CDATA[research methods]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=632</guid>
		<description><![CDATA[The Atlantic reports on the data deluge and its value for innovation. I particularly liked how Erik Brynjolfsson and Andrew McAfee, who wrote the Atlantic piece, highlight the value of experimentation for addressing causal questions &#8212; and that many of the questions we care about are causal. In writing about experimentation, they report that Hal [...]]]></description>
			<content:encoded><![CDATA[<p><em>The Atlantic</em> <a href="http://www.theatlantic.com/business/archive/2011/11/the-big-data-boom-is-the-innovation-story-of-our-time/248215/">reports on the data deluge and its value for innovation</a>.<sup><a href="http://www.deaneckles.com/blog/632_a-deluge-of-experiments/#footnote_0_632" id="identifier_0_632" class="footnote-link footnote-identifier-link" title="I don&#8217;t know that I would call much of it &#8216;innovation&#8217;. There is some outright innovation, but a lot of that is in the general strategies for using the data. There is much more gained in minor tweaking and optimization of products and services.">1</a></sup> I particularly liked how Erik Brynjolfsson and Andrew McAfee, who wrote the <em>Atlantic</em> piece, highlight the value of experimentation for addressing causal questions &#8212; and that many of the questions we care about are causal.<sup><a href="http://www.deaneckles.com/blog/632_a-deluge-of-experiments/#footnote_1_632" id="identifier_1_632" class="footnote-link footnote-identifier-link" title="Perhaps they even overstate the power of simple experiments. For example, they do not mention that the results of these kinds of experiments often change over time, so that what you learned 2 months ago is no longer true.">2</a></sup></p>
<p>In writing about experimentation, they report that Hal Varian, Google&#8217;s Chief Economist, estimates that Google runs &#8220;100-200 experiments on any given day&#8221;. This struck me as incredibly low! I would have guessed something more like 10,000, or even 100,000.</p>
<p>The trick of course is how one individuates experiments. Say Google has an automatic procedure whereby each ad has a (small) random set of users who are prevented from seeing it and are shown the next best ad instead. Is this one giant experiment? Or one experiment for each ad?</p>
<p>This is a bit of a silly question.<sup><a href="http://www.deaneckles.com/blog/632_a-deluge-of-experiments/#footnote_2_632" id="identifier_2_632" class="footnote-link footnote-identifier-link" title="Note that two single-factor experiments over the same population with independent random assignment can be regarded as a single experiment with two factors.">3</a></sup> </p>
<p>But when most people &#8212; even statisticians and scientists &#8212; think of an experiment in this context, they think of something like Google or Amazon making a particular button bigger. (Maybe somebody thought making <em>that</em> button bigger would improve a particular metric.) They likely don&#8217;t think of automatically generating an experiment for every button, such that a random sample of users sees that particular button slightly bigger. It&#8217;s procedures of this latter kind that lead to thinking about tens of thousands of experiments.</p>
<p>That&#8217;s the real deluge of experiments.</p>
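<p>To make the counting question concrete, here is a minimal sketch of an automatic per-button experiment generator. It assumes a hypothetical hash-based assignment rule: the function names <code>in_treatment</code> and <code>experiments_for_buttons</code>, the salting scheme, and the 50% rate are illustrative and do not describe how Google or Amazon actually assign users. Because each experiment salts the hash with its own ID, assignments are effectively independent across experiments, so any two of them can equally be read as a single two-factor experiment.</p>

```python
import hashlib
from collections import Counter


def in_treatment(user_id: str, experiment_id: str, rate: float) -> bool:
    """Hypothetical assignment rule: hash (experiment, user) into [0, 1).

    Salting the hash with the experiment ID makes assignment to one
    experiment (effectively) independent of assignment to any other.
    """
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:15], 16) / 16**15  # roughly uniform on [0, 1)
    return bucket < rate


def experiments_for_buttons(button_ids, rate):
    """Hypothetical generator: one 'slightly bigger button' experiment per button."""
    return {f"bigger-{button}": rate for button in button_ids}


experiments = experiments_for_buttons(["checkout", "search"], rate=0.5)
(exp_a, rate_a), (exp_b, rate_b) = experiments.items()
users = [f"user{i}" for i in range(20000)]

# Cross-tabulate the two single-factor experiments. With independent
# assignment the four cells come out roughly equal, so the pair can be
# analysed as one 2x2 factorial experiment.
cells = Counter((in_treatment(u, exp_a, rate_a), in_treatment(u, exp_b, rate_b)) for u in users)
for cell, count in sorted(cells.items()):
    print(cell, count)
```

<p>With a small rate per experiment and one generated experiment per button, ad, or similar element, the number of concurrently running experiments is set by how many elements the generator covers rather than by how many hypotheses anyone wrote down.</p>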
<ol class="footnotes"><li id="footnote_0_632" class="footnote">I don&#8217;t know that I would call much of it &#8216;innovation&#8217;. There is some outright innovation, but a lot of that is in the general strategies for using the data. There is much more gained in minor tweaking and optimization of products and services.</li><li id="footnote_1_632" class="footnote">Perhaps they even overstate the power of simple experiments. For example, they do not mention that the results of these kinds of experiments often change over time, so that what you learned 2 months ago is no longer true.</li><li id="footnote_2_632" class="footnote">Note that two single-factor experiments over the same population with independent random assignment can be regarded as a single experiment with two factors.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/632_a-deluge-of-experiments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Against between-subjects experiments</title>
		<link>http://www.deaneckles.com/blog/577_against-between-subjects-experiments/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=against-between-subjects-experiments</link>
		<comments>http://www.deaneckles.com/blog/577_against-between-subjects-experiments/#comments</comments>
		<pubDate>Thu, 09 Jun 2011 06:11:23 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[average treatment effects]]></category>
		<category><![CDATA[individual differences]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[research methods]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=577</guid>
		<description><![CDATA[A less widely known reason for using within-subjects experimental designs in psychological science. In a within-subjects experiment, each participant experiences multiple conditions (say, multiple persuasive messages), while in a between-subjects experiment, each participant experiences only one condition. If you ask a random social psychologist, &#8220;Why would you run a within-subjects experiment instead of a between-subjects [...]]]></description>
			<content:encoded><![CDATA[<p><em>A less widely known reason for using within-subjects experimental designs in psychological science. In a </em>within-subjects<em> experiment, each participant experiences multiple conditions (say, multiple persuasive messages), while in a </em>between-subjects<em> experiment, each participant experiences only one condition.</em></p>
<p>If you ask a random social psychologist, &#8220;Why would you run a within-subjects experiment instead of a between-subjects experiment?&#8221;, the most likely answer is &#8220;power&#8221; &#8212; within-subjects experiments provide more power. That is, with the same number of participants, within-subjects experiments allow investigators to more easily tell that observed differences between conditions are not due to chance.<sup><a href="http://www.deaneckles.com/blog/577_against-between-subjects-experiments/#footnote_0_577" id="identifier_0_577" class="footnote-link footnote-identifier-link" title="And to more precisely estimate these differences. Though social psychologists often don&#8217;t care about estimation, since many social psychological theories are only directional.">1</a></sup></p>
<p>Why do within-subjects experiments increase power? Because responses by the same individual are generally dependent; more specifically, they are often positively correlated. Say an experiment involves evaluating products, people, or policy proposals under different conditions, such as the presence of different persuasive cues or following different primes. It is often the case that participants who rate an item high on a scale under one condition will rate other items high on that scale under other conditions. Or participants with short response times for one task will have relatively short response times for another task. Et cetera. This positive association might be due to stable characteristics of people or transient differences such as mood. Thus, the increase in power is due to heterogeneity in how individuals respond to the stimuli.</p>
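<p>A small simulation can make the power argument concrete. This is a sketch under an assumed additive model that is not from the post: each participant has a stable personal offset with standard deviation <code>tau</code>, plus independent trial-to-trial noise with standard deviation <code>sigma</code>; the helper name <code>simulate_estimates</code> and all parameter values are illustrative. Differencing within a person cancels that person&#8217;s offset, so with n participants the within-subjects standard error is sigma * sqrt(2/n), while splitting the same n people into two groups gives sqrt(tau^2 + sigma^2) * sqrt(4/n).</p>

```python
import numpy as np


def simulate_estimates(n, effect, tau, sigma, n_sims, rng):
    """Simulate within- and between-subjects estimates of the same effect.

    Assumed model: response = person_offset + effect * condition + noise,
    with person_offset ~ N(0, tau^2) and noise ~ N(0, sigma^2).
    """
    within, between = np.empty(n_sims), np.empty(n_sims)
    half = n // 2
    for s in range(n_sims):
        offsets = rng.normal(0.0, tau, size=n)
        # Within-subjects: every participant responds under both conditions.
        y_control = offsets + rng.normal(0.0, sigma, size=n)
        y_treated = offsets + effect + rng.normal(0.0, sigma, size=n)
        within[s] = np.mean(y_treated - y_control)
        # Between-subjects: the same n participants, split into two groups.
        treated = offsets[:half] + effect + rng.normal(0.0, sigma, size=half)
        control = offsets[half:] + rng.normal(0.0, sigma, size=n - half)
        between[s] = treated.mean() - control.mean()
    return within, between


rng = np.random.default_rng(0)
n, effect, tau, sigma = 40, 0.5, 1.0, 1.0
within, between = simulate_estimates(n, effect, tau, sigma, n_sims=5000, rng=rng)
print("within-subjects SE: ", within.std(), "analytic:", sigma * np.sqrt(2 / n))
print("between-subjects SE:", between.std(), "analytic:", np.sqrt(tau**2 + sigma**2) * np.sqrt(4 / n))
```

<p>Setting <code>tau</code> to 0 narrows the gap but does not close it, because the within-subjects design also collects two responses from each person; the part of the gap that grows with <code>tau</code> is the part due to stable individual differences, i.e. the positive correlation described above.</p>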
<p>However, this advantage of within-subjects designs is frequently overridden in social psychology by the appeal of between-subjects designs. The latter are widely regarded as &#8220;cleaner&#8221; as they avoid carryover effects &#8212; in which one condition may affect responses to subsequent conditions experienced by the same participant. Within-subjects designs can also be difficult to implement when studies involve deception &#8212; even just deception about the purpose of the study &#8212; or one-shot encounters. Because of this, between-subjects designs are much more common in social psychology than within-subjects designs: investigators don&#8217;t regard the complexity of conducting within-subjects designs as worth it for the gain in power, which they regard as the primary advantage of within-subjects designs.</p>
<p>I want to point out another &#8212; but related &#8212; reason for using within-subjects designs: between-subjects experiments often do not allow consistent estimation of the parameters of interest. Now, between-subjects designs are great for estimating average treatment effects (ATEs), and ATEs can certainly be of great interest. For example, if one is interested in how a design change to a web site will affect sales, an ATE estimated from an A-B test with the very same population will be useful. But this isn&#8217;t enough for psychological science, for two reasons. First, social psychology experiments are usually very different from the circumstances of potential application: the participants are undergraduate students in psychology, and the manipulations and situations are not realistic. So the ATE from a psychology experiment might not say much about the ATE for a real intervention. Second, social psychologists regard themselves as building and testing theories about psychological processes. By their nature, psychological processes occur within individuals. So an ATE won&#8217;t do &#8212; in fact, it can be a substantially biased estimate of the psychological parameter of interest.</p>
<p>To illustrate this problem, consider an example where the outcome of an experiment is whether the participant says that a job candidate should be hired. For simplicity, let&#8217;s say this is a binary outcome: either they recommend hiring the candidate or not. Their judgements might depend on some discrete scalar X. Different participants may have different thresholds for hiring the applicant, but otherwise be affected by X in the same way. In a logistic model, that is, each participant has their own intercept but all the slopes are the same. This is depicted with the grey curves below.<sup><a href="http://www.deaneckles.com/blog/577_against-between-subjects-experiments/#footnote_1_577" id="identifier_1_577" class="footnote-link footnote-identifier-link" title="This example is very directly inspired by Alan Agresti&#8217;s Categorical Data Analysis, p. 500.">2</a></sup><br />
<div id="attachment_583" class="wp-caption alignnone" style="width: 452px"><a href="http://www.deaneckles.com/blog/wp-content/uploads/2011/06/marginal_conditional_logit.png"><img src="http://www.deaneckles.com/blog/wp-content/uploads/2011/06/marginal_conditional_logit.png" alt="Comparison of marginal and conditional logit functions" title="Comparison of marginal and conditional logit functions" width="442" height="233" class="size-full wp-image-583" /></a><p class="wp-caption-text">Marginal (blue) and conditional (grey) expectation functions</p></div></p>
<p>These grey curves can be estimated if one has multiple observations per participant at different values of X. However, in a between-subjects experiment, this is not the case. As an estimate of a parameter of the psychological process common to all the participants, the estimated slope from a between-subjects experiment will be biased toward zero. This is clear in the figure above: the blue curve (the marginal expectation function) is shallower than any of the individual curves.</p>
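<p>The attenuation in the figure can be reproduced numerically. The sketch below assumes the setup just described, with intercepts normally distributed across participants (standard deviation <code>sigma</code>) and a common slope <code>beta</code>; the helper names and particular values are illustrative. It averages the participant-level (conditional) curves to get the marginal curve, using Gauss-Hermite quadrature, then measures the marginal curve&#8217;s slope on the logit scale.</p>

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p / (1.0 - p))


def marginal_prob(x, beta, sigma, n_nodes=60):
    """Average P(hire | X = x) over participant intercepts a_i ~ N(0, sigma^2).

    Each participant follows logit P(hire) = a_i + beta * x (a grey curve);
    the marginal (blue) curve averages those probabilities, computed here
    with Gauss-Hermite quadrature.
    """
    nodes, weights = hermegauss(n_nodes)  # for the weight function exp(-z^2 / 2)
    weights = weights / weights.sum()     # now an expectation over N(0, 1)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return expit(sigma * nodes[None, :] + beta * x[:, None]) @ weights


def marginal_slope(beta, sigma, h=1e-3):
    """Slope of the marginal curve on the logit scale, at x = 0."""
    p = marginal_prob([-h, h], beta, sigma)
    return (logit(p[1]) - logit(p[0])) / (2 * h)


beta = 1.0
for sigma in [0.0, 1.0, 2.0, 4.0]:
    print(f"intercept SD {sigma}: conditional slope {beta}, marginal slope {marginal_slope(beta, sigma):.3f}")
```

<p>The marginal slope equals <code>beta</code> only when <code>sigma</code> is 0 and shrinks as intercept heterogeneity grows, even though every participant has exactly the same slope. A between-subjects design only ever sees the marginal curve, so at best it estimates that shrunken slope.</p>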
<p>More generally, between-subjects experiments are good for estimating ATEs and making striking demonstrations. But they are often insufficient for investigating psychological processes since any heterogeneity &#8212; even only in intercepts &#8212; produces biased estimates of the parameters of psychological processes, including parameters that are universal in the population.</p>
<p>I see this as a strong motivation for doing more within-subjects experiments in social psychology. Unlike the power problem, this problem isn&#8217;t solved by getting a larger sample of individuals. Instead, investigators need to think carefully about whether their experiments estimate any quantity of interest when there is substantial heterogeneity &#8212; as there generally is.<sup><a href="http://www.deaneckles.com/blog/577_against-between-subjects-experiments/#footnote_2_577" id="identifier_2_577" class="footnote-link footnote-identifier-link" title="The situation is made a bit &#8220;better&#8221; by the fact that social psychologists are often only concerned with determining the direction of effects, so they may not be worried that their estimates of parameters are biased. Of course, this is a problem in itself if the direction of the effect varies by individual. Here I have only treated the simpler case of a universal function subject to a random shift.">3</a></sup></p>
<ol class="footnotes"><li id="footnote_0_577" class="footnote">And to more precisely estimate these differences. Though social psychologists often don&#8217;t care about estimation, since many social psychological theories are only directional.</li><li id="footnote_1_577" class="footnote">This example is very directly inspired by Alan Agresti&#8217;s <em>Categorical Data Analysis</em>, p. 500.</li><li id="footnote_2_577" class="footnote">The situation is made a bit &#8220;better&#8221; by the fact that social psychologists are often only concerned with determining the direction of effects, so they may not be worried that their estimates of parameters are biased. Of course, this is a problem in itself if the direction of the effect varies by individual. Here I have only treated the simpler case of a universal function subject to a random shift.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/577_against-between-subjects-experiments/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Marginal evidence for psychological processes</title>
		<link>http://www.deaneckles.com/blog/555_marginal-evidence-for-psychological-processes/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=marginal-evidence-for-psychological-processes</link>
		<comments>http://www.deaneckles.com/blog/555_marginal-evidence-for-psychological-processes/#comments</comments>
		<pubDate>Mon, 02 May 2011 07:38:27 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[causal inference]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[research methods]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=555</guid>
		<description><![CDATA[Some comments on problems with investigating psychological processes using estimates of average (i.e. marginal) effects. Hence the play on words in the title. Social psychology makes a lot of being theoretical. This generally means not just demonstrating an effect, but providing evidence about the psychological processes that produce it. Psychological processes are, it is agreed, [...]]]></description>
			<content:encoded><![CDATA[<p><em>Some comments on problems with investigating psychological processes using estimates of average (i.e. marginal) effects. Hence the play on words in the title.</em></p>
<p>Social psychology makes a lot of being theoretical. This generally means not just demonstrating an effect, but providing evidence about the psychological processes that produce it. Psychological processes are, it is agreed, <em>intra-individual processes</em>. To tell a story about a psychological process is to posit something going on &#8220;inside&#8221; people. It is quite reasonable that this is how social psychology should work &#8212; and it makes it consistent with much of cognitive psychology as well.</p>
<p>But the evidence that social psychology uses to support these theories about intra-individual processes is largely evidence about effects of experimental conditions (or, worse, non-manipulated measures) <em>averaged across many participants</em>. That is, it is using estimates of marginal effects as evidence of conditional effects. This is intuitively problematic. Now, there is no problem when using experiments to study effects and processes that are homogeneous in the population. But, of course, they aren&#8217;t: heterogeneity abounds. There is variation in how factors affect different people. This is why the causal inference literature has emphasized the differences among the average treatment effect, (average) treatment effect on the treated, local average treatment effect, etc.</p>
<p>Not only is this disconnect between marginal evidence and conditional theory troubling in the abstract, but we know it has already produced many problems in the social psychology literature.<sup><a href="http://www.deaneckles.com/blog/555_marginal-evidence-for-psychological-processes/#footnote_0_555" id="identifier_0_555" class="footnote-link footnote-identifier-link" title="The situation is bad enough that I (and some colleagues) certainly don&#8217;t even take many results in social psych as more than providing a possibly interesting vocabulary.">1</a></sup> Baron and Kenny (1986) is the most cited paper published in the <em>Journal of Personality and Social Psychology</em>, the leading journal in the field. It paints a rosy picture of what it is like to investigate psychological processes. The methods of analysis it proposes for investigating processes are almost ubiquitous in social psych.<sup><a href="http://www.deaneckles.com/blog/555_marginal-evidence-for-psychological-processes/#footnote_1_555" id="identifier_1_555" class="footnote-link footnote-identifier-link" title="Luckily, my sense is that they are waning a bit, partially because of illustrations of the method&#8217;s bias.">2</a></sup> The trouble is that this approach is severely biased in the face of heterogeneity in the processes under study. This is usually described as a problem of correlated error terms, omitted-variables bias, or adjusting for post-treatment variables. This is all true. But, in the most common uses, it is perhaps more natural to think of it as a problem of mixing up marginal (i.e. average) and conditional effects.<sup><a href="http://www.deaneckles.com/blog/555_marginal-evidence-for-psychological-processes/#footnote_2_555" id="identifier_2_555" class="footnote-link footnote-identifier-link" title="To translate to the terms used before, note that we want to condition on unobserved (latent) heterogeneity. If one doesn&#8217;t, then there is omitted variable bias. This can be done with models designed for this purpose, such as random effects models.">3</a></sup></p>
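<p>A minimal simulation of one version of the problem, under an assumed data-generating process (all numbers illustrative): treatment T is randomized; the mediator is M = a*T + U + e1 and the outcome is Y = b*M + U + e2, where U is a stable, unobserved individual difference that affects both, and T has no direct effect on Y. Regressing Y on T and M, as in the Baron and Kenny approach, then overstates the coefficient on M and reports a spurious negative direct effect of T. Under this model the two estimates converge to b + d and -a*d, where d = var(U) / (var(U) + var(e1)).</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
a, b = 1.0, 0.5           # true effects: T -> M and M -> Y (no direct T -> Y effect)
var_u, var_e1 = 1.0, 1.0  # variance of the unobserved factor U and of the mediator noise

t = rng.integers(0, 2, size=n).astype(float)   # randomized treatment
u = rng.normal(0.0, np.sqrt(var_u), size=n)    # unobserved individual differences
m = a * t + u + rng.normal(0.0, np.sqrt(var_e1), size=n)
y = b * m + u + rng.normal(0.0, 1.0, size=n)

# Baron-and-Kenny-style step: regress Y on an intercept, T and M.
design = np.column_stack([np.ones(n), t, m])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept_hat, direct_hat, m_hat = coef

shift = var_u / (var_u + var_e1)  # bias term implied by this assumed model
print(f"coefficient on M:   {m_hat:.3f} (true {b}, expected limit {b + shift})")
print(f"direct effect of T: {direct_hat:.3f} (true 0.0, expected limit {-a * shift})")
```

<p>Randomizing T does not protect this step, because M is not randomized. That is why the same bias can be described as omitted-variable bias (U is omitted) or as adjusting for a post-treatment variable (M).</p>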
<p>What&#8217;s the solution? First, it is worth saying that average effects are worth investigating! Especially if you are evaluating an intervention or drug that might really be used &#8212; or if you are working at another level of analysis than psychology. But if psychological processes are your thing, you must do better.</p>
<p>Social psychologists sometimes do condition on individual characteristics, but often this is a measure of a single trait (e.g., need for cognition) that cannot plausibly exhaust all (or even much) of the heterogeneity in the effects under study. Without much larger studies, they cannot condition on more characteristics because of estimation problems (too many parameters for their N). So there is bound to be substantial heterogeneity.</p>
<p>Beyond this, I think social psychology could benefit from a lot more within-subjects experiments. Modern statistical computing (e.g., tools for fitting mixed-effects or multilevel models) makes it possible &#8212; even easy &#8212; to use such data to estimate effects of the manipulated factors for each participant. If they want to make credible claims about processes, then within-subjects designs &#8212; likely with many measurements of each person &#8212; are a good direction to more thoroughly explore.</p>
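<p>As a dependency-free sketch of the same idea, the code below simulates a within-subjects study (200 participants, each responding twice at each of three levels of a manipulated factor, with person-specific true slopes; all values illustrative), fits each participant separately, and then applies simple empirical-Bayes shrinkage toward the average slope. This method-of-moments shortcut is an assumed stand-in for illustration, not a full mixed-effects (random-slopes) model fit.</p>

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, sigma = 200, 2.0
x = np.array([-1.0, -1.0, 0.0, 0.0, 1.0, 1.0])     # each person's within-subjects design
true_slopes = rng.normal(1.0, 0.5, size=n_people)  # heterogeneous effects of X
intercepts = rng.normal(0.0, 1.0, size=n_people)
noise = rng.normal(0.0, sigma, size=(n_people, x.size))
y = intercepts[:, None] + true_slopes[:, None] * x + noise

# Step 1: a separate least-squares fit for each participant.
xc = x - x.mean()
sxx = np.sum(xc**2)
person_means = y.mean(axis=1, keepdims=True)
raw = (y - person_means) @ xc / sxx
resid = y - (person_means + raw[:, None] * xc)
resid_var = np.sum(resid**2) / (n_people * (x.size - 2))
se2 = resid_var / sxx  # sampling variance of each raw slope

# Step 2: method-of-moments estimate of the between-person slope variance,
# then shrink each raw slope toward the average slope (partial pooling).
tau2 = max(raw.var(ddof=1) - se2, 0.0)
weight = tau2 / (tau2 + se2)
pooled = raw.mean() + weight * (raw - raw.mean())

print("average slope:", raw.mean())
print("MSE of raw per-person slopes:     ", np.mean((raw - true_slopes) ** 2))
print("MSE of shrunken per-person slopes:", np.mean((pooled - true_slopes) ** 2))
```

<p>With only six responses per person, the raw per-participant slopes are noisy, and partial pooling trades a little bias for much less variance. More measurements of each person shrink that noise directly and make the pooling matter less.</p>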
<ol class="footnotes"><li id="footnote_0_555" class="footnote">The situation is bad enough that I (and some colleagues) certainly don&#8217;t even take many results in social psych as more than providing a possibly interesting vocabulary.</li><li id="footnote_1_555" class="footnote">Luckily, my sense is that they are waning a bit, partially because of illustrations of the method&#8217;s bias.</li><li id="footnote_2_555" class="footnote">To translate to the terms used before, note that we want to condition on unobserved (latent) heterogeneity. If one doesn&#8217;t, then there is omitted variable bias. This can be done with models designed for this purpose, such as random effects models.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/555_marginal-evidence-for-psychological-processes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Traits, adaptive systems &amp; dimensionality reduction</title>
		<link>http://www.deaneckles.com/blog/495_traits-adaptive-systems-dimensionality-reduction/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=traits-adaptive-systems-dimensionality-reduction</link>
		<comments>http://www.deaneckles.com/blog/495_traits-adaptive-systems-dimensionality-reduction/#comments</comments>
		<pubDate>Fri, 22 Apr 2011 03:07:58 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[data collection]]></category>
		<category><![CDATA[HCI]]></category>
		<category><![CDATA[influence]]></category>
		<category><![CDATA[persuasion profiling]]></category>
		<category><![CDATA[persuasive technology]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[research methods]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=495</guid>
		<description><![CDATA[Psychologists have posited numerous psychological traits and described causal roles they ought to play in determining human behavior. Most often, the canonical measure of a trait is a questionnaire. Investigators obtain this measure for some people and analyze how their scores predict some outcomes of interest. For example, many people have been interested in how [...]]]></description>
			<content:encoded><![CDATA[<p>Psychologists have posited numerous psychological traits and described causal roles they ought to play in determining human behavior. Most often, the canonical measure of a trait is a questionnaire. Investigators obtain this measure for some people and analyze how their scores predict some outcomes of interest. For example, many people have been interested in how psychological traits affect persuasion processes. Traits like need for cognition (NFC) have been posited and questionnaire items developed to measure them. Among other things, NFC affects how people respond to messages with arguments of varying quality.</p>
<p><strong>How useful are these traits for explanation, prediction, and adaptive interaction?</strong> I can&#8217;t address all of this here, but I want to sketch an argument for their irrelevance to adaptive interaction &#8212; and then offer a tentative rejoinder.</p>
<p>Interactive technologies can tailor their messages to the tastes and susceptibilities of the people interacting with and through them. It might seem that these traits should figure in the statistical models used to make these adaptive selections. After all, some of the possible messages fit for, e.g., coaching a person to meet their exercise goals are more likely to be effective for low NFC people than high NFC people, and vice versa. However, the standard questionnaire measures of NFC often cannot be obtained for most users &#8212; certainly not in commerce settings, and even people signing up for a mobile coaching service likely don&#8217;t want to answer pages of questions. On the other hand, some Internet and mobile services have other abundant data available about their users, which could perhaps be used to construct an alternative measure of these traits. <strong>The trait-based-adaptation recipe is</strong>: </p>
<ol>
<li>obtain the questionnaire measure of the trait for a sample, </li>
<li>predict this measure with data available for many individuals (e.g., log data), </li>
<li>use this model to construct a measure for out-of-sample individuals. </li>
</ol>
<p>This new measure could then be used to personalize the interactive experience based on this trait, such that if a version performs well (or poorly) for people with a particular score on the trait, then use (or don&#8217;t use) that version for people with similar scores.</p>
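<p>The three steps of the recipe might be sketched as follows. Everything here is an assumption for illustration: the data are simulated, the log features are anonymous numeric columns, the helper <code>fit_ridge</code> (closed-form ridge regression) stands in for whatever predictive model one would actually use in step 2, and the final median-split rule is a hypothetical example of personalizing on the constructed measure.</p>

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, n_features, n_surveyed = 5000, 20, 300

# Hypothetical log-data features for every user and a latent trait that,
# in this simulation, is partly reflected in those features.
logs = rng.normal(size=(n_users, n_features))
true_weights = rng.normal(size=n_features)
trait = logs @ true_weights + rng.normal(0.0, 2.0, size=n_users)


def fit_ridge(features, target, penalty=1.0):
    """Illustrative helper: closed-form ridge regression, unpenalized intercept."""
    mean_f, mean_t = features.mean(axis=0), target.mean()
    centered = features - mean_f
    gram = centered.T @ centered + penalty * np.eye(centered.shape[1])
    w = np.linalg.solve(gram, centered.T @ (target - mean_t))
    return lambda new: (new - mean_f) @ w + mean_t


# Step 1: the questionnaire measure (with measurement error) for a sample.
surveyed = rng.choice(n_users, size=n_surveyed, replace=False)
questionnaire = trait[surveyed] + rng.normal(0.0, 0.5, size=n_surveyed)

# Step 2: predict the questionnaire measure from the log data.
predict_trait = fit_ridge(logs[surveyed], questionnaire)

# Step 3: construct the measure for everyone, including unsurveyed users.
trait_hat = predict_trait(logs)
unsurveyed = np.setdiff1d(np.arange(n_users), surveyed)
print("out-of-sample correlation:", np.corrcoef(trait_hat[unsurveyed], trait[unsurveyed])[0, 1])

# Personalization (hypothetical rule): if version "B" did better for
# high scorers, show it to users above the median predicted score.
version = np.where(trait_hat > np.median(trait_hat), "B", "A")
```

<p>The model in step 2 is trained only on the surveyed users, so the correlation printed for the unsurveyed users is an out-of-sample check of the constructed measure.</p>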
<p><strong>But why involve the trait at all?</strong> Why not just personalize the interactive experience based on the responses of similar others? Since the new measure of the trait is just based on the available behavioral, demographic, and other logged data, one could simply predict responses based on those measures. Put in geometric terms, if the goal is to project the effects of different messages onto available log data, why should one project the questionnaire measure of the trait onto the available log data and then project the effects onto this projection? This seems especially unappealing if one doesn&#8217;t fully trust the questionnaire measure to be accurate or can&#8217;t be sure which traits make a (substantial) difference.</p>
<p>I find this argument quite intuitively appealing, and it seems to resonate with others.<sup><a href="http://www.deaneckles.com/blog/495_traits-adaptive-systems-dimensionality-reduction/#footnote_0_495" id="identifier_0_495" class="footnote-link footnote-identifier-link" title="I owe some clarity on this to some conversations with Mike Nowak, Maurits Kaptein, and others.">1</a></sup> But I think there are some reasons the recipe above could still be appealing.</p>
<p>One way to think about this recipe is as dimensionality reduction guided by theory about psychological traits. Available log data can often be used to construct countless predictors (or &#8220;features&#8221;, as the machine learning people call them). So one can very quickly get into a situation where the effective number of parameters for a full model predicting the effects of different messages is very large and will make for poor predictions. Nothing &#8212; no, not penalized regression, not even a support vector machine &#8212; makes this problem go away. Instead, one has to rely on the domain knowledge of the person constructing the predictors (i.e., doing the &#8220;feature engineering&#8221;) to pick some good ones.</p>
<p>So the tentative rejoinder is this: established psychological traits might often make good dimensions along which to predict the effects of different versions of a message, intervention, or experience. And they may &#8220;come with&#8221; suggestions about what kinds of log data might serve as measures of them. They would also be expected to be reusable across settings. Thus, I think this recipe nonetheless deserves serious attention.</p>
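<p>A toy simulation can illustrate the rejoinder. It assumes, for illustration only, that responses to two versions of a message differ across users along a single direction in a 100-dimensional space of log features, and that a noisy trait measure captures that direction. With 400 users in the experiment, it compares fitting version-by-feature interactions for all 100 features by least squares against fitting the interaction on the single trait measure. The helpers <code>simulate</code> and <code>fit_effect_model</code>, the sizes, and the noise levels are all hypothetical.</p>

```python
import numpy as np

rng = np.random.default_rng(4)
n_train, n_test, p = 400, 2000, 100

# Assumed structure: how much a user responds to message version B varies only
# along one direction in feature space (the trait), and a noisy trait measure
# is available for every user.
trait_direction = rng.normal(size=p)
trait_direction /= np.linalg.norm(trait_direction)


def simulate(n):
    """Illustrative helper: simulated users, a randomized version, outcomes."""
    feats = rng.normal(size=(n, p))
    trait = feats @ trait_direction
    trait_measure = trait + rng.normal(0.0, 0.3, size=n)
    version_b = rng.integers(0, 2, size=n).astype(float)  # randomized version
    effect = 0.5 + 0.5 * trait                             # true individual effect of B
    y = effect * version_b + rng.normal(0.0, 1.0, size=n)
    return feats, trait_measure, version_b, effect, y


def fit_effect_model(z, t, y):
    """Illustrative helper: OLS of y on [1, z, t, t*z]; predicts each user's effect of t."""
    design = np.column_stack([np.ones(len(y)), z, t, t[:, None] * z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = z.shape[1]
    return lambda z_new: coef[k + 1] + z_new @ coef[k + 2:]


feats, measure, t, _, y = simulate(n_train)
all_features = fit_effect_model(feats, t, y)            # 2p + 2 = 202 parameters
trait_only = fit_effect_model(measure[:, None], t, y)   # 4 parameters

feats_new, measure_new, _, true_effect, _ = simulate(n_test)
mse_all = np.mean((all_features(feats_new) - true_effect) ** 2)
mse_trait = np.mean((trait_only(measure_new[:, None]) - true_effect) ** 2)
print(f"effect-prediction MSE using all {p} log features: {mse_all:.3f}")
print(f"effect-prediction MSE using the trait measure:  {mse_trait:.3f}")
```

<p>The comparison favors the trait by construction: if the assumed trait did not capture how effects vary, the reduced model would be biased instead. That is the sense in which a trait acts as theory-guided dimensionality reduction rather than a free lunch.</p>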
<ol class="footnotes"><li id="footnote_0_495" class="footnote">I owe some clarity on this to some conversations with Mike Nowak, Maurits Kaptein, and others.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/495_traits-adaptive-systems-dimensionality-reduction/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Applying social psychology</title>
		<link>http://www.deaneckles.com/blog/333_applying-social-psychology/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=applying-social-psychology</link>
		<comments>http://www.deaneckles.com/blog/333_applying-social-psychology/#comments</comments>
		<pubDate>Wed, 20 Apr 2011 22:00:59 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[causal inference]]></category>
		<category><![CDATA[econometrics]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[research methods]]></category>
		<category><![CDATA[science studies]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=333</guid>
		<description><![CDATA[Some reflections on how &#8220;quantitative&#8221; social psychology is and how this matters for its application to design and decision-making &#8212; especially in industries touched by the Internet. In many ways, contemporary social psychology is dogmatically quantitative. Investigators run experiments, measure quantitative outcomes (even coding free responses to make them amenable to analysis), and use statistics [...]]]></description>
			<content:encoded><![CDATA[<p><em>Some reflections on how &#8220;quantitative&#8221; social psychology is and how this matters for its application to design and decision-making &#8212; especially in industries touched by the Internet.</em></p>
<p>In many ways, contemporary social psychology is dogmatically quantitative. Investigators run experiments, measure quantitative outcomes (even coding free responses to make them amenable to analysis), and use statistics to characterize the collected data. On the other hand, social psychology&#8217;s processes of stating and integrating its conclusions remain largely qualitative. Many hypotheses in social psychology state that some factor affects a process or outcome in one direction (i.e., &#8220;call&#8221; either beta > 0 or beta < 0). Reviews of research in social psychology often start with a simple effect and then note how many other variables moderate this effect. This is all quite fitting with the dominance of null-hypothesis significance testing (NHST) in much of psychology: rather than producing point estimates or confidence intervals for causal effects, it is enough to see how likely the observed data would be if there were no effect.<sup><a href="http://www.deaneckles.com/blog/333_applying-social-psychology/#footnote_0_333" id="identifier_0_333" class="footnote-link footnote-identifier-link" title="To parrot Andrew Gelman, in social phenomena, everything affects everything else. There are no betas that are exactly zero.">1</a></sup> Of course, there have been many efforts to change this. Many journals require reporting effect sizes. This is a good thing, but these effect sizes are rarely predicted by social psychological theory. Rather, they are reported to help judge whether a finding is not only statistically significant but also substantively or practically significant, while the theory itself predicts only the direction of the effect.</p>
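<p>The contrast between testing and estimating can be seen on a single simulated two-group comparison. This is a sketch with made-up data and a normal approximation (a t distribution would be more exact for small samples); the helper <code>estimate_and_test</code> is illustrative. The NHST summary is one number, the p-value, saying how surprising the data would be if the effect were zero, while the estimation summary reports how large the effect is and how uncertain that estimate is.</p>

```python
import math
import random
import statistics


def estimate_and_test(treated, control):
    """Difference in means, a normal-approximation 95% CI, and a two-sided p-value."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) if there is no effect
    return diff, (diff - 1.96 * se, diff + 1.96 * se), p_value


random.seed(5)
control = [random.gauss(0.0, 1.0) for _ in range(200)]
treated = [random.gauss(0.3, 1.0) for _ in range(200)]
diff, (low, high), p = estimate_and_test(treated, control)
print(f"NHST summary: p = {p:.4f}")
print(f"estimation summary: {diff:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```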
<p>Not only is this process of reporting and combining results largely non-quantitative, but it also requires substantial inference from the particular settings of the conducted experiments to the settings of interest. This actually helps to make sense of the practices described above: many social psychology experiments are conducted in conditions and with populations so different from those in which people would like to apply the resulting theories that expecting consistent effect sizes is implausible.<sup><a href="http://www.deaneckles.com/blog/333_applying-social-psychology/#footnote_1_333" id="identifier_1_333" class="footnote-link footnote-identifier-link" title="It&#039;s also often implausible that the direction of the effect must be preserved.">2</a></sup> This is not to say that these studies cannot tell us a good deal about how people will behave in many circumstances. It&#8217;s just that figuring out what they predict, and whether these predictions are reliable, is a very messy, qualitative process.</p>
<p>Thus, when it comes to making decisions (about a policy, intervention, or service) based on social-psychological research, the process is largely qualitative. Decision-makers can ask, <a href="http://brenocon.com/blog/2008/12/statistics-vs-machine-learning-fight/#comment-908">which effects are in play?</a> What is their direction? With interventions and measurement that very likely differ from the present case, how large were the effects?<sup><a href="http://www.deaneckles.com/blog/333_applying-social-psychology/#footnote_2_333" id="identifier_2_333" class="footnote-link footnote-identifier-link" title="Major figures in social psychology, such as Lee Ross, have worked on trying to better anticipate the effects of social interventions from theory. It isn&amp;#8217;t easy.">3</a></sup></p>
<p>Sometimes this is the best that social science can provide. And such answers can be quite useful in design. The results of psychology experiments can often be very effective when used generatively. For example, designers can use taxonomies of persuasive strategies to dream up some ways of producing desired behavior change.</p>
<p>Nonetheless, I think all this can be contrasted with some alternative practices that are both more quantitative and require less of this uneasy generalization. First, social scientists can give much more attention to point estimates of parameters. While not without its (other) flaws, the economics literature on financial returns to education has aimed to provide, criticize, and refine estimates of just how much wages increase (on average) with more education.<sup><a href="http://www.deaneckles.com/blog/333_applying-social-psychology/#footnote_3_333" id="identifier_3_333" class="footnote-link footnote-identifier-link" title="The diversity of the manipulations used by social psychologists ostensibly studying the same thing can make this more difficult.">4</a></sup></p>
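<p>As a deliberately naive illustration of what such a point estimate looks like, the sketch below regresses log wages on years of schooling and reads the slope as the proportional wage gain per extra year. The function name and the toy data are hypothetical; serious estimates in this literature add experience terms and controls and, above all, have to deal with the selection of more able people into more schooling.</p>

```python
import math

import numpy as np


def return_to_schooling(years, wages):
    """Naive OLS slope of log wage on years of schooling.

    Returns (slope, approximate percent wage gain per extra year).
    No controls, no experience terms: a sketch, not a credible estimate.
    """
    slope, _intercept = np.polyfit(np.asarray(years, dtype=float), np.log(wages), 1)
    return slope, 100 * (math.exp(slope) - 1)
```

<p>The point is the form of the answer: a number (with, in practice, a standard error) that later work can criticize and refine, rather than only a sign.</p>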
<p>Second, researchers can avoid much of the messiest kinds of generalization altogether. Within the Internet industry, product optimization experiments are ubiquitous. Google, Yahoo, Facebook, Microsoft, and many others are running hundreds to thousands of simultaneous experiments with parts of their services. This greatly simplifies generalization: the exact intervention under consideration has just been tried with a random sample from the very population it will be applied to. If someone wants to tweak the intervention, just try it again before launching. This process still involves human judgment about how to react to these results.<sup><a href="http://www.deaneckles.com/blog/333_applying-social-psychology/#footnote_4_333" id="identifier_4_333" class="footnote-link footnote-identifier-link" title="Generalization is not avoided. In particular, decision-makers often have to consider what would happen if an intervention tested with 1% of the population is launched for the whole population. There are all kinds of issues relating to peer influence, network effects, congestion, etc., here that don&amp;#8217;t allow for simple extrapolation from the treatment effects identified by the experiment. Nonetheless, these challenges obviously apply to most research that aims to predict the effects of causes.">5</a></sup> An even more extreme alternative is when machine learning is used to fine-tune, e.g., recommendations without direct involvement (or understanding) by humans.</p>
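<p>Part of what makes running so many simultaneous experiments cheap is deterministic assignment. One common approach (sketched here as an assumption, not a description of any particular company&#8217;s system; the function and parameter names are hypothetical) hashes a user ID together with an experiment ID, so each user always sees the same arm of a given experiment, while different experiments split users independently of one another.</p>

```python
import hashlib


def in_treatment(user_id: str, experiment_id: str, percent: float) -> bool:
    """Deterministically decide whether a user is in an experiment's treatment arm.

    Salting the hash with the experiment ID gives each experiment its own
    pseudo-random split, so assignments to different experiments are
    (approximately) independent.
    """
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 10,000 buckets: 0.01% granularity
    return bucket < percent * 100          # e.g., percent=1.0 -> buckets 0..99
```

<p>With <code>percent=1.0</code>, roughly 1% of users land in treatment, matching the kind of small test population mentioned in the footnote about launching to everyone.</p>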
<p>So am I saying that <strong>social psychology &#8212; at least as an enterprise that is useful to designers and decision-makers &#8212; is going to be replaced by simple &#8220;bake-off&#8221; experiments and machine learning</strong>? Not quite. Unlike product managers at Google, many decision-makers don&#8217;t have the ability to cheaply test a proposed intervention on their population of interest.<sup><a href="http://www.deaneckles.com/blog/333_applying-social-psychology/#footnote_5_333" id="identifier_5_333" class="footnote-link footnote-identifier-link" title="However, Internet services play a more and more central role in many parts of our life, so this doesn&amp;#8217;t just have to be limited to the Internet industry itself.">6</a></sup> Even at Google, there are too many possible changes (or new products) under consideration to build and test them all: one has to decide among an overabundance of options before the most directly applicable data could be available. This is consistent with my note above that social-psychological findings can make excellent inspiration during idea generation and early evaluation.</p>
<ol class="footnotes"><li id="footnote_0_333" class="footnote">To parrot Andrew Gelman, in social phenomena, everything affects everything else. There are no betas that are exactly zero.</li><li id="footnote_1_333" class="footnote">It's also often implausible that the direction of the effect must be preserved.</li><li id="footnote_2_333" class="footnote">Major figures in social psychology, such as Lee Ross, have worked on trying to better anticipate the effects of social interventions from theory. It isn&#8217;t easy.</li><li id="footnote_3_333" class="footnote">The diversity of the manipulations used by social psychologists ostensibly studying the same thing can make this more difficult.</li><li id="footnote_4_333" class="footnote">Generalization is not avoided. In particular, decision-makers often have to consider what would happen if an intervention tested with 1% of the population is launched for the whole population. There are all kinds of issues relating to peer influence, network effects, congestion, etc., here that don&#8217;t allow for simple extrapolation from the treatment effects identified by the experiment. Nonetheless, these challenges obviously apply to most research that aims to predict the effects of causes.</li><li id="footnote_5_333" class="footnote">However, Internet services play a more and more central role in many parts of our life, so this doesn&#8217;t just have to be limited to the Internet industry itself.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/333_applying-social-psychology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Academia vs. industry: Harvard CS vs. Google edition</title>
		<link>http://www.deaneckles.com/blog/409_academia-vs-industry-harvard-cs-vs-google-edition/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=academia-vs-industry-harvard-cs-vs-google-edition</link>
		<comments>http://www.deaneckles.com/blog/409_academia-vs-industry-harvard-cs-vs-google-edition/#comments</comments>
		<pubDate>Tue, 16 Nov 2010 07:02:55 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[research methods]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[sociology]]></category>
		<category><![CDATA[surveillance]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=409</guid>
		<description><![CDATA[Matt Welsh, a professor in the Harvard CS department, has decided to leave Harvard to continue his post-tenure leave working at Google. Welsh is obviously leaving a sweet job. In fact, it was not long ago that he was writing about how difficult it is to get tenure at Harvard. So why is he leaving? [...]]]></description>
			<content:encoded><![CDATA[<p>Matt Welsh, a professor in the Harvard CS department, has <a href="http://matt-welsh.blogspot.com/2010/11/why-im-leaving-harvard.html">decided to leave Harvard</a> to continue his post-tenure leave working at Google. Welsh is obviously leaving a sweet job. In fact, it was not long ago that he was writing about <a href="http://matt-welsh.blogspot.com/2010/06/how-to-get-tenure-at-harvard.html">how difficult it is to get tenure at Harvard</a>.</p>
<p>So why is he leaving? Well, CS folks doing research in large distributed systems are in a tricky place, since the really big systems are all in industry. And instead of legions of experienced engineers to help build and study these systems, <a href="http://vonahn.blogspot.com/2010/06/outsourcing-my-research-group.html">they have a bunch of lazy grad students</a>! One might think, then, that this kind of (tenured) professor-to-industry move is limited to people creating and studying large deployments of computer systems.</p>
<p>There is a broader pull, I think. For researchers studying many central topics in the social sciences (e.g., social influence), there is a big draw to industry, since it is corporations that are collecting broad and deep data sets describing human behavior. To some extent, this is also a case of industry appealing to people studying large deployments of computer systems &#8212; but it applies even to those who don&#8217;t care much about the &#8220;computer&#8221; part. In a further parallel to the case of CS systems researchers, in industry they have talented database and machine learning experts ready to help, rather than social science grad students who are (like the faculty) too often afraid of math.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/409_academia-vs-industry-harvard-cs-vs-google-edition/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Economic imperialism and causal inference</title>
		<link>http://www.deaneckles.com/blog/324_economic-imperialism-and-causal-inference/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=economic-imperialism-and-causal-inference</link>
		<comments>http://www.deaneckles.com/blog/324_economic-imperialism-and-causal-inference/#comments</comments>
		<pubDate>Tue, 05 Oct 2010 07:05:58 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[academia]]></category>
		<category><![CDATA[causal inference]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[research methods]]></category>
		<category><![CDATA[sociology]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=324</guid>
		<description><![CDATA[And I, for one, welcome our new economist overlords&#8230; Readers not in academic social science may take the title of this post as indicating I&#8217;m writing about the use of economic might to imperialist ends.1 Rather, economic imperialism is a practice of economists (and acolytes) in which they invade research territories that traditionally &#8220;belong&#8221; to [...]]]></description>
			<content:encoded><![CDATA[<p><em>And I, for one, welcome our new economist overlords&#8230;</em></p>
<p>Readers not in academic social science may take the title of this post as indicating I&#8217;m writing about the use of economic might to imperialist ends.<sup><a href="http://www.deaneckles.com/blog/324_economic-imperialism-and-causal-inference/#footnote_0_324" id="identifier_0_324" class="footnote-link footnote-identifier-link" title="Well, if economists have better funding sources, this might apply in some sense.">1</a></sup> Rather, <em>economic imperialism</em> is a practice of economists (and acolytes) in which they invade research territories that traditionally &#8220;belong&#8221; to other social scientific disciplines.<sup><a href="http://www.deaneckles.com/blog/324_economic-imperialism-and-causal-inference/#footnote_1_324" id="identifier_1_324" class="footnote-link footnote-identifier-link" title="For arguments in favor of economic imperialism, see Lazear, E.P. (1999). Economic imperialism. NBER Working Paper No. 7300.">2</a></sup> See <a href="http://www.gocomics.com/bliss/2010/10/02/">this comic</a> for one way you can react to this.<sup><a href="http://www.deaneckles.com/blog/324_economic-imperialism-and-causal-inference/#footnote_2_324" id="identifier_2_324" class="footnote-link footnote-identifier-link" title="Or see this comic for imperialism by physicists.">3</a></sup></p>
<p>Economists bring their theoretical, statistical, and research-funding resources to bear on problems that might not be considered economics. For example, freakonomists like Levitt study sumo wrestlers and the effects of the legalization of abortion on crime. But, hey, if the <a href="http://www.fff.org/freedom/0895g.asp">Commerce Clause means that Congress can legislate everything</a>, then, for the same reasons, economists can &#8212; no, must &#8212; study everything.</p>
<p>I am not an economist by training, but I have recently had reason to read quite a bit in econometrics. Overall, I&#8217;m impressed.<sup><a href="http://www.deaneckles.com/blog/324_economic-imperialism-and-causal-inference/#footnote_3_324" id="identifier_3_324" class="footnote-link footnote-identifier-link" title="At least by the contemporary literature I&amp;#8217;ve been reading, on IVs, encouragement designs, endogenous interactions, and matching estimators. But it is true that in some of these areas econometrics has been able to fruitfully borrow from work on potential outcomes in statistics and epidemiology.">4</a></sup> Economists have recently taken causal inference &#8212; learning about cause and effect relationships, often from observational data &#8212; quite seriously. In the eyes of some, this has precipitated a &#8220;credibility revolution&#8221; in economics. Certainly, papers in economics and (especially) econometrics journals consider threats to the validity of causal inference at length.</p>
<p>On the other hand, causal inference in the rest of the social sciences is <em>simultaneously over-inhibited and under-inhibited</em>. As Judea Pearl observes in his book <em>Causality</em>, a lack of clarity about statistical models (which social scientists often don&#8217;t understand) and about causality has produced confusion about the distinction between statistical and causal issues (i.e., between estimation methods and identification).<sup><a href="http://www.deaneckles.com/blog/324_economic-imperialism-and-causal-inference/#footnote_4_324" id="identifier_4_324" class="footnote-link footnote-identifier-link" title="Econometricians have made similar observations.">5</a></sup></p>
<p>So, on the one hand, <a href="http://www.blog.sethroberts.net/2010/09/23/why-psychologists-dont-imitate-economists/">many psychologists stick to experiments</a>. Randomized experiments are, generally, the gold standard for investigating cause&#8211;effect relationships, so this can and often does go well. However, social psychologists have recently been obsessed with using &#8220;mediation analysis&#8221; to investigate the mechanisms by which causes they can manipulate produce effects of interest. Investigators often manipulate some factors experimentally and then measure one or more variables they believe fully or partially mediate the effect of those factors on their outcome. Then, under the standard Baron &#038; Kenny approach, psychologists fit a few regression models, including regressing the outcome on both the experimentally manipulated variables and the simply measured (mediating) variables. The assumptions required for this analysis to identify any effects of interest are rarely satisfied (e.g., that effects are homogeneous across individuals).<sup><a href="http://www.deaneckles.com/blog/324_economic-imperialism-and-causal-inference/#footnote_5_324" id="identifier_5_324" class="footnote-link footnote-identifier-link" title="For a bit on this topic, see the discussion and links to papers here.">6</a></sup> So psychologists are often over-inhibited (experiments only please!) and under-inhibited (mediation analysis).</p>
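<p>A small simulation can show how violating one of these assumptions, that nothing unmeasured affects both the mediator and the outcome, breaks the Baron &#038; Kenny regressions even when the manipulation itself is randomized. In this hypothetical data-generating process (all names and coefficients are assumptions chosen for illustration), the manipulation X affects the outcome Y only through M, but an unmeasured factor U drives both M and Y.</p>

```python
import numpy as np


def ols(y, *regressors):
    """Least-squares coefficients: [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def baron_kenny_with_confounded_mediator(n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n).astype(float)           # randomized manipulation
    u = rng.normal(size=n)                            # unmeasured cause of both M and Y
    m = 1.0 * x + u + rng.normal(size=n)              # measured "mediator"
    y = 0.0 * x + 1.0 * m + u + rng.normal(size=n)    # true direct effect of X is 0

    total = ols(y, x)[1]             # step 1: regress Y on X
    a = ols(m, x)[1]                 # step 2: regress M on X
    _, direct, b = ols(y, x, m)      # step 3: regress Y on X and M
    return {"total": total, "a": a, "direct": direct, "b": b}
```

<p>Steps 1 and 2 are fine because X is randomized. Step 3 is not: conditioning on M lets it proxy for U, so the estimated &#8220;direct effect&#8221; comes out near -0.5 (the truth is 0) and the mediator coefficient near 1.5 (the truth is 1).</p>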
<p>Likewise, in more observational studies (in psychology, sociology, education, etc.), investigators are sometimes wary of making explicit causal claims. So instead of carefully stating the causal assumptions that would justify different causal conclusions, authors leave readers with phrases like &#8220;suggests&#8221; and &#8220;is consistent with&#8221; followed by causal claims. Authors then recommend that further research be conducted to better support these causal conclusions. With these kinds of recommendations awaiting, no wonder that economists find the territory ready for taking: they can just show up with econometrics tools and get to work on hard-won questions that rightly belong to others!</p>
<ol class="footnotes"><li id="footnote_0_324" class="footnote">Well, if economists have better funding sources, this might apply in some sense.</li><li id="footnote_1_324" class="footnote">For arguments in favor of economic imperialism, see Lazear, E.P. (1999). <a href="http://www.nber.org/papers/w7300">Economic imperialism</a>. NBER Working Paper No. 7300.</li><li id="footnote_2_324" class="footnote">Or see <a href="http://xkcd.com/793/">this comic</a> for imperialism by physicists.</li><li id="footnote_3_324" class="footnote">At least by the contemporary literature I&#8217;ve been reading, on IVs, encouragement designs, endogenous interactions, and matching estimators. But it is true that in some of these areas econometrics has been able to fruitfully borrow from work on potential outcomes in statistics and epidemiology.</li><li id="footnote_4_324" class="footnote">Econometricians have made similar observations.</li><li id="footnote_5_324" class="footnote">For a bit on this topic, see the discussion and links to papers <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2010/03/criticizing_sta.html">here</a>.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/324_economic-imperialism-and-causal-inference/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Homophily and peer influence are messy business</title>
		<link>http://www.deaneckles.com/blog/317_homophily-and-peer-influence-are-messy-business/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=homophily-and-peer-influence-are-messy-business</link>
		<comments>http://www.deaneckles.com/blog/317_homophily-and-peer-influence-are-messy-business/#comments</comments>
		<pubDate>Fri, 01 Oct 2010 22:55:05 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[causal inference]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[influence]]></category>
		<category><![CDATA[research methods]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[sociology]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=317</guid>
		<description><![CDATA[Some social scientists have recently been getting themselves into trouble (and limelight) claiming that they have evidence of direct and indirect &#8220;contagion&#8221; (peer influence effects) in obesity, happiness, loneliness, etc. Statisticians and methodologists &#8212; and even science journalists &#8212; have pointed out their troubles. In observational data, peer influence effects are confounded with those of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://connectedthebook.com/">Some social scientists</a> have recently been getting themselves into trouble (and limelight) claiming that they have evidence of direct and indirect &#8220;contagion&#8221; (peer influence effects) in obesity, happiness, loneliness, etc. Statisticians and methodologists &#8212; and <a href="http://www.slate.com/id/2250102/entry/2250103/">even science journalists</a> &#8212; have pointed out their troubles. In observational data, peer influence effects <a href="http://arxiv.org/abs/1004.4704">are confounded with those of homophily and common external causes</a>. That is, people are similar to other people in their social neighborhood because ties are more likely to form between similar people, and many external events that could cause the outcome are localized in networks (e.g., fast food restaurant opens down the street).</p>
<p>Econometricians<sup><a href="http://www.deaneckles.com/blog/317_homophily-and-peer-influence-are-messy-business/#footnote_0_317" id="identifier_0_317" class="footnote-link footnote-identifier-link" title="They do statistics but speak a different language than big &amp;#8220;S&amp;#8221; statisticians &amp;#8212; kind of like machine learning folks.">1</a></sup> have worked out the conditions necessary for peer influence effects to be identifiable.<sup><a href="http://www.deaneckles.com/blog/317_homophily-and-peer-influence-are-messy-business/#footnote_1_317" id="identifier_1_317" class="footnote-link footnote-identifier-link" title="For example, see Manski, C. F. (2000). Economic analysis of social interactions.  Journal of Economic Perspectives, 14(3):115&ndash;136. Economists call peer influence effects endogenous interactions and contextual interactions.">2</a></sup> Very few studies have plausibly satisfied these requirements. But even if an investigator meets these requirements, it is worth remembering that homophily and peer influence are still tricky to think about &#8212; let alone produce credible quantitative estimates of.</p>
<p>As <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2010/04/controversy_ove_1.html">Andrew Gelman notes</a>, homophily can depend on network structure and on information cascades (a kind of peer influence effect) that enable homophilous relationships to form. Likewise, the success or failure of influence in a relationship can affect that relationship. For example, once I convert you to my way of thinking (let&#8217;s say, about climate change), we&#8217;ll be better friends. To me, it seems like some of the downstream consequences of our similarity should be attributed to peer influence. If I get fat and so do you, it could be peer influence in many ways: maybe that&#8217;s because I convinced you that owning a propane grill is more environmentally friendly (and then we both ended up grilling a lot more red meat). Sounds like peer influence to me. But it&#8217;s not that my getting fat caused you to.</p>
<p>Part of the problem here is looking at peer influence in only a single behavior or outcome at a time. I look forward to the &#8220;clear thinking and adequate data&#8221; (Manski) that will allow us to better understand these processes in the future. Until then: scientists, please at least be modest in your claims and cautious with radical policy recommendations. This is messy business.</p>
<ol class="footnotes"><li id="footnote_0_317" class="footnote">They do statistics but speak a different language than big &#8220;S&#8221; statisticians &#8212; kind of like machine learning folks.</li><li id="footnote_1_317" class="footnote">For example, see Manski, C. F. (2000). <a href="http://www.cmap.polytechnique.fr/~rama/ehess/manski2.pdf">Economic analysis of social interactions.</a>  <em>Journal of Economic Perspectives, 14</em>(3):115–136. Economists call peer influence effects endogenous interactions and contextual interactions.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/317_homophily-and-peer-influence-are-messy-business/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Aardvark&#8217;s use of Wizard of Oz prototyping to design their social interfaces</title>
		<link>http://www.deaneckles.com/blog/305_aardvarks-use-of-wizard-of-oz-prototyping-to-design-their-social-interfaces/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=aardvarks-use-of-wizard-of-oz-prototyping-to-design-their-social-interfaces</link>
		<comments>http://www.deaneckles.com/blog/305_aardvarks-use-of-wizard-of-oz-prototyping-to-design-their-social-interfaces/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 02:25:41 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[communication]]></category>
		<category><![CDATA[data collection]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[HCI]]></category>
		<category><![CDATA[information needs]]></category>
		<category><![CDATA[markets]]></category>
		<category><![CDATA[Mechanical Turk]]></category>
		<category><![CDATA[needfinding]]></category>
		<category><![CDATA[prototyping]]></category>
		<category><![CDATA[research methods]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[social responses to communication technologies]]></category>
		<category><![CDATA[social software]]></category>
		<category><![CDATA[source orientation]]></category>
		<category><![CDATA[usability]]></category>
		<category><![CDATA[Wizard of Oz]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=305</guid>
		<description><![CDATA[The Wall Street Journal&#8217;s Venture Capital Dispatch reports on how Aardvark, the social question asking and answering service recently acquired by Google, used a Wizard of Oz prototype to learn about how their service concept would work without building all the tech before knowing if it was any good. Aardvark employees would get the questions [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://blogs.wsj.com/venturecapital/2010/04/24/how-a-start-up-grew-by-paying-attention-to-whats-behind-the-curtain/">Wall Street Journal&#8217;s Venture Capital Dispatch reports</a> on how <a href="http://blogs.wsj.com/venturecapital/2010/04/24/how-a-start-up-grew-by-paying-attention-to-whats-behind-the-curtain/">Aardvark</a>, the social question asking and answering service recently acquired by Google, used a <a href="http://www.usabilitynet.org/tools/wizard.htm">Wizard of Oz prototype</a> to learn about how their service concept would work without building all the tech before knowing if it was any good.</p>
<blockquote><p>Aardvark employees would get the questions from beta test users and route them to users who were online and would have the answer to the question. This was done to test out the concept before the company spent the time and money to build it, said Damon Horowitz, co-founder of Aardvark, who spoke at Startup Lessons Learned, a conference in San Francisco on Friday.</p>
<p>“If people like this in super crappy form, then this is worth building, because they’ll like it even more,” Horowitz said of their initial idea.</p>
<p>At the same time it was testing a “fake” product powered by humans, the company started building the automated product to replace humans. While it used humans “behind the curtain,” it gained the benefit of learning from all the questions, including how to route the questions and the entire process with users.</p></blockquote>
<p>This is a really good idea, as I&#8217;ve argued before <a href="http://www.deaneckles.com/blog/16_using-a-wizard-of-oz-technique-in-mobile-service-design-probing-with-realistic-motivations/">on this blog</a> and in <a href="http://www.amazon.com/dp/0979502543/">a chapter for developers of mobile health interventions</a>. What better way to (a) learn about how people will use and experience your service and (b) get training data for your machine learning system than to have humans-in-the-loop run the service?</p>
<p>My friend <a href="http://www.chrisstreeter.com/">Chris Streeter</a> wondered whether this was all done by Aardvark employees or whether workers on Amazon Mechanical Turk may have also been involved, especially in identifying the expertise of the early users of the service so that the employees could route the questions to the right place. I think this highlights how different parts of a service can draw on human and non-human intelligence in a variety of ways &#8212; via a micro-labor market, using skilled employees who will gain hands-on experience with customers, etc.</p>
<p>I also wonder what UIs the humans-in-the-loop used to accomplish this. It&#8217;d be great to get a peek. I&#8217;d expect that these were rough around the edges, as was the Aardvark customer-facing UI.</p>
<p>Aardvark does a good job of being a quite sociable agent (e.g., when using it via instant messaging) that also gets out of the way of the human&#8211;human interaction between question askers and answerers. I wonder how the language used by humans to coordinate and hand off questions may have played into creating a positive para-social interaction with vark.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/305_aardvarks-use-of-wizard-of-oz-prototyping-to-design-their-social-interfaces/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>&#8220;Discovering Supertaskers&#8221;: Challenges in identifying individual differences from behavior</title>
		<link>http://www.deaneckles.com/blog/276_discovering-supertaskers-challenges-in-identifying-individual-differences-from-behavior/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=discovering-supertaskers-challenges-in-identifying-individual-differences-from-behavior</link>
		<comments>http://www.deaneckles.com/blog/276_discovering-supertaskers-challenges-in-identifying-individual-differences-from-behavior/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 21:29:47 +0000</pubDate>
		<dc:creator>Dean Eckles</dc:creator>
				<category><![CDATA[HCI]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[multitasking]]></category>
		<category><![CDATA[research methods]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.deaneckles.com/blog/?p=276</guid>
		<description><![CDATA[Some new research from the University of Utah suggests that a small fraction of the population consists of &#8220;supertaskers&#8221; whose performance is not reduced by multitasking, such as when completing tasks on a mobile phone while driving. “Supertaskers did a phenomenal job of performing several different tasks at once,” Watson says. “We’d all like to [...]]]></description>
			<content:encoded><![CDATA[<p>Some <a href="http://www.psych.utah.edu/lab/appliedcognition/publications/supertaskers.pdf">new research</a> from the University of Utah suggests that a small fraction of the population consists of &#8220;supertaskers&#8221; whose performance is not reduced by multitasking, such as when completing tasks on a mobile phone while driving.</p>
<blockquote><p>“Supertaskers did a phenomenal job of performing several different tasks at once,” Watson says. “We’d all like to think we could do the same, but the odds are overwhelmingly against it.” (<a href="http://www.wired.com/wiredscience/2010/04/supertasker/">Wired News &amp; Science News</a>)</p></blockquote>
<p>The researchers, Watson and Strayer, argue that they have good evidence for the existence of this individual variation. One can find many media reports of this &#8220;discovery&#8221; of &#8220;supertaskers&#8221; (e.g., <a href="http://www.psychologytoday.com/blog/the-science-willpower/201003/you-are-not-supertasker"><em>Psychology Today</em></a>). I do not think this conclusion is well justified.</p>
<p>First, let&#8217;s consider the methods used in this research. One hundred college students each completed driving tasks and an auditory task on a mobile phone &#8212; separately and in combination &#8212; over a single 1.5 hour session. The auditory task is designed to measure differences in executive attention by requiring participants to hold past items in memory while completing math tasks. The researchers identified &#8220;supertaskers&#8221; as those participants who met the following &#8220;stringent&#8221; requirements: (a) they were in the top 25% of participants in performance on the single-task portions, and (b) their dual-task performance differed from their single-task performance by no more than the standard error on at least three of the four measures. Since two of the four measures are associated with each of the two tasks (driving: brake reaction time, following distance; mobile phone task: memory performance, math performance), this requires that &#8220;supertaskers&#8221; do as well in the dual-task condition on both measures of one task (driving or mobile phone) and on at least one measure of the other task.</p>
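<p>To make the classification rule explicit, here is a hypothetical encoding of it in Python, based on the description above. The measure names, the inputs, and the choice to count a change in either direction as a difference are assumptions; the paper&#8217;s exact operationalization may differ.</p>

```python
def is_supertasker(single, dual, se, in_top_quartile):
    """Hypothetical encoding of the 'supertasker' rule.

    single, dual: dicts mapping each of the four measures to a participant's
                  score in the single-task and dual-task conditions
    se: dict mapping each measure to the standard error used as the threshold
    in_top_quartile: True if the participant was in the top 25% on single-task performance
    """
    # Count measures on which dual-task performance stayed within one standard
    # error of single-task performance (a change in either direction counts).
    unchanged = sum(abs(dual[m] - single[m]) <= se[m] for m in single)
    # With two measures per task, "at least three of four" means both measures
    # of one task plus at least one measure of the other.
    return in_top_quartile and unchanged >= 3
```

<p>Note that every input here comes from a single session: nothing in the rule distinguishes a stable trait from a good day.</p>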
<p>There may be many issues with the validity of the inference in this work. I want to focus on one in particular: the inference from the observation of differences between participants&#8217; performance in a single 1.5 hour session to the conclusion that there are stable, &#8220;trait&#8221; differences among participants, such that some are &#8220;supertaskers&#8221;. This conclusion is simply not justified. To illustrate this, let&#8217;s consider how the methods of this study differ from those usually (and reasonably) used by psychologists to reach such conclusions.</p>
<p>Psychologists often study individual differences using the following approach. First, identify some plausible trait of individuals. Second, construct a questionnaire or other (perhaps behavioral) test that measures that trait. Third, demonstrate that this test has high reliability: that is, that the differences between people are much larger than the differences for the same person taking the test at different times. Fourth, use this test to measure the trait and see whether it predicts differences in some experiment. A key point here is that, in order to conclude that the test measures a stable individual difference (i.e., a trait), researchers need to establish high test-retest reliability; otherwise, the test might just be measuring differences in temporary mood.</p>
<p>Returning to Watson and Strayer&#8217;s research, it is easy to see the problem: we have no idea whether the variation observed should be attributed to stable individual differences (i.e., being a &#8220;supertasker&#8221;) or to unstable differences. That is, if we brought those same &#8220;supertasker&#8221; participants back into the lab and they did another session, would they still exhibit the same lack of performance difference between the single- and dual-task conditions? This research gives us no reason to expect that they would.</p>
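<p>A rough way to see what is at stake is to simulate the retest this study did not run. In the hypothetical sketch below, a person&#8217;s dual-task cost in a session is a stable trait plus session noise, and &#8220;supertaskers&#8221; are simply the 5% with the smallest cost in that session (a deliberate simplification of the real rule). When there is no trait component, the share of session-1 &#8220;supertaskers&#8221; who qualify again in session 2 should hover around the 5% base rate.</p>

```python
import random


def reclassification_rate(n_people, trait_sd, state_sd, top_frac=0.05, seed=1):
    """Share of session-1 'supertaskers' who are 'supertaskers' again in session 2.

    Hypothetical model: a person's dual-task cost in a session equals a stable
    trait plus independent session noise. 'Supertaskers' are the top_frac of
    people with the smallest cost in that session.
    """
    rng = random.Random(seed)
    traits = [rng.gauss(0.0, trait_sd) for _ in range(n_people)]
    cost1 = [t + rng.gauss(0.0, state_sd) for t in traits]
    cost2 = [t + rng.gauss(0.0, state_sd) for t in traits]
    k = max(1, int(top_frac * n_people))

    def smallest_cost(costs):
        # Indices of the k people with the smallest dual-task cost.
        return set(sorted(range(n_people), key=costs.__getitem__)[:k])

    retained = smallest_cost(cost1).intersection(smallest_cost(cost2))
    return len(retained) / k


if __name__ == "__main__":
    print("state only:", reclassification_rate(20000, trait_sd=0.0, state_sd=1.0))
    print("trait + state:", reclassification_rate(20000, trait_sd=3.0, state_sd=1.0))
```

<p>Both models can produce the same number of &#8220;supertaskers&#8221; in any one session; they differ only in what happens on the return visit.</p>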
<p>Watson and Strayer do some additional analysis with the aim of ruling out the possibility that their observations are a fluke. One might think this addresses my criticism, but it does not. They</p>
<blockquote><p>performed a Monte Carlo simulation in which randomly selected single-dual task pairs of variables from the existing data set were obtained for each of the 4 dependent measures and then subjected to the same algorithm that was used to classify the supertaskers.</p></blockquote>
<p>That is, they broke apart the single-task and dual-task data for each participant and created new simulated participants by randomly sampling pairs of single- and dual-task data. They found that this analysis yields only 1/15th as many &#8220;supertaskers&#8221; as were observed. This is a good analysis to do. However, it only demonstrates that being labeled a &#8220;supertasker&#8221; is likely caused by the single- and dual-task data being generated by the same person in the same session. This still leaves it quite open (and, to me, more plausible) that participants were in varying states during the session and that this explains their (temporary) &#8220;supertasking&#8221;. It is also consistent with the excess of &#8220;supertaskers&#8221; being due to participants who do well on whatever task they are given first also being more likely to do well on subsequent tasks.</p>
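<p>The shuffling check, and the reason it cannot separate traits from states, can both be sketched in a few lines. The sketch makes several assumptions: it uses a single measure with a simplified rule (top-25% single-task score and a dual-minus-single difference within a fixed tolerance), it re-pairs data by permuting dual-task scores across participants (one way of &#8220;randomly sampling pairs&#8221;), and it generates data from a hypothetical model with no stable trait at all, only a session-level state shared by both tasks.</p>

```python
import random


def count_supertaskers(single, dual, se):
    """Simplified one-measure rule: top-25% single-task score and |dual - single| <= se."""
    cutoff = sorted(single)[int(0.75 * len(single))]
    return sum(1 for s, d in zip(single, dual) if s >= cutoff and abs(d - s) <= se)


def shuffled_counts(single, dual, se, n_sims=1000, seed=2):
    """Monte Carlo null: re-pair single-task scores with randomly permuted dual-task scores."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        shuffled = dual[:]
        rng.shuffle(shuffled)
        counts.append(count_supertaskers(single, shuffled, se))
    return counts


def simulate_state_only(n_people, state_sd=1.0, noise_sd=0.3, cost=0.2, seed=3):
    """Hypothetical data with NO stable trait: a session-level state shifts both tasks."""
    rng = random.Random(seed)
    single, dual = [], []
    for _ in range(n_people):
        state = rng.gauss(0.0, state_sd)  # differs by session, not by person
        single.append(state + rng.gauss(0.0, noise_sd))
        dual.append(state - cost + rng.gauss(0.0, noise_sd))
    return single, dual


if __name__ == "__main__":
    single, dual = simulate_state_only(200)
    observed = count_supertaskers(single, dual, se=0.5)
    null = shuffled_counts(single, dual, se=0.5)
    print("observed:", observed, "mean under shuffling:", sum(null) / len(null))
```

<p>In this model the observed count should clearly exceed the average shuffled count, even though, by construction, nobody has a stable &#8220;supertasking&#8221; trait. Shuffling only breaks the link between a person&#8217;s single- and dual-task data from the same session, and with one session per person a shared state and a stable trait produce that link equally well.</p>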
<p>My aim in this post is to suggest some challenges that this kind of approach has to face. Part of my interest in this is that I&#8217;m quite sympathetic to identifying stable, observed differences in behavior and then &#8220;working backwards&#8221; to characterizing the traits that explain these downstream differences. This is exactly the approach that Maurits Kaptein and I are taking in our work on <a href="http://www.deaneckles.com/blog/category/persuasion-profiling/">persuasion profiling</a>: we observe how individuals respond to the use of different influence strategies and use this to (a) construct a &#8220;persuasion profile&#8221; for that individual and (b) characterize how much variation in the effects of these strategies there is in the population.</p>
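<p>A deliberately simplified sketch of steps (a) and (b) follows. It is an illustration only, not the procedure used in the persuasion profiling work linked above: it assumes binary responses, hypothetical strategy names, a Beta prior on each person&#8217;s per-strategy response rate, and a method-of-moments estimate of how much true response rates vary across people.</p>

```python
import statistics
from collections import defaultdict


def persuasion_profile(trials, prior_successes=1.0, prior_failures=1.0):
    """(a) Per-strategy response-rate estimates for one person.

    trials: list of (strategy_name, responded) pairs, responded in {0, 1}.
    ASSUMPTION: a Beta(prior_successes, prior_failures) prior; we return the
    posterior mean, so rarely tried strategies are pulled toward the prior mean.
    """
    counts = defaultdict(lambda: [0, 0])  # strategy -> [successes, trials]
    for strategy, responded in trials:
        counts[strategy][0] += responded
        counts[strategy][1] += 1
    return {
        s: (k + prior_successes) / (n + prior_successes + prior_failures)
        for s, (k, n) in counts.items()
    }


def between_person_variance(rates, n_trials):
    """(b) Method-of-moments estimate of the variance of true rates across people.

    rates: each person's raw success rate for one strategy.
    n_trials: number of trials behind each rate (each must be at least 2).
    Observed variance = true between-person variance + mean sampling variance;
    p * (1 - p) / (n - 1) is an unbiased estimate of a rate's sampling variance.
    """
    sampling = statistics.mean(p * (1 - p) / (n - 1) for p, n in zip(rates, n_trials))
    return max(0.0, statistics.variance(rates) - sampling)


if __name__ == "__main__":
    # Hypothetical strategy names.
    print(persuasion_profile([("scarcity", 1), ("scarcity", 0), ("authority", 1)]))
```

<p>If the estimate in (b) is near zero, most of the apparent person-to-person differences in (a) are noise, which is the same worry raised about &#8220;supertaskers&#8221;.</p>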
<p>However, a critical step in this process is ruling out the alternative explanation that the observed differences are primarily due to differences in, e.g., mood, rather than stable individual differences. One way to do this is to observe the behavior in multiple sessions and multiple contexts. Another is to observe a complex pattern of behavioral differences that previous work suggests could not result from temporary, unstable differences, or that is at least more easily explained by previous theories about the relevant traits. That is, I&#8217;m enthusiastic about identifying stable, observed differences in behavior, but I don&#8217;t want to see researchers abandon the careful methods that have been used in the past to make the case for a new individual difference.</p>
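<p>Observing the same people in multiple sessions lets one split variance into a between-person part and a within-person, between-session part. A standard summary is the one-way random-effects intraclass correlation, ICC(1) = (MS<sub>between</sub> - MS<sub>within</sub>) / (MS<sub>between</sub> + (k - 1) MS<sub>within</sub>) for k sessions per person. The plain-Python sketch below computes it; the function name and data layout are illustrative.</p>

```python
import statistics


def icc1(scores):
    """One-way random-effects ICC(1) for people measured in k sessions each.

    scores: list of per-person lists, all of the same length k >= 2.
    Values near 1 mean variance lies mostly between people (trait-like);
    values near 0 (or below) mean it lies between sessions (state-like).
    Raises ZeroDivisionError if every score is identical.
    """
    n, k = len(scores), len(scores[0])
    grand = statistics.mean(x for row in scores for x in row)
    means = [statistics.mean(row) for row in scores]
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Each inner list: one person's dual-task cost in three sessions (made-up numbers).
    print(icc1([[0.1, 0.2, 0.1], [0.9, 1.0, 0.8], [0.5, 0.4, 0.6]]))
```

<p>Unlike a two-session correlation, ICC(1) extends naturally to any number of sessions per person.</p>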
<p>Watson, Strayer, and colleagues have apparently begun work that could be used to show the stability of the observed differences. The discussion section of their paper refers to some additional unpublished research in which they invited their &#8220;supertaskers&#8221; from this study and another study back into the lab and had them do similar tasks measuring executive attention (but not driving) while in an fMRI machine. They report that, compared with control participants, &#8220;supertaskers&#8221; showed greater &#8220;coherence&#8221; between their performance in this second study and in the previous study, as well as better performance on <a href="http://dual-n-back.com/">dual-N-back tasks</a>. But this falls short of showing high test-retest reliability.</p>
<p>Since little is said about this work, I hesitate to conclude anything from it or criticize it. I&#8217;ve contacted the authors with the hope of learning more. My current sense is that Watson and Strayer&#8217;s entire case for &#8220;supertaskers&#8221; hinges on research of this kind.</p>
<h3>References</h3>
<div class="references">
<p style="margin: 0pt;">Watson, J. M., &amp; Strayer, D. L. (2010). Supertaskers: Profiles in Extraordinary Multi-tasking Ability. <span style="font-style: italic;">Psychonomic Bulletin and Review</span>. Forthcoming. Retrieved from <a href="http://www.psych.utah.edu/lab/appliedcognition/publications/supertaskers.pdf">http://www.psych.utah.edu/lab/appliedcognition/publications/supertaskers.pdf</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.deaneckles.com/blog/276_discovering-supertaskers-challenges-in-identifying-individual-differences-from-behavior/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

