Using a Wizard of Oz technique in mobile service design: probing with realistic motivations

As I’ve blogged before, I spoke at the Texting 4 Health conference on the topic of research methods for mobile messaging. One method I covered was an interesting use of Wizard of Oz techniques for designing mobile services. I’ve since started getting some of this material in writing for the Texting 4 Health book. Here is a taste of that material, minus the health-specific focus and examples.
Just like the famous Wizard of Oz, one can simulate something impressive with just a humble person behind the curtain, and use this simulation to inform design decisions. When using a Wizard of Oz technique to study a prototype, a human “wizard” carries out functions that, in a deployed application or service, would be handled by a computer. This allows evaluating a design without fully building what can be expensive back-end parts of the system (Kelley 1984). The technique is often used in recognition-based interfaces, but it also has traditional applications in identifying usability problems and in carrying out experiments in which the interaction is systematically manipulated.

Wizard of Oz techniques are well suited to prototyping mobile services, especially those using mobile messaging (SMS, MMS, voice messaging). When participants send a request, a wizard reads or listens to it and either chooses the appropriate response or creates one on the fly. Since all user actions in mobile messaging are discrete messages and (depending on the application) the user can often tolerate a short delay, a few part-time wizards, such as you and a colleague, can manage a short field trial. As you’ll see, this can serve purposes beyond many traditional uses of a Wizard of Oz.
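
To make the wizard's role concrete, here is a minimal sketch of the bookkeeping behind such a trial: requests queue up, a human picks a canned response or writes one, and every exchange is logged for later interviews. All names here (Request, WizardDesk, the canned-response keys) are hypothetical illustrations, not the tooling we used, and actual message delivery over SMS or MMS is left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    participant: str                              # anonymized participant ID
    text: str                                     # transcribed or typed request
    context: dict = field(default_factory=dict)   # e.g. {"location": "..."}


class WizardDesk:
    """Holds participant requests until a human wizard answers them."""

    def __init__(self, canned_responses):
        self.pending = deque()          # requests waiting for a wizard
        self.canned = canned_responses  # key -> prepared response text
        self.log = []                   # (request, response) pairs for analysis

    def receive(self, request):
        """Queue an incoming request (first come, first served)."""
        self.pending.append(request)

    def respond(self, canned_key=None, custom_text=None):
        """Answer the oldest pending request with a canned or custom reply."""
        if not self.pending:
            raise LookupError("no pending requests")
        if custom_text is None and canned_key is None:
            raise ValueError("provide canned_key or custom_text")
        request = self.pending.popleft()
        response = custom_text if custom_text is not None else self.canned[canned_key]
        self.log.append((request, response))
        return request.participant, response


desk = WizardDesk({"no_match": "Sorry, we could not find photos for that."})
desk.receive(Request("p01", "photos of this park", {"location": "unknown park"}))
recipient, reply = desk.respond(custom_text="Here are 5 photos taken nearby.")
```

Because the short delay between receive and respond is tolerable in messaging, the wizard can work through the queue at human speed; the log is what later supports interviewing participants about each request.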

Probing photo consumption needs with realistic motivations
One use for this technique in designing a mobile messaging service is a bit like a diary study. In designing an online and mobile photography service, we wanted to better understand what photos people wanted to view and what prompted these desires.[1] Instead of just making diary entries, participants actually made voice requests to the system for photos, and received a mobile message with photos fitting the request in return. We didn’t need to first build a robust system that could do this; a few of us served as wizards, listening to the request, doing a couple of manual searches, and choosing which photos to return on demand. Though this can be done with a normal voice call, we used a mobile client application that also recorded contextual information not available via a normal voice call (e.g. location), so that participants could make context-aware requests as they saw fit (e.g. “I want to see photos of this park”).

In this case, we didn’t plan to (specifically) create a voice-based photo search system; instead, like a diary study, this technique served as a probe to understand what we should build. As a probe it provided realistic motivations for submitting requests, as the request would actually be fulfilled. This design research, in addition to other interviews and a usability study, informed our creation of Zurfer, a mobile application that supports exploring and conversing around personalized, location-aware channels of photos.
It is great if the Wizard of Oz prototype is quite similar to what you later build, but it can yield valuable insights even if not. Sometimes it is precisely these insights that can lead you to substantially change your design.

This study design can apply in designing many mobile services. As in our photos study, participants can be interviewed about the trigger for each request (why did they want that media or information?) and how satisfied they were with the (human-created) responses.[2]

Kelley, J. F. (1984). An iterative design methodology for user-friendly natural language office information applications. ACM Transactions on Information Systems, 2(1), 26-41.

  1. This study was designed and executed at Yahoo! Research Berkeley by Shane Ahern, Nathan Good, Simon King, Mor Naaman, Rahul Nair, and myself.
  2. Participants were informed that their requests would be seen by our research staff. Anonymization and strict limits on who the wizards are are necessary to protect participants’ privacy. Even if participants are not told that a wizard is creating the responses until they are debriefed after the experiment, they can nonetheless be notified that their requests are being reviewed by the research team.

Texting 4 Health conference in review

As I blogged already, I attended and spoke at the first Texting 4 Health conference at Stanford University last week. You can see my presentation slides at SlideShare here, and the program, with links to the slides for most speakers, is here.

The conference was very interesting, and there was quite the mix of participants — both speakers and others. There were medical school faculty, business people, people from NGOs and foundations, technologists, representatives of government agencies and centers, futurists, and social scientists. Everyone had something to learn — I know I did. This also made it somewhat difficult as a speaker because it is hard to know how best to reach, inform, and hold the interest of such a diverse audience: what is common ground with some is entirely new territory with others.

I think my favorite session was “Changing Health Behavior via SMS”. The methods used by the panelists to evaluate their interventions were both interesting to reflect on and good tools for persuading me of the importance and effectiveness of their work. One of my reflections was about what factors to vary in experiments on health interventions: there is a (reasonable) focus on having a no-SMS control condition, and very few studies manipulate more fine-grained dimensions. Of course, the field is young and I understand how important true controls are in medical domains, but I think that real progress in understanding mobile messaging and designing effective interventions will require looking at more subtle and theoretically valuable manipulations.

You can see other posts about the conference here and here. And the conference Web site is also starting a blog to watch in the future.

Texting 4 Health

On February 29th I’m speaking at Texting 4 Health, a conference at Stanford University about using mobile messaging for health interventions and research. I’ll be talking about mobile messaging research methods I’ve used to study mobile persuasive technology. Like Mobile Persuasion 2007, it will feature a fast-paced, single-track program with time to meet and talk with participants from health, technology, policy, and research communities.

Taxonomy of diary research methods

Diary studies are widely used in human-computer interaction research, but also in user experience research as practiced in product R&D groups. Bolger, Davis, & Rafaeli (2003) is a good review of diary research methods from a Psychology perspective. It gives practical guidance in what research questions are suited to these methods, design decisions, tools, and analysis.

Though it covers state-of-the-art technology used for these methods, I think the argument below for the taxonomy of methods used in this paper needs revision in light of new diary methods, e.g. those made possible by using context-aware devices for signaling participants. Here is the argument for the two-way taxonomy (p. 588):

Diary studies have often been classified into the three categories of interval-, signal-, and event-contingent protocols (e.g., Wheeler & Reis 1991). The interval-contingent design, the oldest method of daily event recording, requires participants to report on their experiences at regular, predetermined intervals. Signal-contingent designs rely on some signaling device to prompt participants to provide diary reports at fixed, random, or a combination of fixed and random intervals. Event-contingent studies, arguably the most distinct design strategy, require participants to provide a self-report each time the event in question occurs. This design enables the assessment of rare or specialized occurrences that would not necessarily be captured by fixed or random interval assessments.

As we see it, diary studies serve one of two major purposes: the investigation of phenomena as they unfold over time, or the focused examination of specific, and often rare, phenomena. It appears to us that the three-way classification blends this conceptual distinction with the technological issue of signaling. Instead, we incorporate interval- and signal-contingent designs into a single category, which we call time-based designs.

This argument to collapse the taxonomy does not account for methods in which participants are signaled based on factors other than time. For example, diary studies can include signaling participants to create an entry based on events that are automatically detected by the system: this occurs when the system is immediately aware of the event because it is an interaction with the system (e.g. the participant has just completed a phone call) or because it can infer an appropriate change in state (e.g. the participant has just moved from one place to another, as detected by readings from GPS).
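
As a rough illustration of the second case, the sketch below signals a diary prompt when successive GPS fixes suggest the participant has moved to a new place. Nothing here comes from Bolger et al. or from a specific study: the class name, the 500 m threshold, and the simple "distance from the last anchor point" rule are all assumptions chosen for the example, and a real deployment would need to handle GPS noise and dwell time.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


class PlaceChangeSignaler:
    """Prompts for a diary entry when the participant moves far enough.

    The trigger is a detected change in state, not a clock, so it fits
    neither the interval- nor the signal-contingent (time-based) category.
    """

    def __init__(self, threshold_m=500.0):  # threshold is an assumed value
        self.threshold_m = threshold_m
        self.anchor = None  # location of the last place we treated as "here"

    def update(self, fix):
        """Feed one (lat, lon) fix; return True if a prompt should be sent."""
        if self.anchor is None:
            self.anchor = fix  # first fix only establishes the starting place
            return False
        if haversine_m(self.anchor, fix) >= self.threshold_m:
            self.anchor = fix  # the new place becomes the reference point
            return True
        return False


signaler = PlaceChangeSignaler()
fixes = [(37.4275, -122.1697), (37.4276, -122.1697), (37.4375, -122.1697)]
prompts = [signaler.update(f) for f in fixes]
```

The first case (an interaction with the system, such as a completed phone call) needs even less machinery: the system already knows the event occurred and can prompt immediately.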

Bolger, N., Davis, A., & Rafaeli, E. (2003). Diary Methods: Capturing Life as it is Lived. Annual Review of Psychology, 54(1), 579-616.
Wheeler, L., & Reis, H. (1991). Self-recording of everyday life events: Origins, types, and uses. Journal of Personality, 59(3), 339-354.