The Wall Street Journal’s Venture Capital Dispatch reports on how Aardvark, the social question asking and answering service recently acquired by Google, used a Wizard of Oz prototype to learn about how their service concept would work without building all the tech before knowing if it was any good.
Aardvark employees would get the questions from beta test users and route them to users who were online and would have the answer to the question. This was done to test out the concept before the company spent the time and money to build it, said Damon Horowitz, co-founder of Aardvark, who spoke at Startup Lessons Learned, a conference in San Francisco on Friday.
“If people like this in super crappy form, then this is worth building, because they’ll like it even more,” Horowitz said of their initial idea.
At the same time it was testing a “fake” product powered by humans, the company started building the automated product to replace humans. While it used humans “behind the curtain,” it gained the benefit of learning from all the questions, including how to route the questions and the entire process with users.
This is a really good idea, as I’ve argued before on this blog and in a chapter for developers of mobile health interventions. What better way to (a) learn about how people will use and experience your service and (b) get training data for your machine learning system than to have humans-in-the-loop run the service?
My friend Chris Streeter wondered whether this was all done by Aardvark employees or whether workers on Amazon Mechanical Turk may have also been involved, especially in identifying the expertise of the early users of the service so that the employees could route the questions to the right place. I think this highlights how different parts of a service can draw on human and non-human intelligence in a variety of ways — via a micro-labor market, using skilled employees who will gain hands-on experience with customers, etc.
I also wonder what UIs the humans-in-the-loop used to accomplish this. It’d be great to get a peek. I’d expect that these were certainly rough around the edges, as was the Aardvark customer-facing UI.
Aardvark does a good job of being a quite sociable agent (e.g., when using it via instant messaging) that also gets out of the way of the human–human interaction between question askers and answerers. I wonder how the language used by humans to coordinate and hand off questions may have played into creating a positive para-social interaction with vark.
As I’ve blogged before, I spoke at the Texting 4 Health conference on the topic of research methods for mobile messaging. One method I covered was an interesting use of Wizard of Oz techniques for designing mobile services. I’ve since started getting some of this material in writing for the Texting 4 Health book. Here is a taste of that material, minus the health-specific focus and examples.
Just like the famous Wizard of Oz, one can simulate something impressive with just a humble person behind the curtain — and use this simulation to inform design decisions. When using a Wizard of Oz technique to study a prototype, a human “wizard” carries out functions that, in a deployed application or service, would be handled by a computer. This allows evaluating a design without fully building what can be expensive back-end parts of the system (Kelley 1984). The technique is often used in recognition-based interfaces, but it also has more traditional applications, such as identifying usability problems and carrying out experiments in which the interaction is systematically manipulated.
Wizard of Oz techniques are well suited to prototyping mobile services, especially those using mobile messaging (SMS, MMS, voice messaging). When participants send a request, a wizard reads or listens to it and chooses the appropriate response, or just creates it on the fly. Since all user actions in mobile messaging are discrete messages and (depending on the application) the user can often tolerate a short delay, a few part-time wizards, such as you and a colleague, can manage a short field trial. As you’ll see, this can be used for purposes beyond many of the traditional uses of Wizard of Oz prototyping.
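To make the workflow concrete, here is a minimal sketch of what a wizard’s side of such a trial could look like: incoming participant messages queue up, and a human dequeues each one and types a reply. All names here are illustrative — in a real study the `receive` and `reply` methods would be wired to an SMS or IM gateway rather than an in-memory queue.

```python
import queue
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Message:
    """One incoming participant message."""
    user_id: str
    text: str
    received_at: datetime = field(default_factory=datetime.now)


class WizardConsole:
    """Toy stand-in for a messaging gateway: participant messages
    queue up; a human wizard pulls each one and composes a reply."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.outbox = []  # (user_id, response_text) pairs, oldest first

    def receive(self, user_id, text):
        # Called when a participant message arrives.
        self.inbox.put(Message(user_id, text))

    def next_request(self):
        # The wizard takes the oldest unanswered message.
        return self.inbox.get()

    def reply(self, message, response_text):
        # In a deployed trial this would hand the reply to the gateway;
        # here we just log it for later analysis and debriefing.
        self.outbox.append((message.user_id, response_text))


# A wizard working the queue during a session:
console = WizardConsole()
console.receive("alice", "What time does the farmers market open?")
msg = console.next_request()
console.reply(msg, "It opens at 9am on Saturdays.")
print(console.outbox)
# → [('alice', 'It opens at 9am on Saturdays.')]
```

Because every turn is a discrete message, the log in `outbox` doubles as a record of the interaction — exactly the kind of data that later trains the automated replacement.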
Probing photo consumption needs with realistic motivations
One use for this technique in designing a mobile messaging service is a bit like a diary study. In designing an online and mobile photography service, we wanted to better understand what photos people wanted to view and what prompted these desires.1 Instead of just making diary entries, participants actually made voice requests to the system for photos – and received a mobile message with photos fitting the request in return. We didn’t need to first build a robust system that could do this; a few of us served as wizards, listening to the request, doing a couple of manual searches, and choosing which photos to return on demand. Though this can be done with a normal voice call, we used a mobile client application that also recorded contextual information not available via a normal voice call (e.g. location), so that participants could make context-aware requests as they saw fit (e.g. “I want to see photos of this park”).
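A request in a study like this might be captured as a small record pairing the transcript with the context the client logged, plus the wizard’s hand-chosen response. This is a hypothetical sketch — the field names and URL are illustrative, not the actual data model from the study.

```python
from dataclasses import dataclass, field
from typing import Optional, List


@dataclass(frozen=True)
class ContextualRequest:
    """One participant request, with context the mobile client captured
    alongside the voice recording (fields are illustrative)."""
    participant_id: str
    transcript: str                     # wizard's transcription of the voice request
    latitude: Optional[float] = None    # lets "this park" be resolved to a place
    longitude: Optional[float] = None


@dataclass
class WizardResponse:
    """What the wizard sent back, plus notes for follow-up interviews."""
    request: ContextualRequest
    photo_urls: List[str]               # photos chosen after manual searches
    notes: str = ""


req = ContextualRequest("p07", "I want to see photos of this park",
                        latitude=37.8715, longitude=-122.2730)
resp = WizardResponse(req, ["https://example.com/photo1.jpg"],
                      notes="Searched nearby geotagged photos by hand")
print(resp.request.transcript)
# → I want to see photos of this park
```

Keeping the request, its context, and the human-crafted response together is what makes these logs useful both as a design probe and, later, as labeled examples for an automated system.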
In this case, we didn’t plan to (specifically) create a voice-based photo search system; instead, like a diary study, this technique served as a probe to understand what we should build. As a probe it provided realistic motivations for submitting requests, as the request would actually be fulfilled. This design research, in addition to other interviews and a usability study, informed our creation of Zurfer, a mobile application that supports exploring and conversing around personalized, location-aware channels of photos.
It is great if the Wizard of Oz prototype is quite similar to what you later build, but it can yield valuable insights even if not. Sometimes it is precisely these insights that can lead you to substantially change your design.
This study design can apply in designing many mobile services. As in our photos study, participants can be interviewed about the trigger for the requests (why did they want that media or information) and how satisfied they were with the (human-created) responses.2
- This study was designed and executed at Yahoo! Research Berkeley by Shane Ahern, Nathan Good, Simon King, Mor Naaman, Rahul Nair, and myself. [↩]
- Participants were informed that their requests would be seen by our research staff. Anonymization and strict limits on who can serve as a wizard are necessary to protect participants’ privacy. Even if participants are not informed that a wizard is creating the responses until they are debriefed after the experiment, participants can nonetheless be notified that their requests are being reviewed by the research team. [↩]