Increasing valuable annotation behaviors was a practical aim of a good deal of work at Yahoo! Research Berkeley. ZoneTag is a mobile application and service that suggests tags when users choose to upload a photo to Flickr. The suggestions draw on the user’s past tags, the relevant tags of others, and nearby events and places. Through social influence and by removing barriers, these suggestions lead users to expand their tagging vocabulary and use it consistently (Ahern et al. 2006).
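To make the idea of combining these sources concrete, here is a minimal sketch of ranking candidate tags. It is not ZoneTag’s actual algorithm: the function name `suggest_tags`, the three input lists, and the weights are all assumptions made up for illustration.

```python
from collections import Counter


def suggest_tags(own_tags, others_tags, nearby_names, k=5, weights=(3.0, 1.0, 2.0)):
    """Rank candidate tags by a weighted sum over three sources.

    The sources and weights are hypothetical, chosen only to illustrate
    combining a user's past tags, other people's tags, and nearby places.
    """
    w_own, w_others, w_nearby = weights
    scores = Counter()
    for tag, count in Counter(own_tags).items():
        scores[tag] += w_own * count        # the user's own past tags
    for tag, count in Counter(others_tags).items():
        scores[tag] += w_others * count     # relevant tags from other people
    for name in set(nearby_names):
        scores[name] += w_nearby            # names of nearby events and places
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


suggestions = suggest_tags(
    own_tags=["berkeley", "coffee", "berkeley"],
    others_tags=["campanile", "berkeley", "campanile"],
    nearby_names=["campanile", "sather gate"],
    k=3,
)
print(suggestions)  # ['berkeley', 'campanile', 'coffee']
```

Weighting a user’s own tags most heavily is one plausible way to encourage consistent vocabulary, but the right balance would need to be tuned against real usage.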
Context-aware suggestion techniques such as those used in ZoneTag can increase tagging, but what about users’ motivations for considering tagging in the first place? And how can these motivations for annotation be considered in designing services that involve annotation? In this post, I consider existing work on motivations for tagging, and I use tagging on Facebook as an example of how multiple motivations can combine to increase desired annotation behaviors.
Using photo-elicitation interviews with ZoneTag users who tag, Ames & Naaman (2007) present a two-factor taxonomy of motivations for tagging. First, they categorize tagging motivations by function: is the motivating function of the tagging organizational or communicative? Organizational functions include supporting search, presenting photos by event, etc., while communicative functions include cases where tags provide information about the photos or their content, or are otherwise part of a communication (e.g., telling a joke). Second, they categorize tagging motivations by intended audience (or sociality): are the tags intended for my future self, people known to me (friends, family, coworkers, online contacts), or the general public?
On Flickr the function dimension generally maps onto the distinction between functionality that enables and is prior to arriving at the given photo or photos (organization) and functionality applicable once one is viewing a photo (communication). For example, I can find a photo (by me or someone else) by searching for a person’s name, and then use other tags applied to that photo to jog my memory of what event the photo was taken at.
Some Flickr users subscribe to RSS feeds for public photos tagged with their name, making for a communication function of tagging — particularly tagging of people in media — that is prior to “arriving” at a specific media object. These are generally techie power users, but this can matter for others. Some less techie participants in our studies reported noticing that their friends did this — so they became aware of tagging those friends’ names as a communicative act that would result in the friends finding the tagged photos.
This kind of function of tagging people is executed more generally — and for more than just techie power users — by Facebook. In tagging of photos, videos, and blog posts, tagging a person notifies them they have been tagged, and can add that they have been tagged to their friends’ News Feeds. This function has received a lot of attention from a privacy perspective (and it should). But I think it hints at the promise of making annotation behavior fulfill more of these functions simultaneously. When specifying content can also be used to specify recipients, annotation becomes an important trigger for communication.
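As a rough illustration of how one annotation can both describe content and select recipients, here is a small sketch. It is not Facebook’s API; `tag_person` and the `friends`, `inboxes`, and `feeds` dictionaries are hypothetical names invented for this example.

```python
def tag_person(tagger, person, item, friends, inboxes, feeds):
    """Record a person tag and fan it out as communication.

    All names here are hypothetical. The point is that one annotation
    (a person tag on an item) also determines who hears about the item.
    """
    # The tagged person is notified directly.
    inboxes.setdefault(person, []).append(f"{tagger} tagged you in {item}")
    # The tag can also surface in the feeds of the tagged person's friends.
    for friend in sorted(friends.get(person, set())):
        feeds.setdefault(friend, []).append(f"{person} was tagged in {item}")


friends = {"ana": {"ben", "cy"}}
inboxes, feeds = {}, {}
tag_person("ben", "ana", "photo-17", friends, inboxes, feeds)
print(inboxes["ana"])   # ['ben tagged you in photo-17']
print(sorted(feeds))    # ['ben', 'cy']
```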
See some interesting comments (from Twitter) about tagging on Facebook:
- noticing people tagging to gain eyeballs
- exhorting others not to tag bad photos (and thanks)
- collapsing time by tagging photos from long ago
- tagging by parents
Ames, M., & Naaman, M. (2007). Why we tag: motivations for annotation in mobile and online media. In Proceedings of CHI 2007 (pp. 971-980). San Jose, California, USA: ACM.
Ahern, S., Davis, M., Eckles, D., King, S., Naaman, M., Nair, R., et al. (2006). ZoneTag: Designing context-aware mobile media capture to increase participation. Pervasive Image Capture and Sharing: New Social Practices and Implications for Technology Workshop. In Adjunct Proc. Ubicomp 2006.
Today I’m attending the Social Mobile Media Workshop at Stanford University. It’s organized by researchers from Stanford’s HStar, Tampere University of Technology, and the Naval Postgraduate School. What follows are some still-jagged thoughts that were prompted by the presentations this morning, rather than a straightforward account of the presentations.[1]
A big theme of the workshop this morning has been transitions among production and consumption — and the critical role of annotations and context-awareness in enabling many of the user experiences discussed. In many ways, this workshop took me back to thinking about mobile media sharing, which was at the center of a good deal of my previous work. At Yahoo! Research Berkeley we were informed by Marc Davis’s vision of enabling “the billions of daily media consumers to become daily media producers.” With ZoneTag we used context-awareness, sociality, and simplicity to influence people to create, annotate, and share photos from their mobile phones (Ahern et al. 2006, 2007).
Enabling and encouraging these behaviors (for all media types) remains a major goal for designers of participatory media, and this was explicit at several points throughout the workshop (e.g., in Teppo Raisanen’s broad presentation on persuasive technology). This morning there was discussion about the technical requirements for consuming, capturing, and sending media. Cases that traditionally seem to strictly structure and separate production and consumption may (1) be in need of revision and increased flexibility or (2) actually already involve production and consumption together through existing tools. For media production to be part of a two-way communication, the media must be consumed, whether by peers or by the traditional producers.
As an example of the first case, Sarah Lewis (Stanford) highlighted the importance of making distance learning experiences reciprocal, rather than enforcing an asymmetry in what media types can be shared by different participants. In a past distance learning situation focused on the African ecosystem, it was frustrating that video was only shared from the participants at Stanford to participants at African colleges, leaving the latter to respond only via text. Mobltz, a prototype system she and her colleagues have built, is designed to change this by supporting the creation of channels of media from multiple people (which also reminded me of Kyte.tv).
As an example of the second case, Timo Koskinenen (Nokia) presented a trial of mobile media capture tools for professional journalists. In this case the workflow of what is, in the end, a media production practice also involves consumption: journalists review their own materials and those of other journalists as they edit and consider what new media to capture.
Throughout the sessions themselves and conversations with participants during breaks and lunch, having good annotations continued to come up as a requirement for many of the services discussed. While I think our ZoneTag work (and the free suggested-tags Web service API it provides) made a good contribution in this area, as has a wide array of other work (e.g., von Ahn & Dabbish 2004, licensed for Google Image Labeler), there is still a lot of progress to make, especially in bringing this work to market and making it something that further services can build on.
Ahern, S., Davis, M., Eckles, D., King, S., Naaman, M., Nair, R., et al. (2006). ZoneTag: Designing Context-Aware Mobile Media Capture. In Adjunct Proc. Ubicomp (pp. 357-366).
Ahern, S., Eckles, D., Good, N. S., King, S., Naaman, M., & Nair, R. (2007). Over-exposed?: privacy patterns and considerations in online and mobile photo sharing. In Proc. CHI 2007 (pp. 357-366). ACM Press.
von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In Proc. CHI 2004 (pp. 319-326).
[1] Blogging something at this level of roughness is still new for me…
Much of current human-computer interaction (HCI) research focuses on novice users in “walk-up and use” scenarios. I can think of three major causes for this:
1. A general shift from examining non-discretionary use to discretionary use
2. How much easier it is to find (and not train) study participants unfamiliar with a system than experts (especially with a system that is only a prototype)
3. The push from practitioners in this direction, especially with the advent of the Web, where new users just show up at your site, often deep-linked
This focus sometimes comes in for criticism, especially when #2 is taken as a main cause of the choice.
On the other hand, some research threads in HCI continue to focus on expert use. As I’ve been reading a lot of research on both human performance modeling and situated & embodied approaches to HCI, it has been interesting to note that both instead have (comparatively) a much bigger focus on the performance and experience of expert and skilled use.
Grudin’s “Three Faces of Human-Computer Interaction” does a good job of explaining the human performance modeling (HPM) side of this. HPM owes a lot to human factors historically, and while The Psychology of Human-Computer Interaction successfully brought engineering-oriented cognitive psychology to the field, it was human factors, said Stuart Card, “that we were trying to improve” (Grudin 2005, p. 7). And the focus of human factors, which arose from maximizing productivity in industrial settings like factories, has been non-discretionary use. Fundamentally, it is hard for HPM to exist without a focus on expert use because many of the differences — and thus research contributions through new interaction techniques — can only be identified and are only important for use by experts or at least trained users. Grudin notes:
“A leading modeler discouraged publication of a 1984 study of a repetitive task that showed people preferred a pleasant but slower interaction technique—a result significant for discretionary use, but not for modeling aimed at maximizing performance.”
Situated action and embodied interaction approaches to HCI, which Harrison, Tatar, and Sengers (2007) have called the “third paradigm of HCI”, are a somewhat different story. While HPM research, like a good amount of traditional cognitive science generally, contributes to science and design by assimilating people to information processors with actuators, situated and embodied interaction research borrows a fundamental concern of ethnomethodology, focusing on how people actively make behaviors intelligible by assimilating them to social and rational action.
There are at least three ways this motivates the study of skilled and expert users:
1. Along with this research topic comes a methodological concern for studying behavior in context with the people who really do it. For example, to study publishing systems and technology, the existing practices of people working in such a setting are of critical importance.
2. These approaches emphasize the skills we all have and the value of drawing on them for design. For example, Dourish (2001) emphasizes the skills with which we all navigate the physical and social world as a resource for design. This is not unrelated to the first way.
3. These approaches, like and through their relationships to the participatory design movement, have a political, social, and ethical interest in empowering those who will be impacted by technology, especially when its design (and the decision to adopt it) would otherwise be out of their control. Non-discretionary use in institutions is the paradigmatic prompting situation for this.
I don’t have a broad conclusion to make. Rather, I just find it of note and interesting that these two very different threads in HCI research stand out from much other work as similar in this regard. Some of my current research is connecting these two threads, so expect more on their relationship.
Dourish, P. (2001). Where the Action Is: The Foundations of Embodied Interaction. MIT Press.
Grudin, J. (2005). Three Faces of Human-Computer Interaction. IEEE Ann. Hist. Comput. 27, 4 (Oct. 2005), 46-62.
Harrison, S., Tatar, D., and Sengers, P. (2007). The Three Paradigms of HCI. Extended Abstracts CHI 2007.
As I’ve blogged before, I spoke at the Texting 4 Health conference on the topic of research methods for mobile messaging. One method I covered was an interesting use of Wizard of Oz techniques for designing mobile services. I’ve since started getting some of this material in writing for the Texting 4 Health book. Here is a taste of that material, minus the health-specific focus and examples.
Just like the famous Wizard of Oz, one can simulate something impressive with just a humble person behind the curtain, and use this simulation to inform design decisions. When using a Wizard of Oz technique to study a prototype, a human “wizard” carries out functions that, in a deployed application or service, would be handled by a computer. This can allow evaluating a design without fully building what can be expensive back-end parts of the system (Kelley 1984). The technique is often used in recognition-based interfaces, but it also has traditional applications to identifying usability problems and carrying out experiments in which the interaction is systematically manipulated.
Wizard of Oz techniques are well suited to prototyping mobile services, especially those using mobile messaging (SMS, MMS, voice messaging). When participants send a request, a wizard reads or listens to it and chooses the appropriate response, or just creates it on-the-fly. Since all user actions in mobile messaging are discrete messages and (depending on the application) the user can often tolerate a short delay, a few part-time wizards, such as you and a colleague, can manage a short field trial. As you’ll see, this can be used for purposes beyond many traditional uses of a Wizard of Oz.
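The wizard’s side of such a trial can be sketched as a simple queue of incoming messages. This is a minimal illustration rather than a tool we used: the `WizardConsole` class, its method names, and the phone numbers and canned reply are all hypothetical, and actually sending and receiving messages is left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    sender: str
    text: str
    received_at: float  # seconds since the start of the session


class WizardConsole:
    """Hypothetical helper: queues participant messages for a human wizard."""

    def __init__(self, canned):
        self.canned = dict(canned)  # prepared responses, keyed by short name
        self.pending = deque()      # requests waiting for a wizard
        self.log = []               # (sender, request, reply, delay in seconds)

    def receive(self, sender, text, now):
        self.pending.append(Request(sender, text, now))

    def respond(self, now, canned_key=None, custom_text=None):
        """Answer the oldest request with a canned or on-the-fly reply."""
        request = self.pending.popleft()
        reply = custom_text if custom_text is not None else self.canned[canned_key]
        self.log.append((request.sender, request.text, reply, now - request.received_at))
        return request.sender, reply


console = WizardConsole({"help": "Text a place name to get photos from there."})
console.receive("+15550100", "help", now=0.0)
console.receive("+15550101", "photos of dolores park", now=5.0)
first = console.respond(now=30.0, canned_key="help")
second = console.respond(now=65.0, custom_text="Sending 3 photos of Dolores Park.")
print(first)
print(console.log[1][3])  # delay for the second request: 60.0
```

Logging the delay for each reply gives a rough check on whether the wizards are keeping response times within what participants will tolerate.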
Probing photo consumption needs with realistic motivations
One use for this technique in designing a mobile messaging service is a bit like a diary study. In designing an online and mobile photography service, we wanted to better understand what photos people wanted to view and what prompted these desires.[1] Instead of just making diary entries, participants actually made voice requests to the system for photos, and received a mobile message with photos fitting the request in return. We didn’t need to first build a robust system that could do this; a few of us served as wizards, listening to the request, doing a couple of manual searches, and choosing which photos to return on demand. Though this can be done with a normal voice call, we used a mobile client application that also recorded contextual information not available via a normal voice call (e.g. location), so that participants could make context-aware requests as they saw fit (e.g. “I want to see photos of this park”).
In this case, we didn’t plan to (specifically) create a voice-based photo search system; instead, like a diary study, this technique served as a probe to understand what we should build. As a probe it provided realistic motivations for submitting requests, as the request would actually be fulfilled. This design research, in addition to other interviews and a usability study, informed our creation of Zurfer, a mobile application that supports exploring and conversing around personalized, location-aware channels of photos.
It is great if the Wizard of Oz prototype is quite similar to what you later build, but it can yield valuable insights even if not. Sometimes it is precisely these insights that can lead you to substantially change your design.
This study design can apply in designing many mobile services. As in our photos study, participants can be interviewed about the trigger for the requests (why did they want that media or information) and how satisfied they were with the (human-created) responses.[2]
[1] This study was designed and executed at Yahoo! Research Berkeley by Shane Ahern, Nathan Good, Simon King, Mor Naaman, Rahul Nair, and myself.
[2] Participants were informed that their requests would be seen by our research staff. Anonymizing requests and strictly limiting who serves as a wizard are necessary to protect participants’ privacy. Even if participants are not informed that a wizard is creating the responses until they are debriefed after the experiment, participants can nonetheless be notified that their requests are being reviewed by the research team.
Yes, that spells ASSIST.
Check out this call for proposals from DARPA (also see Wired News). This research program is designed to create and evaluate systems that use sensors to capture soldiers’ experiences in the field, thus allowing for (spatially and temporally) distant review and analysis of this data, as well as augmenting their abilities while still in the field.
I found it interesting to consider differences in requirements between this program and others that would apply some similar technologies and involve similar interactions — but for other purposes. For example, two such uses are (1) everyday life recording for social sharing and memory and (2) rich data collection as part of ethnographic observation and participation.
When doing some observation myself, I strung my cameraphone around my neck and used Waymarkr to automatically capture a photo every minute or so. Check out the results from my visit to a flea market in San Francisco.
Photos of two ways to wear a cameraphone from Waymarkr. Incidentally, Waymarkr uses the cell-tower-based location API created for ZoneTag, a project I worked on at Yahoo! Research Berkeley.
Also, for a use more like (1) in a fashion context, see Blogging in Motion. This project (for Yahoo! Hack Day) created an “auto-blogging purse” that captures photos (again using ZoneTag) whenever the wearer moves around (sensed using GPS).
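Movement-triggered capture of this kind can be sketched as a distance threshold over successive GPS fixes. This is only an illustration under assumptions, not how Blogging in Motion or Waymarkr was built: the function names and the 25 m threshold are made up, and taking the photo itself is left out.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def capture_points(fixes, min_move_m=25.0):
    """Return the fixes at which a photo would be captured.

    A capture happens at the first fix, and then whenever the wearer has
    moved at least min_move_m (an assumed threshold) since the last capture.
    """
    captures = []
    last = None
    for fix in fixes:
        if last is None or haversine_m(last, fix) >= min_move_m:
            captures.append(fix)
            last = fix
    return captures


fixes = [
    (37.7700, -122.4200),
    (37.7700, -122.4200),  # standing still
    (37.7701, -122.4200),  # about 11 m north: below the threshold
    (37.7705, -122.4200),  # about 56 m north of the first fix
]
print(capture_points(fixes))  # first and last fixes only
```

Comparing against the last capture point, rather than the previous fix, keeps slow drift from being ignored and avoids bursts of photos from GPS jitter while standing still.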