Search queries in referrer headers: Technical knowledge, privacy, and the status quo

I have been fascinated by Christopher Soghoian’s complaint to the FTC about Google’s practice of including search query information in the HTTP referrer header.

In summary, Google has taken proactive steps to ensure that Web site owners who get visitors from Google search receive the search terms entered by Google’s users. Meanwhile, Google has agreed that search query data is personally sensitive information and that it does not disclose this information, except under specific, limited circumstances; this is reflected in its privacy policy. Note that Google has not just let the URL do the work, but has specifically worked to make the referrer header include search terms (and additional information) when it has adopted techniques that would otherwise prevent these disclosures from being made. (For a fuller summary, see Soghoian’s blog post and this WSJ article, or this article at Search Engine Land.)
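To make the mechanism concrete, here is a minimal sketch of how a receiving Web site could recover search terms from a referrer value. The function name, the example URL, and the assumption that the terms travel in a `q` query parameter are mine for illustration; this is not a description of any particular site's code.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referrer(referrer, param="q"):
    """Return the search terms found in a referrer URL's query string.

    `param` is an assumption: the name of the query parameter that
    carries the search terms.
    """
    query = urlsplit(referrer).query
    values = parse_qs(query).get(param, [])
    # parse_qs decodes "+" and percent-escapes, so splitting on
    # whitespace yields the individual terms.
    return values[0].split() if values else []

# Hypothetical referrer value, as a site might see it in a request.
example = "http://www.google.com/search?q=sensitive+medical+condition"
print(search_terms_from_referrer(example))
```

Nothing here requires cooperation from the visitor: the site only reads a header that the browser sends with the request.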

I am not going to discuss the ethics and legal issues in this particular case. Instead, I just want to draw attention to how this issue reveals the importance of technical knowledge in thinking about privacy issues.

A common response from people working in the Internet industry is that Soghoian is a non-techie who has suddenly “discovered” referrer headers. For example, Danny Sullivan writes “former FTC employee discovers browsers sends referrer strings, turns it into google conspiracy”. (Of course, Soghoian is actually technically savvy, as reading the complaint to the FTC makes clear.)

What’s going on here? Folks with technical knowledge perceive search query disclosure as the status quo (though I bet most don’t often think about the consequences of clicking on a link after a sensitive search).

But how would most Internet users be aware of this? Certainly not through Google’s statements, or through warnings from Web browsers. One of the few ways I think users might realize this is happening is through query highlighting on forums, mailing list archives, and spammy pages. So a super-rational user who cares to think about how that works might guess something like this is going on. But I doubt most users would actively work out the mechanisms involved. Furthermore, their observations likely radically underdetermine the mechanism anyway, since it is quite plausible that a Web browser could do this kind of highlighting directly, especially for formulaic sites, like forums. Even casual use of Web analytics software (such as Google Analytics) may not make it clear that this per-user information is being provided, since aggregated data could reasonably be used to present summaries of top search queries leading to a Web site.1
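For readers curious how server-side query highlighting could work, here is a rough sketch under stated assumptions: the search terms arrive in a `q` parameter of the referrer URL, and the site wraps matches in `<mark>` tags. The function name and example values are hypothetical, and real forum software may do this quite differently.

```python
import re
from html import escape
from urllib.parse import urlsplit, parse_qs

def highlight_from_referrer(page_text, referrer, param="q"):
    """Wrap words from the referrer's search query in <mark> tags.

    `param` is an assumed name for the query parameter holding the terms.
    """
    values = parse_qs(urlsplit(referrer).query).get(param, [])
    terms = values[0].split() if values else []
    safe_text = escape(page_text)  # escape first so the output is safe HTML
    if not terms:
        return safe_text
    # Match any term, case-insensitively, in the escaped text.
    pattern = re.compile(
        "|".join(re.escape(escape(t)) for t in terms), re.IGNORECASE
    )
    return pattern.sub(lambda m: "<mark>" + m.group(0) + "</mark>", safe_text)

# Hypothetical page text and referrer value.
print(highlight_from_referrer(
    "Reset your password here",
    "http://www.google.com/search?q=password+reset",
))
```

The user sees only the highlighted words, which is equally consistent with the browser doing the highlighting itself; nothing visible on the page reveals that the terms came from the referrer header.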

This should be a reminder of why empirical studies of privacy attitudes and behaviors are useful: we techie folks often have severe blind spots. I doubt that this is just a matter of differences in expectations; it likely also involves differences in preferences. Over time, these expectations change our sense of the status quo, from which we calibrate our preferences and intentions.

Google has worked to ensure that referrer headers continue to include search query information, even as it adopts techniques under which the standard inclusion of the URL there would no longer carry that information.2 A difference in beliefs about the status quo puts these actions by Google in a different context. For us techies, that is just maintaining the status quo (which may seem more desirable, since we know it’s the industry-wide standard). For others, it might seem more like Google putting advertisers and Web site owners above its promises to its users about their sensitive data.

  1. Google does separately provide aggregated query data to Web site owners.
  2. See Danny Sullivan’s post following some changes by Google that could have ended including search queries in referrer headers.
