Not just predicting the present, but the future: Twitter and upcoming movies
Search queries have been used recently to “predict the present”, as Hal Varian has called it. Now some initial use of Twitter chatter to predict the future:
The chatter in Twitter can accurately predict the box-office revenues of upcoming movies weeks before they are released. In fact, Tweets can predict the performance of films better than market-based predictions, such as Hollywood Stock Exchange, which have been the best predictors to date. (Kevin Kelley)
Here is the paper by Asur and Huberman from HP Labs. Also see a similar use of online discussion forums.
But the obvious question from my previous post is, how much improvement do you get by adding more inputs to the model? That is, how does the combined Hollywood Stock Exchange and Twitter chatter model perform? The authors report adding the number of theaters the movie opens in to both models, but not combining them directly.
Sort of like a dissonance versus resonance idea in music…
Squared error (on the training set) can only go down, or at worst stay the same, when you add more inputs, but if the Hollywood Stock Exchange and Twitter data predict pretty much the same variance in revenues, then there won’t be much improvement. This is the same as saying that R^2, the fraction of “explained” variance, can only go up by adding more predictors.
However, the authors use adjusted R^2 in evaluating the models, and it is possible for that to decrease with new predictors that don’t “explain” much variance.
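To make the R^2 versus adjusted R^2 point concrete, here is a minimal sketch on made-up data. Nothing in it comes from the paper: the variables `hsx`, `twitter`, and `revenue` are synthetic stand-ins, the sample size of 30 is arbitrary, and `twitter` is deliberately built to be nearly redundant with `hsx`. The adjustment used is the standard one, 1 - (1 - R^2)(n - 1)/(n - p - 1) for n observations and p predictors.

```python
import random

def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def r_squared(X, y):
    """Training-set R^2 of a linear model with an intercept."""
    beta = ols_fit(X, y)
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Standard adjustment for p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Synthetic data (an assumption for illustration, not the paper's data):
# both predictors are noisy views of the same underlying signal.
rng = random.Random(0)
n = 30
signal = [rng.gauss(0, 1) for _ in range(n)]
hsx = [s + rng.gauss(0, 0.5) for s in signal]
twitter = [h + rng.gauss(0, 0.1) for h in hsx]   # nearly redundant with hsx
revenue = [s + rng.gauss(0, 0.5) for s in signal]

r2_hsx = r_squared([[h] for h in hsx], revenue)
r2_both = r_squared([[h, t] for h, t in zip(hsx, twitter)], revenue)

print("R^2      hsx only:", r2_hsx, " hsx + twitter:", r2_both)
print("adj R^2  hsx only:", adjusted_r_squared(r2_hsx, n, 1),
      " hsx + twitter:", adjusted_r_squared(r2_both, n, 2))
```

Plain R^2 for the two-predictor model can never come out below the one-predictor value, whereas the adjusted version pays a penalty for the extra predictor and so can move in either direction depending on how much new variance the second input picks up.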
Actually, this is one point where I would have preferred a different approach: I wish they had used some kind of cross-validation in their model comparison, thus allowing them to report test error.
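For what a cross-validated comparison might look like, here is a rough sketch using k-fold cross-validation on synthetic data. It is not the authors' procedure: `predictor_a` and `predictor_b` are hypothetical stand-ins for two competing single-input models, the data are simulated, and the choice of 5 folds is arbitrary.

```python
import random

def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def kfold_mse(x, y, k=5, seed=0):
    """Mean squared error on held-out points, averaged over all k folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_errors.extend((y[i] - (a + b * x[i])) ** 2 for i in test)
    return sum(sq_errors) / len(sq_errors)

# Synthetic data (an assumption for illustration only)
rng = random.Random(1)
signal = [rng.gauss(0, 1) for _ in range(40)]
revenue = [s + rng.gauss(0, 0.3) for s in signal]
predictor_a = [s + rng.gauss(0, 0.3) for s in signal]  # tracks the signal
predictor_b = [rng.gauss(0, 1) for _ in signal]        # unrelated to revenue

mse_a = kfold_mse(predictor_a, revenue)
mse_b = kfold_mse(predictor_b, revenue)
print("held-out MSE, model A:", mse_a)
print("held-out MSE, model B:", mse_b)
```

Because every point is scored by a model that never saw it during fitting, the two numbers are estimates of test error and can be compared directly, with no need for a degrees-of-freedom correction.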
Agree with last comment. Also, would like to see some other competing models tried (neural nets, regression trees etc.) prior to cross-validation.