Traits, adaptive systems & dimensionality reduction
Psychologists have posited numerous psychological traits and described causal roles they ought to play in determining human behavior. Most often, the canonical measure of a trait is a questionnaire. Investigators obtain this measure for some people and analyze how their scores predict some outcomes of interest. For example, many people have been interested in how psychological traits affect persuasion processes. Traits like need for cognition (NFC) have been posited and questionnaire items developed to measure them. Among other things, NFC affects how people respond to messages with arguments of varying quality.
How useful are these traits for explanation, prediction, and adaptive interaction? I can’t address all of this here, but I want to sketch an argument for their irrelevance to adaptive interaction — and then offer a tentative rejoinder.
Interactive technologies can tailor their messages to the tastes and susceptibilities of the people interacting with and through them. It might seem that these traits should figure in the statistical models used to make these adaptive selections. After all, some of the possible messages fit for, e.g., coaching a person to meet their exercise goals are more likely to be effective for low NFC people than high NFC people, and vice versa. However, the standard questionnaire measures of NFC often cannot be obtained for most users — certainly not in commerce settings, and even people signing up for a mobile coaching service likely don’t want to answer pages of questions. On the other hand, some Internet and mobile services have abundant other data available about their users, which could perhaps be used to construct an alternative measure of these traits. The trait-based-adaptation recipe is:
- obtain the questionnaire measure of the trait for a sample,
- predict this measure with data available for many individuals (e.g., log data),
- use this model to construct a measure for out-of-sample individuals.
This new measure could then be used to personalize the interactive experience based on this trait: if a version performs well (or poorly) for people with a particular score on the trait, then that version is used (or avoided) for people with similar scores.
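As a rough sketch of this recipe with simulated data (the sample sizes, feature counts, and the linear relation between log features and NFC are all hypothetical stand-ins; a real system would use actual log features and questionnaire scores):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a small survey sample with both log-data features
# and questionnaire NFC scores, plus a large pool with log data only.
n_survey, n_pool, p = 200, 10_000, 5
X_survey = rng.normal(size=(n_survey, p))      # log-data features
true_w = np.array([1.0, -0.5, 0.0, 0.3, 0.0])  # assumed feature-trait relation
nfc_survey = X_survey @ true_w + rng.normal(scale=0.5, size=n_survey)

# Steps 1-2: predict the questionnaire measure from the log features.
w_hat, *_ = np.linalg.lstsq(X_survey, nfc_survey, rcond=None)

# Step 3: construct the proxy measure for out-of-sample individuals.
X_pool = rng.normal(size=(n_pool, p))
nfc_proxy = X_pool @ w_hat

# Personalize: e.g., send the argument-heavy version to high-NFC users.
send_argument_heavy = nfc_proxy > np.median(nfc_proxy)
print(send_argument_heavy.mean())  # about half the pool
```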
But why involve the trait at all? Why not just personalize the interactive experience based on the responses of similar others? Since the new measure of the trait is just based on the available behavioral, demographic, and other logged data, one could simply predict responses based on those measures. Put in geometric terms, if the goal is to project the effects of different messages onto available log data, why should one project the questionnaire measure of the trait onto the available log data and then project the effects onto this projection? This seems especially unappealing if one doesn’t fully trust the questionnaire measure to be accurate or one can’t be sure one has identified the full set of traits that make a (substantial) difference.
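The geometric point can be made concrete: a trait proxy built from log data lies in the span of those data, so predicting responses through the proxy can never fit better (in sample) than predicting from the log data directly. A toy illustration with simulated data (every quantity here is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 8
X = rng.normal(size=(n, p))              # available log-data features
trait_proxy = X @ rng.normal(size=p)     # a trait measure projected onto X
y = X @ np.ones(p) + rng.normal(size=n)  # responses to some message

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# Trait-mediated: project the responses onto the 1-D trait proxy.
b = trait_proxy @ y / (trait_proxy @ trait_proxy)
r2_trait = r2(y, b * trait_proxy)

# Direct: project the responses onto the log-data features themselves.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2_direct = r2(y, X @ beta)

# The proxy spans a subspace of the features' span, so the direct fit
# can only do better (in sample).
print(r2_trait <= r2_direct)  # True
```

Of course, in-sample fit is not the whole story; out-of-sample prediction with many features is exactly where the dimensionality-reduction rejoinder below comes in.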
I find this argument quite intuitively appealing, and it seems to resonate with others.1 But I think there are some reasons the recipe above could still be appealing.
One way to think about this recipe is as dimensionality reduction guided by theory about psychological traits. Available log data can often be used to construct countless predictors (or “features”, as the machine learning people call them). So one can very quickly get into a situation where the effective number of parameters for a full model predicting the effects of different messages is very large and will make for poor predictions. Nothing — no, not penalized regression, not even a support vector machine — makes this problem go away. Instead, one has to rely on the domain knowledge of the person constructing the predictors (i.e., doing the “feature engineering”) to pick some good ones.
So the tentative rejoinder is this: established psychological traits might often make good dimensions along which to predict the effects of different versions of a message, intervention, or experience. And they may “come with” suggestions about what kinds of log data might serve as measures of them. They would be expected to be reusable across settings. Thus, I think this recipe nonetheless deserves serious attention.
- I owe some clarity on this to conversations with Mike Nowak and Maurits Kaptein. [↩]
Hey Dean,
Great post. The psychological variables are meaningful for generalizing across contexts, but if measurements of these can be obtained by classifying stimuli as such, I would say questionnaire measures of these “traits” seem to add unnecessary complexity and the associated error.
Glad you like it.
I also have the intuition that they should help generalize across contexts. But if theory doesn’t tell us how consistent their effects should be (not just their direction), then models for each context must still be fit.
If instead of thinking about the predicted values of the traits as having error, you think about them as linear combinations of other measures, they might seem more useful.
Dean, if we can store 6 billion personality types based on tracked behavior, then what is the purpose of even measuring traits anymore?
Thanks for the good ideas, j
Nice line, Jeremy 🙂
Fitting with the rejoinder above, I’d say that for each person we have p >> n. So borrowing strength from others with similar inferred traits and using substantive knowledge about trait-expression in doing the feature engineering is required!
I would feel more sanguine about this proposal if I had more confidence that “psychological theory”, in the area of personality and individual differences, did not amount to “vintage 1940s data mining”.
You are effectively proposing to use the traits as a low-dimensional bottleneck variable for the projection, but then why not search for such bottlenecks directly, using, e.g., auto-encoder techniques? (I know Tong Zhang and co. have worked on that problem, but can’t remember what they’ve published.) More positively, I suppose, if this worked it would give us some evidence that the traits postulated in psychological theory really work.
Cosma, I generally agree with you. And I enjoy “vintage 1940s data mining” as a term. Thanks for the Zhang pointer: I would like to know more.
To a large extent, I’m more on the side of the first view presented in this post. That is, it seems really odd and counterproductive to project the trait measures onto widely available measures and then project the outcome onto that projection. But too many people started agreeing with me, so I’ve been trying to give the other side a fair shake — despite my ongoing criticism of social & personality psychology.
I do think that psychology can give us some guidance in feature engineering. There is often substantial effort required here. Psych theory — flawed as it is — can suggest some ways to allot this effort that may be fruitful and otherwise neglected. Perhaps it is sad that “feature engineering” is not exactly a high-status enterprise, as it may be the future of what social psych has to offer.
Maybe we will get to test some of this soon.
Hey Dean, I remembered Maurits mentioned something about this discussion and I dug up this post. We’ve been doing stuff on the hedonic vs. utilitarian distinction, a classical typology and a predictive filter. Here are some reasons our group still thinks the distinction is useful (some of which you’ve already mentioned):
1. The hed vs. util distinction is efficient (in terms of time, data and number of interactions). Say a salesperson or his client is really busy and there’s only time for one question before making the “personalized” offering. The most useful question has classically been e.g. “business or pleasure?”, “basic or cool?”, “value-for-money or top-of-the-line”, “fun or efficient”. Instead of dumping the distinction, we’re now thinking of how to use it as an ex ante filter (not necessarily based on questionnaires… 😉 ) in interaction in order to make the info processing and profiling more efficient.
2. There are instances where going wrong the first time is really bad, and so testing out different interaction+offerings combos is not a good strategy. Reputation intensive and/or business-to-business settings come to mind. If you need to get it right the first time, you need ex ante psych.
3. Along the lines of your post, we see that the hed/util distinction enables crafting more-than-incremental improvements to offerings, whereas purely data-based approaches will lead to incremental, path-dependent and possibly market-destroying (it’s hard to model the entire market) or other dead-end offerings.
4. Data based optimization is likely to lead to overly short-term optimization – or else you need to hold your ‘rithms back. Ex ante constructs can be used to predict how the passing of time, repeats, consistency and other shite like that will be viewed. In pure data based optimization, cheating and lying will be profitable due to e.g. information asymmetries. Even the easiest of them all, the scarcity argument “there’s only two left in stock”, can be abused if you cover up well…
We’re putting a lot of eggs in the basket that the hed vs. util distinction will actually yield something relevant. Let’s see how it goes..
Thanks for the detailed comments, Petri. I like the point that often there are opportunities to make inquiries that are context appropriate and may even increase credibility.
You’re right that relying on responses to previous persuasive attempts doesn’t work in any one-shot scenarios.
But the non-trait-based approach being considered here is not limited to this case: one can use many other behavioral signals to predict an individual’s response, most of which need not be their past responses to similar stimuli.
Then the question is whether you should use the available signal to predict traits and then predict responses with predicted traits — or whether you should just predict responses with the available signals directly. Does “theoretical” knowledge about traits provide useful dimensionality reduction here?
Yes, that’s one of the major questions we are looking forward to solving.
Rambling on: There are two ways the theoretical typologies could reduce dimensionality, I guess – identification (“who-to”) and offering (“what”). Say someone behaves in a certain way online – instead of looking for the closest historical “success case”, we could look for signs of a typology. Particularly if the, say, browsing histories are very unique, the probability of hitting home with a typologisation (“he’s a hedonist!”) might be way higher than with “he’s pretty much like that other guy who bought from us last week.”
The problem with the typology could be that, particularly in cases where the offering has a large number of permutations, a simple two-sided typology will produce suboptimal offers. So the typology needs to be utilized, but not alone, in designing the offering. I have a feeling that there needs to be a “kink in the offering curve” for the typologizations to work at this stage – e.g. providing a radically different offering, crafting an entirely new offering or not offering anything at all (because of negative network effects..)