One of my favorite bits from James Scott’s Seeing Like a State:
While doing fieldwork in a small village in Malaysia, I was constantly struck by the breadth of my neighbors’ skills and their casual knowledge of local ecology. One particular anecdote is representative. Growing in the compound of the house in which I lived was a locally famous mango tree. Relatives and acquaintances would visit when the fruit was ripe in the hope of being given a few fruits and, more important, the chance to save and plant the seeds next to their own house. Shortly before my arrival, however, the tree had become infested with large red ants, which destroyed most of the fruit before it could ripen. It seemed nothing could be done short of bagging each fruit. Several times I noticed the elderly head of household, Mat Isa, bringing dried nipah palm fronds to the base of the mango tree and checking them. When I finally got around to asking what he was up to, he explained it to me, albeit reluctantly, as for him this was pretty humdrum stuff compared to our usual gossip. He knew that small black ants, which had a number of colonies at the rear of the compound, were the enemies of large red ants. He also knew that the thin, lancelike leaves of the nipah palm curled into long, tight tubes when they fell from the tree and died. (In fact, the local people used the tubes to roll their cigarettes.) Such tubes would also, he knew, be ideal places for the queens of the black ant colonies to lay their eggs. Over several weeks he placed dried nipah fronds in strategic places until he had masses of black-ant eggs beginning to hatch. He then placed the egg-infested fronds against the mango tree and observed the ensuing week-long Armageddon. Several neighbors, many of them skeptical, and their children followed the fortunes of the ant war closely. Although smaller by half or more, the black ants finally had the weight of numbers to prevail against the red ants and gain possession of the ground at the base of the mango tree. 
As the black ants were not interested in the mango leaves or fruits while the fruits were still on the tree, the crop was saved.
This successful field experiment in biological controls presupposes several kinds of knowledge: the habitat and diet of black ants, their egg-laying habits, a guess about what local material would substitute as movable egg chambers, and experience with the fighting proclivities of red and black ants. Mat Isa made it clear that such skill in practical entomology was quite widespread, at least among his older neighbors, and that people remembered something like this strategy having worked once or twice in the past. What is clear to me is that no agricultural extension official would have known the first thing about ants, let alone biological controls; most extension agents were raised in town and in any case were concerned entirely with rice, fertilizer, and loans. Nor would most of them think to ask; they were, after all, the experts, trained to instruct the peasant. It is hard to imagine this knowledge being created and maintained except in the context of lifelong observation and a relatively stable, multigenerational community that routinely exchanges and preserves knowledge of this kind. [Chapter 9]
Hobsbawm on industrialization, mass mobilization, and “total war” in The Age of Extremes: A History of the World, 1914-1991 (ch. 1):
Jane Austen wrote her novels during the Napoleonic wars, but no reader who did not know this already would guess it, for the wars do not appear in her pages, even though a number of the young gentlemen who pass through them undoubtedly took part in them. It is inconceivable that any novelist could write about Britain in the twentieth-century wars in this manner.
The monster of twentieth-century total war was not born full-sized. Nevertheless, from 1914 on, wars were unmistakably mass wars. Even in the First World War Britain mobilized 12.5 per cent of its men for the forces, Germany 15.4 per cent, France almost 17 per cent. In the Second World War the percentage of the total active labour force that went into the armed forces was pretty generally in the neighborhood of 20 per cent (Milward, 1979, p. 216). We may note in passing that such a level of mass mobilization, lasting for a matter of years, cannot be maintained except by a modern high-productivity industrialized economy, and – or alternatively – an economy largely in the hands of the non-combatant parts of the population. Traditional agrarian economies cannot usually mobilize so large a proportion of their labour force except seasonally, at least in the temperate zone, for there are times in the agricultural year when all hands are needed (for instance to get in the harvest). Even in industrial societies so great a manpower mobilization puts enormous strains on the labour force, which is why modern mass wars both strengthened the powers of organized labour and produced a revolution in the employment of women outside the household: temporarily in the First World War, permanently in the Second World War.
A superior good is something that one purchases more of as income rises. It is appealing, at least metaphorically, to see the huge expenditures on industrial armaments as revealing arms to be superior goods in this sense.
Nate Cohn at The New York Times reports on how one 19-year-old black man is having an outsized impact on the USC/LAT panel’s estimates of support for Clinton in the U.S. presidential election. It happens that the sample doesn’t have enough other people with similar demographics and voting history (covariates) to this panelist, so he is getting a large weight in computing the overall averages for the populations of interest, such as likely voters:
There is a 19-year-old black man in Illinois who has no idea of the role he is playing in this election.
He is sure he is going to vote for Donald J. Trump.
And he has been held up as proof by conservatives — including outlets like Breitbart News and The New York Post — that Mr. Trump is excelling among black voters. He has even played a modest role in shifting entire polling aggregates, like the Real Clear Politics average, toward Mr. Trump.
As usual, Andrew Gelman suggests that the solution to this problem is a technique he calls “Mr. P” (multilevel regression and post-stratification). I wanted to comment on some practical tradeoffs among common methods. Maybe these are useful notes, which can be read alongside another nice piece by Nate Cohn on how different adjustment methods can yield very different polling results.
Complete post-stratification is when you compute the mean outcome (e.g., support for Clinton) for each stratum of people, such as 18-24-year-old black men, defined by the covariates X. Then you combine these means, weighting each by the size of its group in the population of interest. This really only works when you have a lot of data compared with the number of strata — and the number of strata grows very fast in the number of covariates you want to adjust for.
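To make the arithmetic concrete, here is a minimal sketch of complete post-stratification. The function name `poststratify`, the two strata, and all the numbers are made up for illustration; a real poll would have many more strata, which is exactly where the empty-stratum error below starts to bite.

```python
from collections import defaultdict


def poststratify(sample, population_counts):
    """Population-weighted average of per-stratum sample means.

    sample: list of (stratum, outcome) pairs from the survey.
    population_counts: dict mapping stratum -> size in the target population.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stratum, y in sample:
        sums[stratum] += y
        counts[stratum] += 1

    # Complete post-stratification has no answer for an empty stratum.
    missing = [s for s in population_counts if counts.get(s, 0) == 0]
    if missing:
        raise ValueError(f"no respondents in strata: {missing}")

    total = sum(population_counts.values())
    return sum(
        population_counts[s] / total * sums[s] / counts[s]
        for s in population_counts
    )


# Hypothetical toy data: outcome 1 = supports the candidate, 0 = does not.
sample = [("young", 1), ("young", 0),
          ("old", 1), ("old", 1), ("old", 1), ("old", 0)]
population_counts = {"young": 600, "old": 400}

raw_mean = sum(y for _, y in sample) / len(sample)
estimate = poststratify(sample, population_counts)
```

In this toy sample the young are underrepresented relative to the population, so the post-stratified estimate moves away from the raw sample mean and toward the young stratum's mean.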
Modeling sample inclusion and weighting
When people talk about survey weighting, often what they mean is weighting by the inverse of the estimated probability of inclusion in the sample. You model selection into the survey S using, e.g., logistic regression on the covariates X and some interactions. This can be done with regularization (i.e., priors, shrinkage) since many of the terms in the model might be estimated with very few observations. Especially without enough regularization, this can result in very large weights when you don’t have enough of some particular type in your sample.
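A small sketch of how one respondent can end up dominating an inverse-probability-weighted average. It assumes the inclusion probabilities have already been estimated by some selection model; the probabilities and outcomes here are invented, and `ipw_mean` is just an illustrative name.

```python
def ipw_mean(outcomes, inclusion_probs):
    """Normalized inverse-probability-weighted mean.

    Also returns the share of total weight carried by the single
    most heavily weighted respondent.
    """
    weights = [1.0 / p for p in inclusion_probs]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, outcomes)) / total
    max_share = max(weights) / total
    return estimate, max_share


# Hypothetical panel: nine respondents of a well-covered type, plus one
# respondent whose type is estimated to be 20 times harder to reach.
outcomes = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
inclusion_probs = [0.01] * 9 + [0.0005]

unweighted = sum(outcomes) / len(outcomes)
estimate, max_share = ipw_mean(outcomes, inclusion_probs)
```

Here the last respondent carries more than two thirds of the total weight, so the weighted estimate is mostly a statement about that one person. Regularizing the selection model pulls the small estimated probabilities up, which is what keeps weights like this in check.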
Modeling the outcome and integrating
You fit a model predicting the response (e.g., support for Clinton) Y with the covariates X. You regularize this model in some way so that the estimate for each person is going to “borrow strength” from other people with similar Xs. So now you have fitted responses Yhat for each unique X. To get an estimate for a particular population of interest, integrate out over the distribution of X in that population. Gelman’s preferred version “Mr. P” uses a multilevel (aka hierarchical Bayes, random effects) model for the outcome, but other regularization methods may often be appealing.
This is nice because there can be some substantial efficiency gains (i.e., more precision) from making use of the outcome information. But there are also some practical issues. First, you need a model for each outcome in your analysis, rather than just having weights you could use for all outcomes and all recodings of outcomes. Second, the implicit weights that this process puts on each observation can vary from outcome to outcome — or even for different codings (e.g., a dichotomization of answers on a numeric scale) of the same outcome. In a reply to his post, Gelman notes that you would need a different model for each outcome, but that some joint model for all outcomes would be ideal. Of course, the latter joint modeling approach, while appealing in some ways (many statisticians love having one model that subsumes everything…), means that adding a new outcome to the analysis would change all prior results.
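Here is a rough sketch of the outcome-modeling approach with a deliberately crude form of regularization: each cell mean is shrunk toward the grand mean, then the fitted values are averaged over the population distribution of X. This is a stand-in for illustration, not Mr. P itself, which would fit a proper multilevel model. The `prior_strength` knob, the function names, and the data are all assumptions of the example.

```python
from collections import defaultdict


def fit_shrunken_cell_means(sample, prior_strength):
    """Return a function x -> Yhat using cell means shrunk toward the grand mean.

    sample: list of (x, y) pairs, where x is a hashable covariate cell.
    prior_strength: hypothetical tuning knob (must be > 0); it acts like
    this many pseudo-respondents sitting at the grand mean in every cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in sample:
        sums[x] += y
        counts[x] += 1
    grand_mean = sum(sums.values()) / sum(counts.values())

    def yhat(x):
        n = counts.get(x, 0)
        # Cells with few respondents lean on the grand mean;
        # a cell with no respondents gets the grand mean itself.
        return (sums.get(x, 0.0) + prior_strength * grand_mean) / (n + prior_strength)

    return yhat


def integrate(yhat, population_shares):
    """Average fitted values over the population distribution of X."""
    return sum(share * yhat(x) for x, share in population_shares.items())


# Hypothetical data: one respondent in a rare cell, nine in a common one.
sample = [("rare", 0)] + [("common", 1)] * 6 + [("common", 0)] * 3
population_shares = {"rare": 0.2, "common": 0.8}

yhat = fit_shrunken_cell_means(sample, prior_strength=4.0)
estimate = integrate(yhat, population_shares)
```

The lone respondent in the rare cell no longer determines that cell's estimate on their own, since the fitted value borrows from everyone else. Note that the fit depends on the outcome, so a second outcome (or a recoding of this one) needs its own call to `fit_shrunken_cell_means`, which is the first practical issue raised above.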
Side note: Other methods, not described here, also work towards the aim of matching characteristics of the population distribution (e.g., iterative proportional fitting / raking). They strike me as overly specialized and not easy to adapt and extend.