This American Life on the Rating Agencies: This weekend's 'This American Life' is about the rating agencies. [...] A few excerpts:
"We hired a specialist firm that used a methodology called maximum entropy to generate this equation," says Frank Raiter, who until 2005 was in charge of rating mortgages at Standard & Poor's. "It looked like a lot of Greek letters." (Via Calculated Risk.)
The new bonds were based on pools of thousands of mortgages. If you bought one of these bonds, you were basically loaning money to people for their houses. What the equation tried to predict was how likely the homeowners were to keep making payments.
The system made sense, Raiter says, until loan issuers started offering mortgages to people who didn't have great credit and in some cases didn't have a job.
Raiter says there wasn't a lot of data on these new homebuyers. He says he told his bosses they needed better data and a better model for assessing the riskiness of the loans.
E. T. Jaynes must be turning in his grave. I'll listen to the podcast soon, but this quote waves a big red flag of overfitting. The last ten years of maxent-related work in machine learning and natural-language processing show clearly that the maximum entropy principle on its own can be highly misleading when applied to data drawn from long-tailed distributions. That's why there's thriving research on ways of regularizing maxent models, for example by replacing equality constraints with box constraints. But even with decent regularization, maxent models are only as good as their choice of event types (features) over which to compute sufficient statistics. If there are correlations in the real world that are not represented by corresponding features in the model, the model may be overly confident in its predictions.
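To make the equality-vs.-box-constraint contrast concrete, here is a minimal sketch (not anyone's production model; the toy feature matrix and the "empirical" means are invented for illustration). It fits a maxent distribution by subgradient descent on the dual, where a box constraint |E_p[f] − μ̂| ≤ β corresponds to an L1 penalty of width β on the dual weights:

```python
import numpy as np

def fit_maxent(F, mu_hat, beta=0.0, lr=0.1, steps=5000):
    """Fit a maxent distribution over the columns of F by (sub)gradient
    descent on the dual objective log Z(lam) - lam.mu_hat + beta*||lam||_1.

    F      : (k, n) feature matrix, one column per outcome
    mu_hat : (k,) empirical feature means to match
    beta   : half-width of the box constraint |E_p[f] - mu_hat| <= beta;
             beta = 0 recovers the usual equality constraints
    """
    k, n = F.shape
    lam = np.zeros(k)
    for _ in range(steps):
        logits = lam @ F                   # (n,) unnormalized log-probabilities
        p = np.exp(logits - logits.max())  # stabilized softmax
        p /= p.sum()
        # Subgradient: model means minus targets, plus the L1 term.
        grad = F @ p - mu_hat + beta * np.sign(lam)
        lam -= lr * grad
    return p, lam

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

# Toy problem: 4 outcomes, 2 binary features (both invented numbers).
F = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0]], dtype=float)
mu_hat = np.array([0.9, 0.1])  # "empirical" means from a hypothetical tiny sample

p_exact, _ = fit_maxent(F, mu_hat, beta=0.0)  # equality constraints
p_box, _   = fit_maxent(F, mu_hat, beta=0.2)  # box constraints
print(entropy(p_exact), entropy(p_box))
```

The box-constrained fit refuses to chase the sample means exactly, so its distribution stays closer to uniform (higher entropy) than the equality-constrained one — which is precisely the hedge you want when the sample means come from little data.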
Maximum entropy, like other statistical-philosophical principles (you know who you are), carries the unfortunate burden of a philosophical foundation that may appear to some to guarantee correct inference without the need for empirical validation. In the case of maximum entropy, the familiar argument is that it produces the least informative hypothesis consistent with the evidence. That seems to imply safety, a lack of overreaching. Unfortunately, the principle says nothing about the quality of the evidence. What if the "evidence" is noisy, incomplete, or biased? Nor does it say anything about finite-sample effects, since it arose in statistical mechanics, where the huge numbers of molecules made such effects (at the time) a non-issue. But in biological, social, and cultural processes (genomics, language, social relationships, markets), we may as well bet that small-sample effects are never negligible.