A very important value of this opening will be to allow full indexing and analysis of the past literature. Sometimes we have the illusion that the latest publication is the one that matters, but in many cases, discovery is a bumpy, drawn-out process, so the ability to find and synthesize the whole history of a topic is very important in assessing the current state of knowledge. The potential of automated biomedical literature mining has just become that much greater.
Thursday, December 27, 2007
Tuesday, December 25, 2007
Climate change for skeptical environmentalists: A science teacher in Independence, Oregon lays out the scenarios Bishop Berkeley style, with stunning simplicity. He points out that even if we don't know whether global climate disruption is real, we can still decide whether or not to act. And when we weigh the risks of acting against the risks of not acting, our course is clear even if climate change might not be happening. This video has had over 2.9 million views already. Nice.
Now that you've seen it (Go ahead, watch it) imagine the same argument laid out in a written essay. As compelling? No way. As easily accessible by millions of people? Not. The medium is the message. Barriers to videos that would raise the threshold for little gems like this would be socially irresponsible. (Via isen.blog.)
The argument in the video seems to rely implicitly on the naive assumption that uncertainty is maximal, that is, that the two outcomes are equally probable. But a skeptic exploiting that gap is forced either to assign probabilities that make the expected losses from action greater than the expected losses from inaction, or to assert that the worst-case scenario is much less bad than suggested. Either move requires the skeptic to deny a lot of evidence, which is exactly what skeptics are doing. I like the effort in the movie, but without hard numbers, it is always possible for the skeptics to flip the movie's qualitative calculus towards inaction.
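The video's decision matrix is really just an expected-loss comparison. Here is a minimal sketch of that calculus with made-up illustrative numbers; none of the probabilities or costs come from the video, and the point is only how the qualitative argument turns on them:

```python
# Toy decision-matrix calculation in the spirit of the video.
# All probabilities and costs below are invented for illustration.

def expected_loss(p_change, loss_if_change, loss_if_no_change):
    """Expected loss of a policy, given P(climate change is real)."""
    return p_change * loss_if_change + (1 - p_change) * loss_if_no_change

# Arbitrary units: acting costs the same either way;
# inaction is catastrophic only if the change is real.
COST_OF_ACTION = 1.0
COST_OF_CATASTROPHE = 100.0

def compare(p_change):
    act = expected_loss(p_change, COST_OF_ACTION, COST_OF_ACTION)
    wait = expected_loss(p_change, COST_OF_CATASTROPHE, 0.0)
    return "act" if act < wait else "wait"

# At 50/50 odds, acting wins easily...
print(compare(0.5))    # act
# ...and with these costs, a skeptic must push P(change)
# below 1/100 before inaction comes out ahead.
print(compare(0.005))  # wait
```

This is exactly where the lack of hard numbers bites: a skeptic can always choose a small enough probability, or a mild enough worst case, to make "wait" win.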
Friday, December 21, 2007
Monday, December 10, 2007
Sentiment Mining: The Truth: Nathan Gilliat (o excellent blogger) posts about BuzzLogic's new partnership with KDPaine, which will deliver sentiment scores to BuzzLogic's clients. There are a number of approaches to delivering sentiment analysis, including many automated approaches and some manual ones. Customers remain skeptical of automated methods and are generally more comfortable with manual ones. Paine writes:
"Computers can do a lot of things well, but differentiating between positive and negative comments in consumer generated media isn’t one of them,” explained Katie Delahaye Paine, CEO of KDPaine & Partners. “The problem with consumer generated media is that it is filled with irony, sarcasm and non-traditional ways of expressing sentiment. That’s why we recommend a hybrid solution. Let computers do the heavy lifting, and let humans provide the judgment."

This kind of statement is particularly unhelpful. Let's break it down. (Via Data Mining.)
Worth reading the whole post, where Matt gives a nice summary of strengths and weaknesses of statistical NLP/machine learning methods for sentiment classification that is also relevant to other applications.
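To make Paine's irony complaint concrete, here is a deliberately naive lexicon-based scorer, a toy sketch of the simplest possible word-counting approach (not BuzzLogic's, KDPaine's, or any vendor's actual method), which sarcasm defeats trivially:

```python
# Toy lexicon-based sentiment scorer -- purely illustrative.
# Real statistical systems are far more sophisticated, but sarcasm
# defeats surface word-counting like this outright.
import re

POSITIVE = {"great", "love", "wonderful", "fantastic"}
NEGATIVE = {"awful", "hate", "terrible", "broken"}

def sentiment(text):
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# The literal reading works...
print(sentiment("I love this phone, the camera is great."))         # positive
# ...but sarcasm flips the true sentiment while the words stay positive:
print(sentiment("Oh great, my phone died again. I just love that."))  # positive
```

Of course, the interesting question raised below is whether human annotators, averaged over thousands of documents, actually do much better at this than trained statistical models.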
This reminded me of a study I heard about in my AT&T days comparing automatic speech recognition and keypad entry for phone-based services. The conventional wisdom was that speech recognition would have to be worse, given how bad automatic speech recognizers are compared with human operators. Except that users made more mistakes with the keypad than the speech recognizer made with their utterances. It is also common to assume that automated text information extraction systems “must” be worse than human annotators, but I know of at least one comparison between outsourced manual extraction and automatic extraction where, again, human performance was worse than machine performance. People are just not that good at those tasks on average, although a person interested in a particular task instance will typically do much better than the best program.
The assumption that algorithms “must” be less accurate than people doesn't seem to be based on solid empirical evidence. However, when customers talk about accuracy, what they really mean is trust. We are more willing to trust human annotators because we (think that we) can understand how they perform the task, and we feel we could, at least in principle, query them about their reasoning if we doubted their conclusions. Whether this trust is warranted is another matter. Going by how often irony and sarcasm are misinterpreted in online communication (thus all those emoticons), we may not be such good modelers of the judgments of others.
Monday, December 3, 2007
Roll over, Beethoven: Deutsche Grammophon ditches DRM: Deutsche Grammophon is one of the most respected classical music labels in the world, and it just happens to be a subsidiary of Universal Music Group. With DG dropping DRM in favor of MP3, has Universal finally made up its mind about DRM? (Via Ars Technica.)
This is interesting, although why, oh why, don't they provide AAC too? Still, I may need a bigger iPod.