Monday, December 10, 2007

Sentiment Mining: The Truth

Sentiment Mining: The Truth: Nathan Gilliat (an excellent blogger) posts about BuzzLogic's new partnership with KDPaine, which will deliver sentiment scores to BuzzLogic's clients. There are a number of approaches to delivering sentiment analysis, including many automated approaches and some manual ones. Customers are still skeptical of automated methods and generally more comfortable with manual ones. Paine writes:

"Computers can do a lot of things well, but differentiating between positive and negative comments in consumer generated media isn’t one of them,” explained Katie Delahaye Paine, CEO of KDPaine & Partners. “The problem with consumer generated media is that it is filled with irony, sarcasm and non-traditional ways of expressing sentiment. That’s why we recommend a hybrid solution. Let computers do the heavy lifting, and let humans provide the judgment."
This kind of statement is particularly unhelpful. Let's break it down. (Via Data Mining.)

The whole post is worth reading: Matt gives a nice summary of the strengths and weaknesses of statistical NLP/machine learning methods for sentiment classification, one that is also relevant to other applications.
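To make Matt's point concrete, here is a minimal sketch of the kind of statistical method at issue: a Naive Bayes classifier over bags of words. Everything below (training sentences, labels, function names) is invented for illustration; real systems train on far more data and richer features. The last line shows exactly the failure mode Paine describes.

```python
# Minimal sketch of a bag-of-words Naive Bayes sentiment classifier.
# Toy data; real systems are trained on thousands of labeled documents.
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def train(examples):
    """examples: (text, label) pairs -> per-label word counts and doc counts."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        # log prior + log likelihoods with add-one (Laplace) smoothing
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

training_data = [
    ("i love this phone great battery", "pos"),
    ("excellent service would buy again", "pos"),
    ("terrible screen broke in a week", "neg"),
    ("awful support never again", "neg"),
]
wc, dc = train(training_data)
print(classify("love the great battery", wc, dc))  # -> pos
# Sarcasm defeats word-level evidence: "great" and "excellent" outvote
# "broke", so this sarcastic complaint also comes out "pos".
print(classify("oh great it broke again just excellent", wc, dc))
```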

This reminded me of a study I heard about in my AT&T days comparing automatic speech recognition with keypad entry for phone-based services. The conventional wisdom was that speech recognition had to be worse, given how bad automatic speech recognizers are compared with human operators. Except that users made more mistakes with the keypad than the speech recognizer made with their utterances. It is likewise common to assume that automated text information extraction systems “must” be worse than human annotators, but I know of at least one comparison between outsourced manual extraction and automatic extraction where, again, human performance was worse than machine performance. On average, people are just not that good at those tasks, although a person with a real interest in a particular task instance will typically do much better than the best program.
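For what it's worth, the obvious way to run such a comparison (this is a sketch with invented labels, not the actual study) is to score both the human annotations and the machine output against the same adjudicated gold standard:

```python
# Sketch of scoring human annotators and a machine against one gold standard.
# All labels below are invented for illustration.
def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

gold    = ["pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos"]
human   = ["pos", "neg", "pos", "pos", "neg", "neg", "neg", "pos"]  # fatigue, inconsistency
machine = ["pos", "neg", "neg", "pos", "pos", "pos", "neg", "pos"]  # systematic errors

print(f"human accuracy:   {accuracy(human, gold):.2f}")   # 0.75
print(f"machine accuracy: {accuracy(machine, gold):.2f}")  # 0.88
```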

The assumption that algorithms “must” be less accurate than people doesn't seem to rest on solid empirical evidence. However, when customers talk about accuracy, what they really mean is trust. We are more willing to trust human annotators because we (think that we) can understand how they perform the task, and we feel we could, at least in principle, query them about their reasoning if we doubted their conclusions. Whether this trust is warranted is another matter. Going by how often irony and sarcasm are misinterpreted in online communication (hence all those emoticons), we may not be such good modelers of the judgments of others.
