Saturday, May 12, 2007

Matt Hurst on The Future of Search

The Future of Search: I spent Thursday and Friday last week in the Bay area. On Friday I participated in Berkeley's Future of Search (FoS) event [...] (Peter Norvig) acknowledged that there is still much to be done (for Google) in terms of understanding document structure, in particular the interpretation of tables. I've never seen any real evidence of good document analysis in main stream search which is actually very surprising given the constraints that document structure provides which can only help with relevance and other issues.

This is not at all surprising. Document structure doesn't have a stable semantics. A given configuration may express many different relations depending on context, or it may express no relation at all and simply serve stylistic goals. The problem with document structure analysis, as with natural language analysis, is that the analyzer is often wrong, but it has no way of knowing when.

As with NLP, document structure analysis is a great research topic. In both cases, there are good opportunities for researchers to work with search experts who understand the diversity of queries and documents, to find out what, if anything, from the research can be applied effectively to search. But I don't believe we can just will our current simplistic analysis methods into search success.
