This is not at all surprising. Document structure doesn't have a stable semantics. A given configuration may express many different relations depending on context, or no relation at all, serving purely stylistic goals instead. The problem with document structure analysis, as with natural language analysis, is that the analyzer is often wrong but has no way of knowing when.
As with NLP, document structure analysis is a great research topic. In both cases, there are good opportunities for researchers to work with search experts who understand the diversity of queries and documents, to find out what, if anything, from the research may be effectively applied to search. But I don't believe we can just will our current simplistic analysis methods into search success.