On 5-Oct-07, at 11:59 AM, Ravish Bhagdev wrote:

But a different use-case might be for the highlighting to encompass
the markup rather than just the text, e.g.
<span class="highlighted"><topic type="location">Paris</topic></span>
which would have to be accomplished some other way.

Yes, exactly.  And I think Nutch handles this somehow, as I remember
using it to index HTML and getting back snippets with accurate
highlighting placed within the HTML.

Is there potential for code reuse from Nutch?  Maybe this is a topic
for the Solr developer list?  Or has it already been considered?

Last time I looked at the Nutch highlighter, I don't remember seeing anything about handling this correctly (which would involve a considerable amount of HTML finagling to get perfect).

Also, I don't see the use case for web docs: you absolutely never want to serve up the raw HTML from an unknown page.
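
To illustrate the raw-HTML concern, here is a minimal sketch (in Python for brevity, not Solr's Java, and not how Solr or Nutch actually do it). The function name highlight_escaped and the "highlighted" class are assumptions made up for this example. It escapes every piece of the untrusted snippet and adds only the highlight span as live markup:

```python
import html
import re


def highlight_escaped(snippet: str, term: str) -> str:
    """Escape an untrusted snippet and wrap matches of `term` in a span.

    Hypothetical helper for illustration only. Matching runs on the raw
    text, and each piece is escaped separately, so the search term can
    never match inside an entity such as "&lt;".
    """
    if not term:
        return html.escape(snippet)

    pattern = re.compile(re.escape(term), re.IGNORECASE)
    parts = []
    pos = 0
    for match in pattern.finditer(snippet):
        # Text before the match: escaped, so no page markup survives.
        parts.append(html.escape(snippet[pos:match.start()]))
        # The match itself: escaped, then wrapped in our own span.
        parts.append('<span class="highlighted">')
        parts.append(html.escape(match.group(0)))
        parts.append('</span>')
        pos = match.end()
    # Remainder after the last match.
    parts.append(html.escape(snippet[pos:]))
    return "".join(parts)
```

With this approach the page's own tags are shown as text rather than rendered, which is safe but also means the highlight cannot wrap a live element the way the <topic> example above does.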

I'm not against improving Solr's handling of HTML data, but it is the type of thing that is unlikely to happen unless someone who cares about it steps up.

Patches welcome :)

-Mike
