I'm using the standard Solr query language and the normal highlighting
parameters documented at
http://wiki.apache.org/solr/HighlightingParameters. Snippet generation
and highlighting are working pretty well, but my testers have
discovered something they find borderline unacceptable. If they search
for

    "stock market"

(with quotes), then Solr correctly returns only documents where
"stock" and "market" appear as adjacent words. There are two problems, though.
First, Solr is willing to pick snippets where only one of the terms
appears, e.g.

    ...and changes in the <b>market</b> regulation environment...

Second, even when Solr picks a snippet that indeed has "stock" and
"market" adjacent to one another, it still highlights any non-adjacent
instances of "stock" and "market", e.g.

    ... huge <b>stock</b> sales due to recent increases in
    <b>stock</b> <b>market</b> prices...

(In the latter case the first instance of "stock" should not be highlighted.)

My testers say that both of these behaviors are incorrect, because
when people search for "stock market", they're not that interested in
the parts of the document where "stock" and "market" do not appear
together. I'm inclined to agree. I'm not sure there's an easy fix,
though. Is there? The Lucene highlighter code seems to think only in
terms of individual terms, rather than higher-level constructs such as
phrases.
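
To make the distinction concrete, here's a toy Python sketch contrasting term-level highlighting with the phrase-aware behavior my testers expect. It's my own illustration, not how Lucene's highlighter is actually implemented: it uses plain regexes over whitespace-separated words instead of Lucene token positions, and the function names (highlight_terms, highlight_phrase, phrase_fragments) are hypothetical.

```python
import re


def _term_pattern(terms):
    # Matches any single query term as a whole word (hypothetical helper).
    return re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)


def _phrase_pattern(terms):
    # Matches the query terms only as an adjacent sequence (hypothetical helper).
    return re.compile(r"\b" + r"\s+".join(map(re.escape, terms)) + r"\b", re.IGNORECASE)


def highlight_terms(text, terms):
    """Term-level highlighting: wrap every occurrence of any query term,
    whether or not it is part of a phrase match (the behavior described above)."""
    return _term_pattern(terms).sub(r"<b>\1</b>", text)


def highlight_phrase(text, terms):
    """Phrase-aware highlighting: wrap terms only where they occur adjacently,
    i.e. only where the phrase query actually matched."""
    def wrap(match):
        # Highlight each word of the matched phrase separately, as in the examples.
        return " ".join("<b>%s</b>" % word for word in match.group(0).split())
    return _phrase_pattern(terms).sub(wrap, text)


def phrase_fragments(fragments, terms):
    """Keep only candidate snippets that contain the full phrase, so a snippet
    with just one of the terms is never chosen."""
    pattern = _phrase_pattern(terms)
    return [fragment for fragment in fragments if pattern.search(fragment)]


if __name__ == "__main__":
    query = ["stock", "market"]
    snippet = "huge stock sales due to recent increases in stock market prices"
    print(highlight_terms(snippet, query))
    print(highlight_phrase(snippet, query))
    print(phrase_fragments(
        ["and changes in the market regulation environment",
         "recent increases in stock market prices"],
        query,
    ))
```

The first call reproduces what we're seeing (the stray "stock" gets highlighted); the second highlights only the adjacent pair, and phrase_fragments drops the "market regulation" snippet entirely.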

Has anyone here dealt with this issue? Maybe I need to try the Lucene list.

Thanks,
Chris