I'm using the standard Solr query language and the normal highlighting parameters documented at http://wiki.apache.org/solr/HighlightingParameters. Snippet generation and highlighting are working pretty well, but my testers have discovered something they find borderline unacceptable.

If they search for "stock market" (with quotes), Solr correctly returns only documents where "stock" and "market" appear as adjacent words. There are two problems, though.

First, Solr is willing to pick snippets where only one of the terms appears, e.g.:

    ...and changes in the <b>market</b> regulation environment...

Second, even when Solr picks a snippet that does have "stock" and "market" adjacent to one another, it still highlights any non-adjacent instances of "stock" and "market", e.g.:

    ...huge <b>stock</b> sales due to recent increases in <b>stock</b> <b>market</b> prices...

In this second case, the first instance of "stock" should not be highlighted.

My testers say that both of these behaviors are incorrect, because when people search for "stock market", they're not that interested in the parts of the document where "stock" and "market" do not appear together. I'm inclined to agree.

I'm not sure there's an easy fix, though. Is there? The Lucene highlighter code seems to think only in terms of individual terms, rather than higher-level constructs such as phrases. Has anyone here dealt with this issue? Maybe I need to try the Lucene list.

Thanks,
Chris
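P.S. To make the two behaviors concrete, here is a toy Python sketch. It is only an illustration, not Lucene's highlighter code, and all the function names are made up. Tokenization is plain whitespace splitting. It contrasts term-by-term snippet selection and highlighting (what my testers are seeing) with phrase-aware versions that only count "stock" and "market" where they appear adjacently and in order.

```python
import re
from typing import List, Set

# Toy model only: this is NOT Lucene's highlighter. Tokenization is plain
# whitespace splitting, and matching ignores case and punctuation.

def words(text: str) -> List[str]:
    return text.split()

def norm(word: str) -> str:
    # Lowercase and drop punctuation so "market," matches "market".
    return re.sub(r"\W", "", word).lower()

def phrase_positions(ws: List[str], phrase: List[str]) -> Set[int]:
    """Token positions covered by an exact, in-order occurrence of the phrase."""
    target = [t.lower() for t in phrase]
    n = len(target)
    hits: Set[int] = set()
    for i in range(len(ws) - n + 1):
        if [norm(w) for w in ws[i:i + n]] == target:
            hits.update(range(i, i + n))
    return hits

def highlight_terms(text: str, phrase: List[str]) -> str:
    """Term-by-term: mark every occurrence of any phrase term, adjacent or not."""
    terms = {t.lower() for t in phrase}
    return " ".join(f"<b>{w}</b>" if norm(w) in terms else w for w in words(text))

def highlight_phrase(text: str, phrase: List[str]) -> str:
    """Phrase-aware: mark a term only where it is part of the whole phrase."""
    ws = words(text)
    hits = phrase_positions(ws, phrase)
    return " ".join(f"<b>{w}</b>" if i in hits else w for i, w in enumerate(ws))

def select_by_any_term(snippets: List[str], phrase: List[str]) -> List[str]:
    """Term-based snippet choice: accept a snippet containing any single term."""
    terms = {t.lower() for t in phrase}
    return [s for s in snippets if any(norm(w) in terms for w in words(s))]

def select_by_phrase(snippets: List[str], phrase: List[str]) -> List[str]:
    """Phrase-aware snippet choice: accept only snippets containing the phrase."""
    return [s for s in snippets if phrase_positions(words(s), phrase)]

if __name__ == "__main__":
    phrase = ["stock", "market"]
    candidates = [
        "and changes in the market regulation environment",
        "huge stock sales due to recent increases in stock market prices",
    ]
    print(select_by_any_term(candidates, phrase))  # both snippets
    print(select_by_phrase(candidates, phrase))    # only the second
    print(highlight_terms(candidates[1], phrase))
    print(highlight_phrase(candidates[1], phrase))
```

The phrase-aware versions need token positions for the whole phrase, which is exactly the information a purely term-based highlighter never looks at.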