On 5-Jun-08, at 8:31 PM, Kevin Xiao wrote:
> Hi,
> I have a question about highlighting fragments. I set hl.fragsize to
> 100, but the returned fragment is cut off in the middle of a sentence,
> although the search term is highlighted correctly. Is there a way to
> make the cutoff fall at the beginning of a sentence? Is there a flag to
> set? How does the highlighting cutoff work anyway?
It chops up the input text roughly every hl.fragsize characters, breaking
at token boundaries, without regard to punctuation.
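
To make that behaviour concrete, here is a rough Python sketch of size-based fragmenting. It is not Solr's code: the function name gap_fragments and the whitespace tokenizer are assumptions made for illustration only, and the real fragmenter works on the analyzer's token stream.

```python
import re

def gap_fragments(text, fragsize):
    """Split text at token boundaries roughly every `fragsize` characters.

    Hypothetical helper: tokens are plain whitespace-separated words,
    and punctuation is ignored when choosing where to cut.
    """
    fragments = []
    start = 0
    for token in re.finditer(r"\S+", text):
        # Start a new fragment once this token ends past the next size boundary.
        if token.end() >= fragsize * (len(fragments) + 1):
            fragments.append(text[start:token.start()].strip())
            start = token.start()
    fragments.append(text[start:].strip())
    # Drop empty pieces (possible when a single token is longer than fragsize).
    return [f for f in fragments if f]

text = "Here is one sentence. We are in the middle of a sentence now."
for fragment in gap_fragments(text, 25):
    print(fragment)
```

With a fragment size of 25, the second fragment starts at "are in the middle of a", which is the kind of mid-sentence cut described in the question.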
> For example:
> Solr returns: in the middle of a <em>sentence</em>
> What I want: We are in the middle of a <em>sentence</em>
The RegexFragmenter (development branch/1.3) can achieve results
similar to this. You give it a regular expression that fragments should
match, and a "slop" (the factor by which the fragment size may stray
from hl.fragsize to fit the regex). The example config includes a
pattern for matching sentences.
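
In Solr this fragmenter is selected with hl.fragmenter=regex and tuned with hl.regex.pattern and hl.regex.slop. The Python sketch below only illustrates the idea of regex-guided fragmenting with slop; the function name regex_fragments, the whitespace tokenizer, and the simple sentence pattern are assumptions for illustration, not the actual implementation or the pattern shipped in the example config.

```python
import re

def regex_fragments(text, fragsize, slop, pattern):
    """Prefer fragment breaks where `pattern` matches begin.

    Hypothetical helper: a break at a pattern start is accepted once the
    current fragment has at least fragsize * (1 - slop) characters, and a
    break is forced when it would grow past fragsize * (1 + slop).
    """
    min_len = int(fragsize * (1 - slop))
    max_len = int(fragsize * (1 + slop))
    # Candidate break points: offsets where a pattern match starts.
    hotspots = {m.start() for m in re.finditer(pattern, text)}
    fragments = []
    start = 0
    for token in re.finditer(r"\S+", text):
        length = token.start() - start
        preferred = token.start() in hotspots and length >= min_len
        forced = token.end() - start > max_len and token.start() > start
        if preferred or forced:
            fragments.append(text[start:token.start()].strip())
            start = token.start()
    fragments.append(text[start:].strip())
    return [f for f in fragments if f]

# Assumed pattern: a capital letter up to the next sentence-ending mark.
SENTENCE = r"[A-Z][^.!?]*[.!?]"
text = "Here is one sentence. We are in the middle of a sentence now."
for fragment in regex_fragments(text, 25, 0.6, SENTENCE):
    print(fragment)
```

With slop 0.6 the fragments line up with the two sentences; with slop 0 there is no room to fit the pattern, so the cuts fall back to fixed-size positions.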
-Mike