Re: Snippet Generation at Punctuation Marks?

Mike Klaas Thu, 03 May 2007 10:49:59 -0700

On 5/3/07, Brian Whitman <[EMAIL PROTECTED]> wrote:

On May 3, 2007, at 11:39 AM, Jack L wrote:
> Snippet generation uses hl.fragsize to determine the size
> of the snippets. This works very well. However, the snippets
> often have half of a sentence at the beginning, and half
> at the end. Is there a parameter I can use to tell the
> snippet generation code to cut at punctuation marks when
> possible?



We are working on this and hope to have a Solr patch soon. Simple
splitting on punctuation requires a new fragmenter, which Solr trunk
does not support yet. But we're hoping to fix that ASAP.


See http://issues.apache.org/jira/browse/SOLR-102 for my solution to
this problem.  The idea is that you'd like to split at sentence
boundaries, but also not stray too far from the desired fragment size.
It would be great to get comments on/improvements to this approach.
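
To make the trade-off concrete, here is a rough sketch of the general
idea in Python. It is not the code attached to SOLR-102 and not Solr's
API: the function name `fragment`, the `slop` tolerance, and the
boundary regex are all assumptions made for illustration, and
`fragsize` only mirrors the role of hl.fragsize.

```python
import re

# Assumed boundary rule: '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s+")


def fragment(text, fragsize=100, slop=0.5):
    """Split text into pieces of roughly `fragsize` characters,
    preferring to cut right after a sentence boundary.

    `slop` (hypothetical parameter) is how far, as a fraction of
    fragsize, a cut may stray from the desired size.
    """
    lo = int(fragsize * (1 - slop))
    hi = int(fragsize * (1 + slop))
    fragments = []
    start = 0
    while len(text) - start > hi:
        # Sentence ends that fall inside the allowed window.
        cuts = [m.end()
                for m in _SENTENCE_END.finditer(text, start + lo, start + hi)]
        if cuts:
            # Take the boundary closest to the desired fragment size.
            cut = min(cuts, key=lambda c: abs(c - start - fragsize))
        else:
            # No boundary nearby: fall back to a hard cut at fragsize.
            cut = start + fragsize
        piece = text[start:cut].strip()
        if piece:
            fragments.append(piece)
        start = cut
    tail = text[start:].strip()
    if tail:
        fragments.append(tail)
    return fragments


if __name__ == "__main__":
    sample = ("One two three four five. Six seven eight nine ten. "
              "Eleven twelve.")
    for piece in fragment(sample, fragsize=25):
        print(repr(piece))
```

With a window of half the fragment size on either side, each cut lands
on a sentence end when one is close enough, and otherwise degrades to
plain fixed-size splitting.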

-Mike
