RE: Creating a document blurb when nothing is returned from highlight feature

Benjamin Higgins Thu, 09 Aug 2007 14:41:23 -0700

Thanks Mike.  I didn't think of creating a blurb beforehand, but that's
a great solution.  I'll probably do that.  Yonik, I can still add a JIRA
issue if you'd like, though.

Ben

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 09, 2007 2:32 PM
To: [email protected]
Subject: Re: Creating a document blurb when nothing is returned from
highlight feature

On 9-Aug-07, at 2:10 PM, Benjamin Higgins wrote:

> Hi all, I'd like to provide a blurb of documents matching a search in
> the case when there is no text highlighted.  I assumed that perhaps the
> highlighter would give me back the first few words in a document if this
> occurred, but it doesn't.  My conundrum is that I'd rather not grab the
> whole document body field because some of them are large.  Is there some
> way I can request from Lucene the first N words or lines from a field?

The way I deal with this is that I modified the highlighter fragment  
scorer to return a positive (but low) score for the first few  
fragments of a doc.  This will work, but tends not to provide great  
summaries and will definitely still fetch and process the entire doc  
contents.
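
A rough sketch of that scoring idea, in plain Python rather than against the Lucene highlighter API. The function names, the term-count scoring, and the `lead_count` / `lead_score` parameters are all assumptions made for illustration; the actual modification described above is not shown in this thread.

```python
import re


def score_fragments(fragments, query_terms, lead_count=2, lead_score=0.01):
    """Score fragments by query-term hits, with a small floor for the first few.

    Hypothetical stand-in for a highlighter fragment scorer: a fragment with
    no hits normally scores 0 and is dropped, but the first `lead_count`
    fragments get a low positive score so they can serve as a fallback blurb.
    """
    terms = {t.lower() for t in query_terms}
    scores = []
    for i, frag in enumerate(fragments):
        words = re.findall(r"\w+", frag.lower())
        score = float(sum(1 for w in words if w in terms))
        if score == 0.0 and i < lead_count:
            score = lead_score  # positive but low, so real matches still win
        scores.append(score)
    return scores


def best_fragments(fragments, query_terms, max_snippets=2):
    """Return up to `max_snippets` fragments with a positive score, best first."""
    scores = score_fragments(fragments, query_terms)
    ranked = sorted(range(len(fragments)), key=lambda i: (-scores[i], i))
    return [fragments[i] for i in ranked[:max_snippets] if scores[i] > 0]
```

Note that every fragment is still tokenized and scored here, which mirrors the cost mentioned above: the whole field gets processed even when only the leading fragments end up being used.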

The better way to do this is to generate a better general summary  
yourself and store it in a separate field; this can be used if no  
highlighting is generated (a capability in Solr to automatically  
substitute a field when there is no highlighting would be cool).  I  
might even implement this if there is sufficient interest :).
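
A minimal sketch of the separate-field approach, assuming a client that builds the summary at index time and picks the blurb at display time. The field names `body` and `summary`, the helper names, and the first-N-words summary are all hypothetical; the `highlighting` argument is assumed to be a mapping of document id to field name to list of snippets.

```python
def make_summary(body, max_words=30):
    """Build a short summary to store alongside the document at index time.

    Simply takes the first `max_words` words; a smarter summarizer could be
    swapped in here without changing the fallback logic below.
    """
    words = body.split()
    summary = " ".join(words[:max_words])
    return summary + " ..." if len(words) > max_words else summary


def pick_blurb(doc_id, highlighting, stored_docs,
               hl_field="body", summary_field="summary"):
    """Prefer highlighted snippets; fall back to the stored summary field."""
    snippets = highlighting.get(doc_id, {}).get(hl_field, [])
    if snippets:
        return " ... ".join(snippets)
    # No highlighting came back for this doc, so use the pre-built summary.
    return stored_docs[doc_id].get(summary_field, "")
```

With this arrangement only the short summary field has to be fetched for display, so the large body field never needs to be returned to the client.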

Unfortunately, the highlighter does not know (and really has no way of  
knowing) what parts of a doc matched, so it would still have to try  
highlighting first.

Note that you can control the CPU usage for long fields by setting  
hl.maxAnalyzedChars (will be in the next release).
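
For illustration, a request with that parameter might be built like this. The base URL, the `body` field name, and the value 10000 are assumptions, not recommendations; only the parameter name hl.maxAnalyzedChars comes from the note above.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, query, field="body", max_analyzed_chars=10000):
    """Build a select URL that asks for highlighting on one field.

    hl.maxAnalyzedChars caps how many characters of the field the
    highlighter analyzes, which bounds the work done on very long documents.
    """
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.maxAnalyzedChars": str(max_analyzed_chars),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host and handler path, shown only as a usage example.
url = highlight_query_url("http://localhost:8983/solr/select", "lucene")
```

The trade-off is that matches past the cutoff will not be highlighted, so a stored summary or similar fallback is still useful for long documents.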

best,
-Mike
