On 9-Aug-07, at 2:10 PM, Benjamin Higgins wrote:
> Hi all, I'd like to provide a blurb of documents matching a search in
> the case when there is no text highlighted. I assumed that perhaps the
> highlighter would give me back the first few words in a document if
> this occurred, but it doesn't. My conundrum is that I'd rather not grab
> the whole document body field because some of them are large. Is there
> some way I can request from Lucene the first N words or lines from a
> field?

The way I deal with this is that I modified the highlighter fragment
scorer to return a positive (but low) score for the first few
fragments of a doc. This will work, but tends not to provide great
summaries and will definitely still fetch and process the entire doc
contents.
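
To illustrate the scoring tweak described above, here is a minimal sketch in plain Java. It is not the actual Lucene highlighter scorer interface and not my real patch; the class name, the FLOOR constant, and the applyFloor method are all hypothetical, and fragment scores are modeled as a simple float array in document order.

```java
// Hypothetical model of the idea, not Lucene's scorer API:
// fragments with no query match normally score 0 and are dropped.
// Giving the first few fragments a tiny positive score keeps them
// available as a fallback blurb, while real matches still win.
class LeadingFragmentFloor {
    // Assumed value: any small positive number below a real match score.
    static final float FLOOR = 0.001f;

    // scores[i] is the match score of the i-th fragment in document order.
    // Returns a copy in which the first `leading` fragments score at least FLOOR.
    static float[] applyFloor(float[] scores, int leading) {
        float[] out = scores.clone();
        int limit = Math.min(leading, out.length);
        for (int i = 0; i < limit; i++) {
            if (out[i] <= 0f) {
                out[i] = FLOOR;
            }
        }
        return out;
    }
}
```

Because the floor is far below any genuine match score, fragments containing query terms still rank first; the leading fragments only surface when nothing else scored.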
The better way to do this is to generate a general-purpose summary
yourself and store it in a separate field; this can be used whenever
no highlighting is generated. (A capability in Solr to automatically
substitute a field in the case of no highlighting would be cool. I
might even implement this if there is sufficient interest :).)
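
A rough sketch of the separate-summary approach, again in plain Java with no Lucene or Solr calls. Everything here is an assumption for illustration: the class and method names are hypothetical, the "summary" is just the first N words (a smarter summarizer could be swapped in), and the highlights are assumed to arrive as a list of snippet strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: build a short summary at index time, store it in
// its own field, and fall back to it at display time when the
// highlighter returns nothing.
class SnippetFallback {
    // Index time: keep only the first maxWords whitespace-separated words.
    static String summarize(String body, int maxWords) {
        String[] words = body.trim().split("\\s+");
        if (words.length <= maxWords) {
            return String.join(" ", words);
        }
        return String.join(" ", Arrays.copyOfRange(words, 0, maxWords)) + " ...";
    }

    // Display time: prefer highlighted snippets, else the stored summary.
    static String chooseBlurb(List<String> highlights, String storedSummary) {
        if (highlights != null && !highlights.isEmpty()) {
            return String.join(" ... ", highlights);
        }
        return storedSummary;
    }
}
```

The point of the design is that the fallback only reads the small stored summary field, so the large body field never has to be fetched just to show a blurb.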
Unfortunately, the highlighter does not know (and really has no way of
knowing) what parts of a doc matched, so it would still have to try
highlighting first.
Note that you can control the CPU usage for long fields by setting
hl.maxAnalyzedChars (this will be in the next release).
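
For example, a request might carry the parameter alongside the usual highlighting options. The sketch below only assembles the query string; the field name "body" and the limit of 10000 characters are assumptions picked for illustration, and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that builds the query-string part of a Solr
// request with highlighting enabled and analysis capped per field.
class HighlightQuery {
    static String buildQuery(String userQuery, String field, int maxAnalyzedChars) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("hl", "true");                 // turn highlighting on
        params.put("hl.fl", field);               // field(s) to highlight
        params.put("hl.maxAnalyzedChars",         // stop analyzing after this many chars
                   Integer.toString(maxAnalyzedChars));
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

Calling buildQuery("lucene highlighter", "body", 10000) yields a string that can be appended after the "?" of whatever select URL your installation uses.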
best,
-Mike