I'm not sure if I have a good suggestion, but I have a question. :)  What is 
considered "junk"?  Would it be possible to eliminate the junk before it even 
goes into the index in order to avoid GIGO (Garbage In Garbage Out)?
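
For illustration only, here is a minimal sketch of that kind of pre-index cleanup, done outside Solr in Python.  The heuristic for what counts as a 'good' word, the helper names, and the field names text_raw and text_clean are all assumptions made up for this example; real OCR junk would likely need a dictionary or a better test.  It keeps the surviving words in their original order and builds a document with both the original and the cleaned-up text.

```python
# Hypothetical heuristic: a token is "good" if, once surrounding punctuation
# is stripped, it is purely alphabetic and either a common short word or at
# least MIN_LEN letters long with at least one vowel.
MIN_LEN = 3
SHORT_WORDS = {"a", "i", "an", "at", "in", "is", "it", "of", "on", "to"}
VOWELS = set("aeiouy")


def is_good_token(token):
    """Return True if the token looks like a real word under the heuristic."""
    word = token.strip(".,;:!?\"'()[]").lower()
    if not word.isalpha():
        return False
    if word in SHORT_WORDS:
        return True
    return len(word) >= MIN_LEN and any(c in VOWELS for c in word)


def clean_ocr_text(raw):
    """Drop junk tokens, keeping good tokens in their original order."""
    return " ".join(t for t in raw.split() if is_good_token(t))


def make_document(doc_id, raw):
    """Build a two-field document: original text plus cleaned-up text.

    The field names are hypothetical; they would need to match whatever
    the schema actually defines.
    """
    return {"id": doc_id, "text_raw": raw, "text_clean": clean_ocr_text(raw)}


if __name__ == "__main__":
    sample = "~~| The q#1x old ,.;' house xzq stood l1l on the hill ]]"
    print(make_document("page-1", sample)["text_clean"])
```

If both fields are stored, highlighting could be requested on the cleaned-up field while the original is kept for display; whether that gives enough of a sense of context depends on how aggressive the filter is.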

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Terence Gannon <butzi0...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, January 12, 2009 11:00:31 AM
> Subject: Improving Readability of Hit Highlighting
> 
> I'm indexing text from an OCR of an old document.  Many words get read
> perfectly, but they're typically embedded in a lot of junk.  I would
> like the hit highlighting to show only the 'good' words, in the order
> in which they appeared in the original document.  Is it possible to
> use the output of the filter classes as the text used in hit highlighting?
> Or do you have to do all the text cleanup outside of Solr and present it
> with two fields to index, one with the original text, and one with the
> cleaned-up text?  The objective of the hit highlighting is to give the
> user a *sense* of the original context, even if it's not provided
> verbatim from the original document.  Thanks in advance.
> 
> TerryG
