:07:57 PM
> Subject: Re: Improving Readability of Hit Highlighting
To answer your questions specifically, here is an example of the raw OCR output:
"CONTRACTORINMPRIMENTAYIVE : mom Ale ACCEPT INFORMATIONON TOUR SHEET TO ea"
to which I would like to see:
"mom ale access tour sheet to"
in the hit highlight. My schema for this field is pretty much
standard, as f
I'm not sure if I have a good suggestion, but I have a question. :) What is
considered "junk"? Would it be possible to eliminate the junk before it even
goes into the index in order to avoid GIGO (Garbage In Garbage Out)?
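
As a rough illustration of that idea, the OCR text could be filtered against a list of known words before it is sent to the index. This is only a sketch under assumptions: the KNOWN_WORDS set and the strip_ocr_junk function are hypothetical names made up for this example, and a real word list would need to come from a dictionary or some other trusted source.

```python
import re

# Hypothetical word list for illustration only; in practice this might be
# loaded from a dictionary file or built from terms known to be good.
KNOWN_WORDS = {"mom", "ale", "accept", "tour", "sheet", "to"}


def strip_ocr_junk(raw, known_words=KNOWN_WORDS):
    """Keep only alphabetic tokens found in known_words (case-insensitive)."""
    # Split on anything that is not a letter, so stray punctuation is dropped.
    tokens = re.findall(r"[A-Za-z]+", raw)
    # Lowercase each token and discard the ones the word list does not know.
    kept = [t.lower() for t in tokens if t.lower() in known_words]
    return " ".join(kept)


raw = "CONTRACTORINMPRIMENTAYIVE : mom Ale ACCEPT INFORMATIONON TOUR SHEET TO ea"
print(strip_ocr_junk(raw))  # mom ale accept tour sheet to
```

Since highlighting is normally built from the stored field value, the cleanup would probably have to happen before the document is submitted, not only in the analyzer chain. A word-list filter like this also throws away legitimate words it does not know, so it may be too aggressive for some collections.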
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
---