Hi Erick,

Thanks for the response.

I am already using termVectors with offsets & positions enabled as shown below.


<field name="attachment_bodies" type="text_rev" indexed="true"
       stored="true" multiValued="true" termVectors="true"
       termPositions="true" termOffsets="true" />


I am indexing FAQ content. Some of these FAQs have attachments linked to them, 
such as PDF, DOC, *.TAR and *.GZIP files, that contain additional information 
related to the FAQ, and all of this content is indexed. When searching with 
highlighting enabled, search performance degrades for archive files such as 
*.gz, *.tar and *.zip. Using the debug flag, I can see that highlighting takes 
longer for these archive files.

What could be the reason behind it? Is it because these files are unzipped and 
then highlighted from the index at display time?

Does highlighting depend on file size? That is, does search performance degrade 
for larger files because of the highlighting?

I have tried reducing the maxAnalyzedChars value from 5 MB to 1 MB, but I still 
do not see any significant improvement in search and highlighting for these 
kinds of files.
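
The setup described above (highlighting on attachment_bodies, a capped 
maxAnalyzedChars, and the debug flag) can be sketched as a request like the 
one below. This is only an illustration: the host, port and path, the query 
term, and the helper name build_highlight_query are hypothetical, and treating 
1 MB as 1048576 characters is an assumption.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and path for the real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_highlight_query(term, max_analyzed_chars=1024 * 1024):
    """Build a select URL with highlighting and debug output enabled.

    max_analyzed_chars is passed as hl.maxAnalyzedChars; the default of
    1048576 assumes "1 MB" means roughly one million characters.
    """
    params = {
        "q": f"attachment_bodies:{term}",         # field name from the schema above
        "hl": "true",                             # turn highlighting on
        "hl.fl": "attachment_bodies",             # field to highlight
        "hl.maxAnalyzedChars": str(max_analyzed_chars),
        "debugQuery": "true",                     # include per-component timing
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # "license" is just a placeholder search term.
    print(build_highlight_query("license"))
```

Comparing the highlighting time in the debug output for the same query at 
different hl.maxAnalyzedChars values should show whether the cap is actually 
being applied to these documents.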

Let me know if you can suggest any workaround for improving highlighting and 
search performance for these kinds of files, or for large files in general.


Thanks
Shyam

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, November 26, 2011 8:57 AM
To: solr-user@lucene.apache.org
Subject: Re: highlighting performance poor with *.tar, *.gz files

Highlighting is dependent on the size of the
data being fed through the highlighter. Unless you have
termVectors & offsets & positions enabled, the text
must be re-analyzed, see:
http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=%28termvector%29%7C%28retrieve%29%7C%28contents%29

But highlighting compressed files seems like an odd
use-case, what is the business reason you need to do this?

Best
Erick

On Thu, Nov 24, 2011 at 10:28 AM, Shyam Bhaskaran
<shyam.bhaska...@synopsys.com> wrote:
> Hi,
>
> It is observed that highlighting of search results is taking too much time 
> especially for highlighting terms for archived files like *.gz, *.tar, *.zip.
> What could be the reason behind it ? Is it because these files are unzipped 
> and then highlighted from the index during display time ?
> Or is it dependent on the size of the file ? Is there any way by which the 
> search & highlighter performance improves for these kind of archived files 
> (*.tar, *.zip etc)
>
> Let me know if there is any workaround for improving the highlighting and 
> search performance for these kind of files?
>
> -Shyam
>
