Hi Erick,

Thanks for the response.

I am already using termVectors with offsets and positions enabled, as shown below.

<field name="attachment_bodies" type="text_rev" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true" />

I am indexing FAQ content, and some of these FAQs have attachments linked to them. The attachments include PDF, DOC, *.TAR and *.GZIP files that contain additional information related to the FAQ, and all of this content is indexed.

While searching and highlighting, I have observed that search performance degrades for archive files such as *.gz, *.tar and *.zip. Using the debug flag, I found that highlighting is what takes more time for these archive files.

What could be the reason behind it? Is it because these files are unzipped and then highlighted from the index at display time? Is highlighting dependent on file size? What I mean is: if the file is larger, does search performance degrade because of the highlighting?

I have tried reducing the maxAnalyzedChars value from 5 MB to 1 MB but still do not see any significant improvement in search and highlighting for these kinds of files.

Let me know if you can suggest any workaround for improving highlighting and search performance for these kinds of files, or for large files in general.

Thanks
Shyam

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, November 26, 2011 8:57 AM
To: solr-user@lucene.apache.org
Subject: Re: highlighting performance poor with *.tar, *.gz files

Highlighting is dependent on the size of the data being fed through the highlighter.
Unless you have termVectors & offsets & positions enabled, the text must be re-analyzed, see:
http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=%28termvector%29%7C%28retrieve%29%7C%28contents%29

But highlighting compressed files seems like an odd use-case, what is the business reason you need to do this?

Best
Erick

On Thu, Nov 24, 2011 at 10:28 AM, Shyam Bhaskaran <shyam.bhaska...@synopsys.com> wrote:
> Hi,
>
> It is observed that highlighting of search results is taking too much time
> especially for highlighting terms for archived files like *.gz, *.tar, *.zip.
> What could be the reason behind it ? Is it because these files are unzipped
> and then highlighted from the index during display time ?
> Or is it dependent on the size of the file ? Is there any way by which the
> search & highlighter performance improves for these kind of archived files
> (*.tar, *.zip etc)
>
> Let me know if there is any workaround for improving the highlighting and
> search performance for these kind of files?
>
> -Shyam
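
A note on the exchange above: since the attachment_bodies field already stores term vectors with positions and offsets, one option that may be worth trying for large fields is the FastVectorHighlighter, which is switched on per request with hl.useFastVectorHighlighter (available from Solr 3.1). Whether it actually helps with this index, or behaves well with the text_rev field type, is an assumption that has not been tested here. The Python sketch below only builds the request query string. The helper name build_highlight_query and its default values are hypothetical; the parameter names (q, hl, hl.fl, hl.maxAnalyzedChars, hl.snippets, hl.fragsize, hl.useFastVectorHighlighter, debugQuery) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_highlight_query(user_query, field="attachment_bodies",
                          max_chars=1000000, use_fvh=True):
    """Build a Solr query string with highlighting options.

    Hypothetical helper for illustration only; the defaults are
    assumptions, not recommended values.
    """
    params = {
        "q": user_query,
        "hl": "true",                          # turn highlighting on
        "hl.fl": field,                        # field(s) to highlight
        "hl.maxAnalyzedChars": str(max_chars), # cap on characters examined per field
        "hl.snippets": "1",                    # one snippet per field
        "hl.fragsize": "100",                  # approximate snippet length in characters
        "debugQuery": "true",                  # include timing info in the response
    }
    if use_fvh:
        # Needs termVectors, termPositions and termOffsets on the field,
        # which the schema snippet in the message already enables.
        params["hl.useFastVectorHighlighter"] = "true"
    return urlencode(params)


if __name__ == "__main__":
    # Append the result to the select handler URL of your Solr core.
    print(build_highlight_query("license error"))
```

Comparing the highlighting time reported under debugQuery with use_fvh set to True and then False on the same slow documents would show whether the switch makes a difference for the archive attachments.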