Re: Highlighting Performance On Large Documents

Lance Norskog Fri, 07 May 2010 20:02:10 -0700

Do you have these options turned on for the text field when you index it:
termVectors/termPositions/termOffsets?


Highlighting needs the information created by these analysis options.
If they are not turned on, Solr has to load the document text, run the
analyzer again with these options on, use that data to create the
highlighting, and then throw away the reanalyzed data. Without these
options, you are basically re-indexing the document every time you
highlight it.
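
As a rough sketch of what I mean, a field definition in schema.xml with
these options on might look like the following. The field name plainText
is taken from the original mail; the type name "text" and the
indexed/stored settings are assumptions, so adjust them to your schema.
The documents need to be reindexed after the change.

```xml
<!-- schema.xml fragment: keep term vectors with positions and offsets -->
<!-- type="text" is an assumption; use whatever analyzed type you have -->
<field name="plainText" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Storing term vectors makes the index bigger, so it is probably worth
enabling them only on the fields you actually highlight.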

http://www.lucidimagination.com/search/out?u=http%3A%2F%2Fwiki.apache.org%2Fsolr%2FFieldOptionsByUseCase

On Wed, May 5, 2010 at 5:01 PM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:
> (10/05/05 22:08), Serdar Sahin wrote:
>>
>> Hi,
>>
>> Currently, there are similar topics active on the mailing list, but I did
>> not want to steal the topic.
>>
>> I have currently indexed 100,000 documents. They are Microsoft Office, PDF,
>> etc. documents that I convert to TXT files before indexing. Files are
>> between 1 and 500 pages. When I search for something, filter the results to
>> documents that have more than 100 pages, and activate highlighting, it
>> takes 0.8-3 seconds, depending on the query (10 results per page). If I
>> retrieve documents that have 1-5 pages, it drops to 0.1 seconds.
>>
>> If I disable highlighting, it drops to 0.1-0.2 seconds, even on the large
>> documents, which is more than enough. This problem mostly happens where
>> there are no caches, on the first query. I use this configuration for
>> highlighting:
>>
>>
>>     $query->addHighlightField('description')->addHighlightField('plainText');
>>     $query->setHighlightSimplePre('<strong>');
>>     $query->setHighlightSimplePost('</strong>');
>>     $query->setHighlightHighlightMultiTerm(TRUE);
>>     $query->setHighlightMaxAnalyzedChars(10000);
>>     $query->setHighlightSnippets(2);
>>
>> Do you have any suggestions to improve response time while highlighting is
>> active? I have read a couple of the articles you have previously provided,
>> but they did not help.
>>
>> And for the second question, I retrieve these fields:
>>
>>     $query->addField('title')->addField('cat')->addField('thumbs_up')->
>>             addField('thumbs_down')->addField('lang')->addField('id')->
>>             addField('username')->addField('view_count')->addField('pages')->
>>             addField('no_img')->addField('date');
>>
>> If I can't solve the highlighting problem on large documents, I can simply
>> disable it and retrieve the first x characters from the plainText (full
>> text) field, but is it possible to retrieve the first x characters without
>> using the highlighting feature? When I use this:
>>     $query->setHighlight(TRUE);
>>     $query->setHighlightAlternateField('plainText');
>>     $query->setHighlightMaxAnalyzedChars(0);
>>     $query->setHighlightMaxAlternateFieldLength(256);
>>
>> It still takes 2 seconds if I retrieve 10 rows that have 200-300 pages. The
>> highlighting still works, so it might be the source of the problem. I want
>> to completely disable it and retrieve only the first 256 characters of the
>> plainText field. Is it possible? It may remove some overhead and give
>> better performance.
>>
>> I personally prefer the highlighting solution, but I would also like to
>> hear the solution for this problem. For the same query, if I disable
>> highlighting and do not retrieve (but still search) the plainText field,
>> it drops to 0.0094 seconds. So I think if I can get the first 256
>> characters without using the highlighting, I will get better performance.
>>
>> Any suggestions regarding these two problems will be highly appreciated.
>>
>> Thanks,
>>
>> Serdar Sahin
>>
>>
>
> Hi Serdar,
>
> There are a few things I can think of that you can try.
>
> 1. Provide another field for highlighting and use copyField
> to copy plainText to the highlighting field. When using copyField,
> specify the maxChars attribute to limit the length of the copy of
> plainText. This should work on Solr 1.4.
>
> 2. If you can use the branch_3x version of Solr, try FastVectorHighlighter.
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>
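
Koji's first suggestion above could look roughly like this in schema.xml.
The field name plainTextHl, the type name "text", and the 10000 limit are
made up for illustration (the limit just mirrors the maxAnalyzedChars value
in the original mail), so treat this as a sketch and not a tested config.

```xml
<!-- hypothetical shortened copy of plainText used only for highlighting -->
<field name="plainTextHl" type="text" indexed="true" stored="true"/>

<!-- copy at most the first 10000 characters of plainText into it -->
<copyField source="plainText" dest="plainTextHl" maxChars="10000"/>
```

The highlight field in the query would then be plainTextHl instead of
plainText. A second copy with a much smaller maxChars (say 256) might also
cover the "first x characters" question, since a stored copy can be
returned as a normal field with highlighting turned off.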



-- 
Lance Norskog
goks...@gmail.com
