Hi,

There are similar threads active on the mailing list at the moment, but I
did not want to hijack them.

I currently have 100,000 documents indexed. They are Microsoft Office, PDF
and similar documents, which I convert to TXT files before indexing, and
they range from 1 to 500 pages. When I search for something, filter the
results to documents with more than 100 pages and enable highlighting, the
query takes 0.8 to 3 seconds depending on the query (10 results per page).
If I retrieve documents with 1 to 5 pages instead, it drops to 0.1 seconds.

If I disable highlighting, it drops to 0.1 to 0.2 seconds even for the
large documents, which is more than fast enough. The problem mostly occurs
on the first query, when nothing is cached yet. I use this configuration
for highlighting:


    $query->addHighlightField('description')->addHighlightField('plainText');
    $query->setHighlightSimplePre('<strong>');
    $query->setHighlightSimplePost('</strong>');
    $query->setHighlightHighlightMultiTerm(TRUE);
    $query->setHighlightMaxAnalyzedChars(10000);
    $query->setHighlightSnippets(2);
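For anyone who wants to reproduce this outside the PHP client, these
settings should correspond to the standard Solr highlighting request
parameters below. This is a sketch: I am assuming the PECL Solr client maps
each setter to the parameter named in the comment, that `hl=true` is set
elsewhere, and the search query itself (`q`) is left out.

```php
<?php
// Sketch: the highlighting settings above expressed as raw Solr request
// parameters. Assumes the PECL Solr client maps each setter to the
// parameter named in the comment; the search query itself (q) is omitted.
$params = [
    'hl'                    => 'true',                   // highlighting on (assumed set elsewhere)
    'hl.fl'                 => 'description,plainText',  // addHighlightField(...) x2
    'hl.simple.pre'         => '<strong>',               // setHighlightSimplePre()
    'hl.simple.post'        => '</strong>',              // setHighlightSimplePost()
    'hl.highlightMultiTerm' => 'true',                   // setHighlightHighlightMultiTerm(TRUE)
    'hl.maxAnalyzedChars'   => '10000',                  // setHighlightMaxAnalyzedChars(10000)
    'hl.snippets'           => '2',                      // setHighlightSnippets(2)
];

// Build the URL-encoded query-string part of a /select request.
$queryString = http_build_query($params);
echo $queryString, PHP_EOL;
```

The resulting string can be appended to a `/select?q=...&` request to time
the highlighter directly against Solr, without the PHP client in between.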

Do you have any suggestions for improving response time while highlighting
is enabled? I have read a couple of the articles you have linked previously,
but they did not help.

My second question: I retrieve these fields:

    $query->addField('title')->addField('cat')->addField('thumbs_up')
        ->addField('thumbs_down')->addField('lang')->addField('id')
        ->addField('username')->addField('view_count')->addField('pages')
        ->addField('no_img')->addField('date');

If I can't solve the highlighting problem for large documents, I could
simply disable highlighting and show the first x characters of the
plainText (full text) field instead. Is it possible to retrieve the first x
characters of a field without using the highlighting feature? When I use
this:

    $query->setHighlight(TRUE);
    $query->setHighlightAlternateField('plainText');
    $query->setHighlightMaxAnalyzedChars(0);
    $query->setHighlightMaxAlternateFieldLength(256);

it still takes 2 seconds to retrieve 10 rows of documents with 200 to 300
pages. The highlighter is still running, so it might be the source of the
problem. I want to disable it completely and retrieve only the first 256
characters of the plainText field. Is that possible? It may remove some
overhead and give better performance.

I personally prefer the highlighting solution, but I would also like to hear
a solution for this problem. For the same query, if I disable highlighting
and do not retrieve the plainText field (while still searching it), the
time drops to 0.0094 seconds. So I think that if I can get the first 256
characters without using highlighting, I will get better performance.

Any suggestions regarding these two problems would be highly appreciated.

Thanks,

Serdar Sahin
