Re: Huge Query execution time for multiple ORs

Emir Arnautović Mon, 04 Dec 2017 06:28:00 -0800

Hi Faraz,
When you say query without sort, I assume you mean that you omit the sort 
parameter, so results are sorted by score. That is expected to be slower than 
the same query without score calculation - e.g. the same query run as fq.
What you observe can be explained by:
* Solr calculates the score even when not sorting by score and not returning 
it (do you return the score? I am not sure about this - I did not check the code)
* The field that you are using for sorting does not have doc values, so it has 
to be uninverted
* The fields that you are using for sorting are not in the OS cache, so they 
are read from disk.


Try comparing the same query run as q=... and as fq=... Make sure that your 
filter cache is disabled if you are repeating the same queries and averaging.
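
A minimal sketch of how the two variants could be built for such a comparison, 
in Python. The host, core name (collection1), author values and rows setting 
are placeholders, not taken from this thread. It uses the {!cache=false} local 
param to bypass the filter cache for that one fq, as an alternative to 
disabling the cache in solrconfig.xml; check that it behaves as expected on 
your Solr version.

```python
from urllib.parse import urlencode

# Hypothetical values: the host, core name and author list are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"
AUTHORS = ["name1", "name2", "name3"]


def or_clause(field, values):
    """Build (field:v1 OR field:v2 ...) as one parenthesised clause."""
    return "(" + " OR ".join(f"{field}:{v}" for v in values) + ")"


def scored_params(clause, sort=None):
    """Variant A: the OR clause is the main query, so it is scored."""
    params = {"q": clause, "rows": 10, "wt": "json"}
    if sort:
        params["sort"] = sort
    return params


def filter_params(clause, sort=None):
    """Variant B: match everything in q and move the OR clause into fq.

    {!cache=false} keeps this filter out of the filter cache, so repeated
    runs measure the query itself rather than a cache hit.
    """
    params = {"q": "*:*", "fq": "{!cache=false}" + clause, "rows": 10, "wt": "json"}
    if sort:
        params["sort"] = sort
    return params


def request_url(params):
    """Return the full select URL with URL-encoded parameters."""
    return SOLR_SELECT + "?" + urlencode(params)


clause = or_clause("author", AUTHORS)
for label, params in (
    ("q", scored_params(clause, "guid asc")),
    ("fq", filter_params(clause, "guid asc")),
):
    print(label, request_url(params))
```

Requesting each URL several times and comparing the QTime values in the 
response headers should show how much of the time goes into scoring and how 
much into the sort itself.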

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Dec 2017, at 14:54, Faraz Fallahi <faraz.fall...@googlemail.com> wrote:
> 
> Hi guys,
> 
> Sorry to bother you again, but I am really confused:
> 
> I've used the Solr admin website and created a query with lots of ORs using
> Solr 4.7.
> 
> When I execute the query without a sort it executes in roughly 3.5 - 4
> seconds.
> When I execute it with a sort on a field called pubdate it takes about
> 4-4.5 seconds.
> When I execute it with a sort on the guid field it takes about 7 - 8
> seconds !!!
> 
> After your explanations I was expecting the query without a sort to be the
> slowest. What am I missing here?
> 
> Best regards
> Faraz
> 
> On 30.11.2017 at 09:29, "Faraz Fallahi" <faraz.fall...@googlemail.com> wrote:
> 
>> Uff... I see. Thanks for the explanation :)
>> 
>> On 30.11.2017 at 3:13 PM, "Emir Arnautović" <
>> emir.arnauto...@sematext.com> wrote:
>> 
>>> Hi Faraz,
>>> It is a bit worse than that - it also needs to calculate the score, so for
>>> each matching doc of one query part it has to check whether it appears in
>>> the results of the other query parts. If you use the term query parser, you
>>> avoid calculating the score - all docs will have score 1.
>>> Solr is based on Lucene, which is mainly an inverted index:
>>> https://en.wikipedia.org/wiki/Inverted_index so knowing that helps you
>>> understand how expensive some queries are. It is relatively easy to figure
>>> out what steps are needed for different query types. Of course, Lucene
>>> includes a lot of smartness, and it is probably not using the naive approach,
>>> but it cannot avoid the limitations of an inverted index.
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
>>>> On 30 Nov 2017, at 02:39, Faraz Fallahi <faraz.fall...@googlemail.com>
>>> wrote:
>>>> 
>>>> Hi Toke,
>>>> 
>>>> Just to be clear and to understand: does this mean that a query of the form
>>>> author:name1 OR author:name2 OR author:name3
>>>> 
>>>> is processed like, e.g.,
>>>> 
>>>> 1 query against the index with author:name1 getting 4 results
>>>> Then 1 query against the index with author:name2 getting 3 results
>>>> Then 1 query against the index with author:name3 getting 1 result
>>>> 
>>>> and in the end all results are merged and I get a result of 8?
>>>> 
>>>> So a query of a thousand authors will be split into a thousand single
>>>> queries against the index?
>>>> 
>>>> Do I understand this correctly?
>>>> 
>>>> Thanks for the help
>>>> Faraz
>>>> 
>>>> 
>>>> On 28.11.2017 at 15:39, "Toke Eskildsen" <t...@kb.dk> wrote:
>>>> 
>>>> On Tue, 2017-11-28 at 11:07 +0100, Faraz Fallahi wrote:
>>>>> I have a question regarding solr queries.
>>>>> My query basically contains thousands of OR conditions for authors
>>>>> (author:name1 OR author:name2 OR author:name3 OR author:name4 ...)
>>>>> The execution time on my index is huge (around 15 sec). When I tag
>>>>> all the associated documents with a custom field and value like
>>>>> authorlist:1 and then change my query to just search for
>>>>> authorlist:1, it executes in 78 ms. How come there is such a big
>>>>> difference in exec-time?
>>>> 
>>>> Due to the nature of inverted indexes (which lie at the heart of
>>>> Solr), your thousands of OR-queries mean thousands of lookups, whereas
>>>> your authorlist means a single lookup. Adding to this, the results for
>>>> each author need to be merged with the other author-results - for
>>>> authorlist the results are there directly.
>>>> 
>>>> If your author lists are static, indexing them as you did in your test
>>>> is the best solution.
>>>> 
>>>> If they are not static, using a filter-query will ensure that they are
>>>> at least cached subsequently, so that only the first call will be
>>>> slow.
>>>> 
>>>> If they are semi-static and there are not too many of them, you could
>>>> do warm-up filter-queries for all the different groups so that
>>>> users do not pay the first-call penalty. This requires your filter-
>>>> cache to be large enough to hold all the author lists.
>>>> 
>>>> - Toke Eskildsen, Royal Danish Library
>>> 
>>>
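
Toke's point about lookups and merging can be illustrated with a toy inverted 
index in Python. This is only a sketch of the naive approach on made-up data 
(the counts follow Faraz's 4 + 3 + 1 example, assuming the authors' documents 
do not overlap); Lucene's actual implementation is considerably more 
sophisticated.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each value of `field` to the list of doc ids that contain it."""
    postings = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for value in doc.get(field, []):
            postings[value].append(doc_id)
    return postings


def search_or(postings, terms):
    """One dictionary lookup per term, then a union of all posting lists."""
    lookups = 0
    matches = set()
    for term in terms:
        lookups += 1
        matches.update(postings.get(term, []))
    return sorted(matches), lookups


# Made-up documents: 4 by name1, 3 by name2, 1 by name3, all tagged with
# authorlist:1, plus one untagged document by another author.
docs = (
    [{"author": ["name1"], "authorlist": ["1"]}] * 4
    + [{"author": ["name2"], "authorlist": ["1"]}] * 3
    + [{"author": ["name3"], "authorlist": ["1"]}]
    + [{"author": ["someone_else"]}]
)

author_index = build_index(docs, "author")
tag_index = build_index(docs, "authorlist")

or_hits, or_lookups = search_or(author_index, ["name1", "name2", "name3"])
tag_hits, tag_lookups = search_or(tag_index, ["1"])

print(len(or_hits), or_lookups)    # 8 documents found via 3 lookups
print(len(tag_hits), tag_lookups)  # the same 8 documents via 1 lookup
```

With thousands of authors the first variant does thousands of lookups and has 
to merge thousands of posting lists, while the pre-tagged field reads a single 
posting list.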
