Hi guys,

Sorry to bother you again, but I am really confused:

I've used the Solr admin website and created a query with lots of ORs,
using Solr 4.7.
When I execute the query without a sort, it takes roughly 3.5-4 seconds.
When I execute it with a sort on a field called pubdate, it takes about
4-4.5 seconds.
When I execute it with a sort on the guid field, it takes about 7-8
seconds!

After your explanations I was expecting the query without a sort to be
the slowest. What am I missing here?

Best regards,
Faraz

On 30.11.2017 09:29, "Faraz Fallahi" <faraz.fall...@googlemail.com> wrote:

> Uff... I see. Thanks for the explanation :)
>
> On 30.11.2017 3:13 PM, "Emir Arnautović" <emir.arnauto...@sematext.com>
> wrote:
>
>> Hi Faraz,
>> It is a bit worse than that - it also needs to calculate the score, so
>> for each matching doc of one query part it has to check if it appears
>> in the results of the other query parts. If you use the term query
>> parser, you avoid calculating the score - all docs will have a score
>> of 1.
>> Solr is based on Lucene, which is mainly an inverted index:
>> https://en.wikipedia.org/wiki/Inverted_index
>> so knowing that helps to understand how expensive some queries are. It
>> is relatively easy to figure out what steps are needed for different
>> query types. Of course, Lucene includes a lot of smartness, and it is
>> probably not using the naive approach, but it cannot avoid the
>> limitations of an inverted index.
>>
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>> > On 30 Nov 2017, at 02:39, Faraz Fallahi <faraz.fall...@googlemail.com>
>> > wrote:
>> >
>> > Hi Toke,
>> >
>> > Just to be clear and to understand: does this mean that a query of
>> > the form
>> > author:name1 OR author:name2 OR author:name3
>> >
>> > is processed like this, for example:
>> >
>> > 1 query against the index with author:name1, getting 4 results
>> > Then 1 query against the index with author:name2, getting 3 results
>> > Then 1 query against the index with author:name3, getting 1 result
>> >
>> > And in the end all results are merged and I get a result of 8?
>> >
>> > So a query with a thousand authors will be split into a thousand
>> > single queries against the index?
>> >
>> > Do I understand this correctly?
>> >
>> > Thanks for the help
>> > Faraz
>> >
>> >
>> > On 28.11.2017 15:39, "Toke Eskildsen" <t...@kb.dk> wrote:
>> >
>> > On Tue, 2017-11-28 at 11:07 +0100, Faraz Fallahi wrote:
>> >> I have a question regarding Solr queries.
>> >> My query basically contains thousands of OR conditions for authors
>> >> (author:name1 OR author:name2 OR author:name3 OR author:name4 ...)
>> >> The execution time on my index is huge (around 15 sec). When I tag
>> >> all the associated documents with a custom field and value like
>> >> authorlist:1 and then change my query to just search for
>> >> authorlist:1, it executes in 78 ms. How come there is such a big
>> >> difference in execution time?
>> >
>> > Due to the nature of inverted indexes (which lie at the heart of
>> > Solr), your thousands of OR-queries mean thousands of lookups,
>> > whereas your authorlist means a single lookup. Adding to this, the
>> > results for each author need to be merged with the other
>> > author-results - for authorlist the results are there directly.
>> >
>> > If your author lists are static, indexing them as you did in your
>> > test is the best solution.
>> >
>> > If they are not static, using a filter-query will ensure that they
>> > are at least cached subsequently, so that only the first call will
>> > be slow.
>> >
>> > If they are semi-static and there are not too many of them, you
>> > could do warm-up filter-queries for all the different groups so that
>> > the users do not pay the first-call penalty. This requires your
>> > filter-cache to be large enough to hold all the author lists.
>> >
>> > - Toke Eskildsen, Royal Danish Library
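
For anyone following the thread, Toke's point about lookups and merging can be illustrated with a toy inverted index. This is only a rough Python sketch under simplifying assumptions: it is not how Lucene is implemented, and the helper names (build_index, search_or) and the four sample documents are made up for illustration. It counts one lookup per OR clause and merges the matching doc ids, and compares that with a single lookup on a pre-tagged authorlist field.

```python
from collections import defaultdict


def build_index(docs):
    """Toy inverted index: map each (field, value) term to the doc ids containing it."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            index[(field, value)].append(doc_id)
    return index


def search_or(index, terms):
    """OR query: one postings lookup per term, then merge the doc ids."""
    lookups = 0
    matches = set()
    for term in terms:
        lookups += 1                          # one index lookup per OR clause
        matches.update(index.get(term, []))   # merge with the results so far
    return sorted(matches), lookups


# Hypothetical sample documents, tagged the way Faraz described.
docs = [
    {"author": "name1", "authorlist": "1"},
    {"author": "name2", "authorlist": "1"},
    {"author": "name3", "authorlist": "1"},
    {"author": "name4", "authorlist": "2"},
]
index = build_index(docs)

# author:name1 OR author:name2 OR author:name3
or_hits, or_lookups = search_or(
    index, [("author", "name1"), ("author", "name2"), ("author", "name3")]
)

# authorlist:1
tag_hits, tag_lookups = search_or(index, [("authorlist", "1")])

print("OR query:  ", or_hits, "lookups:", or_lookups)
print("authorlist:", tag_hits, "lookups:", tag_lookups)
```

Both queries return the same documents, but the OR form needs one lookup and one merge step per author, so its cost grows with the number of clauses, while the authorlist form stays at a single lookup. The sketch ignores scoring and sorting entirely, so it says nothing about the sort timings asked about at the top of this message.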