Huge Query execution time for multiple ORs

2017-11-28 Thread Faraz Fallahi
Hi

I have a question regarding Solr queries.
My query basically contains thousands of OR conditions for authors
(author:name1 OR author:name2 OR author:name3 OR author:name4 ...).
The execution time on my index is huge (around 15 sec). When I tag all the
associated documents with a custom field and value like authorlist:1 and
then change my query to just search for authorlist:1, it executes in 78
ms. How come there is such a big difference in execution time?
Can somebody please explain why there is such a difference (maybe the query
parser?) and whether there is a way to speed this up?

Thx for the help


Re: Huge Query execution time for multiple ORs

2017-11-28 Thread Faraz Fallahi
Hi

Thx for all the replies.
I think tagging them is probably the best solution in any case.

Best regards

On 28.11.2017 at 15:39, "Toke Eskildsen" wrote:

> Due to the nature of inverted indexes (which lie at the heart of
> Solr), your thousands of OR clauses mean thousands of lookups, whereas
> your authorlist means a single lookup. In addition, the results for
> each author need to be merged with the other author results; for
> authorlist the results are there directly.
>
> If your author lists are static, indexing them as you did in your test
> is the best solution.
>
> If they are not static, using a filter query will ensure that they are
> at least cached subsequently, so that only the first call will be
> slow.
>
> If they are semi-static and there are not too many of them, you could
> do warm-up filter queries for all the different groups so that the
> users do not pay the first-call penalty. This requires your filter
> cache to be large enough to hold all the author lists.
>
> - Toke Eskildsen, Royal Danish Library
>
>
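
To make the lookup-and-merge argument above concrete, here is a minimal sketch of a toy inverted index in Python. It is not how Lucene is implemented; the documents, the build_index and search_or helpers, and the lookup counter are all made up for illustration. It only shows that an OR over N terms needs N postings lookups plus a merge, while the pre-tagged authorlist field needs one lookup.

```python
from collections import defaultdict


def build_index(docs):
    """Map each (field, value) pair to the list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id, fields in sorted(docs.items()):
        for field, values in fields.items():
            for value in values:
                index[(field, value)].append(doc_id)
    return index


def search_or(index, field, values):
    """One postings lookup per term, then merge (union) the results."""
    lookups = 0
    merged = set()
    for value in values:
        lookups += 1
        merged.update(index.get((field, value), []))
    return sorted(merged), lookups


# Hypothetical toy documents; docs 1-3 carry the precomputed authorlist tag.
docs = {
    1: {"author": ["name1"], "authorlist": ["1"]},
    2: {"author": ["name2"], "authorlist": ["1"]},
    3: {"author": ["name3", "name1"], "authorlist": ["1"]},
    4: {"author": ["other"]},
}

index = build_index(docs)
or_hits, or_lookups = search_or(index, "author", ["name1", "name2", "name3"])
tag_hits, tag_lookups = search_or(index, "authorlist", ["1"])

print(or_hits, or_lookups)    # [1, 2, 3] 3
print(tag_hits, tag_lookups)  # [1, 2, 3] 1
```

Both searches return the same documents, but the OR version does one lookup per author and has to deduplicate doc 3, which matches two of the authors.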


Re: Huge Query execution time for multiple ORs

2017-11-29 Thread Faraz Fallahi
Hi Toke,

Just to be clear and to understand: does this mean that a query of the form
author:name1 OR author:name2 OR author:name3

is processed like, e.g.,

1 query against the index with author:name1 getting 4 results
Then 1 query against the index with author:name2 getting 3 results
Then 1 query against the index with author:name3 getting 1 result

and in the end all results are merged and I get a result of 8?

So a query with a thousand authors will be split into a thousand single
queries against the index?

Do I understand this correctly?

Thx for the help
Faraz




Re: Huge Query execution time for multiple ORs

2017-11-30 Thread Faraz Fallahi
Uff... I see... thx for the explanation :)

On 30.11.2017 at 3:13 PM, "Emir Arnautović" <
emir.arnauto...@sematext.com> wrote:

> Hi Faraz,
> It is a bit worse than that - it also needs to calculate the score, so for
> each matching doc of one query part it has to check whether it appears in
> the results of the other query parts. If you use the term query parser, you
> avoid calculating the score - all docs will have a score of 1.
> Solr is based on Lucene, which is mainly an inverted index:
> https://en.wikipedia.org/wiki/Inverted_index
> so knowing that helps to understand how expensive some queries are. It is
> relatively easy to figure out what steps are needed for different query
> types. Of course, Lucene includes a lot of smartness, and it is probably
> not using the naive approach, but it cannot avoid the limitations of an
> inverted index.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
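
As an illustration of the suggestion above, the following Python sketch builds request parameters that pass the author list as a filter via Solr's terms query parser ({!terms f=...}), which is presumably the parser meant in the reply. This is only a sketch under assumptions: the author_filter_params helper and the rows value are hypothetical, and the terms parser may not be available on older Solr releases, so check it against your version first.

```python
from urllib.parse import urlencode, parse_qs


def author_filter_params(authors, field="author"):
    """Build request parameters that filter on a list of exact terms.

    Assumes the terms query parser is available and that no author value
    contains a comma, which is the parser's default separator.
    """
    return {
        "q": "*:*",                                          # match all docs; the filter does the work
        "fq": "{!terms f=%s}%s" % (field, ",".join(authors)),
        "rows": 10,                                          # hypothetical page size
    }


params = author_filter_params(["name1", "name2", "name3"])
query_string = urlencode(params)  # ready to append to a /select URL
print(params["fq"])
print(query_string)
```

Because the author list goes into fq rather than q, no relevance score is computed for it, and the resulting document set can be served from the filter cache on repeated calls.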


Solr score use cases

2017-12-01 Thread Faraz Fallahi
Hi

A simple question: what are the most common use cases for the Solr score of
documents retrieved after firing queries?
I don't have a real understanding of its purpose at the moment.

Thx for helping


Re: Solr score use cases

2017-12-01 Thread Faraz Fallahi
OK, but if I just make a simple query with a "where" clause and sort by a
field, I see no sense in calculating a score, right?

On 01.12.2017 at 16:33, "Aman Tandon" wrote:

> Hi Faraz,
>
> The Solr score, which you can retrieve by adding it to the fl parameter,
> can help you understand the following:
>
> 1) Search relevance ranking: how much score Solr has given to the top and
> second-from-top documents; with debug=true you can better understand what
> is causing that score.
>
> 2) You can use a function query to multiply the score by some feature,
> e.g. a paid-customer score, a popularity score, etc., to improve relevance
> according to business needs.
>
> These are the only points I can think of; someone else can shed more light
> if I am missing anything. I hope this is what you wanted to know. 😊
>
> Regards,
> Aman


Re: Solr score use cases

2017-12-01 Thread Faraz Fallahi
Or does the score even get calculated when I sort, or not?



Re: Solr score use cases

2017-12-01 Thread Faraz Fallahi
Thx for the clarification
Best regards

On 01.12.2017 at 18:25, "Erick Erickson" wrote:

> Sorting certainly ignores scoring; I'm pretty sure it's just not
> calculated in that case.
>
> If your sorting results in multiple documents in the same bin, people
> will combine the primary sort with a secondary sort on score, so in
> that case the score is definitely calculated, i.e. "&sort=day asc, score
> desc"
>
> Returning the score with documents is usually for development
> purposes. Scores are _not_ comparable except within a single query, so
> IMO telling users that a doc from one search has a score of X and a
> doc from another search has a score of Y is useless-to-misleading
> information. A score of 2X is _not_ necessarily "twice as good" (or
> even as good) as a score of X in another search.
>
> FWIW,
> Erick
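
The secondary sort on score described above can be sketched in Python as follows. The documents and their day and score values are invented for illustration; the sort parameter mirrors the "day asc, score desc" example from the reply, and the id field in fl is an assumption about the schema.

```python
# Hypothetical documents: two per day, each with a relevance score.
docs = [
    {"id": "a", "day": 2, "score": 1.5},
    {"id": "b", "day": 1, "score": 0.7},
    {"id": "c", "day": 1, "score": 2.1},
    {"id": "d", "day": 2, "score": 3.0},
]

# Primary key: day ascending. Tie-break within a day: score descending.
ranked = sorted(docs, key=lambda d: (d["day"], -d["score"]))
order = [d["id"] for d in ranked]
print(order)  # ['c', 'b', 'd', 'a']

# The equivalent request parameters; listing score in fl returns it per doc.
params = {
    "q": "author:name1",
    "sort": "day asc, score desc",
    "fl": "id,day,score",
}
```

The score only decides the order inside each day bin, which is why it has to be computed for this kind of sort even though it is not the primary key.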


Re: Huge Query execution time for multiple ORs

2017-12-04 Thread Faraz Fallahi
Hi guys,

Sorry to bother you again, but I am really confused:

I've used the Solr admin website and created a query with lots of ORs using
Solr 4.7.

When I execute the query without a sort, it executes in about 3.5 - 4
seconds.
When I execute it with a sort on a field called pubdate, it takes about
4 - 4.5 seconds.
When I execute it with a sort on the guid field, it takes about 7 - 8
seconds!

After your explanations I was expecting the query without a sort to be the
slowest. What am I missing here?

Best regards
Faraz



Re: Huge Query execution time for multiple ORs

2017-12-04 Thread Faraz Fallahi
Will do thx

On 04.12.2017 at 9:27 PM, "Emir Arnautović" <
emir.arnauto...@sematext.com> wrote:

> Hi Faraz,
> When you say query without sort, I assume you mean that you omit sort, so
> you expect results to be sorted by score. That is expected to be slower than
> the same query without score calculation, e.g. the same query run as fq.
> What you observe can be explained by:
> * Solr calculating the score even when not sorting by score and not
> returning it (do you return score? I am also not sure about this; I did not
> check the code)
> * The field that you are using for sorting not having doc values, so it has
> to be uninverted
> * The field that you are using for sorting not being in the OS cache, so it
> is read from disk.
>
> Try comparing the same query run as q=... and as fq=... Make sure that your
> filter cache is disabled if you are repeating the same queries and
> averaging.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
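
A minimal sketch of the q-versus-fq comparison suggested above, in Python. It only builds the two parameter sets; sending them and timing the responses is left out. The or_clause helper and the rows value are hypothetical, and the {!cache=false} local parameter is used here as one way to keep the filter cache from hiding the cost on repeated runs.

```python
from urllib.parse import urlencode


def or_clause(field, values):
    """Build '(field:v1 OR field:v2 ...)' from a list of values."""
    return "(" + " OR ".join(f"{field}:{v}" for v in values) + ")"


authors = ["name1", "name2", "name3"]
clause = or_clause("author", authors)

# Variant 1: the OR clause is the main query, so matches are scored.
as_query = {"q": clause, "rows": 10}

# Variant 2: the OR clause is a filter only, so it is not scored.
# cache=false keeps repeated runs from being answered by the filter cache.
as_filter = {"q": "*:*", "fq": "{!cache=false}" + clause, "rows": 10}

for name, params in (("q", as_query), ("fq", as_filter)):
    print(name, urlencode(params))
```

Running each variant several times with the same sort and comparing the reported query times should show how much of the cost comes from scoring and how much from the sort field itself.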