Re: response time degradation with matchall queries / changin from SOLR 4.10 -> 6.x

Günter Hipler Fri, 18 Aug 2017 07:04:31 -0700

Hi Erick,

Thanks for your reply. I investigated further to track down the reason for this behavior, but without success so far.

Answer to your questions:
- yes, I completely re-indexed the data

- yes, I'm running a collection of around 5,000 queries taken from our production logs


Now my current state of investigation:

1) A query on our current system (4.10) takes around 200 ms to process facets on a larger result set (just one example here):

http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&indent=on&q.alt=*:*&ps=2&hl=true&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&q.op=AND&hl.simple.pre={{{{START_HILITE}}}}&qf=title_short^1000+title_alt^200+title^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+author_additional_gnd_txt_mv^100+title_additional_gnd_txt_mv^100+publplace_additional_gnd_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+cancisbn_isn_mv+variant_isbn_isn_mv+issn+incoissn_isn_mv+localcode+id&hl.fl=fulltext&wt=xml&mm=100%25&facet.field={!ex%3Dunion_filter}union&facet.field={!ex%3DnavAuthor_full_filter}navAuthor_full&facet.field={!ex%3Dformat_hierarchy_str_mv_filter}format_hierarchy_str_mv&facet.field={!ex%3Dlanguage_filter}language&facet.field=navSub_green&facet.field={!ex%3DnavSubform_filter}navSubform&facet.field=publishDate&qt=edismax&json.nl=arrarr&start=0&sort=score+desc&rows=0&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&pf=title_short^1000&facet.mincount=1&facet=true&facet.sort=count

while the same query on 6.x takes more than 4000 ms, and not uncommonly more than 10000 ms:

https://gist.github.com/guenterh/8032bddd9bfce31324d1a8651b8d282b
(the server is not publicly available)

2) I tried several Solr 6 versions (6.3 through 6.6) because other (library) networks running big indexes reported that they too had faceting problems, and one of them solved it with 6.3.

3) I tried both the way we built our old index schema (facet fields based on text types) and a schema with string fields for docValues (the way we want to go in the future), but had the same problems.

4) I experimented with the new facet.method options (https://lucene.apache.org/solr/guide/6_6/faceting.html#Faceting-Thefacet.methodParameter - not available in version 4) but wasn't able to improve the results.
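
One way to compare the methods side by side is to issue the same match-all faceting request once per facet.method value and time each run. The following is only a minimal sketch: it reuses the localhost URL from [1] and a reduced parameter set, the helper name build_facet_url is made up for illustration, and the values enum, fc and fcs are the ones I recall from the 6.6 reference guide. It only builds the URLs; sending them and reading the timings is left out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL as in [1]; replace with the core under test.
BASE_URL = "http://localhost:8080/solr/sb-biblio/select"

# A subset of the facet fields used in the queries of this thread.
FACET_FIELDS = ["union", "navAuthor_full", "language", "navSub_green",
                "navSubform", "publishDate"]


def build_facet_url(method, per_field=None):
    """Return a match-all faceting URL that requests one facet.method.

    per_field maps a field name to a method and is emitted as
    f.<field>.facet.method, overriding the global value for that field.
    (Illustrative helper, not part of any Solr client library.)
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.limit", "100"),
        ("facet.mincount", "1"),
        ("facet.sort", "count"),
        ("facet.method", method),
        ("debug", "timing"),
    ]
    params += [("facet.field", field) for field in FACET_FIELDS]
    for field, field_method in (per_field or {}).items():
        params.append((f"f.{field}.facet.method", field_method))
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for method in ("enum", "fc", "fcs"):
        print(build_facet_url(method))
```

Passing per_field, for example {"publishDate": "enum"}, makes it possible to test a different method on a single low-cardinality field while leaving the others unchanged.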

I have the impression that something changed significantly in the way facets are processed, but unfortunately I can't figure out how to keep our use case from being affected as badly as it is now.


Thanks for any hints!

Günter


On 09.08.2017 17:22, Erick Erickson wrote:

Two questions:

1> Did you completely re-index under 6x? My guess is "yes", since you
jumped two major versions and 6x won't read a 4x index. If not, you may
be getting some performance degradation due to back-compat.

2> Try turning on &debug=timing. That breaks down the time spent in each
component and may give a clue. Highlighting has changed significantly,
so that's one place I'd look.

And I'm assuming you're running a suite of tests; trying just a few
queries is uninformative due to loading parts of the index into
memory.
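
As a rough sketch of how that timing breakdown can be read: with wt=json and debug=timing, the response carries a debug.timing section with per-component times under "prepare" and "process". The snippet below ranks the components of one phase by time. The sample payload is invented for illustration, the function name is made up, and it assumes the timing section is rendered as nested JSON objects (a non-default json.nl setting, or a different component list in the handler, may change the shape).

```python
import json

# Invented sample shaped like the debug.timing section of a JSON response.
SAMPLE = """
{"debug": {"timing": {
  "time": 4250.0,
  "prepare": {"time": 1.0, "query": {"time": 1.0}, "facet": {"time": 0.0},
              "highlight": {"time": 0.0}},
  "process": {"time": 4249.0, "query": {"time": 12.0},
              "facet": {"time": 4230.0}, "highlight": {"time": 7.0}}
}}}
"""


def slowest_components(response, phase="process"):
    """Return (component, milliseconds) pairs for one phase, slowest first."""
    timing = response["debug"]["timing"][phase]
    pairs = [(name, value["time"])
             for name, value in timing.items()
             if isinstance(value, dict)]  # skips the phase's own "time" total
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, ms in slowest_components(json.loads(SAMPLE)):
        print(f"{name}: {ms} ms")
```

Running this over the responses of a whole query suite, rather than a single request, gives a picture of which component dominates once the caches are warm.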

Best,
Erick

On Wed, Aug 9, 2017 at 1:09 AM, guenterh.li...@bluewin.ch
<guenterh.li...@bluewin.ch> wrote:

Hi,
we are updating our SOLR infrastructure from version 4.10.2 to the latest
6.6.

We are seeing a significant degradation of the response time when running
match-all queries with facets (query in [1]). With version 4.x these kinds of
queries never took longer than 2000 ms.

Now all of these queries need more than 9000 ms.

Our index [2] [3] contains around 30 million docs. Because we want to use
doc-values for facets and sort functions, we changed our doc-processing
significantly, replacing all text types with string fields.

The behavior of normal term queries is acceptable, although it's a little bit
slower compared with the current production environment. Yesterday I ran a
couple of performance tests.

I looked around and came across this (older) issue [4], which is partially
related to our observations, but I cannot find a solution for the
behavior we see.

Did we miss something in the development from version 4 through 5 to 6
that might be the reason for the degradation, and should we change our
queries?

Thanks a lot for any hints

Günter



[1]
http://localhost:8080/solr/sb-biblio/select?rows=0&q=*:*&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&hl=true&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=20&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&facet=true&wt=xml&facet.sort=count

[2] www.swissbib.ch
[3]
http://search.swissbib.ch/solr/sb-biblio/select?q=*%3A*&wt=xml&indent=true
[4] https://issues.apache.org/jira/browse/SOLR-8251


--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/
