Hi Erick,

Thanks for your reply. I investigated further to find the reason for the behavior but haven't been successful so far.
Answers to your questions:
- yes, I completely re-indexed the data
- yes, I'm running a collection of around 5,000 queries taken from our production logs

Now the current state of my investigation:
1) A query on our current system (4.10) takes around 200 ms to process facets on a larger result set (just one example here):
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&indent=on&q.alt=*:*&ps=2&hl=true&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&q.op=AND&hl.simple.pre={{{{START_HILITE}}}}&qf=title_short^1000+title_alt^200+title^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+author_additional_gnd_txt_mv^100+title_additional_gnd_txt_mv^100+publplace_additional_gnd_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+cancisbn_isn_mv+variant_isbn_isn_mv+issn+incoissn_isn_mv+localcode+id&hl.fl=fulltext&wt=xml&mm=100%25&facet.field={!ex%3Dunion_filter}union&facet.field={!ex%3DnavAuthor_full_filter}navAuthor_full&facet.field={!ex%3Dformat_hierarchy_str_mv_filter}format_hierarchy_str_mv&facet.field={!ex%3Dlanguage_filter}language&facet.field=navSub_green&facet.field={!ex%3DnavSubform_filter}navSubform&facet.field=publishDate&qt=edismax&json.nl=arrarr&start=0&sort=score+desc&rows=0&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&pf=title_short^1000&facet.mincount=1&facet=true&facet.sort=count

while the same query on 6.x takes more than 4000 ms, and not uncommonly more than 10000 ms:
https://gist.github.com/guenterh/8032bddd9bfce31324d1a8651b8d282b
(the server is not publicly available)

2) I tried several Solr 6 versions (6.3 through 6.6) because other (library) networks running big indexes reported that they too had faceting problems, and one of them solved it with 6.3.

3) I tried the way we built our old index schema (facet fields based on text types) as well as a schema with string fields and docValues (the way we want to go in the future), but had the same problems in both cases.

4) I experimented with the new facet.method options (https://lucene.apache.org/solr/guide/6_6/faceting.html#Faceting-Thefacet.methodParameter - not available in version 4) but wasn't able to improve the results.
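
For anyone who wants to repeat the facet.method comparison, a minimal sketch of how the test URLs could be generated. The base URL, the reduced list of facet fields and the function name facet_urls are assumptions for illustration only; enum, fc and fcs are the values described in the 6.6 guide linked above.

```python
from urllib.parse import urlencode

# Hypothetical base URL; point this at your own core.
BASE_URL = "http://localhost:8983/solr/sb-biblio/select"

# A reduced set of the facet fields used in the query above (illustrative).
FACET_FIELDS = ["union", "navAuthor_full", "language", "publishDate"]


def facet_urls(methods=("enum", "fc", "fcs")):
    """Return one match-all facet query URL per facet.method value."""
    urls = {}
    for method in methods:
        params = [
            ("q", "*:*"),
            ("rows", "0"),
            ("facet", "true"),
            ("facet.limit", "100"),
            ("facet.mincount", "1"),
            ("facet.sort", "count"),
            ("facet.method", method),
            ("debug", "timing"),  # per-component timing in the response
            ("wt", "json"),
        ]
        # facet.field may be repeated, so use a list of pairs, not a dict.
        params += [("facet.field", field) for field in FACET_FIELDS]
        urls[method] = BASE_URL + "?" + urlencode(params)
    return urls


if __name__ == "__main__":
    for method, url in facet_urls().items():
        print(method, url)
```

Each URL can then be fetched several times against a warmed-up index, so that the timings per method are comparable.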

I have the impression that something changed significantly in the way facets are processed, but unfortunately I can't figure out how to keep our use case from being affected as badly as it is now.

Thanks for any hints!

Günter


On 09.08.2017 17:22, Erick Erickson wrote:
Two questions:

1> did you completely re-index under 6x? My guess is "yes", since you
jumped two major versions and 6x won't read a 4x index. If not you may
be getting some performance degradation due to back-compat.

2> Try turning on &debug=timing. That breaks down the time spent in each
component and may give a clue. Highlighting has changed significantly,
so that's one place I'd look.
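
A rough sketch of how the per-component timings could be read from a wt=json response, assuming the usual layout in which debug.timing has "prepare" and "process" sections with one entry per search component. The function name component_times and the sample numbers are made up for illustration; they are not measurements from this thread.

```python
def component_times(response, phase="process"):
    """Return (component, milliseconds) pairs, slowest first.

    `response` is the parsed JSON body of a request sent with
    debug=timing; `phase` is "prepare" or "process".
    """
    timing = response.get("debug", {}).get("timing", {})
    section = timing.get(phase, {})
    times = {
        name: entry["time"]
        for name, entry in section.items()
        # The section also carries its own total under "time" (a number),
        # so keep only the nested per-component entries.
        if isinstance(entry, dict) and "time" in entry
    }
    return sorted(times.items(), key=lambda item: item[1], reverse=True)


# Illustrative response fragment with invented values.
sample = {
    "debug": {
        "timing": {
            "time": 4200.0,
            "prepare": {"time": 1.0, "query": {"time": 1.0}},
            "process": {
                "time": 4199.0,
                "query": {"time": 90.0},
                "facet": {"time": 4100.0},
                "highlight": {"time": 9.0},
            },
        }
    }
}

if __name__ == "__main__":
    for name, millis in component_times(sample):
        print(f"{name:10s} {millis:8.1f} ms")
```

Running this over the responses of the whole query suite shows which component accounts for most of the time.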

And I'm assuming you're running a suite of tests; trying just a few
queries is uninformative due to loading parts of the index into
memory.

Best,
Erick

On Wed, Aug 9, 2017 at 1:09 AM, guenterh.li...@bluewin.ch
<guenterh.li...@bluewin.ch> wrote:
Hi,
we are updating our Solr infrastructure from version 4.10.2 to the latest
6.6.

We are seeing a significant degradation of the response time when running
match-all queries with facets (query in [1]). With version 4.x these kinds
of queries never took longer than 2000 ms.

Now all of these queries need more than 9000 ms.

Our index [2] [3] contains around 30 million docs. Because we want to use
doc-values for facets and sort functions, we changed our doc-processing
significantly, replacing all text types with string fields.

The behavior of normal term queries is acceptable, although it's a little bit
slower than in the current production environment. Yesterday I ran a
couple of performance tests.

I looked around and came across this (older) issue [4], which is partially
related to our observations, but I cannot actually find a solution for the
behavior we see.

Did we miss something in the course of development from version 4 through 5
to 6 that might be the reason for the degradation, and should we change our
queries?

Thanks a lot for any hints

Günter



[1]
http://localhost:8080/solr/sb-biblio/select?rows=0&q=*:*&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&hl=true&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=20&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&facet=true&wt=xml&facet.sort=count

[2] www.swissbib.ch
[3]
http://search.swissbib.ch/solr/sb-biblio/select?q=*%3A*&wt=xml&indent=true
[4] https://issues.apache.org/jira/browse/SOLR-8251

--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/

