Response time degradation with match-all queries when changing from SOLR 4.10 to 6.x
Hi,

we are upgrading our SOLR infrastructure from version 4.10.2 to the latest 6.6. We are seeing a significant degradation in response time for match-all queries with facets (query in [1]). With version 4.x, queries of this kind never took longer than 2000 ms. Now all of them need more than 9000 ms. Our index [2] [3] contains around 30 million docs.

Because we want to use docValues for facets and sort functions, we changed our document processing significantly, replacing all text types with string fields. The behaviour of normal term queries is acceptable, although they are a little slower than in the current production environment.

Yesterday I ran a couple of performance tests. I also looked around and came across this (older) issue [4], which is partially related to our observations, but I cannot find a solution for our behaviour there.

Did we miss something in the development from version 4 to 5 to 6 that might be the reason for the degradation, and should we change our queries?
Thanks a lot for any hints
Günter

[1] http://localhost:8080/solr/sb-biblio/select?rows=0&q=*:*&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&hl=true&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=20&hl.simple.pre=START_HILITE&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&facet=true&wt=xml&facet.sort=count
[2] www.swissbib.ch
[3] http://search.swissbib.ch/solr/sb-biblio/select?q=*%3A*&wt=xml&indent=true
[4] https://issues.apache.org/jira/browse/SOLR-8251
facet processing module in Version 6.x needs significantly more time compared to version 4.10
Hi,

I can't figure out why facet processing in version 6 needs significantly more time than in version 4.

The debug timing output (for 30 million documents):

Solr 4: 280.0 / 0.0 / 280.0 (once the query is cached); before caching: between 1.5 and 2 sec

Solr 6.x (my last try was with 6.6) without docValues for the faceting fields (same schema as version 4): 5874.0 / 0.0 / 5873.0 / 0.0
The time does not improve even after repeating the query several times.

Solr 6.6 with docValues for the faceting fields: 9837.0 / 0.0 / 9837.0 / 0.0

Query used (our production system with version 4):
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre=START_HILITE&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count

When running the queries on smaller indices (8 million docs), the difference is similar, although the absolute figures for processing time are smaller.

Any hints as to why there are such huge differences?

Günter
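To compare the facet timing between versions without the edismax, boost, and highlighting parameters getting in the way, it may help to send a stripped-down match-all facet request. The following is only a sketch: the function name `build_facet_url` is made up for illustration, and the field names and base URL are copied from the query in the post. It only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Facet fields as they appear in the query from the post.
FACET_FIELDS = [
    "union", "navAuthor_full", "format", "language",
    "navSub_green", "navSubform", "publishDate",
]


def build_facet_url(base_url, facet_fields, facet_limit=100, mincount=1):
    """Build a match-all facet request URL (hypothetical helper).

    Only parameters that already occur in the original query are used;
    the edismax, boost and highlighting parameters are left out on purpose.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.limit", str(facet_limit)),
        ("facet.mincount", str(mincount)),
        ("facet.sort", "count"),
        ("debugQuery", "true"),
        ("wt", "xml"),
    ]
    # One facet.field parameter per field, in the given order.
    params += [("facet.field", name) for name in facet_fields]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_facet_url(
        "http://search.swissbib.ch/solr/sb-biblio/select", FACET_FIELDS))
```

Requesting the same URL against the version 4 and version 6 instances, and reducing `FACET_FIELDS` to one field at a time, should show whether a single field accounts for most of the difference.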
no stable results using morelikethis in distributed mode
Hi,

I am seeing odd behaviour that I can't explain (so far we are still running in master/slave mode). When I query the collection, the requests are logged randomly against the two available shards, "green_shard1_replica_n1" and "green_shard2_replica_n2":

2018-06-21 15:35:40.970 INFO (qtp1873653341-17) [c:green s:shard2 r:core_node4 x:green_shard2_replica_n2] o.a.s.c.S.Request [green_shard2_replica_n2] webapp=/solr path=/select params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json} status=0 QTime=1
2018-06-21 15:36:05.679 INFO (qtp1873653341-70) [c:green s:shard1 r:core_node3 x:green_shard1_replica_n1] o.a.s.c.S.Request [green_shard1_replica_n1] webapp=/solr path=/select params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json} status=0 QTime=5
2018-06-21 15:36:11.022 INFO (qtp1873653341-17) [c:green s:shard1 r:core_node3 x:green_shard1_replica_n1] o.a.s.c.S.Request [green_shard1_replica_n1] webapp=/solr path=/select params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json} status=0 QTime=2
2018-06-21 15:36:17.031 INFO (qtp1873653341-70) [c:green s:shard1 r:core_node3 x:green_shard1_replica_n1] o.a.s.c.S.Request [green_shard1_replica_n1] webapp=/solr path=/select params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json} status=0 QTime=2
2018-06-21 15:36:21.800 INFO (qtp1873653341-17) [c:green s:shard1 r:core_node3 x:green_shard1_replica_n1] o.a.s.c.S.Request [green_shard1_replica_n1] webapp=/solr path=/select params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json} status=0 QTime=2
2018-06-21 15:36:26.668 INFO (qtp1873653341-70) [c:green s:shard2 r:core_node4 x:green_shard2_replica_n2] o.a.s.c.S.Request [green_shard2_replica_n2] webapp=/solr path=/select params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json} status=0 QTime=0

When the receiving node selects "green_shard1_replica_n1", I get results, but I always get no morelikethis suggestions when the other shard is selected. The test collection has 2 shards and a replication factor of 1 (it is only a test index with a small number of docs), but the behaviour is the same for a huge collection (30 million docs).

What might be the reason for this behaviour?

Thanks for any hints
Günter
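To check how the requests are spread over the two cores, the request log lines can be tallied per core. This is only a sketch that assumes the log format shown in the post: the pattern `LOG_PATTERN` and the function `tally_by_core` are hypothetical names, and the two sample lines are copied from the log excerpt.

```python
import re
from collections import defaultdict

# Matches the "[c:... s:... r:... x:...]" context block and the trailing QTime.
LOG_PATTERN = re.compile(
    r"\[c:(?P<collection>\S+) s:(?P<shard>\S+) r:(?P<node>\S+) "
    r"x:(?P<core>[^\]\s]+)\].*?QTime=(?P<qtime>\d+)"
)

# Two lines taken verbatim from the log excerpt in the post.
SAMPLE_LINES = [
    '2018-06-21 15:35:40.970 INFO (qtp1873653341-17) [c:green s:shard2 r:core_node4 x:green_shard2_replica_n2] o.a.s.c.S.Request [green_shard2_replica_n2] webapp=/solr path=/select params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json} status=0 QTime=1',
    '2018-06-21 15:36:05.679 INFO (qtp1873653341-70) [c:green s:shard1 r:core_node3 x:green_shard1_replica_n1] o.a.s.c.S.Request [green_shard1_replica_n1] webapp=/solr path=/select params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json} status=0 QTime=5',
]


def tally_by_core(lines):
    """Group the QTime values of matching log lines by core name."""
    stats = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:  # lines in another format are skipped silently
            stats[match.group("core")].append(int(match.group("qtime")))
    return dict(stats)


if __name__ == "__main__":
    for core, qtimes in tally_by_core(SAMPLE_LINES).items():
        print(core, len(qtimes), qtimes)
```

Each line in the excerpt names a single core, so a tally like this makes it easy to correlate an empty morelikethis response with the core that handled the request.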