response time degradation with matchall queries / changing from SOLR 4.10 -> 6.x

2017-08-09 Thread guenterh.li...@bluewin.ch
Hi,
we are updating our SOLR infrastructure from version 4.10.2 to the latest 6.6. 
We are seeing a significant degradation in response time when running 
match-all queries with facets (query in [1]). With version 4.x these kinds of 
queries never took longer than 2000 ms.
Now all of these queries need more than 9000 ms. 
Our index [2] [3] contains around 30 million docs. Because we want to use 
doc-values for facets and sort functions, we changed our doc-processing 
significantly, replacing all text-type fields with string fields.
The behavior of normal term queries is acceptable, although it's a little bit 
slower compared with the current production environment. Yesterday I ran a 
couple of performance tests.
I looked around and came across this (older) issue [4], which is partially 
related to our observations, but I cannot find a solution for the behavior 
we see.
Did we miss something in the development from version 4 to 5 to 6 
which might be the reason for the degradation, and should we change our queries?
Thanks a lot for any hints.
Günter
[1] 
http://localhost:8080/solr/sb-biblio/select?rows=0&q=*:*&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&hl=true&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=20&hl.simple.pre=START_HILITE&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&facet=true&wt=xml&facet.sort=count
[2] www.swissbib.ch
[3] http://search.swissbib.ch/solr/sb-biblio/select?q=*%3A*&wt=xml&indent=true
[4] https://issues.apache.org/jira/browse/SOLR-8251
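
To compare the two versions on equal terms, it may help to strip the request in [1] down to its faceting parts and vary one parameter at a time. The following is only a minimal sketch, not a tested benchmark: `build_facet_query` is a hypothetical helper, the base URL and facet fields are taken from [1], and `facet.method` (a standard Solr faceting parameter) is merely assumed to be worth varying here.

```python
from urllib.parse import urlencode

# Facet fields copied from the request in [1].
FACET_FIELDS = [
    "union", "navAuthor_full", "format", "language",
    "navSub_green", "navSubform", "publishDate",
]


def build_facet_query(base_url, facet_method=None):
    """Return a match-all, facet-only select URL (hypothetical helper)."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),            # facets only, no documents returned
        ("facet", "true"),
        ("facet.limit", "100"),
        ("facet.mincount", "1"),
        ("facet.sort", "count"),
        ("wt", "json"),
        ("debugQuery", "true"),   # asks Solr to include debug information
    ]
    params += [("facet.field", field) for field in FACET_FIELDS]
    if facet_method is not None:
        # Assumption: varying facet.method is a useful experiment here.
        params.append(("facet.method", facet_method))
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8080/solr/sb-biblio/select"
    for method in (None, "enum", "fc"):
        print(build_facet_query(base, method))
```

Running each generated URL several times against both installations and noting QTime would show whether the difference comes from faceting alone or from the other request parameters.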


facet processing module in Version 6.x needs significantly more time compared to version 4.10

2017-08-21 Thread guenterh.li...@bluewin.ch
Hi,
I can't figure out the reason why the facet processing in version 6 needs 
significantly more time compared to version 4.
The debugging response (for 30 million documents)
solr 4
280.0 0.0 280.0
(once the query is cached)
before caching: between 1.5 and 2 sec
solr 6.x (my last try was with 6.6)
without docvalues for faceting fields (same schema as version 4)
5874.0 0.0 5873.0 0.0
the time does not improve even after repeating the query several times
solr 6.6 with docvalues for faceting fields
9837.0 0.0 9837.0 0.0
query used (our production system with version 4)
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre=START_HILITE&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
When running the queries on smaller indices (8 million docs), the difference is 
similar, although the absolute figures for processing time are smaller.
Any hints as to why there are such huge differences?
Günter
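
The timing figures quoted in this message come from the debug section of the response. Below is a small sketch of how per-component times could be pulled out of a `wt=json&debugQuery=true` response for a side-by-side comparison. It assumes the usual `debug.timing` layout with `prepare` and `process` sub-sections; the function names are hypothetical, and the sample dictionary is invented for illustration and is not one of the measurements above.

```python
def component_times(response, phase="process"):
    """Map each search component to its time (ms) in the given phase."""
    timing = response.get("debug", {}).get("timing", {})
    section = timing.get(phase, {})
    # The phase section holds its own total under "time" (a number) next to
    # one dict per component; only the per-component dicts are kept.
    return {
        name: value["time"]
        for name, value in section.items()
        if isinstance(value, dict) and "time" in value
    }


def slowest_component(response, phase="process"):
    """Return (component, ms) for the slowest component, or None."""
    times = component_times(response, phase)
    if not times:
        return None
    return max(times.items(), key=lambda item: item[1])


# Illustrative response fragment; all numbers are invented.
sample = {
    "debug": {
        "timing": {
            "time": 12.0,
            "prepare": {"time": 1.0, "query": {"time": 1.0}, "facet": {"time": 0.0}},
            "process": {"time": 11.0, "query": {"time": 2.0}, "facet": {"time": 9.0}},
        }
    }
}

if __name__ == "__main__":
    print(component_times(sample))
    print(slowest_component(sample))
```

Applied to the responses of both versions, this would show directly whether the extra time is spent in the facet component or elsewhere.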


no stable results using morelikethis in distributed mode

2018-06-21 Thread guenterh.li...@bluewin.ch
Hi,
I'm seeing weird behaviour that I can't explain (so far we are still running in 
master/slave mode).
When querying the collection, I see the requests logged randomly against the two 
available shards "green_shard1_replica_n1" and "green_shard2_replica_n2":
2018-06-21 15:35:40.970 INFO  (qtp1873653341-17) [c:green s:shard2 r:core_node4 
x:green_shard2_replica_n2] o.a.s.c.S.Request [green_shard2_replica_n2]  
webapp=/solr path=/select 
params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json}
 status=0 QTime=1
2018-06-21 15:36:05.679 INFO  (qtp1873653341-70) [c:green s:shard1 r:core_node3 
x:green_shard1_replica_n1] o.a.s.c.S.Request [green_shard1_replica_n1]  
webapp=/solr path=/select 
params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json}
 status=0 QTime=5
2018-06-21 15:36:11.022 INFO  (qtp1873653341-17) [c:green s:shard1 r:core_node3 
x:green_shard1_replica_n1] o.a.s.c.S.Request [green_shard1_replica_n1]  
webapp=/solr path=/select 
params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json}
 status=0 QTime=2
2018-06-21 15:36:17.031 INFO  (qtp1873653341-70) [c:green s:shard1 r:core_node3 
x:green_shard1_replica_n1] o.a.s.c.S.Request [green_shard1_replica_n1]  
webapp=/solr path=/select 
params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json}
 status=0 QTime=2
2018-06-21 15:36:21.800 INFO  (qtp1873653341-17) [c:green s:shard1 r:core_node3 
x:green_shard1_replica_n1] o.a.s.c.S.Request [green_shard1_replica_n1]  
webapp=/solr path=/select 
params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json}
 status=0 QTime=2
2018-06-21 15:36:26.668 INFO  (qtp1873653341-70) [c:green s:shard2 r:core_node4 
x:green_shard2_replica_n2] o.a.s.c.S.Request [green_shard2_replica_n2]  
webapp=/solr path=/select 
params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr&fl=*,score&rows=40&rows=5&wt=json}
 status=0 QTime=0
When the node handling the request selects "green_shard1_replica_n1", I get 
results, but I never get any morelikethis suggestions when the other shard is 
selected.
The number of shards for the test collection is 2, with replication factor 1 
(it's only a test index with a small number of docs), but the behavior is the 
same for a huge collection (30 million docs).
What might be the reason for this behavior? Thanks for any hints.
Günter
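
To check how the requests are spread over the shards, the log entries can be tallied per shard. This is a minimal sketch that assumes each entry sits on a single line in the original log file (the entries above are wrapped by the mail client); `tally_by_shard` is a hypothetical helper, and the regular expression is written only against the entry format shown in this message.

```python
import re
from collections import defaultdict

# Picks the shard and core out of the "[c:... s:... r:... x:...]" context
# and the trailing QTime of a request log entry.
ENTRY = re.compile(
    r"\bs:(?P<shard>\S+)\s+r:\S+\s+x:(?P<core>[^\]\s]+)\].*?QTime=(?P<qtime>\d+)"
)


def tally_by_shard(lines):
    """Return {shard: [QTime, ...]} for all lines that look like request logs."""
    result = defaultdict(list)
    for line in lines:
        match = ENTRY.search(line)
        if match:
            result[match.group("shard")].append(int(match.group("qtime")))
    return dict(result)


PARAMS = ('params={q=id:"508364329"&qt=morelikethis&json.nl=arrarr'
          '&fl=*,score&rows=40&rows=5&wt=json}')

# The first two entries from the log above, joined back onto single lines.
sample_log = [
    "2018-06-21 15:35:40.970 INFO  (qtp1873653341-17) "
    "[c:green s:shard2 r:core_node4 x:green_shard2_replica_n2] "
    "o.a.s.c.S.Request [green_shard2_replica_n2]  "
    "webapp=/solr path=/select " + PARAMS + " status=0 QTime=1",
    "2018-06-21 15:36:05.679 INFO  (qtp1873653341-70) "
    "[c:green s:shard1 r:core_node3 x:green_shard1_replica_n1] "
    "o.a.s.c.S.Request [green_shard1_replica_n1]  "
    "webapp=/solr path=/select " + PARAMS + " status=0 QTime=5",
]

if __name__ == "__main__":
    for shard, qtimes in sorted(tally_by_shard(sample_log).items()):
        print(shard, len(qtimes), qtimes)
```

Recording next to each entry whether the response contained suggestions would make it easy to confirm that the empty results always coincide with one particular shard.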