Hi,
in the meantime I have come across the reason why facet processing in
Solr has been slower since version 5.x than it was in version 4.x:
https://issues.apache.org/jira/browse/SOLR-8096
https://issues.apache.org/jira/browse/LUCENE-5666
Various library networks across the world are suffering from the same
symptoms.
Facet processing is one of the most important features of a search
server (for us), and it seems (at least IMHO) that there has been no
solution for the issue since March 2015 (the release date of the last
Solr 4 version).
What are the plans / ideas of the Solr developers for a possible future
solution? Or maybe there is already a solution that I haven't seen so far.
Thanks for any feedback,
Günter
On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:
Hi,
I can't figure out why facet processing in version 6 takes
significantly more time than in version 4.
The debugging response (for 30 million documents):
Solr 4:
<lst name="process"><double name="time">280.0</double>
<lst name="query"><double name="time">0.0</double></lst>
<lst name="facet"><double name="time">280.0</double></lst>
(once the query is cached; before caching: between 1.5 and 2 sec)
Solr 6.x (my last try was with 6.6), without docValues for the faceting
fields (same schema as version 4):
<lst name="process"><double name="time">5874.0</double>
<lst name="query"><double name="time">0.0</double></lst>
<lst name="facet"><double name="time">5873.0</double></lst>
<lst name="facet_module"><double name="time">0.0</double></lst>
The time does not improve even after repeating the query several
times.
Solr 6.6, with docValues for the faceting fields:
<lst name="process"><double name="time">9837.0</double>
<lst name="query"><double name="time">0.0</double></lst>
<lst name="facet"><double name="time">9837.0</double></lst>
<lst name="facet_module"><double name="time">0.0</double></lst>
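To put the three measurements side by side, here is a minimal Python sketch that reads the facet time out of each debug fragment and relates it to the cached Solr 4 figure. The fragments are the ones quoted above; the closing </lst> for "process" is added here as an assumption so that each fragment parses as XML, and the labels and function names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Debug fragments as quoted in this mail. The final </lst> closing the
# "process" element is an assumption added to make each one well-formed.
FRAGMENTS = {
    "solr 4 (cached)": (
        '<lst name="process"><double name="time">280.0</double>'
        '<lst name="query"><double name="time">0.0</double></lst>'
        '<lst name="facet"><double name="time">280.0</double></lst></lst>'
    ),
    "solr 6.6, no docValues": (
        '<lst name="process"><double name="time">5874.0</double>'
        '<lst name="query"><double name="time">0.0</double></lst>'
        '<lst name="facet"><double name="time">5873.0</double></lst>'
        '<lst name="facet_module"><double name="time">0.0</double></lst></lst>'
    ),
    "solr 6.6, docValues": (
        '<lst name="process"><double name="time">9837.0</double>'
        '<lst name="query"><double name="time">0.0</double></lst>'
        '<lst name="facet"><double name="time">9837.0</double></lst>'
        '<lst name="facet_module"><double name="time">0.0</double></lst></lst>'
    ),
}

BASELINE = "solr 4 (cached)"


def component_times(fragment):
    """Map each search component name to its reported time in ms."""
    root = ET.fromstring(fragment)
    times = {}
    for child in root.findall("lst"):
        value = child.find("double[@name='time']")
        if value is not None:
            times[child.get("name")] = float(value.text)
    return times


def facet_slowdowns(fragments, baseline=BASELINE):
    """Return facet time of every run divided by the baseline facet time."""
    facet_ms = {label: component_times(xml)["facet"]
                for label, xml in fragments.items()}
    return {label: ms / facet_ms[baseline] for label, ms in facet_ms.items()}


if __name__ == "__main__":
    for label, factor in facet_slowdowns(FRAGMENTS).items():
        print(f"{label}: {factor:.1f}x the cached Solr 4 facet time")
```

The comparison is against the cached Solr 4 figure (280 ms); against the uncached 1.5 to 2 sec the factors would be correspondingly smaller.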
Query used (our production system, running version 4):
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
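Since the URL is hard to read, here is a small Python sketch that isolates the facet-related parameters. It uses a shortened copy of the query above (the qf, pf, bf, hl.* and similar parameters are left out for brevity); the function name is only illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the query quoted above: only the parameters that
# matter for faceting are kept, everything else is omitted for brevity.
QUERY_URL = (
    "http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*"
    "&facet=true&facet.field=union&facet.field=navAuthor_full"
    "&facet.field=format&facet.field=language&facet.field=navSub_green"
    "&facet.field=navSubform&facet.field=publishDate&rows=0"
    "&facet.limit=100&facet.mincount=1&facet.sort=count"
)


def facet_params(url):
    """Return only the facet parameters of a Solr select URL."""
    params = parse_qs(urlsplit(url).query)
    return {name: values for name, values in params.items()
            if name == "facet" or name.startswith("facet.")}


if __name__ == "__main__":
    for name, values in facet_params(QUERY_URL).items():
        print(name, "=", ", ".join(values))
```

In other words, the request matches all documents (q=*:*), returns no rows (rows=0), and computes seven field facets sorted by count, each limited to 100 values with a minimum count of 1.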
Running the queries on smaller indices (8 million docs), the difference
is similar, although the absolute figures for processing time are smaller.
Any hints as to why there are such huge differences?
Günter
--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org / http://www.ub.unibas.ch/