Hi,

In the meantime I have come across the reason why facet processing in Solr has been slow since version 5.x, compared to version 4.x:

https://issues.apache.org/jira/browse/SOLR-8096
https://issues.apache.org/jira/browse/LUCENE-5666

Various library networks around the world are suffering from the same symptoms.

Facet processing is one of the most important features of a search server (at least for us), and it seems (IMHO) that there has been no solution for the issue since March 2015 (the release date of the last Solr 4 version).

What are the Solr developers' plans or ideas for a possible future solution? Or is there perhaps already a solution I haven't seen so far?

Thanks for any feedback.

Günter



On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:
Hi,

I can't figure out why facet processing in version 6 takes significantly more time than in version 4.

The debug responses (for 30 million documents):

Solr 4 (once the query is cached):
<lst name="process"><double name="time">280.0</double><lst name="query"><double name="time">0.0</double></lst><lst name="facet"><double name="time">280.0</double></lst>
Before caching: between 1.5 and 2 sec.


Solr 6.x (my last try was with 6.6), without docValues for the faceting fields (same schema as version 4):
<lst name="process"><double name="time">5874.0</double><lst name="query"><double name="time">0.0</double></lst><lst name="facet"><double name="time">5873.0</double></lst><lst name="facet_module"><double name="time">0.0</double></lst>
The time does not improve even after repeating the query several times.


Solr 6.6 with docValues for the faceting fields:
<lst name="process"><double name="time">9837.0</double><lst name="query"><double name="time">0.0</double></lst><lst name="facet"><double name="time">9837.0</double></lst><lst name="facet_module"><double name="time">0.0</double></lst>
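To compare the timing fragments above side by side, here is a minimal sketch that pulls the per-component times out of a debug "process" section. It assumes the fragments are copied exactly as quoted in this mail, which stop before the closing </lst> of the "process" section, so the helper appends any missing closing tags before parsing. The function name component_times is hypothetical, not part of any Solr client.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def component_times(fragment: str) -> Dict[str, float]:
    """Return {component name: time in ms} from a Solr debug <lst name="process"> fragment."""
    xml = fragment.strip()
    # Assumption: the quoted fragments omit the final </lst>; close any open lists.
    missing = xml.count("<lst") - xml.count("</lst>")
    xml += "</lst>" * max(missing, 0)

    root = ET.fromstring(xml)
    # The <double name="time"> directly under "process" is the total time.
    times = {"total": float(root.find("double[@name='time']").text)}
    # Each child <lst> (query, facet, facet_module, ...) carries its own time.
    for comp in root.findall("lst"):
        t = comp.find("double[@name='time']")
        if t is not None:
            times[comp.get("name")] = float(t.text)
    return times


solr4 = ('<lst name="process"><double name="time">280.0</double>'
         '<lst name="query"><double name="time">0.0</double></lst>'
         '<lst name="facet"><double name="time">280.0</double></lst>')
solr66_dv = ('<lst name="process"><double name="time">9837.0</double>'
             '<lst name="query"><double name="time">0.0</double></lst>'
             '<lst name="facet"><double name="time">9837.0</double></lst>'
             '<lst name="facet_module"><double name="time">0.0</double></lst>')

t4 = component_times(solr4)
t66 = component_times(solr66_dv)
print(f"facet slowdown: {t66['facet'] / t4['facet']:.1f}x")  # facet slowdown: 35.1x
```

Note that the Solr 4 figure is the cached one, so the ratio compares a warm Solr 4 run with the Solr 6.6 docValues run.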

The query used (against our production system running version 4):
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
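Because facet.field is repeated seven times, the long URL above is easier to rerun and vary when it is built from a list of parameter pairs. This is a minimal sketch that keeps only q, rows, debugQuery, wt and the facet parameters from that URL; the highlighting and relevance parameters are dropped purely to keep the sketch short, and whether they affect the facet timing is not tested here. The names FACET_FIELDS and facet_params are hypothetical.

```python
from urllib.parse import urlencode, parse_qsl

# Facet fields taken from the query above.
FACET_FIELDS = ["union", "navAuthor_full", "format", "language",
                "navSub_green", "navSubform", "publishDate"]


def facet_params(rows: int = 0) -> list:
    """Build the facet-related request parameters as (name, value) pairs."""
    # A list of pairs instead of a dict, because facet.field occurs several times.
    params = [("q", "*:*"), ("rows", str(rows)), ("debugQuery", "true"),
              ("facet", "true"), ("facet.limit", "100"),
              ("facet.mincount", "1"), ("facet.sort", "count"), ("wt", "xml")]
    params += [("facet.field", field) for field in FACET_FIELDS]
    return params


query_string = urlencode(facet_params())
# Round-trip check: all repeated facet.field values survive the encoding.
assert [v for k, v in parse_qsl(query_string) if k == "facet.field"] == FACET_FIELDS
print("http://search.swissbib.ch/solr/sb-biblio/select?" + query_string)
```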


Running the queries on smaller indices (8 million docs), the difference is similar, although the absolute processing times are smaller.


Any hints as to why there are such huge differences?

Günter

--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/
