Yonik, thanks for the hint with the uif facet method.
(btw: why isn't it part of the official documentation? - at least I
haven't found it)
For our use case it means:
With uif, the time for facet processing is the same as it is with version 4.
But this works only for indexes without docvalues.
I tested two indexes with 30 million docs which are exactly the same
with one difference:
a) uses docvalues for faceting fields
b) no docvalues
both are multivalued
with a) I get faceting response times around 200ms
with b) 9000 ms
I'm really happy you restarted the discussion yesterday about
https://issues.apache.org/jira/browse/SOLR-8096
I can only support the comment of Shawn Heisey:
"If I had any understanding of how this code worked and the precise
reasons it has become slower, I would be working on a solution."
Although it is an old feature, and perhaps the first well-known feature of
SOLR, faceting is the most important one.
Günter
On 31.08.2017 19:04, Yonik Seeley wrote:
A possible improvement for some multiValued fields might be to use the
"uif" facet method (UnInvertedField was the default method for
multiValued fields in 4.x)
I'm not sure if you would need to reindex without docValues on that
field to try it though.
Example: to enable on the "union" field, add f.union.facet.method=uif
Support for this was added in https://issues.apache.org/jira/browse/SOLR-8466
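
A minimal sketch of how the per-field override above could be added to a request, assuming the parameters are built as key/value pairs before being sent to Solr. The helper name with_uif and the reduced parameter list are illustrative assumptions, not part of Solr; only the f.<field>.facet.method=uif parameter itself comes from the message above.

```python
from urllib.parse import urlencode, parse_qsl


def with_uif(params, fields):
    """Return a copy of params with f.<field>.facet.method=uif added per field.

    Hypothetical helper for illustration; it only builds request parameters.
    """
    out = list(params)
    for field in fields:
        out.append((f"f.{field}.facet.method", "uif"))
    return out


# Reduced parameter set (assumption: a subset of the production query).
base_params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.field", "union"),
    ("facet.limit", "100"),
    ("facet.mincount", "1"),
]

# Query string to append to the core's /select URL.
query_string = urlencode(with_uif(base_params, ["union"]))
print(query_string)
```

The same helper can be called with several field names to switch all multiValued facet fields at once.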
-Yonik
On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler
<guenter.hip...@unibas.ch> wrote:
Hi,
in the meantime I came across the reason for the slow facet processing
capacities of SOLR since version 5.x
https://issues.apache.org/jira/browse/SOLR-8096
https://issues.apache.org/jira/browse/LUCENE-5666
compared to version 4.x
Various library networks across the world are suffering from the same
symptoms:
Facet processing is one of the most important features of a search server
(for us) and it seems (at least IMHO) there has been no solution for the
issue since March 2015 (release date for the last SOLR 4 version)
What are the plans / ideas of the solr developers for a possible future
solution? Or maybe there is already a solution I haven't seen so far.
Thanks for any feedback
Günter
On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:
Hi,
I can't figure out the reason why the facet processing in version 6 needs
significantly more time compared to version 4.
The debugging response (for 30 million documents)
solr 4
<lst name="process"><double name="time">280.0</double><lst
name="query"><double name="time">0.0</double></lst><lst name="facet"><double
name="time">280.0</double></lst>
(once the query is cached)
before caching: between 1.5 and 2 sec
solr 6.x (my last try was with 6.6)
without docvalues for faceting fields (same schema as version 4)
<lst name="process"><double name="time">5874.0</double><lst
name="query"><double name="time">0.0</double></lst><lst name="facet"><double
name="time">5873.0</double></lst><lst name="facet_module"><double
name="time">0.0</double></lst>
the time is not getting better even after repeating the query several
times
solr 6.6 with docvalues for faceting fields
<lst name="process"><double name="time">9837.0</double><lst
name="query"><double name="time">0.0</double></lst><lst name="facet"><double
name="time">9837.0</double></lst><lst name="facet_module"><double
name="time">0.0</double></lst>
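
For comparing such runs, the per-component times can be pulled out of the debug output with a few lines of Python. This is only a sketch: it assumes the "process" element has been copied out of the response as a complete, well-formed fragment (the closing tags in the sample are added for that reason), and the function name component_times is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample based on the 6.x run without docvalues; closing </lst> added
# so the fragment is well-formed (assumption).
SAMPLE = """
<lst name="process">
  <double name="time">5874.0</double>
  <lst name="query"><double name="time">0.0</double></lst>
  <lst name="facet"><double name="time">5873.0</double></lst>
  <lst name="facet_module"><double name="time">0.0</double></lst>
</lst>
"""


def component_times(process_xml):
    """Map each search component name to its reported time in ms."""
    root = ET.fromstring(process_xml.strip())
    times = {}
    for child in root.findall("lst"):
        value = child.find("double[@name='time']")
        if value is not None:
            times[child.get("name")] = float(value.text)
    return times


print(component_times(SAMPLE))
```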
query used (our production system with version 4)
http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count
Running the queries on smaller indices (8 million docs), the difference is
similar, although the absolute figures for processing time are smaller.
Any hints as to why there are such huge differences?
Günter
--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org / http://www.ub.unibas.ch/