Yonik, thanks for the hint about the uif facet method.
(By the way, why isn't it part of the official documentation? At least I haven't found it there.)

For our use case it means:
Time for facet processing is exactly the same as it is with version 4. But this works only for indexes 'without' docvalues. I tested two indexes with 30 million docs that are identical except for one difference:
a) uses docvalues for faceting fields
b) no docvalues

In both indexes the faceting fields are multivalued.

With a) I get faceting response times around 200 ms,
with b) 9000 ms.

I'm really happy that you restarted the discussion about https://issues.apache.org/jira/browse/SOLR-8096 yesterday.

I can only second Shawn Heisey's comment:
"If I had any understanding of how this code worked and the precise reasons it has become slower, I would be working on a solution."

Although it is an old feature, and perhaps the first well-known feature of Solr, faceting is the most important one.

Günter


On 31.08.2017 19:04, Yonik Seeley wrote:
A possible improvement for some multiValued fields might be to use the
"uif" facet method (UnInvertedField was the default method for
multiValued fields in 4.x)
I'm not sure if you would need to reindex without docValues on that
field to try it though.

Example: to enable on the "union" field, add f.union.facet.method=uif
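
A minimal Python sketch of how this parameter could be added to a facet request. The host and port are hypothetical placeholders; the core name and field names are taken from the query quoted further down in this thread, and facet_url is only an illustrative helper, not part of any Solr client library.

```python
from urllib.parse import urlencode

# Hypothetical host and port; adjust to your own installation.
BASE_URL = "http://localhost:8983/solr/sb-biblio/select"


def facet_url(base_url, fields, uif_fields):
    """Build a facet-only request URL, switching selected fields to uif."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    for field in fields:
        params.append(("facet.field", field))
    for field in uif_fields:
        # Per-field override, as in the example: f.<field>.facet.method=uif
        params.append((f"f.{field}.facet.method", "uif"))
    return base_url + "?" + urlencode(params)


url = facet_url(BASE_URL, ["union", "format"], ["union"])
print(url)
```

Only the fields listed in uif_fields get the override, so the effect can be compared field by field.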

Support for this was added in https://issues.apache.org/jira/browse/SOLR-8466

-Yonik


On Thu, Aug 31, 2017 at 10:41 AM, Günter Hipler
<guenter.hip...@unibas.ch> wrote:
Hi,

In the meantime I came across the reason why facet processing in Solr
has been slower since version 5.x than it was in version 4.x:

https://issues.apache.org/jira/browse/SOLR-8096
https://issues.apache.org/jira/browse/LUCENE-5666

Various library networks across the world are suffering from the same
symptoms:

Facet processing is one of the most important features of a search server
(for us), and it seems (at least IMHO) that there has been no solution for
the issue since March 2015 (the release date of the last Solr 4 version).

What plans or ideas do the Solr developers have for a possible future
solution? Or is there perhaps already a solution that I haven't seen so far?

Thanks for any feedback.

Günter



On 21.08.2017 15:35, guenterh.li...@bluewin.ch wrote:
Hi,

I can't figure out the reason why the facet processing in version 6 needs
significantly more time compared to version 4.

The debugging response (for 30 million documents)

solr 4
<lst name="process">
  <double name="time">280.0</double>
  <lst name="query"><double name="time">0.0</double></lst>
  <lst name="facet"><double name="time">280.0</double></lst>
(once the query is cached)
before caching: between 1.5 and 2 sec


solr 6.x (my last try was with 6.6)
without docvalues for faceting fields (same schema as version 4)
<lst name="process">
  <double name="time">5874.0</double>
  <lst name="query"><double name="time">0.0</double></lst>
  <lst name="facet"><double name="time">5873.0</double></lst>
  <lst name="facet_module"><double name="time">0.0</double></lst>
The time does not improve even after repeating the query several
times.


solr 6.6 with docvalues for faceting fields
<lst name="process">
  <double name="time">9837.0</double>
  <lst name="query"><double name="time">0.0</double></lst>
  <lst name="facet"><double name="time">9837.0</double></lst>
  <lst name="facet_module"><double name="time">0.0</double></lst>
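
To compare such timings across versions without reading the XML by eye, the per-component times can be pulled out of the debug output. This is only a sketch: component_times is a hypothetical helper, and the sample string repeats the 6.x figures above with a closing </lst> added so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# Timing fragment shaped like the debug output above. The final </lst>
# closing "process" is added here so the fragment parses on its own.
SAMPLE = (
    '<lst name="process"><double name="time">5874.0</double>'
    '<lst name="query"><double name="time">0.0</double></lst>'
    '<lst name="facet"><double name="time">5873.0</double></lst>'
    '<lst name="facet_module"><double name="time">0.0</double></lst>'
    '</lst>'
)


def component_times(xml_text):
    """Return a dict mapping component name to time in ms, plus 'total'."""
    root = ET.fromstring(xml_text)
    times = {"total": float(root.find("double[@name='time']").text)}
    # Each direct <lst> child is one search component with its own time.
    for comp in root.findall("lst"):
        times[comp.get("name")] = float(comp.find("double[@name='time']").text)
    return times


times = component_times(SAMPLE)
for name, ms in times.items():
    print(f"{name}: {ms} ms")
```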

Query used (our production system with version 4):

http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&facet=true&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=0&hl.simple.pre={{{{START_HILITE}}}}&facet.limit=100&hl.simple.post={{{{END_HILITE}}}}&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+variant_isbn_isn_mv+issn+localcode+id&pf=title_short^1000&facet.mincount=1&hl.fl=fulltext&&wt=xml&facet.sort=count


When running the queries on smaller indices (8 million docs), the difference is
similar, although the absolute figures for processing time are smaller.


Any hints as to why there are such huge differences?

Günter









--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: + 41 (0)61 267 31 12 Fax: ++41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/

