Re: prefix facet performance

2017-04-24 Thread Yonik Seeley
In SimpleFacets.getFacetTermEnumCounts, we seek to the first term matching the prefix using the index and then, for each subsequent term, compare it against the prefix until it no longer matches. -Yonik
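A minimal sketch of that seek-then-scan idea, using a sorted Python list as a stand-in for the term dictionary. Solr works against a Lucene TermsEnum rather than a list, and the function name `terms_with_prefix` is hypothetical. Because terms are kept in sorted order, every term sharing the prefix sits in one contiguous run, so a binary search finds the start of the run and a per-term prefix comparison finds its end:

```python
import bisect
from typing import Iterator, List


def terms_with_prefix(sorted_terms: List[str], prefix: str) -> Iterator[str]:
    """Yield every term in `sorted_terms` that starts with `prefix`.

    `sorted_terms` is a simplified, in-memory stand-in for a term dictionary:
    unique terms in ascending order.
    """
    # Seek: binary search for the first term >= prefix. In sorted order, all
    # terms sharing the prefix form one contiguous run starting here.
    i = bisect.bisect_left(sorted_terms, prefix)
    # Scan: check each following term against the prefix and stop at the
    # first term that no longer matches.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        yield sorted_terms[i]
        i += 1
```

For example, `list(terms_with_prefix(["A/1", "A/2", "AB", "B/1"], "A/"))` yields `["A/1", "A/2"]`. The work is one seek plus one comparison per matching term (and one more to detect the end), so a selective prefix keeps the number of visited terms small.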

2017-04-24 Thread alessandro.benedetti
Thanks Yonik and Maria. It makes sense: if we reduce the number of terms, term enum becomes a very good solution. @Yonik: do we still check the prefix against the term dictionary one term at a time, or is an FST used to identify the set of candidate terms? I will check the code later. Regards

2017-04-21 Thread Maria Muslea
I see. Once I specify a prefix, the number of terms is MUCH smaller. Thank you again for all your help. Maria

2017-04-21 Thread Yonik Seeley
On Fri, Apr 21, 2017 at 4:25 PM, Maria Muslea wrote:
> The field is:
>
> and using unique() I found that it has 700K+ unique values.
>
> The query before (that takes ~10s):
>
> wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/
>
> the query after (that is almost

2017-04-21 Thread Maria Muslea
The field is: and using unique() I found that it has 700K+ unique values.
The query before (that takes ~10s):
wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/
The query after (that is almost instant):
wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=conc
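As a sketch, the "before" request can be assembled programmatically. The "after" query is cut off in this archive; Maria's later message credits facet.method=enum for the speedup, so the optional `method` argument is an assumption about what changed. The helper name `build_prefix_facet_query` is hypothetical:

```python
from urllib.parse import urlencode


def build_prefix_facet_query(field: str, prefix: str, method: str = "") -> str:
    """Build the query string for a prefix facet request like the one above."""
    params = [
        ("wt", "json"),
        ("indent", "true"),
        ("q", "*:*"),             # match all documents
        ("rows", "0"),            # return facet counts only, no documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", prefix),
    ]
    if method:
        # e.g. "enum" (term enumeration) or "fc" (field cache)
        params.append(("facet.method", method))
    # urlencode percent-escapes reserved characters such as '*', ':' and '/'
    return urlencode(params)
```

Appending `build_prefix_facet_query("concept", "A/", "enum")` to a base URL such as `http://localhost:8983/solr/mycore/select?` (hypothetical host and core) gives the prefix query with facet.method=enum added.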

2017-04-21 Thread alessandro.benedetti
That is quite interesting! You can use the stats module (in combination with JSON facets, if you need them) to calculate an accurate approximation of the number of unique values [1] [2]. Good to know it improved your scenario; I may need to update my knowledge of term enum internals! Can you describe yo
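A sketch of the two request parameter sets this suggests, assuming the field is named `concept` as in Maria's queries. `stats.field={!cardinality=true}concept` asks the stats component for a HyperLogLog-based estimate, and a `json.facet` with `unique(concept)` is the JSON Facet aggregation Maria reports using. The helper name `distinct_count_params` is hypothetical:

```python
import json
from typing import Dict


def distinct_count_params(field: str) -> Dict[str, Dict[str, str]]:
    """Return two alternative Solr /select parameter sets for counting
    the distinct values of `field` across all documents."""
    common = {"q": "*:*", "rows": "0"}

    # Stats component: the cardinality local param requests a
    # HyperLogLog-based estimate of the number of distinct values.
    stats = dict(common, stats="true")
    stats["stats.field"] = "{!cardinality=true}" + field

    # JSON Facet API: unique() counts distinct values; hll() is the
    # HyperLogLog-based alternative aggregation.
    json_facet = dict(common)
    json_facet["json.facet"] = json.dumps({"distinct": f"unique({field})"})

    return {"stats": stats, "json_facet": json_facet}
```

Either dictionary can be passed as query parameters to the `/select` handler with any HTTP client. The estimate-based options trade a small error for lower memory use on high-cardinality fields like this one.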

2017-04-21 Thread Maria Muslea
Actually, using facet.method=enum made a HUGE difference, even in my case where I have many unique values. I am happy with the query response time now. Is there a way in Solr to count the unique values for a field? If not, I could reindex and count the unique values while I add them to gi

2017-04-21 Thread alessandro.benedetti
Hi Maria, if you have 100,000-500,000 unique values for the field you are interested in, and the cardinality of your search results is quite small in comparison, I am not sure term enum will help you that much... To simplify, with the term enum approach, you iterate over each unique val
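To make the contrast concrete, here is a toy model of the two counting strategies. Python sets and dicts stand in for Solr's cached document sets and per-document field values (a simplification), and the function names are hypothetical. `enum_counts` does work per unique term, while `fc_counts` does work per matching document; both should produce the same counts:

```python
from collections import Counter
from typing import Dict, Iterable, Set


def enum_counts(postings: Dict[str, Set[int]], result_docs: Set[int]) -> Counter:
    """Term-enumeration style: visit every unique term and intersect the
    documents containing it with the documents matching the query.
    Cost grows with the number of unique terms visited."""
    counts = Counter()
    for term, docs in postings.items():
        n = len(docs & result_docs)
        if n:  # only report terms that occur in at least one result
            counts[term] = n
    return counts


def fc_counts(doc_values: Dict[int, Iterable[str]], result_docs: Set[int]) -> Counter:
    """Field-cache style: visit every matching document and tally the
    values it holds. Cost grows with matches times values per document."""
    counts = Counter()
    for doc in result_docs:
        # count each value once per document, as facet counts are doc counts
        counts.update(set(doc_values.get(doc, ())))
    return counts
```

With a small result set and many unique terms, the first loop does far more work than the second, which is the concern raised above. With a facet prefix, a real term dictionary lets the enumeration seek straight to the matching range instead of visiting every term.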

2017-04-18 Thread Maria Muslea
Hmmm, not sure. Probably in the range of 100K-500K. Before writing the email I was just looking at: http://yonik.com/facet-performance/ Wow, using facet.method=enum makes a big difference. I will read up on it to understand what it does. Thank you so much. Maria

2017-04-18 Thread Yonik Seeley
How many unique values are in the index? You could try facet.method=enum. -Yonik
On Tue, Apr 18, 2017 at 8:16 PM, Maria Muslea wrote:
> Hi,
>
> I have ~40K documents in SOLR (not many) and a multivalued facet field that
> contains at least 2K values per document.
>
> The values of the facet field lo