I think we found our performance killer here: https://issues.apache.org/jira/browse/SOLR-9176
Basically, we thought we were using Term Enum, but under the hood Solr forces you to use FCS with single-valued numeric fields. Solr 4 did not behave like that. I checked the related commit: it is not functionally equivalent, and nothing in its message mentions this change.
Let's continue the discussion in Jira.

On Wed, May 25, 2016 at 9:45 AM, Alessandro Benedetti <abenede...@apache.org> wrote:

> I was taking another look at the code:
> org/apache/solr/search/facet/FacetField.java:115 (branch 6.0)
>
>> if (!multiToken) {
>>   if (ntype != null) {
>>     // single valued numeric (docvalues or fieldcache)
>>     return new FacetFieldProcessorNumeric(fcontext, this, sf);
>>   } else {
>>     // single valued string...
>>     return new FacetFieldProcessorDV(fcontext, this, sf);
>>   }
>> }
>> // multi-valued after this point
>> if (sf.hasDocValues() || method == FacetMethod.DV) {
>>   // single and multi-valued string docValues
>>   return new FacetFieldProcessorDV(fcontext, this, sf);
>> }
>> // Top-level multi-valued field cache (UIF)
>> return new FacetFieldProcessorUIF(fcontext, this, sf);
>
> This part belongs to the new JSON Facet code (but when you pass the uif method in legacy faceting, the request is handed to this code by mocking a JSON facet).
> According to this code, if the field has docValues, whether single-valued or multi-valued, you are going to use FacetFieldProcessorDV.
> This seems to be the reason I don't see my fieldValueCache populated: I now have both single-valued and multi-valued fields, but all of them have docValues!
>
> On Tue, May 24, 2016 at 9:38 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
>
>> Alessandro,
>>
>> I checked with the Solr 6.0 distro on techproducts.
>> Faceting on cat with uif hits fieldValueCache:
>>
>> http://localhost:8983/solr/techproducts/select?facet.field=cat&facet.method=uif&facet=on&indent=on&q=*:*&wt=json
>>
>> fieldValueCache
>> - class:org.apache.solr.search.FastLRUCache
>> - description:Concurrent LRU Cache(maxSize=10000, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
>> - src:
>> - version:1.0
>>
>> stats:
>> - cumulative_evictions:0
>> - cumulative_hitratio:0.5
>> - cumulative_hits:1
>> - cumulative_inserts:2
>> - cumulative_lookups:2
>> - evictions:0
>> - hitratio:0.5
>> - hits:1
>> - inserts:2
>> - item_cat:{field=cat,memSize=4665,tindexSize=46,time=28,phase1=27,nTerms=16,bigTerms=2,termInstances=21,uses=0}
>> - lookups:2
>> - size:1
>>
>> Beware: the field manu_exact, for example, doesn't hit the field value cache, because it is single-valued and goes to FacetFieldProcessorDV instead of FacetFieldProcessorUIF, whereas cat is multi-valued and hits UIF. See org.apache.solr.search.facet.FacetField.createFacetProcessor(FacetContext); it might help to just debug there.
>>
>> In summary, uif works and you have a chance to hit it. Good luck!
>>
>> On Tue, May 24, 2016 at 7:43 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>>
>> > Update: it seems clear I ran into the infamous
>> > https://issues.apache.org/jira/browse/SOLR-8096 :
>> >
>> > Just adding some additional information, as I just hit the issue with Solr 6.0:
>> > Static index, around 50 * 10^6 docs, 20 fields to facet on, 1 of them with high cardinality, on top of grouping.
>> > Grouping was not affecting the timings at all.
>> >
>> > All the symptoms are there: Solr 4.10.2 takes around 150 ms and Solr 6.0 around 550 ms.
>> > The 'fieldValueCache' seems to be unused (no inserts or lookups) in Solr 6.0.
>> > In Solr 4.10 the 'fieldValueCache' is in heavy use, with a cumulative_hitratio of 0.96.
>> > Switching from enum to fc to fcs to uif did not change much.
>> >
>> > Moving to DocValues didn't improve the situation much (but I was on an optimized index, so I need to try a multi-segment one, according to the contribution by Mikhail Khludnev <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mkhludnev> in Solr 5.4.0).
>> >
>> > Moving to field collapsing brought the query down to 110-120 ms (but this is expected, we were faceting on 260 / 1 million original docs).
>> > Adding facet.threads=NCores brought the query time down to 100 ms; in combination with field collapsing we reached 80-90 ms when warmed.
>> >
>> > What is the plan for the future regarding this?
>> > Do we want to deprecate the legacy facets implementation and move everything to JSON facets (as happened with UIF)?
>> > So, backward compatible but with a different implementation?
>> >
>> > I think migrations should be a transparent process.
>> >
>> > Cheers
>> >
>> > On Mon, May 23, 2016 at 6:49 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>> >
>> > > Furthermore, I was checking the internals of the old facet implementation (the one you get when using the classic request parameters instead of the JSON facet). It seems that if you enable docValues, even with the enum method passed as a parameter, fc with docValues will actually be used.
>> > > I will report on the performance we get with docValues.
>> > >
>> > > Cheers
>> > > On 23 May 2016 16:29, "Joel Bernstein" <joels...@gmail.com> wrote:
>> > >
>> > >> If you can make min/max work for you instead of sort then it should be faster, but I haven't spent time comparing the performance.
>> > >>
>> > >> But if you're using top_fc with the min/max param, the performance of Solr 4 and Solr 6 should be very close, as the data structures behind them are the same.
>> > >>
>> > >> Joel Bernstein
>> > >> http://joelsolr.blogspot.com/
>> > >>
>> > >> On Mon, May 23, 2016 at 3:34 PM, Alessandro Benedetti <abenede...@apache.org> wrote:
>> > >>
>> > >> > Hi Joel,
>> > >> > thanks for the reply. Actually, we were not using field collapsing before; we basically want to replace grouping with it.
>> > >> > The grouping performance of Solr 4 and Solr 6 is basically comparable.
>> > >> > It's surprising that I got such a big degradation with field collapsing.
>> > >> >
>> > >> > So the comparisons we did were based on the Solr 4 queries, extracted from the logs and modified slightly to include the field collapsing parameter.
>> > >> >
>> > >> > To build the tests comparing Solr 4.10.2 to Solr 6, we proceeded in this way:
>> > >> >
>> > >> > 1) install Solr 4.10.2 and Solr 6.0.0
>> > >> > 2) migrate the index with the related Lucene tool (4.10.2 -> 5.5.0 -> Solr 6.0)
>> > >> > 3) switch the 2 instances on/off, repeating the tests with both cold and warm instances.
>> > >> >
>> > >> > This means that the queries look the same.
>> > >> > I have not double-checked the results, only the timings.
>> > >> > I will provide additional feedback on whether the queries produce comparable results as well.
>> > >> >
>> > >> > Regarding your suggestion about top_fc, thanks, I will try that.
>> > >> > I actually discovered it a little after I posted to the mailing list (I think from another post of yours :) ).
>> > >> >
>> > >> > I am not sure whether setting up docValues for the field we collapse on could give some benefit as well.
>> > >> >
>> > >> > I will keep you updated,
>> > >> >
>> > >> > Cheers
>> > >> >
>> > >> > On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein <joels...@gmail.com> wrote:
>> > >> >
>> > >> > > Were you using the sort param or the min/max param in Solr 4 to select the group head? The sort work came later and I'm not sure how it compares in performance to the min/max param.
>> > >> > >
>> > >> > > Since you are collapsing on a string field, you can use the top_fc hint, which will use a top-level field cache for the collapse. This is faster at query time than the default, which uses a MultiDocValues ordinal map.
>> > >> > >
>> > >> > > The docs cover the top_fc hint.
>> > >> > >
>> > >> > > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
>> > >> > >
>> > >> > > Joel Bernstein
>> > >> > > http://joelsolr.blogspot.com/
>> > >> > >
>> > >> > > On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <abenede...@apache.org> wrote:
>> > >> > >
>> > >> > > > Let's add some additional details, guys:
>> > >> > > >
>> > >> > > > 1) *Faceting*
>> > >> > > > Currently the facet method used is "enum", and it runs over roughly 20 fields.
>> > >> > > > We mainly use it on low-cardinality fields, except for one which has a cardinality of 1000 terms.
>> > >> > > > I am aware of the well-known Jira issue about the faceting regression:
>> > >> > > > https://issues.apache.org/jira/browse/SOLR-8096 .
>> > >> > > >
>> > >> > > > Our index is indeed quite static (we index once per day) and the fields we facet on are multi-valued (by schema definition, but not in practice).
>> > >> > > > But we use Term Enum as the method, so I was not expecting to hit the regression.
>> > >> > > > We currently see query times which are 30% worse than Solr 4.10.2.
>> > >> > > > Our next experiment will be to enable docValues for all the fields and verify whether we get any benefit (switching the facet method to fc).
>> > >> > > > At the moment, switching to JSON faceting is not an option, as we would like to proceed with a transparent migration first and then possibly add improvements and refactor in the future.
>> > >> > > > The following step will be to fix the schema so that only what is really multi-valued is declared multi-valued (do you know if this can have an effect? Is the wrong schema definition alone enough to hurt facet performance, even if the fields are actually single-valued?)
>> > >> > > >
>> > >> > > > 2) *Field Collapsing*
>> > >> > > > Field collapsing performance seems much, much worse: something like 200 ms (Solr 4) vs 1800 ms (Solr 6).
>> > >> > > > This is surprising, as I have never heard of any regression in field collapsing.
>> > >> > > > I will investigate the internals of field collapsing in more detail to see why the performance could be so degraded.
>> > >> > > > I will also check whether I can find any info in the mailing list or Jira.
>> > >> > > >
>> > >> > > > &fq={!collapse field=string_field sort='TrieDoubleField asc'}
>> > >> > > >
>> > >> > > > Let me know if you have faced something similar.
>> > >> > > >
>> > >> > > > Cheers
>> > >> > > >
>> > >> > > > On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <abenede...@apache.org> wrote:
>> > >> > > >
>> > >> > > > > I'm planning a migration from 4.10.2 to 6.0.
>> > >> > > > > Because we generate the index from scratch on a daily basis, we don't need to migrate the index, only the server instances.
>> > >> > > > > My team and I were doing some experiments on some dev machines, basically comparing Solr 4.10.2 and Solr 6.0 to check for any functional or performance regression in our use cases.
>> > >> > > > >
>> > >> > > > > After setting up two installations on the same machine (switching each version on and off for comparisons and experiments), we are seeing a performance degradation with Solr 6.
>> > >> > > > >
>> > >> > > > > Basically, from a query time and throughput perspective, Solr 6 is not performing as well as Solr 4.10.2.
>> > >> > > > > We still need to start a proper investigation, but this appears weird to me.
>> > >> > > > > We will proceed with a full analysis of the case and a deep study of our queries (which are mainly fq, faceting and grouping anyway).
>> > >> > > > >
>> > >> > > > > Any suggestion in particular on where to start? Has anyone experienced something similar in a similar migration?
>> > >> > > > > I will also search the mailing list for similar cases.
>> > >> > > > >
>> > >> > > > > Cheers
>> > >> > > > >
>> > >> > > > > --
>> > >> > > > > --------------------------
>> > >> > > > >
>> > >> > > > > Benedetti Alessandro
>> > >> > > > > Visiting card : http://about.me/alessandro_benedetti
>> > >> > > > >
>> > >> > > > > "Tyger, tyger burning bright
>> > >> > > > > In the forests of the night,
>> > >> > > > > What immortal hand or eye
>> > >> > > > > Could frame thy fearful symmetry?"
>> > >> > > > >
>> > >> > > > > William Blake - Songs of Experience -1794 England
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> <http://www.griddynamics.com>
>> <mkhlud...@griddynamics.com>
--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
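
P.S. To make the branch order of the FacetField snippet quoted in this thread easier to follow, here is a minimal standalone sketch in plain Java. It only paraphrases the quoted branches and is not Solr code: the class name FacetDispatchSketch, the Processor enum and the four boolean parameters are hypothetical stand-ins I made up for illustration, while the real code works on Solr's own field and context objects.

```java
public class FacetDispatchSketch {

    // Hypothetical labels standing in for the three processor classes
    // named in the quoted snippet (Numeric, DV, UIF).
    enum Processor { NUMERIC, DV, UIF }

    // Paraphrase of the quoted branch order. All parameters are assumptions:
    //   multiToken   - the field is treated as multi-valued
    //   numeric      - the field has a numeric type (ntype != null in the quote)
    //   hasDocValues - the field has docValues enabled
    //   methodIsDv   - the requested facet method is DV
    static Processor pick(boolean multiToken, boolean numeric,
                          boolean hasDocValues, boolean methodIsDv) {
        if (!multiToken) {
            // Single-valued: numeric fields and strings take different
            // branches, and the requested method is not consulted at all.
            return numeric ? Processor.NUMERIC : Processor.DV;
        }
        // Multi-valued after this point.
        if (hasDocValues || methodIsDv) {
            return Processor.DV;
        }
        // Only multi-valued fields without docValues reach the UIF branch.
        return Processor.UIF;
    }

    public static void main(String[] args) {
        System.out.println(pick(false, true, true, false));   // NUMERIC
        System.out.println(pick(false, false, false, false)); // DV
        System.out.println(pick(true, false, true, false));   // DV
        System.out.println(pick(true, false, false, false));  // UIF
    }
}
```

If the sketch is a fair reading of the quoted code, the requested method only matters when it is DV, and a multi-valued field without docValues is the only combination that ends up in the UIF branch. That would match what Mikhail saw with cat (multi-valued, hits UIF) versus manu_exact (single-valued, goes to DV).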