Mikhail, you have been really helpful!

On Tue, May 24, 2016 at 9:38 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> Alessandro,
>
> I checked with the Solr 6.0 distro on techproducts.
> Faceting on cat with uif hits the fieldValueCache:
>
> http://localhost:8983/solr/techproducts/select?facet.field=cat&facet.method=uif&facet=on&indent=on&q=*:*&wt=json
>
> fieldValueCache
> - class: org.apache.solr.search.FastLRUCache
> - description: Concurrent LRU Cache(maxSize=10000, initialSize=10,
>   minSize=9000, acceptableSize=9500, cleanupThread=false)
> - src:
> - version: 1.0
>
> stats:
> - cumulative_evictions: 0
> - cumulative_hitratio: 0.5
> - cumulative_hits: 1
> - cumulative_inserts: 2
> - cumulative_lookups: 2
> - evictions: 0
> - hitratio: 0.5
> - hits: 1
> - inserts: 2
> - item_cat:
>   {field=cat,memSize=4665,tindexSize=46,time=28,phase1=27,nTerms=16,bigTerms=2,termInstances=21,uses=0}
> - lookups: 2
> - size: 1
>
> Beware: the field manu_exact, for example, doesn't hit the fieldValueCache,
> because it is single-valued and goes to FacetFieldProcessorDV instead of
> FacetFieldProcessorUIF. cat, on the other hand, is multivalued and hits UIF.

That makes complete sense! I think the query I was debugging today contained
only single-valued fields. The Solr 4.10.2 setup I was testing, on the other
hand, used a schema with the same fields but declared multi-valued.
Proceeding with UIF seems the most reasonable approach in my case, as it will
automatically route to the proper method depending on whether the field is
multi-valued or single-valued.
Today I was mainly testing with FCS (but I had optimised the index in my
experiments, so in practice FCS = FC). Tomorrow I will try on a fresh,
non-optimised index.

I have 3 additional questions:
1) Let's assume we set DocValues for the fields involved. If a field is
misconfigured, i.e. declared multivalued in the schema but actually
single-valued, then according to the code we are going to hit UIF. Will this
cause unnecessary usage of the fieldValueCache, and slowness compared with the
DV approach, which would have been the correct algorithm to apply?
2) Thanks to facet.threads I got a huge benefit on a single query with FC.
Should I expect even more benefit with a segmented index? (Today I was playing
with an optimised one.)
3) In my experiments today, on Solr 4.10.2 I was getting better results with
the enum approach (the overall cardinality of the fields involved was pretty
low). Using the enum approach in Solr 6 without DocValues was worse than in
Solr 4 (we know that with the legacy facet approach, if you set docValues and
the field is multi-valued, we always redirect to DV). This seems a little
unrelated to the well-known bug since, as far as I know, the enum approach
should make massive use of the filterCache, while the fieldValueCache should
not be involved. Do you know why the term enum approach is also affected by
the regression in recent Solr versions?

Thank you very much again!

> See
> org.apache.solr.search.facet.FacetField.createFacetProcessor(FacetContext);
> you might just need to debug there.
>
> In summary, uif works and you have a chance to hit it. Good luck!
>
> On Tue, May 24, 2016 at 7:43 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>
> > Update: it seems clear I ran into the bug
> > https://issues.apache.org/jira/browse/SOLR-8096
> >
> > Just adding some additional information, as I have just hit the issue
> > with Solr 6.0:
> > Static index, around 50 * 10^6 docs, 20 fields to facet on, 1 of them
> > with high cardinality, on top of grouping.
> > Grouping was not a factor at all.
> >
> > All the symptoms are there: Solr 4.10.2 around 150 ms and Solr 6.0
> > around 550 ms.
> > The 'fieldValueCache' seems to be unused (no inserts nor lookups) in
> > Solr 6.0.
> > In Solr 4.10 the 'fieldValueCache' is in heavy use, with a
> > cumulative_hitratio of 0.96.
> > Switching from enum to fc to fcs to uif did not change that much.
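Mikhail's explanation earlier in this thread (cat is multivalued and goes to FacetFieldProcessorUIF, which fills the fieldValueCache; manu_exact is single-valued and goes to FacetFieldProcessorDV) and Alessandro's question 1 both turn on a single routing decision. The Python sketch below is a hypothetical, simplified model of that description only. It is not the code of FacetField.createFacetProcessor, the SchemaFieldInfo type is invented for illustration, and it deliberately ignores docValues and every other branch. Whether docValues changes the outcome for a multivalued field, which is the point of question 1, has to be checked in the real method. The sketch also shows how the hitratio values in Mikhail's stats follow from hits / lookups (1 / 2 = 0.5).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaFieldInfo:
    """Hypothetical stand-in for a field's schema flags (not a Solr class)."""
    name: str
    multi_valued: bool  # what the schema declares, not what the data holds


def expected_processor(field: SchemaFieldInfo) -> str:
    """Simplified model of the routing described in the thread.

    Encodes only Mikhail's description: multivalued -> UIF,
    single-valued -> DV. docValues and all other branches are ignored.
    """
    if field.multi_valued:
        return "FacetFieldProcessorUIF"
    return "FacetFieldProcessorDV"


def uses_field_value_cache(field: SchemaFieldInfo) -> bool:
    # In this model only the UIF path populates the fieldValueCache.
    return expected_processor(field) == "FacetFieldProcessorUIF"


def hit_ratio(stats: dict) -> float:
    """hits / lookups; returning 0.0 when there were no lookups is a choice made here."""
    lookups = stats["lookups"]
    return stats["hits"] / lookups if lookups else 0.0


if __name__ == "__main__":
    cat = SchemaFieldInfo("cat", multi_valued=True)
    manu_exact = SchemaFieldInfo("manu_exact", multi_valued=False)
    # Declared multiValued but holding one value per document in practice:
    # the model still routes it to UIF, because it only looks at the schema flag.
    misdeclared = SchemaFieldInfo("single_in_practice", multi_valued=True)
    for f in (cat, manu_exact, misdeclared):
        print(f.name, expected_processor(f), uses_field_value_cache(f))
    print(hit_ratio({"hits": 1, "lookups": 2}))  # 0.5, as in the stats above
```

Running it prints UIF for cat and for the misdeclared field, and DV for manu_exact, which mirrors Alessandro's reading that the schema declaration, not the data, drives the choice in this simplified view.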
> >
> > Moving to DocValues didn't improve the situation that much (but I was on
> > an optimized index, so I need to try a multi-segment one, following
> > Mikhail Khludnev's contribution in Solr 5.4.0).
> >
> > Moving to field collapsing brought the query down to 110-120 ms (but this
> > is expected: we were faceting on 260 / 1 million original docs).
> > Adding facet.threads=NCores brought the query time down to 100 ms; in
> > combination with field collapsing we reached 80-90 ms when warmed.
> >
> > What are the plans for the future related to this?
> > Do we want to deprecate the legacy facet implementation and move
> > everything to JSON facets (as happened with UIF)?
> > So backward compatible, but with a different implementation?
> >
> > I think migrations should be a transparent process.
> >
> > Cheers
> >
> > On Mon, May 23, 2016 at 6:49 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> >
> > > Furthermore, I was checking the internals of the old facet
> > > implementation (which is used with the classic request-parameter-based
> > > faceting, instead of the JSON facets). It seems that if you enable
> > > docValues, even with the enum method passed as a parameter, fc with
> > > docValues is actually used.
> > > I will report on the performance we get with docValues.
> > >
> > > Cheers
> > > On 23 May 2016 16:29, "Joel Bernstein" <joels...@gmail.com> wrote:
> > >
> > >> If you can make min/max work for you instead of sort, then it should be
> > >> faster, but I haven't spent time comparing the performance.
> > >>
> > >> But if you're using top_fc with the min/max param, the performance
> > >> between Solr 4 & Solr 6 should be very close, as the data structures
> > >> behind them are the same.
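Joel's suggestion can be made concrete by generating both forms of the collapse filter side by side. The field names string_field and TrieDoubleField are the placeholders used in the thread, and the helper collapse_fq is hypothetical; the local params field, min, sort and hint=top_fc are the ones discussed in this thread. Selecting the group head with sort='TrieDoubleField asc' picks the document with the lowest value, which is what min=TrieDoubleField also asks for, although tie-breaking between equal values may differ, so it is worth comparing results as well as timings.

```python
from urllib.parse import urlencode


def collapse_fq(field, min_field=None, sort=None, top_fc=False):
    """Build a {!collapse} filter query string (hypothetical helper).

    Exactly one of min_field / sort selects the group head.
    top_fc=True adds hint=top_fc (a top-level field cache; string fields only).
    """
    if (min_field is None) == (sort is None):
        raise ValueError("pass exactly one of min_field or sort")
    params = [f"field={field}"]
    if min_field is not None:
        params.append(f"min={min_field}")
    else:
        params.append(f"sort='{sort}'")
    if top_fc:
        params.append("hint=top_fc")
    return "{!collapse " + " ".join(params) + "}"


if __name__ == "__main__":
    # Placeholder field names taken from the thread, not a real schema.
    current = collapse_fq("string_field", sort="TrieDoubleField asc")
    suggested = collapse_fq("string_field", min_field="TrieDoubleField", top_fc=True)
    print(current)    # {!collapse field=string_field sort='TrieDoubleField asc'}
    print(suggested)  # {!collapse field=string_field min=TrieDoubleField hint=top_fc}
    # URL-encoded parameters to append to a /select request (q=*:* is just an example).
    print(urlencode({"q": "*:*", "fq": suggested}))
```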
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Mon, May 23, 2016 at 3:34 PM, Alessandro Benedetti <abenede...@apache.org> wrote:
> > >>
> > >> > Hi Joel,
> > >> > thanks for the reply. Actually, we were not using field collapsing
> > >> > before; we basically want to replace grouping with it.
> > >> > The grouping performance of Solr 4 and 6 is basically comparable,
> > >> > so I'm surprised to see such a big degradation with field collapsing.
> > >> >
> > >> > Basically, the comparisons we did were based on the Solr 4 queries,
> > >> > extracted from logs and modified slightly to include the field
> > >> > collapsing parameter.
> > >> >
> > >> > To build the tests comparing Solr 4.10.2 to Solr 6, we proceeded
> > >> > in this way:
> > >> >
> > >> > 1) install Solr 4.10.2 and Solr 6.0.0
> > >> > 2) migrate the index with the related Lucene tool (4.10.2 -> 5.5.0 ->
> > >> > Solr 6.0)
> > >> > 3) switch the 2 instances on/off, repeating the tests with both cold
> > >> > and warm instances.
> > >> >
> > >> > This means that the queries look the same.
> > >> > I have not double-checked the results, only the timings.
> > >> > I will provide additional feedback on whether the queries produce
> > >> > comparable results as well.
> > >> >
> > >> > Regarding your suggestion about top_fc, thanks, I will try that.
> > >> > I actually discovered it a little after I posted to the mailing
> > >> > list (I think exactly from another post of yours :) )
> > >> >
> > >> > I'm not sure whether setting up docValues for the field we collapse
> > >> > on could give some benefit as well.
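Step 3 of the procedure above (cold and warm runs against each instance in turn) can be scripted so that both versions are measured the same way. This is a minimal sketch under stated assumptions: run_query is a hypothetical placeholder for whatever sends the query and reads the full response, the cold figure is only meaningful right after an instance (re)start, and it measures client-side wall time, which also includes network and response-writing costs that Solr's QTime does not. relative_change turns two timings into a ratio, for example the 150 ms vs 550 ms faceting figures or the 200 ms vs 1800 ms collapsing figures reported in the thread.

```python
import statistics
import time
from typing import Callable, List, Tuple


def cold_and_warm_ms(run_query: Callable[[], None], warm_runs: int = 10) -> Tuple[float, float]:
    """Time one cold execution, then return the median of the warm executions.

    run_query is a hypothetical placeholder: any callable that sends the
    query and waits for the complete response.
    """
    if warm_runs < 1:
        raise ValueError("warm_runs must be >= 1")

    def timed() -> float:
        start = time.perf_counter()
        run_query()
        return (time.perf_counter() - start) * 1000.0  # milliseconds

    cold = timed()  # first call after (re)start: caches not yet warmed
    warm: List[float] = [timed() for _ in range(warm_runs)]
    return cold, statistics.median(warm)


def relative_change(old_ms: float, new_ms: float) -> float:
    """(new - old) / old: 0.3 means 30% slower, 8.0 means nine times the old time."""
    return (new_ms - old_ms) / old_ms


if __name__ == "__main__":
    # Stand-in for a real request: each call sleeps for 5 ms.
    cold, warm = cold_and_warm_ms(lambda: time.sleep(0.005), warm_runs=5)
    print(f"cold={cold:.1f} ms, warm median={warm:.1f} ms")
    print(relative_change(200, 1800))  # 8.0 (collapsing figures from the thread)
```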
> > >> >
> > >> > I'll keep you updated,
> > >> >
> > >> > Cheers
> > >> >
> > >> > On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > >> >
> > >> > > Were you using the sort param or the min/max param in Solr 4 to
> > >> > > select the group head? The sort work came later, and I'm not sure how
> > >> > > it compares in performance to the min/max param.
> > >> > >
> > >> > > Since you are collapsing on a string field, you can use the top_fc
> > >> > > hint, which will use a top-level field cache for the collapse. This
> > >> > > is faster at query time than the default, which uses the
> > >> > > MultiDocValues ordinal map.
> > >> > >
> > >> > > The docs cover the top_fc hint:
> > >> > >
> > >> > > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
> > >> > >
> > >> > > On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <abenede...@apache.org> wrote:
> > >> > >
> > >> > > > Let's add some additional details, guys:
> > >> > > >
> > >> > > > 1) *Faceting*
> > >> > > > Currently the facet method used is "enum", and it runs over
> > >> > > > roughly 20 fields.
> > >> > > > We mainly use it on low-cardinality fields, except one which has a
> > >> > > > cardinality of 1000 terms.
> > >> > > > I am aware of the famous Jira issue about the faceting regression:
> > >> > > > https://issues.apache.org/jira/browse/SOLR-8096 .
> > >> > > >
> > >> > > > Our index is indeed quite static (we index once per day), and the
> > >> > > > fields we facet on are multi-valued (by schema definition, but not
> > >> > > > in practice).
> > >> > > > But we use Term Enum as the method, so I was not expecting to hit
> > >> > > > the regression.
> > >> > > > We currently see query times which are 30% worse than Solr 4.10.2.
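To replay the same classic (non-JSON) facet request while switching facet.method between enum, fc, fcs and uif, or while adding facet.threads, the parameters can be generated rather than edited by hand. A minimal sketch, assuming hypothetical field names, a hypothetical core URL and a hypothetical helper facet_query_string; facet, facet.field, facet.method and facet.threads are the Solr parameters used in this thread, and setting facet.threads to the core count mirrors the facet.threads=NCores experiment.

```python
import os
from urllib.parse import urlencode

FACET_METHODS = ("enum", "fc", "fcs", "uif")


def facet_query_string(fields, method, threads=None):
    """Classic facet parameters for a match-all query (hypothetical helper)."""
    if method not in FACET_METHODS:
        raise ValueError(f"unknown facet.method: {method}")
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"), ("facet.method", method)]
    # facet.field is repeated once per faceted field.
    params += [("facet.field", f) for f in fields]
    if threads is not None:
        params.append(("facet.threads", str(threads)))
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical field names and core URL; replace them with the real ones.
    fields = [f"facet_field_{i}" for i in range(20)]
    base = "http://localhost:8983/solr/mycore/select?"
    # os.cpu_count() may return None, in which case facet.threads is omitted.
    for method in FACET_METHODS:
        print(base + facet_query_string(fields, method, threads=os.cpu_count()))
```

Keep in mind Alessandro's finding earlier in the thread: with docValues enabled, the legacy implementation appears to use fc with docValues even when enum is requested, so the facet.method sent is not necessarily the one executed.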
> > >> > > >
> > >> > > > Our next experiment will be to enable docValues for all the
> > >> > > > fields and check whether we get any benefit (switching the facet
> > >> > > > method to fc).
> > >> > > > At the moment, switching to JSON faceting is not an option, as we
> > >> > > > would like first to proceed with a transparent migration and then
> > >> > > > possibly add improvements and refactor in the future.
> > >> > > > After that, we will fix the schema so that only what is really
> > >> > > > multi-valued is declared multi-valued. (Do you know if this can
> > >> > > > have an effect? Is a wrong schema definition enough to mess up the
> > >> > > > facet performance, even if the fields are actually single-valued?)
> > >> > > >
> > >> > > >
> > >> > > > 2) *Field Collapsing*
> > >> > > > Field collapsing performance seems much, much worse, something
> > >> > > > like 200 ms (Solr 4) vs 1800 ms (Solr 6).
> > >> > > > This is surprising, as I never heard about any regression in field
> > >> > > > collapsing.
> > >> > > > I will investigate the internals of field collapsing in more
> > >> > > > detail, and why the performance could be so degraded.
> > >> > > > I will also check whether I find any info in the mailing list or
> > >> > > > Jira.
> > >> > > >
> > >> > > > &fq={!collapse field=string_field sort='TrieDoubleField asc'}
> > >> > > >
> > >> > > > Let me know if you have faced something similar.
> > >> > > >
> > >> > > > Cheers
> > >> > > >
> > >> > > > On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <abenede...@apache.org> wrote:
> > >> > > >
> > >> > > > > I'm planning a migration from 4.10.2 to 6.0.
> > >> > > > > Because we generate the index from scratch on a daily basis, we
> > >> > > > > don't need to migrate the index, only the server instances.
> > >> > > > >
> > >> > > > > With my team we were doing some experiments on some dev
> > >> > > > > machines, basically comparing Solr 4.10.2 and Solr 6.0 to check
> > >> > > > > for any functional and performance regression in our use cases.
> > >> > > > >
> > >> > > > > After setting up two installations on the same machine
> > >> > > > > (switching each version on and off for comparisons and
> > >> > > > > experiments), we are seeing a performance degradation with Solr 6.
> > >> > > > >
> > >> > > > > Basically, from a query time and throughput perspective, Solr 6
> > >> > > > > is not performing as well as Solr 4.10.2.
> > >> > > > > I still need to start the proper investigation, but this looks
> > >> > > > > weird to me.
> > >> > > > > I will proceed with a full analysis of the case and a deep study
> > >> > > > > of our queries (which are mainly fq, faceting and grouping
> > >> > > > > anyway).
> > >> > > > >
> > >> > > > > Any suggestions in particular to start with? Has anyone done a
> > >> > > > > similar migration with a similar experience?
> > >> > > > > I will also search the mailing list for similar cases.
> > >> > > > >
> > >> > > > > Cheers
> > >> > > > >
> > >> > > > > --
> > >> > > > > --------------------------
> > >> > > > >
> > >> > > > > Benedetti Alessandro
> > >> > > > > Visiting card : http://about.me/alessandro_benedetti
> > >> > > > >
> > >> > > > > "Tyger, tyger burning bright
> > >> > > > > In the forests of the night,
> > >> > > > > What immortal hand or eye
> > >> > > > > Could frame thy fearful symmetry?"
> > >> > > > >
> > >> > > > > William Blake - Songs of Experience -1794 England
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>