Re: [Solr 6] Migration from Solr 4.10.2

Alessandro Benedetti Wed, 25 May 2016 01:46:08 -0700

I took another look at the code:
org/apache/solr/search/facet/FacetField.java:115 (branch 6.0)


if (!multiToken) {
  if (ntype != null) {
    // single valued numeric (docvalues or fieldcache)
    return new FacetFieldProcessorNumeric(fcontext, this, sf);
  } else {
    // single valued string...
    return new FacetFieldProcessorDV(fcontext, this, sf);
  }
}
// multi-valued after this point
if (sf.hasDocValues() || method == FacetMethod.DV) {
  // single and multi-valued string docValues
  return new FacetFieldProcessorDV(fcontext, this, sf);
}
// Top-level multi-valued field cache (UIF)
return new FacetFieldProcessorUIF(fcontext, this, sf);


This is part of the new JSON Facet code (when you pass the uif method to
legacy faceting, the request is routed to this code by mocking a JSON facet).
According to this code, a string field with docValues, whether single-valued
or multi-valued, always ends up in FacetFieldProcessorDV.
This seems to be the reason I don't see my fieldValueCache populated: I
have both single-valued and multi-valued fields, but all of them have docValues!
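
To make the branch order easier to follow, here is a minimal standalone sketch of the same decision. It is not Solr code: the class name, the enum and the boolean parameters are hypothetical stand-ins for the objects used in the snippet above, and it only models which processor would be picked.

```java
/** Standalone model of the branch order in the quoted snippet; not Solr code. */
public class FacetDispatchSketch {

    /** Hypothetical stand-ins for the three processor classes named in the snippet. */
    enum Processor { NUMERIC, DV, UIF }

    /**
     * @param multiToken   the multiToken flag from the snippet
     * @param numeric      true when ntype != null in the snippet
     * @param hasDocValues true when sf.hasDocValues() would be true
     * @param methodIsDv   true when method == FacetMethod.DV
     */
    static Processor choose(boolean multiToken, boolean numeric,
                            boolean hasDocValues, boolean methodIsDv) {
        if (!multiToken) {
            // single-valued: numeric processor for numbers, DV for strings
            return numeric ? Processor.NUMERIC : Processor.DV;
        }
        if (hasDocValues || methodIsDv) {
            // multi-valued with docValues never reaches the UIF branch
            return Processor.DV;
        }
        // only multi-valued fields without docValues fall through to UIF
        return Processor.UIF;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false, true, false));   // prints DV
        System.out.println(choose(true, false, false, false));  // prints UIF
        System.out.println(choose(false, false, false, false)); // prints DV
    }
}
```

In this model the uif parameter never appears as an input: a field only reaches the UIF branch when it is multi-valued and has no docValues, which matches what I observe.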

On Tue, May 24, 2016 at 9:38 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Alessandro,
>
> I checked with Solr 6.0 distro on techproducts.
> Faceting on cat with uif hits fieldValueCache
>
> http://localhost:8983/solr/techproducts/select?facet.field=cat&facet.method=uif&facet=on&indent=on&q=*:*&wt=json
>
> fieldValueCache
> - class:org.apache.solr.search.FastLRUCache
> - description:Concurrent LRU Cache(maxSize=10000, initialSize=10,
>   minSize=9000, acceptableSize=9500, cleanupThread=false)
> - src:
> - version:1.0
> - stats:
>    - cumulative_evictions:0
>    - cumulative_hitratio:0.5
>    - cumulative_hits:1
>    - cumulative_inserts:2
>    - cumulative_lookups:2
>    - evictions:0
>    - hitratio:0.5
>    - hits:1
>    - inserts:2
>    - item_cat:{field=cat,memSize=4665,tindexSize=46,time=28,phase1=27,nTerms=16,bigTerms=2,termInstances=21,uses=0}
>    - lookups:2
>    - size:1
>
> Beware: a field such as manu_exact doesn't hit the field value cache,
> because it is single-valued and goes to FacetFieldProcessorDV instead of
> FacetFieldProcessorUIF, whereas cat is multi-valued and hits UIF. See
> org.apache.solr.search.facet.FacetField.createFacetProcessor(FacetContext);
> you might just need to debug there.
>
> In summary, uif works and you have a chance to hit it. Good luck!
>
> On Tue, May 24, 2016 at 7:43 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Update: it seems clear I have run into the bad
> > https://issues.apache.org/jira/browse/SOLR-8096 :
> >
> > Just adding some additional information, as I just hit the issue
> > with Solr 6.0:
> > Static index, around 50 * 10^6 docs, 20 fields to facet on, 1 of them
> > with high cardinality, on top of grouping.
> > Grouping had no effect at all.
> >
> > All the symptoms are there, Solr 4.10.2 around 150 ms and Solr 6.0 around
> > 550 ms .
> > The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr
> > 6.0.
> > In Solr 4.10 the 'fieldValueCache' is in heavy use with a
> > cumulative_hitratio of 0.96 .
> > Switching from enum to fc to fcs to uif did not change that much.
> >
> > Moving to docValues didn't improve the situation much (but I was on
> > an optimized index, so I need to try a multi-segment one, according
> > to Mikhail Khludnev's
> > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mkhludnev>
> > contribution in Solr 5.4.0).
> >
> > Moving to field collapsing brought the query down to 110-120 ms (but
> > this is normal, we were faceting on 260 / 1 million original docs).
> > Adding facet.threads=NCores brought the query time down to 100 ms; in
> > combination with field collapsing we reached 80-90 ms when warmed.
> >
> > What is the plan for the future in this respect?
> > Do we want to deprecate the legacy facets implementation and move
> > everything to JSON facets (as happened with UIF)?
> > So backward compatible, but with a different implementation?
> >
> > I think migrations should be a transparent process.
> >
> >
> > Cheers
> >
> > On Mon, May 23, 2016 at 6:49 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > Furthermore, I was checking the internals of the old facet
> > > implementation (the one used with the classic request parameters
> > > instead of the JSON facet). It seems that if you enable docValues,
> > > then even with the enum method passed as a parameter, fc with
> > > docValues will actually be used.
> > > I will report on the performance we get with docValues.
> > >
> > > Cheers
> > > On 23 May 2016 16:29, "Joel Bernstein" <joels...@gmail.com> wrote:
> > >
> > >> If you can make min/max work for you instead of sort then it should be
> > >> faster, but I haven't spent time comparing the performance.
> > >>
> > >> But if you're using the top_fc with the min/max param the performance
> > >> between Solr 4 & Solr 6 should be very close as the data structures
> > >> behind them are the same.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Mon, May 23, 2016 at 3:34 PM, Alessandro Benedetti <
> > >> abenede...@apache.org
> > >> > wrote:
> > >>
> > >> > Hi Joel,
> > >> > thanks for the reply. Actually we were not using field collapsing
> > >> > before; we basically want to replace grouping with it.
> > >> > The grouping performance of Solr 4 and Solr 6 is basically
> > >> > comparable.
> > >> > It's surprising that I got such a big degradation with field
> > >> > collapsing.
> > >> >
> > >> > So basically the comparisons we did were based on the Solr 4 queries,
> > >> > extracted from the logs and modified slightly to include the field
> > >> > collapsing parameter.
> > >> >
> > >> > To build the tests comparing Solr 4.10.2 to Solr 6 we basically
> > >> > proceeded in this way:
> > >> >
> > >> > 1) install Solr 4.10.2 and Solr 6.0.0
> > >> > 2) migrate the index with the related Lucene tool (4.10.2 -> 5.5.0 ->
> > >> > Solr 6.0)
> > >> > 3) switch the 2 instances on/off, repeating the tests with both cold
> > >> > and warm instances.
> > >> >
> > >> > This means that the queries look the same.
> > >> > I have not double-checked the results, only the timings.
> > >> > I will provide additional feedback on whether the queries are
> > >> > producing comparable results as well.
> > >> >
> > >> > Regarding your suggestion about top_fc: thanks, I will try that.
> > >> > I actually discovered it a little after I posted to the mailing
> > >> > list (I think from another post of yours :) )
> > >> >
> > >> > Not sure if setting up docValues for the field we collapse on
> > >> > could give some benefit as well.
> > >> >
> > >> > I will keep you updated,
> > >> >
> > >> > Cheers
> > >> >
> > >> > On Mon, May 23, 2016 at 2:48 PM, Joel Bernstein <joels...@gmail.com
> >
> > >> > wrote:
> > >> >
> > >> > > Were you using the sort param or the min/max param in Solr 4 to
> > >> > > select the group head? The sort work came later and I'm not sure
> > >> > > how it compares in performance to the min/max param.
> > >> > >
> > >> > > Since you are collapsing on a string field you can use the top_fc
> > >> > > hint, which will use a top-level field cache for the collapse. This
> > >> > > is faster at query time than the default, which uses a
> > >> > > MultiDocValues ordinal map.
> > >> > >
> > >> > > The docs cover the top_fc hint.
> > >> > >
> > >> > > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
> > >> > >
> > >> > >
> > >> > >
> > >> > > Joel Bernstein
> > >> > > http://joelsolr.blogspot.com/
> > >> > >
> > >> > > On Mon, May 23, 2016 at 12:14 PM, Alessandro Benedetti <
> > >> > > abenede...@apache.org> wrote:
> > >> > >
> > >> > > > Let's add some additional details, guys:
> > >> > > >
> > >> > > > 1) *Faceting*
> > >> > > > Currently the facet method used is "enum", and it runs over 20
> > >> > > > fields, more or less.
> > >> > > > We mainly use it on low-cardinality fields, except one which has a
> > >> > > > cardinality of 1000 terms.
> > >> > > > I am aware of the famous Jira issue about the faceting regression:
> > >> > > > https://issues.apache.org/jira/browse/SOLR-8096 .
> > >> > > >
> > >> > > > Our index is indeed quite static (we index once per day) and the
> > >> > > > fields we facet on are multi-valued (by schema definition, but not
> > >> > > > in practice).
> > >> > > > But we use Term Enum as the method, so I was not expecting to hit
> > >> > > > the regression.
> > >> > > > We currently see query times which are 30% worse than Solr 4.10.2.
> > >> > > > Our next experiment will be to enable docValues for all the fields
> > >> > > > and verify whether we get any benefit (switching the facet method
> > >> > > > to fc).
> > >> > > > At the moment, switching to JSON faceting is not an option, as we
> > >> > > > would first like to proceed with a transparent migration and then
> > >> > > > possibly add improvements and refactor in the future.
> > >> > > > The step after that will be to fix the schema so that only what is
> > >> > > > really multi-valued is declared multi-valued (do you know if this
> > >> > > > can have an effect? Is the wrong schema definition enough to mess
> > >> > > > up the facet performance, even if the fields are in fact
> > >> > > > single-valued?)
> > >> > > >
> > >> > > >
> > >> > > > 2) *Field Collapsing*
> > >> > > > Field collapsing performance seems much, much worse: something like
> > >> > > > 200 ms (Solr 4) vs 1800 ms (Solr 6).
> > >> > > > This is surprising, as I never heard about any regression in field
> > >> > > > collapsing.
> > >> > > > I will investigate the internals of field collapsing in a little
> > >> > > > more detail, and why the performance could be so degraded.
> > >> > > > I will also check whether I can find any info in the mailing list
> > >> > > > or Jira.
> > >> > > >
> > >> > > > &fq={!collapse field=string_field sort='TrieDoubleField asc'}
> > >> > > >
> > >> > > > Let me know if you have faced something similar.
> > >> > > >
> > >> > > > Cheers
> > >> > > >
> > >> > > > On Fri, May 13, 2016 at 10:41 PM, Alessandro Benedetti <
> > >> > > > abenede...@apache.org> wrote:
> > >> > > >
> > >> > > > > I'm planning a migration from 4.10.2 to 6.0.
> > >> > > > > Because we generate the index from scratch on a daily basis, we
> > >> > > > > don't need to migrate the index, only the server instances.
> > >> > > > > With my team we were doing some experiments on some dev machines,
> > >> > > > > basically comparing Solr 4.10.2 and Solr 6.0 to check for any
> > >> > > > > functional or performance regression in our use cases.
> > >> > > > >
> > >> > > > > After setting up two installations on the same machine (switching
> > >> > > > > each version on and off for comparisons and experiments) we are
> > >> > > > > observing a performance degradation with Solr 6.
> > >> > > > >
> > >> > > > > Basically, from a query time and throughput perspective Solr 6 is
> > >> > > > > not performing as well as Solr 4.10.2.
> > >> > > > > We still need to start the proper investigation, but this appears
> > >> > > > > weird to me.
> > >> > > > > We will proceed with a full analysis of the case and a deep study
> > >> > > > > of our queries (which are mainly fq, faceting and grouping anyway).
> > >> > > > >
> > >> > > > > Any suggestion in particular on where to start? Has anyone done a
> > >> > > > > similar migration with a similar experience?
> > >> > > > > I will also search the mailing list for similar cases.
> > >> > > > > Cheers
> > >> > > > >
> > >> > > > > --
> > >> > > > > --------------------------
> > >> > > > >
> > >> > > > > Benedetti Alessandro
> > >> > > > > Visiting card : http://about.me/alessandro_benedetti
> > >> > > > >
> > >> > > > > "Tyger, tyger burning bright
> > >> > > > > In the forests of the night,
> > >> > > > > What immortal hand or eye
> > >> > > > > Could frame thy fearful symmetry?"
> > >> > > > >
> > >> > > > > William Blake - Songs of Experience -1794 England
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
>

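For reference, the collapse and faceting parameters discussed in the quoted thread (the top_fc hint, min instead of sort, facet.threads) could be combined into one request roughly as below. This is only a sketch: the class and method names are hypothetical, string_field and TrieDoubleField are the placeholder field names from my earlier mail, and I am assuming that min on the numeric field picks the same group head as the ascending sort did.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Builds a Solr query string with a collapse filter; illustration only. */
public class CollapseQuerySketch {

    static String buildQuery(String collapseField, String minField, int facetThreads) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        // min=<field> instead of sort='<field> asc', plus the top_fc hint
        params.put("fq", "{!collapse field=" + collapseField
                + " min=" + minField + " hint=top_fc}");
        params.put("facet", "on");
        params.put("facet.threads", String.valueOf(facetThreads));

        // URL-encode each value (needs Java 10+ for the Charset overload)
        return params.entrySet().stream()
                .map(e -> e.getKey() + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Field names are the placeholders used earlier in the thread
        System.out.println(buildQuery("string_field", "TrieDoubleField", 4));
    }
}
```

The facet.field parameters are left out on purpose; they would be added to the same map.
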


