Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

Joe Lawson Mon, 06 Jun 2016 16:37:51 -0700

Yeah, I thought the scale of the boosts was off as well, but I got caught up
verifying that the plugin was working. My colleague suggested that because
"small block" is a phrase, it gets a higher score in matching: you basically
get a phrase match every time, which causes those documents to float to the
top. You should check out his post about Solr's latest scoring engine. It
explains the notion of TF*IDF, which drives almost all the theory in
information retrieval (aka search).


http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/
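
As a rough illustration of the TF*IDF idea (a textbook sketch in Python, not
the exact formula Lucene's classic similarity or BM25 uses; the function name
and the numbers are made up for the example):

```python
import math

def tf_idf(term_freq, doc_freq, num_docs):
    """Textbook TF*IDF: term frequency times log inverse document frequency.

    Assumption: this is the plain classroom form. Lucene's classic
    similarity and BM25 both add dampening and length normalization.
    """
    if term_freq == 0 or doc_freq == 0:
        return 0.0
    return term_freq * math.log(num_docs / doc_freq)

# Hypothetical numbers: at the same term frequency, a rare term
# scores higher than a common one.
rare = tf_idf(term_freq=1, doc_freq=10, num_docs=10000)
common = tf_idf(term_freq=1, doc_freq=5000, num_docs=10000)
print(rare > common)
```

The point is only that rare terms weigh more than common ones, and the
boosts then multiply whatever that base score turns out to be.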

We were thinking, as you found when you experimented, that the 0.5 and 2.0
boosts were no match for the product name and keyword field boosts, so those
would influence your search as well.
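
A back-of-the-envelope sketch of that in Python. It only multiplies the boost
factors taken from the params and the fix quoted below; it ignores the actual
similarity scores, the tie parameter and the prodnumbertext clause, and the
function names are hypothetical, so treat it as an illustration rather than
what Solr computes:

```python
# Field boosts copied from the qf/pf params quoted in the thread.
QF_PRODNAME = 10.0   # prodname^10.0 in qf
PF_PRODNAME = 50.0   # prodname^50 in pf and pf2

def original_weight(original_boost):
    """Combined boost on a single-term prodname match for the original term."""
    return QF_PRODNAME * original_boost

def synonym_phrase_weight(synonym_boost, terms=2, phrase_clauses=2):
    """Combined boost on a prodname match for a two-word synonym.

    Each term picks up the qf boost, and each phrase clause
    (pf and pf2 here) adds the pf boost on top.
    """
    return (terms * QF_PRODNAME + phrase_clauses * PF_PRODNAME) * synonym_boost

# (originalBoost, synonymBoost): first the initial settings, then the fix.
for orig, syn in [(2.0, 0.5), (100.0, 1.0)]:
    print(orig, syn, original_weight(orig), synonym_phrase_weight(syn))
# prints:
# 2.0 0.5 20.0 60.0
# 100.0 1.0 1000.0 120.0
```

With 2.0 and 0.5 the two-word synonym still carries three times the combined
boost of the original term; at 100 and 1 the original term clearly wins.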

On Jun 6, 2016 6:03 PM, "MaryJo Sminkey" <mjsmin...@gmail.com> wrote:

> Oh thanks, yeah I did miss that one field which had a parent type with the
> normal synonym filter. However, that's our product SKU field so really
> doesn't even come into play. I verified that none of the other fields have
> a synonym filter set and even removed prodnumbertext just to make
> sure it wasn't doing anything. I was still getting the same results: the
> matches with "SBC" in the name were buried under the "small block" matches.
> After thinking over the issue, I realized what the solution was: I just
> needed to set synonyms.originalBoost high enough to outweigh
> the boosts provided by the phrase boosting, which is clearly what was
> letting "small block" jump ahead of "sbc". So I bumped that up to 100,
> leaving synonyms.synonymBoost at 1, and now I'm getting the results I'm
> looking for.
>
> Thanks for the help!
>
> Mary Jo
>
> On Mon, Jun 6, 2016 at 4:57 PM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
>
> > Mary Jo.
> >
> > It appears to be working correctly, but you have a very complex query going
> > on so it can be confusing. Assuming you are using the queryParser as
> > provided in examples, your query would look like "+sbc" when it enters the
> > queryParser and would look like "+((sbc)^2.0 (sb)^0.5 (small block)^0.5)"
> > when it comes out, and then it would enter the normal pipeline and
> > everything would be processed as individual tokens.
> >
> > It appears that you have synonyms being processed at query time on the
> > prodnumbertext field. For example, when (sbc)^2.0 enters the normal
> > query stage, it has all the qf, pf, ps and tie modifiers added, so the
> > first one turns into something like
> >
> > "(body:sbc^0.5 | productinfo:sbc^1.0 | keywords:sbc^2.0 | prodname:sbc^10.0
> > | prodnumbertext:sbc^20.0)^2.0"
> >
> > Then, with the query-time synonym expansion on prodnumbertext combined with
> > a phrase and the default mm being 100% (
> > https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Themm(MinimumShouldMatch)Parameter
> > ), you end up with the query being
> >
> > (((prodnumbertext:sbc prodnumbertext:sb prodnumbertext:small)
> > prodnumbertext:block)~2)^20.0
> >
> > The ~2 comes from mm=100% and having the phrase "small block" as a synonym.
> > This messes up your results as well, since anything in prodnumbertext will
> > have to match "sbc block", "sb block" or "small block", which of course is
> > only going to match small block. Check out the section "Multi-word synonyms
> > won't work as phrase queries" in
> > https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ for
> > more info.
> >
> > Advice: make sure in the schema that none of the fields you are running
> > queries against do any complex query operations; especially make sure they
> > aren't doing additional synonym resolution against the same file.
> >
> > I think you are getting hit by the MM bug.  Try tuning it way down to
> > something like 0.01% and see how the matches go.
> >
> >
> >
> > On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey <mjsmin...@gmail.com>
> > wrote:
> >
> > > Okay so big thanks for the help with getting the hon_lucene_synonyms plugin
> > > working. That is a big load off to finally have a solution in place for all
> > > our multi-term synonyms. We did find that the information in Step 8 about
> > > the plugin showing "SynonymExpandingExtendedDismaxQParser" for QParser does
> > > not seem to be correct, we only ever get "ExtendedDismaxQParser" but the
> > > synonym expansion is definitely working.
> > >
> > > In implementing it though, the one thing I'm still having an issue with is
> > > trying to figure out how I can get results on the original term to appear
> > > first in our results and matches on the synonyms lower in the results. The
> > > plugin includes settings for an originalboost and synonymboost, but that
> > > doesn't seem to be working along with all the other edismax boosts I'm
> > > doing. We search across a number of fields, each with their own boost and
> > > then do phrase searches with boosts as well. My params look like this:
> > >
> > > params["defType"] = 'synonym_edismax';
> > > params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0 prodnumbertext^20.0';
> > > params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> > > params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> > > params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> > > params["ps"] = 1;
> > > params["tie"] = 0.1;
> > > params["synonyms"] = true;
> > > params["synonyms.originalBoost"] = 2.0;
> > > params["synonyms.synonymBoost"] = 0.5;
> > >
> > > And here's an example of what the plugin gives me for a search on "sbc"
> > > which includes synonyms for "sb" and "small block".... I don't really know
> > > enough about this to figure out what exactly it's doing but since all of
> > > the results I am getting first are ones with "small block" in the name, and
> > > the ones with "sbc" in the prodname field which should be first are buried
> > > about 1000 documents in, I know the originalboost and synonymboost aren't
> > > working with all this other stuff. Ideas how to fix this? With the normal
> > > synonym filter we just set up copies of the fields that could have synonyms
> > > to use with that filter applied and had a lower boost on those. Not sure
> > > how to make it work with this custom query parser though.
> > >
> > > +((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
> > > (((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
> > > prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
> > > | productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
> > > prodnumbertext:small prodnumbertext:sbc) prodnumbertext:block)~2)^20.0)~0.1
> > > ()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small |
> > > keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
> > > body:block^0.5 | productinfo:block | keywords:block^2.0 |
> > > prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
> > > body:"small block"~1^5.0 | keywords:"small block"~1^10.0 |
> > > prodname:"small block"~1^50.0)~0.1 (productinfo:"small block"~1 |
> > > body:"small block"~1^5.0 | keywords:"small block"~1^10.0 |
> > > prodname:"small block"~1^50.0)~0.1))^0.5)) ()
> > >
> > >
> > > Mary Jo
> > >
> >
>
