Inconsistent debugQuery score with multiplicative boost
Hi!

When debugging a query using multiplicative boost based on the product() function, I noticed that the score computed in the explain section is correct while the score in the actual result is wrong.

As an example, here’s a simple query that boosts a field name_text_de (containing German product names). The term “Netzteil” boosts to 200% and “Sony” boosts to 300%. A name that contains both terms would be boosted to 600%. If a term does not match, a default pseudo boost of 1 is used (multiplicative identity).

The params of the responseHeader in the query result are:

    "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
    "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
    "yq":"*:*",

The parsed query of the ymb parameter translates to:

    FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)

For a product that contains both terms, the score in the result and in the explain section correctly yields 6.0:

    "name_text_de":"Original Sony Vaio Netzteil",
    "score":6.0,

    6.0 = product of:
      1.0 = boost
      6.0 = product of:
        1.0 = *:*
        6.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=3.0)

However, for a product with only “Netzteil” in the name, the result score is wrongly 1.0 while the explain score is correctly 2.0:

    "name_text_de":"GS-Netzteil 20W schwarz",
    "score":1.0,

    2.0 = product of:
      1.0 = boost
      2.0 = product of:
        1.0 = *:*
        2.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)

(Note: the filter chain splits words on hyphens, so the “GS-” in front of “Netzteil” should not be an issue.)
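For reference, the scores the query is expected to produce can be worked out by hand. The following is only a minimal sketch of that arithmetic in Python, not Solr code: the helper names (tokenize, expected_score) are made up for illustration, the tokenizer is a crude stand-in for the text_de analysis chain (lowercase, split on whitespace and hyphens), and the product name "USB-Kabel 2m" is a hypothetical non-matching example.

```python
# Sketch of the expected multiplicative boost, assuming each query(..., 1)
# yields its constant score on a match and the default 1.0 otherwise.
# BOOSTS mirrors the ^=2.0 and ^=3.0 constant scores from the ymb parameter.
BOOSTS = {"netzteil": 2.0, "sony": 3.0}


def tokenize(name):
    """Crude stand-in for the analysis chain: lowercase, split on spaces and hyphens."""
    return set(name.lower().replace("-", " ").split())


def expected_score(name, boosts=BOOSTS, default=1.0):
    """Multiply the *:* base score (1.0) by each term's boost, or by the default."""
    tokens = tokenize(name)
    score = 1.0  # score of the main query *:*
    for term, boost in boosts.items():
        score *= boost if term in tokens else default
    return score


if __name__ == "__main__":
    for name in (
        "Original Sony Vaio Netzteil",  # both terms match
        "GS-Netzteil 20W schwarz",      # only "Netzteil" matches
        "USB-Kabel 2m",                 # hypothetical name, no term matches
    ):
        print(name, expected_score(name))
```

Under these assumptions the second product should score 2.0, which matches the explain section rather than the 1.0 reported in the result.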
Here’s the complete filter chain for the text_de field type:

Interestingly, if I simplify the query to only boost on “Netzteil”, the scores in both the result and the explain section are correctly 2.0.

I reproduced this with a local Solr 7.5.0 server (no sharding, no replica) on Mac OS X 10.14.1.

I found mention of a somewhat similar situation with BooleanQuery, which was considered a bug and fixed in 2016: https://issues.apache.org/jira/browse/LUCENE-7132

So my questions are:

1. Is there something wrong in my query that prevents the “Netzteil”-only product from getting a score of 2.0?
2. Shouldn’t the score in the result and the explain section always be the same?

Best regards,
Thomas
Re: Questions for SynonymGraphFilter and WordDelimiterGraphFilter
Hi Wei,

here's a fairly simple field type we currently use in a project that seems to do the job with graph synonyms. Maybe this helps as a starting point for you:

As you can see, we use the same filters for both indexing and querying, so this might have some impact on positional queries, but so far it seems negligible for the short synonyms we use in practice.

Also, there is no need for the FlattenGraphFilter.

The WhitespaceTokenizerFactory ensures that you can define synonyms with hyphens like mac-book -> macbook.

Best regards,
Thomas.

On 05.01.19, 02:11, "Wei" wrote:

> Hello,
>
> We are upgrading to Solr 7.6.0 and noticed that SynonymFilter and
> WordDelimiterFilter have been deprecated. The Solr documentation recommends
> using SynonymGraphFilter and WordDelimiterGraphFilter instead.
>
> I guess the StopFilter messes up the SynonymGraphFilter output? Not sure if
> it's a Solr defect or there is a guideline that StopFilter should not be put
> after graph filters.
>
> Thanks in advance for your input.
>
> Thanks,
> Wei
Re: Inconsistent debugQuery score with multiplicative boost
On 04.01.19, 09:11, "Thomas Aglassinger" wrote:

> When debugging a query using multiplicative boost based on the product()
> function I noticed that the score computed in the explain section is correct
> while the score in the actual result is wrong.

We dug into this further and seem to have found the culprit. The last working version is Solr 7.2.1. Using git bisect, we found that the issue was introduced with LUCENE-8099 (a refactoring). There are two changes that break the scoring in different ways:

LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery, BoostingQuery
LUCENE-8099: Replace BoostQParserPlugin.boostQuery() with FunctionScoreQuery.boostByValue()

Reverting parts of these changes to the previous version, which is based on a deprecated class (that LUCENE-8099 cleaned up), seems to fix the issue.

We created a Solr issue to document our current findings and changes: https://issues.apache.org/jira/browse/SOLR-13126

It contains a patch for our experimental fix (which is currently in a rough state) and a test case that reproduces the issue from Solr 7.3 up to the current master.

A proper fix would of course not revert to deprecated classes but fix whatever went wrong during LUCENE-8099.

Hopefully someone with a deeper understanding of the mechanics behind this can take a look.

Best regards,
Thomas.