Inconsistent debugQuery score with multiplicative boost

2019-01-04 Thread Thomas Aglassinger
Hi!

When debugging a query using multiplicative boost based on the product() 
function I noticed that the score computed in the explain section is correct 
while the score in the actual result is wrong.

As an example, here’s a simple query that boosts on the field name_text_de 
(containing German product names). The term “Netzteil” boosts to 200% and “Sony” 
boosts to 300%. A name that contains both terms is boosted to 600%. If a 
term does not match, a default pseudo boost of 1 is used (the multiplicative 
identity). The params of the responseHeader in the query result are:

"q":"{!boost b=$ymb}(+{!lucene v=$yq})",
"ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
"yq":"*:*",

The parsed query of the ymb parameter translates to:

FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)

For a product that contains both terms, the scores in the result and the 
explain section are both correctly 6.0:

"name_text_de":"Original Sony Vaio Netzteil",
"score":6.0,

6.0 = product of:
  1.0 = boost
  6.0 = product of:
    1.0 = *:*
    6.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=3.0)

However, for a product with only “Netzteil” in the name, the result score 
is wrongly 1.0 while the explain score is correctly 2.0:

"name_text_de":"GS-Netzteil 20W schwarz",
"score":1.0,

2.0 = product of:
  1.0 = boost
  2.0 = product of:
    1.0 = *:*
    2.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)

(Note: the filter chain splits words on hyphens, so the “GS-” in front of 
“Netzteil” should not be an issue.)
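
The scores expected from the boost described above can be checked with a small Python sketch. It only models the arithmetic of product() with a default of 1 and is not how Solr computes scores. The function name expected_boost and the simplified tokenizer (split on whitespace and hyphens, lowercase) are assumptions made for illustration; the real text_de filter chain does more than that.

```python
import re
from math import prod  # requires Python 3.8+

# Boost factors taken from the ymb parameter in the message above.
TERM_BOOSTS = {"netzteil": 2.0, "sony": 3.0}
DEFAULT_BOOST = 1.0  # multiplicative identity, like the ",1" default of query()


def simple_tokens(name):
    """Rough stand-in for the analyzer: split on whitespace and hyphens."""
    return {token.lower() for token in re.split(r"[\s-]+", name) if token}


def expected_boost(name):
    """Multiply the boost of every matching term; non-matching terms count as 1."""
    tokens = simple_tokens(name)
    return prod(
        boost if term in tokens else DEFAULT_BOOST
        for term, boost in TERM_BOOSTS.items()
    )


if __name__ == "__main__":
    for name in ("Original Sony Vaio Netzteil", "GS-Netzteil 20W schwarz"):
        print(name, expected_boost(name))
```

Under these assumptions the first name yields 6.0 and the second 2.0, which matches the explain output rather than the result score of 1.0.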

Here’s the complete filter chain for the text_de field type:














Interestingly, if I simplify the query to only boost on “Netzteil”, the scores in 
both the result and the explain section are correctly 2.0.

I reproduced this with a local Solr 7.5.0 server (no sharding, no replica) on 
Mac OS X 10.14.1.
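
To spot affected documents without reading the explain output by hand, something like the following Python sketch could be used. It builds the request URL for the query above and compares each document's score with the first number of its explain text. The core name "products", the unique key field "id", and the helper names are hypothetical. The sketch also assumes the JSON response with debugQuery=true carries the explain text per document ID under debug.explain as plain strings.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, core):
    """Build the select URL for the boost query with debugging enabled."""
    params = {
        "q": "{!boost b=$ymb}(+{!lucene v=$yq})",
        "ymb": 'product(query({!v="name_text_de\\:Netzteil\\^=2.0"},1),'
               'query({!v="name_text_de\\:Sony\\^=3.0"},1))',
        "yq": "*:*",
        "fl": "id,name_text_de,score",
        "debugQuery": "true",
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


def explain_score(explain_text):
    """Return the leading number of an explain text such as '2.0 = product of:'."""
    first_line = explain_text.strip().splitlines()[0]
    return float(first_line.split(" = ", 1)[0])


def find_score_mismatches(response, id_field="id", tolerance=1e-6):
    """List (id, result score, explain score) where the two scores differ."""
    explains = response["debug"]["explain"]
    mismatches = []
    for doc in response["response"]["docs"]:
        explained = explain_score(explains[doc[id_field]])
        if abs(doc["score"] - explained) > tolerance:
            mismatches.append((doc[id_field], doc["score"], explained))
    return mismatches


if __name__ == "__main__":
    # Hypothetical local server and core name.
    print(build_debug_url("http://localhost:8983/solr", "products"))
```

The parsed JSON of a response fetched from that URL can be passed to find_score_mismatches(); an empty list means the result and explain scores agree for every returned document.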

I found mention of a somewhat similar situation with BooleanQuery, which was 
considered a bug and fixed in 2016: 
https://issues.apache.org/jira/browse/LUCENE-7132

So my questions are:

1. Is there something wrong in my query that prevents the “Netzteil”-only 
product from getting a score of 2.0?
2. Shouldn’t the score in the result and the explain section always be the same?

Best regards,
Thomas


Re: Questions for SynonymGraphFilter and WordDelimiterGraphFilter

2019-01-07 Thread Thomas Aglassinger
Hi Wei,

here's a fairly simple field type we currently use in a project that seems to 
do the job with graph synonyms. Maybe this helps as a starting point for you:














As you can see, we use the same filters for both indexing and querying. This 
might have some impact on positional queries, but so far it seems negligible for 
the short synonyms we use in practice. Also, there is no need for the 
FlattenGraphFilter.

The WhitespaceTokenizerFactory ensures that you can define synonyms with 
hyphens like mac-book -> macbook.
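
Why the tokenizer matters for hyphenated synonyms can be shown with a toy Python model. This is not Lucene code: the two tokenizer functions and the single replacement rule are simplified stand-ins chosen for illustration, and the real SynonymGraphFilter behaves differently in detail.

```python
import re

# Hypothetical single-token synonym rule, as in "mac-book -> macbook".
SYNONYMS = {"mac-book": "macbook"}


def whitespace_tokenize(text):
    """Split on whitespace only, so hyphenated words stay in one token."""
    return text.split()


def punctuation_tokenize(text):
    """Split on anything that is not a letter or digit, including hyphens."""
    return [token for token in re.split(r"[^0-9A-Za-z]+", text) if token]


def apply_synonyms(tokens):
    """Lowercase each token and replace it if a synonym rule matches."""
    return [SYNONYMS.get(token.lower(), token.lower()) for token in tokens]


if __name__ == "__main__":
    print(apply_synonyms(whitespace_tokenize("Mac-Book Pro")))
    print(apply_synonyms(punctuation_tokenize("Mac-Book Pro")))
```

With whitespace splitting, the rule sees the token "mac-book" and can map it to "macbook". Once the hyphen has been split away, only "mac" and "book" remain and the rule no longer matches.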

Best regards, Thomas.


On 05.01.19, 02:11, "Wei"  wrote:

Hello,

We are upgrading to Solr 7.6.0 and noticed that SynonymFilter and
WordDelimiterFilter have been deprecated. The Solr doc recommends using
SynonymGraphFilter and WordDelimiterGraphFilter instead 
I guess the StopFilter messes up the SynonymGraphFilter output? Not sure
if it's a Solr defect or whether there is a guideline that StopFilter should
not be put after graph filters.

Thanks in advance for your input.


Thanks,

Wei




Re: Inconsistent debugQuery score with multiplicative boost

2019-01-16 Thread Thomas Aglassinger
On 04.01.19, 09:11, "Thomas Aglassinger"  wrote:

>  When debugging a query using multiplicative boost based on the product() 
> function I noticed that the score computed in the explain section is correct 
> while the score in the actual result is wrong.

We dug into this further and seem to have found the culprit.

The last working version is Solr 7.2.1. Using git bisect, we found that the 
issue was introduced with LUCENE-8099 (a refactoring). There are two changes that 
break the scoring in different ways:

LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery, BoostingQuery
LUCENE-8099: Replace BoostQParserPlugin.boostQuery() with 
FunctionScoreQuery.boostByValue()

Reverting parts of these changes to the previous version based on a deprecated 
class (which LUCENE-8099 cleaned up) seems to fix the issue.

We created a Solr issue to document our current findings and changes: 
https://issues.apache.org/jira/browse/SOLR-13126

It contains a patch for our experimental fix (which currently is in a rough 
state) and a test case that can reproduce the issue starting with Solr 7.3 up 
to the current master.

A proper fix, of course, would not revert to deprecated classes again but fix 
whatever went wrong during LUCENE-8099.

Hopefully someone with a deeper understanding of the mechanics behind this can 
take a look.

Best regards, Thomas.