Peter Davie created SOLR-13838:
----------------------------------
Summary: igain query parser generating invalid output
Key: SOLR-13838
URL: https://issues.apache.org/jira/browse/SOLR-13838
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: query parsers
Affects Versions: 8.2
Environment: The issue is a generic Java defect and therefore will be
independent of the operating system or software platform.
Reporter: Peter Davie
Fix For: 8.3
Attachments: IGainTermsQParserPlugin.java.patch
While investigating the output of the "features()" stream source, I found that
terms are being returned with NaN in the score_f field:
{code:json}
{ "docs": [
    {
      "featureSet_s": "business",
      "score_f": "NaN",
      "term_s": "1,011.15",
      "idf_d": "-Infinity",
      "index_i": 1,
      "id": "business_1"
    },
    {
      "featureSet_s": "business",
      "score_f": "NaN",
      "term_s": "10.3m",
      "idf_d": "-Infinity",
      "index_i": 2,
      "id": "business_2"
    },
    {
      "featureSet_s": "business",
      "score_f": "NaN",
      "term_s": "01",
      "idf_d": "-Infinity",
      "index_i": 3,
      "id": "business_3"
    },...
{code}
Looking into {{org/apache/solr/search/IGainTermsQParserPlugin.java}}, it appears
that when a term does not occur in any of the positive or negative documents,
docFreq (computed as docFreq = xc + nc) is 0. The subsequent calculations
therefore divide by 0 and produce NaN.
The attached patch skips terms whose docFreq is 0 in the finish() method of
IGainTermsQParserPlugin. This resolves the NaN scores in the features() output.
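The failure mode and the fix can be illustrated with a standalone Java sketch. It computes a textbook binary information-gain score from the counts named in the report (xc, nc, docFreq). The class IGainSketch, its method names and the exact formula are assumptions made for illustration; this is not the code in IGainTermsQParserPlugin. In the spirit of the patch, a term with docFreq == 0 is skipped (here by returning an empty OptionalDouble) instead of being scored.

{code:java}
import java.util.OptionalDouble;

// Hypothetical illustration only; not the actual IGainTermsQParserPlugin code.
class IGainSketch {

    // Binary entropy H(p) using the natural log; defined as 0 at p = 0 and p = 1.
    // A NaN input falls through both checks and yields NaN.
    static double binaryEntropy(double p) {
        if (p == 0.0 || p == 1.0) {
            return 0.0;
        }
        return -p * Math.log(p) - (1 - p) * Math.log(1 - p);
    }

    // Information gain of a term for a positive/negative split.
    // xc: positive docs containing the term; nc: negative docs containing it;
    // numDocs: all labelled (positive + negative) docs.
    // Returns empty when docFreq == 0, i.e. the term is skipped.
    static OptionalDouble informationGain(int xc, int nc, int numPositiveDocs, int numDocs) {
        int docFreq = xc + nc;
        if (docFreq == 0) {
            // Without this guard, (double) xc / docFreq is 0.0 / 0 = NaN,
            // and the NaN propagates into the final score.
            return OptionalDouble.empty();
        }
        double pTerm = (double) docFreq / numDocs;
        double hClass = binaryEntropy((double) numPositiveDocs / numDocs);
        double hWithTerm = binaryEntropy((double) xc / docFreq);
        int docsWithoutTerm = numDocs - docFreq;
        double hWithoutTerm = docsWithoutTerm == 0
                ? 0.0
                : binaryEntropy((double) (numPositiveDocs - xc) / docsWithoutTerm);
        return OptionalDouble.of(hClass - (pTerm * hWithTerm + (1 - pTerm) * hWithoutTerm));
    }

    public static void main(String[] args) {
        // Hypothetical counts: 10 labelled docs, 5 of them positive.
        System.out.println(informationGain(5, 0, 5, 10)); // term perfectly separates the classes
        System.out.println(informationGain(0, 0, 5, 10)); // term absent from all labelled docs: skipped
        // Unguarded path: 0.0 / 0 is NaN, and binaryEntropy passes it through.
        System.out.println(binaryEntropy((double) 0 / 0));
    }
}
{code}

Running main prints a score of ln 2 for the perfectly separating term and an empty result for the absent term. It also shows that the unguarded division yields NaN. Returning an empty value, rather than substituting a score of 0, keeps such terms out of any top-N ranking entirely, which matches the "skip" behaviour the patch describes.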
--
This message was sent by Atlassian Jira
(v8.3.4#803005)