[jira] [Comment Edited] (SOLR-13829) RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing mismatch

Dennis Gove (Jira) Sat, 12 Oct 2019 14:19:07 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950126#comment-16950126
 ]


Dennis Gove edited comment on SOLR-13829 at 10/12/19 9:18 PM:
--------------------------------------------------------------

{quote}The other way to solve this would be to modify every streaming 
expression (like the sort evaluator) to explicitly do type checking and 
conversions anytime incompatible types are being compared
{quote}
Alternatively that modification could be in the 
[FieldComparator|https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/comp/FieldComparator.java]
 class to keep it centralized and consistent.

The purpose of this conversion to {{long}} was to ensure that values that could 
be represented as scaler values would be. When Evaluators were added there was 
a need to choose a singular datatype to represent all numeric values within 
Evaluators, so that comparison and conversion could be done without needing to 
repeatedly check the value datatypes. The use of BigDecimal was deliberate in 
that it is able to accurately represent all existing datatypes without the 
possibly of loss of precision (this is specifically the reason why 
{{double/Double}} were not used). While {{Evaluators}} are turning longs and 
doubles into {{BigDecimals}} on the way in there was a need to do the inverse 
on the way out - that is the impetus for that conversion code to exist.

I think it's reasonable to expect {{SortStream}} to be able to sort on a field 
containing a mixture of numeric types. It should be able to handle that. What 
this patch does is instead insist that every value be the same type (double).

That said, yes, there was a bug in how the choice was made over which type to 
convert the value to. I worry that this fix potentially adds latent numeric 
precision errors.


was (Author: dpgove):
{quote}The other way to solve this would be to modify every streaming 
expression (like the sort evaluator) to explicitly do type checking and 
conversions anytime incompatible types are being compared
{quote}
Alternatively that modification could be in the 
[FieldComparator|https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/comp/FieldComparator.java]
 class to keep it centralized and consistent.

The purpose of this conversion to {{long}} was to ensure that values that could 
be represented as scaler values would be. When Evaluators were added there was 
a need to choose a singular datatype to represent all numeric values within 
Evaluators, so that comparison and conversion could be done without needing to 
repeatedly check the value datatypes. The use of BigDecimal was deliberate in 
that it is able to accurately represent all existing datatypes without the 
possibly of loss of precision (this is specifically the reason why 
{{double/Double}} were not used. While {{Evaluators}} are turning longs and 
doubles into {{BigDecimals}} on the way in there was a need to do the inverse 
on the way out - that is the impetus for that conversion code to exist.

I think it's reasonable to expect {{SortStream}} to be able to sort on a field 
containing a mixture of numeric types. It should be able to handle that. What 
this patch does is instead insist that every value be the same type (double). 

That said, yes, there was a bug in how the choice was made over which type to 
convert the value to. I worry that this fix potentially adds latent numeric 
precision errors.

> RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing 
> mismatch
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-13829
>                 URL: https://issues.apache.org/jira/browse/SOLR-13829
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Trey Grainger
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SOLR-13829.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In trying to use the "sort" streaming evaluator on float field (pfloat), I am 
> getting casting errors back based upon which values are calculated based upon 
> underlying values in a field.
> Example:
> *Docs:* (paste each into "Documents" pane in Solr Admin UI as type:"json")
>  
> {code:java}
> {"id": "1", "name":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]}
> {"id": "2", "name":"cheese 
> pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]}{code}
>  
> *Streaming Expression:*
>  
> {code:java}
> sort(select(search(food_collection, q="*:*", fl="id,vector_fs", sort="id 
> asc"), cosineSimilarity(vector_fs, array(5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as 
> sim, id), by="sim desc"){code}
>  
> *Response:*
>  
> {code:java}
> { 
>   "result-set": {
>     "docs": [
>       {
>         "EXCEPTION": "class java.lang.Double cannot be cast to class 
> java.lang.Long (java.lang.Double and java.lang.Long are in module java.base 
> of loader 'bootstrap')",
>         "EOF": true,
>         "RESPONSE_TIME": 13
>       }
>     ]
>   }
> }{code}
>  
>  
> This is because in org.apache.solr.client.solrj.io.eval.RecursiveEvaluator, 
> there is a line which examines a numeric (BigDecimal) value and - regardless 
> of the type of the field the value originated from - converts it to a Long if 
> it looks like a whole number. This is the code in question from that class:
> {code:java}
> protected Object normalizeOutputType(Object value) {
>     if(null == value){
>       return null;
>     } else if (value instanceof VectorFunction) {
>       return value;
>     } else if(value instanceof BigDecimal){
>       BigDecimal bd = (BigDecimal)value;
>       if(bd.signum() == 0 || bd.scale() <= 0 || 
> bd.stripTrailingZeros().scale() <= 0){
>         try{
>           return bd.longValueExact();
>         }
>         catch(ArithmeticException e){
>           // value was too big for a long, so use a double which can handle 
> scientific notation
>         }
>       }
>       
>       return bd.doubleValue();
>     }
> ... [other type conversions]
> {code}
> Because of the *return bd.longValueExact()*; line, the calculated value for 
> "sim" in doc 1 is "Float(1)", whereas the calculated value for "sim" for doc 
> 2 is "Double(0.88938313). These are coming back as incompatible data types, 
> even though the source data is all of the same type and should be comparable.
> Thus when the *sort* evaluator streaming expression (and probably others) 
> runs on these calculated values and the list should contain ["0.88938313", 
> "1.0"], an exception is thrown because the it's trying to compare 
> incompatible data types [Double("0.99"), Long(1)].
> This bug is occurring on master currently, but has probably existed in the 
> codebase since at least August 2017.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (SOLR-13829) RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing mismatch

Reply via email to