Ignacio Vera created LUCENE-9154:
------------------------------------

             Summary: Remove encodeCeil() to encode bounding box queries
                 Key: LUCENE-9154
                 URL: https://issues.apache.org/jira/browse/LUCENE-9154
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Ignacio Vera


We currently have the following logic in LatLonPoint#newBoxQuery():
{code:java}
// exact double values of lat=90.0D and lon=180.0D must be treated special as they are not represented in the encoding
// and should not drag in extra bogus junk! TODO: should encodeCeil just throw ArithmeticException to be less trappy here?
if (minLatitude == 90.0) {
  // range cannot match as 90.0 can never exist
  return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLatitude=90.0");
}
if (minLongitude == 180.0) {
  if (maxLongitude == 180.0) {
    // range cannot match as 180.0 can never exist
    return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLongitude=maxLongitude=180.0");
  } else if (maxLongitude < minLongitude) {
    // encodeCeil() with dateline wrapping!
    minLongitude = -180.0;
  }
}
byte[] lower = encodeCeil(minLatitude, minLongitude);
byte[] upper = encode(maxLatitude, maxLongitude);
{code}
 

IMO this is confusing and can lead to strange results. For example, a query
with {{minLatitude = maxLatitude = 90}} does not match points with
{{latitude = 90}}. On the other hand, a query with
{{minLatitude = maxLatitude = 179.999999}} will match points at
{{latitude = 90}}.

I don't really understand the statement {{90.0 can never exist}}, as the same
is true for values > 179.9999846611172, which is the maximum quantized value.
By that argument, it is also true for every value that falls between quantized
coordinates, since none of them exist in the index. Why is 90D so special?
I guess because it cannot be rounded up without overflowing the encoding.
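
The overflow guess above can be checked with a small sketch. This is a simplified stand-in for a 32-bit latitude quantizer, not Lucene's actual code: the class name, the helper names and the step constant are all assumptions made for illustration.

```java
// Simplified stand-in for a 32-bit latitude quantizer. All names and constants
// here are assumptions for illustration, not Lucene's actual implementation.
final class LatQuantizeSketch {
  // Width of one cell in degrees: 180 degrees spread over 2^32 cells.
  static final double LAT_STEP = 180.0 / (1L << 32);

  // Round down to the cell that contains the value.
  static long floorCell(double lat) {
    return (long) Math.floor(lat / LAT_STEP);
  }

  // Round up to the first cell boundary at or above the value.
  static long ceilCell(double lat) {
    return (long) Math.ceil(lat / LAT_STEP);
  }

  // True if the cell number fits in a signed 32-bit int.
  static boolean fitsInInt(long cell) {
    return cell >= Integer.MIN_VALUE && cell <= Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    double justBelowPole = Math.nextDown(90.0);
    System.out.println("floor(-90.0) fits: " + fitsInInt(floorCell(-90.0)));
    System.out.println("floor(90.0) fits: " + fitsInInt(floorCell(90.0)));
    System.out.println("floor(nextDown(90.0)) fits: " + fitsInInt(floorCell(justBelowPole)));
    System.out.println("ceil(nextDown(90.0)) fits: " + fitsInInt(ceilCell(justBelowPole)));
  }
}
```

Under these assumptions, 90.0 itself lands on cell 2^31 and does not fit in an int, whichever way it is rounded. Nudging it down by one ulp makes the floor fit in the last cell, but the ceil of that same value still overflows. That is consistent with the idea that only the round-up path needs a special case at the positive pole.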

Another argument for removing this function is that it opens the door to
false negatives in the query results. If a query has minLon =
179.9999846611171, it won't match points with longitude = 179.9999846611171,
because the bound is rounded up to 179.9999846611172.

The only merit I can see in the current approach is that if you only index
points that are already quantized, then all queries are exact. But does it
make sense for someone to index only quantized values and then query by
non-quantized bounding boxes?
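
The last two paragraphs can be illustrated with a minimal sketch. It assumes points are indexed by rounding down to a cell, and it compares a lower bound that is rounded up with one that is rounded down. The class, the method names and the step constant are hypothetical, and this is not Lucene's code.

```java
// Hypothetical one-dimensional model of a box query on quantized longitudes.
// Names and constants are assumptions for illustration only.
final class BoxLowerBoundSketch {
  // Width of one cell in degrees: 360 degrees spread over 2^32 cells.
  static final double LON_STEP = 360.0 / (1L << 32);

  static long floorCell(double lon) {
    return (long) Math.floor(lon / LON_STEP);
  }

  static long ceilCell(double lon) {
    return (long) Math.ceil(lon / LON_STEP);
  }

  // Lower bound rounded up, upper bound rounded down; the point is indexed with floor.
  static boolean matchesCeilLower(double point, double min, double max) {
    long cell = floorCell(point);
    return ceilCell(min) <= cell && cell <= floorCell(max);
  }

  // Both bounds rounded down, the same way the point is indexed.
  static boolean matchesFloorLower(double point, double min, double max) {
    long cell = floorCell(point);
    return floorCell(min) <= cell && cell <= floorCell(max);
  }

  public static void main(String[] args) {
    double offGrid = 12345.5 * LON_STEP; // half a step past a cell boundary
    double onGrid = 12345 * LON_STEP;    // exactly on a cell boundary
    System.out.println("off-grid, ceil lower: " + matchesCeilLower(offGrid, offGrid, offGrid));
    System.out.println("off-grid, floor lower: " + matchesFloorLower(offGrid, offGrid, offGrid));
    System.out.println("on-grid, ceil lower: " + matchesCeilLower(onGrid, onGrid, onGrid));
    System.out.println("on-grid, floor lower: " + matchesFloorLower(onGrid, onGrid, onGrid));
  }
}
```

In this model, a degenerate box around an off-grid point misses that point when the lower bound is rounded up and finds it when the bound is rounded down. An on-grid point is found either way, which matches the observation that rounding up is only exact when the indexed values are already quantized.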

 

I hope I am missing something, but my proposal is to remove encodeCeil
altogether, along with all the special handling at the positive pole and
positive dateline.

 


