[
https://issues.apache.org/jira/browse/LUCENE-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030626#comment-17030626
]
Ignacio Vera commented on LUCENE-9154:
--------------------------------------
In a comment above, it was said that the value in the index is a lat / lon
value. That is not accurate: the value in the index is encoded as two
integers, one per dimension. These integers represent a *range* of lat / lon
values (all values that are encoded to that integer), which can be decoded to a
single value using GeoEncodingUtils. The value it is decoded to is not the
middle of the range (which I would expect to be the logical point to represent
that range) but the lower bound of the range.
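A minimal sketch of the latitude side of this, assuming the GeoEncodingUtils arithmetic of 2^32 buckets over 180 degrees with floor-based encoding. The class and method names (LatitudeQuantization, decodeLower, decodeMiddle) are hypothetical; this is a standalone re-implementation, not Lucene's code. It shows that one integer stands for a half-open range of latitudes and that decoding returns the lower bound of that range rather than its middle:

```java
/**
 * Hypothetical, standalone sketch of latitude quantization. The arithmetic
 * mirrors GeoEncodingUtils (2^32 buckets over 180 degrees), but this is not
 * Lucene's implementation.
 */
final class LatitudeQuantization {
  // Width of one bucket in degrees; 180 / 2^32 is exactly representable as a double.
  static final double LAT_DECODE = 180.0 / (1L << 32);

  // Map a latitude to the bucket that contains it by rounding down.
  static int encode(double latitude) {
    if (latitude == 90.0) {
      // 90.0 would map one past Integer.MAX_VALUE, so fold it into the top bucket.
      latitude = Math.nextDown(latitude);
    }
    return (int) Math.floor(latitude / LAT_DECODE);
  }

  // Decode to the lower bound of the bucket, as described in the comment.
  static double decodeLower(int encoded) {
    return encoded * LAT_DECODE;
  }

  // Hypothetical alternative: decode to the middle of the bucket.
  static double decodeMiddle(int encoded) {
    return (encoded + 0.5) * LAT_DECODE;
  }

  public static void main(String[] args) {
    double latitude = 45.123456789;
    int bucket = encode(latitude);
    // Latitudes in [decodeLower(bucket), decodeLower(bucket + 1)) all encode to `bucket`.
    System.out.println("bucket      = " + bucket);
    System.out.println("lower bound = " + decodeLower(bucket));
    System.out.println("middle      = " + decodeMiddle(bucket));
    System.out.println("upper bound = " + decodeLower(bucket + 1) + " (exclusive)");
  }
}
```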
I now understand that one of the reasons that value was probably chosen is to
make the encodeCeil() logic described below work. If I change the decoded value
to the middle of the range, all of this logic fails, as the implementation
relies on how we decode the values from the index.
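One way to see this dependency, as an illustrative sketch rather than the actual Lucene logic: assume a box's lower latitude bound is encoded with a ceil-based encodeCeil() and its upper bound with the floor-based encode(), then query a degenerate box placed exactly at a point's own decoded value. With lower-bound decoding the point matches; with middle-of-range decoding, encodeCeil() pushes the lower bound into the next bucket and the point is missed. All names here (CeilRoundTrip, boxMatches, decodeMiddle) are hypothetical; only the arithmetic mirrors GeoEncodingUtils.

```java
/**
 * Hypothetical sketch: a box whose lower latitude bound is encoded with a
 * ceil-based encodeCeil() only round-trips a point's decoded value when that
 * value is the lower bound of the point's bucket. Not Lucene's implementation.
 */
final class CeilRoundTrip {
  static final double LAT_DECODE = 180.0 / (1L << 32);

  // Round down: used for indexed points and for the box's upper bound.
  static int encode(double latitude) {
    if (latitude == 90.0) {
      latitude = Math.nextDown(latitude);
    }
    return (int) Math.floor(latitude / LAT_DECODE);
  }

  // Round up: used for the box's lower bound.
  static int encodeCeil(double latitude) {
    if (latitude == 90.0) {
      latitude = Math.nextDown(latitude);
    }
    return (int) Math.ceil(latitude / LAT_DECODE);
  }

  static double decodeLower(int encoded) {
    return encoded * LAT_DECODE;
  }

  static double decodeMiddle(int encoded) {
    return (encoded + 0.5) * LAT_DECODE;
  }

  // A point bucket matches if it lies in [encodeCeil(minLat), encode(maxLat)].
  static boolean boxMatches(int pointBucket, double minLat, double maxLat) {
    return encodeCeil(minLat) <= pointBucket && pointBucket <= encode(maxLat);
  }

  public static void main(String[] args) {
    int point = encode(12.345);
    double lower = decodeLower(point);
    double middle = decodeMiddle(point);
    // Query a degenerate box located exactly at the point's decoded value.
    System.out.println("decode to lower bound: match = " + boxMatches(point, lower, lower));
    // encodeCeil(middle) lands in the next bucket, so the point is missed.
    System.out.println("decode to middle:      match = " + boxMatches(point, middle, middle));
  }
}
```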
> Remove encodeCeil() to encode bounding box queries
> ---------------------------------------------------
>
> Key: LUCENE-9154
> URL: https://issues.apache.org/jira/browse/LUCENE-9154
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We currently have the following logic in LatLonPoint#newBoxQuery():
> {code:java}
> // exact double values of lat=90.0D and lon=180.0D must be treated special as they are not represented in the encoding
> // and should not drag in extra bogus junk! TODO: should encodeCeil just throw ArithmeticException to be less trappy here?
> if (minLatitude == 90.0) {
>   // range cannot match as 90.0 can never exist
>   return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLatitude=90.0");
> }
> if (minLongitude == 180.0) {
>   if (maxLongitude == 180.0) {
>     // range cannot match as 180.0 can never exist
>     return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLongitude=maxLongitude=180.0");
>   } else if (maxLongitude < minLongitude) {
>     // encodeCeil() with dateline wrapping!
>     minLongitude = -180.0;
>   }
> }
> byte[] lower = encodeCeil(minLatitude, minLongitude);
> byte[] upper = encode(maxLatitude, maxLongitude);
> {code}
>
> IMO this is confusing and can lead to strange results. For example, a
> query with {{minLatitude = maxLatitude = 90}} does not match points with
> {{latitude = 90}}. On the other hand, a query with
> {{minLatitude = maxLatitude = 89.99999996}} will match points at latitude = 90.
> I don't really understand the statement {{90.0 can never exist}}, as this is
> equally true for values > 89.99999995809048, which is the maximum
> quantized value. By the same argument, it is true for all values between
> quantized coordinates, as they do not exist in the index either, so why is
> 90D so special? I guess because it cannot be rounded up (ceil) without
> overflowing the encoding.
> Another argument for removing this function is that it opens the door to
> false negatives in the query results. If a query has minLat =
> 89.999999957, it won't match points with latitude = 89.999999957, as the
> bound is rounded up to 89.99999995809048.
> The only merit I can see in the current approach is that if you only index
> points that are already quantized, then all queries would be exact. But does
> it make sense for someone to only index quantized values and then query by
> non-quantized bounding boxes?
>
> I hope I am missing something, but my proposal is to remove encodeCeil
> altogether and remove all the special handling at the positive pole and
> positive dateline.
>
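The latitude examples from the description can be reproduced with a small sketch, assuming points are indexed with the floor-based encode() and that the current query encodes its lower bound with encodeCeil() after the minLatitude == 90.0 short-circuit from the snippet. Longitude and dateline handling are left out, the false-negative case uses a degenerate box (minLatitude = maxLatitude) as an assumption, and BoxLowerBound, currentMatches and proposedMatches are hypothetical helpers that compare integer buckets directly instead of building a real Lucene query:

```java
/**
 * Hypothetical sketch comparing the current lower-bound encoding (encodeCeil)
 * with the proposal to encode both bounds with encode() (floor). Latitude only;
 * a standalone re-implementation of the GeoEncodingUtils arithmetic, not Lucene code.
 */
final class BoxLowerBound {
  static final double LAT_DECODE = 180.0 / (1L << 32);

  // Floor encoding, assumed to be how points are indexed.
  static int encode(double latitude) {
    if (latitude == 90.0) {
      latitude = Math.nextDown(latitude); // fold 90 into the top bucket
    }
    return (int) Math.floor(latitude / LAT_DECODE);
  }

  // Ceil encoding; values above the top bucket saturate to Integer.MAX_VALUE on the int cast.
  static int encodeCeil(double latitude) {
    if (latitude == 90.0) {
      latitude = Math.nextDown(latitude);
    }
    return (int) Math.ceil(latitude / LAT_DECODE);
  }

  // Current behaviour: minLatitude == 90.0 matches nothing, otherwise round the lower bound up.
  static boolean currentMatches(double pointLat, double minLat, double maxLat) {
    if (minLat == 90.0) {
      return false; // corresponds to the MatchNoDocsQuery branch in the snippet
    }
    int p = encode(pointLat);
    return encodeCeil(minLat) <= p && p <= encode(maxLat);
  }

  // Proposal: no encodeCeil() and no special case at the pole.
  static boolean proposedMatches(double pointLat, double minLat, double maxLat) {
    int p = encode(pointLat);
    return encode(minLat) <= p && p <= encode(maxLat);
  }

  public static void main(String[] args) {
    // {point latitude, minLatitude, maxLatitude}
    double[][] cases = {
      {90.0, 90.0, 90.0},
      {90.0, 89.99999996, 89.99999996},
      {89.999999957, 89.999999957, 89.999999957},
    };
    for (double[] c : cases) {
      System.out.printf("point=%s box=[%s, %s] current=%b proposed=%b%n",
          c[0], c[1], c[2],
          currentMatches(c[0], c[1], c[2]), proposedMatches(c[0], c[1], c[2]));
    }
  }
}
```

Under this sketch the current rules give false, true, false for the three cases, while the proposal gives true, true, true.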
--
This message was sent by Atlassian Jira
(v8.3.4#803005)