[GitHub] [lucene] nknize commented on pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape

GitBox Mon, 25 Jul 2022 14:25:52 -0700


nknize commented on PR #1017:
URL: https://github.com/apache/lucene/pull/1017#issuecomment-1194649110


   Thanks @iverase.
   
   > Why so much hurry with this change? ...it will be nice to have something 
production ready.
   
   Just a few points here. 
   1. I don't believe the proposed PR is a change. It's a new field that hasn't 
existed before despite Elasticsearch carrying it's own proprietary 
implementation for two years and only now proposing it as the preferred 
approach. I prefer a fresh Lucene focused implementation with input from 
Elasticsearch perspective. (e.g., geometry centroid and bounding box were added 
to help Elasticsearch `geo_centroid` and `geo_boundingbox` aggregations but 
really not needed for the docvalue format).
   2. This PR is a move of progress over perfection. I do feel it's important 
to do the best we can on the first iteration but there will always be needed 
improvements. We have mechanisms in place to enable us to unleash experimental 
features for the purpose of receiving feedback from production. The PR is using 
those mechanisms as intended. No need to iterate to what _we think_ is 
"production ready".
   3. Akin to 2 I'd prefer to avoid waiting another two months+ before the next 
minor release to get this in the wild and start obtaining that feedback and 
iterating. Those iterations will improve with more feedback. If we do feel this 
is cutting it close for 9.3 then I'd prefer merging this PR to main and 9.4 and 
iterate on this code for 9.4. 
   4. I am happy to hear Elasticsearch wanting to contribute their 
implementation but I think it's better to start with a foundation and iterate. 
We can merge any desirable properties from that Elasticsearch field in follow 
ups. I think that will strengthen the field as a whole and agree it's a great 
decision but do not believe it is a requirement before merging the current 
functioning PR. Again, progress over perfection.
   
   > this way of developing this exciting and complex feature is making things 
harder.
   
   Harder for who?
   
   > I would like to propose to initially focus on the data structure and once 
we are happy, we can start integrating the functionality, e.g support for 
queries and so on. 
   
   Query support is already in this PR so I'm not sure what you mean by "start 
integrating the functionality". Regarding the desire to add a visitor access 
pattern that's a nice to have but not a requirement for this PR. I wrote an 
initial rough implementation (because I also thought it would be nice to mimic 
the query visitor pattern) but it's an improvement that is easier to add in a 
follow-up since (as this PR shows) it's not a requirement for supporting the 
queries in the first iteration. I agree with the bounding box improvements 
(which, again, could come later) and will add the centroid fix, but that is 
unrelated to the relation and query logic in this PR which already has parity 
with the BKD index queries and uses the same test scaffolding. 
   
   > Here is our current implementation which can be used as a good starting 
point. 
   
   The current PR already has a starting point so I'm not sure why the proposal 
to scrap and start from the Elasticsearch proprietary implementation (which 
could've been proposed two years ago if parity is the concern). I took a quick 
look at the proposed code and have some differing of opinions on the 
implementation:
   
   
   * I didn't see any explicit relation visitors; so it doesn't look like the 
prototype code includes bounding box relation logic or tests against any BKD 
index queries to ensure functional parity.
   * I didn't look at the details of the serialized format. That shouldn't 
matter so long as the API is the same and results are correct. This PR now 
includes a `VERSION` byte to provide a mechanism to change the format and 
ensure backwards compatibility. This reduces Elasticsearch risk.
   * The PR results here are matching the BKD index queries so I'm confident 
the PR query results are correct. I'll add (or someone can add) the visitor 
access pattern in a follow up. Again, it's not a requirement for Lucene (even 
if it is one for Elasticsearch).
   * This PR isolates all ShapeDocValues logic (e.g., Readers and Writers) as 
private logic to the single pkg private abstract `ShapeDocValues` and only 
exposes field and query instantiation through the public `LatLonShape` and 
`XYShape` access classes. I prefer this approach (which is consistent w/ the 
BKD field API) over the prototype that conversely has that split into disparate 
separate Reader / Writer classes with public abstractions. I think we should 
keep the abstraction and internal API surface area tidy and be thoughtful about 
what's exposed (e.g., only pkg-private `ShapeDocValues` and 
`ShapeDocValueField`. Visitor member class and BaseQuery foundation classes for 
internal extensions, and `LatLonShape` and `XYShape` static factories for 
public access).
   
   I'll add the centroid fixes and bounding box additions to this PR shortly. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] nknize commented on pull request #1017: LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape

Reply via email to