nknize commented on PR #1017: URL: https://github.com/apache/lucene/pull/1017#issuecomment-1194649110
Thanks @iverase. > Why so much hurry with this change? ...it will be nice to have something production ready. Just a few points here. 1. I don't believe the proposed PR is a change. It's a new field that hasn't existed before despite Elasticsearch carrying it's own proprietary implementation for two years and only now proposing it as the preferred approach. I prefer a fresh Lucene focused implementation with input from Elasticsearch perspective. (e.g., geometry centroid and bounding box were added to help Elasticsearch `geo_centroid` and `geo_boundingbox` aggregations but really not needed for the docvalue format). 2. This PR is a move of progress over perfection. I do feel it's important to do the best we can on the first iteration but there will always be needed improvements. We have mechanisms in place to enable us to unleash experimental features for the purpose of receiving feedback from production. The PR is using those mechanisms as intended. No need to iterate to what _we think_ is "production ready". 3. Akin to 2 I'd prefer to avoid waiting another two months+ before the next minor release to get this in the wild and start obtaining that feedback and iterating. Those iterations will improve with more feedback. If we do feel this is cutting it close for 9.3 then I'd prefer merging this PR to main and 9.4 and iterate on this code for 9.4. 4. I am happy to hear Elasticsearch wanting to contribute their implementation but I think it's better to start with a foundation and iterate. We can merge any desirable properties from that Elasticsearch field in follow ups. I think that will strengthen the field as a whole and agree it's a great decision but do not believe it is a requirement before merging the current functioning PR. Again, progress over perfection. > this way of developing this exciting and complex feature is making things harder. Harder for who? > I would like to propose to initially focus on the data structure and once we are happy, we can start integrating the functionality, e.g support for queries and so on. Query support is already in this PR so I'm not sure what you mean by "start integrating the functionality". Regarding the desire to add a visitor access pattern that's a nice to have but not a requirement for this PR. I wrote an initial rough implementation (because I also thought it would be nice to mimic the query visitor pattern) but it's an improvement that is easier to add in a follow-up since (as this PR shows) it's not a requirement for supporting the queries in the first iteration. I agree with the bounding box improvements (which, again, could come later) and will add the centroid fix, but that is unrelated to the relation and query logic in this PR which already has parity with the BKD index queries and uses the same test scaffolding. > Here is our current implementation which can be used as a good starting point. The current PR already has a starting point so I'm not sure why the proposal to scrap and start from the Elasticsearch proprietary implementation (which could've been proposed two years ago if parity is the concern). I took a quick look at the proposed code and have some differing of opinions on the implementation: * I didn't see any explicit relation visitors; so it doesn't look like the prototype code includes bounding box relation logic or tests against any BKD index queries to ensure functional parity. * I didn't look at the details of the serialized format. That shouldn't matter so long as the API is the same and results are correct. This PR now includes a `VERSION` byte to provide a mechanism to change the format and ensure backwards compatibility. This reduces Elasticsearch risk. * The PR results here are matching the BKD index queries so I'm confident the PR query results are correct. I'll add (or someone can add) the visitor access pattern in a follow up. Again, it's not a requirement for Lucene (even if it is one for Elasticsearch). * This PR isolates all ShapeDocValues logic (e.g., Readers and Writers) as private logic to the single pkg private abstract `ShapeDocValues` and only exposes field and query instantiation through the public `LatLonShape` and `XYShape` access classes. I prefer this approach (which is consistent w/ the BKD field API) over the prototype that conversely has that split into disparate separate Reader / Writer classes with public abstractions. I think we should keep the abstraction and internal API surface area tidy and be thoughtful about what's exposed (e.g., only pkg-private `ShapeDocValues` and `ShapeDocValueField`. Visitor member class and BaseQuery foundation classes for internal extensions, and `LatLonShape` and `XYShape` static factories for public access). I'll add the centroid fixes and bounding box additions to this PR shortly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org