iverase commented on PR #1017: URL: https://github.com/apache/lucene/pull/1017#issuecomment-1194123352
Hey Nick, Why so much hurry with this change? I appreciate everything you are trying to do to signal the feature as experimental and adding the version on the doc value but still it will be nice to have something production ready from the beginning and this way of developing this exciting and complex feature is making things harder. I am very much interested in this change because Elasticsearch has its own doc value implementation. I would like to eventually move to Lucene doc values so I want to make sure that the functionalities, the Elasticsearch implementation currently has, can be preserved when / if it makes sense. In order to achieve that we would like to donate our code or help with our experience in this feature as we have been running it in production for 2+ years with success. The Elasticsearch implementation is pretty much similar to the one you are proposing, it has three parts, first we add all the information regarding the extent of the geometry, then we add centroid information and finally we have what I call the “triangle tree” which is exactly what you described in this issue. Here are the differences and how we would like to help adding them: Our geometry extent contains more information than just plain min/max values of the coordinates that you are proposing. In particular, we capture the minimum positive value and the maximum negative value for the x coordinate so we can wrap those bounding boxes around the dateline in the geo case. This would be my proposal for a Extent object that captures all that information: https://github.com/iverase/lucene/blob/TriangleTree/lucene/core/src/java/org/apache/lucene/geo/Extent.java In the case of the centroid, the biggest difference is that we are computing the centroid using the original geometry. I really like the algorithm you are proposing but you are working on the encoded space and I am wondering now if that would work for cartesian, remember that the encoding is not linear so in that case the centroids might be incorrect. I would like to discuss the benefits of the way you are encoding those values with a bit more care too. Finally the triangle tree is the same idea, it is just the serialisation of an interval tree that is composed of tessellation elements. One thing I realised while developing this feature is that it is necessary to be able to visit the tree in different ways and therefore adding a visitor pattern for the tree is a big win. Here is our current implementation which can be used as a good starting point: https://github.com/iverase/lucene/blob/TriangleTree/lucene/core/src/java/org/apache/lucene/geo/TriangleTreeReader.java Finally, I would like to propose to initially focus on the data structure and once we are happy, we can start integrating the functionality, e.g support for queries and so on. I hope a more structured approach will make sure we get the right structure. What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org