iverase commented on PR #1017:
URL: https://github.com/apache/lucene/pull/1017#issuecomment-1194123352

   Hey Nick,
   
   Why so much hurry with this change? I appreciate everything you are trying 
to do to signal the feature as experimental and adding the version on the doc 
value but still it will be nice to have something production ready from the 
beginning and this way of developing this exciting and complex feature is 
making things harder. 
   
   I am very much interested in this change because Elasticsearch has its own 
doc value implementation. I would like to eventually move to Lucene doc values 
so I want to make sure that the functionalities, the Elasticsearch 
implementation currently has, can be preserved when / if it makes sense. In 
order to achieve that we would like to donate our code or help with our 
experience in this feature as we have been running it in production for 2+ 
years with success.
   
   The Elasticsearch implementation is pretty much similar to the one you are 
proposing, it has three parts, first we add all the information regarding the 
extent of the geometry, then we add centroid information and finally we have 
what I call the “triangle tree” which is exactly what you described in this 
issue. Here are the differences and how we would like to help adding them:
   
   Our geometry extent contains more information than just plain min/max values 
of the coordinates that you are proposing. In particular, we capture the 
minimum positive value and the maximum negative value for the x coordinate so 
we can wrap those bounding boxes around the dateline in the geo case. This 
would be my proposal for a Extent object that captures all that information: 
https://github.com/iverase/lucene/blob/TriangleTree/lucene/core/src/java/org/apache/lucene/geo/Extent.java
   In the case of the centroid, the biggest difference is that we are computing 
the centroid using the original geometry. I really like the algorithm you are 
proposing but you are working on the encoded space and I am wondering now if 
that would work for cartesian, remember that the encoding is not linear so in 
that case the centroids might be incorrect.
   I would like to discuss the benefits of the way you are encoding those 
values with a bit more care too.
   Finally the triangle tree is the same idea, it is just the serialisation of 
an interval tree that is composed of tessellation elements. One thing I 
realised while developing this feature is that it is necessary to be able to 
visit the tree in different ways and therefore adding a visitor pattern for the 
tree is a big win. Here is our current implementation which can be used as a 
good starting point: 
https://github.com/iverase/lucene/blob/TriangleTree/lucene/core/src/java/org/apache/lucene/geo/TriangleTreeReader.java
   
   
   Finally, I would like to propose to initially focus on the data structure 
and once we are happy, we can start integrating the functionality, e.g support 
for queries and so on. I hope a more structured approach will make sure we get 
the right structure.
   
   What do you think? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to