nknize commented on PR #12287:
URL: https://github.com/apache/lucene/pull/12287#issuecomment-1883224420

   > There are existing tests which should fails. 
https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java#L67
 
   > However because we are catching exception and try until we get the correct 
polygon, it never fails.
   
   That try-catch is intentional. The evil, esoteric / invalid test geometry 
generation is also intentional so we can maximize our test coverage across the 
tessellator and Shape indexing.  This is why we have [Tessellator 
logic](https://github.com/apache/lucene/blob/4d916a754be608e7ffb4e5626e4f99b21b8513f1/lucene/core/src/java/org/apache/lucene/geo/Tessellator.java#L1293)
 to filter colinear and coplanar points so we can best effort index shapes that 
may not be well formed or may not have been pre-validated or cleaned. While we 
expect well formed data, that's not the real world. Users (especially geo 
users) often throw dirty data at the API. And to pre-validate geometries is an 
expensive operation. Not all users want to pay that price. This is why we left 
much of that diligence out of Lucene and in the hands of the integrating API. 
For example, see Elasticsearch [`ignore_malformed` 
parameter](https://github.com/elastic/elasticsearch/pull/24654). For the 
explicit shape doc value boundi
 ng box test referenced, we expect a valid polygon bounding box for test 
purposes, but (as just pointed out) that reused polygon generator could create 
a malformed one and an invalid bounding box. Rather than write a new polygon 
generator I opted to reuse the existing one and just retry until a valid 
polygon is created.  What you could do is just write a new polygon generator 
that always creates a valid well-formed polygon. I'd be concerned, though, 
about new contributors always using that to create perfect data, which is not 
real world.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to