[ 
https://issues.apache.org/jira/browse/LUCENE-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578674#comment-17578674
 ] 

Nick Knize edited comment on LUCENE-10654 at 8/11/22 9:08 PM:
--------------------------------------------------------------

Nightly test failure on XY bounding box:


{code:java}
Reproduce with: gradlew :lucene:core:test --tests 
"org.apache.lucene.document.TestShapeDocValues.testXYPolygonBBox" 
-Ptests.jvms=4 -Ptests.haltonfailure=false 
-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=ABDF070B81479950 
-Ptests.multiplier=2 -Ptests.nightly=true -Ptests.badapples=false 
-Ptests.gui=true -Ptests.file.encoding=ISO-8859-1 
-Ptests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-main/test-data/enwiki.random.lines.txt
{code}



{code:java}
1 tests failed.
FAILED:  org.apache.lucene.document.TestShapeDocValues.testXYPolygonBBox

Error Message:
java.lang.AssertionError: expected:<-2.028229934961692E32> but 
was:<-2.026382696309321E32>
{code}


The failure occurs because {{TestUtil.nextPolygon}} produces a polygon with an 
extruded collinear self-intersecting vertex, and {{BaseXYShapeTestCase}} does 
not reject it as an invalid polygon because the test case uses 
{{randomBoolean}} to decide whether to validate strictly. The simple fix is to 
switch the test case to always throw an exception on invalid polygons so we 
never test with a non-compliant polygon. The queries passed because the 
tessellator filters out the dirty vertex. This test failed because the dirty 
vertex happened to be the minimum X value. This does expose an inconsistency: 
an invalid polygon can have a bounding box that is inconsistent with the raw 
geometry. I think that's okay because we have API guardrails to enable or 
disable strict validation, and I don't think those should be removed.
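
To make the inconsistency concrete, here is a minimal, self-contained sketch. It is not Lucene's {{Tessellator}} or {{TestUtil}} code; the class {{SpikeBBoxSketch}} and its helpers are hypothetical names used only for illustration. It builds a square with one collinear spike vertex, then compares the minimum X of the raw vertices with the minimum X after the spike is filtered out, which is roughly the mismatch the failing assertion reports.

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not how Lucene's Tessellator is implemented. */
public class SpikeBBoxSketch {

  /** Minimum X over a list of {x, y} vertices. */
  static double minX(List<double[]> vertices) {
    double min = Double.POSITIVE_INFINITY;
    for (double[] v : vertices) {
      min = Math.min(min, v[0]);
    }
    return min;
  }

  /** True if a, b, c are collinear and the path reverses direction at b (a zero-area spike). */
  static boolean isSpike(double[] a, double[] b, double[] c) {
    double cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0]);
    double dot = (b[0] - a[0]) * (c[0] - b[0]) + (b[1] - a[1]) * (c[1] - b[1]);
    return cross == 0 && dot < 0;
  }

  /** Single pass over an open ring (first vertex not repeated) that drops spike tips. */
  static List<double[]> dropSpikes(List<double[]> ring) {
    int n = ring.size();
    List<double[]> cleaned = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      double[] prev = ring.get((i + n - 1) % n);
      double[] next = ring.get((i + 1) % n);
      if (!isSpike(prev, ring.get(i), next)) {
        cleaned.add(ring.get(i));
      }
    }
    return cleaned;
  }

  /** A 4x4 square whose top edge overshoots to x = -5 and doubles back to x = 0. */
  static List<double[]> exampleRing() {
    List<double[]> ring = new ArrayList<>();
    ring.add(new double[] {0, 0});
    ring.add(new double[] {4, 0});
    ring.add(new double[] {4, 4});
    ring.add(new double[] {-5, 4}); // the "dirty" vertex; it is also the raw minimum X
    ring.add(new double[] {0, 4});
    return ring;
  }

  public static void main(String[] args) {
    List<double[]> raw = exampleRing();
    System.out.println("raw minX     = " + minX(raw));
    System.out.println("cleaned minX = " + minX(dropSpikes(raw)));
  }
}
{code}

With strict validation the ring above would be rejected up front, so the two minimums could never be compared in a test.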

I will open a PR to switch the base test cases over to strict geometry 
validation instead of random validation.




> New companion doc value format for LatLonShape and XYShape field types
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-10654
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10654
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Nick Knize
>            Priority: Major
>             Fix For: 9.4
>
>          Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> {{XYDocValuesField}} provides doc value support for {{XYPoint}}. 
> {{LatLonDocValuesField}} provides doc value support for {{LatLonPoint}}.
> However, neither {{LatLonShape}} nor {{XYShape}} currently has a doc value 
> format. 
> This lack of doc value support for shapes means facets, aggregations, and 
> IndexOrDocValues queries are currently not possible for Shape field types. 
> This gap needs to be closed in Lucene.
> To support IndexOrDocValues queries along with various geometry aggregations 
> and facets, the ability to compute the spatial relation with the doc value is 
> needed. This is straightforward with {{XYPoint}} and {{LatLonPoint}} since 
> the doc value encoding is nothing more than a simple 2D integer encoding of 
> the x,y and lat,lon dimensional components. Accomplishing the same with a 
> naive integer encoded binary representation for N-vertex shapes would be 
> costly. 
> {{ComponentTree}} already provides an efficient in-memory structure for 
> quickly computing spatial relations over Shape types based on a binary tree 
> of tessellated triangles provided by the {{Tessellator}}. Furthermore, this 
> tessellation is already computed at index time. If we create an on-disk 
> representation of {{ComponentTree}}'s binary tree of tessellated triangles 
> and use this as the doc value {{binaryValue}} format, we will be able to 
> efficiently compute spatial relations with this binary representation and 
> achieve the same facet/aggregation result over shapes as we can with points 
> today (e.g., grid facets, centroid, area, etc.).


