[ https://issues.apache.org/jira/browse/LUCENE-10288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477036#comment-17477036 ]
Ignacio Vera commented on LUCENE-10288:
---------------------------------------

Yes, I think we can compute it cheaply by counting how many points we have in that leaf node. Depending on that number, it should be easy to determine whether the tree is balanced or not.

> Are 1-dimensional kd trees in pre-86 indices always unbalanced trees?
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-10288
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10288
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Ignacio Vera
>            Priority: Blocker
>             Fix For: 9.1
>
>
> I am looking into a test error that can be reproduced with the following
> command in branch 9x:
> {code}
> ./gradlew :lucene:backward-codecs:test --tests "org.apache.lucene.backward_codecs.lucene60.TestLucene60PointsFormat.testOneDimTwoValues" -Dtests.seed=A70882387D2AAFC2 -Dtests.multiplier=3
> {code}
> The actual error looks like:
> {code:java}
> org.apache.lucene.backward_codecs.lucene60.TestLucene60PointsFormat > test suite's output saved to /Users/ivera/projects/lucene_prod/lucene/backward-codecs/build/test-results/test/outputs/OUTPUT-org.apache.lucene.backward_codecs.lucene60.TestLucene60PointsFormat.txt, copied below:
>    >     java.lang.AssertionError: expected:<1137> but was:<1138>
>    >         at __randomizedtesting.SeedInfo.seed([A70882387D2AAFC2:1B737C7FDE6454F3]:0)
>    >         at org.junit.Assert.fail(Assert.java:89)
>    >         at org.junit.Assert.failNotEquals(Assert.java:835)
>    >         at org.junit.Assert.assertEquals(Assert.java:647)
>    >         at org.junit.Assert.assertEquals(Assert.java:633)
> {code}
> For indices created with this codec we assume that in the 1D case the kd-trees
> are unbalanced, while in the ND case we assume that they are always fully
> balanced. This is true in the generic case, but this failure shows that it
> might not always be the case.
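
As a rough illustration of the leaf-counting idea from the comment, the sketch below works on a toy model and not on Lucene's actual reader. It assumes that an unbalanced 1D tree fills every leaf except the last one to `maxPointsInLeaf`, and that a fully balanced tree has a power-of-two number of leaves. The class, method, and parameter names are all hypothetical.

```java
/** Toy model only: none of these names exist in Lucene. */
final class LeafCountHeuristic {

  /**
   * Guesses whether a tree looks fully balanced from the size of its last leaf.
   * Assumption: an unbalanced tree fills every leaf but the last to maxPointsInLeaf.
   */
  static boolean looksBalanced(
      long pointCount, int numLeaves, int maxPointsInLeaf, int lastLeafCount) {
    // Assumption: a fully balanced tree has a power-of-two number of leaves.
    if (Integer.bitCount(numLeaves) != 1) {
      return false;
    }
    // Size the last leaf would have if all earlier leaves were full.
    long packedLastLeaf = pointCount - (long) (numLeaves - 1) * maxPointsInLeaf;
    // A different size means the points were not packed leaf by leaf.
    return lastLeafCount != packedLastLeaf;
  }

  public static void main(String[] args) {
    // 13 points, 4 leaves, at most 4 points per leaf.
    System.out.println(looksBalanced(13, 4, 4, 1)); // leaves of 4,4,4,1: false
    System.out.println(looksBalanced(13, 4, 4, 4)); // leaves of 3,3,3,4: true
  }
}
```

Under these assumptions the check needs only one leaf count plus values a reader already knows. It cannot tell the two layouts apart when both would leave the same number of points in the last leaf, for example when every leaf is full.
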
> A merge is going on during this test, and the merge code contains the
> following check:
> {code:java}
> for (PointsReader reader : mergeState.pointsReaders) {
>   if (reader instanceof Lucene60PointsReader == false) {
>     // We can only bulk merge when all to-be-merged segments use our format:
>     super.merge(mergeState);
>     return;
>   }
> }
> {code}
> So we only bulk merge segments that use `Lucene60PointsReader`. Note that if
> we do not bulk merge a 1D index, it will be created as a fully balanced
> tree!
> In this case the test is wrapping the readers with the
> {{SlowCodecReaderWrapper}} and therefore tricking our logic.
> But I am wondering whether this is also the case for index sorting, where our
> readers might be wrapped with the {{SortingCodecReader}}.
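
To make the wrapping problem concrete, here is a minimal, self-contained sketch of why an `instanceof` test stops matching once a reader is wrapped. The types below are hypothetical stand-ins, not Lucene classes; only the shape of the loop follows the snippet quoted in the issue.

```java
import java.util.List;

/** Toy model only: these types are hypothetical stand-ins, not Lucene classes. */
final class BulkMergeCheck {

  interface Reader {}

  /** Stands in for a reader that uses the codec's own format. */
  static final class NativeReader implements Reader {}

  /** Stands in for a wrapper that delegates to another reader. */
  static final class WrappingReader implements Reader {
    final Reader delegate;

    WrappingReader(Reader delegate) {
      this.delegate = delegate;
    }
  }

  /** Mirrors the shape of the quoted loop: every reader must be a NativeReader. */
  static boolean canBulkMerge(List<Reader> readers) {
    for (Reader reader : readers) {
      if (!(reader instanceof NativeReader)) {
        return false; // the quoted code falls back to super.merge(mergeState) here
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Reader plain = new NativeReader();
    Reader wrapped = new WrappingReader(new NativeReader());
    System.out.println(canBulkMerge(List.of(plain)));          // true
    System.out.println(canBulkMerge(List.of(plain, wrapped))); // false
  }
}
```

The wrapped reader still holds data in the native format, but the type check only sees the wrapper, so the generic merge path is taken. In the scenario described in the issue, that path is the one that produces a fully balanced 1D tree.
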