easyice commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1791978766
I ran this with wikimedium10m and wikimediumall. No significant performance improvement or regression was found. The total size of tip is slightly reduced (by roughly 0.05% and 0.12%, respectively):

| | baseline | candidate |
| --- | --- | --- |
| wikimedium10m | 10280673 | 10275716 |
| wikimediumall | 28530090 | 28496270 |

I counted the different `nodeFlags` for wikimedium10m:

| strategies | count | percent |
| --- | --- | --- |
| ARCS_FOR_DIRECT_ADDRESSING | 558555 | 50.23% |
| ARCS_FOR_CONTINUOUS | 25215 | 2.26% |
| ARCS_FOR_BINARY_SEARCH | 9 | 0.00% |
| Linear search (bytesPerArc: 0) | 528100 | 47.49% |

The percentage of nodes hitting this optimization (`ARCS_FOR_CONTINUOUS`) is small here, but the arcs in this data set are dense, so I also generated 10 million random long values as terms:

```
// rand and indexWriter are created elsewhere
for (int i = 0; i < 10_000_000; i++) {
  Document doc = new Document();
  doc.add(new StringField("f1", String.valueOf(rand.nextLong()), Store.NO));
  indexWriter.addDocument(doc);
}
```

With this data, the optimization is hit in most cases:

| strategies | count | percent |
| --- | --- | --- |
| ARCS_FOR_DIRECT_ADDRESSING | 2469 | 2.58% |
| ARCS_FOR_CONTINUOUS | 78732 | 82.45% |
| ARCS_FOR_BINARY_SEARCH | 0 | 0.00% |
| Linear search (bytesPerArc: 0) | 14280 | 14.95% |
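
For anyone who wants to double-check the percent columns in the `nodeFlags` tables, here is a minimal sketch that recomputes the shares from the raw counts. The class and method names (`NodeFlagShares`, `total`, `percent`) are made up for illustration and are not part of Lucene; only the counts come from the tables in this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: recomputes the per-strategy
// share of FST nodes from the raw counts reported in this comment.
public class NodeFlagShares {

  /** Sum of all node counts in the map. */
  public static long total(Map<String, Long> counts) {
    long sum = 0;
    for (long c : counts.values()) {
      sum += c;
    }
    return sum;
  }

  /** Share of one strategy in percent; returns 0 for an empty total. */
  public static double percent(long count, long total) {
    return total == 0 ? 0.0 : 100.0 * count / total;
  }

  /** Counts reported for wikimedium10m. */
  public static Map<String, Long> wikimedium10m() {
    Map<String, Long> m = new LinkedHashMap<>();
    m.put("ARCS_FOR_DIRECT_ADDRESSING", 558555L);
    m.put("ARCS_FOR_CONTINUOUS", 25215L);
    m.put("ARCS_FOR_BINARY_SEARCH", 9L);
    m.put("Linear search (bytesPerArc: 0)", 528100L);
    return m;
  }

  /** Counts reported for the 10 million random long terms. */
  public static Map<String, Long> randomLongs() {
    Map<String, Long> m = new LinkedHashMap<>();
    m.put("ARCS_FOR_DIRECT_ADDRESSING", 2469L);
    m.put("ARCS_FOR_CONTINUOUS", 78732L);
    m.put("ARCS_FOR_BINARY_SEARCH", 0L);
    m.put("Linear search (bytesPerArc: 0)", 14280L);
    return m;
  }

  /** Prints one row per strategy: name, count, and share of the total. */
  static void print(String title, Map<String, Long> counts) {
    long total = total(counts);
    System.out.println(title + " (total nodes: " + total + ")");
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      System.out.printf("  %-32s %8d  %.4f%%%n",
          e.getKey(), e.getValue(), percent(e.getValue(), total));
    }
  }

  public static void main(String[] args) {
    print("wikimedium10m", wikimedium10m());
    print("random longs", randomLongs());
  }
}
```

The sketch prints four decimals on purpose: the two-decimal figures in the tables may differ in the last digit depending on whether values are rounded or truncated.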