Re: [PR] Correct last remaining instances of typo e.g. "Levenstein" -> "Levenshtein" [lucene]

2023-12-11 Thread via GitHub
shaikhu commented on PR #12519: URL: https://github.com/apache/lucene/pull/12519#issuecomment-1849510960 > And it looks like a few more mis-spellings crept in? I can't find any other instances (other than the changelog.txt) of it being misspelled. -- This is an automated mess

Re: [PR] Revert "Optimize outputs accumulating for SegmentTermsEnum and InterssectTermsEnum " [lucene]

2023-12-11 Thread via GitHub
javanna closed pull request #12899: Revert "Optimize outputs accumulating for SegmentTermsEnum and InterssectTermsEnum " URL: https://github.com/apache/lucene/pull/12899

Re: [PR] Revert "Optimize outputs accumulating for SegmentTermsEnum and InterssectTermsEnum " [lucene]

2023-12-11 Thread via GitHub
javanna commented on PR #12899: URL: https://github.com/apache/lucene/pull/12899#issuecomment-1849537869 Closed in favour of #12900. Better to fix the problem and build on top of the original change than reverting entirely.

[PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz opened a new pull request, #12908: URL: https://github.com/apache/lucene/pull/12908 This builds on top of #12904 and adds back the write logic for the terms dictionary format we used between 9.0 and 9.8, so that we can more easily test it.

Re: [PR] [branch_9_9] Reflow computeCommonPrefixLengthAndBuildHistogram to avoid crash [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty commented on PR #12903: URL: https://github.com/apache/lucene/pull/12903#issuecomment-1849596742 > ``` > gradlew --no-build-cache :lucene:core:beast -Ptests.dups=100 --tests "org.apache.lucene.util.bkd.TestDocIdsWriter.testCrash" > ``` Thanks @dweiss, that's nicer,

Re: [PR] [branch_9_9] Reflow computeCommonPrefixLengthAndBuildHistogram to avoid crash [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty commented on code in PR #12903: URL: https://github.com/apache/lucene/pull/12903#discussion_r142033 ## lucene/core/src/test/org/apache/lucene/util/bkd/TestDocIdsWriter.java: ## @@ -150,4 +154,18 @@ public Relation compare(byte[] minPackedValue, byte[] maxPacked

Re: [PR] [branch_9_9] Reflow computeCommonPrefixLengthAndBuildHistogram to avoid crash [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty commented on code in PR #12903: URL: https://github.com/apache/lucene/pull/12903#discussion_r142445 ## lucene/core/src/java/org/apache/lucene/util/MSBRadixSorter.java: ## @@ -214,6 +214,12 @@ protected int getBucket(int i, int k) { * @see #buildHistogram

Re: [PR] [branch_9_9] Reflow computeCommonPrefixLengthAndBuildHistogram to avoid crash [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty commented on PR #12903: URL: https://github.com/apache/lucene/pull/12903#issuecomment-1849713173 I've addressed all comments so far. The use of System::getenv is a little annoying, as it required a test-only permission, but it's minimal and makes the test very straightforward t

Re: [PR] Add test that tickles a JVM JIT crash on JDK's less than 21.0.1 [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty closed pull request #12887: Add test that tickles a JVM JIT crash on JDK's less than 21.0.1 URL: https://github.com/apache/lucene/pull/12887

Re: [PR] [Backporting] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [

2023-12-11 Thread via GitHub
javanna commented on PR #12891: URL: https://github.com/apache/lucene/pull/12891#issuecomment-1849753229 Thanks for merging this @zacharymorn . I am updating Elasticsearch to this change and I came to wonder if we should deprecate the static methods that create collectors. I do agree on exp

Re: [PR] [branch_9_9] Reflow computeCommonPrefixLengthAndBuildHistogram to avoid crash [lucene]

2023-12-11 Thread via GitHub
dweiss commented on code in PR #12903: URL: https://github.com/apache/lucene/pull/12903#discussion_r1422276072 ## gradle/testing/randomization/policies/tests.policy: ## @@ -63,6 +63,9 @@ grant { permission java.lang.RuntimePermission "getFileStoreAttributes"; permission ja

Re: [PR] Correct last remaining instances of typo e.g. "Levenstein" -> "Levenshtein" [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12519: URL: https://github.com/apache/lucene/pull/12519#issuecomment-1849813948 Great, thank you @shaikhu -- I'll merge and backport for 9.10.

Re: [PR] Correct last remaining instances of typo e.g. "Levenstein" -> "Levenshtein" [lucene]

2023-12-11 Thread via GitHub
mikemccand merged PR #12519: URL: https://github.com/apache/lucene/pull/12519

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422300969 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1849889528 Thanks. Why can't we move the whole loop to the util class? Of course you could benchmark this, but this may also be the reason for the slowdown in the NIOFSDir case.
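Group-varint encoding, as commonly described, stores four ints behind a single flag byte whose 2-bit fields record each value's byte length minus one. The following is a minimal, self-contained sketch of that layout over a plain `byte[]`. It is not Lucene's `DataOutput`/`DataInput` code: the class and method names are hypothetical, and little-endian byte order is an assumption. It keeps the whole per-group loop inside one static helper, which is the kind of shared util method being discussed.

```java
import java.util.Arrays;

public final class GroupVarintSketch {

  /** Number of bytes (1..4) needed to hold v, treated as unsigned. */
  static int numBytes(int v) {
    // OR with 1 so that 0 still needs one byte (numberOfLeadingZeros(0) == 32).
    int bits = 32 - Integer.numberOfLeadingZeros(v | 1);
    return (bits + 7) >>> 3;
  }

  /** Encodes values[off..off+4) into buf starting at pos; returns the new position. */
  static int encodeGroup(int[] values, int off, byte[] buf, int pos) {
    int flagPos = pos++; // reserve one byte for the flag
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      int v = values[off + i];
      int n = numBytes(v);
      flag |= (n - 1) << (i * 2); // 2 bits per value: byte length - 1
      for (int b = 0; b < n; b++) {
        buf[pos++] = (byte) (v >>> (8 * b)); // assumed little-endian order
      }
    }
    buf[flagPos] = (byte) flag;
    return pos;
  }

  /** Decodes one group of four values from buf at pos into dst[off..off+4); returns the new position. */
  static int decodeGroup(byte[] buf, int pos, int[] dst, int off) {
    int flag = buf[pos++] & 0xFF;
    for (int i = 0; i < 4; i++) {
      int n = ((flag >>> (i * 2)) & 0x03) + 1;
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (buf[pos++] & 0xFF) << (8 * b);
      }
      dst[off + i] = v;
    }
    return pos;
  }

  public static void main(String[] args) {
    int[] in = {1, 300, 70_000, 16_777_216};
    byte[] buf = new byte[17]; // worst case: 1 flag byte + 4 * 4 value bytes
    int written = encodeGroup(in, 0, buf, 0);
    int[] out = new int[4];
    int read = decodeGroup(buf, 0, out, 0);
    // prints: 11 bytes written, 11 bytes read, round trip ok: true
    System.out.println(written + " bytes written, " + read + " bytes read, round trip ok: " + Arrays.equals(in, out));
  }
}
```

Because the flag byte tells the decoder every length up front, the inner loop needs no per-byte continuation check, which is the usual argument for group-varint over classic one-int-at-a-time vInts.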

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422358990 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { } }

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1849946150 The `precommit` failure looks unrelated -- good 'ol `TestRandomChains`.

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1849947295 I restarted that failed job.

Re: [PR] Add tests for Lucene90PostingsFormat back [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12904: URL: https://github.com/apache/lucene/pull/12904#issuecomment-1849951055 > I just noticed that the move from FOR to PFOR did all the work to make the old format (FOR) writeable, but missed keeping an instance of `BasePostingsFormatTestCase` for this format

Re: [PR] Add tests for Lucene90PostingsFormat back [lucene]

2023-12-11 Thread via GitHub
jpountz merged PR #12904: URL: https://github.com/apache/lucene/pull/12904

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422381211 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1849958671 Hmm the retried job failed with something scary and clearly FST related: ``` java.lang.AssertionError: mismatch frozenHash=1634033861501336 vs hash=6385840662311 >

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1849963592 Don't worry @mikemccand, I'll look.

[PR] Fix bug where NFARunAutomaton#getTransition does not set Transition correctly [lucene]

2023-12-11 Thread via GitHub
zhaih opened a new pull request, #12909: URL: https://github.com/apache/lucene/pull/12909 ### Description Fix: #12906

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1849961825 And in fact the first run's failure was NOT spurious! It failed with this root cause: ``` org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved to

Re: [I] Improve BWC tests to reveal #12895 and confirm its fix [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on issue #12901: URL: https://github.com/apache/lucene/issues/12901#issuecomment-1849968380 > We should look into more systematically forking the code when we do a file format change, or figuring out other ways to keep testing prior formats with version bumps, e.g. by r

Re: [I] TransitionAccessor for NFA: get transitions for a given state via random-access leads to wrong results. [lucene]

2023-12-11 Thread via GitHub
zhaih commented on issue #12906: URL: https://github.com/apache/lucene/issues/12906#issuecomment-1849962693 @Tony-X Nice catch! I have a PR to fix it: #12909

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422390737 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { } }

Re: [I] Improve BWC tests to reveal #12895 and confirm its fix [lucene]

2023-12-11 Thread via GitHub
jpountz commented on issue #12901: URL: https://github.com/apache/lucene/issues/12901#issuecomment-1849971078 Agreed, I didn't want to go the forking route for `FST`, which I'm not too familiar with, but ideally we'd version it like other file formats: FST90, FST99, etc.

Re: [I] Refactor HNSW graph build such that concurrent build won't impact single thread build [lucene]

2023-12-11 Thread via GitHub
zhaih commented on issue #12732: URL: https://github.com/apache/lucene/issues/12732#issuecomment-1849980796 BTW I opened #12910 which (I think) will be helpful to this task.

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422395622 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422396582 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }

[PR] Refactor around NeighborArray [lucene]

2023-12-11 Thread via GitHub
zhaih opened a new pull request, #12910: URL: https://github.com/apache/lucene/pull/12910 ### Description Not a huge refactor, basically: 1. Stop sharing the fields (`nodes/scores`) outside 2. Move the diversity check inside NeighborArray such that no one outside should tell Neighbo

[I] Require bundled FSTs to be on the current FST version [lucene]

2023-12-11 Thread via GitHub
jpountz opened a new issue, #12911: URL: https://github.com/apache/lucene/issues/12911 ### Description In Lucene 9.9 we bumped the FST version, but we did not re-generate our bundled FSTs such as the ones that are used by our Kuromoji analyzer. IMO we should systematically re-generat

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422413707 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422417582 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-11 Thread via GitHub
gf2121 closed pull request #12912: Add BWC test to reveal #12895 URL: https://github.com/apache/lucene/pull/12912

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12912: URL: https://github.com/apache/lucene/pull/12912#discussion_r1422428244 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -2265,4 +2268,47 @@ public void testReadNMinusTwoSegmentI

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-11 Thread via GitHub
benwtrent commented on code in PR #12912: URL: https://github.com/apache/lucene/pull/12912#discussion_r1422433156 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -2265,4 +2268,47 @@ public void testReadNMinusTwoSegmentIn

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850035022 I can change the Jenkins jobs to use a stricter pattern. That's no problem. On Policeman Jenkins I could change the sysprop, too. I wonder why Jenkins hit the problem

[PR] #12901: add TestBackwardsCompatibility test case that reveals the block tree IntersectTermsEnum bug #12895 [lucene]

2023-12-11 Thread via GitHub
mikemccand opened a new pull request, #12913: URL: https://github.com/apache/lucene/pull/12913 This is a smallish repro of #12895. The test should fail currently with the glorious `java.lang.ArrayIndexOutOfBoundsException` -- let's confirm GitHub actions agrees. Then I'll add an `@Ignore`

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422436257 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { } }

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422441239 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-11 Thread via GitHub
gf2121 commented on code in PR #12912: URL: https://github.com/apache/lucene/pull/12912#discussion_r1422443348 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -2265,4 +2268,47 @@ public void testReadNMinusTwoSegmentInfos

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850050936 I think I found the bug, the FST version was mistakenly initialized to 0 instead of Version.CURRENT in some cases, it should be good now.

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12912: URL: https://github.com/apache/lucene/pull/12912#discussion_r1422446225 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -2265,4 +2268,47 @@ public void testReadNMinusTwoSegmentI

[PR] Clean up sleep in TestBackwardsCompatibility#testCreateMoreTermsIndex [lucene]

2023-12-11 Thread via GitHub
gf2121 opened a new pull request, #12914: URL: https://github.com/apache/lucene/pull/12914 (no comment)

Re: [PR] #12901: add TestBackwardsCompatibility test case that reveals the block tree IntersectTermsEnum bug #12895 [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12913: URL: https://github.com/apache/lucene/pull/12913#issuecomment-1850065392 Yay! I've never been so happy to see a test failure! ``` org.apache.lucene.backward_index.TestBackwardsCompatibility > test suite's output saved to /home/runner/work/lucene

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422466176 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850094072 It's actually bad news that all tests pass here, as this means that our `BasePostingsFormatTestCase` is not good enough to uncover the recent problem with `Terms#intersect`... So we co

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422449061 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -325,6 +328,21 @@ public Builder directAddressingMaxOversizingFactor(float factor) {

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850104246 > I think I found the bug, the FST version was mistakenly initialized to 0 instead of Version.CURRENT in some cases, it should be good now. Phew!!

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422487388 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -109,10 +109,20 @@ public enum INPUT_TYPE { // Increment version to change it private static

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422487100 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -325,6 +328,21 @@ public Builder directAddressingMaxOversizingFactor(float factor) {

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850117651 > It's actually bad news that all tests pass here, as this means that our `BasePostingsFormatTestCase` is not good enough to uncover the recent problem with `Terms#intersect`... So

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422494949 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { } }

Re: [PR] #12901: add TestBackwardsCompatibility test case that reveals the block tree IntersectTermsEnum bug #12895 [lucene]

2023-12-11 Thread via GitHub
mikemccand merged PR #12913: URL: https://github.com/apache/lucene/pull/12913

Re: [I] Improve BWC tests to reveal #12895 and confirm its fix [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on issue #12901: URL: https://github.com/apache/lucene/issues/12901#issuecomment-1850129073 OK this is done -- I pushed the new BWC test case (@Ignore'd) to 9.9.x, 9.x and main. Actually I can un-@Ignore at least in main. I'll go do that.

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850129755 Hmmm... I agree we can't expect `BasePostingsFormatTestCase` to catch all bw compat problems, but the `TestLucene90PostingsFormat` from this PR writes data in the 9.8 format of the terms

Re: [I] Improve BWC tests to reveal #12895 and confirm its fix [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on issue #12901: URL: https://github.com/apache/lucene/issues/12901#issuecomment-1850135758 > Actually I can un-@ignore at least in main. I'll go do that. D'oh! No, I cannot -- it will still fail in main, 9.x and 9.9.x until we get the fix (#12900) in.

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1422507453 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { }

Re: [PR] Add BWC test to reveal #12895 [lucene]

2023-12-11 Thread via GitHub
gf2121 commented on code in PR #12912: URL: https://github.com/apache/lucene/pull/12912#discussion_r1422514730 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBackwardsCompatibility.java: ## @@ -2265,4 +2268,47 @@ public void testReadNMinusTwoSegmentInfos

Re: [PR] IntersectTermsEnum should accumulate from output prefix instead of current output [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12900: URL: https://github.com/apache/lucene/pull/12900#issuecomment-1850151623 I've confirmed the new (failing) BWC test from #12901 now passes with this PR. I'll review ...

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422523889 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -109,10 +109,20 @@ public enum INPUT_TYPE { // Increment version to change it private static

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-11 Thread via GitHub
stefanvodita commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1422523360 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -757,6 +757,30 @@ public void testRamUsageEstimate() throws IOException { l

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-12-11 Thread via GitHub
stefanvodita commented on code in PR #12844: URL: https://github.com/apache/lucene/pull/12844#discussion_r1422523818 ## lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java: ## @@ -330,15 +330,36 @@ public static int[] growExact(int[] array, int newLength) { return c
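Going only by the PR title, a `growInRange`-style helper would grow an array the way `ArrayUtil.grow` does, but never beyond a caller-supplied maximum, so a buffer that can never exceed a known size is not over-allocated. The following is a minimal sketch under that assumption. The class name, the signature and the roughly 1/8 over-allocation (loosely modeled on `ArrayUtil.oversize`, ignoring its alignment rounding) are illustrative and are not taken from the code in #12844.

```java
import java.util.Arrays;

public final class GrowInRangeSketch {

  /** Over-allocates by about 1/8 (at least 3 extra slots); alignment rounding is ignored. */
  static int oversize(int minTargetSize) {
    int extra = Math.max(3, minTargetSize >> 3);
    long size = (long) minTargetSize + extra;
    // Conservative cap; the exact maximum array length depends on the JVM.
    return (int) Math.min(size, Integer.MAX_VALUE - 8);
  }

  /**
   * Hypothetical helper: returns an array of length >= minLength that is never
   * longer than maxLength, copying the existing contents if it has to grow.
   */
  static int[] growInRange(int[] array, int minLength, int maxLength) {
    if (minLength > maxLength) {
      throw new IllegalArgumentException("minLength " + minLength + " > maxLength " + maxLength);
    }
    if (array.length >= minLength) {
      return array; // already large enough, no copy
    }
    int newLength = Math.min(oversize(minLength), maxLength); // clamp the over-allocation
    return Arrays.copyOf(array, newLength);
  }

  public static void main(String[] args) {
    int[] a = new int[4];
    System.out.println(growInRange(a, 100, 1000).length); // prints 112
    System.out.println(growInRange(a, 100, 105).length);  // prints 105
  }
}
```

The only difference from an unbounded grow is the final `Math.min` against `maxLength`. The over-allocation still amortizes repeated growth, but it is never spent on slots that the caller says can never be used.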

Re: [PR] Upgrade ECJ to 3.36.0 [lucene]

2023-12-11 Thread via GitHub
ChrisHegarty commented on PR #12888: URL: https://github.com/apache/lucene/pull/12888#issuecomment-1850178914 relates: #12753

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850184681 OK, let me try feeding LineFileDocs into this test case.

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850181921 > Hmmm... I agree we can't expect `BasePostingsFormatTestCase` to catch all bw compat problems, but the `TestLucene90PostingsFormat` from this PR writes data in the 9.8 format of the

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422538272 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -109,10 +109,20 @@ public enum INPUT_TYPE { // Increment version to change it private sta

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422541990 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene90/Lucene90RWPostingsFormat.java: ## @@ -75,7 +75,11 @@ public FieldsConsumer fieldsConsu

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422549348 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene90/Lucene90RWPostingsFormat.java: ## @@ -75,7 +75,11 @@ public FieldsConsumer fieldsConsumer

[PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
daixque opened a new pull request, #12915: URL: https://github.com/apache/lucene/pull/12915 ### Description Sutegana (捨て仮名) are the small letters of hiragana and katakana in Japanese. Unlike modern text, old Japanese text does not use sutegana (捨て仮名). For example: - "ストップウ
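A filter for this problem plausibly maps each small kana to its full-size counterpart, so that modern spellings with sutegana and old spellings without them index to the same form. The PR's actual token filters are not shown here. The following sketch is therefore a plain string transform rather than a Lucene `TokenFilter`, and both the class name and the mapping table are assumptions. The table covers the common small hiragana and katakana but is not exhaustive: for example, the small katakana extensions ㇰ to ㇿ are left unchanged.

```java
public final class SuteganaSketch {

  // Small kana, each followed at the same index by its full-size counterpart.
  private static final String SMALL = "ぁぃぅぇぉっゃゅょゎゕゖ" + "ァィゥェォッャュョヮヵヶ";
  private static final String NORMAL = "あいうえおつやゆよわかけ" + "アイウエオツヤユヨワカケ";

  /** Replaces every small kana covered by the table with its full-size form. */
  static String normalize(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i); // all kana in the table are in the BMP, so char is enough
      int idx = SMALL.indexOf(c);
      sb.append(idx >= 0 ? NORMAL.charAt(idx) : c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(normalize("ストップウォッチ")); // prints ストツプウオツチ
  }
}
```

In a real analysis chain, the same per-character mapping would run over the term buffer of each token, so queries written in either style match the same indexed terms.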

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422599871 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene90/Lucene90RWPostingsFormat.java: ## @@ -75,7 +75,11 @@ public FieldsConsumer fieldsConsu

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on PR #12908: URL: https://github.com/apache/lucene/pull/12908#issuecomment-1850255060 > OK, let me try feeding LineFileDocs into this test case. FTR I will look into it, but it's probably best done in a follow-up PR rather than this one; let's merge this PR first?

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz commented on code in PR #12908: URL: https://github.com/apache/lucene/pull/12908#discussion_r1422609478 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene90/Lucene90RWPostingsFormat.java: ## @@ -75,7 +75,11 @@ public FieldsConsumer fieldsConsumer

[I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]

2023-12-11 Thread via GitHub
mikemccand opened a new issue, #12916: URL: https://github.com/apache/lucene/issues/12916 ### Description There is an exciting [upstream (OpenJ9) comment here](https://github.com/eclipse-openj9/openj9/issues/18400#issuecomment-1846199023) (thank you @singh264), copied below: T

Re: [PR] IntersectTermsEnum should accumulate from output prefix instead of current output [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12900: URL: https://github.com/apache/lucene/pull/12900#discussion_r1422705408 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnum.java: ## @@ -198,6 +204,7 @@ private IntersectTermsEnumFrame pushFrame(int st

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850392837 I thought changing the pattern to lucene/**/build/hs_err_pid* would help, but there are way too many files/folders in there, so it'll eventually hit that threshold anyway. Unless we set t

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1850424855 The performance of the new approach seems to have regressed a bit more on java21_benchMMapDirectoryInputs_readGroupVInt; here is the difference in speedup relative to the baseline.
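For readers unfamiliar with group-varint, here is a minimal sketch of the general technique: four ints share one flag byte that stores each value's byte length minus one in 2 bits, followed by each value's bytes in little-endian order. The class and method names are hypothetical, and the exact flag bit layout used by Lucene's DataOutput/DataInput may differ from this sketch.

```java
import java.io.ByteArrayOutputStream;

public final class GroupVarintSketch {
  /** Encodes exactly four ints: one flag byte, then each value in 1..4 little-endian bytes. */
  static byte[] encode(int[] values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int[] lengths = new int[4];
    int flag = 0;
    for (int i = 0; i < 4; i++) {
      lengths[i] = numBytes(values[i]);
      flag |= (lengths[i] - 1) << (i * 2); // 2 bits per value: length - 1
    }
    out.write(flag);
    for (int i = 0; i < 4; i++) {
      int v = values[i];
      for (int b = 0; b < lengths[i]; b++) {
        out.write(v & 0xFF); // lowest byte first
        v >>>= 8;
      }
    }
    return out.toByteArray();
  }

  /** Bytes needed to hold v as an unsigned int, at least 1. */
  static int numBytes(int v) {
    int bits = 32 - Integer.numberOfLeadingZeros(v);
    return Math.max(1, (bits + 7) / 8);
  }

  /** Decodes one group of four ints produced by encode. */
  static int[] decode(byte[] in) {
    int flag = in[0] & 0xFF;
    int pos = 1;
    int[] values = new int[4];
    for (int i = 0; i < 4; i++) {
      int len = ((flag >>> (i * 2)) & 0x3) + 1;
      int v = 0;
      for (int b = 0; b < len; b++) {
        v |= (in[pos++] & 0xFF) << (8 * b);
      }
      values[i] = v;
    }
    return values;
  }
}
```

Because the flag byte tells the decoder up front how many bytes each value occupies, decoding avoids the per-byte continuation-bit check of a classic vInt, which is why the read path is sensitive to how the underlying input handles positions and bounds.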

Re: [PR] Removing @lucene.experimental tags in testXXX methods in CheckIndex [lucene]

2023-12-11 Thread via GitHub
mikemccand merged PR #12893: URL: https://github.com/apache/lucene/pull/12893 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850470237 If you set the bound to `Integer#MAX_VALUE`, then it uses the default directory scanner without time-based limitations: https://github.com/jenkinsci/jenkins/blob/f9a777bc682963de46403

Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-12-11 Thread via GitHub
gsmiller closed issue #12418: Reproducible TestDrillSideways failure URL: https://github.com/apache/lucene/issues/12418

Re: [I] IntTaxonomyFacets chooses dense values array when FacetsCollector has no MatchingDocs [lucene]

2023-12-11 Thread via GitHub
gsmiller closed issue #12558: IntTaxonomyFacets chooses dense values array when FacetsCollector has no MatchingDocs URL: https://github.com/apache/lucene/issues/12558

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850476822 Hi @dweiss: I am a bit confused: on Policeman Jenkins the limit is already raised: ` -Dhudson.FilePath.VALIDATE_ANT_FILE_MASK_BOUND=6` This is in `/etc/default/jenkins`

Re: [PR] Refactor around NeighborArray [lucene]

2023-12-11 Thread via GitHub
benwtrent commented on code in PR #12910: URL: https://github.com/apache/lucene/pull/12910#discussion_r1422799676 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -201,9 +225,69 @@ private int descSortFindRightMostInsertionPoint(float newScore, int

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
uschindler commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1850500493 Hi, the problem with MMapDir is that the seek method also has to update the current block number. Maybe we pass a second lambda to update the position? Let's just try this out!

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2023-12-11 Thread via GitHub
mikemccand commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1422804214 ## lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseHiraganaUppercaseFilter.java: ## @@ -0,0 +1,65 @@ +package org.apache.lucene.analysis.ja;

Re: [PR] Add tests for the 9.0->9.8 block tree terms dict format back. [lucene]

2023-12-11 Thread via GitHub
jpountz merged PR #12908: URL: https://github.com/apache/lucene/pull/12908

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-12-11 Thread via GitHub
easyice commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1850511554 Thank you @uschindler, I was thinking about that too; I will try a second lambda tomorrow.

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850529839 Hi, I found the problem: /etc/default/jenkins is no longer interpreted by the Jenkins systemd unit. I moved the settings to an override file (and deleted the defaults file). Now it looks fine on Po

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850536421 But I can't change ASF Jenkins. In addition, Policeman Jenkins did not show a stack trace. It only had the message with 1 files. I have the feeling ASF jenkins is older versio

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
dweiss commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850545653 Thanks, Uwe! The problem is on ASF infrastructure; that's where I see those exceptions most in my emails/log messages. I don't know if there is a way to tweak it just for Lucene wo

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850553512 This is a global option, you can only change it for the main jenkins. The workers are started remotely and get the same flag over the wire on startup.

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850559511 Let me wait and see on Policeman Jenkins how the global property affects the workers (they are still called slaves in the internals). There might be the option to ask special syspr

Re: [I] jenkins dump file traversal exceptions ("no matches found within 10000") [lucene]

2023-12-11 Thread via GitHub
uschindler commented on issue #12907: URL: https://github.com/apache/lucene/issues/12907#issuecomment-1850567620 Looks like you can't pass sysprops to individual workers. There are different types of workers (I use shell script launcher as this works better with Virtualbox), but the default
