[PR] Move synonym map off-heap for SynonymGraphFilter [lucene]

2024-01-30 Thread via GitHub
msfroh opened a new pull request, #13054: URL: https://github.com/apache/lucene/pull/13054 ### Description This stores the synonym map's FST and word lookup off-heap in a separate, configurable directory. The initial implementation is rough, but the unit tests pass with this chan

Re: [I] `SynonymGraphFilter` should read FSTs off-heap? [lucene]

2024-01-30 Thread via GitHub
msfroh commented on issue #13005: URL: https://github.com/apache/lucene/issues/13005#issuecomment-1916284724 I have a (rough) PR to address this: https://github.com/apache/lucene/pull/13054. I also moved the output word lookup off-heap, but it requires a random seek (within a hopeful

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub
sabi0 commented on PR #13052: URL: https://github.com/apache/lucene/pull/13052#issuecomment-1916306380 I've added a comment explaining the `forEach`. [Opening a bug](https://bugs.openjdk.org/) in OpenJDK requires "OpenJDK Author" status which I do not have. -- This is an automated
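
For readers skimming this thread, here is a minimal sketch of the two forms named in the PR title. The snippet above does not state the motivation, so the following is an assumption: in the JDK, `AbstractSet.removeAll` walks the set and calls `contains` on the argument whenever the set is not larger than the argument, and for a `List` each such call is a linear scan. The class and method names below are hypothetical and are not taken from the Lucene change.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RemoveAllSketch {
  /** Removes each list element from the set with one hash lookup per element. */
  static <T> void removeEach(Set<T> set, List<T> toRemove) {
    toRemove.forEach(set::remove);
  }

  public static void main(String[] args) {
    Set<Integer> viaRemoveAll = new HashSet<>(List.of(1, 2, 3, 4));
    Set<Integer> viaForEach = new HashSet<>(viaRemoveAll);
    List<Integer> toRemove = new ArrayList<>(List.of(2, 4, 6, 8, 10));

    // The set is smaller than the list here, so AbstractSet.removeAll
    // iterates the set and calls toRemove.contains(...) for each element.
    viaRemoveAll.removeAll(toRemove);

    // Always iterates the list and performs a hash-based remove on the set.
    removeEach(viaForEach, toRemove);

    // Both approaches leave the same elements behind.
    System.out.println(viaRemoveAll.equals(viaForEach));
  }
}
```

Both forms produce the same resulting set; the difference is only in which collection is iterated and which one answers the membership question.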

Re: [PR] Align instanceof check with type cast [lucene]

2024-01-30 Thread via GitHub
sabi0 commented on code in PR #13039: URL: https://github.com/apache/lucene/pull/13039#discussion_r1470768641 ## lucene/core/src/java/org/apache/lucene/analysis/tokenattributes/PayloadAttributeImpl.java: ## @@ -62,8 +62,7 @@ public boolean equals(Object other) { return tr

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub
uschindler commented on PR #13052: URL: https://github.com/apache/lucene/pull/13052#issuecomment-1916340120 > I've added a comment explaining the `forEach`. > > [Opening a bug](https://bugs.openjdk.org/) in OpenJDK requires "OpenJDK Author" status which I do not have. I can do

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub
jpountz commented on code in PR #13052: URL: https://github.com/apache/lucene/pull/13052#discussion_r1470784089 ## lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java: ## @@ -106,7 +106,11 @@ public MergeSpecification findForcedMerges( // the resulti

[PR] Make static final Set immutable [lucene]

2024-01-30 Thread via GitHub
sabi0 opened a new pull request, #13055: URL: https://github.com/apache/lucene/pull/13055 EnumSet.of() returns a mutable Set that should not be used for static final constants.
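
A small sketch of the concern raised in this PR description, assuming the usual JDK fix of wrapping the set with `Collections.unmodifiableSet`. The enum and constant names are hypothetical and only illustrate the pattern; they are not the fields touched by the actual change.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

final class ImmutableEnumSetSketch {
  // Hypothetical enum standing in for whatever the real constant holds.
  enum Tag { NOUN, VERB, PARTICLE }

  // Mutable: any caller can add to or remove from this "constant".
  static final Set<Tag> MUTABLE_TAGS = EnumSet.of(Tag.NOUN, Tag.VERB);

  // Read-only view over an EnumSet that no other code holds a reference to.
  static final Set<Tag> STOP_TAGS =
      Collections.unmodifiableSet(EnumSet.of(Tag.NOUN, Tag.VERB));

  /** Returns true if the set refuses an attempted add. */
  static boolean rejectsModification(Set<Tag> set) {
    try {
      set.add(Tag.PARTICLE);
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }
}
```

Calling `rejectsModification(MUTABLE_TAGS)` succeeds in adding an element, which is exactly the hazard for a shared static constant; the wrapped set throws instead.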

Re: [PR] Replace `new HashSet<>(Arrays.asList())` with `EnumSet.of()` [lucene]

2024-01-30 Thread via GitHub
sabi0 commented on PR #13051: URL: https://github.com/apache/lucene/pull/13051#issuecomment-1916363126 > static final constants should be unmodifiable sets > Could you open an issue about this? #13055

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub
sabi0 commented on code in PR #13052: URL: https://github.com/apache/lucene/pull/13052#discussion_r1470850110 ## lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java: ## @@ -106,7 +106,11 @@ public MergeSpecification findForcedMerges( // the resulting

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub
s1monw commented on PR #13046: URL: https://github.com/apache/lucene/pull/13046#issuecomment-1916447395 > We should PnP this! What on earth means PnP? Mike, check out this search: https://www.google.com/search?q=pnp+acronym wikipedia FTW

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-30 Thread via GitHub
jfreden commented on PR #13036: URL: https://github.com/apache/lucene/pull/13036#issuecomment-1916440041 I added code to only apply the optimization `if count(term-with-less-docs)/count(term-with-more-docs) < 0.1` and it yielded a way better result. Will investigate the term cache idea too
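
The gating condition quoted above can be sketched as a small predicate. This is only an illustration of the ratio check as written in the comment; the class name, method name, and the handling of a zero count are assumptions, and the real PR may compute the counts differently.

```java
final class DisjunctionCountHeuristic {
  // Threshold taken from the comment: apply only when the ratio is below 0.1.
  static final double MAX_RATIO = 0.1;

  /**
   * Returns true when the rarer term matches fewer than a tenth as many
   * documents as the more frequent term.
   */
  static boolean shouldApplyOptimization(int docCountA, int docCountB) {
    int smaller = Math.min(docCountA, docCountB);
    int larger = Math.max(docCountA, docCountB);
    if (larger == 0) {
      // Assumption: with no matching documents there is nothing to optimize.
      return false;
    }
    return (double) smaller / larger < MAX_RATIO;
  }
}
```

For example, counts of 5 and 100 give a ratio of 0.05 and pass the check, while counts of 50 and 100 do not.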

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub
uschindler commented on code in PR #13052: URL: https://github.com/apache/lucene/pull/13052#discussion_r1470981973 ## lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java: ## @@ -106,7 +106,11 @@ public MergeSpecification findForcedMerges( // the resu

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub
uschindler merged PR #13052: URL: https://github.com/apache/lucene/pull/13052

Re: [PR] Make static final Set immutable [lucene]

2024-01-30 Thread via GitHub
uschindler commented on PR #13055: URL: https://github.com/apache/lucene/pull/13055#issuecomment-1916562601 Please add a changes.txt, as this is theoretically a backwards incompatible change.

Re: [PR] Align instanceof check with type cast [lucene]

2024-01-30 Thread via GitHub
uschindler merged PR #13039: URL: https://github.com/apache/lucene/pull/13039

Re: [PR] Align instanceof check with type cast [lucene]

2024-01-30 Thread via GitHub
uschindler commented on PR #13039: URL: https://github.com/apache/lucene/pull/13039#issuecomment-1916576302 I added the changes entry to the existing one for you.

Re: [PR] Align instanceof check with type cast [lucene]

2024-01-30 Thread via GitHub
sabi0 commented on PR #13039: URL: https://github.com/apache/lucene/pull/13039#issuecomment-1916580962 Thank you. I will make sure the new PRs have them out of the box.

Re: [PR] Make static final Set immutable [lucene]

2024-01-30 Thread via GitHub
uschindler merged PR #13055: URL: https://github.com/apache/lucene/pull/13055

Re: [PR] Make static final Set immutable [lucene]

2024-01-30 Thread via GitHub
uschindler commented on PR #13055: URL: https://github.com/apache/lucene/pull/13055#issuecomment-1916591556 I backported it. Like the previous PR this caused a conflict due to different stop tags, but this was easy to solve.

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub
uschindler commented on code in PR #13046: URL: https://github.com/apache/lucene/pull/13046#discussion_r1471018363 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/BackwardsCompatibilityTestBase.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] Update int array growth calls [lucene]

2024-01-30 Thread via GitHub
stefanvodita commented on code in PR #12947: URL: https://github.com/apache/lucene/pull/12947#discussion_r1471158692 ## lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/TaskSequence.java: ## Review Comment: My reasoning must have been way off the day I wro

[I] Reproducible failure in TestGeo3DPoint.testRandomBig [lucene]

2024-01-30 Thread via GitHub
easyice opened a new issue, #13056: URL: https://github.com/apache/lucene/issues/13056 ### Description ``` org.apache.lucene.spatial3d.TestGeo3DPoint > testRandomBig FAILED java.lang.AssertionError: FAIL: id=23785 should not have matched but did shape=GeoStandard

Re: [PR] Fix too many open files Exception for TestConcurrentMergeScheduler [lucene]

2024-01-30 Thread via GitHub
easyice commented on PR #13035: URL: https://github.com/apache/lucene/pull/13035#issuecomment-1917236900 Pushed a new fix for reproducible test failure `TestIndexWriterThreadsToSegments.testManyThreadsClose`: ``` ./gradlew test --tests TestIndexWriterThreadsToSegments.testManyT

[I] Reproducible failure in TestParentBlockJoinFloatKnnVectorQuery.testScoringWithMultipleChildren [lucene]

2024-01-30 Thread via GitHub
easyice opened a new issue, #13057: URL: https://github.com/apache/lucene/issues/13057 ### Description ``` org.apache.lucene.search.join.TestParentBlockJoinFloatKnnVectorQuery > testScoringWithMultipleChildren FAILED java.lang.AssertionError: expected:<1.0> but was:<0.019607

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub
mikemccand commented on PR #13046: URL: https://github.com/apache/lucene/pull/13046#issuecomment-1917387757 > > We should PnP this! > > What on earth means PnP? Mike, check out this search: https://www.google.com/search?q=pnp+acronym wikipedia FTW PnP = progress not perfection!

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub
uschindler commented on PR #13046: URL: https://github.com/apache/lucene/pull/13046#issuecomment-1917488764 > > > We should PnP this! > > > > > > What on earth means PnP? Mike, check out this search: https://www.google.com/search?q=pnp+acronym wikipedia FTW > > PnP = progre

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub
s1monw commented on PR #13046: URL: https://github.com/apache/lucene/pull/13046#issuecomment-1917508359 @uschindler I need to hear you speaking german to tell, you know that ;)

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub
uschindler commented on PR #13046: URL: https://github.com/apache/lucene/pull/13046#issuecomment-1917572859 There is no formatted() with locale.
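
To illustrate the remark, and assuming it refers to `String#formatted(Object...)` from Java 15, which takes no `Locale` argument: code that needs locale-independent output can fall back to `String.format(Locale, String, Object...)`. The class and method below are hypothetical and are not part of the PR under review.

```java
import java.util.Locale;

final class LocaleFormatSketch {
  /** Formats a value with two decimals, independent of the JVM default locale. */
  static String twoDecimals(double value) {
    // "%.2f".formatted(value) would use the default format locale instead,
    // so the decimal separator could vary between machines.
    return String.format(Locale.ROOT, "%.2f", value);
  }
}
```

With `Locale.ROOT` the decimal separator is always a period, so the output is stable across environments.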

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub
uschindler commented on code in PR #13046: URL: https://github.com/apache/lucene/pull/13046#discussion_r1471695464 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/BackwardsCompatibilityTestBase.java: ## @@ -0,0 +1,253 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-30 Thread via GitHub
jpountz commented on PR #13036: URL: https://github.com/apache/lucene/pull/13036#issuecomment-1917595110 Thanks @jfreden, the heuristic looks sensible.

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub
uschindler commented on code in PR #13046: URL: https://github.com/apache/lucene/pull/13046#discussion_r1471716431 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/BackwardsCompatibilityTestBase.java: ## @@ -0,0 +1,253 @@ +/* + * Licensed to the Apache Softwar

Re: [I] Advance to first position of 1 in BitSet before iterating the lead in BitSetConjunctionDISI? [lucene]

2024-01-30 Thread via GitHub
jpountz commented on issue #13024: URL: https://github.com/apache/lucene/issues/13024#issuecomment-1917610302 I'm not sure I like this idea, which feels quite arbitrary: why would there be a big gap of matching doc IDs towards the start of the doc ID space, and not in the middle or towards

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub
uschindler commented on code in PR #13046: URL: https://github.com/apache/lucene/pull/13046#discussion_r1471723409 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/BackwardsCompatibilityTestBase.java: ## @@ -0,0 +1,253 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] clean up smoketester GPG leaks [lucene]

2024-01-30 Thread via GitHub
janhoy commented on PR #12882: URL: https://github.com/apache/lucene/pull/12882#issuecomment-1917641774 @hurutoriya Did you do any testing on the changes? If not, we need to QA that the script is not broken by this change.