Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-29 Thread via GitHub
uschindler commented on PR #13052: URL: https://github.com/apache/lucene/pull/13052#issuecomment-1915825022 Maybe put a comment here describing why foreach is faster, especially because merging will always produce smaller sets so always triggering slow path. @jpountz what do you think

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-29 Thread via GitHub
uschindler commented on PR #13052: URL: https://github.com/apache/lucene/pull/13052#issuecomment-1915822328 I think we should apply this PR. When merging, Lucene always has a larger set before the merge so the new set set is always smaller. So `AbstractSet#removeAll` will always use the slo

Re: [PR] Make FSTCompiler.compile() to only return the FSTMetadata [lucene]

2024-01-29 Thread via GitHub
github-actions[bot] commented on PR #12831: URL: https://github.com/apache/lucene/pull/12831#issuecomment-1915807375 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] clean up smoketester GPG leaks [lucene]

2024-01-29 Thread via GitHub
github-actions[bot] commented on PR #12882: URL: https://github.com/apache/lucene/pull/12882#issuecomment-1915807209 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Align instanceof check with type cast [lucene]

2024-01-29 Thread via GitHub
uschindler commented on code in PR #13039: URL: https://github.com/apache/lucene/pull/13039#discussion_r1470388288 ## lucene/core/src/java/org/apache/lucene/analysis/tokenattributes/PayloadAttributeImpl.java: ## @@ -62,8 +62,7 @@ public boolean equals(Object other) { retu

Re: [PR] Align instanceof check with type cast [lucene]

2024-01-29 Thread via GitHub
uschindler commented on code in PR #13039: URL: https://github.com/apache/lucene/pull/13039#discussion_r1470387185 ## lucene/core/src/java/org/apache/lucene/analysis/tokenattributes/PayloadAttributeImpl.java: ## @@ -62,8 +62,7 @@ public boolean equals(Object other) { retu

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-29 Thread via GitHub
uschindler commented on PR #13052: URL: https://github.com/apache/lucene/pull/13052#issuecomment-1915775381 I am not sure if I like this code. Unfortunately the spec of `Set#removeAll()` is a bit stupid and should only use `argument.contains()` if it is a set (so almost constant contains is

Re: [PR] Replace `new HashSet<>(Arrays.asList())` with `EnumSet.of()` [lucene]

2024-01-29 Thread via GitHub
uschindler commented on PR #13051: URL: https://github.com/apache/lucene/pull/13051#issuecomment-1915761017 When backporting this to Lucene 9.x, it confliced because the sets have different contents in older version. I fixed this. Nevertheless, the static final constants should be unm

Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub
uschindler merged PR #13048: URL: https://github.com/apache/lucene/pull/13048 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Use String.isEmpty() instead of equals("") [lucene]

2024-01-29 Thread via GitHub
uschindler merged PR #13050: URL: https://github.com/apache/lucene/pull/13050 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Replace `new HashSet<>(Arrays.asList())` with `EnumSet.of()` [lucene]

2024-01-29 Thread via GitHub
uschindler merged PR #13051: URL: https://github.com/apache/lucene/pull/13051 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Use String.indexOf(char) where possible [lucene]

2024-01-29 Thread via GitHub
uschindler merged PR #13049: URL: https://github.com/apache/lucene/pull/13049 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Use String.replace() instead of replaceAll() [lucene]

2024-01-29 Thread via GitHub
uschindler merged PR #13047: URL: https://github.com/apache/lucene/pull/13047 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Remove concatenation in String.format() calls [lucene]

2024-01-29 Thread via GitHub
uschindler merged PR #13038: URL: https://github.com/apache/lucene/pull/13038 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub
uschindler commented on PR #13048: URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915682882 Looks fine. Have you added all PR numbers that were merged today and the ones I approved already? I will merge them soon. -- This is an automated message from the Apache Git Servic

Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub
sabi0 commented on PR #13048: URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915656350 I hope I understood your suggestions with CHANGES.txt right. Please let me know if I should change something there. -- This is an automated message from the Apache Git Service. To respon

Re: [PR] Remove concatenation in String.format() calls [lucene]

2024-01-29 Thread via GitHub
uschindler commented on PR #13038: URL: https://github.com/apache/lucene/pull/13038#issuecomment-1915649179 Thanks for this. The code also has safety problems. If the concatted text para contain percent signs it would break. So format strings should never ever be constructed with variables

Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub
uschindler commented on PR #13048: URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915623346 It would be nice to have a changes entry in `CHANGES.txt`. You can add it to 9.10 section and I will cherry pick the changes in Lucene 9.x. As you have more open PRs with cleanu

[PR] Clean up AnyQueryNode code [lucene]

2024-01-29 Thread via GitHub
sabi0 opened a new pull request, #13053: URL: https://github.com/apache/lucene/pull/13053 - removed redundant field initializers - fixed a typo in the field name: `minimumMatching_m_Elements` - removed redundant `instanceof` checks - replaced type casts with a pattern variable - c

Re: [PR] Align instanceof check with type cast [lucene]

2024-01-29 Thread via GitHub
sabi0 commented on code in PR #13039: URL: https://github.com/apache/lucene/pull/13039#discussion_r1470102508 ## lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/processors/AnalyzerQueryNodeProcessor.java: ## @@ -107,10 +105,9 @@ public QueryNode proce

Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub
sabi0 commented on PR #13048: URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915392997 I've added my name to the GitHub profile. And will review the PRs to describe the reasoning. -Dmitry -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub
sabi0 commented on code in PR #13048: URL: https://github.com/apache/lucene/pull/13048#discussion_r1470080459 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -185,7 +185,7 @@ static VectorizationProvider lookup(boolean testMode)

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2024-01-29 Thread via GitHub
jmazanec15 commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1915207682 @kevindrosendahl This is really cool! I had a couple questions around product quantization implementation. I see in [VectorSandboxVamanaVectorsWriter](https://github.com/kevi

Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub
uschindler commented on PR #13048: URL: https://github.com/apache/lucene/pull/13048#issuecomment-1915203166 For your cleanup pull requests to be merged, please make sure to add a description to your issue, why those changes are useful. I agree to merge this (any some of your other PRs

Re: [PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub
uschindler commented on code in PR #13048: URL: https://github.com/apache/lucene/pull/13048#discussion_r1469932148 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -185,7 +185,7 @@ static VectorizationProvider lookup(boolean test

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub
mikemccand commented on code in PR #13046: URL: https://github.com/apache/lucene/pull/13046#discussion_r1469886161 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestIndexUpgradeBackwardsCompatibility.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub
dweiss commented on PR #13046: URL: https://github.com/apache/lucene/pull/13046#issuecomment-1915017690 Oh, one more problem is that you can't "see" the structure of tests before you actually run them (in an IDE). Don't know how much of an issue this is in practice but it's impossible to so

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub
dweiss commented on PR #13046: URL: https://github.com/apache/lucene/pull/13046#issuecomment-1915011660 I like it. Whenever there is test repetition that can be driven by data, it should be driven by data. The only downside to using ParametersFactory is that it's something that is i

[PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-29 Thread via GitHub
sabi0 opened a new pull request, #13052: URL: https://github.com/apache/lucene/pull/13052 When list size is greater `list.contains()` will be called for each set element. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

[PR] Use String.indexOf(char) where possible [lucene]

2024-01-29 Thread via GitHub
sabi0 opened a new pull request, #13049: URL: https://github.com/apache/lucene/pull/13049 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mai

[PR] Use orElseGet() to avoid unnecessary object allocation [lucene]

2024-01-29 Thread via GitHub
sabi0 opened a new pull request, #13048: URL: https://github.com/apache/lucene/pull/13048 ### Description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub
jpountz merged PR #13043: URL: https://github.com/apache/lucene/pull/13043 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub
jpountz commented on code in PR #13046: URL: https://github.com/apache/lucene/pull/13046#discussion_r1469520708 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestBinaryBackwardsCompatibility.java: ## @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software

Re: [I] Directory based approach for index encryption [LUCENE-9379] [lucene]

2024-01-29 Thread via GitHub
osnatShomrony commented on issue #10419: URL: https://github.com/apache/lucene/issues/10419#issuecomment-1914640685 With the new requirement of PCI 4.0 that disk encryption cannot be the only protection for data at rest, this contribution becomes very crucial, is there any progress with thi

Re: [I] Contribution: Codec for index-level encryption [LUCENE-6966] [lucene]

2024-01-29 Thread via GitHub
osnatShomrony commented on issue #8023: URL: https://github.com/apache/lucene/issues/8023#issuecomment-1914638272 With the new requirement of PCI 4.0 that disk encryption cannot be the only protection for data at rest, this contribution becomes very crucial, is there any progress with this

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub
mrkm4ntr commented on PR #13043: URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914637603 Added, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] AES Encrypted Directory [LUCENE-2228] [lucene]

2024-01-29 Thread via GitHub
osnatShomrony commented on issue #3304: URL: https://github.com/apache/lucene/issues/3304#issuecomment-1914637568 With the new requirement of PCI 4.0 that disk encryption cannot be the only protection for data at rest, this contribution becomes very crucial, is there any progress with this

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-01-29 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1914618652 I have done this test back in Lucene 9.4, and we still end up every once in a while a graph where the mean number of connections hovers around `1` and whose connectedness is very

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub
s1monw commented on code in PR #13046: URL: https://github.com/apache/lucene/pull/13046#discussion_r1469520535 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestIndexUpgradeBackwardsCompatibility.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Sof

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub
s1monw commented on code in PR #13046: URL: https://github.com/apache/lucene/pull/13046#discussion_r1469519342 ## lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestDVUpdateBackwardsCompatibility.java: ## @@ -0,0 +1,268 @@ +/* + * Licensed to the Apache Softwar

[PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-29 Thread via GitHub
s1monw opened a new pull request, #13046: URL: https://github.com/apache/lucene/pull/13046 After working on #12829 I was quite surprised about the complexity our BWC have and how ancient the appear compared to the rest of our test suite. I took a step back and tired to modernize the tests t

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub
jpountz commented on PR #13043: URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914443248 Yes please. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-29 Thread via GitHub
jfreden commented on PR #13036: URL: https://github.com/apache/lucene/pull/13036#issuecomment-1914370159 Thanks for the review! Great ideas! I will work on adding a simple heuristic and caching the `TermState`. -- This is an automated message from the Apache Git Service. T

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub
mrkm4ntr commented on PR #13043: URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914355141 Thanks. 9.10 as well? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub
jpountz commented on PR #13043: URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914335779 The change looks good to me, can you add a CHANGES entry? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub
jpountz commented on PR #13043: URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914332845 OK, I see, it's about conjunctions within disjunctions. Thanks for explaining. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-29 Thread via GitHub
jpountz commented on PR #13036: URL: https://github.com/apache/lucene/pull/13036#issuecomment-1914327764 This is a great speedup on `CountOrHighMed`! Too bad it's not faster all the time, though I'm not too surprised as conjunctions have more overhead than disjunctions when all clauses have

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub
mrkm4ntr commented on PR #13043: URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914318228 For this case. Suppose ScorerA, B, and C can return valid maxScore. If the ScorerD is dominant, larger minCompetitiveScore is set to WANDScorer. But ConjunctionScorer returns Infinity a

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-29 Thread via GitHub
jfreden commented on PR #13036: URL: https://github.com/apache/lucene/pull/13036#issuecomment-1914220139 Output from luceneutil. **The count tasks:** ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

Re: [I] What if we pick up segments in segment size's ascending order in TieredMergePolicy.doFindMerges? [lucene]

2024-01-29 Thread via GitHub
jpountz commented on issue #13022: URL: https://github.com/apache/lucene/issues/13022#issuecomment-1914190216 > but the main reason may be doStall takes longer with data growing This sounds plausible. FWIW I don't think that anyone actually performed testing on a modern HDD, values gr

Re: [I] What if we pick up segments in segment size's ascending order in TieredMergePolicy.doFindMerges? [lucene]

2024-01-29 Thread via GitHub
jpountz closed issue #13022: What if we pick up segments in segment size's ascending order in TieredMergePolicy.doFindMerges? URL: https://github.com/apache/lucene/issues/13022 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Make use of null-checked variable [lucene]

2024-01-29 Thread via GitHub
jpountz merged PR #13040: URL: https://github.com/apache/lucene/pull/13040 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Use short-circuit boolean expressions [lucene]

2024-01-29 Thread via GitHub
jpountz merged PR #13042: URL: https://github.com/apache/lucene/pull/13042 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Use short-circuit boolean expressions [lucene]

2024-01-29 Thread via GitHub
jpountz commented on PR #13042: URL: https://github.com/apache/lucene/pull/13042#issuecomment-1914176269 Thanks, eagerly evaluating all conditions looks unintended indeed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] Propagate topLevelScoringClause from QueryProfiler [lucene]

2024-01-29 Thread via GitHub
jpountz merged PR #13031: URL: https://github.com/apache/lucene/pull/13031 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Support getMaxScore of ConjunctionScorer for non top level scoring clause [lucene]

2024-01-29 Thread via GitHub
jpountz commented on PR #13043: URL: https://github.com/apache/lucene/pull/13043#issuecomment-1914167175 Thanks for looking into this, can you explain what kind of queries perform better with this change? -- This is an automated message from the Apache Git Service. To respond to the messa