Re: [PR] Improve PointRangeQuery's "inverse" optimization. [lucene]

2025-03-13 Thread via GitHub
jpountz commented on PR #14353: URL: https://github.com/apache/lucene/pull/14353#issuecomment-2721270141 Nightly benchmarks don't show a change since their points are not clustered by doc ID so they don't call `visit(DocIdSetIterator)`. -- This is an automated message from the Apache Git

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1993755452 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1993758687 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1993751047 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1993776438 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] knn search - add tests to perform exact search when filtering does not return enough results [lucene]

2025-03-13 Thread via GitHub
carlosdelest commented on code in PR #14274: URL: https://github.com/apache/lucene/pull/14274#discussion_r1993167987 ## lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java: ## @@ -646,6 +654,24 @@ public void testRandomWithFilter() throws IOException {

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1993786103 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-13 Thread via GitHub
dweiss commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2720479164 > @dweiss understands this one the best, he implemented it. ... 15 years ago in LUCENE-3832. Thanks for putting so much trust in my memory. I'll take a look.

Re: [PR] Disable the query cache by default. [lucene]

2025-03-13 Thread via GitHub
jpountz commented on code in PR #14187: URL: https://github.com/apache/lucene/pull/14187#discussion_r1993476471 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -77,7 +77,8 @@ public class IndexSearcher { static int maxClauseCount = 1024; - priva

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1993802895 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1993812719 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] deduplicate standard BKDConfig records [lucene]

2025-03-13 Thread via GitHub
iverase merged PR #14338: URL: https://github.com/apache/lucene/pull/14338

[PR] Improve DenseConjunctionBulkScorer's sparse fallback. [lucene]

2025-03-13 Thread via GitHub
jpountz opened a new pull request, #14354: URL: https://github.com/apache/lucene/pull/14354 `DenseConjunctionBulkScorer` has a fallback to sparse evaluation mode when the intersection of the clauses evaluated so far becomes sparse. Currently, this sparse evaluation mode clears bits in the w

[PR] Fetch PR branch to fix changelog workflow [lucene]

2025-03-13 Thread via GitHub
stefanvodita opened a new pull request, #14355: URL: https://github.com/apache/lucene/pull/14355 I've been debugging this and I think without the `ref`, our diff was effectively between a commit and itself. Now we're actually diffing the PR against the base merge commit. Addresses #1

Re: [PR] Add venv to rat gitignore [lucene]

2025-03-13 Thread via GitHub
rmuir merged PR #14346: URL: https://github.com/apache/lucene/pull/14346

Re: [PR] Case-insensitive TermInSetQuery Implementation (Proof of Concept) [lucene]

2025-03-13 Thread via GitHub
jpountz commented on PR #14349: URL: https://github.com/apache/lucene/pull/14349#issuecomment-2721204272 Implementing it as a query that rewrites to the proper query based on the list of terms makes sense to me. You may want to move this query to the lucene-queries module since it doesn't r

Re: [I] Create a bot to check if there is a CHANGES entry for new PRs [lucene]

2025-03-13 Thread via GitHub
stefanvodita commented on issue #13898: URL: https://github.com/apache/lucene/issues/13898#issuecomment-2721556890 @pseudo-nymous, I've attempted a fix in #14355.

Re: [PR] Fetch PR branch to fix changelog workflow [lucene]

2025-03-13 Thread via GitHub
stefanvodita merged PR #14355: URL: https://github.com/apache/lucene/pull/14355

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1993741308 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] Improve PointRangeQuery's "inverse" optimization. [lucene]

2025-03-13 Thread via GitHub
jpountz merged PR #14353: URL: https://github.com/apache/lucene/pull/14353

Re: [I] [DISCUSS] Could we have a different ANN algorithm for Learned Sparse Vectors? [lucene]

2025-03-13 Thread via GitHub
chishui commented on issue #13675: URL: https://github.com/apache/lucene/issues/13675#issuecomment-2717251375 > I have recently been interested in this direction and plan on spending non trivial amount of time on this over the next few weeks. Assuming we haven't started dev on this, I am as

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-13 Thread via GitHub
dweiss commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2720681661 Crap, you're right. Didn't think of it.

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1993768037 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-13 Thread via GitHub
dweiss commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2720601955 I think this will work just fine in most cases and is a rather inexpensive way to implement this case-insensitive matching, but this comes at the cost of the output automaton that may not

Re: [PR] Improve PointRangeQuery's "inverse" optimization. [lucene]

2025-03-13 Thread via GitHub
jpountz commented on PR #14353: URL: https://github.com/apache/lucene/pull/14353#issuecomment-2721927266 Thanks @iverase!

Re: [PR] Improve PointRangeQuery's "inverse" optimization. [lucene]

2025-03-13 Thread via GitHub
jpountz commented on PR #14353: URL: https://github.com/apache/lucene/pull/14353#issuecomment-2721296892 Maybe I spoke too soon, there seems to be a tiny speedup: ``` Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff

Re: [PR] Speed up scoring conjunctions a bit. [lucene]

2025-03-13 Thread via GitHub
jpountz merged PR #14345: URL: https://github.com/apache/lucene/pull/14345

[PR] Improve PointRangeQuery's "inverse" optimization. [lucene]

2025-03-13 Thread via GitHub
jpountz opened a new pull request, #14353: URL: https://github.com/apache/lucene/pull/14353 When a `PointRangeQuery` matches more than 50% of points and the field is dense and single-valued, it internally computes the set of docs that don't match the query, which should be faster since fewer
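
The "inverse" idea described in this PR summary can be illustrated with a minimal, self-contained sketch. It is not Lucene's implementation: the class `InverseRangeSketch`, the method `matchingDocs`, and the plain `int[]` of one value per doc are all hypothetical stand-ins, and the explicit counting pass replaces whatever cost estimate the real query uses. It only shows why starting from "all docs" and clearing the non-matching ones touches fewer bits when most docs match.

```java
import java.util.BitSet;

// Hypothetical illustration only; none of these names come from Lucene.
public final class InverseRangeSketch {

  // valueByDoc[doc] is the single value of the field for that doc
  // (dense and single-valued: every doc has exactly one value).
  static BitSet matchingDocs(int[] valueByDoc, int lower, int upper) {
    int maxDoc = valueByDoc.length;

    // Assumption: a cheap match count is available. Here we simply count.
    int matchCount = 0;
    for (int v : valueByDoc) {
      if (v >= lower && v <= upper) {
        matchCount++;
      }
    }

    BitSet result = new BitSet(maxDoc);
    if (2L * matchCount > maxDoc) {
      // More than half of the docs match: set everything, then clear
      // the (fewer) docs whose value falls outside the range.
      result.set(0, maxDoc);
      for (int doc = 0; doc < maxDoc; doc++) {
        int v = valueByDoc[doc];
        if (v < lower || v > upper) {
          result.clear(doc);
        }
      }
    } else {
      // Otherwise collect the matching docs directly.
      for (int doc = 0; doc < maxDoc; doc++) {
        int v = valueByDoc[doc];
        if (v >= lower && v <= upper) {
          result.set(doc);
        }
      }
    }
    return result;
  }
}
```

Both branches return the same set of docs; only the number of individual bit updates differs.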

Re: [PR] PointInSetQuery use reverse collection to improve performance [lucene]

2025-03-13 Thread via GitHub
msfroh commented on PR #14352: URL: https://github.com/apache/lucene/pull/14352#issuecomment-2722703844 > Another new thought is that I believe SinglePointVisitor is unnecessary, as processing each point separately will traverse the bkd tree multiple times. MergePointVisitor will only trave

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-13 Thread via GitHub
msfroh commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2722105723 My thinking is that a query that uses this should lowercase, dedupe, and sort the input before feeding it into `StringsToAutomaton`. That would handle @dweiss's example (i.e. that input i
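
As a rough sketch of the "lowercase, dedupe, and sort" preprocessing suggested in this comment, the following uses only the JDK. The class `NormalizeTermsSketch` and the method `normalize` are hypothetical names, `String.toLowerCase(Locale.ROOT)` is an assumed stand-in for whatever case folding the PR settles on, and the result is ordered by `String`'s natural (UTF-16 code unit) order, which may not be the order `StringsToAutomaton` actually requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the PR under discussion.
public final class NormalizeTermsSketch {

  // Lowercases each term, drops duplicates, and returns the terms sorted.
  static List<String> normalize(List<String> terms) {
    // A TreeSet both deduplicates and keeps its elements sorted.
    TreeSet<String> sorted = new TreeSet<>();
    for (String term : terms) {
      sorted.add(term.toLowerCase(Locale.ROOT));
    }
    return new ArrayList<>(sorted);
  }
}
```

For example, under these assumptions `normalize(List.of("Foo", "foo", "BAR"))` yields the two terms `bar` and `foo`, in that order.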

Re: [PR] Address completion fields testing gap and truly allow loading FST off heap [lucene]

2025-03-13 Thread via GitHub
github-actions[bot] commented on PR #14270: URL: https://github.com/apache/lucene/pull/14270#issuecomment-2722992549 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add posTagFormat parameter for OpenNLPPOSFilter [lucene]

2025-03-13 Thread via GitHub
github-actions[bot] commented on PR #14194: URL: https://github.com/apache/lucene/pull/14194#issuecomment-2722992638 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Make Lucene better at skipping long runs of matches. [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14312: URL: https://github.com/apache/lucene/pull/14312#discussion_r1987237995 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -117,6 +107,65 @@ private static int advance(FixedBitSet set, int i) { }

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-13 Thread via GitHub
rmuir commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2722901888 > Would a check with `Character.isLowerCase()` on each input codepoint for the case-insensitive case be sufficient to reject that kind of input across all valid Unicode strings? I d

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-13 Thread via GitHub
msfroh commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2722961519 To the best of my understanding from reading through the code while sketching this PR, I believe it would produce a minimal DFA if every character in a set of alternatives in the inpu

Re: [PR] PointInSetQuery clips segments by lower and upper [lucene]

2025-03-13 Thread via GitHub
hanbj commented on PR #14268: URL: https://github.com/apache/lucene/pull/14268#issuecomment-2720535117 @stefanvodita Thank you for the review. Unit tests have been added.

Re: [PR] knn search - add tests to perform exact search when filtering does not return enough results [lucene]

2025-03-13 Thread via GitHub
benwtrent commented on code in PR #14274: URL: https://github.com/apache/lucene/pull/14274#discussion_r1993097312 ## lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java: ## @@ -646,6 +654,24 @@ public void testRandomWithFilter() throws IOException {

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
mikemccand commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2720657669 This is an exciting change @gf2121! Smaller, simpler, and faster!? I'll try to review soon.

Re: [PR] PointInSetQuery clips segments by lower and upper [lucene]

2025-03-13 Thread via GitHub
iverase commented on PR #14268: URL: https://github.com/apache/lucene/pull/14268#issuecomment-2720590130 > When creating a PointInSetQuery object, the data in the packedPoints parameter is returned in order, so the maximum and minimum values can be determined when iterating over packedPoin

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-13 Thread via GitHub
rmuir commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2720652612 Bigger downside: that example isn't deterministic either.

[PR] PointInSetQuery use reverse collection to improve performance [lucene]

2025-03-13 Thread via GitHub
hanbj opened a new pull request, #14352: URL: https://github.com/apache/lucene/pull/14352 ### Description Performance issues with terms number field encountered in production environments: high query time and very high CPU usage. Through analysis and localization, it was found

Re: [PR] [DRAFT] Case-insensitive matching over union of strings [lucene]

2025-03-13 Thread via GitHub
dweiss commented on PR #14350: URL: https://github.com/apache/lucene/pull/14350#issuecomment-2720790704 I also don't think you can make it deterministic in any trivial way.

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-03-13 Thread via GitHub
gf2121 commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r1994852913 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Trie.java: ## @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o