Re: [PR] SortedSet DV Multi Range query [lucene]

2024-12-18 Thread via GitHub
mkhludnev commented on code in PR #13974: URL: https://github.com/apache/lucene/pull/13974#discussion_r1891293118 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/SortedSetMultiRangeQuery.java: ## @@ -0,0 +1,300 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] SortedSet DV Multi Range query [lucene]

2024-12-18 Thread via GitHub
mkhludnev commented on code in PR #13974: URL: https://github.com/apache/lucene/pull/13974#discussion_r1891251385 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/SortedSetMultiRangeQuery.java: ## @@ -0,0 +1,300 @@ +/* + * Licensed to the Apache Software Foundation (A

[PR] Fix urls describing why NIOFS is not recommended for Windows [lucene]

2024-12-18 Thread via GitHub
YeonghyeonKO opened a new pull request, #14081: URL: https://github.com/apache/lucene/pull/14081 ### Description - Lucene decides how to open segment files based on whether JRE_IS_64BIT is true or false, as below: ```java public static FSDirectory open(Path path, LockFactory lockFactory) throws
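The selection rule that the description refers to can be sketched as a pure function. This is a hypothetical illustration only: `DirectoryChoice`, `choose` and `detect64Bit` are made-up names, the sketch returns class names instead of constructing Lucene `Directory` objects, and it assumes (as the description states) that the choice depends only on whether the JRE is 64-bit. The exact conditions inside `FSDirectory.open` may differ between Lucene versions.

```java
/** Hypothetical sketch of a 64-bit-based directory choice; not Lucene's FSDirectory code. */
final class DirectoryChoice {
  private DirectoryChoice() {}

  /**
   * Assumed rule from the PR description: memory-map on 64-bit JREs, otherwise fall back to
   * NIOFSDirectory. Returns a class name rather than a Directory instance.
   */
  static String choose(boolean jreIs64Bit) {
    return jreIs64Bit ? "MMapDirectory" : "NIOFSDirectory";
  }

  /** Rough 64-bit detection; "sun.arch.data.model" is a HotSpot-specific property. */
  static boolean detect64Bit() {
    String model = System.getProperty("sun.arch.data.model");
    if (model != null) {
      return model.contains("64");
    }
    // Fall back to the architecture name, e.g. "amd64" or "aarch64".
    return System.getProperty("os.arch", "").contains("64");
  }
}
```

Under this assumed rule a 32-bit JVM always lands on `NIOFSDirectory`, Windows included, which is presumably why the javadoc links explaining NIOFS on Windows matter.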

Re: [PR] SortedSet DV Multi Range query [lucene]

2024-12-18 Thread via GitHub
gsmiller commented on code in PR #13974: URL: https://github.com/apache/lucene/pull/13974#discussion_r1890984514 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/SortedSetMultiRangeQuery.java: ## @@ -0,0 +1,300 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Better encapsulate locking logic in HnswGraphBuilder [lucene]

2024-12-18 Thread via GitHub
github-actions[bot] commented on PR #14016: URL: https://github.com/apache/lucene/pull/14016#issuecomment-2552520192 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Missing word on Brazillian stop word list [lucene]

2024-12-18 Thread via GitHub
rmuir closed issue #14065: Missing word on Brazillian stop word list URL: https://github.com/apache/lucene/issues/14065 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] Update stopwords.txt [lucene]

2024-12-18 Thread via GitHub
rmuir merged PR #14075: URL: https://github.com/apache/lucene/pull/14075

Re: [PR] Update stopwords.txt [lucene]

2024-12-18 Thread via GitHub
eusousu commented on PR #14075: URL: https://github.com/apache/lucene/pull/14075#issuecomment-2552417520 @rmuir Sorry, I was unaware of this requirement. I added the entry and really hope I got the pattern right 😅 I really appreciate your work and patience, thank you 😄

Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2024-12-18 Thread via GitHub
mayya-sharipova commented on code in PR #14078: URL: https://github.com/apache/lucene/pull/14078#discussion_r1890793535 ## lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsFormat.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Soft

Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2024-12-18 Thread via GitHub
mayya-sharipova commented on code in PR #14078: URL: https://github.com/apache/lucene/pull/14078#discussion_r1890779090 ## lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsFormat.java: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Soft

Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2024-12-18 Thread via GitHub
mayya-sharipova commented on code in PR #14078: URL: https://github.com/apache/lucene/pull/14078#discussion_r1890778000 ## lucene/core/src/java/org/apache/lucene/codecs/lucene102/package-info.java: ## @@ -0,0 +1,436 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
jpountz commented on code in PR #14080: URL: https://github.com/apache/lucene/pull/14080#discussion_r1890769528 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -0,0 +1,192 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
jpountz commented on code in PR #14080: URL: https://github.com/apache/lucene/pull/14080#discussion_r1890769019 ## lucene/core/src/java/org/apache/lucene/search/BooleanScorerSupplier.java: ## @@ -318,13 +318,26 @@ BulkScorer filteredOptionalBulkScorer() throws IOException {

Re: [PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
jpountz commented on code in PR #14080: URL: https://github.com/apache/lucene/pull/14080#discussion_r1890768640 ## lucene/core/src/java/org/apache/lucene/search/BooleanScorerSupplier.java: ## @@ -304,9 +304,9 @@ BulkScorer optionalBulkScorer() throws IOException { BulkScorer

Re: [PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
jpountz commented on code in PR #14080: URL: https://github.com/apache/lucene/pull/14080#discussion_r1890766083 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -0,0 +1,192 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

Re: [PR] aws jmh benchmark cleanups [lucene]

2024-12-18 Thread via GitHub
rmuir merged PR #14072: URL: https://github.com/apache/lucene/pull/14072

Re: [PR] aws jmh benchmark cleanups [lucene]

2024-12-18 Thread via GitHub
rmuir commented on PR #14072: URL: https://github.com/apache/lucene/pull/14072#issuecomment-2552134247 Merging this one for now; I will report back with a PR to make SSH more robust.

Re: [PR] Update stopwords.txt [lucene]

2024-12-18 Thread via GitHub
rmuir commented on PR #14075: URL: https://github.com/apache/lucene/pull/14075#issuecomment-2552131795 @eusousu If you want, add an entry to lucene/CHANGES.txt for your fix. Otherwise I'm happy to do it for you: you've done plenty to get this fix in, and I want to merge it.

Re: [PR] Optimize DFS while marking connected components [lucene]

2024-12-18 Thread via GitHub
vigyasharma commented on PR #14022: URL: https://github.com/apache/lucene/pull/14022#issuecomment-2552101424 > > With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bits
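The precondition discussed above shows up clearly in a minimal iterative DFS that records reachable nodes in a `java.util.BitSet`. This is a sketch under assumptions: the graph is a plain `int[][]` adjacency array rather than Lucene's `HnswGraph`, and `ComponentMarker.markReachable` is a hypothetical name, not `HnswUtil.markRooted`. Nodes are marked when pushed, so each node enters the stack at most once; a bit that is already set on entry is treated as visited and its neighbors are never explored, which is why a partially pre-set bitset can change the result.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

/** Hypothetical sketch of DFS component marking; not Lucene's HnswUtil code. */
final class ComponentMarker {
  private ComponentMarker() {}

  /**
   * Marks every node reachable from root in visited and returns how many nodes were newly
   * marked. Bits already set on entry count as visited, so their neighbors are not explored.
   */
  static int markReachable(int[][] adjacency, int root, BitSet visited) {
    if (visited.get(root)) {
      return 0;
    }
    Deque<Integer> stack = new ArrayDeque<>();
    visited.set(root); // mark on push so no node is pushed twice
    stack.push(root);
    int count = 0;
    while (!stack.isEmpty()) {
      int node = stack.pop();
      count++;
      for (int friend : adjacency[node]) {
        if (!visited.get(friend)) {
          visited.set(friend);
          stack.push(friend);
        }
      }
    }
    return count;
  }
}
```

With adjacency `{{1}, {0, 2}, {1}, {}}`, a clean bitset yields 3 marked nodes from root 0, while pre-setting bit 1 yields 1 and node 2 is never reached.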

Re: [PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
benwtrent commented on code in PR #14080: URL: https://github.com/apache/lucene/pull/14080#discussion_r1890670632 ## lucene/core/src/java/org/apache/lucene/search/BooleanScorerSupplier.java: ## @@ -318,13 +318,26 @@ BulkScorer filteredOptionalBulkScorer() throws IOException {

Re: [PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
benwtrent commented on code in PR #14080: URL: https://github.com/apache/lucene/pull/14080#discussion_r1890656002 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -0,0 +1,192 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Optimize DFS while marking connected components [lucene]

2024-12-18 Thread via GitHub
viswanathk commented on PR #14022: URL: https://github.com/apache/lucene/pull/14022#issuecomment-2551944259 > > Benchmark while indexing 100k docs: > > could you say what data set you used here -- is this random vectors? If so, it would be great to use some non-random vectors so we ca

Re: [PR] Optimize DFS while marking connected components [lucene]

2024-12-18 Thread via GitHub
viswanathk commented on code in PR #14022: URL: https://github.com/apache/lucene/pull/14022#discussion_r1890620446 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswUtil.java: ## @@ -163,6 +164,10 @@ private static Component markRooted( throws IOException { //

Re: [PR] Optimize DFS while marking connected components [lucene]

2024-12-18 Thread via GitHub
viswanathk commented on code in PR #14022: URL: https://github.com/apache/lucene/pull/14022#discussion_r1890620834 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswUtil.java: ## @@ -178,7 +183,10 @@ private static Component markRooted( int friendCount = 0; w

Re: [PR] Optimize DFS while marking connected components [lucene]

2024-12-18 Thread via GitHub
viswanathk commented on PR #14022: URL: https://github.com/apache/lucene/pull/14022#issuecomment-2551867368 > > With this change, we require that connectedNodes should not be set for any nodes. This is slightly different from before, where you could pass a partially set connectedNodes bitse

Re: [PR] Optimize DFS while marking connected components [lucene]

2024-12-18 Thread via GitHub
viswanathk commented on code in PR #14022: URL: https://github.com/apache/lucene/pull/14022#discussion_r1890613233 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswUtil.java: ## @@ -163,6 +164,10 @@ private static Component markRooted( throws IOException { //

Re: [PR] Support disabling IndexSearcher.maxClauseCount with a value of -1 [lucene]

2024-12-18 Thread via GitHub
dweiss commented on PR #13178: URL: https://github.com/apache/lucene/pull/13178#issuecomment-2551785513 An alternative could be to implement this logic in the default visitor returned from getNumClausesCheckVisitor(). This way, if somebody overrides this method, it'll still work for them af
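The idea of keeping the "-1 disables the limit" check inside the default clause-counting logic, rather than at each call site, can be sketched as a small counter. Everything here is hypothetical: `ClauseCounter`, `UNLIMITED` and `onClause` are illustrative names, not Lucene's `QueryVisitor` API, and the exception type is an assumption.

```java
/** Hypothetical clause counter in which -1 disables the limit; not Lucene's visitor API. */
final class ClauseCounter {
  static final int UNLIMITED = -1;

  private final int maxClauseCount;
  private int numClauses;

  ClauseCounter(int maxClauseCount) {
    if (maxClauseCount < 1 && maxClauseCount != UNLIMITED) {
      throw new IllegalArgumentException(
          "maxClauseCount must be >= 1 or -1, got " + maxClauseCount);
    }
    this.maxClauseCount = maxClauseCount;
  }

  /** Called once per visited clause; throws only when a real limit is exceeded. */
  void onClause() {
    numClauses++;
    if (maxClauseCount != UNLIMITED && numClauses > maxClauseCount) {
      throw new IllegalStateException(
          "too many clauses: " + numClauses + " > " + maxClauseCount);
    }
  }

  int count() {
    return numClauses;
  }
}
```

Because the special value is interpreted in one place, any code path that obtains this default counter gets the "disabled" behavior without repeating the check.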

Re: [I] Nightly build failed (missing unsupported index for 9.12.1) [lucene]

2024-12-18 Thread via GitHub
dweiss commented on issue #14068: URL: https://github.com/apache/lucene/issues/14068#issuecomment-2551764380 To be honest, I'm not sure myself how these are supposed to work.

Re: [I] Nightly build failed (missing unsupported index for 9.12.1) [lucene]

2024-12-18 Thread via GitHub
ChrisHegarty commented on issue #14068: URL: https://github.com/apache/lucene/issues/14068#issuecomment-2551705943 Hmm. I didn't add 9.x bwc tests to _main_, since main/11 has no backward compatibility with 9. Maybe these go elsewhere?

Re: [PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
jpountz commented on PR #14080: URL: https://github.com/apache/lucene/pull/14080#issuecomment-2551649936 I made the heuristic more conservative, results now look like this on my M3 after a few iterations: ``` TaskQPS baseline StdDevQPS my_modified_

Re: [I] Nightly build failed (missing unsupported index for 9.12.1) [lucene]

2024-12-18 Thread via GitHub
ChrisHegarty commented on issue #14068: URL: https://github.com/apache/lucene/issues/14068#issuecomment-2551648221 I did merge the 9.12.1 bwc indices, but maybe I missed something! Argh!!!

Re: [PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
jpountz commented on PR #14080: URL: https://github.com/apache/lucene/pull/14080#issuecomment-2551572454 On my Apple M3: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value CountOrHighM

Re: [PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
jpountz commented on PR #14080: URL: https://github.com/apache/lucene/pull/14080#issuecomment-2551518514 wikibigall on my AMD Ryzen 9 3900X: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

[PR] Use the new `loadIntoBitSet` API to speed up dense conjunctions. [lucene]

2024-12-18 Thread via GitHub
jpountz opened a new pull request, #14080: URL: https://github.com/apache/lucene/pull/14080 Now that loading doc IDs into a bit set is much more efficient thanks to auto-vectorization, it has become tempting to evaluate dense conjunctions by and-ing bit sets.
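The approach in the description, evaluating a dense conjunction by and-ing per-clause bit sets, can be sketched in plain Java. Assumptions: clauses are given as sorted `int[]` doc ID arrays instead of Lucene iterators, each clause is loaded into its bit set with a simple loop rather than the `loadIntoBitSet` API, and the 4096-doc window and the names `BitSetConjunction`/`intersect` are hypothetical. The word-wise `&=` loop is the kind of simple array loop that HotSpot may auto-vectorize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: intersect clauses window by window by AND-ing bit sets. */
final class BitSetConjunction {
  private BitSetConjunction() {}

  static final int WINDOW_SIZE = 4096; // doc IDs per window; must be a multiple of 64

  /** Returns doc IDs present in every clause; each clause is a sorted array of doc IDs. */
  static List<Integer> intersect(int[][] clauses, int maxDoc) {
    List<Integer> matches = new ArrayList<>();
    int[] cursors = new int[clauses.length]; // per-clause read position
    long[] acc = new long[WINDOW_SIZE / 64];
    long[] clauseBits = new long[WINDOW_SIZE / 64];
    for (int windowMin = 0; windowMin < maxDoc; windowMin += WINDOW_SIZE) {
      int windowMax = Math.min(windowMin + WINDOW_SIZE, maxDoc);
      Arrays.fill(acc, -1L); // start with all bits set, then AND each clause in
      for (int c = 0; c < clauses.length; c++) {
        Arrays.fill(clauseBits, 0L);
        int[] docs = clauses[c];
        // Load this clause's docs in [windowMin, windowMax) into a bit set.
        while (cursors[c] < docs.length && docs[cursors[c]] < windowMax) {
          int rel = docs[cursors[c]++] - windowMin;
          clauseBits[rel >>> 6] |= 1L << rel; // long shifts use the low 6 bits of rel
        }
        for (int w = 0; w < acc.length; w++) {
          acc[w] &= clauseBits[w];
        }
      }
      // Collect the bits that survived every clause.
      for (int w = 0; w < acc.length; w++) {
        long bits = acc[w];
        while (bits != 0) {
          int doc = windowMin + (w << 6) + Long.numberOfTrailingZeros(bits);
          if (doc < windowMax) {
            matches.add(doc);
          }
          bits &= bits - 1; // clear the lowest set bit
        }
      }
    }
    return matches;
  }
}
```

For clauses `{1, 5, 64, 100, 5000, 9000}`, `{0, 1, 64, 100, 4999, 5000}` and `{1, 64, 100, 5000, 8191}` with `maxDoc = 10000`, this returns `[1, 64, 100, 5000]`.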

Re: [PR] aws jmh benchmark cleanups [lucene]

2024-12-18 Thread via GitHub
rmuir commented on PR #14072: URL: https://github.com/apache/lucene/pull/14072#issuecomment-2551508242 @ChrisHegarty I want to make it so anybody can use it :) I understand the SSH reliability issue, as I have experienced it before. The issue is that I have sub-10ms ping time to us-eas

Re: [PR] hunspell: tolerate REP rule count mismatches [lucene]

2024-12-18 Thread via GitHub
rmuir merged PR #14079: URL: https://github.com/apache/lucene/pull/14079

Re: [PR] aws jmh benchmark cleanups [lucene]

2024-12-18 Thread via GitHub
ChrisHegarty commented on PR #14072: URL: https://github.com/apache/lucene/pull/14072#issuecomment-2551481342 @rmuir this LGTM. I clearly have some misconfiguration and/or quirkiness in my ssh setup, so no need to do too much to appease me. I'm also not proficient with Ansible! I'm

Re: [PR] hunspell: tolerate REP rule count mismatches [lucene]

2024-12-18 Thread via GitHub
rmuir commented on PR #14079: URL: https://github.com/apache/lucene/pull/14079#issuecomment-2551441249 I added a CHANGES entry. The underlying issue is also now fixed in LibreOffice. I will try to chase it down in the bataak repo after brushing up on some Mongolian :)
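In a Hunspell `.aff` file, a `REP n` header announces `n` following `REP from to` lines. One lenient strategy for tolerating a wrong count is to ignore the announced number and keep reading while lines still start with `REP`. This is a hypothetical sketch of that strategy (`RepParser.parse` is a made-up name), not the code merged in the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of lenient REP parsing; not Lucene's hunspell Dictionary code. */
final class RepParser {
  private RepParser() {}

  /**
   * Parses a REP block whose "REP n" header is at index start. Entries of the form
   * "REP from to" are read while they keep appearing, so a header count that is too large
   * or too small does not cause a failure.
   */
  static List<String[]> parse(List<String> lines, int start) {
    List<String[]> rules = new ArrayList<>();
    int i = start + 1; // skip the header; its count is deliberately not trusted
    while (i < lines.size()) {
      String[] parts = lines.get(i).trim().split("\\s+");
      if (parts.length < 3 || !parts[0].equals("REP")) {
        break; // first non-entry line ends the block
      }
      rules.add(new String[] {parts[1], parts[2]});
      i++;
    }
    return rules;
  }
}
```

With `REP 3` followed by only two entries, or `REP 1` followed by two, both entries are returned instead of the parse failing or misreading the next directive.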

Re: [PR] Update stopwords.txt [lucene]

2024-12-18 Thread via GitHub
rmuir commented on PR #14075: URL: https://github.com/apache/lucene/pull/14075#issuecomment-2551434229 @eusousu I sent the email. I will also try to chase down the issue in the upstream bataak repo later, but it should not be an issue anymore. Thanks for the help and patience!

Re: [PR] Update stopwords.txt [lucene]

2024-12-18 Thread via GitHub
eusousu commented on PR #14075: URL: https://github.com/apache/lucene/pull/14075#issuecomment-2551221786 > I have an open bug report @rmuir They asked me in the IRC chat about a license statement, which I think is necessary for them to accept your contribution.

Re: [PR] Update stopwords.txt [lucene]

2024-12-18 Thread via GitHub
rmuir commented on PR #14075: URL: https://github.com/apache/lucene/pull/14075#issuecomment-2551213164 See https://github.com/LibreOffice/dictionaries/commit/c3ff53711dcac4bdec24f23a2c1f9712a0833b67 Sorry for all the unrelated noise here on your PR!

Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2024-12-18 Thread via GitHub
iverase commented on code in PR #13948: URL: https://github.com/apache/lucene/pull/13948#discussion_r1890139004 ## lucene/core/src/java/org/apache/lucene/util/RandomAccessInputRef.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

Re: [PR] Speed up advancing on the disjunction iterator. [lucene]

2024-12-18 Thread via GitHub
jpountz commented on PR #14052: URL: https://github.com/apache/lucene/pull/14052#issuecomment-2551033528 Nightly benchmarks confirmed the speedup: - https://benchmarks.mikemccandless.com/CountFilteredOrHighHigh.html - https://benchmarks.mikemccandless.com/CountFilteredOrMany.html