Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
vsop-479 commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1611094333 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] gradlew: no "--source 11" [lucene]

2024-05-22 Thread via GitHub
dweiss commented on PR #13404: URL: https://github.com/apache/lucene/pull/13404#issuecomment-2126340422 LGTM, although "some JDK/configs may complain about an incompatibility with --release" is intriguing - what are these JDK distributions, exactly? -- This is an automated message from th

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-22 Thread via GitHub
Pulkitg64 commented on PR #13401: URL: https://github.com/apache/lucene/pull/13401#issuecomment-2126303810 Makes sense. Thanks @navneet1v for the suggestion.

Re: [PR] Delete all live docs when query matched a whole segment. [lucene]

2024-05-22 Thread via GitHub
vsop-479 commented on PR #13395: URL: https://github.com/apache/lucene/pull/13395#issuecomment-2126075246 > Also, the use case this is optimizing for, feels rare? > Even so, it's quite rare that an index would invoke this pruning, and the likely smallish time it takes today is fine?

Re: [PR] Delete all live docs when query matched a whole segment. [lucene]

2024-05-22 Thread via GitHub
vsop-479 commented on code in PR #13395: URL: https://github.com/apache/lucene/pull/13395#discussion_r1610875380 ## lucene/core/src/java/org/apache/lucene/search/Scorer.java: ## @@ -98,4 +100,14 @@ public int advanceShallow(int target) throws IOException { * {@link #advanceS

[PR] gradlew: no "--source 11" [lucene]

2024-05-22 Thread via GitHub
dsmiley opened a new pull request, #13404: URL: https://github.com/apache/lucene/pull/13404 * avoid WrapperDownloader if have the JAR * don't specify --source More specific than needed, and some JDK/configs may complain about an incompatibility with --release. From https://github.c

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-22 Thread via GitHub
navneet1v commented on PR #13401: URL: https://github.com/apache/lucene/pull/13401#issuecomment-2125994144 +1 on the feature and functionality. I would like to recommend one thing here: Can we add functionality for reloading the SPIs for VectorSimilarityFunctions, just like we have fo

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-22 Thread via GitHub
jpountz commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2125281483 This is an interesting idea! You do not mention it explicitly in the issue description, but presumably this only makes sense if an index sort is configured, otherwise merges m

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609713219 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef

Re: [PR] Delete all live docs when query matched a whole segment. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13395: URL: https://github.com/apache/lucene/pull/13395#discussion_r1610019010 ## lucene/core/src/java/org/apache/lucene/search/Scorer.java: ## @@ -98,4 +100,14 @@ public int advanceShallow(int target) throws IOException { * {@link #advanc

Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13398: URL: https://github.com/apache/lucene/pull/13398#discussion_r1610001429 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -202,6 +183,89 @@ public boolean keepFullyDeletedSegment( dir.close(); } + pu

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609895554 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609879874 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef tex

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609878850 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef tex

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609839123 ## lucene/core/src/java/org/apache/lucene/search/TermQuery.java: ## @@ -150,7 +170,12 @@ public Scorer get(long leadCost) throws IOException { @Override

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609830164 ## lucene/core/src/java/org/apache/lucene/search/BlendedTermQuery.java: ## @@ -19,6 +19,7 @@ import java.io.IOException; import java.util.Arrays; import java.util.L

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609822485 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef tex

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609822235 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef tex

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609821173 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -3754,13 +3754,17 @@ public static Status.TermVectorStatus testTermVectors( Ter

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1576373763 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -287,6 +287,68 @@ void rewind() { */ } + // Only re

Re: [I] What does the Lucene community think about dimensionality reduction for vectors, and should it be something the library does internally (at merge time perhaps)? [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on issue #13403: URL: https://github.com/apache/lucene/issues/13403#issuecomment-2124514088 +1 to explore these sorts of dimensionality-reduction compression techniques in Lucene! PQ indeed [looks compelling](https://www.irisa.fr/texmex/people/jegou/papers/jegou_search

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609703874 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609702603 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609700636 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609634889 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609620348 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }