Re: [PR] Replace List<Integer> by IntArrayList and List<Long> by LongArrayList. [lucene]

2024-05-23 Thread via GitHub
bruno-roustant commented on code in PR #13406: URL: https://github.com/apache/lucene/pull/13406#discussion_r1612922072 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java: ## @@ -349,7 +350,7 @@ public boolean incrementToken() throws IOExce

Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]

2024-05-23 Thread via GitHub
vsop-479 commented on code in PR #13398: URL: https://github.com/apache/lucene/pull/13398#discussion_r1611335102 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -202,6 +183,89 @@ public boolean keepFullyDeletedSegment( dir.close(); } + publ

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-23 Thread via GitHub
navneet1v commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2128519709 > My only concern is why is this necessary? > > Is feature parity the only reason? Currently in my application I want to extend the KNNVectorsFormat just like Codec,

Re: [I] What does the Lucene community think about dimensionality reduction for vectors, and should it be something the library does internally (at merge time perhaps)? [lucene]

2024-05-23 Thread via GitHub
benwtrent commented on issue #13403: URL: https://github.com/apache/lucene/issues/13403#issuecomment-2128355186 > If the number of dimensions are reduced, you don't even need to quantize them? Does PCA work in non-Euclidean spaces? Does it work on out-of-domain queries? If

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-23 Thread via GitHub
benwtrent commented on PR #13401: URL: https://github.com/apache/lucene/pull/13401#issuecomment-2128344039 My old PR: https://github.com/apache/lucene/pull/13200 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-23 Thread via GitHub
benwtrent commented on PR #13401: URL: https://github.com/apache/lucene/pull/13401#issuecomment-2128342633 I did this kind of change before, and the added complexity and backwards compatibility concerns just didn't seem warranted. This is why the decision to do the scorer pluggability was added

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-23 Thread via GitHub
benwtrent commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2128332920 My only concern is why is this necessary? Is feature parity the only reason?

Re: [PR] Terminate automaton after matched the whole prefix for PrefixQuery. [lucene]

2024-05-23 Thread via GitHub
github-actions[bot] commented on PR #13072: URL: https://github.com/apache/lucene/pull/13072#issuecomment-2128266558 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-05-23 Thread via GitHub
shatejas commented on PR #13407: URL: https://github.com/apache/lucene/pull/13407#issuecomment-2128074368 CC: @jimczi, @benwtrent similar to [#12551](https://github.com/apache/lucene/pull/12551) except efSearch is static here.

[PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-05-23 Thread via GitHub
shatejas opened a new pull request, #13407: URL: https://github.com/apache/lucene/pull/13407 ### Description efSearch is one of the parameters of the HNSW algorithm and can help increase recall. The efSearch value indicates how many neighbors the algorithm explores to get the nearest ones
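The role of efSearch described in this pull request can be sketched roughly as follows. This is a minimal single-layer graph search written only for illustration, assuming squared Euclidean distance and an adjacency-array graph; the class `EfSearchSketch` and its methods are hypothetical and are not the code from PR #13407 or Lucene's HNSW implementation. The search keeps at most `efSearch` candidates in its result set and returns the best `k` of them at the end.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical single-layer graph search; not Lucene's HNSW searcher. */
final class EfSearchSketch {

  /** Squared Euclidean distance between two vectors of equal length. */
  static float dist(float[] a, float[] b) {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Returns up to k node ids nearest to the query, tracking at most efSearch results. */
  static List<Integer> search(
      float[][] vectors, int[][] neighbors, int entry, float[] query, int k, int efSearch) {
    Comparator<Integer> byDistance =
        Comparator.comparingDouble(n -> dist(vectors[n], query));
    // Candidates are expanded closest-first; results evict the farthest entry first.
    PriorityQueue<Integer> candidates = new PriorityQueue<>(byDistance);
    PriorityQueue<Integer> results = new PriorityQueue<>(byDistance.reversed());
    Set<Integer> visited = new HashSet<>();
    candidates.add(entry);
    results.add(entry);
    visited.add(entry);

    while (!candidates.isEmpty()) {
      int current = candidates.poll();
      float worst = dist(vectors[results.peek()], query);
      // Stop once the result set is full and the best candidate cannot improve it.
      if (results.size() >= efSearch && dist(vectors[current], query) > worst) {
        break;
      }
      for (int next : neighbors[current]) {
        if (!visited.add(next)) {
          continue; // already seen
        }
        float d = dist(vectors[next], query);
        if (results.size() < efSearch || d < dist(vectors[results.peek()], query)) {
          candidates.add(next);
          results.add(next);
          if (results.size() > efSearch) {
            results.poll(); // drop the farthest result
          }
        }
      }
    }

    List<Integer> ordered = new ArrayList<>(results);
    ordered.sort(byDistance);
    return new ArrayList<>(ordered.subList(0, Math.min(k, ordered.size())));
  }
}
```

In this sketch a larger `efSearch` widens the explored frontier at the cost of more distance computations, while `k` only controls how many of the tracked results are returned.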

Re: [PR] Add test for parsing brackets in range queries [lucene]

2024-05-23 Thread via GitHub
benchaplin commented on PR #13323: URL: https://github.com/apache/lucene/pull/13323#issuecomment-2127990572 Thanks for the advice @dweiss, I will play around with some test cases for this fix. In the meantime, I think we can merge this and I will open another PR with the grammar changes if

Re: [PR] Add 'passageSortComparator' option in FieldHighlighter [lucene]

2024-05-23 Thread via GitHub
Seunghan-Jung commented on code in PR #13276: URL: https://github.com/apache/lucene/pull/13276#discussion_r1612190552 ## lucene/CHANGES.txt: ## @@ -298,6 +298,8 @@ Improvements * GITHUB#13385: Add Intervals.noIntervals() method to produce an empty IntervalsSource. (Aniketh

Re: [PR] Replace List<Integer> by IntArrayList and List<Long> by LongArrayList. [lucene]

2024-05-23 Thread via GitHub
dweiss commented on code in PR #13406: URL: https://github.com/apache/lucene/pull/13406#discussion_r1612106758 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/core/FlattenGraphFilter.java: ## @@ -349,7 +350,7 @@ public boolean incrementToken() throws IOException {

Re: [PR] Add 'passageSortComparator' option in FieldHighlighter [lucene]

2024-05-23 Thread via GitHub
dsmiley merged PR #13276: URL: https://github.com/apache/lucene/pull/13276

Re: [PR] Add 'passageSortComparator' option in FieldHighlighter [lucene]

2024-05-23 Thread via GitHub
dsmiley commented on code in PR #13276: URL: https://github.com/apache/lucene/pull/13276#discussion_r1612090361 ## lucene/CHANGES.txt: ## @@ -298,6 +298,8 @@ Improvements * GITHUB#13385: Add Intervals.noIntervals() method to produce an empty IntervalsSource. (Aniketh Jain,

Re: [PR] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-23 Thread via GitHub
navneet1v commented on PR #13394: URL: https://github.com/apache/lucene/pull/13394#issuecomment-2127668713 @uschindler @ChrisHegarty Could you please take a look, if you get a chance?

[PR] Replace List<Integer> by IntArrayList and List<Long> by LongArrayList. [lucene]

2024-05-23 Thread via GitHub
bruno-roustant opened a new pull request, #13406: URL: https://github.com/apache/lucene/pull/13406 Add IntArrayList and LongArrayList to the HPPC fork. Use them to replace usages of List of Integer or Long.
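As a rough illustration of why a primitive-backed list can replace a List of Integer, here is a minimal sketch of a growable int list. The class `SimpleIntList` is hypothetical and written only for this example; it is not the `IntArrayList` added to the HPPC fork, whose API may differ. The point it shows is that values are stored in an `int[]`, so no `Integer` objects are allocated on insertion or lookup.

```java
import java.util.Arrays;

/** Hypothetical minimal growable int list; not the HPPC fork's IntArrayList. */
final class SimpleIntList {
  private int[] buffer = new int[4];
  private int size;

  /** Appends a value, doubling the backing array when it is full. */
  void add(int value) {
    if (size == buffer.length) {
      buffer = Arrays.copyOf(buffer, size * 2);
    }
    buffer[size++] = value;
  }

  /** Returns the value at the given index without boxing. */
  int get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index=" + index + ", size=" + size);
    }
    return buffer[index];
  }

  /** Number of values currently stored. */
  int size() {
    return size;
  }

  /** Sums all stored values using a plain loop over the primitive array. */
  long sum() {
    long total = 0;
    for (int i = 0; i < size; i++) {
      total += buffer[i];
    }
    return total;
  }
}
```

Compared with `List<Integer>`, a structure like this trades the generic collection interface for lower memory use and fewer allocations, which is presumably the motivation for the replacement in hot code paths.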

Re: [PR] Add 'passageSortComparator' option in FieldHighlighter [lucene]

2024-05-23 Thread via GitHub
Seunghan-Jung commented on code in PR #13276: URL: https://github.com/apache/lucene/pull/13276#discussion_r1611850876 ## lucene/CHANGES.txt: ## @@ -298,6 +298,8 @@ Improvements * GITHUB#13385: Add Intervals.noIntervals() method to produce an empty IntervalsSource. (Aniketh

[PR] Remove unchecked Scorable -> Scorer cast in lucene/monitor. [lucene]

2024-05-23 Thread via GitHub
jpountz opened a new pull request, #13405: URL: https://github.com/apache/lucene/pull/13405 While doing an unrelated refactoring, I got hit by this unchecked cast, which is incorrect when the presearcher query produces some specialized `BulkScorer`.

Re: [PR] Replace Set<Integer> by IntHashSet and Set<Long> by LongHashSet [lucene]

2024-05-23 Thread via GitHub
bruno-roustant merged PR #13400: URL: https://github.com/apache/lucene/pull/13400

Re: [PR] Add 'passageSortComparator' option in FieldHighlighter [lucene]

2024-05-23 Thread via GitHub
dsmiley commented on PR #13276: URL: https://github.com/apache/lucene/pull/13276#issuecomment-2127252097 I like this change; very nice! I shall merge it. Please share how I shall list you in CHANGES.txt. Proposed text under improvements to 9.11: > GITHUB#13276: UnifiedHighlighter

Re: [PR] gradlew: no "--source 11" [lucene]

2024-05-23 Thread via GitHub
dsmiley commented on PR #13404: URL: https://github.com/apache/lucene/pull/13404#issuecomment-2127020182 Admittedly the --source issue was specific to a custom JVM variant where I work. Maybe --release alone would work; I didn't check. Anyway, I'd rather remove needless specificity.

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-23 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2126884759 Thanks Mike and Adrian for the feedback. > You do not mention it explicitly in the issue description, but presumably this only makes sense if an index sort is configured, o

Re: [PR] Add support for similarity-based vector searches [lucene]

2024-05-23 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-2126824340 Yes @alessandrobenedetti that is correct -- some result may be missed if nodes along its path from the entry node score below the result threshold (but still higher than a traversal thr

Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]

2024-05-23 Thread via GitHub
vsop-479 commented on code in PR #13398: URL: https://github.com/apache/lucene/pull/13398#discussion_r1611318109 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -202,6 +183,89 @@ public boolean keepFullyDeletedSegment( dir.close(); } + publ

Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]

2024-05-23 Thread via GitHub
vsop-479 commented on code in PR #13398: URL: https://github.com/apache/lucene/pull/13398#discussion_r1611354430 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -202,6 +183,89 @@ public boolean keepFullyDeletedSegment( dir.close(); } + publ

Re: [PR] Add support for similarity-based vector searches [lucene]

2024-05-23 Thread via GitHub
alessandrobenedetti commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-2126740768 Hi @kaivalnp, thanks for this contribution! My question is why do we have two thresholds, one for graph traversal (used to decide if it's worth exploring a candidate nei

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-23 Thread via GitHub
uschindler commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1611268935 ## lucene/test-framework/src/java/module-info.java: ## @@ -19,6 +19,7 @@ @SuppressWarnings({"module", "requires-automatic", "requires-transitive-automatic"}) mod

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-23 Thread via GitHub
uschindler commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1611267014 ## lucene/test-framework/src/java/module-info.java: ## @@ -19,6 +19,7 @@ @SuppressWarnings({"module", "requires-automatic", "requires-transitive-automatic"}) mod

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-23 Thread via GitHub
uschindler commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1611261932 ## lucene/test-framework/src/java/module-info.java: ## @@ -19,6 +19,7 @@ @SuppressWarnings({"module", "requires-automatic", "requires-transitive-automatic"}) mod

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-23 Thread via GitHub
uschindler commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1611253555 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseFieldInfoFormatTestCase.java: ## @@ -328,6 +332,17 @@ private int getVectorsMaxDimensions(String

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-23 Thread via GitHub
uschindler commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1611226882 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90FieldInfosFormat.java: ## @@ -103,16 +104,27 @@ * VectorSimilarityFunction

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-23 Thread via GitHub
vsop-479 commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1611094333 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }