Re: [PR] Move synonym map off-heap for SynonymGraphFilter [lucene]

2024-05-24 Thread via GitHub
msfroh commented on PR #13054: URL: https://github.com/apache/lucene/pull/13054#issuecomment-2130737374 @dungba88 - I forgot about this change for a while. Did you create a separate PR for the saveMetadata change? Should I? -- This is an automated message from the Apache Git Service. To r

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-05-24 Thread via GitHub
vsop-479 commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1613171816 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -434,8 +436,29 @@ public boolean seekExact(BytesRef target) throws I

[PR] Add new test case "testGetLines" for lucene/core/analysis/WordlistLoader [lucene]

2024-05-24 Thread via GitHub
hack4chang opened a new pull request, #13419: URL: https://github.com/apache/lucene/pull/13419 ## Description - Add a new test case to test that ```getLines``` in ```WordlistLoader``` functions normally. I created a string test case that contains comment lines and blank lines to test ```ge
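A test like the one described feeds in text that mixes data lines with comment lines and blank lines, then asserts that only the data survives. The JDK-only sketch below shows the filtering such a test would expect. It is not Lucene's `WordlistLoader` code. Both the `#` comment marker and the trimming of surrounding whitespace are assumptions to check against the `getLines` Javadoc. `DataLines.read` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mimics the filtering a getLines test would check. */
final class DataLines {
  private DataLines() {}

  /** Returns trimmed lines, skipping blank lines and lines starting with '#' (assumed comment marker). */
  static List<String> read(InputStream in, Charset charset) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith("#")) {
          continue; // comment line
        }
        line = line.trim();
        if (line.isEmpty()) {
          continue; // blank or whitespace-only line
        }
        lines.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  public static void main(String[] args) {
    String text = "# stop words\nfoo\n\n  bar  \n#baz\n";
    InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    System.out.println(read(in, StandardCharsets.UTF_8)); // prints [foo, bar]
  }
}
```

In the real test, the same input would go through `getLines`, and the result would be compared against the expected list.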

Re: [PR] gradlew: no "--source 11" [lucene]

2024-05-24 Thread via GitHub
dsmiley merged PR #13404: URL: https://github.com/apache/lucene/pull/13404

Re: [PR] improve test coverage: TestIntRange for invalid input range [lucene]

2024-05-24 Thread via GitHub
harenlin closed pull request #13416: improve test coverage: TestIntRange for invalid input range URL: https://github.com/apache/lucene/pull/13416

[PR] Improve Test Coverage: added tests for SlowLog [lucene]

2024-05-24 Thread via GitHub
matthewsah opened a new pull request, #13417: URL: https://github.com/apache/lucene/pull/13417 This pull request adds tests for the SlowLog class, which can be found in org/apache/lucene/monitor. There were no existing test cases covering SlowLog, so these tests will improve the

[PR] improve test coverage: TestIntRange for invalid input range [lucene]

2024-05-24 Thread via GitHub
harenlin opened a new pull request, #13416: URL: https://github.com/apache/lucene/pull/13416 ### Description In ```document/TestIntRange.java```, the test case only tests valid input. I added a test for invalid input, which should catch the expected exception and its message. By doi
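For invalid input, the usual pattern is to assert that construction fails and then inspect the exception message. Lucene's test framework offers an `expectThrows` helper for this. The sketch below uses a local stand-in so it runs on the JDK alone. `IntRangeSketch` is a hypothetical one-dimensional range that rejects `min > max`. Its message text is invented for the example, not copied from `IntRange`.

```java
/** Hypothetical range type that rejects min > max (not Lucene's IntRange). */
final class IntRangeSketch {
  final int min;
  final int max;

  IntRangeSketch(int min, int max) {
    if (min > max) {
      throw new IllegalArgumentException("min value (" + min + ") is greater than max value (" + max + ")");
    }
    this.min = min;
    this.max = max;
  }
}

final class InvalidRangeCheck {
  private InvalidRangeCheck() {}

  /** Local stand-in for an expectThrows-style helper: returns the expected exception or fails. */
  static <T extends Throwable> T expectThrows(Class<T> expected, Runnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (expected.isInstance(t)) {
        return expected.cast(t);
      }
      throw new AssertionError("Unexpected exception type: " + t, t);
    }
    throw new AssertionError("Expected " + expected.getSimpleName() + " but nothing was thrown");
  }

  public static void main(String[] args) {
    IllegalArgumentException e =
        expectThrows(IllegalArgumentException.class, () -> new IntRangeSketch(10, 5));
    System.out.println(e.getMessage()); // prints min value (10) is greater than max value (5)
  }
}
```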

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-24 Thread via GitHub
ChrisHegarty commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2130254715 Thanks for the explanation - sounds like a reasonable idea.

Re: [PR] Deprecate COSINE VectorSimilarity function [lucene]

2024-05-24 Thread via GitHub
Pulkitg64 commented on PR #13308: URL: https://github.com/apache/lucene/pull/13308#issuecomment-2130251068 Hi @benwtrent, could we merge this change, if everything looks good to you?

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-05-24 Thread via GitHub
shatejas commented on code in PR #13407: URL: https://github.com/apache/lucene/pull/13407#discussion_r1613901301 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -54,14 +54,20 @@ abstract class AbstractKnnVectorQuery extends Query { protec

Re: [I] Remove Scorer#getWeight. [lucene]

2024-05-24 Thread via GitHub
navneet1v commented on issue #13410: URL: https://github.com/apache/lucene/issues/13410#issuecomment-2130159284 @jpountz what would be the alternative to the `getWeight` function?

Re: [PR] Introduces efSearch as a separate parameter in KNN{Byte:Float}VectorQuery [lucene]

2024-05-24 Thread via GitHub
navneet1v commented on code in PR #13407: URL: https://github.com/apache/lucene/pull/13407#discussion_r1613873312 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -54,14 +54,20 @@ abstract class AbstractKnnVectorQuery extends Query { prote

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-24 Thread via GitHub
navneet1v commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2130144565 > Thanks for the clarification @navneet1v I didn't know folks were dynamically loading jars for different vector formats. > > The idea sounds good to me. I haven't reviewed

Re: [PR] lucene-monitor: remove now-unused Scorable in QueryIndex.DataValues (#13412 backport) [lucene]

2024-05-24 Thread via GitHub
cpoerschke merged PR #13415: URL: https://github.com/apache/lucene/pull/13415

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-24 Thread via GitHub
benwtrent commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2130105488 Thanks for the clarification @navneet1v I didn't know folks were dynamically loading jars for different vector formats. The idea sounds good to me. I haven't reviewed the P

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-24 Thread via GitHub
navneet1v commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2130099034 > These other formats are dynamically pluggable, and this change is just making `KnnVectorsFormat` consistent with them, right? Yes, that is correct.

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-24 Thread via GitHub
msfroh commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2130092260 Is it fair to say that this change just brings `KnnVectorsFormat` in line with: 1. PostingsFormat: https://github.com/apache/lucene/blob/2d6ad2fee6dfd96388594f4de9b37c037efe80

Re: [PR] lucene-monitor: remove now-unused Scorable in QueryIndex.DataValues (#13412 backport) [lucene]

2024-05-24 Thread via GitHub
cpoerschke commented on code in PR #13415: URL: https://github.com/apache/lucene/pull/13415#discussion_r1613771430 ## lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java: ## @@ -123,12 +122,10 @@ public static final class DataValues { SortedDocValues cacheId;

[PR] lucene-monitor: remove now-unused Scorable in QueryIndex.DataValues (#13412 backport) [lucene]

2024-05-24 Thread via GitHub
cpoerschke opened a new pull request, #13415: URL: https://github.com/apache/lucene/pull/13415 (no comment)

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-24 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2129984751 Thanks for the suggestion. The above suggestion for clustering within the segment does improve skipping of documents (especially when combined with [BKD optimisation](https://github

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-24 Thread via GitHub
navneet1v commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2129970534 @ChrisHegarty > The dynamic addition of formats from other class loaders seems reasonable, though I don't have a use case for it myself. Maybe I'm missing something

Re: [PR] lucene-monitor: replace wildcard imports [lucene]

2024-05-24 Thread via GitHub
cpoerschke merged PR #13413: URL: https://github.com/apache/lucene/pull/13413

Re: [PR] lucene-monitor: remove now-unused Scorable in QueryIndex.DataValues [lucene]

2024-05-24 Thread via GitHub
cpoerschke merged PR #13412: URL: https://github.com/apache/lucene/pull/13412

Re: [PR] lucene-monitor: replace wildcard imports [lucene]

2024-05-24 Thread via GitHub
cpoerschke commented on PR #13413: URL: https://github.com/apache/lucene/pull/13413#issuecomment-2129962442 > ... Does this mean that we can enable some code formatting on this module? Good question, I don't know. I was surprised to find there wasn't any logic already disallowing wild

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-24 Thread via GitHub
ChrisHegarty commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2129931655 The dynamic addition of formats from other class loaders seems reasonable, though I don't have a use case for it myself. Maybe I'm missing something but before adding a new AP

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-24 Thread via GitHub
navneet1v commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2129870019 That's correct: if the classes are in the main application, it will work. But in my case, during runtime I need to load some other jars (consider them as plugins) which have the KNN
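The underlying issue is how Java's `ServiceLoader` discovers providers. Entries under `META-INF/services` are only found through a class loader that can see the jar. Formats shipped in a plugin jar loaded at runtime therefore stay invisible until the registry is rescanned with the plugin's loader. Below is a minimal JDK-only sketch of a name-keyed registry with such a reload step. `NamedRegistry`, `VectorFormatSketch` and the keep-the-first-registered-name policy are assumptions for illustration. They are not Lucene's internal SPI loader or the API proposed in #13393.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical service type; a real plugin would extend the format base class instead. */
interface VectorFormatSketch {
  String name();
}

/** Hypothetical name-keyed registry that can be rescanned with an additional class loader. */
final class NamedRegistry<S> {
  private final Class<S> type;
  private final Function<S, String> nameOf;
  // Replaced as a whole on every change, so readers never observe a half-updated map.
  private volatile Map<String, S> byName = Collections.emptyMap();

  NamedRegistry(Class<S> type, Function<S, String> nameOf) {
    this.type = type;
    this.nameOf = nameOf;
  }

  /** Picks up META-INF/services providers visible to the given loader, e.g. a plugin's loader. */
  void reload(ClassLoader loader) {
    merge(ServiceLoader.load(type, loader));
  }

  /** Adds providers; a name that is already registered keeps its existing instance. */
  synchronized void merge(Iterable<? extends S> providers) {
    Map<String, S> copy = new LinkedHashMap<>(byName);
    for (S provider : providers) {
      copy.putIfAbsent(nameOf.apply(provider), provider);
    }
    byName = Collections.unmodifiableMap(copy);
  }

  S forName(String name) {
    S provider = byName.get(name);
    if (provider == null) {
      throw new IllegalArgumentException("No provider named '" + name + "'; available: " + byName.keySet());
    }
    return provider;
  }

  Set<String> availableNames() {
    return byName.keySet();
  }
}

final class RegistryDemo {
  public static void main(String[] args) {
    NamedRegistry<VectorFormatSketch> registry =
        new NamedRegistry<>(VectorFormatSketch.class, VectorFormatSketch::name);
    // Stand-ins for providers that ServiceLoader would normally discover.
    VectorFormatSketch hnsw = () -> "hnsw";
    VectorFormatSketch flat = () -> "flat";
    registry.merge(List.of(hnsw, flat));
    System.out.println(registry.availableNames()); // prints [hnsw, flat]
  }
}
```

Copying the map and publishing it through a `volatile` field keeps lookups lock-free while a reload runs. Only writers synchronize.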

[PR] Allow users to retrieve counts from taxo association facets [lucene]

2024-05-24 Thread via GitHub
stefanvodita opened a new pull request, #13414: URL: https://github.com/apache/lucene/pull/13414 Taxonomy facets always have counts since #12966. We add a `count` field to `LabelAndValue` so that users can retrieve those counts. What I like about this approach: 1. It's backwar

Re: [PR] lucene-monitor: remove now-unused Scorable in QueryIndex.DataValues [lucene]

2024-05-24 Thread via GitHub
jpountz commented on PR #13412: URL: https://github.com/apache/lucene/pull/13412#issuecomment-2129674240 The monitor doesn't seem to care about scores, so this makes sense to me, but I'm curious about @romseygeek's opinion.

Re: [PR] Remove unchecked Scorable -> Scorer cast in lucene/monitor. [lucene]

2024-05-24 Thread via GitHub
cpoerschke commented on code in PR #13405: URL: https://github.com/apache/lucene/pull/13405#discussion_r1613574675 ## lucene/monitor/src/java/org/apache/lucene/monitor/Monitor.java: ## @@ -377,9 +375,7 @@ public ScoreMode scoreMode() { @Override public void matchQuery(

[PR] lucene-monitor: remove now-unused Scorable in QueryIndex.DataValues [lucene]

2024-05-24 Thread via GitHub
cpoerschke opened a new pull request, #13412: URL: https://github.com/apache/lucene/pull/13412 Became unused via #13405 I think.

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-24 Thread via GitHub
benwtrent commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2129653735 I still don't understand. If the SPIs are there, it should work. If the SPI extension didn't work, none of the current formats would work either as they all rely on it.

Re: [I] Make Weight#scorerSupplier abstract, Weight#scorer final [lucene]

2024-05-24 Thread via GitHub
jpountz closed issue #13180: Make Weight#scorerSupplier abstract, Weight#scorer final URL: https://github.com/apache/lucene/issues/13180

Re: [I] Make Weight#scorerSupplier abstract, Weight#scorer final [lucene]

2024-05-24 Thread via GitHub
jpountz commented on issue #13180: URL: https://github.com/apache/lucene/issues/13180#issuecomment-2129537995 Addressed via #13319.

[PR] Add prefetching for doc values and norms. [lucene]

2024-05-24 Thread via GitHub
jpountz opened a new pull request, #13411: URL: https://github.com/apache/lucene/pull/13411 This follows a similar approach to postings and only prefetches the first page of data. I verified that it works well for collectors such as `TopFieldCollector`, as `IndexSearcher` first pulls

[I] Remove Scorer#getWeight. [lucene]

2024-05-24 Thread via GitHub
jpountz opened a new issue, #13410: URL: https://github.com/apache/lucene/issues/13410 ### Description I've been working on some refactorings recently, and the fact that `Scorer` has a `getWeight` method is very annoying as it requires every simple `Scorer` implementation, e.g. for t

[PR] Criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-24 Thread via GitHub
RS146BIJAY opened a new pull request, #13409: URL: https://github.com/apache/lucene/pull/13409 ### Description Adding support for a DWPT selection mechanism based on specific criteria within the DocumentWriter. Users can define these criteria through a grouping function as a new [Ind
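One way to picture criteria-based DWPT selection is to route each document to a buffer chosen by the grouping function, so that every flushed segment holds a single group. The sketch below is a single-threaded, JDK-only model. `GroupedBuffers`, the doc-count flush trigger and the `x:`/`y:` keys are hypothetical. Lucene's real writer pool is concurrent and uses its own flush policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical single-threaded model: documents are buffered per group key, so each
 * flushed "segment" contains documents from one group only.
 */
final class GroupedBuffers<D> {
  private final Function<D, String> groupOf;
  private final int maxBufferedDocs;
  private final Map<String, List<D>> buffers = new LinkedHashMap<>();
  private final List<List<D>> segments = new ArrayList<>();

  GroupedBuffers(Function<D, String> groupOf, int maxBufferedDocs) {
    if (maxBufferedDocs <= 0) {
      throw new IllegalArgumentException("maxBufferedDocs must be positive");
    }
    this.groupOf = groupOf;
    this.maxBufferedDocs = maxBufferedDocs;
  }

  void add(D doc) {
    List<D> buffer = buffers.computeIfAbsent(groupOf.apply(doc), key -> new ArrayList<>());
    buffer.add(doc);
    if (buffer.size() >= maxBufferedDocs) {
      flush(buffer);
    }
  }

  /** Flushes every non-empty buffer, e.g. on commit. */
  void flushAll() {
    for (List<D> buffer : buffers.values()) {
      if (!buffer.isEmpty()) {
        flush(buffer);
      }
    }
  }

  List<List<D>> segments() {
    return segments;
  }

  private void flush(List<D> buffer) {
    segments.add(List.copyOf(buffer));
    buffer.clear();
  }

  public static void main(String[] args) {
    // Group by the prefix before ':' (an illustrative criterion).
    GroupedBuffers<String> writer = new GroupedBuffers<>(doc -> doc.substring(0, doc.indexOf(':')), 2);
    for (String doc : List.of("x:1", "y:1", "x:2", "x:3", "y:2")) {
      writer.add(doc);
    }
    writer.flushAll();
    System.out.println(writer.segments()); // prints [[x:1, x:2], [y:1, y:2], [x:3]]
  }
}
```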

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-24 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1613219485 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]

2024-05-24 Thread via GitHub
vsop-479 commented on PR #13398: URL: https://github.com/apache/lucene/pull/13398#issuecomment-2129082892 @mikemccand Do you think we can execute the per-segment tasks of `FrozenBufferedUpdates.applyQueryDeletes` and `FrozenBufferedUpdates.applyTermDeletes` in parallel?
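The PR title describes breaking out of the per-segment delete loop once earlier delete terms or queries have already deleted every document in the segment. A small JDK-only model of that early exit follows. Each `int[]` stands in for the doc IDs that one delete term or query matches. In Lucene, finding those matches is the expensive work the early exit avoids. `DeleteApplier` is hypothetical and is not the `FrozenBufferedUpdates` code.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical model of applying buffered deletes to one segment. */
final class DeleteApplier {
  private DeleteApplier() {}

  /**
   * Applies deletes in order, where each int[] holds the doc IDs one delete matches in this
   * segment. Stops as soon as every doc is deleted, because later deletes cannot change the
   * outcome. Returns the number of deletes actually evaluated.
   */
  static int applyDeletes(int maxDoc, BitSet deleted, List<int[]> matchesPerDelete) {
    int delCount = deleted.cardinality();
    int evaluated = 0;
    for (int[] matches : matchesPerDelete) {
      if (delCount == maxDoc) {
        break; // segment fully deleted: skip the remaining deletes
      }
      evaluated++;
      for (int doc : matches) {
        if (!deleted.get(doc)) {
          deleted.set(doc);
          delCount++;
        }
      }
    }
    return evaluated;
  }

  public static void main(String[] args) {
    List<int[]> deletes = List.of(new int[] {0, 1}, new int[] {2, 3}, new int[] {0});
    System.out.println(applyDeletes(4, new BitSet(), deletes)); // prints 2
  }
}
```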

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-24 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1613174241 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-05-24 Thread via GitHub
vsop-479 commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1613164257 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -434,8 +436,29 @@ public boolean seekExact(BytesRef target) throws I

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-05-24 Thread via GitHub
vsop-479 commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1613134899 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -434,8 +436,29 @@ public boolean seekExact(BytesRef target) throws I

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-05-24 Thread via GitHub
vsop-479 commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1613089056 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -434,8 +436,29 @@ public boolean seekExact(BytesRef target) throws I

Re: [PR] Move bulkScorer() from Weight to ScorerSupplier [lucene]

2024-05-24 Thread via GitHub
jpountz commented on PR #13408: URL: https://github.com/apache/lucene/pull/13408#issuecomment-2128855108 Note: `Boolean2ScorerSupplier` was renamed to `BooleanScorerSupplier` since it now handles both `BS1` and `BS2`, not only `BS2`.

[PR] Move bulkScorer() from Weight to ScorerSupplier [lucene]

2024-05-24 Thread via GitHub
jpountz opened a new pull request, #13408: URL: https://github.com/apache/lucene/pull/13408 This relates to #13359: we want to take advantage of the `Weight#scorerSupplier` call to start scheduling some I/O in the background in parallel across clauses. For this to work properly with top-lev
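The idea is to split opening into two phases. First every clause is asked to schedule its I/O, then consumption starts, so the reads for different clauses can be in flight at the same time instead of one after another. The JDK-only sketch below models only this ordering. `PrefetchableClause` and `TwoPhaseOpener` are hypothetical. Mapping phase 1 to the `Weight#scorerSupplier` calls and phase 2 to pulling the actual scorers is an approximation of the PR description, not its code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical clause whose data can be prefetched before it is opened. */
interface PrefetchableClause {
  /** Schedules I/O and returns immediately (a hint, like an OS read-ahead request). */
  void prefetch();

  /** Opens the clause; may block until its data is in memory. */
  String open();
}

final class TwoPhaseOpener {
  private TwoPhaseOpener() {}

  /** Phase 1 issues every clause's prefetch; phase 2 opens them, so the reads can overlap. */
  static List<String> openAll(List<? extends PrefetchableClause> clauses) {
    for (PrefetchableClause clause : clauses) {
      clause.prefetch();
    }
    List<String> opened = new ArrayList<>();
    for (PrefetchableClause clause : clauses) {
      opened.add(clause.open());
    }
    return opened;
  }
}

/** Test double that records the order of calls. */
final class LoggingClause implements PrefetchableClause {
  private final String name;
  private final List<String> log;

  LoggingClause(String name, List<String> log) {
    this.name = name;
    this.log = log;
  }

  @Override
  public void prefetch() {
    log.add("prefetch " + name);
  }

  @Override
  public String open() {
    log.add("open " + name);
    return name;
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    TwoPhaseOpener.openAll(List.of(new LoggingClause("a", log), new LoggingClause("b", log)));
    System.out.println(log); // prints [prefetch a, prefetch b, open a, open b]
  }
}
```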

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-24 Thread via GitHub
Pulkitg64 commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1613034634 ## lucene/test-framework/src/java/module-info.java: ## @@ -19,6 +19,7 @@ @SuppressWarnings({"module", "requires-automatic", "requires-transitive-automatic"}) modu

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-24 Thread via GitHub
Pulkitg64 commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1613033562 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseFieldInfoFormatTestCase.java: ## @@ -328,6 +332,17 @@ private int getVectorsMaxDimensions(String

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-24 Thread via GitHub
Pulkitg64 commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1613033827 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90FieldInfosFormat.java: ## @@ -257,11 +269,12 @@ private static DocValuesType ge

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-24 Thread via GitHub
Pulkitg64 commented on PR #13401: URL: https://github.com/apache/lucene/pull/13401#issuecomment-2128847145 > I did kind of change before, and the added complexity and backwards compatibility concerns just didn't seem warranted. This is why the decision to do the scorer pluggability was adde

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-24 Thread via GitHub
Pulkitg64 commented on PR #13401: URL: https://github.com/apache/lucene/pull/13401#issuecomment-2128840022 > I have some comments, but this is not a final review. Just things that I stumbled upon on first walkthrough. > > I will have no time to do a closer review soon, so please give

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-24 Thread via GitHub
Pulkitg64 commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1613025309 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90FieldInfosFormat.java: ## @@ -257,11 +269,12 @@ private static DocValuesType ge

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-05-24 Thread via GitHub
Pulkitg64 commented on code in PR #13401: URL: https://github.com/apache/lucene/pull/13401#discussion_r1613024700 ## lucene/core/src/java/org/apache/lucene/index/VectorSimilarityFunction.java: ## @@ -16,104 +16,73 @@ */ package org.apache.lucene.index; -import static org.ap

Re: [PR] Remove unchecked Scorable -> Scorer cast in lucene/monitor. [lucene]

2024-05-24 Thread via GitHub
jpountz merged PR #13405: URL: https://github.com/apache/lucene/pull/13405

Re: [I] Improve Lucene's I/O concurrency [lucene]

2024-05-24 Thread via GitHub
jpountz commented on issue #13179: URL: https://github.com/apache/lucene/issues/13179#issuecomment-2128783334 Thanks for looking at this! > This can be changed to first collect all the matching documents and then perform prefetch of the blocks for matched documents followed by actual
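The batching approach quoted above has three steps: collect the matching docs first, prefetch the blocks they touch, then read. It needs a planning step that maps docs to distinct blocks, so that each block is prefetched only once. Here is a JDK-only sketch that assumes a hypothetical fixed number of docs per block. Real Lucene file formats lay out blocks differently, so the mapping would have to come from the format's own metadata.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner for the "collect matches, prefetch their blocks, then read" pattern. */
final class BlockPlanner {
  private BlockPlanner() {}

  /**
   * Returns each distinct block touched by the matching docs, in order. Assumes docs are sorted
   * ascending and that block i holds docs [i * docsPerBlock, (i + 1) * docsPerBlock).
   */
  static List<Integer> blocksToPrefetch(int[] sortedDocs, int docsPerBlock) {
    if (docsPerBlock <= 0) {
      throw new IllegalArgumentException("docsPerBlock must be positive");
    }
    List<Integer> blocks = new ArrayList<>();
    int lastBlock = -1;
    for (int doc : sortedDocs) {
      int block = doc / docsPerBlock;
      if (block != lastBlock) { // sorted input: a new block only appears after the previous one
        blocks.add(block);
        lastBlock = block;
      }
    }
    return blocks;
  }

  public static void main(String[] args) {
    int[] matches = {1, 5, 130, 131, 900};
    // Phase 1: plan one prefetch per block; phase 2 would then read values doc by doc.
    System.out.println(blocksToPrefetch(matches, 128)); // prints [0, 1, 7]
  }
}
```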