Re: [I] Exploring GPU based kNN vector search [lucene]

2024-03-18 Thread via GitHub
chatman commented on issue #13003: URL: https://github.com/apache/lucene/issues/13003#issuecomment-2005698009 As an initial proof of concept integration to evaluate performance, we put together a repository. https://github.com/SearchScale/lucene-cuvs The benchmarks are against single

Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

2024-03-18 Thread via GitHub
github-actions[bot] commented on PR #13143: URL: https://github.com/apache/lucene/pull/13143#issuecomment-2005421733 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] TestTaxonomyFacetValueSource.testRandom fails [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on issue #13191: URL: https://github.com/apache/lucene/issues/13191#issuecomment-2004940562 git-bisect says it's this commit: b5795db0cf517f8942eed868752249df9b105603 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[I] TestTaxonomyFacetValueSource.testRandom fails [lucene]

2024-03-18 Thread via GitHub
benwtrent opened a new issue, #13191: URL: https://github.com/apache/lucene/issues/13191 ### Description ``` org.apache.lucene.facet.taxonomy.TestTaxonomyFacetValueSource > testRandom FAILED java.lang.AssertionError: expected:<10> but was:<9> at __randomizedtesti

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-03-18 Thread via GitHub
shubhamvishu commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2004568222 @benwtrent So should we instead wait for the pluggability support and discard this for now? Or is it possible to go forward with this? > What makes this PR doubly worrying is

Re: [PR] gh-13147: use dense bit-encoding for frequent terms [lucene]

2024-03-18 Thread via GitHub
msokolov commented on PR #13153: URL: https://github.com/apache/lucene/pull/13153#issuecomment-2004334853 After disabling this for fields with positions, luceneutil perf looks pretty flat. I think it simply doesn't have any test cases that would exercise this. I wrote a small benchmark tha

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-18 Thread via GitHub
antonha commented on code in PR #13149: URL: https://github.com/apache/lucene/pull/13149#discussion_r1528679700 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -185,6 +186,13 @@ public void visit(DocIdSetIterator iterator) throws IOException {

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2004052748 > Though I'm not sure if this change conflicts with or makes things difficult for the ongoing efforts to have pluggability (maybe @benwtrent would be interested in sharing his thoughts

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on code in PR #13190: URL: https://github.com/apache/lucene/pull/13190#discussion_r1528642636 ## lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java: ## @@ -281,11 +297,11 @@ public IndexOutput createOutput(String name, IOContext conte

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on PR #13190: URL: https://github.com/apache/lucene/pull/13190#issuecomment-2004024374 @dweiss @mikemccand I am currently iterating on how best to make `RateLimitedIndexOutput`, `MergePolicy`, and `MergeRateLimiter` thread-safe. Right now, it is all assumed that the

[PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-18 Thread via GitHub
benwtrent opened a new pull request, #13190: URL: https://github.com/apache/lucene/pull/13190 This commit adds a new interface to all MergeScheduler classes that allows the scheduler to provide an Executor for intra-merge parallelism. The first sub-class to satisfy this new interface is the

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-03-18 Thread via GitHub
shubhamvishu commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2003991803 Thanks for the review @msokolov! The idea to make it pluggable seems relevant and interesting. Currently it was not possible to use any custom vector similarity function other than

Re: [PR] Fix TestLucene90FieldInfosFormat.testRandom [lucene]

2024-03-18 Thread via GitHub
shubhamvishu commented on code in PR #13135: URL: https://github.com/apache/lucene/pull/13135#discussion_r1528555456 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseFieldInfoFormatTestCase.java: ## @@ -278,46 +278,50 @@ public void testRandom() throws Excepti

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2024-03-18 Thread via GitHub
benwtrent merged PR #12915: URL: https://github.com/apache/lucene/pull/12915

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-03-18 Thread via GitHub
msokolov commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2003799323 There is some discussion of how to make similarities more pluggable (https://github.com/apache/lucene/issues/13182) that seems relevant. Part of the idea there is to accept ordinal values ra

Re: [PR] Revert "Add new parallel merge task executor for parallel actions within a single merge action" [lucene]

2024-03-18 Thread via GitHub
benwtrent merged PR #13189: URL: https://github.com/apache/lucene/pull/13189

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on PR #13124: URL: https://github.com/apache/lucene/pull/13124#issuecomment-2003789175 I am going to revert the change and open a new PR to iterate on a fix. `RateLimitedIndexOutput` isn't thread-safe and our rate limiting assumes a single thread. With this commit

Re: [PR] Fix TestLucene90FieldInfosFormat.testRandom [lucene]

2024-03-18 Thread via GitHub
msokolov commented on code in PR #13135: URL: https://github.com/apache/lucene/pull/13135#discussion_r1528450684 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseFieldInfoFormatTestCase.java: ## @@ -278,46 +278,50 @@ public void testRandom() throws Exception {

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2024-03-18 Thread via GitHub
daixque commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1528440247 ## lucene/CHANGES.txt: ## @@ -174,12 +174,14 @@ API Changes New Features - - * GITHUB#12679: Add support for similarity-based vector searches

[PR] Revert "Add new parallel merge task executor for parallel actions within a single merge action" [lucene]

2024-03-18 Thread via GitHub
benwtrent opened a new pull request, #13189: URL: https://github.com/apache/lucene/pull/13189 Reverts apache/lucene#13124 The reason for this revert is `RateLimitedIndexOutput`: it assumes a single thread and is not thread-safe. Will revert the mult

Re: [PR] Add new token filters for Japanese sutegana (捨て仮名) [lucene]

2024-03-18 Thread via GitHub
benwtrent commented on code in PR #12915: URL: https://github.com/apache/lucene/pull/12915#discussion_r1528339145 ## lucene/CHANGES.txt: ## @@ -174,12 +174,14 @@ API Changes New Features - - * GITHUB#12679: Add support for similarity-based vector searche