[GitHub] [lucene] vsop-479 commented on pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-25 Thread via GitHub
vsop-479 commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1733041087 @iverase Does it make sense to you if MatchState is defined in another class, such as BKDReader or IntersectVisitor, and only leave the sorted dimension in IntersectVisitor's visit method as
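
For readers skimming the thread, the idea in the PR title can be illustrated with a minimal sketch. It is not the PR's code and does not use Lucene's BKD or `IntersectVisitor` APIs: the class name, the method name `scan`, and the plain `long[]` input are all assumptions made for illustration. It only shows that when values arrive sorted in one dimension, a range scan can stop at the first value above the upper bound.

```java
public class SortedDimEarlyExitSketch {

    /**
     * Scans values that are sorted ascending and counts those in [lower, upper].
     * Hypothetical helper, not a Lucene API.
     *
     * @return a two-element array: {number of matches, number of values inspected}
     */
    public static int[] scan(long[] sortedValues, long lower, long upper) {
        int matched = 0;
        int inspected = 0;
        for (long v : sortedValues) {
            inspected++;
            if (v > upper) {
                // Every later value is at least as large, so none can match.
                break;
            }
            if (v >= lower) {
                matched++;
            }
        }
        return new int[] {matched, inspected};
    }

    public static void main(String[] args) {
        int[] r = scan(new long[] {1, 3, 5, 7, 9, 11}, 3, 7);
        System.out.println("matched=" + r[0] + " inspected=" + r[1]);
    }
}
```

The saving is the tail of the leaf that is never inspected; whether that is measurable in practice is what the performance tests discussed in the PR would have to show.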

[GitHub] [lucene] sylph-eu commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-09-25 Thread via GitHub
sylph-eu commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1733118806 The last comment is already a couple of months old, so let me ask about the status of this initiative. Is there a chance it's going to be merged? Is there any blocker or actio

[GitHub] [lucene] gf2121 opened a new pull request, #12586: Remove over-counting of deleted terms

2023-09-25 Thread via GitHub
gf2121 opened a new pull request, #12586: URL: https://github.com/apache/lucene/pull/12586 `BufferedUpdates` used to count deleted terms without deduplication to respect `IndexWriterConfig.setMaxBufferedDeleteTerms`. As `IndexWriterConfig.setMaxBufferedDeleteTerms` has been removed since [LUCENE

[GitHub] [lucene] gf2121 commented on a diff in pull request #12586: Remove over-counting of deleted terms

2023-09-25 Thread via GitHub
gf2121 commented on code in PR #12586: URL: https://github.com/apache/lucene/pull/12586#discussion_r1335560326 ## lucene/core/src/java/org/apache/lucene/index/BufferedUpdates.java: ## @@ -284,6 +276,13 @@ void forEachOrdered(DeletedTermConsumer consumer) throw public long

[GitHub] [lucene] uschindler commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-09-25 Thread via GitHub
uschindler commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1733223795 Hi, actually this issue is already resolved, although the DEFAULT did not change (and won't change due to performance risks), see here: https://github.com/apache/lucene/pull/1

[GitHub] [lucene] uschindler commented on issue #11507: Increase the number of dims for KNN vectors to 2048 [LUCENE-10471]

2023-09-25 Thread via GitHub
uschindler commented on issue #11507: URL: https://github.com/apache/lucene/issues/11507#issuecomment-1733230392 @mayya-sharipova: Should we close this issue or are there any plans to also change the default maximum? I don't think so. -- This is an automated message from the Apache Git Se

[GitHub] [lucene] iverase merged pull request #12581: Allow reading / writing binary stored fields as DataInput

2023-09-25 Thread via GitHub
iverase merged PR #12581: URL: https://github.com/apache/lucene/pull/12581

[GitHub] [lucene] iverase closed issue #12556: Allow reading binary stored values as DataInput

2023-09-25 Thread via GitHub
iverase closed issue #12556: Allow reading binary stored values as DataInput URL: https://github.com/apache/lucene/issues/12556

[GitHub] [lucene] iverase commented on pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-25 Thread via GitHub
iverase commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1733441807 My recommendation is to use the following method as you are just trying to flag if the visit method needs to keep processing points. ``` /** Similar to {@link IntersectVisitor#vi

[GitHub] [lucene] gf2121 opened a new pull request, #12587: Use radix sort to speed up the sorting of terms in TermInSetQuery

2023-09-25 Thread via GitHub
gf2121 opened a new pull request, #12587: URL: https://github.com/apache/lucene/pull/12587 ### Description Sort terms in TermInSetQuery with radix sort. This helps TermInSetQueries with a number of terms. ### Benchmark I made a simple benchmark on sorting `BytesRef[]` wi
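
As background for the description above, here is a minimal sketch of a most-significant-byte radix sort over byte strings. It is not the code from the PR and does not use Lucene's `BytesRef` or its sorter classes: plain `byte[]` terms and the class name `RadixSortSketch` are assumptions made for illustration. Terms are ordered by unsigned byte value, with a shorter term sorting before any longer term that it prefixes.

```java
import java.nio.charset.StandardCharsets;

public class RadixSortSketch {

    /** Sorts byte strings in unsigned lexicographic order using an MSD radix sort. */
    public static void sort(byte[][] terms) {
        sort(terms, new byte[terms.length][], 0, terms.length, 0);
    }

    /** Bucket 0 means "term ends at this depth"; buckets 1..256 map to byte values 0..255. */
    private static int bucket(byte[] term, int depth) {
        return depth < term.length ? (term[depth] & 0xFF) + 1 : 0;
    }

    private static void sort(byte[][] a, byte[][] tmp, int from, int to, int depth) {
        if (to - from < 2) {
            return;
        }
        // Histogram of bucket sizes, shifted by one slot so prefix sums give start offsets.
        int[] counts = new int[258];
        for (int i = from; i < to; i++) {
            counts[bucket(a[i], depth) + 1]++;
        }
        for (int b = 0; b < 257; b++) {
            counts[b + 1] += counts[b];
        }
        int[] starts = counts.clone();
        // Stable distribution into the scratch array, then copy back.
        for (int i = from; i < to; i++) {
            tmp[from + counts[bucket(a[i], depth)]++] = a[i];
        }
        System.arraycopy(tmp, from, a, from, to - from);
        // Bucket 0 holds identical terms; recurse into the others on the next byte.
        for (int b = 1; b <= 256; b++) {
            sort(a, tmp, from + starts[b], from + starts[b + 1], depth + 1);
        }
    }

    public static void main(String[] args) {
        byte[][] terms = {
            "pear".getBytes(StandardCharsets.UTF_8),
            "apple".getBytes(StandardCharsets.UTF_8),
            "app".getBytes(StandardCharsets.UTF_8)
        };
        sort(terms);
        for (byte[] t : terms) {
            System.out.println(new String(t, StandardCharsets.UTF_8));
        }
    }
}
```

A production sorter would typically fall back to a comparison sort for small buckets instead of recursing all the way down; this sketch omits that for brevity.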

[GitHub] [lucene] jpountz commented on pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-25 Thread via GitHub
jpountz commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-1733490475 FWIW I ran the benchmark from https://tantivy-search.github.io/bench/ and also observed a speedup on conjunctions, so I think that the speedup is indeed real.

[GitHub] [lucene] jpountz merged pull request #12382: Run top-level conjunctions of term queries with a specialized BulkScorer.

2023-09-25 Thread via GitHub
jpountz merged PR #12382: URL: https://github.com/apache/lucene/pull/12382

[GitHub] [lucene] javanna opened a new pull request, #12588: Shared executor for LuceneTestCase#newSearcher callers

2023-09-25 Thread via GitHub
javanna opened a new pull request, #12588: URL: https://github.com/apache/lucene/pull/12588 Until now, LuceneTestCase#newSearcher randomly associates the returned IndexSearcher instance with an executor that is ad-hoc created, which gets shut down when the index reader is closed. Thi

[GitHub] [lucene] rmuir commented on pull request #12588: Shared executor for LuceneTestCase#newSearcher callers

2023-09-25 Thread via GitHub
rmuir commented on PR #12588: URL: https://github.com/apache/lucene/pull/12588#issuecomment-1733714024 This is nice, much cleaner. I think I added the original cache-helper-hack. Just one question: can we reduce the number of threads used? I have 2 cores. Shouldn't 2 be enough to occasional

[GitHub] [lucene] javanna commented on pull request #12588: Shared executor for LuceneTestCase#newSearcher callers

2023-09-25 Thread via GitHub
javanna commented on PR #12588: URL: https://github.com/apache/lucene/pull/12588#issuecomment-1733731499 Thanks @rmuir for looking, I lowered the number of threads

[GitHub] [lucene] jpountz opened a new pull request, #12589: Sometimes intersect the essential clause and the best non-essential clause.

2023-09-25 Thread via GitHub
jpountz opened a new pull request, #12589: URL: https://github.com/apache/lucene/pull/12589 The idea behind MAXSCORE is to run disjunctions as `+(essentialClause1 ... essentialClauseM) nonEssentialClause1 ... nonEssentialClauseN`, moving more and more clauses from the essential list to the
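
The essential / non-essential split mentioned above can be sketched as follows. This is a textbook-style illustration, not the PR's code and not Lucene's scorer: the class name, the method `countNonEssential`, and the `float[]` of per-clause maximum scores are assumptions. Clauses are ordered by maximum score, and the longest prefix whose summed maximum scores stays below the minimum competitive score is treated as non-essential, since a document matching only those clauses cannot be competitive.

```java
import java.util.Arrays;

public class MaxScorePartitionSketch {

    /**
     * Returns how many clauses are non-essential for the given threshold.
     * Hypothetical helper: maxScores holds one upper bound per clause.
     */
    public static int countNonEssential(float[] maxScores, float minCompetitiveScore) {
        float[] sorted = maxScores.clone();
        Arrays.sort(sorted); // ascending: cheapest score contributions first
        double sum = 0;
        int nonEssential = 0;
        for (float s : sorted) {
            if (sum + s >= minCompetitiveScore) {
                // Adding this clause could reach the threshold, so it must stay essential.
                break;
            }
            sum += s;
            nonEssential++;
        }
        return nonEssential;
    }

    public static void main(String[] args) {
        float[] maxScores = {3f, 1f, 2f};
        System.out.println(countNonEssential(maxScores, 3.5f));
    }
}
```

As the minimum competitive score rises during collection, the returned count grows, which corresponds to clauses moving from the essential list to the non-essential list.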

[GitHub] [lucene] jpountz commented on pull request #12589: Sometimes intersect the essential clause and the best non-essential clause.

2023-09-25 Thread via GitHub
jpountz commented on PR #12589: URL: https://github.com/apache/lucene/pull/12589#issuecomment-1733875246 Opening as a draft as I still need to figure out how to test this optimization. I tested on wikibigall where this yielded a good speedup. I would expect an even better speedup if

[GitHub] [lucene] javanna merged pull request #12588: Shared executor for LuceneTestCase#newSearcher callers

2023-09-25 Thread via GitHub
javanna merged PR #12588: URL: https://github.com/apache/lucene/pull/12588

[GitHub] [lucene] kaivalnp opened a new pull request, #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-25 Thread via GitHub
kaivalnp opened a new pull request, #12590: URL: https://github.com/apache/lucene/pull/12590 ### Context Vector search is performed in [`AbstractKnnVectorQuery`](https://github.com/kaivalnp/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java), w

[GitHub] [lucene] benwtrent commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-25 Thread via GitHub
benwtrent commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1734167969 Doesn't the new KnnCollector abstraction already address these needs?

[GitHub] [lucene] kaivalnp commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-25 Thread via GitHub
kaivalnp commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1734230703 Thanks for the quick response @benwtrent! As far as I understand (please let me know if I'm missing something), the new [`KnnCollector`](https://github.com/apache/lucene/blob/mai

[GitHub] [lucene] benwtrent commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-25 Thread via GitHub
benwtrent commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1734248507 @kaivalnp I guess you could have a collector that spans all the segments (created once at the query level). I am not really against this change, I am just wondering if there is a

[GitHub] [lucene] kaivalnp commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-25 Thread via GitHub
kaivalnp commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1734321355 > you could have a collector that spans all the segments (created once at the query level). Interesting, so a single `KnnCollector` passed to all [`searchNearestVectors`](https:/

[GitHub] [lucene] benwtrent commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-25 Thread via GitHub
benwtrent commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1734371468 @kaivalnp depends on what you need to do. You can easily get around all this without any expensive locking. The collector has a "topDocs" method that could call some higher

[GitHub] [lucene] kaivalnp commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-25 Thread via GitHub
kaivalnp commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1734432239 > You can easily get around all this without any expensive locking. > The collector has a "topDocs" method that could call some higher level collector. Nice idea! So basically

[GitHub] [lucene] vsop-479 commented on pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-25 Thread via GitHub
vsop-479 commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1734701754 @iverase Thanks for your recommendation, it may make the code clearer. I will try to implement it and run the performance test.