[GitHub] [lucene] sgup432 commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub
sgup432 commented on issue #12597: URL: https://github.com/apache/lucene/issues/12597#issuecomment-1738594461 @jpountz What do you think on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

[GitHub] [lucene] gf2121 commented on issue #12598: FST#Compiler allocates too much memory

2023-09-28 Thread via GitHub
gf2121 commented on issue #12598: URL: https://github.com/apache/lucene/issues/12598#issuecomment-1738687168 I did an experiment: Index random BytesRefs and count the byte usage when `BytesStore#finish` called. The following are the statistical results: ``` total sample: 23130

[GitHub] [lucene] kaivalnp commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-28 Thread via GitHub
kaivalnp commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1738933241 > My main concern is that we are adding yet another extension point for "partial control" when we already have that with the rewrite or something even more complex with the collector

[GitHub] [lucene] javanna opened a new pull request, #12603: Simplify TaskExecutor API

2023-09-28 Thread via GitHub
javanna opened a new pull request, #12603: URL: https://github.com/apache/lucene/pull/12603 We recently made TaskExecutor public. It currently exposes two methods: one to create tasks given a collection of callables, and one to execute all tasks created at step 1. We can rather expose a sin

[GitHub] [lucene] gf2121 commented on issue #12598: FST#Compiler allocates too much memory

2023-09-28 Thread via GitHub
gf2121 commented on issue #12598: URL: https://github.com/apache/lucene/issues/12598#issuecomment-1739489111 I get similar statistics for wikimediumall and here are the results when `BytesStore#finish` called 1,000,000 times. ``` BytesStore#finish called: 100 times min:

[GitHub] [lucene] gf2121 opened a new pull request, #12604: Reduce FST block size for BlockTreeTermsWriter

2023-09-28 Thread via GitHub
gf2121 opened a new pull request, #12604: URL: https://github.com/apache/lucene/pull/12604 ### Description https://blunders.io/jfr-demo/indexing-4kb-2023.09.25.18.03.36/allocations-drill-down Nightly benchmark shows that `FSTCompiler#init` allocated most of the memory during i

[GitHub] [lucene] cpoerschke commented on a diff in pull request #12380: Add a post-collection hook to LeafCollector.

2023-09-28 Thread via GitHub
cpoerschke commented on code in PR #12380: URL: https://github.com/apache/lucene/pull/12380#discussion_r1340380526 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/SuggestIndexSearcher.java: ## @@ -67,14 +68,16 @@ public void suggest(CompletionQuery query, T

[GitHub] [lucene] cpoerschke opened a new pull request, #12605: IndexingChain.validateMaxVectorDimension: add missing space wording

2023-09-28 Thread via GitHub
cpoerschke opened a new pull request, #12605: URL: https://github.com/apache/lucene/pull/12605 (Keeping as draft for now, to see if similar change might be needed nearby too.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

[GitHub] [lucene] benwtrent commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-28 Thread via GitHub
benwtrent commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1739838321 Ok, if we really want the ability to get the final topK for a Lucene index, I think a new method should be added (merge results or something). That seems like a better extension than o

[GitHub] [lucene] jpountz commented on a diff in pull request #12380: Add a post-collection hook to LeafCollector.

2023-09-28 Thread via GitHub
jpountz commented on code in PR #12380: URL: https://github.com/apache/lucene/pull/12380#discussion_r1340582160 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/SuggestIndexSearcher.java: ## @@ -67,14 +68,16 @@ public void suggest(CompletionQuery query, TopS

[GitHub] [lucene] javanna opened a new pull request, #12606: Create a task executor when executor is not provided

2023-09-28 Thread via GitHub
javanna opened a new pull request, #12606: URL: https://github.com/apache/lucene/pull/12606 As we introduce more places where we add concurrency (there are currently three) there is a common pattern around checking whether there is an executor provided, and then going sequential on the call

[GitHub] [lucene] javanna commented on a diff in pull request #12606: Create a task executor when executor is not provided

2023-09-28 Thread via GitHub
javanna commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1340606929 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -420,13 +418,12 @@ public int count(Query query) throws IOException { } /** - * Re

[GitHub] [lucene] jpountz commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub
jpountz commented on issue #12597: URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739925151 Can you explain a bit more why you need to store your cache keys off-heap? Presumably this isn't because of memory usage since these cache keys are tiny. -- This is an automated m

[GitHub] [lucene] sgup432 commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub
sgup432 commented on issue #12597: URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739942303 Yeah this isn't because of memory consumption as such. But more towards providing capability to offload cache data to off-heap/disk so that one can maintain a larger cache if needed

[GitHub] [lucene] jpountz commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub
jpountz commented on issue #12597: URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739956888 Would it work to only store the cache values on disk and keep cache keys in memory? These cache keys should be a small fraction of the memory that open `IndexReader`s take anyway?

[GitHub] [lucene] javanna opened a new pull request, #12607: Add missing create github release step to release wizard

2023-09-28 Thread via GitHub
javanna opened a new pull request, #12607: URL: https://github.com/apache/lucene/pull/12607 The "create github release" step was missing from the release wizard. We have forgotten about it a few times recently. While at it, I also expanded the instructions around closing the current

[GitHub] [lucene] quux00 commented on a diff in pull request #12523: TaskExecutor waits for all tasks to complete before returning

2023-09-28 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1340643119 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,11 +266,130 @@ protected LeafSlice[] slices(List leaves) { return slic

[GitHub] [lucene] kaivalnp commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-28 Thread via GitHub
kaivalnp commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1739975925 Thanks, makes sense to me! Added an explicit method to merge per-segment results -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [lucene] sgup432 commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub
sgup432 commented on issue #12597: URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739977503 We can, that's more of a implementation choice and should be kept open. Apart from that, OpenSearch also uses RequestCache which uses a composite key and CacheKey being one of them.

[GitHub] [lucene] gsmiller opened a new issue, #12608: Should DrillSidewaysQuery BulkScorer leverage scoreSuppliers of the base query and dimensions?

2023-09-28 Thread via GitHub
gsmiller opened a new issue, #12608: URL: https://github.com/apache/lucene/issues/12608 ### Description When DrillSidewaysQuery creates a BulkScorer, it directly calls `#scorer` on the base query and each drill-down dim. This means the drill-down dims are not able to optimize based o

[GitHub] [lucene] jpountz commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub
jpountz commented on issue #12597: URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739991979 Got it. I'm a bit torn on this change, on the one hand it would be harmless as you pointed out, on the other hand I could see it being a bit of a rabbit hole with future feat