Re: [PR] Removing unnecessary ByteArrayDataInput allocations by resetting inplace [lucene]

2025-01-07 Thread via GitHub
iverase commented on code in PR #14113: URL: https://github.com/apache/lucene/pull/14113#discussion_r1906591388 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -1324,8 +1324,12 @@ private void decompressBlock() throws IOException {

Re: [I] Unnecessary ByteArrayDataInput allocations during aggregation query [lucene]

2025-01-07 Thread via GitHub
jainankitk commented on issue #14112: URL: https://github.com/apache/lucene/issues/14112#issuecomment-2576880175 These allocations disappeared after running the experiment again with the changes proposed in the PR. -- This is an automated message from the Apache Git Service. To respond to

[PR] Removing unnecessary ByteArrayDataInput allocations by resetting inplace [lucene]

2025-01-07 Thread via GitHub
jainankitk opened a new pull request, #14113: URL: https://github.com/apache/lucene/pull/14113 ### Description Resolves #14112

[I] Unnecessary ByteArrayDataInput allocations during aggregation query [lucene]

2025-01-07 Thread via GitHub
jainankitk opened a new issue, #14112: URL: https://github.com/apache/lucene/issues/14112 ### Description While doing some experiments with the big5 workload on OpenSearch, I noticed significant ByteArrayDataInput allocations during aggregation query execution. I noticed the allocations

Re: [PR] Cuvs integration main [lucene]

2025-01-07 Thread via GitHub
noblepaul closed pull request #14111: Cuvs integration main URL: https://github.com/apache/lucene/pull/14111

[PR] Cuvs integration main [lucene]

2025-01-07 Thread via GitHub
noblepaul opened a new pull request, #14111: URL: https://github.com/apache/lucene/pull/14111 ### Description DO NOT merge, created to make review easier

Re: [PR] javadocs: fix invalid refs in `queryparsers` #14086 [lucene]

2025-01-07 Thread via GitHub
github-actions[bot] commented on PR #14087: URL: https://github.com/apache/lucene/pull/14087#issuecomment-2576470232 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] TestBpVectorReorderer.testQuantizedIndex failing [lucene]

2025-01-07 Thread via GitHub
msokolov commented on issue #14110: URL: https://github.com/apache/lucene/issues/14110#issuecomment-2576361615 Thanks, this reproduces for me ... I'll take a look tomorrow if nobody else does

Re: [PR] LUCENE-10073: Reduce merging overhead of NRT by using a greater mergeFactor on tiny segments. [lucene]

2025-01-07 Thread via GitHub
jpountz commented on PR #266: URL: https://github.com/apache/lucene/pull/266#issuecomment-2576334717 I had missed it, but this change yielded a good speedup on the stored fields benchmark: https://benchmarks.mikemccandless.com/stored_fields_benchmarks.html

Re: [I] TestDefaultCodecParallelizesIO.testTermsSeekExact fails [lucene]

2025-01-07 Thread via GitHub
jpountz commented on issue #14108: URL: https://github.com/apache/lucene/issues/14108#issuecomment-2576326816 The correct fix would probably be to improve the terms index to record the length of blocks (there was a related discussion about whether we already have this info at https://github

Re: [I] TestDefaultCodecParallelizesIO.testTermsSeekExact fails [lucene]

2025-01-07 Thread via GitHub
jpountz commented on issue #14108: URL: https://github.com/apache/lucene/issues/14108#issuecomment-2576298601 I think I understand the failure. Since the terms dictionary doesn't know about the length of its blocks, it always prefetches a length of 1. But if you are unlucky and your terms d

[I] TestBpVectorReorderer.testQuantizedIndex failing [lucene]

2025-01-07 Thread via GitHub
benwtrent opened a new issue, #14110: URL: https://github.com/apache/lucene/issues/14110 ### Description brand new test, I am guessing some weird edge case. Just saw it fail a periodic build. Trace: ``` TestBpVectorReorderer > testQuantizedIndex FAILED --   | ja

Re: [PR] Add two new "Seeded" Knn queries for seeded vector search [lucene]

2025-01-07 Thread via GitHub
benwtrent commented on PR #14084: URL: https://github.com/apache/lucene/pull/14084#issuecomment-2576131880 @seanmacavaney @cpoerschke I am gonna merge this in the next couple of days. I flagged the queries and such as experimental if we want to change the interface. But I think it reached a

Re: [PR] Updated releaseWizard.py to use timezone-aware objects to represent datetimes in UTC [lucene]

2025-01-07 Thread via GitHub
shubhamsrkdev commented on PR #14102: URL: https://github.com/apache/lucene/pull/14102#issuecomment-2575973127 @ChrisHegarty / @rmuir fyi please check this whenever convenient. TIA!

[PR] prefetch may select the wrong memory segment for multi-segment slices [lucene]

2025-01-07 Thread via GitHub
ChrisHegarty opened a new pull request, #14109: URL: https://github.com/apache/lucene/pull/14109 This commit fixes a bug whereby prefetch may select the wrong memory segment for multi-segment slices. The issue was discovered when debugging a large test scenario, where the index inpu

Re: [PR] Add a HNSW collector that exits early when nearest neighbor queue saturates [lucene]

2025-01-07 Thread via GitHub
tteofili commented on code in PR #14094: URL: https://github.com/apache/lucene/pull/14094#discussion_r1905742849 ## lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java: ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Add a HNSW collector that exits early when nearest neighbor queue saturates [lucene]

2025-01-07 Thread via GitHub
tteofili commented on code in PR #14094: URL: https://github.com/apache/lucene/pull/14094#discussion_r1905741487 ## lucene/core/src/java/org/apache/lucene/search/HnswQueueSaturationCollector.java: ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] HNSW BP reordering [lucene]

2025-01-07 Thread via GitHub
jpountz commented on PR #14097: URL: https://github.com/apache/lucene/pull/14097#issuecomment-2575419730 > I don't think any of that should block merging Agreed, especially for the misc module.

Re: [PR] HNSW BP reordering [lucene]

2025-01-07 Thread via GitHub
msokolov merged PR #14097: URL: https://github.com/apache/lucene/pull/14097

Re: [PR] Optimize DFS while marking connected components (#14022) [lucene]

2025-01-07 Thread via GitHub
msokolov commented on PR #14105: URL: https://github.com/apache/lucene/pull/14105#issuecomment-2575464458 No problem! Thanks for the follow up here.

Re: [PR] Optimize DFS while marking connected components (#14022) [lucene]

2025-01-07 Thread via GitHub
msokolov merged PR #14105: URL: https://github.com/apache/lucene/pull/14105

Re: [PR] HNSW BP reordering [lucene]

2025-01-07 Thread via GitHub
benwtrent commented on PR #14097: URL: https://github.com/apache/lucene/pull/14097#issuecomment-2575447034 @msokolov I agree, we shouldn't block merging as it doesn't adjust default behavior. I was just surprised at the numbers and curious as to the cause.

Re: [I] TestDefaultCodecParallelizesIO.testTermsSeekExact fails [lucene]

2025-01-07 Thread via GitHub
benwtrent commented on issue #14108: URL: https://github.com/apache/lucene/issues/14108#issuecomment-2575432922 git bisect says: b6512a46803f2bf5126d46b5d82adaa844b3552c Which I suppose makes sense as that was the last fix attempted for this test case :)

Re: [PR] HNSW BP reordering [lucene]

2025-01-07 Thread via GitHub
jpountz commented on code in PR #14097: URL: https://github.com/apache/lucene/pull/14097#discussion_r1905528784 ## lucene/misc/src/java/org/apache/lucene/misc/index/BpVectorReorderer.java: ## @@ -0,0 +1,788 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] HNSW BP reordering [lucene]

2025-01-07 Thread via GitHub
jpountz commented on code in PR #14097: URL: https://github.com/apache/lucene/pull/14097#discussion_r1905528113 ## lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java: ## @@ -63,41 +62,38 @@ public IncrementalHnswGraphMerger( /** * Adds a re

Re: [PR] HNSW BP reordering [lucene]

2025-01-07 Thread via GitHub
msokolov commented on code in PR #14097: URL: https://github.com/apache/lucene/pull/14097#discussion_r1905511599 ## lucene/misc/src/java/org/apache/lucene/misc/index/BpVectorReorderer.java: ## @@ -0,0 +1,788 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] HNSW BP reordering [lucene]

2025-01-07 Thread via GitHub
msokolov commented on code in PR #14097: URL: https://github.com/apache/lucene/pull/14097#discussion_r1905512009 ## lucene/misc/src/java/org/apache/lucene/misc/index/BpVectorReorderer.java: ## @@ -0,0 +1,788 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

[I] TestDefaultCodecParallelizesIO.testTermsSeekExact fails [lucene]

2025-01-07 Thread via GitHub
benwtrent opened a new issue, #14108: URL: https://github.com/apache/lucene/issues/14108 ### Description On 10x, this seed fails repeatably with trace: ``` TestDefaultCodecParallelizesIO > testTermsSeekExact FAILED java.lang.AssertionError at __randomizedte

Re: [I] TestDefaultCodecParallelizesIO.testTermsSeekExact fails [lucene]

2025-01-07 Thread via GitHub
benwtrent commented on issue #14108: URL: https://github.com/apache/lucene/issues/14108#issuecomment-2575369273 Running this test 100k+ times on main and it never failed. So, I tried many thousands of other seeds on 10x and it never failed. Seems like an exceptionally rare failu

Re: [PR] HNSW BP reordering [lucene]

2025-01-07 Thread via GitHub
msokolov commented on code in PR #14097: URL: https://github.com/apache/lucene/pull/14097#discussion_r1905495877 ## lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java: ## @@ -63,41 +62,38 @@ public IncrementalHnswGraphMerger( /** * Adds a r

Re: [PR] HNSW BP reordering [lucene]

2025-01-07 Thread via GitHub
msokolov commented on PR #14097: URL: https://github.com/apache/lucene/pull/14097#issuecomment-2575347701 Thanks for the reviews. I agree the measurements are not well explained. I have other runs that show no or less impact on search times, unchanged index sizes, and sometimes negative imp

Re: [I] Add an S3-based directory. [lucene]

2025-01-07 Thread via GitHub
Bukhtawar commented on issue #13868: URL: https://github.com/apache/lucene/issues/13868#issuecomment-2574978587 > @albogdano I'm curious if you have any interest in contributing your https://github.com/albogdano/lucene-s3directory? > > @shubhamvishu @atris Thanks for volunteering to h

Re: [PR] Optimize DirectIOIndexInput [lucene]

2025-01-07 Thread via GitHub
ChrisHegarty commented on PR #14103: URL: https://github.com/apache/lucene/pull/14103#issuecomment-2574818136 To help move this forward, I'm going to separate out the changes into several smaller, more targeted PRs, tracked by #14106.

[PR] DirectIOIndexInput - add overloads for primitive access [lucene]

2025-01-07 Thread via GitHub
ChrisHegarty opened a new pull request, #14107: URL: https://github.com/apache/lucene/pull/14107 This commit adds overloads for primitive access to DirectIOIndexInput. Existing tests in TestDirectIOIndexInput already provide sufficient coverage for the changes in this PR.