[I] Build and test for Linux/ARM64 [lucene]

2024-11-20 Thread via GitHub
odidev opened a new issue, #14006: URL: https://github.com/apache/lucene/issues/14006 ### Description For the aarch64 platform, I have built "Lucene" from source. ## Steps Following [STEPS](https://github.com/apache/lucene?tab=readme-ov-file#bu

Re: [PR] Update ComplexPhraseQueryParser.java [lucene]

2024-11-20 Thread via GitHub
mkhludnev commented on PR #14005: URL: https://github.com/apache/lucene/pull/14005#issuecomment-2490162870 Hi @paulk-asert! Thanks for your contribution. Please create backport commit targeting https://github.com/apache/lucene/tree/branch_10x -- This is an automated message fro

Re: [PR] Update ComplexPhraseQueryParser.java [lucene]

2024-11-20 Thread via GitHub
mkhludnev merged PR #14005: URL: https://github.com/apache/lucene/pull/14005 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-11-20 Thread via GitHub
vigyasharma commented on code in PR #13951: URL: https://github.com/apache/lucene/pull/13951#discussion_r1851338572 ## lucene/core/src/java/org/apache/lucene/index/IndexWriterRAMManager.java: ## @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[PR] Update ComplexPhraseQueryParser.java [lucene]

2024-11-20 Thread via GitHub
paulk-asert opened a new pull request, #14005: URL: https://github.com/apache/lucene/pull/14005 Fix typo

Re: [I] How to configure TieredMergePolicy for very low segment count? [lucene]

2024-11-20 Thread via GitHub
jpountz commented on issue #14004: URL: https://github.com/apache/lucene/issues/14004#issuecomment-2489618305 I confirmed that #266 seems to help for this case. It does find merges to run with the above example (100kB, 300kB, 800kB, 2MB, 5MB, 12MB, 30MB, 70MB, 150MB, 400MB). And when I run

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-11-20 Thread via GitHub
krickert commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2489546204 > Chunk-Based Highlighting – Interesting. With getAllVectorValues(), we can find all vector values with similarity above a separate sim-threshold for highlights? Not sure. But i

Re: [PR] Introduces IndexInput#updateReadAdvice to change the ReadAdvice while merging vectors [lucene]

2024-11-20 Thread via GitHub
shatejas commented on PR #13985: URL: https://github.com/apache/lucene/pull/13985#issuecomment-2489459115 > The `org.apache.lucene.index.TestConcurrentMergeScheduler.testNoWaitClose` test hits a new assert that I added - sorry. I need to look to see if it is a test issue or more of a design

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-11-20 Thread via GitHub
vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2489342443 Thank you for sharing these use-cases @krickert ! 1. **Aggregate Scoring** – I think we can do this today by joining the child doc hits with their parents and calculating score

Re: [PR] LUCENE-10073: Reduce merging overhead of NRT by using a greater mergeFactor on tiny segments. [lucene]

2024-11-20 Thread via GitHub
jpountz commented on code in PR #266: URL: https://github.com/apache/lucene/pull/266#discussion_r1850802831 ## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ## @@ -495,36 +495,36 @@ private MergeSpecification doFindMerges( for (int startIdx = 0; sta

Re: [PR] LUCENE-10073: Reduce merging overhead of NRT by using a greater mergeFactor on tiny segments. [lucene]

2024-11-20 Thread via GitHub
jpountz commented on code in PR #266: URL: https://github.com/apache/lucene/pull/266#discussion_r1850802366 ## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ## @@ -495,36 +495,36 @@ private MergeSpecification doFindMerges( for (int startIdx = 0; sta

Re: [I] How to configure TieredMergePolicy for very low segment count? [lucene]

2024-11-20 Thread via GitHub
jpountz commented on issue #14004: URL: https://github.com/apache/lucene/issues/14004#issuecomment-2489139597 Interestingly, it looks like this PR https://github.com/apache/lucene/pull/266 would do what I'm looking for, though it aimed at solving a different problem (!)

[I] How to configure TieredMergePolicy for very low segment count? [lucene]

2024-11-20 Thread via GitHub
jpountz opened a new issue, #14004: URL: https://github.com/apache/lucene/issues/14004 ### Description I have been experimenting with configuring `TieredMergePolicy` to keep the segment count very low: - segsPerTier = 2 - floorSegmentSize = 512MB This typically helps if
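
To make the effect of a large floor concrete, here is a minimal, self-contained sketch. It is not Lucene code: `FloorSizeSketch`, `flooredSize` and `flooredSizes` are hypothetical names, and the sketch assumes the floor acts as a simple clamp, so that any segment smaller than the floor is treated as if it were floor-sized. The segment sizes are the ones quoted in the follow-up comment on this issue (100kB up to 400MB); the 512MB floor is the value from the description above.

```java
import java.util.Arrays;

/** Illustrative only: a stand-alone model of size flooring, not Lucene's implementation. */
final class FloorSizeSketch {
  static final long KB = 1024L;
  static final long MB = 1024L * KB;

  /** Assumption: a segment below the floor is treated as if it were floor-sized. */
  static long flooredSize(long segmentBytes, long floorBytes) {
    return Math.max(segmentBytes, floorBytes);
  }

  /** Applies the clamp to every segment size. */
  static long[] flooredSizes(long[] segmentBytes, long floorBytes) {
    return Arrays.stream(segmentBytes).map(s -> flooredSize(s, floorBytes)).toArray();
  }

  public static void main(String[] args) {
    // In Lucene itself the two settings would be applied through
    // TieredMergePolicy#setSegmentsPerTier(2) and #setFloorSegmentMB(512);
    // this sketch only models the size clamp.
    long floor = 512 * MB;
    long[] sizes = {
      100 * KB, 300 * KB, 800 * KB, 2 * MB, 5 * MB, 12 * MB, 30 * MB, 70 * MB, 150 * MB, 400 * MB
    };
    long[] floored = flooredSizes(sizes, floor);
    for (int i = 0; i < sizes.length; i++) {
      System.out.println(sizes[i] + " bytes is treated as " + floored[i] + " bytes");
    }
  }
}
```

Under this assumption every segment in the example is below the floor, so all ten are treated as 512MB segments, which is why the interaction between the floor and a very low segsPerTier matters here.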

Re: [PR] Add IndexInput isLoaded [lucene]

2024-11-20 Thread via GitHub
ChrisHegarty commented on PR #13998: URL: https://github.com/apache/lucene/pull/13998#issuecomment-2489087426 > This works for me. Maybe implement this API on our in-memory index inputs to return true, e.g. `ByteBuffersIndexInput`? yeah, I think that this prob makes sense. Lemme satis

Re: [PR] Add IndexInput isLoaded [lucene]

2024-11-20 Thread via GitHub
ChrisHegarty commented on code in PR #13998: URL: https://github.com/apache/lucene/pull/13998#discussion_r1850657822 ## lucene/core/src/java/org/apache/lucene/store/IndexInput.java: ## @@ -226,4 +227,17 @@ public String toString() { * @param length the number of bytes to pre

Re: [PR] Add IndexInput isLoaded [lucene]

2024-11-20 Thread via GitHub
ChrisHegarty commented on code in PR #13998: URL: https://github.com/apache/lucene/pull/13998#discussion_r1850653616 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -406,6 +406,15 @@ void advise(long offset, long length, IOConsumer advice)

Re: [PR] Introduces IndexInput#updateReadAdvice to change the ReadAdvice while merging vectors [lucene]

2024-11-20 Thread via GitHub
ChrisHegarty commented on PR #13985: URL: https://github.com/apache/lucene/pull/13985#issuecomment-2488990896 The `org.apache.lucene.index.TestConcurrentMergeScheduler.testNoWaitClose` test hits a new assert that I added - sorry. I need to look to see if it is a test issue or more of a desi

Re: [PR] Only consider clauses whose cost is less than the lead cost to compute block boundaries in WANDScorer. [lucene]

2024-11-20 Thread via GitHub
jpountz commented on PR #14003: URL: https://github.com/apache/lucene/pull/14003#issuecomment-2488789866 The speedup is not as good, but still significant: ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-valu

[PR] Only take leads into account to compute block boundaries2 [lucene]

2024-11-20 Thread via GitHub
jpountz opened a new pull request, #14003: URL: https://github.com/apache/lucene/pull/14003 WANDScorer implements block-max WAND and needs to recompute score upper bounds whenever it moves to a different block. Thus it's important for these blocks to be large enough to avoid re-computing sc

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-11-20 Thread via GitHub
benwtrent commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2488523095 @dungba88 I left a comment on your POC. It seems to me the best abstraction is a new query that doesn't inherit from AbstractKnnVectorQuery. Instead its just a new query that requ

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-11-20 Thread via GitHub
krickert commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2488410934 > And we can use getAllVectorValues() for scoring with max or avg of all vectors in the doc at query time. Your proposal to implement `getAllVectorValues()` for scoring documents
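
As a rough illustration of the "max or avg of all vectors in the doc" idea, the sketch below scores one document that carries several vectors against a single query vector. It is a stand-alone model, not the PR's code: `MultiVectorScoreSketch`, `dot`, `maxScore` and `avgScore` are hypothetical names, dot product is assumed as the similarity, and the proposed `getAllVectorValues()` is not called because its signature is not shown here.

```java
/** Illustrative only: aggregates per-vector similarities for one multi-vector document. */
final class MultiVectorScoreSketch {

  /** Dot product of two vectors of equal dimension (assumed similarity function). */
  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Score of the document = best similarity among its vectors. */
  static float maxScore(float[] query, float[][] docVectors) {
    if (docVectors.length == 0) {
      throw new IllegalArgumentException("document has no vectors");
    }
    float best = Float.NEGATIVE_INFINITY;
    for (float[] v : docVectors) {
      best = Math.max(best, dot(query, v));
    }
    return best;
  }

  /** Score of the document = mean similarity over its vectors. */
  static float avgScore(float[] query, float[][] docVectors) {
    if (docVectors.length == 0) {
      throw new IllegalArgumentException("document has no vectors");
    }
    float total = 0f;
    for (float[] v : docVectors) {
      total += dot(query, v);
    }
    return total / docVectors.length;
  }
}
```

The max variant rewards a document for its single best-matching vector, while the avg variant rewards documents whose vectors match the query consistently; which one fits depends on the use case being discussed in the thread.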

Re: [PR] Only consider clauses whose cost is less than the lead cost to compute block boundaries in WANDScorer. [lucene]

2024-11-20 Thread via GitHub
jpountz commented on PR #14000: URL: https://github.com/apache/lucene/pull/14000#issuecomment-2487985918 This changed top hits in nightly benchmarks, which caused a failure. I'm reverting and will look into it.

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-11-20 Thread via GitHub
vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2487597269 _...contd. from above – thoughts on supporting independent multi-vectors specified via `NONE` multi-vector aggregation..._ __ The `Knn{Float|Byte}Vector` fields will accept