[PR] Only search soft deleted in SoftDeletesRetentionMergePolicy.applyRetentionQuery [lucene]

2024-07-02 Thread via GitHub
vsop-479 opened a new pull request, #13536: URL: https://github.com/apache/lucene/pull/13536 ### Description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscr

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #13401: URL: https://github.com/apache/lucene/pull/13401#issuecomment-2204763316 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #13201: URL: https://github.com/apache/lucene/pull/13201#issuecomment-2204763543 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] This commit adds a new test CMS that always provides intra-merge parallelism [lucene]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #13475: URL: https://github.com/apache/lucene/pull/13475#issuecomment-2204763225 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Removing usage of TopScoreDocCollector + TopFieldCollector deprecated methods (#create, #createSharedManager) [lucene]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #13500: URL: https://github.com/apache/lucene/pull/13500#issuecomment-2204763139 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-02 Thread via GitHub
uschindler commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2204533884 > I believe that this could be a problem for cross-data-structure merging concurrency (which we just disabled, but would like to re-enable soon-ish) since merging uses `READONCE`. In

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-02 Thread via GitHub
uschindler commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2204198800 Not sure about this: We could possibly also modify the general Exception handler which catches IllegalStateException and rethrow it as IOException. We do this already for closed inpu

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-02 Thread via GitHub
uschindler commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2204070010 Looks ok to me. Maybe ask @jpountz for his opinion; maybe he has more ideas where we only work single threaded.

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-02 Thread via GitHub
ChrisHegarty commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2203889420 Loopy testing with `-Ptests.directory=MMapDirectory` was all successful after several dozen runs.

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-02 Thread via GitHub
uschindler commented on code in PR #13535: URL: https://github.com/apache/lucene/pull/13535#discussion_r1662520292 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -45,7 +45,12 @@ public IndexInput openInput(Path path, IOContext conte

Re: [PR] Add target search concurrency to TieredMergePolicy [lucene]

2024-07-02 Thread via GitHub
carlosdelest commented on code in PR #13430: URL: https://github.com/apache/lucene/pull/13430#discussion_r1662412942 ## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ## @@ -522,21 +550,28 @@ private MergeSpecification doFindMerges( final List cand

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-02 Thread via GitHub
ChrisHegarty commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2203199663 > Cool. Seems useful to achieve the goal. > > As written in the original issue maybe we should disallow clones, random access and slices of IndexInput on top of that. This may

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-02 Thread via GitHub
uschindler commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2203161601 When backporting, we need to apply the same changes for Java 19 and 20.

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-02 Thread via GitHub
benwtrent commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2203125856 The only way to find out is to test it. I don't see how your suggestion would work without trying it out. It's better to think about what it would be in the unsigned `byte` c

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

2024-07-02 Thread via GitHub
benwtrent commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2200120237 > I could update the existing FlatVectorsFormat and write these data offsets only for when the field is a tensor. I was thinking something like this. We should dynamically handl

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-02 Thread via GitHub
benwtrent commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2200052667 My concern for 8 bit quantization is the algebraic expansion of dot-product and the corrective terms. For scalar quantization, the score corrections for dotProduct are deriv
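
The comment above refers to the algebraic expansion of the dot product and its corrective terms. As a rough illustration only, here is a minimal sketch for a generic min/max scalar quantizer. It is not Lucene's implementation: the function names, the shared `lo`/`hi` bounds for both vectors, and the sample values are all assumptions made for this example. Each value is approximated as `alpha * code + lo`, so the dot product expands into an integer dot product plus terms that depend only on the code sums and on `lo`.

```python
import math


def quantize(vec, lo, hi, bits):
    """Map floats in [lo, hi] to integer codes in [0, 2**bits - 1] (hypothetical helper)."""
    levels = (1 << bits) - 1
    alpha = (hi - lo) / levels  # width of one quantization step
    codes = [min(levels, max(0, round((v - lo) / alpha))) for v in vec]
    return codes, alpha


def corrected_dot(qx, qy, alpha, lo):
    """Expand sum((alpha*a + lo) * (alpha*b + lo)) into an integer dot plus corrections."""
    int_dot = sum(a * b for a, b in zip(qx, qy))
    return (
        alpha * alpha * int_dot                # scaled integer dot product
        + alpha * lo * (sum(qx) + sum(qy))     # corrective terms from the offset
        + len(qx) * lo * lo                    # constant term, one per dimension
    )


# Sample data chosen for illustration only.
x = [0.1, -0.4, 0.9]
y = [-0.7, 0.3, 0.5]
lo, hi, bits = -1.0, 1.0, 7

qx, alpha = quantize(x, lo, hi, bits)
qy, _ = quantize(y, lo, hi, bits)

estimate = corrected_dot(qx, qy, alpha, lo)
dequantized = sum((alpha * a + lo) * (alpha * b + lo) for a, b in zip(qx, qy))
true_dot = sum(a * b for a, b in zip(x, y))

print(math.isclose(estimate, dequantized, rel_tol=1e-9, abs_tol=1e-12))
print(abs(estimate - true_dot))
```

The first print checks that the expanded form equals the dot product of the dequantized vectors. The second shows the remaining quantization error against the original floats.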

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-02 Thread via GitHub
naveentatikonda commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2197796395 @benwtrent Can you please help me understand the following: 1. In terms of quantization, are we doing any extra processing for 4 and 7 bits when compared to 8 bits ? I

Re: [PR] test: kuromoji [lucene]

2024-07-02 Thread via GitHub
github-actions[bot] commented on PR #13485: URL: https://github.com/apache/lucene/pull/13485#issuecomment-2197791716 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-07-02 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2194731622 Thanks a lot for suggestions @jpountz and @mikemccand. As suggested above, we worked on a POC to explore using separate IndexWriter for different groups. Each IndexWriter

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-02 Thread via GitHub
mikemccand commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2194673340 OK I managed to run `knnPerfTest.py` from `luceneutil`, using `mpnet` vectors (768 dims) and I think I am also seeing horrific performance for `int8` but OK for `int4` and `int7`

Re: [I] Query matching difference in Lucene 2 and Lucene 4 [lucene]

2024-07-02 Thread via GitHub
jpountz closed issue #13522: Query matching difference in Lucene 2 and Lucene 4 URL: https://github.com/apache/lucene/issues/13522

Re: [I] Query matching difference in Lucene 2 and Lucene 4 [lucene]

2024-07-02 Thread via GitHub
jpountz commented on issue #13522: URL: https://github.com/apache/lucene/issues/13522#issuecomment-2191601800 Your phrase query has `salle` immediately followed by `manger`, while your indexed document has `à` in-between, so the phrase doesn't match. You can get the original behavior back b
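
To illustrate the explanation above, here is a minimal sketch of exact phrase matching over token positions. It is a plain Python model, not the Lucene API: the function names and the three-token document are assumptions for this example. An exact phrase matches only when its terms sit at consecutive positions, so a token between `salle` and `manger` breaks the match.

```python
def build_positions(tokens):
    """Map each term to the list of positions where it occurs."""
    index = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok, []).append(pos)
    return index


def exact_phrase_match(doc_tokens, phrase_terms):
    """Return True if phrase_terms occur at consecutive positions in doc_tokens."""
    if not phrase_terms:
        return False
    index = build_positions(doc_tokens)
    # Try every occurrence of the first term as a candidate start position.
    return any(
        all(start + offset in index.get(term, [])
            for offset, term in enumerate(phrase_terms))
        for start in index.get(phrase_terms[0], [])
    )


# Assumed tokenization of the indexed text, with `à` kept as its own token.
doc = ["salle", "à", "manger"]

print(exact_phrase_match(doc, ["salle", "manger"]))       # False: `à` sits in between
print(exact_phrase_match(doc, ["salle", "à", "manger"]))  # True: positions are consecutive
```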