Re: [PR] Remove redundant code in PointInSetQuery [lucene]

2024-10-15 Thread via GitHub
easyice merged PR #13905: URL: https://github.com/apache/lucene/pull/13905 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Remove redundant code in PointInSetQuery [lucene]

2024-10-15 Thread via GitHub
easyice commented on PR #13905: URL: https://github.com/apache/lucene/pull/13905#issuecomment-2415601472 Thanks for your review Adrien.

Re: [PR] Inter-segment I/O concurrency. [lucene]

2024-10-15 Thread via GitHub
github-actions[bot] commented on PR #13509: URL: https://github.com/apache/lucene/pull/13509#issuecomment-2415386902 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Use `instanceof` pattern-matching where possible [lucene]

2024-10-15 Thread via GitHub
github-actions[bot] commented on PR #12295: URL: https://github.com/apache/lucene/pull/12295#issuecomment-2415387815 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add changelog verifier [lucene]

2024-10-15 Thread via GitHub
stefanvodita commented on PR #13909: URL: https://github.com/apache/lucene/pull/13909#issuecomment-2415309799 Right now this would show up like a separate automated check on PRs, which would fail if the changelog is untouched and we don't have the `Skip-Changelog` label. We can override it
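
The rule described in this comment (fail when the changelog is untouched and the `Skip-Changelog` label is absent) can be sketched as a small predicate. This is only an illustration of the described behaviour, not the verifier added by the PR; the class name `ChangelogCheckSketch` and the changelog path `lucene/CHANGES.txt` are assumptions made for the example.

```java
import java.util.Set;

// Hypothetical sketch of the pass/fail rule; not the actual GitHub check.
class ChangelogCheckSketch {
  // Assumed changelog location, used here for illustration only.
  static final String CHANGELOG_PATH = "lucene/CHANGES.txt";
  // Label name as quoted in the comment above.
  static final String SKIP_LABEL = "Skip-Changelog";

  /** Returns true when the check would pass for the given changed files and PR labels. */
  public static boolean passes(Set<String> changedFiles, Set<String> labels) {
    // The label overrides the check; otherwise the changelog must be touched.
    return labels.contains(SKIP_LABEL) || changedFiles.contains(CHANGELOG_PATH);
  }

  public static void main(String[] args) {
    Set<String> codeOnly = Set.of("lucene/core/src/java/Foo.java");
    System.out.println(passes(codeOnly, Set.of()));           // changelog untouched, no label
    System.out.println(passes(codeOnly, Set.of(SKIP_LABEL))); // label overrides the check
  }
}
```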

Re: [PR] Introduce multiSelect for ScalarQuantizer [lucene]

2024-10-15 Thread via GitHub
HoustonPutman commented on PR #13919: URL: https://github.com/apache/lucene/pull/13919#issuecomment-2415275109 So after fixing the innocuous bugs in the implementations, it looks like there is no speed up here. The confidence interval finding can be up to 30% faster or so, but that's such a

[PR] Introduce multiSelect for ScalarQuantizer [lucene]

2024-10-15 Thread via GitHub
HoustonPutman opened a new pull request, #13919: URL: https://github.com/apache/lucene/pull/13919 Resolves #13918 ### Description This introduces a `multiSelect(from, to, k[])` method on the `Selector` abstract class, and gives implementations of the method for both `Selector`
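
For readers unfamiliar with multi-select, the general technique can be sketched as a quickselect that keeps recursing only into partitions that still contain a requested rank. This is a minimal illustration on an `int[]`, not the PR's `Selector` implementation; the class `MultiSelectSketch` and its signature are hypothetical, and `ks` is assumed to be sorted ascending and within `[from, to)`.

```java
// Hypothetical sketch of multi-select via quickselect; not Lucene's Selector API.
class MultiSelectSketch {

  /**
   * Rearranges a[from, to) so that every index listed in ks (sorted ascending)
   * holds the value it would hold if the range were fully sorted.
   */
  public static void multiSelect(int[] a, int from, int to, int[] ks) {
    multiSelect(a, from, to, ks, 0, ks.length);
  }

  private static void multiSelect(int[] a, int from, int to, int[] ks, int kFrom, int kTo) {
    // Nothing to do when no requested rank falls in this range,
    // or when the range has at most one element.
    if (kFrom >= kTo || to - from <= 1) {
      return;
    }
    // Lomuto partition around the last element of the range.
    int pivot = a[to - 1];
    int store = from;
    for (int i = from; i < to - 1; i++) {
      if (a[i] < pivot) {
        swap(a, i, store++);
      }
    }
    swap(a, store, to - 1); // the pivot is now at its final sorted position

    // Split the requested ranks into those left and right of the pivot.
    int mid = kFrom;
    while (mid < kTo && ks[mid] < store) {
      mid++;
    }
    int right = mid;
    while (right < kTo && ks[right] == store) {
      right++; // ranks equal to the pivot position are already satisfied
    }
    multiSelect(a, from, store, ks, kFrom, mid);
    multiSelect(a, store + 1, to, ks, right, kTo);
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }

  public static void main(String[] args) {
    int[] a = {9, 1, 8, 2, 7, 3, 6, 4, 5, 0};
    int[] ks = {2, 5, 8};
    multiSelect(a, 0, a.length, ks);
    System.out.println(a[2] + " " + a[5] + " " + a[8]); // prints "2 5 8"
  }
}
```

The potential saving over calling a single-rank select once per `k` is that partitioning work near the top of the recursion is shared between the ranks. A production version would also need a better pivot choice, since a last-element pivot degrades on sorted input.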

[I] Speed up ScalarQuantization by selecting quantiles together [lucene]

2024-10-15 Thread via GitHub
HoustonPutman opened a new issue, #13918: URL: https://github.com/apache/lucene/issues/13918 ### Description Currently in `ScalarQuantizer`, `ScalarQuantizer.fromVectorsAutoInterval()` will issue 4 calls (per scratch-batch, basically `len(vector)/20`) to `Selector.select()` and `Scal

Re: [PR] Avoid slicing memory segments unnecessarily [lucene]

2024-10-15 Thread via GitHub
original-brownbear merged PR #13906: URL: https://github.com/apache/lucene/pull/13906

Re: [PR] Avoid slicing memory segments unnecessarily [lucene]

2024-10-15 Thread via GitHub
original-brownbear commented on PR #13906: URL: https://github.com/apache/lucene/pull/13906#issuecomment-2414656848 Thanks @jpountz and @uschindler. I reverted the `randomAccessSlice` thing now, special cased `clone` to not go through slice as asked. Tests we seem to have for this, doing

Re: [PR] Avoid slicing memory segments unnecessarily [lucene]

2024-10-15 Thread via GitHub
original-brownbear commented on code in PR #13906: URL: https://github.com/apache/lucene/pull/13906#discussion_r1801661683 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -563,6 +563,13 @@ public final MemorySegmentIndexInput slice(String s

Re: [I] Add an S3-based directory. [lucene]

2024-10-15 Thread via GitHub
msfroh commented on issue #13868: URL: https://github.com/apache/lucene/issues/13868#issuecomment-2414502715 I've been thinking about this for a bit. In addition to an S3-based directory, I believe there could be some benefit from defining an S3 (or other object store) codec inspired by Par

Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-10-15 Thread via GitHub
bugmakerr commented on PR #13763: URL: https://github.com/apache/lucene/pull/13763#issuecomment-2414387606 > Would you like to make this PR up-to-date and open a new one for the other change (a cherry-pick isn't clean due to other changes)? @jpountz sure, I'd be happy to do this.

Re: [I] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-15 Thread via GitHub
mikemccand commented on issue #13913: URL: https://github.com/apache/lucene/issues/13913#issuecomment-2414271372 There was also some quick discussion about multi-tenant (multiple active `IndexWriter`s on a single JVM) RAM buffers earlier at https://github.com/apache/lucene/issues/13387.

Re: [PR] Early reset scratchBytes in Lucene90BlockTreeTermsWriter.compileIndex. [lucene]

2024-10-15 Thread via GitHub
vsop-479 commented on PR #13915: URL: https://github.com/apache/lucene/pull/13915#issuecomment-2414188828 Maybe it helps save memory because the reset can happen earlier? I am not sure.

Re: [PR] Skip madvise calls on tiny inner files of compound files. [lucene]

2024-10-15 Thread via GitHub
jpountz commented on PR #13917: URL: https://github.com/apache/lucene/pull/13917#issuecomment-2414057167 Thanks Mike!

Re: [PR] Skip madvise calls on tiny inner files of compound files. [lucene]

2024-10-15 Thread via GitHub
jpountz merged PR #13917: URL: https://github.com/apache/lucene/pull/13917

Re: [PR] Better handle dynamic pruning when the leading clause has a single impact block. [lucene]

2024-10-15 Thread via GitHub
jpountz merged PR #13904: URL: https://github.com/apache/lucene/pull/13904

Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-10-15 Thread via GitHub
jpountz commented on PR #13763: URL: https://github.com/apache/lucene/pull/13763#issuecomment-2414051812 Would you like to make this PR up-to-date and open a new one for the other change (a cherry-pick isn't clean due to other changes)?

Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-10-15 Thread via GitHub
jpountz commented on PR #13763: URL: https://github.com/apache/lucene/pull/13763#issuecomment-2414048312 Yes, let's add this in 10.1.

Re: [PR] Use growNoCopy and setInt in Util#toIntsRef. [lucene]

2024-10-15 Thread via GitHub
jpountz merged PR #13889: URL: https://github.com/apache/lucene/pull/13889

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2024-10-15 Thread via GitHub
mikemccand commented on PR #13914: URL: https://github.com/apache/lucene/pull/13914#issuecomment-2414026528 I have not looked closely but this sounds very cool!!

Re: [I] Make dynamic range facets value collection and sorting faster [lucene]

2024-10-15 Thread via GitHub
mikemccand commented on issue #13760: URL: https://github.com/apache/lucene/issues/13760#issuecomment-2414003870 Learned Sort looks amazing -- @josefschiefer27 maybe open a dedicated spinoff issue to see if there are other places where it could help Lucene? Lucene does a lot of sorting ...

Re: [PR] Early reset scratchBytes in Lucene90BlockTreeTermsWriter.compileIndex. [lucene]

2024-10-15 Thread via GitHub
mikemccand commented on PR #13915: URL: https://github.com/apache/lucene/pull/13915#issuecomment-2413980018 Hello again @vsop-479 -- can you please explain what this change is trying to accomplish? Thanks.

Re: [PR] Align TestGenerateBwcIndices.java with AddBackcompatindices.py [lucene]

2024-10-15 Thread via GitHub
mikemccand commented on PR #13911: URL: https://github.com/apache/lucene/pull/13911#issuecomment-2413926802 Thanks @javanna.

Re: [PR] Forward port "Fix 9.12.0 backcompat break" to main [lucene]

2024-10-15 Thread via GitHub
mikemccand commented on PR #13912: URL: https://github.com/apache/lucene/pull/13912#issuecomment-2413923001 Woops, thank you @javanna ... I had the sinking sensation that in all the flurry of cherry-picks to different branches I missed something!

Re: [PR] Skip madvise calls on tiny inner files of compound files. [lucene]

2024-10-15 Thread via GitHub
mikemccand commented on PR #13917: URL: https://github.com/apache/lucene/pull/13917#issuecomment-2413915340 Does our madvise impl otherwise do the right thing for compound files? I.e. just madvises the right slice of bytes?

Re: [I] Significant performance regression with search after [lucene]

2024-10-15 Thread via GitHub
mikemccand commented on issue #13856: URL: https://github.com/apache/lucene/issues/13856#issuecomment-2413911330 > The luceneutil benchmarks don't show the issue that we see in the Elasticsearch benchmarks, as it seems that there is no coverage for _search after_. Also, the issue we see is

Re: [PR] Dry up EverythingEnum and BlockDocsEnum in Lucene912PostingsReader [lucene]

2024-10-15 Thread via GitHub
jpountz commented on PR #13901: URL: https://github.com/apache/lucene/pull/13901#issuecomment-2413910417 `CountOrHighHigh` had its best QPS ever, which is likely due to this change. https://benchmarks.mikemccandless.com/CountOrHighHigh.html I pushed an annotation.

Re: [PR] Make MaxScoreBulkScorer repartition scorers when the min competitive increases. [lucene]

2024-10-15 Thread via GitHub
jpountz merged PR #13800: URL: https://github.com/apache/lucene/pull/13800

Re: [I] Significant performance regression with search after [lucene]

2024-10-15 Thread via GitHub
javanna commented on issue #13856: URL: https://github.com/apache/lucene/issues/13856#issuecomment-2413878571 I am kind of anxious that we resolved this regression for the 10.0 release, but we don't have a clear plan for branch_10x and main. Shall we forward port the revert to main and bran

Re: [I] Address backward compat test issues after upgrading main to Lucene 11 [lucene]

2024-10-15 Thread via GitHub
javanna commented on issue #13847: URL: https://github.com/apache/lucene/issues/13847#issuecomment-2413868725 Heads up: I pushed 10.0.0 backwards indices to main, and re-enabled backwards compatibility tests for indices >= 10.0, see https://github.com/apache/lucene/commit/352d85cbe4d3e64e3a

Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]

2024-10-15 Thread via GitHub
expani commented on code in PR #13521: URL: https://github.com/apache/lucene/pull/13521#discussion_r1801081957 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/DocIdEncodingBenchmark.java: ## @@ -0,0 +1,404 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]

2024-10-15 Thread via GitHub
expani commented on code in PR #13521: URL: https://github.com/apache/lucene/pull/13521#discussion_r1801053720 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/DocIdEncodingBenchmark.java: ## @@ -0,0 +1,404 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]

2024-10-15 Thread via GitHub
expani commented on code in PR #13521: URL: https://github.com/apache/lucene/pull/13521#discussion_r1801052715 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/DocIdEncodingBenchmark.java: ## @@ -0,0 +1,404 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Only call madvise when necessary. [lucene]

2024-10-15 Thread via GitHub
jpountz merged PR #13907: URL: https://github.com/apache/lucene/pull/13907

Re: [PR] Only call madvise when necessary. [lucene]

2024-10-15 Thread via GitHub
jpountz commented on code in PR #13907: URL: https://github.com/apache/lucene/pull/13907#discussion_r1801048564 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -567,7 +567,8 @@ public final MemorySegmentIndexInput slice(String sliceDescript

Re: [PR] Only call madvise when necessary. [lucene]

2024-10-15 Thread via GitHub
uschindler commented on code in PR #13907: URL: https://github.com/apache/lucene/pull/13907#discussion_r1801030063 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -567,7 +567,8 @@ public final MemorySegmentIndexInput slice(String sliceDescr

Re: [I] Add an S3-based directory. [lucene]

2024-10-15 Thread via GitHub
jpountz commented on issue #13868: URL: https://github.com/apache/lucene/issues/13868#issuecomment-2413717480 No special requirements, you may just need to adjust formatting (running `./gradlew tidy`) and make sure it conforms with other requirements that are checked by the build, like forb

Re: [PR] Align TestGenerateBwcIndices.java with AddBackcompatindices.py [lucene]

2024-10-15 Thread via GitHub
javanna merged PR #13911: URL: https://github.com/apache/lucene/pull/13911

Re: [PR] Forward port "Fix 9.12.0 backcompat break" to main [lucene]

2024-10-15 Thread via GitHub
javanna merged PR #13912: URL: https://github.com/apache/lucene/pull/13912

Re: [I] Add an S3-based directory. [lucene]

2024-10-15 Thread via GitHub
albogdano commented on issue #13868: URL: https://github.com/apache/lucene/issues/13868#issuecomment-2413606817 Yes, of course! Are there any requirements for the PR? It would be a fairly large chunk of code for a single PR and I'm not sure if that's allowed. Should I just add the code to a

Re: [PR] Fix 9.12.0 backcompat break (Lucene 9.12.0 cannot read 9.11.x indices written with quantized HNSW, `Lucene99HnswScalarQuantizedVectorsFormat`) [lucene]

2024-10-15 Thread via GitHub
ChrisHegarty commented on PR #13874: URL: https://github.com/apache/lucene/pull/13874#issuecomment-2413575785 > * I think that we need to forward port this change to main as well? Yes, it should be forward ported. > * We need to update the tooling to generate the backwards indic

Re: [PR] Only call madvise when necessary. [lucene]

2024-10-15 Thread via GitHub
jpountz commented on PR #13907: URL: https://github.com/apache/lucene/pull/13907#issuecomment-2413513756 > I was just stumbling on the javadocs that it's not "legal" This comment refers to the compound file, not the inner file. I just tried to make it clearer. > Possibly we coul

Re: [PR] Only call madvise when necessary. [lucene]

2024-10-15 Thread via GitHub
uschindler commented on PR #13907: URL: https://github.com/apache/lucene/pull/13907#issuecomment-2413504761 > > The latter check is still there, so maybe we can clean this up, too. > > OK I'll do this. You basically restored the original state. Thanks. Possibly we could still i

Re: [PR] Add changelog verifier [lucene]

2024-10-15 Thread via GitHub
jpountz commented on PR #13909: URL: https://github.com/apache/lucene/pull/13909#issuecomment-2413498513 Is it possible to disable this check based on files that are touched (e.g. test-only fixes don't require a changelog entry) and based on the presence of a label (e.g. mechanical refactor

Re: [PR] Only call madvise when necessary. [lucene]

2024-10-15 Thread via GitHub
uschindler commented on PR #13907: URL: https://github.com/apache/lucene/pull/13907#issuecomment-2413496158 > If we want to not call `madvise` when the advice is `NORMAL`, then the compound file should be open with NORMAL so that we can also skip the `madvise` call on the inner file when th

Re: [PR] Only call madvise when necessary. [lucene]

2024-10-15 Thread via GitHub
jpountz commented on PR #13907: URL: https://github.com/apache/lucene/pull/13907#issuecomment-2413484055 > The latter check is still there, so maybe we can clean this up, too. OK I'll do this. > why should madvise only be called when read advice is NORMAL If we want to n

Re: [I] Add an S3-based directory. [lucene]

2024-10-15 Thread via GitHub
jpountz commented on issue #13868: URL: https://github.com/apache/lucene/issues/13868#issuecomment-2413470470 Sounds good. Would you like to work on the PR?

[PR] Indicate frontier init length in Lucene90BlockTreeTermsWriter#compileIndex. [lucene]

2024-10-15 Thread via GitHub
vsop-479 opened a new pull request, #13916: URL: https://github.com/apache/lucene/pull/13916 ### Description I think it could avoid wasting or resizing `FSTCompiler.frontier` when building a leaf block's FST, which has no `block.subIndices`.

Re: [PR] Only call madvise when necessary. [lucene]

2024-10-15 Thread via GitHub
uschindler commented on PR #13907: URL: https://github.com/apache/lucene/pull/13907#issuecomment-2413307576 In the original code, when I added madvise for the first time, the mapping function between `ReadAdvice` and the platform constant had an exclusion: a NULL mapping for NORMAL. The la
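
The idea of an empty mapping for NORMAL, so that no native call is issued for it, can be sketched roughly as follows. This is not Lucene's code: the enum `Advice`, the class `MadviseMappingSketch`, and the numeric constants are illustrative assumptions, and the native call is replaced by a callback so the sketch runs anywhere.

```java
import java.util.OptionalInt;
import java.util.function.IntConsumer;

// Hypothetical sketch of "no mapping for NORMAL means no madvise call".
class MadviseMappingSketch {
  enum Advice { NORMAL, RANDOM, SEQUENTIAL }

  /** Maps an advice to a platform constant; NORMAL deliberately has no mapping. */
  public static OptionalInt toPlatformConstant(Advice advice) {
    return switch (advice) {
      case NORMAL -> OptionalInt.empty();     // default kernel behaviour, nothing to advise
      case RANDOM -> OptionalInt.of(1);       // illustrative value only
      case SEQUENTIAL -> OptionalInt.of(2);   // illustrative value only
    };
  }

  /** Invokes the (stand-in) native call only when a mapping exists; returns whether it was called. */
  public static boolean adviseIfNeeded(Advice advice, IntConsumer nativeCall) {
    OptionalInt constant = toPlatformConstant(advice);
    if (constant.isEmpty()) {
      return false; // skip the call entirely for NORMAL
    }
    nativeCall.accept(constant.getAsInt());
    return true;
  }

  public static void main(String[] args) {
    for (Advice advice : Advice.values()) {
      boolean called = adviseIfNeeded(advice, c -> System.out.println("advise(" + c + ")"));
      System.out.println(advice + " -> called=" + called);
    }
  }
}
```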

Re: [I] Add an S3-based directory. [lucene]

2024-10-15 Thread via GitHub
jpountz commented on issue #13868: URL: https://github.com/apache/lucene/issues/13868#issuecomment-2413269880 I'm thinking of a PR that would create a new `lucene/directory/s3` module where we'd check in the code. > proof of concept What is your gut feeling: should we rather st

Re: [I] Add an S3-based directory. [lucene]

2024-10-15 Thread via GitHub
albogdano commented on issue #13868: URL: https://github.com/apache/lucene/issues/13868#issuecomment-2413243843 @jpountz Yes! How can I help you guys? My knowledge of Lucene internals is quite limited and the goal of the `lucene-s3directory` was mainly to be a proof of concept.

[PR] Early reset scratchBytes in Lucene90BlockTreeTermsWriter.compileIndex. [lucene]

2024-10-15 Thread via GitHub
vsop-479 opened a new pull request, #13915: URL: https://github.com/apache/lucene/pull/13915 ### Description

Re: [I] Add an S3-based directory. [lucene]

2024-10-15 Thread via GitHub
jpountz commented on issue #13868: URL: https://github.com/apache/lucene/issues/13868#issuecomment-2413100360 @albogdano I'm curious if you have any interest in contributing your https://github.com/albogdano/lucene-s3directory? @shubhamvishu @atris Thanks for volunteering to help! I'