Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2435611867 Yes, exactly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [I] Performance difference between files getting opened with IOContext.RANDOM vs IOContext.READ during merges [lucene]

2024-10-25 Thread via GitHub
shatejas commented on issue #13920: URL: https://github.com/apache/lucene/issues/13920#issuecomment-2435944343 > @shatejas I think all the required details are present, so are you going to raise a PR for this? Yeah I am working on it, I have the changes and I am trying to figure out

Re: [I] Check ahead of time if the `count` can be obtained [lucene]

2024-10-25 Thread via GitHub
LuXugang closed issue #13890: Check ahead of time if the `count` can be obtained URL: https://github.com/apache/lucene/issues/13890

Re: [PR] Check ahead if we can get the count [lucene]

2024-10-25 Thread via GitHub
LuXugang merged PR #13899: URL: https://github.com/apache/lucene/pull/13899

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-10-25 Thread via GitHub
yugushihuang commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2435780436 We have measured performance using [knnPerfTest.py](https://github.com/mikemccand/luceneutil/blob/main/src/python/knnPerfTest.py) in lucene util with this PR [commit](https://githu

Re: [PR] Check ahead if we can get the count [lucene]

2024-10-25 Thread via GitHub
jpountz commented on code in PR #13899: URL: https://github.com/apache/lucene/pull/13899#discussion_r1815247300 ## lucene/core/src/java/org/apache/lucene/search/IndexSortSortedNumericDocValuesRangeQuery.java: ## @@ -186,10 +186,44 @@ public boolean isCacheable(LeafReaderContext

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-25 Thread via GitHub
mikemccand commented on code in PR #13950: URL: https://github.com/apache/lucene/pull/13950#discussion_r1814888763 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -87,6 +87,28 @@ public Builder add(BooleanClause clause) { return this; } +

Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2024-10-25 Thread via GitHub
iverase commented on code in PR #13948: URL: https://github.com/apache/lucene/pull/13948#discussion_r1816322441 ## lucene/core/src/java/org/apache/lucene/store/RandomAccessInputDataInput.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2024-10-25 Thread via GitHub
iverase commented on code in PR #13948: URL: https://github.com/apache/lucene/pull/13948#discussion_r1816321990 ## lucene/core/src/java/org/apache/lucene/index/BinaryDocValues.java: ## @@ -33,4 +34,15 @@ protected BinaryDocValues() {} * @return binary value */ public

Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2024-10-25 Thread via GitHub
iverase commented on code in PR #13948: URL: https://github.com/apache/lucene/pull/13948#discussion_r1816323891 ## lucene/core/src/java/org/apache/lucene/store/RandomAccessInputDataInput.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-25 Thread via GitHub
ljak commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2435609721 Ha, I see. Could we say that the new `List orderedQueries` would have the same behavior that `Query[] disjuncts` before https://github.com/apache/lucene/pull/110/files ? If yes, I presume i

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-25 Thread via GitHub
msokolov commented on code in PR #13872: URL: https://github.com/apache/lucene/pull/13872#discussion_r1816770842 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorerSupplier.java: ## @@ -112,20 +96,20 @@ static final class Cosi

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-25 Thread via GitHub
benwtrent commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437820707 I think a "merging scorer" would be good. The only place the "scorer supplier" is used is during graph building. My initial concern with a "mutable scorer" is that it would also

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-25 Thread via GitHub
ChrisHegarty commented on code in PR #13872: URL: https://github.com/apache/lucene/pull/13872#discussion_r1816687000 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorerSupplier.java: ## @@ -112,20 +96,20 @@ static final class

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-25 Thread via GitHub
msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437836935 Yes, OK I now see quite a bit of this is a "preexisting condition" and maybe not exacerbated by this change. We are still creating more scratch arrays than we did before though, I think

Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13948: URL: https://github.com/apache/lucene/pull/13948#issuecomment-2437732473 In my experience, binary doc values are more often used to encode structured data, such as maps that help build scoring signals, geo shapes, etc. than actual binary content, so this chan

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-25 Thread via GitHub
msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437752945 > Can you clarify which allocation is the problematic one, and where it's done on the indexing path? See Ben's comments from ~2 weeks ago where he calls out the problem of overal

Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-25 Thread via GitHub
ljak commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2438170606 Done. Thanks for reviewing!

[PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
jpountz opened a new pull request, #13958: URL: https://github.com/apache/lucene/pull/13958 PR #13692 tried to speed up advancing by using branchless binary search, but while this yielded a speedup on my machine, this yielded a slowdown on nightly benchmarks. This PR tries a differen
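For readers unfamiliar with the term, a "branchless binary search" over a sorted block of doc IDs can be sketched roughly as below. This is an illustrative sketch only, not the code from PR #13692 or #13958: the class and method names are hypothetical, and whether the JIT actually compiles the ternary into a conditional move (rather than a branch) is an assumption that depends on the JVM and hardware.

```java
// Hypothetical illustration; names are not taken from Lucene.
final class BranchlessSearchSketch {

  /**
   * Returns the index of the first element >= target in sorted[from, to),
   * or `to` if every element is smaller than target.
   */
  static int firstIndexAtLeast(int[] sorted, int from, int to, int target) {
    int base = from;
    int len = to - from;
    // The loop runs a number of times that depends only on the range length,
    // not on the data, so the only data-dependent decision is the select below.
    while (len > 1) {
      int half = len >>> 1;
      // A select that a JIT may turn into a conditional move (assumption).
      base = sorted[base + half - 1] < target ? base + half : base;
      len -= half;
    }
    // Resolve the last remaining candidate, if any.
    if (len == 1 && sorted[base] < target) {
      base++;
    }
    return base;
  }

  public static void main(String[] args) {
    int[] docs = {2, 5, 9, 14, 20, 31, 40, 57};
    System.out.println(firstIndexAtLeast(docs, 0, docs.length, 15));
  }
}
```

The fixed iteration count is the point of the design: the loop's control flow is predictable, which is what such a rewrite hopes will help, although the message above reports that the benefit did not hold on nightly benchmarks.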

Re: [I] Absolutely horrible Lucene performance with JDK 23 (Lucene 9.11.1 and 10.0.0) [lucene]

2024-10-25 Thread via GitHub
derreisende77 commented on issue #13959: URL: https://github.com/apache/lucene/issues/13959#issuecomment-2438658215 I made some tests with Ubuntu 24.10: JDK 23: 9.9 seconds JDK 22: 1.4 seconds

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-25 Thread via GitHub
jpountz commented on code in PR #13950: URL: https://github.com/apache/lucene/pull/13950#discussion_r1815173658 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -136,20 +158,20 @@ public List clauses() { } /** Return the collection of queries for

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437740226 Can you clarify which allocation is the problematic one, and where it's done on the indexing path?

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-25 Thread via GitHub
ChrisHegarty commented on code in PR #13872: URL: https://github.com/apache/lucene/pull/13872#discussion_r1816669062 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorerSupplier.java: ## @@ -112,20 +96,20 @@ static final class

Re: [PR] Make DirectMonotonicReader.Meta more compact [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13864: URL: https://github.com/apache/lucene/pull/13864#issuecomment-2437765755 Sorry, I don't feel good about relying on `paddingBitsNeeded` on the read path. I suggest we close this PR, IMO the better fix would be to change the way we store terms dictionaries to r

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-25 Thread via GitHub
ChrisHegarty commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437761782 > that we instead have a mutable Scorer that can accept a new target vector. Yes, that is something that I've noodled on for a while now too - a scorer that accepts two ords,

Re: [PR] Make DirectMonotonicReader.Meta more compact [lucene]

2024-10-25 Thread via GitHub
original-brownbear closed pull request #13864: Make DirectMonotonicReader.Meta more compact URL: https://github.com/apache/lucene/pull/13864

Re: [PR] Make DirectMonotonicReader.Meta more compact [lucene]

2024-10-25 Thread via GitHub
original-brownbear commented on PR #13864: URL: https://github.com/apache/lucene/pull/13864#issuecomment-2437848161 yea that's cool sorry forgot about this one, we for starters just store the offsets in a more compact form that'll help already. I'll open a PR once I find a little time :)

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-25 Thread via GitHub
msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437853233 Maybe we could add a `RandomVectorScorer.setTarget(int node)` method that would only be implemented by the Scorers returned from ScorerSuppliers?
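To make the `setTarget(int node)` suggestion concrete, here is a minimal sketch of a scorer whose target can be changed in place, so that one scorer object is reused across targets instead of a new one being created each time. Everything in it is hypothetical: the `OrdScorer` interface, the `dotProductScorer` factory, and the in-memory `float[][]` storage are assumptions made for illustration and are not Lucene's `RandomVectorScorer` API.

```java
// Hypothetical illustration; none of these names are Lucene's actual API.
final class RetargetableScorerSketch {

  /** A scorer that compares stored vectors, addressed by ordinal, to a mutable target. */
  interface OrdScorer {
    /** Re-points the scorer at another stored vector without allocating. */
    void setTarget(int node);

    /** Scores the stored vector at `node` against the current target. */
    float score(int node);
  }

  /** Builds a dot-product scorer over in-memory vectors (assumed storage). */
  static OrdScorer dotProductScorer(float[][] vectors) {
    return new OrdScorer() {
      private int target = 0; // defaults to ordinal 0 until setTarget is called

      @Override
      public void setTarget(int node) {
        target = node;
      }

      @Override
      public float score(int node) {
        float[] a = vectors[target];
        float[] b = vectors[node];
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
          sum += a[i] * b[i];
        }
        return sum;
      }
    };
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0f, 1f}, {1f, 1f}};
    OrdScorer scorer = dotProductScorer(vectors);
    scorer.setTarget(2);
    System.out.println(scorer.score(0));
  }
}
```

The trade-off raised elsewhere in this thread still applies: a mutable scorer is stateful, so a sketch like this is only safe when each scorer instance is confined to a single thread.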

Re: [PR] Remove some useless code in TopScoreDocCollector. [lucene]

2024-10-25 Thread via GitHub
jpountz merged PR #13955: URL: https://github.com/apache/lucene/pull/13955

Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]

2024-10-25 Thread via GitHub
jpountz merged PR #13950: URL: https://github.com/apache/lucene/pull/13950

Re: [PR] Add MIGRATE entry about the fact that readVLong() may now read negative values, and up to 10 bytes. [lucene]

2024-10-25 Thread via GitHub
jpountz merged PR #13956: URL: https://github.com/apache/lucene/pull/13956

Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2437549776 Can you add an entry to `lucene/CHANGES.txt` under version 10.1.0? Then I'll merge.

Re: [PR] Ensure doc order for TestCommonTermsQuery#testMinShouldMatch [lucene]

2024-10-25 Thread via GitHub
benwtrent merged PR #13953: URL: https://github.com/apache/lucene/pull/13953

Re: [I] TestCommonTermsQuery.testMinShouldMatch test failure [lucene]

2024-10-25 Thread via GitHub
benwtrent closed issue #13946: TestCommonTermsQuery.testMinShouldMatch test failure URL: https://github.com/apache/lucene/issues/13946

[I] Absolutely horrible Lucene performance with JDK 23 (Lucene 9.11.1 and 10.0.0) [lucene]

2024-10-25 Thread via GitHub
derreisende77 opened a new issue, #13959: URL: https://github.com/apache/lucene/issues/13959 ### Description I have been using Lucene in my app happily for several years with JDKs up to 22. My use case searches through film data and Lucene can return fairly huge result sets to my app - wh

Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]

2024-10-25 Thread via GitHub
jpountz merged PR #13944: URL: https://github.com/apache/lucene/pull/13944

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-10-25 Thread via GitHub
goankur commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1817245059 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java: ## @@ -146,6 +146,7 @@ public float getScoreCorrectionConstant(int tar

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-10-25 Thread via GitHub
benwtrent commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2438671673 Hey @vigyasharma there is a lot of good work here. I am going to shift my focus and see about how I can help here more fully. What are the next steps? I am guessing handl

Re: [I] Absolutely horrible Lucene performance with JDK 23 (Lucene 9.11.1 and 10.0.0) [lucene]

2024-10-25 Thread via GitHub
benwtrent commented on issue #13959: URL: https://github.com/apache/lucene/issues/13959#issuecomment-2438673337 @derreisende77 do you have profiling of the two different runs? Maybe through async-profiler? It would be interesting to see where the time is being spent.

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-10-25 Thread via GitHub
goankur commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1817385236 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -84,6 +91,76 @@ public void init() { floatsA[i] = random.nextFl

Re: [I] Absolutely horrible Lucene performance with JDK 23 (Lucene 9.11.1 and 10.0.0) [lucene]

2024-10-25 Thread via GitHub
derreisende77 commented on issue #13959: URL: https://github.com/apache/lucene/issues/13959#issuecomment-2438907870 @benwtrent I have JProfiler but I am not really experienced in using it - or profiling at all. I made two runs on macOS and made screenshots from the hotspot page. JD

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438911637 And I seem to be getting a better speedup by using `trueCount()` instead of `firstTrue()`: ``` Task QPS baseline StdDev QPS my_modified_version
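The two mask operations named above can give the same answer on a sorted block, which is presumably why one can be swapped for the other. The sketch below emulates that with plain scalar loops instead of the `jdk.incubator.vector` API, so it runs without extra JVM flags. It is an assumption about how the operations might be applied to advancing, not the PR's actual code, and all names in it are hypothetical. On sorted input, the number of lanes smaller than the target (what a `trueCount()` over a "lane < target" mask would return) equals the index of the first lane that is not smaller (what a `firstTrue()` over a "lane >= target" mask would return, with the lane count standing in for "no lane set").

```java
// Hypothetical scalar emulation; not the PR's code and not the Vector API itself.
final class MaskCountSketch {

  /** Emulates trueCount() on the mask "lane < target": counts the smaller lanes. */
  static int viaTrueCount(int[] sortedLanes, int target) {
    int count = 0;
    for (int v : sortedLanes) {
      count += (v < target) ? 1 : 0;
    }
    return count;
  }

  /** Emulates firstTrue() on the mask "lane >= target"; returns the lane count if none is set. */
  static int viaFirstTrue(int[] sortedLanes, int target) {
    for (int i = 0; i < sortedLanes.length; i++) {
      if (sortedLanes[i] >= target) {
        return i;
      }
    }
    return sortedLanes.length;
  }

  public static void main(String[] args) {
    int[] lanes = {3, 8, 8, 12, 19, 25, 30, 44};
    System.out.println(viaTrueCount(lanes, 12) + " " + viaFirstTrue(lanes, 12));
  }
}
```

The equivalence only holds because the lanes are sorted; on unsorted input the two functions can disagree.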

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438925486 you are using VectorMask, only use this where implemented in HW (AVX-512 and ARM SVE).

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438919587 I ran this PR on my Mac laptop (M3), where this gives a massive slowdown, I imagine because some of the vector operations I'm using are emulated. I need to find what to check against in

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438947715 For these uses of vectormask you are ok with AVX2 (so just use existing FAST_INTEGER_VECTORS check): https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1597-L160

Re: [PR] Early reset scratchBytes in Lucene90BlockTreeTermsWriter.compileIndex. [lucene]

2024-10-25 Thread via GitHub
vsop-479 commented on PR #13915: URL: https://github.com/apache/lucene/pull/13915#issuecomment-2437267763 I will close it, since it is insignificant.

Re: [PR] Early reset scratchBytes in Lucene90BlockTreeTermsWriter.compileIndex. [lucene]

2024-10-25 Thread via GitHub
vsop-479 closed pull request #13915: Early reset scratchBytes in Lucene90BlockTreeTermsWriter.compileIndex. URL: https://github.com/apache/lucene/pull/13915

[PR] Remove LeafSimScorer abstraction. [lucene]

2024-10-25 Thread via GitHub
jpountz opened a new pull request, #13957: URL: https://github.com/apache/lucene/pull/13957 `LeafSimScorer` is a specialization of a `SimScorer` for a given segment. It doesn't add much value, but benchmarks suggest that it adds measurable overhead to queries sorted by score. Here is

Re: [PR] Disable exchanging minimum scores across slices for exhaustive evaluation. [lucene]

2024-10-25 Thread via GitHub
jpountz merged PR #13954: URL: https://github.com/apache/lucene/pull/13954

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13951: URL: https://github.com/apache/lucene/pull/13951#issuecomment-2437616406 > I couldn't think of a clean way to integrate the two... but I'll give it some more thought For what it's worth, these classes are package-private, so we can feel free to change

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2024-10-25 Thread via GitHub
HoustonPutman commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1810967900 ## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ## @@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) { * is used t

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438973598 maybe it's a bug that it doesn't work on your mac either, because elsewhere they have code that looks like it is supposed to be doing this stuff: https://github.com/openjdk/jdk/blob/f1a9a8d2

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-10-25 Thread via GitHub
goankur commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1817415010 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ## @@ -84,6 +91,76 @@ public void init() { floatsA[i] = random.nextFl

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438737799 Specializing `ImpactsDISI#nextDoc()` helped get rid of the slowdown: ``` Task QPS baseline StdDev QPS my_modified_version StdDev

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-25 Thread via GitHub
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438944785 https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L280-L283

Re: [PR] Simplify leaf slice calculation [lucene]

2024-10-25 Thread via GitHub
github-actions[bot] commented on PR #13893: URL: https://github.com/apache/lucene/pull/13893#issuecomment-2439076438 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Optimize slice calculation in IndexSearcher a little [lucene]

2024-10-25 Thread via GitHub
github-actions[bot] commented on PR #13860: URL: https://github.com/apache/lucene/pull/13860#issuecomment-2439076472 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Reduce allocations in BKDReaderDocIDSetIterator [lucene]

2024-10-25 Thread via GitHub
github-actions[bot] commented on PR #13888: URL: https://github.com/apache/lucene/pull/13888#issuecomment-2439076449 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi