Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2290359914 As one more thought, all I can think is that these impacted tasks are possibly taking the `leafSlices.length == 0` condition branch and avoiding the need to create the collectors list a

Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2290353375 I'll also mention that I experimented with inlining the private search method to avoid having to pass leafSlices and collectors as arguments to the private method and I see the same eff

[PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-14 Thread via GitHub
gsmiller opened a new pull request, #13657: URL: https://github.com/apache/lucene/pull/13657 After merging #13568 @mikemccand and @jpountz [noticed](https://github.com/apache/lucene/pull/13568#issuecomment-2288666763) some surprising nightly benchmark performance regressions. @epotyom and I

Re: [PR] Release memory for cancelled tasks earlier in TaskExecutor [lucene]

2024-08-14 Thread via GitHub
github-actions[bot] commented on PR #13609: URL: https://github.com/apache/lucene/pull/13609#issuecomment-2290122158 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Compare the missing value with the top value even after the hit queue is full [lucene]

2024-08-14 Thread via GitHub
gsmiller merged PR #13644: URL: https://github.com/apache/lucene/pull/13644 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.ap

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
uschindler merged PR #13649: URL: https://github.com/apache/lucene/pull/13649 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [I] `gradlew eclipse` no longer works [lucene]

2024-08-14 Thread via GitHub
uschindler closed issue #13638: `gradlew eclipse` no longer works URL: https://github.com/apache/lucene/issues/13638 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscr

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
uschindler commented on PR #13649: URL: https://github.com/apache/lucene/pull/13649#issuecomment-2289976761 Works all fine, merging now. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
uschindler commented on code in PR #13649: URL: https://github.com/apache/lucene/pull/13649#discussion_r1717587581 ## gradle/validation/dependencies.gradle: ## @@ -75,6 +75,22 @@ configure(rootProject) { it.dependsOn(":versionCatalogFormatDeps") } + // correct crlf/ d

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-08-14 Thread via GitHub
navneet1v commented on PR #13469: URL: https://github.com/apache/lucene/pull/13469#issuecomment-2289797047 @msokolov and @benwtrent I see that we exposed the FlatVectorsFormat but I am not seeing entry for this format in this file: https://github.com/apache/lucene/blob/main/lucene/core/src/

Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]

2024-08-14 Thread via GitHub
rmuir commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2289745259 sorry @goankur i'm super-behind, vacations and end of summer. i definitely want to try out the VNNI path but just haven't found the cycles yet. -- This is an automated message from the A

Re: [PR] gh-13653: Call finish() from HnswGraphBuilder.build() [lucene]

2024-08-14 Thread via GitHub
msokolov merged PR #13655: URL: https://github.com/apache/lucene/pull/13655 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.ap

Re: [PR] gh-13653: Call finish() from HnswGraphBuilder.build() [lucene]

2024-08-14 Thread via GitHub
msokolov commented on PR #13655: URL: https://github.com/apache/lucene/pull/13655#issuecomment-2289624603 > I think this is sane. Though it is used by incremental merger still, which is still a viable and good way to merge graphs :) Oh! well, I have no excuse then :) Thanks for lo

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
uschindler commented on PR #13649: URL: https://github.com/apache/lucene/pull/13649#issuecomment-2289610751 Let me try it out later and merge it. Not at 💻. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
uschindler commented on code in PR #13649: URL: https://github.com/apache/lucene/pull/13649#discussion_r1717404931 ## gradle/validation/dependencies.gradle: ## @@ -75,6 +75,22 @@ configure(rootProject) { it.dependsOn(":versionCatalogFormatDeps") } + // correct crlf/ d

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
uschindler commented on code in PR #13649: URL: https://github.com/apache/lucene/pull/13649#discussion_r1717405562 ## gradle/validation/dependencies.gradle: ## @@ -75,6 +75,22 @@ configure(rootProject) { it.dependsOn(":versionCatalogFormatDeps") } + // correct crlf/ d

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
uschindler commented on code in PR #13649: URL: https://github.com/apache/lucene/pull/13649#discussion_r1717404931 ## gradle/validation/dependencies.gradle: ## @@ -75,6 +75,22 @@ configure(rootProject) { it.dependsOn(":versionCatalogFormatDeps") } + // correct crlf/ d

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
dweiss commented on PR #13649: URL: https://github.com/apache/lucene/pull/13649#issuecomment-2289582426 I think this is ready to merge, if you agree, click on merge any time, Uwe. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
dweiss commented on code in PR #13649: URL: https://github.com/apache/lucene/pull/13649#discussion_r1717397089 ## gradle/validation/dependencies.gradle: ## @@ -75,6 +75,22 @@ configure(rootProject) { it.dependsOn(":versionCatalogFormatDeps") } + // correct crlf/ defau

Re: [PR] Compare the missing value with the top value even after the hit queue is full [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13644: URL: https://github.com/apache/lucene/pull/13644#issuecomment-2289362834 Sounds right to me. Thanks @bugmakerr ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2289348515 OK I pushed a small change (#13656) to `IndexSearcher` that reverts the "common" (i.e., non-drill-sideways) code-path back to what it was before this faceting change. It appears to reve

Re: [PR] Revert changes to IndexSearcher brought in by GH#13568 [lucene]

2024-08-14 Thread via GitHub
gsmiller merged PR #13656: URL: https://github.com/apache/lucene/pull/13656 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.ap

Re: [PR] Revert changes to IndexSearcher brought in by GH#13568 [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13656: URL: https://github.com/apache/lucene/pull/13656#issuecomment-2289336382 Because this change is small (just undoing a small bit of #13568 in IndexSearcher) and appears to correct the performance regressions the nightly benchmarks demonstrated, I'm going to m

Re: [PR] Revert changes to IndexSearcher brought in by GH#13568 [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13656: URL: https://github.com/apache/lucene/pull/13656#issuecomment-2289330332 Also, benchmarks (wikimedium10m) comparing this PR to the tip of main seems to indicate it reverses the performance impact we observed in the nightly runs: ```

Re: [PR] Revert changes to IndexSearcher brought in by GH#13568 [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13656: URL: https://github.com/apache/lucene/pull/13656#issuecomment-2289311663 Benchmarks (`wikimedium10m`) comparing this change to `9b481f76f77b40a5061fb3b70763e4b4a06d31d0` (last commit on main before #13568) show no performance regression, which is what we're

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2289299245 Thanks @epotyom ! I think I've isolated the change that's causing the regression and I've put up a PR (#13656) to temporarily fix it. It just reverts the IndexSearcher changes back to w

[PR] nocommit: Revert changes to IndexSearcher brought in by GH#13568 [lucene]

2024-08-14 Thread via GitHub
gsmiller opened a new pull request, #13656: URL: https://github.com/apache/lucene/pull/13656 Note that this results in some silly code duplication so I don't think we ought to merge this as-is. This is more to show the performance regression in isolation. -- This is an automated message

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
epotyom commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2289274913 @gsmiller oh you are right, it was wikimedium10**k**. Results for `wikimediumall` confirm that this PR causes the regression. ``` TaskQPS baseline

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2289239609 @epotyom just to confirm, are you using `wikimedium10k` or `wikimedium10m`? As another check, I reverted out the changes to IndexSearcher and DrillSideways/DrillSidewaysQuery (so

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
epotyom commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2289194440 @gsmiller my results for `wikimedium10k` are quite different, but it might be that something is wrong with my setup, double-checking; I'm also running tests for `wikimediumall` but they

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2289194124 After looking through the change for possible ways it could impact non-faceting tasks, the only thing that jumped out to me was the synchronization of the collector list in `CollectorOw

Re: [PR] Compare the missing value with the top value even after the hit queue is full [lucene]

2024-08-14 Thread via GitHub
bugmakerr commented on PR #13644: URL: https://github.com/apache/lucene/pull/13644#issuecomment-2289159552 @gsmiller I add the entry under 9.12.0 Optimizations, please correct me if I'm wrong. -- This is an automated message from the Apache Git Service. To respond to the message, plea

[PR] gh-13653: Call finish() from HnswGraphBuilder.build() [lucene]

2024-08-14 Thread via GitHub
msokolov opened a new pull request, #13655: URL: https://github.com/apache/lucene/pull/13655 This utility method is no longer used by the indexing chain, only by some unit tests, so we missed tracking the change in behavior when merging. Maybe we ought to change the unit test instead? But i

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2289109976 I was already setup to run benchmarks so I did a quick run with `wikimedium10m` that isolated this change. The results are a bit surprising. @epotyom I'll be curious if you reproduce si

Re: [I] What does the Lucene community think about dimensionality reduction for vectors, and should it be something the library does internally (at merge time perhaps)? [lucene]

2024-08-14 Thread via GitHub
mikemccand commented on issue #13403: URL: https://github.com/apache/lucene/issues/13403#issuecomment-2289068792 I was reading up on PCA and uncovered [this interesting enumeration of the challenges/prerequisites for PCA](https://ekamperi.github.io/mathematics/2021/02/23/pca-limitations.htm

Re: [PR] Compare the missing value with the top value even after the hit queue is full [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13644: URL: https://github.com/apache/lucene/pull/13644#issuecomment-2288982237 @bugmakerr could you add an entry in CHANGES.txt before we merge please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] Compare the missing value with the top value even after the hit queue is full [lucene]

2024-08-14 Thread via GitHub
gsmiller commented on PR #13644: URL: https://github.com/apache/lucene/pull/13644#issuecomment-2288953441 Looks good. Thanks @bugmakerr! (Looks like the check task failed in a transient way... out of disk space? I ran the checks locally just to make sure they pass and they do, as expect

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
epotyom commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2288910722 @mikemccand , @jpountz thanks for heads up! I'm running local benchmarks now to check if it is related to this change. -- This is an automated message from the Apache Git Service. To r

Re: [I] TestHnswFloatVectorGraph.testReadWrite fails on branch 9x [lucene]

2024-08-14 Thread via GitHub
msokolov commented on issue #13653: URL: https://github.com/apache/lucene/issues/13653#issuecomment-2288831420 I'm suspecting this may have something to do with the connectComponents changes - I'll dig -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
mikemccand commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2288769876 [Term](https://home.apache.org/~mikemccand/lucenebench/Term.html) as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
jpountz commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2288723938 [SloppyPhrase](https://home.apache.org/~mikemccand/lucenebench/SloppyPhrase.html) and [SpanNear](https://home.apache.org/~mikemccand/lucenebench/SpanNear.html) had a noticeable slowdown

Re: [PR] Compute facets while collecting [lucene]

2024-08-14 Thread via GitHub
mikemccand commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2288666763 Hmm is it possible the `CollectorOwner` change here somehow led to [this surprising nightly benchmark performance drop](https://home.apache.org/~mikemccand/lucenebench/CombinedHighMed

Re: [PR] Call HnswGraphBuilder.getCompletedGraph() in 94/95 back-compat writers [lucene]

2024-08-14 Thread via GitHub
msokolov merged PR #13654: URL: https://github.com/apache/lucene/pull/13654 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.ap

Re: [I] testMergeStability failing for Knn formats [lucene]

2024-08-14 Thread via GitHub
msokolov commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2288628863 No it's not `HnswGraphBuilder` *per se* but the `HnswConcurrentMergeBuilder` that calls `finish()` and thence `connectComponents`. Since it's part of the o.a.l.u.hnsw package it's

Re: [I] testMergeStability failing for Knn formats [lucene]

2024-08-14 Thread via GitHub
msokolov commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2288618203 See https://github.com/apache/lucene/pull/13654/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[PR] Call HnswGraphBuilder.getCompletedGraph() in 94/95 back-compat writers [lucene]

2024-08-14 Thread via GitHub
msokolov opened a new pull request, #13654: URL: https://github.com/apache/lucene/pull/13654 another in an ongoing series of patches to address merge stability test failures. Here we tweak the back-compat Lucene 94/95 writers so they will call the new connectComponents() method (via `getCom

Re: [I] testMergeStability failing for Knn formats [lucene]

2024-08-14 Thread via GitHub
benwtrent commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2288559628 Not to be the bearer of bad news: ``` ./gradlew test --tests TestLucene95HnswVectorsFormat.testMergeStability -Dtests.seed=3D48DB65BA75CD03 -Dtests.locale=uz-Latn-UZ -Dtest

[I] TestHnswFloatVectorGraph.testReadWrite fails on branch 9x [lucene]

2024-08-14 Thread via GitHub
benwtrent opened a new issue, #13653: URL: https://github.com/apache/lucene/issues/13653 ### Description TestHnswFloatVectorGraph.testReadWrite fails due to a mismatched number of connections. `java.lang.AssertionError: arcs differ for node 3 expected:<[2, 52, 6]> but was:<[2

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-14 Thread via GitHub
ChrisHegarty commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2288426886 Apologies for not interacting on this PR before, but I was away. Thanks for fixing the type of the length parameter in MemorySegmentIndexInput. I'll spend some time studying the cod

Re: [I] `gradlew eclipse` no longer works [lucene]

2024-08-14 Thread via GitHub
uschindler commented on issue #13638: URL: https://github.com/apache/lucene/issues/13638#issuecomment-2288112074 I have the git version shipped with TortoiseGit (rather new one, can check later). Basically the warning was not the most annoying thing, it was more that the file was rep

Re: [PR] Fix eclipse ide settings generation [lucene]

2024-08-14 Thread via GitHub
uschindler commented on code in PR #13649: URL: https://github.com/apache/lucene/pull/13649#discussion_r1716483653 ## gradle/validation/dependencies.gradle: ## @@ -75,6 +75,22 @@ configure(rootProject) { it.dependsOn(":versionCatalogFormatDeps") } + // correct crlf/ d

Re: [I] `gradlew eclipse` no longer works [lucene]

2024-08-14 Thread via GitHub
dweiss commented on issue #13638: URL: https://github.com/apache/lucene/issues/13638#issuecomment-2288034567 I compared main with 8f50976c268. When you list the plugins applied to the root project, it'll give you this: ``` class org.gradle.api.plugins.HelpTasksPlugin$Inject class or

Re: [I] `gradlew eclipse` no longer works [lucene]

2024-08-14 Thread via GitHub
dweiss commented on issue #13638: URL: https://github.com/apache/lucene/issues/13638#issuecomment-2288013013 It's the plugin that formats version catalog - versionCatalogFormatDeps task writes back the file with the default encoding/ platform line endings. I've added a post-processing step