[GitHub] [lucene] risdenk commented on pull request #12266: Capture build scans on ge.apache.org to benefit from deep build insights
risdenk commented on PR #12266: URL: https://github.com/apache/lucene/pull/12266#issuecomment-1545691179 Is it possible that all the build logic can be simplified like the other plugins and not be in `settings.gradle`? See `build.gradle` and `gradle/validation/*.gradle` files. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1545694199 > Use ArrayDeque instead of java.util.Stack (synchronized methods)? We should but this on forbiddenapis. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow
uschindler commented on code in PR #12286: URL: https://github.com/apache/lucene/pull/12286#discussion_r1192326625 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -39,6 +39,7 @@ import java.util.List; import java.util.Map; import java.util.Set; +import java.util.Stack; Review Comment: Use `Deque` interface with `ArrayDeque` as implementation instead, Stack is a relic (synchronized) from JDK 1.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent merged pull request #12197: [Backport] GITHUB-11838 Add api to allow concurrent query rewrite
benwtrent merged PR #12197: URL: https://github.com/apache/lucene/pull/12197 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow
tang-hi commented on code in PR #12286: URL: https://github.com/apache/lucene/pull/12286#discussion_r1192435347 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -1284,7 +1290,7 @@ public static int[] topoSortStates(Automaton a) { int numStates = a.getNumStates(); int[] states = new int[numStates]; final BitSet visited = new BitSet(numStates); Review Comment: yes, it will make code look more cleaner -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow
tang-hi commented on code in PR #12286: URL: https://github.com/apache/lucene/pull/12286#discussion_r1192435950 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -39,6 +39,7 @@ import java.util.List; import java.util.Map; import java.util.Set; +import java.util.Stack; Review Comment: fix it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1545825629 @rmuir I have added a unit test to evaluate the toposort functionality. I developed a method that generates a random automaton, and you can include a cycle by passing the hasCycle argument. This will allow us to test both toposort and cycle detection. @uschindler @dweiss Thank you for your advice. I have made the change from using Stack to Deque -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent opened a new pull request, #12288: Backport: Concurrent rewrite for KnnVectorQuery (#12160)
benwtrent opened a new pull request, #12288: URL: https://github.com/apache/lucene/pull/12288 Backport of #12160 --- ### Description Issue #11862 ### Solution `AbstractKnnVectorQuery` currently performs HNSW searches (one per-segment) iteratively Since this is done in a single thread, we can make it concurrent by spinning off per-segment searches to different threads (and make use of available processors) The actual search is performed in `Query#rewrite`, and support to allow concurrency there was added recently (#11838) by passing an `IndexSearcher` (which wraps an `IndexReader` and `Executor`) Proposing to achieve this by `CompletableFuture`: - If the `Executor` is not set, we can perform a blocking call to `CompletableFuture#completedFuture` - Else we submit the task of per-segment search to the `Executor` using `CompletableFuture#supplyAsync` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent merged pull request #12288: Backport: Concurrent rewrite for KnnVectorQuery (#12160)
benwtrent merged PR #12288: URL: https://github.com/apache/lucene/pull/12288 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on pull request #12288: Backport: Concurrent rewrite for KnnVectorQuery (#12160)
benwtrent commented on PR #12288: URL: https://github.com/apache/lucene/pull/12288#issuecomment-1545889373 //cc @zhaih && @kaivalnp -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent closed issue #11862: Concurrent rewrite for KnnVectorQuery
benwtrent closed issue #11862: Concurrent rewrite for KnnVectorQuery URL: https://github.com/apache/lucene/issues/11862 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on issue #11862: Concurrent rewrite for KnnVectorQuery
benwtrent commented on issue #11862: URL: https://github.com/apache/lucene/issues/11862#issuecomment-1545890266 Merged and backported for a future Lucene release -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent closed issue #11838: Adding concurrency to query rewrite?
benwtrent closed issue #11838: Adding concurrency to query rewrite? URL: https://github.com/apache/lucene/issues/11838 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on issue #11838: Adding concurrency to query rewrite?
benwtrent commented on issue #11838: URL: https://github.com/apache/lucene/issues/11838#issuecomment-1545891005 Merged and backported for a future Lucene release -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on pull request #12235: Optimize HNSW diversity calculation
benwtrent commented on PR #12235: URL: https://github.com/apache/lucene/pull/12235#issuecomment-1546025921 @zhaih I verified your change doesn't hurt recall. Is this change good to continue working on? I tried replicating your performance improvements, but did not see that big of a change. My search time latency might be drowning out the index build latency. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller opened a new pull request, #12289: FilterLeafReader$FilterTerms#intersect now delegates to wrapped Terms
gsmiller opened a new pull request, #12289: URL: https://github.com/apache/lucene/pull/12289 ### Description Right now, `FilterTerms#intersect` is picking up the default implementation from `Terms`, which hides any custom implementation provided by the wrapped `Terms` instance. This is unfortunate since some `Terms` implementations provide optimized intersection logic (e.g., BlockTree's `FieldReader#intersect`). We should instead delegate to the wrapped `Terms` instance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on pull request #12289: FilterLeafReader$FilterTerms#intersect now delegates to wrapped Terms
gsmiller commented on PR #12289: URL: https://github.com/apache/lucene/pull/12289#issuecomment-1546288958 Note that `testOverrideMethods` is failing right now. I can update this test, but I wonder if there's a good reason we wouldn't want to make the change proposed here? Seems like we should, but open to feedback. Maybe I'm missing something? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on issue #12276: Maybe rename `DaciukMihovAutomatonBuilder`?
gsmiller commented on issue #12276: URL: https://github.com/apache/lucene/issues/12276#issuecomment-1546298721 +1 to `StringsToAutomaton` language. I'd suggest keeping "Builder" in the name. In general, also +1 to renaming. The algorithm doesn't have to only be applied to strings (e.g., it could directly be applied to binary data, as long as it's sorted). So having something that references "string" in the name feels right to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gsmiller commented on issue #11547: IntersectIterators is not necessary under matchAll case in Facet [LUCENE-10511]
gsmiller commented on issue #11547: URL: https://github.com/apache/lucene/issues/11547#issuecomment-1546327008 @stefanvodita it's been a while since I looked at this, and @LuXugang had the idea originally so he's really best positioned to clarify, but since you haven't gotten a response yet, I'll attempt to describe what I _think_ this is capturing. In faceting, we have this pattern where we intersect the collected hits (from a `FacetsCollector`) with some doc values iterator. We tend to do this with `ConjunctionUtils#intersectIterators`. I think the observation is that, if every doc in a segment matched (and it being faceted on), we could just iterate the doc values directly without the intersection. We can know this with `FacetsCollector` since `MatchingDocs$totalHits` would equal the number of docs in the segment. I don't see a way to do this directly in `ConjunctionUtils#intersectIterators` since—as you point out—we can't generally know the number of hits an iterator will provide up-front. I guess we could have some new utility method that intersects a `MatchingDocs` instance with other iterators? This feels clunky to me though, and I don't like the idea of adding to a general utility for such a specific case. We could add some conditional logic in faceting implementations that optimizes for this case I suppose. Most faceting implementations already have a "facet on everything" type implementation that we could delegate to? To be honest, I'm not sure this is a particularly common use-case, or that we'll see meaningful performance from it, but we could benchmark and see? Anyway, this is my interpretation, but I may be missing something. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] stefanvodita commented on issue #11547: IntersectIterators is not necessary under matchAll case in Facet [LUCENE-10511]
stefanvodita commented on issue #11547: URL: https://github.com/apache/lucene/issues/11547#issuecomment-1546370839 Thanks @gsmiller! Your explanation makes perfect sense. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on pull request #12235: Optimize HNSW diversity calculation
zhaih commented on PR #12235: URL: https://github.com/apache/lucene/pull/12235#issuecomment-1546537128 @benwtrent I was originally waiting for https://github.com/apache/lucene/pull/12246 to be merged first so that this PR won't break their test, but seems that one is blocked somehow by other PR so I'll just fix the problem here. I think if this time all the checks passed I'll just merge this PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph
zhaih commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1546540855 > would be ok for you to try to get involved in the other contribution, trying to align your intent with the guy in there? I'll for sure trying to review that PR, but that one is trying to introduce a new data structure while this one is actually trying to add a quick workaround for `HnswGraph`'s API (which discourage thread-safe), there's no conflict from the beginning IMO. > is there any reason you believe the contribution from @zhaih to not be an improvement? In my understanding @jimczi is trying to say this change some what encourages the use of `OnHeapHnswGraph`, which originally is only used to build the graph in memory and shouldn't be used outside. I'm ok with that, but to be fair: **this PR is not trying to promote the use of that, it is trying to solve a potential issue in the current trunk.** And the real factor that encourages the use is: 1. It is a public class, even without `lucene.internal` tag 2. All related tools (Builder, Searcher) are public available 3. There's already an example of using it being recently added and no people are against it. I'm not sure what are all the solutions are, but stopping this PR won't solve any of the problems above, but will only make the thread-safety problem stays in the trunk longer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] kaivalnp commented on pull request #12288: Backport: Concurrent rewrite for KnnVectorQuery (#12160)
kaivalnp commented on PR #12288: URL: https://github.com/apache/lucene/pull/12288#issuecomment-1546547247 Thank you for following up on this @benwtrent! Much appreciated! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org