[GitHub] [lucene] risdenk commented on pull request #12266: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-12 Thread via GitHub


risdenk commented on PR #12266:
URL: https://github.com/apache/lucene/pull/12266#issuecomment-1545691179

   Is it possible that all the build logic can be simplified like the other 
plugins and not be in `settings.gradle`? See `build.gradle` and 
`gradle/validation/*.gradle` files.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-12 Thread via GitHub


uschindler commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1545694199

   > Use ArrayDeque instead of java.util.Stack (synchronized methods)?
   
   We should but this on forbiddenapis.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-12 Thread via GitHub


uschindler commented on code in PR #12286:
URL: https://github.com/apache/lucene/pull/12286#discussion_r1192326625


##
lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java:
##
@@ -39,6 +39,7 @@
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
+import java.util.Stack;

Review Comment:
   Use `Deque` interface with `ArrayDeque` as implementation instead, Stack is 
a relic (synchronized) from JDK 1.0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent merged pull request #12197: [Backport] GITHUB-11838 Add api to allow concurrent query rewrite

2023-05-12 Thread via GitHub


benwtrent merged PR #12197:
URL: https://github.com/apache/lucene/pull/12197


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-12 Thread via GitHub


tang-hi commented on code in PR #12286:
URL: https://github.com/apache/lucene/pull/12286#discussion_r1192435347


##
lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java:
##
@@ -1284,7 +1290,7 @@ public static int[] topoSortStates(Automaton a) {
 int numStates = a.getNumStates();
 int[] states = new int[numStates];
 final BitSet visited = new BitSet(numStates);

Review Comment:
   yes, it will make code look more cleaner



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-12 Thread via GitHub


tang-hi commented on code in PR #12286:
URL: https://github.com/apache/lucene/pull/12286#discussion_r1192435950


##
lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java:
##
@@ -39,6 +39,7 @@
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
+import java.util.Stack;

Review Comment:
   fix it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-12 Thread via GitHub


tang-hi commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1545825629

   @rmuir I have added a unit test to evaluate the toposort functionality. I 
developed a method that generates a random automaton, and you can include a 
cycle by passing the hasCycle argument. This will allow us to test both 
toposort and cycle detection.
   
   @uschindler @dweiss Thank you for your advice. I have made the change from 
using Stack to Deque


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent opened a new pull request, #12288: Backport: Concurrent rewrite for KnnVectorQuery (#12160)

2023-05-12 Thread via GitHub


benwtrent opened a new pull request, #12288:
URL: https://github.com/apache/lucene/pull/12288

   Backport of #12160 
   
   ---
   
   ### Description
   Issue #11862
   
   ### Solution
   `AbstractKnnVectorQuery` currently performs HNSW searches (one per-segment) 
iteratively
   Since this is done in a single thread, we can make it concurrent by spinning 
off per-segment searches to different threads (and make use of available 
processors)
   
   The actual search is performed in `Query#rewrite`, and support to allow 
concurrency there was added recently (#11838) by passing an `IndexSearcher` 
(which wraps an `IndexReader` and `Executor`)
   
   Proposing to achieve this by `CompletableFuture`:
   - If the `Executor` is not set, we can perform a blocking call to 
`CompletableFuture#completedFuture`
   - Else we submit the task of per-segment search to the `Executor` using 
`CompletableFuture#supplyAsync`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent merged pull request #12288: Backport: Concurrent rewrite for KnnVectorQuery (#12160)

2023-05-12 Thread via GitHub


benwtrent merged PR #12288:
URL: https://github.com/apache/lucene/pull/12288


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on pull request #12288: Backport: Concurrent rewrite for KnnVectorQuery (#12160)

2023-05-12 Thread via GitHub


benwtrent commented on PR #12288:
URL: https://github.com/apache/lucene/pull/12288#issuecomment-1545889373

   //cc @zhaih && @kaivalnp 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent closed issue #11862: Concurrent rewrite for KnnVectorQuery

2023-05-12 Thread via GitHub


benwtrent closed issue #11862: Concurrent rewrite for KnnVectorQuery
URL: https://github.com/apache/lucene/issues/11862


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on issue #11862: Concurrent rewrite for KnnVectorQuery

2023-05-12 Thread via GitHub


benwtrent commented on issue #11862:
URL: https://github.com/apache/lucene/issues/11862#issuecomment-1545890266

   Merged and backported for a future Lucene release


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent closed issue #11838: Adding concurrency to query rewrite?

2023-05-12 Thread via GitHub


benwtrent closed issue #11838: Adding concurrency to query rewrite?
URL: https://github.com/apache/lucene/issues/11838


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on issue #11838: Adding concurrency to query rewrite?

2023-05-12 Thread via GitHub


benwtrent commented on issue #11838:
URL: https://github.com/apache/lucene/issues/11838#issuecomment-1545891005

   Merged and backported for a future Lucene release


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on pull request #12235: Optimize HNSW diversity calculation

2023-05-12 Thread via GitHub


benwtrent commented on PR #12235:
URL: https://github.com/apache/lucene/pull/12235#issuecomment-1546025921

   @zhaih I verified your change doesn't hurt recall. Is this change good to 
continue working on? I tried replicating your performance improvements, but did 
not see that big of a change. My search time latency might be drowning out the 
index build latency.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller opened a new pull request, #12289: FilterLeafReader$FilterTerms#intersect now delegates to wrapped Terms

2023-05-12 Thread via GitHub


gsmiller opened a new pull request, #12289:
URL: https://github.com/apache/lucene/pull/12289

   ### Description
   
   Right now, `FilterTerms#intersect` is picking up the default implementation 
from `Terms`, which hides any custom implementation provided by the wrapped 
`Terms` instance. This is unfortunate since some `Terms` implementations 
provide optimized intersection logic (e.g., BlockTree's 
`FieldReader#intersect`). We should instead delegate to the wrapped `Terms` 
instance.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on pull request #12289: FilterLeafReader$FilterTerms#intersect now delegates to wrapped Terms

2023-05-12 Thread via GitHub


gsmiller commented on PR #12289:
URL: https://github.com/apache/lucene/pull/12289#issuecomment-1546288958

   Note that `testOverrideMethods` is failing right now. I can update this 
test, but I wonder if there's a good reason we wouldn't want to make the change 
proposed here? Seems like we should, but open to feedback. Maybe I'm missing 
something?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on issue #12276: Maybe rename `DaciukMihovAutomatonBuilder`?

2023-05-12 Thread via GitHub


gsmiller commented on issue #12276:
URL: https://github.com/apache/lucene/issues/12276#issuecomment-1546298721

   +1 to `StringsToAutomaton` language. I'd suggest keeping "Builder" in the 
name.
   
   In general, also +1 to renaming. The algorithm doesn't have to only be 
applied to strings (e.g., it could directly be applied to binary data, as long 
as it's sorted). So having something that references "string" in the name feels 
right to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on issue #11547: IntersectIterators is not necessary under matchAll case in Facet [LUCENE-10511]

2023-05-12 Thread via GitHub


gsmiller commented on issue #11547:
URL: https://github.com/apache/lucene/issues/11547#issuecomment-1546327008

   @stefanvodita it's been a while since I looked at this, and @LuXugang had 
the idea originally so he's really best positioned to clarify, but since you 
haven't gotten a response yet, I'll attempt to describe what I _think_ this is 
capturing.
   
   In faceting, we have this pattern where we intersect the collected hits 
(from a `FacetsCollector`) with some doc values iterator. We tend to do this 
with `ConjunctionUtils#intersectIterators`. I think the observation is that, if 
every doc in a segment matched (and it being faceted on), we could just iterate 
the doc values directly without the intersection. We can know this with 
`FacetsCollector` since `MatchingDocs$totalHits` would equal the number of docs 
in the segment.
   
   I don't see a way to do this directly in 
`ConjunctionUtils#intersectIterators` since—as you point out—we can't generally 
know the number of hits an iterator will provide up-front. I guess we could 
have some new utility method that intersects a `MatchingDocs` instance with 
other iterators? This feels clunky to me though, and I don't like the idea of 
adding to a general utility for such a specific case.
   
   We could add some conditional logic in faceting implementations that 
optimizes for this case I suppose. Most faceting implementations already have a 
"facet on everything" type implementation that we could delegate to? To be 
honest, I'm not sure this is a particularly common use-case, or that we'll see 
meaningful performance from it, but we could benchmark and see?
   
   Anyway, this is my interpretation, but I may be missing something.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] stefanvodita commented on issue #11547: IntersectIterators is not necessary under matchAll case in Facet [LUCENE-10511]

2023-05-12 Thread via GitHub


stefanvodita commented on issue #11547:
URL: https://github.com/apache/lucene/issues/11547#issuecomment-1546370839

   Thanks @gsmiller! Your explanation makes perfect sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on pull request #12235: Optimize HNSW diversity calculation

2023-05-12 Thread via GitHub


zhaih commented on PR #12235:
URL: https://github.com/apache/lucene/pull/12235#issuecomment-1546537128

   @benwtrent I was originally waiting for 
https://github.com/apache/lucene/pull/12246 to be merged first so that this PR 
won't break their test, but seems that one is blocked somehow by other PR so 
I'll just fix the problem here.
   
   I think if this time all the checks passed I'll just merge this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-12 Thread via GitHub


zhaih commented on PR #12257:
URL: https://github.com/apache/lucene/pull/12257#issuecomment-1546540855

   > would be ok for you to try to get involved in the other contribution, 
trying to align your intent with the guy in there?
   
   I'll for sure trying to review that PR, but that one is trying to introduce 
a new data structure while this one is actually trying to add a quick 
workaround for `HnswGraph`'s API (which discourage thread-safe), there's no 
conflict from the beginning IMO.
   
   > is there any reason you believe the contribution from @zhaih to not be an 
improvement?
   
   In my understanding @jimczi  is trying to say this change some what 
encourages the use of `OnHeapHnswGraph`, which originally is only used to build 
the graph in memory and shouldn't be used outside.
   
   I'm ok with that, but to be fair: **this PR is not trying to promote the use 
of that, it is trying to solve a potential issue in the current trunk.** And 
the real factor that encourages the use is:
   1. It is a public class, even without `lucene.internal` tag
   2. All related tools (Builder, Searcher) are public available
   3. There's already an example of using it being recently added and no people 
are against it.
   
   I'm not sure what are all the solutions are, but stopping this PR won't 
solve any of the problems above, but will only make the thread-safety problem 
stays in the trunk longer.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] kaivalnp commented on pull request #12288: Backport: Concurrent rewrite for KnnVectorQuery (#12160)

2023-05-12 Thread via GitHub


kaivalnp commented on PR #12288:
URL: https://github.com/apache/lucene/pull/12288#issuecomment-1546547247

   Thank you for following up on this @benwtrent! Much appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org