[GitHub] [lucene] alessandrobenedetti commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph
alessandrobenedetti commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1541925563 I still think this contribution is valuable, as I don't much like the fact that OnHeapGraph is stateful. But I agree that the other contribution solves a very similar problem. @zhaih, would it be OK for you to get involved in the other contribution and try to align your intent with its author? If you don't have the time, I'll try to find some by the end of the week, as it's a very interesting topic. As for preventing people from using OnHeapHnswGraph outside the builder, I suspect we'll need to restructure the code in some way, as simple comments won't stop people from using public classes. And by the way, I also agree that we can improve the Word2Vec synonym filter in the future, replacing the OnHeap approach with an OffHeap one, but rather than reverting it (and very likely losing the contribution) I do believe that incremental updates are the key here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] alessandrobenedetti commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph
alessandrobenedetti commented on PR #12257: URL: https://github.com/apache/lucene/pull/12257#issuecomment-1541938725 Another option, to be honest, is also to merge this first, and then the other committer will have to resolve the conflicts. This one is a very minimal change, so it doesn't seem to me like a massive problem for the other pull request to deal with.
[GitHub] [lucene] jbellis opened a new pull request, #12281: Improve error checking in similarity functions, and use double precision internally
jbellis opened a new pull request, #12281: URL: https://github.com/apache/lucene/pull/12281 Cosine of two equal vectors is exactly 1, but we're losing too much precision on large-dimension vectors and ending up with NaN. (Presumably this does bad things for vectors that are not exactly equal as well.) This PR adds error checking (if similarity results in NaN or infinity, try to show where it came from), adds a failing test, and updates the similarity functions to use double precision math internally. (They continue to return float; the signatures do not change.)
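The approach described in this PR can be illustrated with a minimal sketch. This is not the code from the PR: the class `CosineSketch` and its `cosine` method are hypothetical names, and the error message format is an assumption. It only shows the general pattern of accumulating in `double`, checking that the result is finite, and narrowing to `float` at the very end so that the signature stays the same.

```java
public final class CosineSketch {
  private CosineSketch() {}

  /** Cosine similarity computed with double accumulators; still returns a float. */
  public static float cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
      // Widen before multiplying so each product is computed in double.
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    double result = dot / Math.sqrt(normA * normB);
    // Error checking: report the intermediate values when the result is NaN or infinite
    // (this also rejects zero vectors, for which the cosine is undefined).
    if (!Double.isFinite(result)) {
      throw new IllegalArgumentException(
          "cosine is not finite: dot=" + dot + ", normA=" + normA + ", normB=" + normB);
    }
    return (float) result;
  }
}
```

As one illustrative failure mode (not necessarily the one the PR ran into): with components of `1e20f`, each squared term exceeds `Float.MAX_VALUE`, so float accumulators would overflow to infinity and the ratio would become NaN, while the double version above stays finite.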
[GitHub] [lucene] mkhludnev merged pull request #12245: `ToParentBlockJoinQuery` Explain Support Score Mode
mkhludnev merged PR #12245: URL: https://github.com/apache/lucene/pull/12245
[GitHub] [lucene] mkhludnev opened a new pull request, #12283: `ToParentBlockJoinQuery` Explain Support Score Mode (#12245)
mkhludnev opened a new pull request, #12283: URL: https://github.com/apache/lucene/pull/12283 `ToParentBlockJoinQuery` Explain Support Score Mode (#12245)
[GitHub] [lucene] mkhludnev commented on pull request #12283: `ToParentBlockJoinQuery` Explain Support Score Mode (#12245)
mkhludnev commented on PR #12283: URL: https://github.com/apache/lucene/pull/12283#issuecomment-1542489488 @MarcusSorealheis FYI
[GitHub] [lucene] mkhludnev merged pull request #12283: `ToParentBlockJoinQuery` Explain Support Score Mode (#12245)
mkhludnev merged PR #12283: URL: https://github.com/apache/lucene/pull/12283
[GitHub] [lucene] veita opened a new issue, #12284: input automaton is too large: 1001 in Operations.topoSortStatesRecurse(Operations.java:1357)
veita opened a new issue, #12284: URL: https://github.com/apache/lucene/issues/12284

### Description

The error below appears many times per day and quickly fills the Solr logs up to gigabytes in size. Unfortunately I cannot tell what kind of input causes the error. However, the issue seems to be similar to https://github.com/apache/lucene/issues/11809.

```
ERROR 2023-05-09T18:31:56,633Z - org.apache.solr.servlet.HttpSolrCall[qtp1441070244-500081] java.lang.IllegalArgumentException: input automaton is too large: 1001
java.lang.IllegalArgumentException: input automaton is too large: 1001
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1349) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    ... 990 identical lines
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.util.automaton.Operations.topoSortStates(Operations.java:1325) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.replaceSep(AnalyzingSuggester.java:278) ~[lucene-suggest-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.toAutomaton(AnalyzingSuggester.java:877) ~[lucene-suggest-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:417) ~[lucene-suggest-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:175) ~[lucene-suggest-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
    at org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:175) ~[solr-core-9.1.1.jar:9.1.1 d998e63978abfedde3b75bab4ba6e1e78ddb5944 - magibney - 2023-01-17 19:58:00]
    at org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:197) ~[solr-core-9.1.1.jar:9.1.1 d998e63978abfedde3b75bab4ba6e1e78ddb5944 - magibney - 2023-01-17 19:58:00]
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:384) ~[solr-core-9.1.1.jar:9.1.1 d998e63978abfedde3b75bab4ba6e1e78ddb5944 - magibney - 2023-01-17 19:58:00]
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:224) ~[solr-core-9.1.1.jar:9.1.1 d998e63978abfedde3b75bab4ba6e1e78ddb5944 - magibney - 2023-01-17 19:58:00]
    at org.apache.solr.core.SolrCore.e
```
[GitHub] [lucene] MarcusSorealheis commented on issue #12204: ToParentBlockJoinQuery's explain should depend on its ScoreMode
MarcusSorealheis commented on issue #12204: URL: https://github.com/apache/lucene/issues/12204#issuecomment-1542796091 @mkhludnev, @kashkambath, et al., this issue can be closed, as the PR has been merged and backported to `9_x`.
[GitHub] [lucene] javanna opened a new pull request, #12285: Simplify SliceExecutor and QueueSizeBasedExecutor
javanna opened a new pull request, #12285: URL: https://github.com/apache/lucene/pull/12285 The only behaviour that QueueSizeBasedExecutor overrides from SliceExecutor is when to execute on the caller thread. There is no need to override the whole invokeAll method for that. Instead, this commit introduces a shouldExecuteOnCallerThread method that can be overridden.
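The refactoring described here is essentially the template method pattern. The following is a rough sketch under stated assumptions, not the Lucene classes themselves: `SketchSliceExecutor` and `SketchQueueSizeBasedExecutor` are hypothetical simplified stand-ins, and both the hook's parameters and the default policy (run only the last task on the caller) are assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.IntSupplier;

/** Hypothetical, simplified base class: invokeAll is fixed, only the policy hook varies. */
class SketchSliceExecutor {
  private final Executor executor;

  SketchSliceExecutor(Executor executor) {
    this.executor = executor;
  }

  /** Runs every task; the hook decides which ones run on the calling thread. */
  final void invokeAll(List<? extends Runnable> tasks) {
    for (int i = 0; i < tasks.size(); i++) {
      Runnable task = tasks.get(i);
      if (shouldExecuteOnCallerThread(i, tasks.size())) {
        task.run();
      } else {
        executor.execute(task);
      }
    }
  }

  /** Default policy (an assumption of this sketch): only the last task runs on the caller. */
  boolean shouldExecuteOnCallerThread(int index, int numTasks) {
    return index == numTasks - 1;
  }
}

/** Hypothetical subclass: additionally runs on the caller when the queue is too full. */
class SketchQueueSizeBasedExecutor extends SketchSliceExecutor {
  private final IntSupplier queueSize; // reports the current length of the backing queue
  private final int limit;

  SketchQueueSizeBasedExecutor(Executor executor, IntSupplier queueSize, int limit) {
    super(executor);
    this.queueSize = queueSize;
    this.limit = limit;
  }

  @Override
  boolean shouldExecuteOnCallerThread(int index, int numTasks) {
    return super.shouldExecuteOnCallerThread(index, numTasks) || queueSize.getAsInt() >= limit;
  }
}
```

With this shape the subclass contains only the decision logic, and the task loop lives in one place.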
[GitHub] [lucene] tang-hi commented on issue #12284: input automaton is too large: 1001 in Operations.topoSortStatesRecurse(Operations.java:1357)
tang-hi commented on issue #12284: URL: https://github.com/apache/lucene/issues/12284#issuecomment-1543284928 It seems like the automaton has become too large and has exceeded the recursion limit. Perhaps we should consider changing topoSort to a non-recursive approach. I will work on fixing it when I have some time.
[GitHub] [lucene] tang-hi opened a new pull request, #12286: toposort use iterator to avoid stackoverflow
tang-hi opened a new pull request, #12286: URL: https://github.com/apache/lucene/pull/12286

### Description

In Issue #12284, I observed that Lucene limits the recursion level in the `topoSortStatesRecurse` method to avoid a StackOverflow error during automaton topological sorting. I propose we could use an iterative approach instead of recursion. I've implemented an iterative version as shown below.

    private static int topoSortStatesRecurse(Automaton a, BitSet visited, int[] states) {
      Stack<Integer> stack = new Stack<>();
      stack.push(0); // Assuming that the initial state is 0.
      int upto = 0;
      Transition t = new Transition();
      while (!stack.empty()) {
        int state = stack.pop();
        int count = a.initTransition(state, t);
        for (int i = 0; i < count; i++) {
          a.getNextTransition(t);
          if (!visited.get(t.dest)) {
            visited.set(t.dest);
            stack.push(t.dest);
          }
        }
        states[upto] = state;
        upto++;
      }
      return upto;
    }

However, I noticed that the test [TestAutomaton.java](https://github.com/apache/lucene/blob/963ed7ce888724c2dd55fff8c13a08b81b20f535/lucene/core/src/test/org/apache/lucene/util/automaton/TestAutomaton.java#LL1205C7-L1205C7) depends on the order in which recursive calls are made or items are added to and removed from the stack. To maintain the same order as the recursive version in the iterative approach, I've used a particular technique in the pull request. To further improve this change, here are my plans:

1. Remove the trick and modify the test code accordingly.
2. Perhaps we should check if the automaton contains cycles, and throw an `IllegalArgumentException` if it does?
3. Should we continue to limit the size of the Automaton?

I welcome any suggestions or feedback on this approach.
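One common way to make an explicit-stack traversal emit states in the same order as a recursive depth-first version is to keep, per stack frame, the index of the next outgoing edge to examine, and to emit a node only when all of its edges are exhausted. The sketch below illustrates that idea on a plain adjacency array; it is not the technique used in the pull request (which is not described here) and does not use the Lucene `Automaton` API. The class `IterativeTopoSortSketch` is a hypothetical name, and the code assumes the graph is acyclic with node 0 as the start.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

public final class IterativeTopoSortSketch {
  private IterativeTopoSortSketch() {}

  /**
   * Returns the nodes reachable from node 0 in a topological order.
   * adj[v] lists the successors of v. Assumes the graph has no cycles.
   */
  public static int[] topoSort(int[][] adj) {
    int n = adj.length;
    if (n == 0) {
      return new int[0];
    }
    int[] postOrder = new int[n];
    int upto = 0;
    BitSet visited = new BitSet(n);
    // Each frame is {node, index of the next outgoing edge to examine}.
    Deque<int[]> stack = new ArrayDeque<>();
    visited.set(0);
    stack.push(new int[] {0, 0});
    while (!stack.isEmpty()) {
      int[] frame = stack.peek();
      int node = frame[0];
      if (frame[1] < adj[node].length) {
        int dest = adj[node][frame[1]++];
        if (!visited.get(dest)) {
          visited.set(dest);
          stack.push(new int[] {dest, 0}); // corresponds to descending into a recursive call
        }
      } else {
        // All successors are done: emit the node, as a recursive version would on return.
        stack.pop();
        postOrder[upto++] = node;
      }
    }
    // The reverse post-order of a DAG is a topological order.
    int[] result = new int[upto];
    for (int i = 0; i < upto; i++) {
      result[i] = postOrder[upto - 1 - i];
    }
    return result;
  }
}
```

For the diamond-shaped graph `{{1, 2}, {3}, {3}, {}}` this yields `[0, 2, 1, 3]`. Because the traversal state lives on the heap, a chain of 100,000 nodes is handled without a `StackOverflowError`.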
[GitHub] [lucene] mkhludnev commented on issue #12204: ToParentBlockJoinQuery's explain should depend on its ScoreMode
mkhludnev commented on issue #12204: URL: https://github.com/apache/lucene/issues/12204#issuecomment-1543378416 should be at 9.7
[GitHub] [lucene] mkhludnev closed issue #12204: ToParentBlockJoinQuery's explain should depend on its ScoreMode
mkhludnev closed issue #12204: ToParentBlockJoinQuery's explain should depend on its ScoreMode URL: https://github.com/apache/lucene/issues/12204