[GitHub] [lucene] alessandrobenedetti commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-10 Thread via GitHub


alessandrobenedetti commented on PR #12257:
URL: https://github.com/apache/lucene/pull/12257#issuecomment-1541925563

   I still think this contribution is valuable, as I don't much like the fact that OnHeapHnswGraph is stateful.
   But I agree that the other contribution is solving a very similar problem.
   @zhaih, would it be ok for you to get involved in the other contribution and try to align your intent with its author?
   If you don't have the time, I'll try to find some by the end of the week, as it's a very interesting topic.
   
   Regarding preventing people from using OnHeapHnswGraph outside the builder, I suspect we'll need to restructure the code in some way, as comments alone won't stop future users from using public classes.
   By the way, I also agree that we can improve the Word2Vec synonym filter in the future by replacing the OnHeap approach with an OffHeap one, but rather than reverting it (and very likely losing the contribution), I believe incremental updates are the key here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] alessandrobenedetti commented on pull request #12257: Add multi-thread searchability to OnHeapHnswGraph

2023-05-10 Thread via GitHub


alessandrobenedetti commented on PR #12257:
URL: https://github.com/apache/lucene/pull/12257#issuecomment-1541938725

   To be honest, another option is to merge this one first and let the other committer resolve the conflicts.
   This is a very minimal change, so I don't think it would be a massive problem for the other pull request to deal with.





[GitHub] [lucene] jbellis opened a new pull request, #12281: Improve error checking in similarity functions, and use double precision internally

2023-05-10 Thread via GitHub


jbellis opened a new pull request, #12281:
URL: https://github.com/apache/lucene/pull/12281

   The cosine of two equal vectors is exactly 1, but we're losing too much precision on large-dimension vectors and ending up with NaN. (Presumably this also does bad things for vectors that are not exactly equal.)
   
   This PR adds error checking (if a similarity computation results in NaN or infinity, it tries to show where the bad value came from), adds a test that fails without the fix, and updates the similarity functions to use double-precision math internally. (They continue to return float; the signatures do not change.)
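A minimal sketch of the two ideas in this PR, double-precision accumulation behind a float-returning signature plus a NaN/infinity check that reports the intermediate values. The class and method names (`CosineSketch`, `cosineFloatAccumulation`, `cosine`) are hypothetical and this is not Lucene's `VectorUtil` code. The example input makes the float path fail through overflow (squares of 1e20f exceed the float range), which is a range problem rather than a precision one; it is simply the easiest way to make float accumulation yield NaN.

```java
/** Hypothetical helpers contrasting float and double accumulation; not Lucene's VectorUtil. */
final class CosineSketch {
  private CosineSketch() {}

  /** Accumulates in float, as a naive implementation might. */
  static float cosineFloatAccumulation(float[] a, float[] b) {
    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i]; // float * float: can overflow to Infinity
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (float) (dot / Math.sqrt((double) normA * normB));
  }

  /** Accumulates in double, checks the result, and still returns float. */
  static float cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i]; // widen before multiplying
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    double result = dot / Math.sqrt(normA * normB);
    if (Double.isNaN(result) || Double.isInfinite(result)) {
      // Report the intermediate values so the source of the bad result is visible.
      throw new IllegalArgumentException(
          "cosine similarity is " + result
              + " (dot=" + dot + ", normA=" + normA + ", normB=" + normB + ")");
    }
    return (float) result;
  }

  public static void main(String[] args) {
    float[] v = new float[1024];
    java.util.Arrays.fill(v, 1e20f);
    System.out.println(cosineFloatAccumulation(v, v)); // NaN: Infinity / Infinity
    System.out.println(cosine(v, v)); // finite, approximately 1
  }
}
```

Note that with this check an all-zero vector now throws instead of silently returning NaN, since 0 / sqrt(0) is NaN in double as well.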





[GitHub] [lucene] mkhludnev merged pull request #12245: `ToParentBlockJoinQuery` Explain Support Score Mode

2023-05-10 Thread via GitHub


mkhludnev merged PR #12245:
URL: https://github.com/apache/lucene/pull/12245





[GitHub] [lucene] mkhludnev opened a new pull request, #12283: `ToParentBlockJoinQuery` Explain Support Score Mode (#12245)

2023-05-10 Thread via GitHub


mkhludnev opened a new pull request, #12283:
URL: https://github.com/apache/lucene/pull/12283

   `ToParentBlockJoinQuery` Explain Support Score Mode (#12245)





[GitHub] [lucene] mkhludnev commented on pull request #12283: `ToParentBlockJoinQuery` Explain Support Score Mode (#12245)

2023-05-10 Thread via GitHub


mkhludnev commented on PR #12283:
URL: https://github.com/apache/lucene/pull/12283#issuecomment-1542489488

   @MarcusSorealheis FYI





[GitHub] [lucene] mkhludnev merged pull request #12283: `ToParentBlockJoinQuery` Explain Support Score Mode (#12245)

2023-05-10 Thread via GitHub


mkhludnev merged PR #12283:
URL: https://github.com/apache/lucene/pull/12283





[GitHub] [lucene] veita opened a new issue, #12284: input automaton is too large: 1001 in Operations.topoSortStatesRecurse(Operations.java:1357)

2023-05-10 Thread via GitHub


veita opened a new issue, #12284:
URL: https://github.com/apache/lucene/issues/12284

   ### Description
   
   The error below appears many times per day and quickly fills the Solr logs 
up to gigabytes in size.
   
   Unfortunately I cannot tell what kind of input causes the error. However, 
the issue seems to be similar to https://github.com/apache/lucene/issues/11809.
   
   ```
   ERROR 2023-05-09T18:31:56,633Z - org.apache.solr.servlet.HttpSolrCall[qtp1441070244-500081] java.lang.IllegalArgumentException: input automaton is too large: 1001
   java.lang.IllegalArgumentException: input automaton is too large: 1001
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1349) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       ... 990 identical lines
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1357) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.util.automaton.Operations.topoSortStates(Operations.java:1325) ~[lucene-core-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.replaceSep(AnalyzingSuggester.java:278) ~[lucene-suggest-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.toAutomaton(AnalyzingSuggester.java:877) ~[lucene-suggest-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:417) ~[lucene-suggest-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:175) ~[lucene-suggest-9.3.0.jar:9.3.0 d25cebcef7a80369f4dfb9285ca7360a810b75dc - ivera - 2022-07-25 12:30:23]
       at org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:175) ~[solr-core-9.1.1.jar:9.1.1 d998e63978abfedde3b75bab4ba6e1e78ddb5944 - magibney - 2023-01-17 19:58:00]
       at org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:197) ~[solr-core-9.1.1.jar:9.1.1 d998e63978abfedde3b75bab4ba6e1e78ddb5944 - magibney - 2023-01-17 19:58:00]
       at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:384) ~[solr-core-9.1.1.jar:9.1.1 d998e63978abfedde3b75bab4ba6e1e78ddb5944 - magibney - 2023-01-17 19:58:00]
       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:224) ~[solr-core-9.1.1.jar:9.1.1 d998e63978abfedde3b75bab4ba6e1e78ddb5944 - magibney - 2023-01-17 19:58:00]
       at org.apache.solr.core.SolrCore.e
   ```

[GitHub] [lucene] MarcusSorealheis commented on issue #12204: ToParentBlockJoinQuery's explain should depend on its ScoreMode

2023-05-10 Thread via GitHub


MarcusSorealheis commented on issue #12204:
URL: https://github.com/apache/lucene/issues/12204#issuecomment-1542796091

   @mkhludnev, @kashkambath, et al.: this issue can be closed, as the PR has been merged and backported to `9_x`.





[GitHub] [lucene] javanna opened a new pull request, #12285: Simplify SliceExecutor and QueueSizeBasedExecutor

2023-05-10 Thread via GitHub


javanna opened a new pull request, #12285:
URL: https://github.com/apache/lucene/pull/12285

   The only behaviour that QueueSizeBasedExecutor overrides from SliceExecutor 
is when to execute on the caller thread. There is no need to override the whole 
invokeAll method for that. Instead, this commit introduces a 
shouldExecuteOnCallerThread method that can be overridden.
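A simplified sketch of the refactoring described above: a `final` `invokeAll` template method that consults an overridable `shouldExecuteOnCallerThread` hook, so the subclass overrides only the policy. The class names (`SliceExecutorSketch`, `QueueSizeBasedExecutorSketch`), the hook's parameters `(index, numTasks)`, the default of running only the last task inline, and the queue-size condition are all assumptions for illustration, not Lucene's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;
import java.util.concurrent.ThreadPoolExecutor;

/** Hypothetical, simplified stand-in for SliceExecutor. */
class SliceExecutorSketch {
  private final Executor executor;

  SliceExecutorSketch(Executor executor) {
    this.executor = executor;
  }

  /** Template method: runs every task and waits for all of them; not overridden. */
  final void invokeAll(List<? extends Runnable> tasks) {
    List<FutureTask<Void>> futures = new ArrayList<>();
    for (int i = 0; i < tasks.size(); i++) {
      FutureTask<Void> future = new FutureTask<>(tasks.get(i), null);
      futures.add(future);
      if (shouldExecuteOnCallerThread(i, tasks.size())) {
        future.run(); // run inline on the calling thread
      } else {
        executor.execute(future);
      }
    }
    for (FutureTask<Void> future : futures) {
      try {
        future.get(); // wait, and surface any task failure
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      } catch (ExecutionException e) {
        throw new RuntimeException(e.getCause());
      }
    }
  }

  /** Hook with an assumed default: only the last task runs on the caller thread. */
  boolean shouldExecuteOnCallerThread(int index, int numTasks) {
    return index == numTasks - 1;
  }
}

/** Hypothetical stand-in for QueueSizeBasedExecutor: overrides only the hook. */
class QueueSizeBasedExecutorSketch extends SliceExecutorSketch {
  private final ThreadPoolExecutor pool;
  private final int maxQueueSize;

  QueueSizeBasedExecutorSketch(ThreadPoolExecutor pool, int maxQueueSize) {
    super(pool);
    this.pool = pool;
    this.maxQueueSize = maxQueueSize;
  }

  @Override
  boolean shouldExecuteOnCallerThread(int index, int numTasks) {
    // Also run inline when the pool's work queue is already backed up.
    return super.shouldExecuteOnCallerThread(index, numTasks)
        || pool.getQueue().size() >= maxQueueSize;
  }
}
```

With this split, submission, waiting, and error propagation live in one place, and a subclass can only change the decision of where each task runs.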
   
   





[GitHub] [lucene] tang-hi commented on issue #12284: input automaton is too large: 1001 in Operations.topoSortStatesRecurse(Operations.java:1357)

2023-05-10 Thread via GitHub


tang-hi commented on issue #12284:
URL: https://github.com/apache/lucene/issues/12284#issuecomment-1543284928

   It seems like the automaton has become too large and has exceeded the 
recursion limit. Perhaps we should consider changing topoSort to a 
non-recursive approach. I will work on fixing it when I have some time.





[GitHub] [lucene] tang-hi opened a new pull request, #12286: toposort use iterator to avoid stackoverflow

2023-05-10 Thread via GitHub


tang-hi opened a new pull request, #12286:
URL: https://github.com/apache/lucene/pull/12286

   ### Description
   
   In Issue #12284, I observed that Lucene limits the recursion level in the `topoSortStatesRecurse` method to avoid a `StackOverflowError` during topological sorting of an automaton. I propose using an iterative approach instead of recursion.
   I've implemented an iterative version as shown below.
   
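   ```java
   // Fragment of Operations; uses java.util.BitSet, java.util.Stack, Automaton and Transition.
   private static int topoSortStatesRecurse(Automaton a, BitSet visited, int[] states) {
     Stack<Integer> stack = new Stack<>();
     stack.push(0); // Assuming that the initial state is 0.
     int upto = 0;
     Transition t = new Transition();

     while (!stack.empty()) {
       int state = stack.pop();

       // Push every not-yet-visited destination of this state's transitions.
       int count = a.initTransition(state, t);
       for (int i = 0; i < count; i++) {
         a.getNextTransition(t);
         if (!visited.get(t.dest)) {
           visited.set(t.dest);
           stack.push(t.dest);
         }
       }
       // The state is recorded when it is popped, before its successors are processed.
       states[upto] = state;
       upto++;
     }
     return upto;
   }
   ```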
   
   However, I noticed that the test [TestAutomaton.java](https://github.com/apache/lucene/blob/963ed7ce888724c2dd55fff8c13a08b81b20f535/lucene/core/src/test/org/apache/lucene/util/automaton/TestAutomaton.java#LL1205C7-L1205C7) depends on the order in which recursive calls are made (or in which items are pushed onto and popped from the stack). To preserve the recursive version's order in the iterative approach, I've used a particular technique in the pull request.
   
   To further improve this change, here are my plans:
   1. Remove the trick and modify the test code accordingly.
   2. Perhaps we should check whether the automaton contains cycles, and throw an `IllegalArgumentException` if it does?
   3. Should we continue to limit the size of the Automaton?
   
   I welcome any suggestions or feedback on this approach.
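For comparison, a minimal sketch of an iterative depth-first topological sort that records each state only after all of its successors are finished (post-order) and then reverses that list; the sketch assumes this is what Lucene's recursive `topoSortStatesRecurse` does. Recording states in pop order instead can go wrong: on a diamond with edges 0->1, 0->2, 1->2, the snippet above records 0, 2, 1, and neither that order nor its reverse puts 0 first and 1 before 2. The class `IterativeTopoSort` and the `int[][]` adjacency list (standing in for Lucene's `Automaton`/`Transition` API) are hypothetical, and the `onStack` cycle check is one possible take on plan item 2, not an agreed design.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Deque;

/** Hypothetical example: iterative post-order DFS topological sort over an adjacency list. */
final class IterativeTopoSort {
  private IterativeTopoSort() {}

  /**
   * Returns the states reachable from state 0 in topological order. transitions[s] holds the
   * destination states of s. Throws IllegalArgumentException if a cycle is reachable from 0.
   */
  static int[] topoSort(int[][] transitions) {
    int numStates = transitions.length;
    if (numStates == 0) {
      return new int[0];
    }
    int[] postOrder = new int[numStates];
    int upto = 0;
    BitSet visited = new BitSet(numStates);
    BitSet onStack = new BitSet(numStates); // states whose successors are still being explored
    int[] nextEdge = new int[numStates]; // per-state cursor, replacing the recursive loop index
    Deque<Integer> stack = new ArrayDeque<>();

    visited.set(0);
    onStack.set(0);
    stack.push(0);
    while (!stack.isEmpty()) {
      int state = stack.peek();
      if (nextEdge[state] < transitions[state].length) {
        int dest = transitions[state][nextEdge[state]++];
        if (onStack.get(dest)) {
          throw new IllegalArgumentException("automaton has a cycle through state " + dest);
        }
        if (!visited.get(dest)) {
          visited.set(dest);
          onStack.set(dest);
          stack.push(dest); // descend, like a recursive call
        }
      } else {
        // All successors are finished: record the state in post-order.
        stack.pop();
        onStack.clear(state);
        postOrder[upto++] = state;
      }
    }
    // Reverse the post-order to obtain a topological order; unreachable states are dropped.
    int[] result = new int[upto];
    for (int i = 0; i < upto; i++) {
      result[i] = postOrder[upto - 1 - i];
    }
    return result;
  }

  public static void main(String[] args) {
    // Diamond 0->1, 0->2, 1->2: any valid order must place 1 before 2.
    int[][] diamond = {{1, 2}, {2}, {}};
    System.out.println(Arrays.toString(topoSort(diamond))); // prints [0, 1, 2]
  }
}
```

The explicit stack lives on the heap, so depth is bounded by memory rather than by the thread's call stack, which is what would make a fixed recursion limit unnecessary.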





[GitHub] [lucene] mkhludnev commented on issue #12204: ToParentBlockJoinQuery's explain should depend on its ScoreMode

2023-05-10 Thread via GitHub


mkhludnev commented on issue #12204:
URL: https://github.com/apache/lucene/issues/12204#issuecomment-1543378416

   This should be in 9.7.





[GitHub] [lucene] mkhludnev closed issue #12204: ToParentBlockJoinQuery's explain should depend on its ScoreMode

2023-05-10 Thread via GitHub


mkhludnev closed issue #12204: ToParentBlockJoinQuery's explain should depend 
on its ScoreMode
URL: https://github.com/apache/lucene/issues/12204

