[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


uschindler commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547398338

   I can take care of that, but I would like to get a final "go" by @rmuir. :-) 
   Uwe


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] romseygeek commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


romseygeek commented on code in PR #12286:
URL: https://github.com/apache/lucene/pull/12286#discussion_r1193625395


##
lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java:
##
@@ -1303,24 +1307,49 @@ public static int[] topoSortStates(Automaton a) {
 return states;
   }
 
-  // TODO: not great that this is recursive... in theory a
-  // large automata could exceed java's stack so the maximum level of 
recursion is bounded to 1000
-  private static int topoSortStatesRecurse(
-  Automaton a, BitSet visited, int[] states, int upto, int state, int 
level) {
-if (level > MAX_RECURSION_LEVEL) {
-  throw new IllegalArgumentException("input automaton is too large: " + 
level);
-}
+  /**
+   * Performs a topological sort on the states of the given Automaton.
+   *
+   * @param a The automaton whose states are to be topologically sorted.
+   * @param states An int array which stores the states.
+   * @return the reversed topologically sorted array of state ids

Review Comment:
   minor nit: this should say 'returns the number of states in the final sorted 
list' I think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


tang-hi commented on code in PR #12286:
URL: https://github.com/apache/lucene/pull/12286#discussion_r1193672636


##
lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java:
##
@@ -1303,24 +1307,49 @@ public static int[] topoSortStates(Automaton a) {
 return states;
   }
 
-  // TODO: not great that this is recursive... in theory a
-  // large automata could exceed java's stack so the maximum level of 
recursion is bounded to 1000
-  private static int topoSortStatesRecurse(
-  Automaton a, BitSet visited, int[] states, int upto, int state, int 
level) {
-if (level > MAX_RECURSION_LEVEL) {
-  throw new IllegalArgumentException("input automaton is too large: " + 
level);
-}
+  /**
+   * Performs a topological sort on the states of the given Automaton.
+   *
+   * @param a The automaton whose states are to be topologically sorted.
+   * @param states An int array which stores the states.
+   * @return the reversed topologically sorted array of state ids

Review Comment:
   Yes, My bad.Already fix that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler merged pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


uschindler merged PR #12286:
URL: https://github.com/apache/lucene/pull/12286


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub


tang-hi commented on issue #12291:
URL: https://github.com/apache/lucene/issues/12291#issuecomment-1547968987

   Good Catch! Could you submit a PR to fix that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


tang-hi commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547976361

   Thank you everyone for your valuable comments on my PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub


uschindler commented on issue #12291:
URL: https://github.com/apache/lucene/issues/12291#issuecomment-1547978601

   In general I'd suggest to figure out if we should not change the stopword 
file parser to strip blank lines like comments?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


uschindler commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547980379

   Thanks @tang-hi for the nice work! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


uschindler commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547983951

   One additional thing as my last comment: We moved from recursive to 
iterative, but we still have a stack (deque). It is not so limited like the OS 
stack by the Java VM, but still for some FSA it could consume a lot of heap 
space.
   
   Any thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


tang-hi commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548011512

   > One additional thing as my last comment: We moved from recursive to 
iterative, but we still have a stack (deque). It is not so limited like the OS 
stack by the Java VM, but still for some FSA it could consume a lot of heap 
space.
   > 
   > Any thoughts?
   
   The only approach I can think of to limit heap consumption is to restrict 
the size of deque. However, I believe this will only occur in special cases. In 
addition, if users wish to perform topological sorting on large automata and 
are prepared to accommodate the associated memory consumption and time costs, I 
don't think we should impose any limitations.If there are any better 
suggestions, I am open to them at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


mikemccand commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548020174

   I think the RAM consumption is OK, but, we should clearly advertise it in 
the javadocs for this method?  Since we detect cycles we will never have an 
"attempt to use infinite RAM".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub


mikemccand commented on issue #12291:
URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548022505

   I think the stoplist loader already ignores comment lines, but, does not 
ignore empty lines!  Darned empty string rears its head at us again...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


uschindler commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548033460

   I think we have the same problem with other methods in this class that were 
transformed to be iterative earlier.
   So we should maybe make a comment about what types of automatons could 
consume lot of space. But for end users it might be important to have this in 
our query classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JerryChin commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub


JerryChin commented on issue #12291:
URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548065374

   Hi @tang-hi,
   
   I can summit a PR to fix this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi opened a new pull request, #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub


tang-hi opened a new pull request, #12292:
URL: https://github.com/apache/lucene/pull/12292

   ### Description
   Update javadoc based on comment in #12286 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


tang-hi commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548069204

   I think adding a comment would be great. I have already submitted a new pull 
request. @uschindler @mikemccand 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


uschindler commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548081406

   Looks fine, I can merge that later, no need for additional changes.txt.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub


uschindler commented on PR #12292:
URL: https://github.com/apache/lucene/pull/12292#issuecomment-1548082691

   I will merge that later, no need to add changes.txt. If you really like, you 
can of course add this PR's issue number to the existing change log entry.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub


uschindler commented on issue #12291:
URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548087039

   Also here: 
https://github.com/apache/lucene/blob/5d203f8337cb6a2350c1abe5d83e3e103d060645/lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java#L119


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on pull request #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub


tang-hi commented on PR #12292:
URL: https://github.com/apache/lucene/pull/12292#issuecomment-1548096138

   > I will merge that later, no need to add changes.txt. If you really like, 
you can of course add this PR's issue number to the existing change log entry.
   
   It's okay to merge without updating the changes.txt file


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


uschindler commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548141965

   Hi,
   I had to fix the test to use `TestUtil#nextInt(min, max)` as Java 11 has no 
two-parameter `Random#netInt(origin, bound)`. I had to substract 1 from bound 
as its is inclusive in TestUtil.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler merged pull request #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub


uschindler merged PR #12292:
URL: https://github.com/apache/lucene/pull/12292


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] runningcode commented on pull request #12266: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-15 Thread via GitHub


runningcode commented on PR #12266:
URL: https://github.com/apache/lucene/pull/12266#issuecomment-1548170425

   @risdenk Yes that is certainly possible. I've opened a PR here to do this: 
https://github.com/apache/lucene/pull/12293
   
   Feel free to close this PR in favor of the other one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] runningcode opened a new pull request, #12293: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-15 Thread via GitHub


runningcode opened a new pull request, #12293:
URL: https://github.com/apache/lucene/pull/12293

   Description
   This PR publishes a build scan for every CI build on Jenkins and GitHub 
Actions and for every local build from an authenticated Apache committer. The 
build will not fail if publishing fails.
   
   The build scans of the Apache Lucene project are published to the Gradle 
Enterprise instance at [ge.apache.org](https://ge.apache.org/), hosted by the 
Apache Software Foundation and run in partnership between the ASF and Gradle. 
This Gradle Enterprise instance has all features and extensions enabled and is 
freely available for use by the Apache Lucene project and all other Apache 
projects.
   
   This pull request enhances the functionality of publishing build scans to 
the publicly available [scans.gradle.com](https://scans.gradle.com/) by instead 
publishing build scans to [ge.apache.org](https://ge.apache.org/). On this 
Gradle Enterprise instance, Apache Lucene will have access not only to all of 
the published build scans but other aggregate data features such as:
   
   Dashboards to view all historical build scans, along with performance trends 
over time
   Build failure analytics for enhanced investigation and diagnosis of build 
failures
   Test failure analytics to better understand trends and causes around slow, 
failing, and flaky tests
   If interested in exploring a fully populated Gradle Enterprise instance, 
please explore the builds already connected to 
[ge.apache.org](https://ge.apache.org/), the [Spring project’s 
instance](https://ge.spring.io/scans?search.relativeStartTime=P28D&search.timeZoneId=Europe/Zurich),
 or any number of other [OSS 
projects](https://gradle.com/enterprise-customers/oss-projects/) for which we 
sponsor instances of Gradle Enterprise.
   
   Please let me know if there are any questions about the value of Gradle 
Enterprise or the changes in this pull request and I’d be happy to address them.
   
   This PR supersedes https://github.com/apache/lucene/pull/12266


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler opened a new pull request, #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub


uschindler opened a new pull request, #12294:
URL: https://github.com/apache/lucene/pull/12294

   This is the Java 21 version of MemorySegments following [JEP 
442](https://openjdk.org/jeps/442).
   
   There are not many changes:
   - Update scriptDepVersion's ASM to 9.5 and extract the preview API to a 
separate APIJAR file
   - make a clone of `MemorySegmentIndexInput` (no changes needed)
   - make a clone of `MemorySegmentIndexInputProvider` and adapt allocating a 
shared `Arena`. Also adapt FileChannel's map API to use `Arena`directly (scope 
is gone there).
   
   I will make this PR a draft until JDK 21 is in final phase (RC).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub


uschindler commented on PR #12294:
URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548241540

   I will setup testing on Policeman Jenkins soon!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub


uschindler commented on PR #12294:
URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548251301

   Hi  @mcimadamore,
   maybe also have a quick look.
   Thanks, Uwe


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


rmuir commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548401420

   @tang-hi @uschindler @mikemccand I think now that there are no more 
recursive algorithms in `src/java` we can now move 
`Operations.MAX_RECURSION_LEVEL` to `AutomatonTestUtil` in `src/test`? This 
test helper still uses it only because it has some simple recursive 
implementations used for validating correctness. And I think it is a good step 
to be able to move it to just tests!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub


uschindler commented on PR #12294:
URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548450784

   Policeman Jenkins looks fine: 
https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/845/console


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


mikemccand commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548617786

   > @tang-hi @uschindler @mikemccand I think now that there are no more 
recursive algorithms in `src/java` we can now move 
`Operations.MAX_RECURSION_LEVEL` to `AutomatonTestUtil` in `src/test`? This 
test helper still uses it only because it has some simple recursive 
implementations used for validating correctness. And I think it is a good step 
to be able to move it to just tests!
   
   Yay, what a great milestone!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft opened a new pull request, #12295: Use `instanceof` pattern-matching where possible

2023-05-15 Thread via GitHub


JarvisCraft opened a new pull request, #12295:
URL: https://github.com/apache/lucene/pull/12295

   ### Description
   
   This PR enables the usage of `instanceof` pattern matching wherever possible 
(without changing semantics) reducing error-proneness and potentially enhancing 
readability.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft opened a new pull request, #12296: Seal `IndexReaderContext`

2023-05-15 Thread via GitHub


JarvisCraft opened a new pull request, #12296:
URL: https://github.com/apache/lucene/pull/12296

   ### Description
   
   `IndexReaderContext` is already effectively sealed since it's constructor 
does type check throwing `Error` if `this` is neither instance of 
`CompositeReaderContext` nor `LeafReaderContext`.
   
   This PR simply makes this restriction explicit (codewise) turning it from a 
runtime error into a compile time one to extend this class.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jainankitk opened a new issue, #12297: Unnecessary BM25Scorer allocations for non-scoring queries

2023-05-15 Thread via GitHub


jainankitk opened a new issue, #12297:
URL: https://github.com/apache/lucene/issues/12297

   ### Description
   
   While looking into customer issue, I noticed increase in GC time from Lucene 
7.x to 8.x. From the JVM histograms, one of the primary difference was float[] 
allocation. Took a heap dump to check the dominator and it was coming from 
BM25Scorer.
   
   The change seems to have come in with 
https://github.com/apache/lucene/commit/8fd7ead940f69a892dfc951a1aa042e8cae806c1,
 which removed some of the special-case logic around the "non-scoring 
similarity" embedded in IndexSearcher (returned in the false case from the old 
IndexSearcher#scorer(boolean needsScores)).
   
   ```
num #instances #bytes  class name (module)
   ---
  1:  24601972 4773247024  [B (java.base@11.0.17)
  2:   2100779 2061684496  [F (java.base@11.0.17)
  3:  33501475  804035400  java.util.ArrayList (java.base@11.0.17)
  4:  16232322  716523504  [Ljava.lang.Object; (java.base@11.0.17)
  5:  14819347  711328656  java.util.HashMap (java.base@11.0.17)
   ...
   ...
 34:   1106011   79632792  
org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl
 35:   1979609   79184360  
org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer
   ```
   
   I also validated that the scoring mode for these queries is 
COMPLETE_NO_SCORES, that has `needsScore` set to false:
   
   ```
   method=org.apache.lucene.search.TermQuery$TermWeight. location=AtExit
   ts=2023-05-01 22:32:57; [cost=0.014381ms] result=@ArrayList[
   @ScoreMode[COMPLETE_NO_SCORES],
   ]
   method=org.apache.lucene.search.TermQuery$TermWeight. location=AtExit
   ts=2023-05-01 22:32:56; [cost=0.029482ms] result=@ArrayList[
   @ScoreMode[COMPLETE_NO_SCORES],
   ]
   method=org.apache.lucene.search.TermQuery$TermWeight. location=AtExit
   ts=2023-05-01 22:32:57; [cost=0.0135ms] result=@ArrayList[
   @ScoreMode[COMPLETE_NO_SCORES],
   ]
   ```
   
   ### Version and environment details
   
   Using Lucene 8.10.1, though the issue is there starting 8.x goes into 9.x as 
well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jainankitk commented on issue #12297: Unnecessary BM25Scorer allocations for non-scoring queries

2023-05-15 Thread via GitHub


jainankitk commented on issue #12297:
URL: https://github.com/apache/lucene/issues/12297#issuecomment-1548859574

   I also came across [this 
discussion](https://lists.apache.org/thread/nrlkswkqh1bp80owb9yd9zzotcz81soj). 
Maybe I am missing some context, but could not understand why this is not an 
issue. Seems like regression to me, for specific customer use case. I also 
tried quick hack patch (always returning non-scoring similarity) just to 
confirm, and the float[] did not appear in the histogram / heap dump anymore.
   
   ```
public class IndexSearcher {

   +  /** A search-time {@link Similarity} that does not make use of scoring 
factors
   +   *  and may be used when scores are not needed. */
   +  private static final Similarity NON_SCORING_SIMILARITY = new Similarity() 
{
   + 
   +@Override
   +public long computeNorm(FieldInvertState state) {
   +  throw new UnsupportedOperationException("This Similarity may only be 
used for searching, not indexing");
   +}
   + 
   +@Override
   +public SimScorer scorer(float boost, CollectionStatistics 
collectionStats, TermStatistics... termStats) {
   +  return new SimScorer() {
   +@Override
   +public float score(float freq, long norm) {
   +  return 0f;
   +}
   +  };
   +}
   + 
   +  };
   +
  private static QueryCache DEFAULT_QUERY_CACHE;
  private static QueryCachingPolicy DEFAULT_CACHING_POLICY = new 
UsageTrackingQueryCachingPolicy();
  static {
   @@ -314,7 +336,7 @@ public class IndexSearcher {
   *  {@link Similarity} that has been set through {@link 
#setSimilarity(Similarity)}
   *  or the default {@link Similarity} if none has been set explicitly. */
  public Similarity getSimilarity() {
   -return similarity;
   +return NON_SCORING_SIMILARITY;
  }

  /**
   ```
   
   @jpountz - Can you check once and confirm if I am missing something?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi opened a new pull request, #12298: move max recursion from Operations.java to AutomatonTestUtil.java

2023-05-15 Thread via GitHub


tang-hi opened a new pull request, #12298:
URL: https://github.com/apache/lucene/pull/12298

   ### Description
   move max recursion from Operations.java to AutomatonTestUtil.java according 
to @rmuir 's comment in #12286 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub


tang-hi commented on PR #12286:
URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548875731

   @rmuir Already raise a PR to move that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih merged pull request #12235: Optimize HNSW diversity calculation

2023-05-15 Thread via GitHub


zhaih merged PR #12235:
URL: https://github.com/apache/lucene/pull/12235


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on a diff in pull request #12246: Set word2vec getSynonyms method synchronized

2023-05-15 Thread via GitHub


zhaih commented on code in PR #12246:
URL: https://github.com/apache/lucene/pull/12246#discussion_r1194674783


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java:
##
@@ -85,7 +86,7 @@ public List getSynonyms(
   SIMILARITY_FUNCTION,
   hnswGraph,
   null,
-  word2VecModel.size());
+  NO_LIMIT_ON_VISITED_NODES);

Review Comment:
   FYI I applied the same change to https://github.com/apache/lucene/pull/12235 
(to unblock that change) without having this as a static final, you can still 
make it static here if you want tho :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on a diff in pull request #12246: Set word2vec getSynonyms method synchronized

2023-05-15 Thread via GitHub


zhaih commented on code in PR #12246:
URL: https://github.com/apache/lucene/pull/12246#discussion_r1194674783


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java:
##
@@ -85,7 +86,7 @@ public List getSynonyms(
   SIMILARITY_FUNCTION,
   hnswGraph,
   null,
-  word2VecModel.size());
+  NO_LIMIT_ON_VISITED_NODES);

Review Comment:
   FYI I applied the same change to https://github.com/apache/lucene/pull/12235 
(to unblock that PR) without having this as a static final, you can still make 
it static here if you want tho :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org