[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547398338 I can take care of that, but I would like to get a final "go" by @rmuir. :-) Uwe -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] romseygeek commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow
romseygeek commented on code in PR #12286: URL: https://github.com/apache/lucene/pull/12286#discussion_r1193625395 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -1303,24 +1307,49 @@ public static int[] topoSortStates(Automaton a) { return states; } - // TODO: not great that this is recursive... in theory a - // large automata could exceed java's stack so the maximum level of recursion is bounded to 1000 - private static int topoSortStatesRecurse( - Automaton a, BitSet visited, int[] states, int upto, int state, int level) { -if (level > MAX_RECURSION_LEVEL) { - throw new IllegalArgumentException("input automaton is too large: " + level); -} + /** + * Performs a topological sort on the states of the given Automaton. + * + * @param a The automaton whose states are to be topologically sorted. + * @param states An int array which stores the states. + * @return the reversed topologically sorted array of state ids Review Comment: minor nit: this should say 'returns the number of states in the final sorted list' I think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow
tang-hi commented on code in PR #12286: URL: https://github.com/apache/lucene/pull/12286#discussion_r1193672636 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -1303,24 +1307,49 @@ public static int[] topoSortStates(Automaton a) { return states; } - // TODO: not great that this is recursive... in theory a - // large automata could exceed java's stack so the maximum level of recursion is bounded to 1000 - private static int topoSortStatesRecurse( - Automaton a, BitSet visited, int[] states, int upto, int state, int level) { -if (level > MAX_RECURSION_LEVEL) { - throw new IllegalArgumentException("input automaton is too large: " + level); -} + /** + * Performs a topological sort on the states of the given Automaton. + * + * @param a The automaton whose states are to be topologically sorted. + * @param states An int array which stores the states. + * @return the reversed topologically sorted array of state ids Review Comment: Yes, My bad.Already fix that -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler merged pull request #12286: toposort use iterator to avoid stackoverflow
uschindler merged PR #12286: URL: https://github.com/apache/lucene/pull/12286 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer
tang-hi commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1547968987 Good Catch! Could you submit a PR to fix that? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547976361 Thank you everyone for your valuable comments on my PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer
uschindler commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1547978601 In general I'd suggest to figure out if we should not change the stopword file parser to strip blank lines like comments? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547980379 Thanks @tang-hi for the nice work! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547983951 One additional thing as my last comment: We moved from recursive to iterative, but we still have a stack (deque). It is not so limited like the OS stack by the Java VM, but still for some FSA it could consume a lot of heap space. Any thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548011512 > One additional thing as my last comment: We moved from recursive to iterative, but we still have a stack (deque). It is not so limited like the OS stack by the Java VM, but still for some FSA it could consume a lot of heap space. > > Any thoughts? The only approach I can think of to limit heap consumption is to restrict the size of deque. However, I believe this will only occur in special cases. In addition, if users wish to perform topological sorting on large automata and are prepared to accommodate the associated memory consumption and time costs, I don't think we should impose any limitations.If there are any better suggestions, I am open to them at any time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #12286: toposort use iterator to avoid stackoverflow
mikemccand commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548020174 I think the RAM consumption is OK, but, we should clearly advertise it in the javadocs for this method? Since we detect cycles we will never have an "attempt to use infinite RAM". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer
mikemccand commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548022505 I think the stoplist loader already ignores comment lines, but, does not ignore empty lines! Darned empty string rears its head at us again... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548033460 I think we have the same problem with other methods in this class that were transformed to be iterative earlier. So we should maybe make a comment about what types of automatons could consume lot of space. But for end users it might be important to have this in our query classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] JerryChin commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer
JerryChin commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548065374 Hi @tang-hi, I can summit a PR to fix this issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi opened a new pull request, #12292: Update Javadoc for topoSortStates method
tang-hi opened a new pull request, #12292: URL: https://github.com/apache/lucene/pull/12292 ### Description Update javadoc based on comment in #12286 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548069204 I think adding a comment would be great. I have already submitted a new pull request. @uschindler @mikemccand -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548081406 Looks fine, I can merge that later, no need for additional changes.txt. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12292: Update Javadoc for topoSortStates method
uschindler commented on PR #12292: URL: https://github.com/apache/lucene/pull/12292#issuecomment-1548082691 I will merge that later, no need to add changes.txt. If you really like, you can of course add this PR's issue number to the existing change log entry. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer
uschindler commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548087039 Also here: https://github.com/apache/lucene/blob/5d203f8337cb6a2350c1abe5d83e3e103d060645/lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java#L119 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on pull request #12292: Update Javadoc for topoSortStates method
tang-hi commented on PR #12292: URL: https://github.com/apache/lucene/pull/12292#issuecomment-1548096138 > I will merge that later, no need to add changes.txt. If you really like, you can of course add this PR's issue number to the existing change log entry. It's okay to merge without updating the changes.txt file -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548141965 Hi, I had to fix the test to use `TestUtil#nextInt(min, max)` as Java 11 has no two-parameter `Random#netInt(origin, bound)`. I had to substract 1 from bound as its is inclusive in TestUtil. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler merged pull request #12292: Update Javadoc for topoSortStates method
uschindler merged PR #12292: URL: https://github.com/apache/lucene/pull/12292 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] runningcode commented on pull request #12266: Capture build scans on ge.apache.org to benefit from deep build insights
runningcode commented on PR #12266: URL: https://github.com/apache/lucene/pull/12266#issuecomment-1548170425 @risdenk Yes that is certainly possible. I've opened a PR here to do this: https://github.com/apache/lucene/pull/12293 Feel free to close this PR in favor of the other one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] runningcode opened a new pull request, #12293: Capture build scans on ge.apache.org to benefit from deep build insights
runningcode opened a new pull request, #12293: URL: https://github.com/apache/lucene/pull/12293 Description This PR publishes a build scan for every CI build on Jenkins and GitHub Actions and for every local build from an authenticated Apache committer. The build will not fail if publishing fails. The build scans of the Apache Lucene project are published to the Gradle Enterprise instance at [ge.apache.org](https://ge.apache.org/), hosted by the Apache Software Foundation and run in partnership between the ASF and Gradle. This Gradle Enterprise instance has all features and extensions enabled and is freely available for use by the Apache Lucene project and all other Apache projects. This pull request enhances the functionality of publishing build scans to the publicly available [scans.gradle.com](https://scans.gradle.com/) by instead publishing build scans to [ge.apache.org](https://ge.apache.org/). On this Gradle Enterprise instance, Apache Lucene will have access not only to all of the published build scans but other aggregate data features such as: Dashboards to view all historical build scans, along with performance trends over time Build failure analytics for enhanced investigation and diagnosis of build failures Test failure analytics to better understand trends and causes around slow, failing, and flaky tests If interested in exploring a fully populated Gradle Enterprise instance, please explore the builds already connected to [ge.apache.org](https://ge.apache.org/), the [Spring project’s instance](https://ge.spring.io/scans?search.relativeStartTime=P28D&search.timeZoneId=Europe/Zurich), or any number of other [OSS projects](https://gradle.com/enterprise-customers/oss-projects/) for which we sponsor instances of Gradle Enterprise. Please let me know if there are any questions about the value of Gradle Enterprise or the changes in this pull request and I’d be happy to address them. This PR supersedes https://github.com/apache/lucene/pull/12266 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler opened a new pull request, #12294: Implement MMapDirectory with Java 21 Project Panama Preview API
uschindler opened a new pull request, #12294: URL: https://github.com/apache/lucene/pull/12294 This is the Java 21 version of MemorySegments following [JEP 442](https://openjdk.org/jeps/442). There are not many changes: - Update scriptDepVersion's ASM to 9.5 and extract the preview API to a separate APIJAR file - make a clone of `MemorySegmentIndexInput` (no changes needed) - make a clone of `MemorySegmentIndexInputProvider` and adapt allocating a shared `Arena`. Also adapt FileChannel's map API to use `Arena`directly (scope is gone there). I will make this PR a draft until JDK 21 is in final phase (RC). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548241540 I will setup testing on Policeman Jenkins soon! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548251301 Hi @mcimadamore, maybe also have a quick look. Thanks, Uwe -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #12286: toposort use iterator to avoid stackoverflow
rmuir commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548401420 @tang-hi @uschindler @mikemccand I think now that there are no more recursive algorithms in `src/java` we can now move `Operations.MAX_RECURSION_LEVEL` to `AutomatonTestUtil` in `src/test`? This test helper still uses it only because it has some simple recursive implementations used for validating correctness. And I think it is a good step to be able to move it to just tests! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548450784 Policeman Jenkins looks fine: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/845/console -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #12286: toposort use iterator to avoid stackoverflow
mikemccand commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548617786 > @tang-hi @uschindler @mikemccand I think now that there are no more recursive algorithms in `src/java` we can now move `Operations.MAX_RECURSION_LEVEL` to `AutomatonTestUtil` in `src/test`? This test helper still uses it only because it has some simple recursive implementations used for validating correctness. And I think it is a good step to be able to move it to just tests! Yay, what a great milestone! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] JarvisCraft opened a new pull request, #12295: Use `instanceof` pattern-matching where possible
JarvisCraft opened a new pull request, #12295: URL: https://github.com/apache/lucene/pull/12295 ### Description This PR enables the usage of `instanceof` pattern matching wherever possible (without changing semantics) reducing error-proneness and potentially enhancing readability. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] JarvisCraft opened a new pull request, #12296: Seal `IndexReaderContext`
JarvisCraft opened a new pull request, #12296: URL: https://github.com/apache/lucene/pull/12296 ### Description `IndexReaderContext` is already effectively sealed since it's constructor does type check throwing `Error` if `this` is neither instance of `CompositeReaderContext` nor `LeafReaderContext`. This PR simply makes this restriction explicit (codewise) turning it from a runtime error into a compile time one to extend this class. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jainankitk opened a new issue, #12297: Unnecessary BM25Scorer allocations for non-scoring queries
jainankitk opened a new issue, #12297: URL: https://github.com/apache/lucene/issues/12297 ### Description While looking into customer issue, I noticed increase in GC time from Lucene 7.x to 8.x. From the JVM histograms, one of the primary difference was float[] allocation. Took a heap dump to check the dominator and it was coming from BM25Scorer. The change seems to have come in with https://github.com/apache/lucene/commit/8fd7ead940f69a892dfc951a1aa042e8cae806c1, which removed some of the special-case logic around the "non-scoring similarity" embedded in IndexSearcher (returned in the false case from the old IndexSearcher#scorer(boolean needsScores)). ``` num #instances #bytes class name (module) --- 1: 24601972 4773247024 [B (java.base@11.0.17) 2: 2100779 2061684496 [F (java.base@11.0.17) 3: 33501475 804035400 java.util.ArrayList (java.base@11.0.17) 4: 16232322 716523504 [Ljava.lang.Object; (java.base@11.0.17) 5: 14819347 711328656 java.util.HashMap (java.base@11.0.17) ... ... 34: 1106011 79632792 org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl 35: 1979609 79184360 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer ``` I also validated that the scoring mode for these queries is COMPLETE_NO_SCORES, that has `needsScore` set to false: ``` method=org.apache.lucene.search.TermQuery$TermWeight. location=AtExit ts=2023-05-01 22:32:57; [cost=0.014381ms] result=@ArrayList[ @ScoreMode[COMPLETE_NO_SCORES], ] method=org.apache.lucene.search.TermQuery$TermWeight. location=AtExit ts=2023-05-01 22:32:56; [cost=0.029482ms] result=@ArrayList[ @ScoreMode[COMPLETE_NO_SCORES], ] method=org.apache.lucene.search.TermQuery$TermWeight. location=AtExit ts=2023-05-01 22:32:57; [cost=0.0135ms] result=@ArrayList[ @ScoreMode[COMPLETE_NO_SCORES], ] ``` ### Version and environment details Using Lucene 8.10.1, though the issue is there starting 8.x goes into 9.x as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jainankitk commented on issue #12297: Unnecessary BM25Scorer allocations for non-scoring queries
jainankitk commented on issue #12297: URL: https://github.com/apache/lucene/issues/12297#issuecomment-1548859574 I also came across [this discussion](https://lists.apache.org/thread/nrlkswkqh1bp80owb9yd9zzotcz81soj). Maybe I am missing some context, but could not understand why this is not an issue. Seems like regression to me, for specific customer use case. I also tried quick hack patch (always returning non-scoring similarity) just to confirm, and the float[] did not appear in the histogram / heap dump anymore. ``` public class IndexSearcher { + /** A search-time {@link Similarity} that does not make use of scoring factors + * and may be used when scores are not needed. */ + private static final Similarity NON_SCORING_SIMILARITY = new Similarity() { + +@Override +public long computeNorm(FieldInvertState state) { + throw new UnsupportedOperationException("This Similarity may only be used for searching, not indexing"); +} + +@Override +public SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats) { + return new SimScorer() { +@Override +public float score(float freq, long norm) { + return 0f; +} + }; +} + + }; + private static QueryCache DEFAULT_QUERY_CACHE; private static QueryCachingPolicy DEFAULT_CACHING_POLICY = new UsageTrackingQueryCachingPolicy(); static { @@ -314,7 +336,7 @@ public class IndexSearcher { * {@link Similarity} that has been set through {@link #setSimilarity(Similarity)} * or the default {@link Similarity} if none has been set explicitly. */ public Similarity getSimilarity() { -return similarity; +return NON_SCORING_SIMILARITY; } /** ``` @jpountz - Can you check once and confirm if I am missing something? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi opened a new pull request, #12298: move max recursion from Operations.java to AutomatonTestUtil.java
tang-hi opened a new pull request, #12298: URL: https://github.com/apache/lucene/pull/12298 ### Description move max recursion from Operations.java to AutomatonTestUtil.java according to @rmuir 's comment in #12286 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548875731 @rmuir Already raise a PR to move that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih merged pull request #12235: Optimize HNSW diversity calculation
zhaih merged PR #12235: URL: https://github.com/apache/lucene/pull/12235 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on a diff in pull request #12246: Set word2vec getSynonyms method synchronized
zhaih commented on code in PR #12246: URL: https://github.com/apache/lucene/pull/12246#discussion_r1194674783 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java: ## @@ -85,7 +86,7 @@ public List getSynonyms( SIMILARITY_FUNCTION, hnswGraph, null, - word2VecModel.size()); + NO_LIMIT_ON_VISITED_NODES); Review Comment: FYI I applied the same change to https://github.com/apache/lucene/pull/12235 (to unblock that change) without having this as a static final, you can still make it static here if you want tho :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on a diff in pull request #12246: Set word2vec getSynonyms method synchronized
zhaih commented on code in PR #12246: URL: https://github.com/apache/lucene/pull/12246#discussion_r1194674783 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java: ## @@ -85,7 +86,7 @@ public List getSynonyms( SIMILARITY_FUNCTION, hnswGraph, null, - word2VecModel.size()); + NO_LIMIT_ON_VISITED_NODES); Review Comment: FYI I applied the same change to https://github.com/apache/lucene/pull/12235 (to unblock that PR) without having this as a static final, you can still make it static here if you want tho :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org