[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547398338 I can take care of that, but I would like to get a final "go" by @rmuir. :-) Uwe -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [lucene] romseygeek commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
romseygeek commented on code in PR #12286: URL: https://github.com/apache/lucene/pull/12286#discussion_r1193625395 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -1303,24 +1307,49 @@ public static int[] topoSortStates(Automaton a) { return s

[GitHub] [lucene] tang-hi commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on code in PR #12286: URL: https://github.com/apache/lucene/pull/12286#discussion_r1193672636 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -1303,24 +1307,49 @@ public static int[] topoSortStates(Automaton a) { return stat

[GitHub] [lucene] uschindler merged pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler merged PR #12286: URL: https://github.com/apache/lucene/pull/12286

[GitHub] [lucene] tang-hi commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
tang-hi commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1547968987 Good Catch! Could you submit a PR to fix that?

[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547976361 Thank you everyone for your valuable comments on my PR.

[GitHub] [lucene] uschindler commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
uschindler commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1547978601 In general I'd suggest to figure out if we should not change the stopword file parser to strip blank lines like comments?

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547980379 Thanks @tang-hi for the nice work!

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547983951 One additional thing as my last comment: We moved from recursive to iterative, but we still have a stack (deque). It is not as limited as the OS stack used by the Java VM, but still for

[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548011512 > One additional thing as my last comment: We moved from recursive to iterative, but we still have a stack (deque). It is not as limited as the OS stack used by the Java VM, but still for s

[GitHub] [lucene] mikemccand commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
mikemccand commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548020174 I think the RAM consumption is OK, but, we should clearly advertise it in the javadocs for this method? Since we detect cycles we will never have an "attempt to use infinite RAM".
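
The recursive-to-iterative change discussed in this thread can be illustrated with a minimal sketch. This is not the code from `Operations.topoSortStates`; the class name `IterativeTopoSortSketch`, the adjacency-list input, and the `null` return on a cycle are all assumptions made for illustration. It shows a depth-first topological sort that keeps its pending states in an explicit `ArrayDeque` on the heap instead of on the thread stack, and stops as soon as it finds a cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not Lucene's implementation.
final class IterativeTopoSortSketch {

  /** Returns the nodes in topological order, or null if the graph contains a cycle. */
  static int[] topoSort(List<List<Integer>> adjacency) {
    int n = adjacency.size();
    int[] state = new int[n];    // 0 = unvisited, 1 = on the explicit stack, 2 = finished
    int[] nextEdge = new int[n]; // per node: index of the next outgoing edge to explore
    int[] order = new int[n];
    int upto = n;                // order is filled from the back (reverse postorder)
    Deque<Integer> stack = new ArrayDeque<>();

    for (int root = 0; root < n; root++) {
      if (state[root] != 0) {
        continue;
      }
      state[root] = 1;
      stack.push(root);
      while (!stack.isEmpty()) {
        int node = stack.peek();
        List<Integer> edges = adjacency.get(node);
        if (nextEdge[node] < edges.size()) {
          int dest = edges.get(nextEdge[node]++);
          if (state[dest] == 1) {
            return null; // edge back to a node still on the stack: cycle detected
          }
          if (state[dest] == 0) {
            state[dest] = 1;
            stack.push(dest);
          }
        } else {
          // All outgoing edges explored: the node is finished.
          stack.pop();
          state[node] = 2;
          order[--upto] = node;
        }
      }
    }
    return order;
  }
}
```

In this sketch each node is pushed at most once, so the deque never holds more than one entry per node. Memory use therefore grows with the size of the graph (a long chain is the worst case) but stays bounded, which matches the point made above about cycle detection.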

[GitHub] [lucene] mikemccand commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
mikemccand commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548022505 I think the stoplist loader already ignores comment lines, but, does not ignore empty lines! Darned empty string rears its head at us again...
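
The behaviour suggested in this issue (skip blank lines the same way comment lines are skipped) could look roughly like the sketch below. It is not the code of Lucene's `WordlistLoader`; the class `StopwordLoaderSketch`, its `load` method, and the list-of-lines input are hypothetical and chosen only to keep the example self-contained.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Lucene's WordlistLoader.
final class StopwordLoaderSketch {

  /** Collects trimmed words, skipping comment lines and empty or whitespace-only lines. */
  static Set<String> load(List<String> lines, String commentPrefix) {
    Set<String> words = new LinkedHashSet<>();
    for (String line : lines) {
      String word = line.trim();
      // Without the isEmpty() check, a blank line would add "" as a stopword.
      if (word.isEmpty() || word.startsWith(commentPrefix)) {
        continue;
      }
      words.add(word);
    }
    return words;
  }
}
```

With such a check in the loader, stray blank lines in a stopword file would no longer put the empty string into the stop set.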

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548033460 I think we have the same problem with other methods in this class that were transformed to be iterative earlier. So we should maybe make a comment about what types of automatons cou

[GitHub] [lucene] JerryChin commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
JerryChin commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548065374 Hi @tang-hi, I can submit a PR to fix this issue.

[GitHub] [lucene] tang-hi opened a new pull request, #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub
tang-hi opened a new pull request, #12292: URL: https://github.com/apache/lucene/pull/12292 ### Description Update javadoc based on comment in #12286

[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548069204 I think adding a comment would be great. I have already submitted a new pull request. @uschindler @mikemccand

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548081406 Looks fine, I can merge that later, no need for additional changes.txt.

[GitHub] [lucene] uschindler commented on pull request #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub
uschindler commented on PR #12292: URL: https://github.com/apache/lucene/pull/12292#issuecomment-1548082691 I will merge that later, no need to add changes.txt. If you really like, you can of course add this PR's issue number to the existing change log entry.

[GitHub] [lucene] uschindler commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
uschindler commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548087039 Also here: https://github.com/apache/lucene/blob/5d203f8337cb6a2350c1abe5d83e3e103d060645/lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java#L119

[GitHub] [lucene] tang-hi commented on pull request #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub
tang-hi commented on PR #12292: URL: https://github.com/apache/lucene/pull/12292#issuecomment-1548096138 > I will merge that later, no need to add changes.txt. If you really like, you can of course add this PR's issue number to the existing change log entry. It's okay to merge without

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548141965 Hi, I had to fix the test to use `TestUtil#nextInt(min, max)` as Java 11 has no two-parameter `Random#nextInt(origin, bound)`. I had to subtract 1 from bound as it is inclusive in
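
The off-by-one mentioned here comes from two different range conventions: the two-argument `Random#nextInt(origin, bound)` treats `bound` as exclusive, while the helper described in the comment treats its upper limit as inclusive. The sketch below is a hypothetical stand-in (the class `InclusiveRandomSketch` and the method `nextIntInclusive` are not Lucene API) that builds an inclusive range on top of the single-argument `Random#nextInt(int)`, which is available on Java 11.

```java
import java.util.Random;

// Hypothetical sketch, not Lucene's TestUtil.
final class InclusiveRandomSketch {

  /** Returns a value in [min, max], both ends inclusive. */
  static int nextIntInclusive(Random random, int min, int max) {
    if (min > max) {
      throw new IllegalArgumentException("min must be <= max");
    }
    // Compute the range in long arithmetic so that max - min + 1 cannot overflow.
    long range = (long) max - (long) min + 1;
    if (range > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("range too wide for this sketch");
    }
    // nextInt(n) returns a value in [0, n), so the result lies in [min, max].
    return min + random.nextInt((int) range);
  }
}
```

Under these assumptions, a call written as `random.nextInt(origin, bound)` on newer Java versions corresponds to `nextIntInclusive(random, origin, bound - 1)`.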

[GitHub] [lucene] uschindler merged pull request #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub
uschindler merged PR #12292: URL: https://github.com/apache/lucene/pull/12292

[GitHub] [lucene] runningcode commented on pull request #12266: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-15 Thread via GitHub
runningcode commented on PR #12266: URL: https://github.com/apache/lucene/pull/12266#issuecomment-1548170425 @risdenk Yes that is certainly possible. I've opened a PR here to do this: https://github.com/apache/lucene/pull/12293 Feel free to close this PR in favor of the other one.

[GitHub] [lucene] runningcode opened a new pull request, #12293: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-15 Thread via GitHub
runningcode opened a new pull request, #12293: URL: https://github.com/apache/lucene/pull/12293 Description This PR publishes a build scan for every CI build on Jenkins and GitHub Actions and for every local build from an authenticated Apache committer. The build will not fail if publish

[GitHub] [lucene] uschindler opened a new pull request, #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub
uschindler opened a new pull request, #12294: URL: https://github.com/apache/lucene/pull/12294 This is the Java 21 version of MemorySegments following [JEP 442](https://openjdk.org/jeps/442). There are not many changes: - Update scriptDepVersion's ASM to 9.5 and extract the preview

[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548241540 I will setup testing on Policeman Jenkins soon!

[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548251301 Hi @mcimadamore, maybe also have a quick look. Thanks, Uwe

[GitHub] [lucene] rmuir commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
rmuir commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548401420 @tang-hi @uschindler @mikemccand I think now that there are no more recursive algorithms in `src/java` we can now move `Operations.MAX_RECURSION_LEVEL` to `AutomatonTestUtil` in `src/test`

[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548450784 Policeman Jenkins looks fine: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/845/console

[GitHub] [lucene] mikemccand commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
mikemccand commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548617786 > @tang-hi @uschindler @mikemccand I think now that there are no more recursive algorithms in `src/java` we can now move `Operations.MAX_RECURSION_LEVEL` to `AutomatonTestUtil` in `sr

[GitHub] [lucene] JarvisCraft opened a new pull request, #12295: Use `instanceof` pattern-matching where possible

2023-05-15 Thread via GitHub
JarvisCraft opened a new pull request, #12295: URL: https://github.com/apache/lucene/pull/12295 ### Description This PR enables the usage of `instanceof` pattern matching wherever possible (without changing semantics) reducing error-proneness and potentially enhancing readability.
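
For readers unfamiliar with the feature this PR refers to, here is a small before/after sketch. The class `InstanceofPatternSketch` and its methods are hypothetical and not taken from the PR. It assumes Java 16 or newer, where pattern matching for `instanceof` is a standard language feature.

```java
// Hypothetical sketch, not code from the PR.
final class InstanceofPatternSketch {

  /** Classic form: type test followed by a separate cast. */
  static String describeWithCast(Object value) {
    if (value instanceof String) {
      String s = (String) value; // the cast repeats the type and can drift out of sync
      return "string of length " + s.length();
    }
    return "other";
  }

  /** Pattern-matching form: the test and the typed binding happen in one step. */
  static String describeWithPattern(Object value) {
    if (value instanceof String s) {
      return "string of length " + s.length();
    }
    return "other";
  }
}
```

Both methods return the same result for every input; the second form simply removes the separate cast, which is the kind of semantics-preserving change the description mentions.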

[GitHub] [lucene] JarvisCraft opened a new pull request, #12296: Seal `IndexReaderContext`

2023-05-15 Thread via GitHub
JarvisCraft opened a new pull request, #12296: URL: https://github.com/apache/lucene/pull/12296 ### Description `IndexReaderContext` is already effectively sealed since its constructor does a type check, throwing `Error` if `this` is neither an instance of `CompositeReaderContext` nor `Le
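
The difference between "effectively sealed" through a constructor check and a `sealed` declaration can be sketched as follows. All class names here (`SealedContextSketch`, `CheckedContext`, `SealedContext`, and their subclasses) are hypothetical and only mirror the shape described in the PR. The sketch assumes Java 17 or newer.

```java
// Hypothetical sketch, not Lucene's IndexReaderContext.
final class SealedContextSketch {

  // Runtime enforcement: any unexpected subclass fails when it is instantiated.
  abstract static class CheckedContext {
    CheckedContext() {
      if (!(this instanceof CheckedComposite || this instanceof CheckedLeaf)) {
        throw new Error("unexpected subclass: " + getClass().getName());
      }
    }
  }

  static final class CheckedComposite extends CheckedContext {}

  static final class CheckedLeaf extends CheckedContext {}

  // Compile-time enforcement: only the permitted classes may extend SealedContext.
  abstract static sealed class SealedContext permits SealedComposite, SealedLeaf {}

  static final class SealedComposite extends SealedContext {}

  static final class SealedLeaf extends SealedContext {}
}
```

With the sealed form, an unlisted subclass is rejected by the compiler instead of failing at construction time.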

[GitHub] [lucene] jainankitk opened a new issue, #12297: Unnecessary BM25Scorer allocations for non-scoring queries

2023-05-15 Thread via GitHub
jainankitk opened a new issue, #12297: URL: https://github.com/apache/lucene/issues/12297 ### Description While looking into a customer issue, I noticed an increase in GC time from Lucene 7.x to 8.x. From the JVM histograms, one of the primary differences was float[] allocation. Took a hea

[GitHub] [lucene] jainankitk commented on issue #12297: Unnecessary BM25Scorer allocations for non-scoring queries

2023-05-15 Thread via GitHub
jainankitk commented on issue #12297: URL: https://github.com/apache/lucene/issues/12297#issuecomment-1548859574 I also came across [this discussion](https://lists.apache.org/thread/nrlkswkqh1bp80owb9yd9zzotcz81soj). Maybe I am missing some context, but could not understand why this is not

[GitHub] [lucene] tang-hi opened a new pull request, #12298: move max recursion from Operations.java to AutomatonTestUtil.java

2023-05-15 Thread via GitHub
tang-hi opened a new pull request, #12298: URL: https://github.com/apache/lucene/pull/12298 ### Description move max recursion from Operations.java to AutomatonTestUtil.java according to @rmuir 's comment in #12286

[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548875731 @rmuir I have already raised a PR to move that.

[GitHub] [lucene] zhaih merged pull request #12235: Optimize HNSW diversity calculation

2023-05-15 Thread via GitHub
zhaih merged PR #12235: URL: https://github.com/apache/lucene/pull/12235

[GitHub] [lucene] zhaih commented on a diff in pull request #12246: Set word2vec getSynonyms method synchronized

2023-05-15 Thread via GitHub
zhaih commented on code in PR #12246: URL: https://github.com/apache/lucene/pull/12246#discussion_r1194674783 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java: ## @@ -85,7 +86,7 @@ public List getSynonyms(