[GitHub] [lucene] zhaih commented on a diff in pull request #12246: Set word2vec getSynonyms method synchronized

2023-05-15 Thread via GitHub
zhaih commented on code in PR #12246: URL: https://github.com/apache/lucene/pull/12246#discussion_r1194674783 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java: ## @@ -85,7 +86,7 @@ public List getSynonyms(
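The thread does not say which state `getSynonyms` shares between callers, so what follows is only a generic sketch. It shows why a method that repositions a shared cursor and then reads through it may need `synchronized`. `CursorGraph` and `SynchronizedLookup` are hypothetical names, and the idea that the provider holds such a cursor is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in (assumption): an object whose reads depend on a mutable cursor.
final class CursorGraph {
  private final int[][] neighbors;
  private int current = -1; // shared, mutable cursor: unsafe for concurrent seek/read pairs

  CursorGraph(int[][] neighbors) {
    this.neighbors = neighbors;
  }

  void seek(int node) {
    current = node;
  }

  int[] neighborsOfCurrent() {
    return neighbors[current];
  }
}

// Hypothetical provider: synchronized makes seek-then-read atomic with respect to other callers.
final class SynchronizedLookup {
  private final CursorGraph graph;

  SynchronizedLookup(CursorGraph graph) {
    this.graph = graph;
  }

  synchronized List<Integer> neighbors(int node) {
    graph.seek(node); // without the lock, another thread could move the cursor between these calls
    List<Integer> out = new ArrayList<>();
    for (int n : graph.neighborsOfCurrent()) {
      out.add(n);
    }
    return out;
  }
}
```

Locking the whole method is the simplest fix. It serializes lookups, which is acceptable when each call is short.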

[GitHub] [lucene] zhaih merged pull request #12235: Optimize HNSW diversity calculation

2023-05-15 Thread via GitHub
zhaih merged PR #12235: URL: https://github.com/apache/lucene/pull/12235 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apach

[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548875731 @rmuir Already raised a PR to move that.

[GitHub] [lucene] tang-hi opened a new pull request, #12298: move max recursion from Operations.java to AutomatonTestUtil.java

2023-05-15 Thread via GitHub
tang-hi opened a new pull request, #12298: URL: https://github.com/apache/lucene/pull/12298 ### Description Move max recursion from Operations.java to AutomatonTestUtil.java according to @rmuir's comment in #12286

[GitHub] [lucene] jainankitk commented on issue #12297: Unnecessary BM25Scorer allocations for non-scoring queries

2023-05-15 Thread via GitHub
jainankitk commented on issue #12297: URL: https://github.com/apache/lucene/issues/12297#issuecomment-1548859574 I also came across [this discussion](https://lists.apache.org/thread/nrlkswkqh1bp80owb9yd9zzotcz81soj). Maybe I am missing some context, but could not understand why this is not

[GitHub] [lucene] jainankitk opened a new issue, #12297: Unnecessary BM25Scorer allocations for non-scoring queries

2023-05-15 Thread via GitHub
jainankitk opened a new issue, #12297: URL: https://github.com/apache/lucene/issues/12297 ### Description While looking into a customer issue, I noticed an increase in GC time from Lucene 7.x to 8.x. From the JVM histograms, one of the primary differences was float[] allocation. Took a hea

[GitHub] [lucene] JarvisCraft opened a new pull request, #12296: Seal `IndexReaderContext`

2023-05-15 Thread via GitHub
JarvisCraft opened a new pull request, #12296: URL: https://github.com/apache/lucene/pull/12296 ### Description `IndexReaderContext` is already effectively sealed since its constructor does a type check, throwing `Error` if `this` is neither an instance of `CompositeReaderContext` nor `Le
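The PR replaces a runtime constructor check with a compile-time guarantee. Below is a minimal sketch of that idea using Java 17 sealed classes. `ReaderContext`, `LeafContext`, `CompositeContext` and `Contexts` are hypothetical, simplified stand-ins, not Lucene's actual classes.

```java
import java.util.List;

// Hypothetical, simplified hierarchy (not Lucene's classes). The permits clause makes the
// compiler reject any other subclass, instead of a constructor throwing at runtime.
abstract sealed class ReaderContext permits LeafContext, CompositeContext {}

final class LeafContext extends ReaderContext {
  final int docBase;

  LeafContext(int docBase) {
    this.docBase = docBase;
  }
}

final class CompositeContext extends ReaderContext {
  final List<ReaderContext> children;

  CompositeContext(List<ReaderContext> children) {
    this.children = List.copyOf(children);
  }
}

final class Contexts {
  // Counts leaves; because the parent is sealed, these two subclasses are the only cases.
  static int countLeaves(ReaderContext ctx) {
    if (ctx instanceof LeafContext) {
      return 1;
    }
    int count = 0;
    for (ReaderContext child : ((CompositeContext) ctx).children) {
      count += countLeaves(child);
    }
    return count;
  }
}
```

Writing `class OtherContext extends ReaderContext {}` in this sketch would fail to compile. That is the property the original constructor check only enforced at runtime.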

[GitHub] [lucene] JarvisCraft opened a new pull request, #12295: Use `instanceof` pattern-matching where possible

2023-05-15 Thread via GitHub
JarvisCraft opened a new pull request, #12295: URL: https://github.com/apache/lucene/pull/12295 ### Description This PR enables the usage of `instanceof` pattern matching wherever possible (without changing semantics) reducing error-proneness and potentially enhancing readability.
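Here is a before/after sketch of the kind of rewrite the PR describes, using `instanceof` pattern matching from Java 16+. `PatternMatchingDemo` is a hypothetical class, not code from the PR.

```java
final class PatternMatchingDemo {
  // Before: type test and cast are separate, so the cast type can drift from the tested type.
  static int lengthOld(Object o) {
    if (o instanceof CharSequence) {
      CharSequence cs = (CharSequence) o;
      return cs.length();
    }
    return -1;
  }

  // After: the binding variable cs exists only where the type test succeeded.
  static int lengthNew(Object o) {
    if (o instanceof CharSequence cs) {
      return cs.length();
    }
    return -1;
  }
}
```

Both methods behave identically. This matches the PR's stated goal of not changing semantics.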

[GitHub] [lucene] mikemccand commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
mikemccand commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548617786 > @tang-hi @uschindler @mikemccand I think now that there are no more recursive algorithms in `src/java` we can now move `Operations.MAX_RECURSION_LEVEL` to `AutomatonTestUtil` in `sr

[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548450784 Policeman Jenkins looks fine: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/845/console

[GitHub] [lucene] rmuir commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
rmuir commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548401420 @tang-hi @uschindler @mikemccand I think now that there are no more recursive algorithms in `src/java` we can now move `Operations.MAX_RECURSION_LEVEL` to `AutomatonTestUtil` in `src/test`

[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548251301 Hi @mcimadamore, maybe also have a quick look. Thanks, Uwe

[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1548241540 I will set up testing on Policeman Jenkins soon!

[GitHub] [lucene] uschindler opened a new pull request, #12294: Implement MMapDirectory with Java 21 Project Panama Preview API

2023-05-15 Thread via GitHub
uschindler opened a new pull request, #12294: URL: https://github.com/apache/lucene/pull/12294 This is the Java 21 version of MemorySegments following [JEP 442](https://openjdk.org/jeps/442). There are not many changes: - Update scriptDepVersion's ASM to 9.5 and extract the preview

[GitHub] [lucene] runningcode opened a new pull request, #12293: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-15 Thread via GitHub
runningcode opened a new pull request, #12293: URL: https://github.com/apache/lucene/pull/12293 Description This PR publishes a build scan for every CI build on Jenkins and GitHub Actions and for every local build from an authenticated Apache committer. The build will not fail if publish

[GitHub] [lucene] runningcode commented on pull request #12266: Capture build scans on ge.apache.org to benefit from deep build insights

2023-05-15 Thread via GitHub
runningcode commented on PR #12266: URL: https://github.com/apache/lucene/pull/12266#issuecomment-1548170425 @risdenk Yes that is certainly possible. I've opened a PR here to do this: https://github.com/apache/lucene/pull/12293 Feel free to close this PR in favor of the other one.

[GitHub] [lucene] uschindler merged pull request #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub
uschindler merged PR #12292: URL: https://github.com/apache/lucene/pull/12292

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548141965 Hi, I had to fix the test to use `TestUtil#nextInt(min, max)` as Java 11 has no two-parameter `Random#nextInt(origin, bound)`. I had to subtract 1 from bound as it is inclusive in
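The two-argument `Random#nextInt(origin, bound)` arrived in Java 17 through `java.util.random.RandomGenerator` and excludes `bound`. The comment says the replacement `TestUtil#nextInt(min, max)` treats its upper end as inclusive, hence the `- 1`. The sketch below uses only the Java 11 `Random#nextInt(int)` API. `RandomRange` and its methods are hypothetical helpers, not Lucene's `TestUtil`.

```java
import java.util.Random;

final class RandomRange {
  // Hypothetical Java 11-compatible helper: uniform int in [min, max], both ends inclusive.
  // Assumes max - min + 1 does not overflow int.
  static int nextIntInclusive(Random r, int min, int max) {
    if (min > max) {
      throw new IllegalArgumentException("min > max");
    }
    return min + r.nextInt(max - min + 1);
  }

  // Same range as Java 17's Random#nextInt(origin, bound), which excludes bound:
  // subtract 1 because the inclusive helper includes its upper end.
  static int nextIntExclusive(Random r, int origin, int bound) {
    return nextIntInclusive(r, origin, bound - 1);
  }
}
```

Forgetting the `- 1` would occasionally produce `bound` itself. A randomized test might only hit that off-by-one rarely.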

[GitHub] [lucene] tang-hi commented on pull request #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub
tang-hi commented on PR #12292: URL: https://github.com/apache/lucene/pull/12292#issuecomment-1548096138 > I will merge that later, no need to add changes.txt. If you really like, you can of course add this PR's issue number to the existing change log entry. It's okay to merge without

[GitHub] [lucene] uschindler commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
uschindler commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548087039 Also here: https://github.com/apache/lucene/blob/5d203f8337cb6a2350c1abe5d83e3e103d060645/lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java#L119

[GitHub] [lucene] uschindler commented on pull request #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub
uschindler commented on PR #12292: URL: https://github.com/apache/lucene/pull/12292#issuecomment-1548082691 I will merge that later, no need to add changes.txt. If you really like, you can of course add this PR's issue number to the existing change log entry.

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548081406 Looks fine, I can merge that later, no need for additional changes.txt.

[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548069204 I think adding a comment would be great. I have already submitted a new pull request. @uschindler @mikemccand

[GitHub] [lucene] tang-hi opened a new pull request, #12292: Update Javadoc for topoSortStates method

2023-05-15 Thread via GitHub
tang-hi opened a new pull request, #12292: URL: https://github.com/apache/lucene/pull/12292 ### Description Update javadoc based on comment in #12286

[GitHub] [lucene] JerryChin commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
JerryChin commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548065374 Hi @tang-hi, I can submit a PR to fix this issue.

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548033460 I think we have the same problem with other methods in this class that were transformed to be iterative earlier. So we should maybe make a comment about what types of automatons cou

[GitHub] [lucene] mikemccand commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
mikemccand commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1548022505 I think the stoplist loader already ignores comment lines, but does not ignore empty lines! Darned empty string rears its head at us again...

[GitHub] [lucene] mikemccand commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
mikemccand commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548020174 I think the RAM consumption is OK, but, we should clearly advertise it in the javadocs for this method? Since we detect cycles we will never have an "attempt to use infinite RAM".

[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1548011512 > One additional thing as my last comment: We moved from recursive to iterative, but we still have a stack (deque). It is not so limited like the OS stack by the Java VM, but still for s

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547983951 One additional thing as my last comment: We moved from recursive to iterative, but we still have a stack (deque). It is not so limited like the OS stack by the Java VM, but still for
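Below is a minimal sketch of an iterative depth-first topological sort that uses an explicit `ArrayDeque`. It illustrates the trade-off discussed here: there is no recursion, but the deque can still hold up to one entry per reachable node. It works over a plain adjacency array in which `edges[node]` lists the successor nodes. It fails on a reachable cycle rather than looping. It is not Lucene's `Operations#topoSortStates`, and `IterativeTopoSort` is a hypothetical class.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class IterativeTopoSort {
  private static final int UNVISITED = 0, ON_STACK = 1, DONE = 2;

  // Returns the nodes reachable from start in topological order (reverse post-order).
  // Memory: the deque holds each node at most once, so it is bounded by the reachable nodes.
  static int[] sort(int[][] edges, int start) {
    int[] state = new int[edges.length];
    int[] nextEdge = new int[edges.length]; // per node: index of the next successor to explore
    int[] order = new int[edges.length];
    int upto = edges.length; // order is filled from the back
    Deque<Integer> stack = new ArrayDeque<>();
    stack.push(start);
    state[start] = ON_STACK;
    while (!stack.isEmpty()) {
      int node = stack.peek();
      if (nextEdge[node] < edges[node].length) {
        int target = edges[node][nextEdge[node]++];
        if (state[target] == ON_STACK) {
          // target is an ancestor on the current path, so the graph has a cycle
          throw new IllegalArgumentException("cycle through node " + target);
        }
        if (state[target] == UNVISITED) {
          state[target] = ON_STACK;
          stack.push(target);
        }
      } else {
        // all successors are finished, so node precedes them in the topological order
        stack.pop();
        state[node] = DONE;
        order[--upto] = node;
      }
    }
    // unreachable nodes were never placed; the result occupies order[upto..]
    return Arrays.copyOfRange(order, upto, edges.length);
  }
}
```

For `edges = {{1, 2}, {3}, {3}, {}}` and `start = 0`, this returns `[0, 2, 1, 3]`. Each node goes before all of its successors.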

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547980379 Thanks @tang-hi for the nice work!

[GitHub] [lucene] uschindler commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
uschindler commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1547978601 In general, I'd suggest we figure out whether we should change the stopword file parser to strip blank lines like comments?
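Below is a minimal sketch of the parser change suggested here, which skips blank lines the same way as comment lines so the empty string never becomes a stopword. `StopwordParser` is hypothetical and is not Lucene's `WordlistLoader`. Trimming each line before the comment check is an assumption about the desired behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashSet;
import java.util.Set;

final class StopwordParser {
  // Hypothetical loader: one word per line; skips comment lines AND lines that are blank
  // after trimming (assumption: surrounding whitespace is not meaningful).
  static Set<String> load(Reader in, String commentPrefix) throws IOException {
    Set<String> words = new LinkedHashSet<>();
    try (BufferedReader br = new BufferedReader(in)) {
      String line;
      while ((line = br.readLine()) != null) {
        String word = line.trim();
        if (word.isEmpty() || word.startsWith(commentPrefix)) {
          continue; // blank lines are treated like comments
        }
        words.add(word);
      }
    }
    return words;
  }
}
```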

[GitHub] [lucene] tang-hi commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547976361 Thank you everyone for your valuable comments on my PR.

[GitHub] [lucene] tang-hi commented on issue #12291: Unnecessary blank lines found in stopwords.txt of SmartChineseAnalyzer

2023-05-15 Thread via GitHub
tang-hi commented on issue #12291: URL: https://github.com/apache/lucene/issues/12291#issuecomment-1547968987 Good Catch! Could you submit a PR to fix that?

[GitHub] [lucene] uschindler merged pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler merged PR #12286: URL: https://github.com/apache/lucene/pull/12286

[GitHub] [lucene] tang-hi commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
tang-hi commented on code in PR #12286: URL: https://github.com/apache/lucene/pull/12286#discussion_r1193672636 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -1303,24 +1307,49 @@ public static int[] topoSortStates(Automaton a) { return stat

[GitHub] [lucene] romseygeek commented on a diff in pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
romseygeek commented on code in PR #12286: URL: https://github.com/apache/lucene/pull/12286#discussion_r1193625395 ## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ## @@ -1303,24 +1307,49 @@ public static int[] topoSortStates(Automaton a) { return s

[GitHub] [lucene] uschindler commented on pull request #12286: toposort use iterator to avoid stackoverflow

2023-05-15 Thread via GitHub
uschindler commented on PR #12286: URL: https://github.com/apache/lucene/pull/12286#issuecomment-1547398338 I can take care of that, but I would like to get a final "go" by @rmuir. :-) Uwe