[GitHub] [lucene] dweiss commented on pull request #394: LUCENE-9997: write release revision to system temp dir
dweiss commented on pull request #394: URL: https://github.com/apache/lucene/pull/394#issuecomment-946429811 Thanks Tomoko. What is this file for though? Is it really needed at all (can't it be a variable)? It'd also help to use Python's temporary-file facilities so that it's system-agnostic (I don't believe this will even run on Windows properly, but we can try not to make it worse). https://docs.python.org/3/library/tempfile.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
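A minimal sketch of the temporary-file suggestion above, assuming the release script only needs to hand a revision string over to a later step. The helper names `write_release_revision` and `read_release_revision` and the file prefix are hypothetical and are not taken from the PR; only the `tempfile` module itself comes from the comment.

```python
import os
import tempfile


def write_release_revision(revision: str) -> str:
    """Write the revision to a new file in the platform temp dir; return its path."""
    # mkstemp resolves the OS-specific temp directory (TMPDIR, TEMP, TMP, ...)
    # and creates a uniquely named file, so no path is hard-coded.
    fd, path = tempfile.mkstemp(prefix="lucene-release-rev-", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        out.write(revision + "\n")
    return path


def read_release_revision(path: str) -> str:
    """Read back the revision written by write_release_revision."""
    with open(path, encoding="utf-8") as src:
        return src.read().strip()
```

If the value is only consumed within the same Python process, a plain variable would do, which is the other option raised in the comment.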
[jira] [Commented] (LUCENE-10185) gradle check fails on java 17 (security manager deprecation)
[ https://issues.apache.org/jira/browse/LUCENE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430362#comment-17430362 ]

Dawid Weiss commented on LUCENE-10185:
--

Oh... completely forgot about polymorphic signatures ("Method handle compilation", [1])... Damn, this is complex. Thanks for digging deep, Uwe.

[1] https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/lang/invoke/MethodHandle.html

> gradle check fails on java 17 (security manager deprecation)
>
> Key: LUCENE-10185
> URL: https://issues.apache.org/jira/browse/LUCENE-10185
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Priority: Major
> Fix For: main (9.0)
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> I don't think we should add SuppressWarnings here, instead fix our ECJ linter configuration. Seems like we should be specifying something similar to "-release 11" and it shouldn't care about the new deprecations from java 17. Or if we can't do that, maybe we should disable the "deprecated for removal" check in ECJ entirely?
> {noformat}
> > Task :lucene:core:ecjLintMain
> --
> 1. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/util/NamedThreadFactory.java (at line 42)
> final SecurityManager s = System.getSecurityManager();
> ^^^
> The type SecurityManager has been deprecated since version 17 and marked for removal
> --
> 2. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/util/NamedThreadFactory.java (at line 42)
> final SecurityManager s = System.getSecurityManager();
> The method getSecurityManager() from the type System has been deprecated since version 17 and marked for removal
> --
> 3. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/util/NamedThreadFactory.java (at line 43)
> group = (s != null) ? s.getThreadGroup() : Thread.currentThread().getThreadGroup();
> The method getThreadGroup() from the type SecurityManager has been deprecated and marked for removal
> --
> --
> 4. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java (at line 23)
> import java.security.AccessControlException;
> The type AccessControlException has been deprecated since version 17 and marked for removal
> --
> 5. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java (at line 24)
> import java.security.AccessController;
> ^^
> The type AccessController has been deprecated since version 17 and marked for removal
> --
> 6. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java (at line 574)
> AccessController.doPrivileged((PrivilegedAction) target::getDeclaredFields);
> The type AccessController has been deprecated since version 17 and marked for removal
> --
> 7. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java (at line 574)
> AccessController.doPrivileged((PrivilegedAction) target::getDeclaredFields);
> ^^^
> The method doPrivileged(PrivilegedAction) from the type AccessController has been deprecated and marked for removal
> --
> 8. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java (at line 575)
> } catch (AccessControlException e) {
> ^^
> The type AccessControlException has been deprecated since version 17 and marked for removal
> --
> --
> 9. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java (at line 33)
> import java.security.AccessController;
> ^^
> The type AccessController has been deprecated since version 17 and marked for removal
> --
> 10. ERROR in /home/rmuir/workspace/lucene/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java (at line 337)
> AccessController.doPrivileged((PrivilegedAction) MMapDirectory::unmapHackImpl);
> The type AccessController has been deprecated since version 17 and marked for removal
> --
[GitHub] [lucene] dweiss commented on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
dweiss commented on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-946437537

> Or the script could run the main build in parallel and then run just the signing serially.

Sure. Split in two - I also suggested removing "clean" because just rebuilding from scratch should always yield correct task outputs (this isn't ant). So this sequence:

```
gradlew assembleRelease
gradlew assembleRelease -Psign --max-workers 1
```

will rerun some tasks but will sign in a single worker. We could also order all signing tasks within gradle code (so that they can't run in parallel, no matter what) but it seems like an unnecessary complexity given the infrequent use of the script. I'd rather do the above (or fall back to just specifying --max-workers X, where X is small-ish for gpg signing).
[GitHub] [lucene] dweiss edited a comment on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
dweiss edited a comment on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-946437537

> Or the script could run the main build in parallel and then run just the signing serially.

Sure. Split in two - I also suggested removing "clean" because rebuilding from any state should always yield correct task outputs (this isn't ant where you have to clean leftovers over and over). So this sequence:

```
gradlew assembleRelease
gradlew assembleRelease -Psign --max-workers 1
```

will rerun some tasks but will sign in a single worker. We could also order all signing tasks within gradle code (so that they can't run in parallel, no matter what) but it seems like an unnecessary complexity given the infrequent use of the script. I'd rather do the above (or fall back to just specifying --max-workers X, where X is small-ish for gpg signing).
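The two-step sequence could be driven from a Python release script roughly as follows. This is only a sketch under stated assumptions: the function names `release_commands` and `run_release` and the injectable `runner` parameter are hypothetical and do not reflect how buildAndPushRelease.py is actually written; the task name and flags are the ones quoted in the comment.

```python
import subprocess


def release_commands(gradlew="./gradlew", sign_workers=1):
    """Return the two gradle invocations: parallel build first, throttled signing second."""
    return [
        # First pass: build all release artifacts with gradle's default parallelism.
        [gradlew, "assembleRelease"],
        # Second pass: outputs are mostly up to date, so this mainly signs,
        # with the worker count capped to keep gpg load low.
        [gradlew, "assembleRelease", "-Psign", "--max-workers", str(sign_workers)],
    ]


def run_release(cwd, runner=subprocess.run):
    """Run both passes in order; check=True aborts on the first failing gradle run."""
    for cmd in release_commands():
        runner(cmd, cwd=cwd, check=True)
```

Passing the commands as argument lists avoids shell quoting issues, and the `runner` parameter exists only so the sequence can be exercised without invoking gradle.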
[jira] [Commented] (LUCENE-10166) Move relevant content of README.txt files from subprojects into package javadocs
[ https://issues.apache.org/jira/browse/LUCENE-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430375#comment-17430375 ] ASF subversion and git services commented on LUCENE-10166: -- Commit e290f91bb233f33cde4b2249d676298d5740e8b1 in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e290f91 ] LUCENE-10166: removed module-level README.txt and modified a few links, removed a few obsolete instructions from 20 years ago. (#379) > Move relevant content of README.txt files from subprojects into package > javadocs > > > Key: LUCENE-10166 > URL: https://issues.apache.org/jira/browse/LUCENE-10166 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10186) Maven artifacts built by gradle lack required META-INF content
Jan Høydahl created LUCENE-10186:

Summary: Maven artifacts built by gradle lack required META-INF content
Key: LUCENE-10186
URL: https://issues.apache.org/jira/browse/LUCENE-10186
Project: Lucene - Core
Issue Type: Bug
Components: general/build
Reporter: Jan Høydahl

Spinoff from LUCENE-9997

Turns out that the maven artifacts generated by gradle lack LICENSE and NOTICE files in META-INF, and also have empty MANIFEST.MF. Smoketester error:

{code:java}
RuntimeError: JAR file "/tmp/smoke_lucene_9.0.0_018642ff84f88a2438b32d6aca5d5d35f453e1fb_2/maven/org/apache/lucene/lucene-analysis-smartcn/9.0.0/lucene-analysis-smartcn-9.0.0-sources.jar" is missing META-INF/NOTICE.txt
{code}
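The kind of check that produces such an error can be sketched with the standard `zipfile` module. This is not the smoke tester's actual code. The function name `missing_legal_files` is hypothetical, and `META-INF/LICENSE.txt` is an assumption by analogy with the `META-INF/NOTICE.txt` entry named in the error message.

```python
import zipfile

# NOTICE.txt is the entry named in the error above; LICENSE.txt is assumed.
REQUIRED_ENTRIES = ("META-INF/LICENSE.txt", "META-INF/NOTICE.txt")


def missing_legal_files(jar_path):
    """Return the required META-INF entries that are absent from the JAR."""
    with zipfile.ZipFile(jar_path) as jar:
        names = set(jar.namelist())
    return [entry for entry in REQUIRED_ENTRIES if entry not in names]
```

An empty list means the JAR carries both files; anything else names what the artifact is missing.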
[GitHub] [lucene] dweiss merged pull request #379: LUCENE-10166: removed module-level README.txt and modified a few links.
dweiss merged pull request #379: URL: https://github.com/apache/lucene/pull/379
[jira] [Commented] (LUCENE-10186) Maven artifacts built by gradle lack required META-INF content
[ https://issues.apache.org/jira/browse/LUCENE-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430376#comment-17430376 ] Dawid Weiss commented on LUCENE-10186: -- I'll handle this today, Jan. > Maven artifacts built by gradle lack required META-INF content > -- > > Key: LUCENE-10186 > URL: https://issues.apache.org/jira/browse/LUCENE-10186 > Project: Lucene - Core > Issue Type: Bug > Components: general/build >Reporter: Jan Høydahl >Priority: Major
[jira] [Resolved] (LUCENE-10166) Move relevant content of README.txt files from subprojects into package javadocs
[ https://issues.apache.org/jira/browse/LUCENE-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10166. -- Fix Version/s: main (9.0) Assignee: Dawid Weiss Resolution: Fixed > Move relevant content of README.txt files from subprojects into package > javadocs > > > Key: LUCENE-10166 > URL: https://issues.apache.org/jira/browse/LUCENE-10166 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > Fix For: main (9.0)
[jira] [Assigned] (LUCENE-10186) Maven artifacts built by gradle lack required META-INF content
[ https://issues.apache.org/jira/browse/LUCENE-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss reassigned LUCENE-10186: Assignee: Dawid Weiss > Maven artifacts built by gradle lack required META-INF content > -- > > Key: LUCENE-10186 > URL: https://issues.apache.org/jira/browse/LUCENE-10186 > Project: Lucene - Core > Issue Type: Bug > Components: general/build >Reporter: Jan Høydahl >Assignee: Dawid Weiss >Priority: Major
[jira] [Commented] (LUCENE-10186) Maven artifacts built by gradle lack required META-INF content
[ https://issues.apache.org/jira/browse/LUCENE-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430377#comment-17430377 ] Dawid Weiss commented on LUCENE-10186: -- Do source packages need a manifest though? Seems odd to me. These are not really binary JARs - they're convenience for IDEs? > Maven artifacts built by gradle lack required META-INF content > -- > > Key: LUCENE-10186 > URL: https://issues.apache.org/jira/browse/LUCENE-10186 > Project: Lucene - Core > Issue Type: Bug > Components: general/build >Reporter: Jan Høydahl >Assignee: Dawid Weiss >Priority: Major
[jira] [Commented] (LUCENE-10186) Maven artifacts built by gradle lack required META-INF content
[ https://issues.apache.org/jira/browse/LUCENE-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430378#comment-17430378 ]

Dawid Weiss commented on LUCENE-10186:
--

We actually explicitly skip those JARs from receiving the manifest:

{code}
// Apply the manifest to any JAR or WAR file created by any project,
// excluding those explicitly listed.
tasks.withType(Jar)
    .matching { t -> !["sourcesJar", "javadocJar"].contains(t.name) }
{code}

> Maven artifacts built by gradle lack required META-INF content
>
> Key: LUCENE-10186
> URL: https://issues.apache.org/jira/browse/LUCENE-10186
> Project: Lucene - Core
> Issue Type: Bug
> Components: general/build
> Reporter: Jan Høydahl
> Assignee: Dawid Weiss
> Priority: Major
[GitHub] [lucene] janhoy commented on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
janhoy commented on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-946453138

@mocobeta I got the same as you - looks like LICENSE and NOTICE are not copied into the maven jars. Strange, since they exist in the binary-release jars. It looks like the maven task rebuilds the jars. Couldn't the maven task use the pre-built jars that already have NOTICE and LICENSE?

I also notice that `MANIFEST.MF` is effectively empty in the maven jars, containing just one line, which is also wrong:

```
Manifest-Version: 1.0
```

I created https://issues.apache.org/jira/browse/LUCENE-10186 for maven artifacts. Re-opened LUCENE-10174 for the buildAndPushRelease changes.
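A manifest that contains nothing but `Manifest-Version` can be detected with a small parser. The sketch below assumes a UTF-8 manifest and only reads the main section; the function names `manifest_attributes` and `is_effectively_empty` are hypothetical and not part of any Lucene script.

```python
import zipfile


def manifest_attributes(jar_path):
    """Parse the main section of META-INF/MANIFEST.MF into a dict."""
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attrs = {}
    key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            attrs[key] += line[1:]  # continuation lines start with one space
            continue
        key, _, value = line.partition(": ")
        attrs[key] = value
    return attrs


def is_effectively_empty(jar_path):
    """True when the manifest declares nothing beyond Manifest-Version."""
    return set(manifest_attributes(jar_path)) <= {"Manifest-Version"}
```

A release check could call `is_effectively_empty` on every published JAR and fail when it returns `True`.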
[jira] [Reopened] (LUCENE-10174) Update buildAndPushRelease.py for new gradle build
[ https://issues.apache.org/jira/browse/LUCENE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl reopened LUCENE-10174:
--

Re-opening to enhance the 'assembleRelease' step, to avoid OOM in gpg-agent. Will split in two commands and skip 'clean':

{code:java}
gradlew assembleRelease ...
gradlew assembleRelease -Psign --max-workers 1 ...
{code}

> Update buildAndPushRelease.py for new gradle build
>
> Key: LUCENE-10174
> URL: https://issues.apache.org/jira/browse/LUCENE-10174
> Project: Lucene - Core
> Issue Type: Sub-task
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Priority: Major
> Fix For: main (9.0)
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> With LUCENE-9488 and LUCENE-10173 the gradle build was polished to properly build source and binary artifacts, and sign those using either gpg tool or a built-in java-based signing plugin. See [https://github.com/apache/lucene/blob/main/help/publishing.txt]
> This jira will update {{buildAndPushRelease.py}} script to use the correct build parameters. It will also add cmdline args to choose between gpg and built-in (gpg default), and to supply the location of {{gpgHome}} if you do not use gpg. We'll also add an option to NOT prompt for passphrase in the python script, which will fallback to defaults (gpg-agent, env.vars or gradle.properties).
[jira] [Commented] (LUCENE-10186) Maven artifacts built by gradle lack required META-INF content
[ https://issues.apache.org/jira/browse/LUCENE-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430383#comment-17430383 ] Dawid Weiss commented on LUCENE-10186: -- I filed a PR. I think this was intentional (by me) not to include manifests in these files - I didn't see the point. > Maven artifacts built by gradle lack required META-INF content > -- > > Key: LUCENE-10186 > URL: https://issues.apache.org/jira/browse/LUCENE-10186 > Project: Lucene - Core > Issue Type: Bug > Components: general/build >Reporter: Jan Høydahl >Assignee: Dawid Weiss >Priority: Major
[jira] [Commented] (LUCENE-10186) Maven artifacts built by gradle lack required META-INF content
[ https://issues.apache.org/jira/browse/LUCENE-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430386#comment-17430386 ] ASF subversion and git services commented on LUCENE-10186: -- Commit 6c21862a552cccbb8509e4383ac8c6d10c68137f in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6c21862 ] LUCENE-10186: Include manifest and legalese in source and javadoc jars. (#395) > Maven artifacts built by gradle lack required META-INF content > -- > > Key: LUCENE-10186 > URL: https://issues.apache.org/jira/browse/LUCENE-10186 > Project: Lucene - Core > Issue Type: Bug > Components: general/build >Reporter: Jan Høydahl >Assignee: Dawid Weiss >Priority: Major
[GitHub] [lucene] dweiss merged pull request #395: LUCENE-10186: Include manifest and legalese in source and javadoc jars.
dweiss merged pull request #395: URL: https://github.com/apache/lucene/pull/395
[jira] [Resolved] (LUCENE-10186) Maven artifacts built by gradle lack required META-INF content
[ https://issues.apache.org/jira/browse/LUCENE-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10186. -- Fix Version/s: main (9.0) Resolution: Fixed > Maven artifacts built by gradle lack required META-INF content > -- > > Key: LUCENE-10186 > URL: https://issues.apache.org/jira/browse/LUCENE-10186 > Project: Lucene - Core > Issue Type: Bug > Components: general/build >Reporter: Jan Høydahl >Assignee: Dawid Weiss >Priority: Major > Fix For: main (9.0) > > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10174) Update buildAndPushRelease.py for new gradle build
[ https://issues.apache.org/jira/browse/LUCENE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430393#comment-17430393 ] Jan Høydahl commented on LUCENE-10174: -- When we skip the "clean" step, the task does not need to re-compile, only to assemble the tar/zip and sign. I tested locally and {{--max-workers 8}} (my default) results in OOM, but {{--max-workers 4}} works fine. So I'll settle on {{--max-workers 2}}, which takes less than a minute on my laptop and should be acceptable for a release. The important thing to parallelize is the tests. > Update buildAndPushRelease.py for new gradle build > -- > > Key: LUCENE-10174 > URL: https://issues.apache.org/jira/browse/LUCENE-10174 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: main (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h
[GitHub] [lucene] janhoy opened a new pull request #396: LUCENE-10174 BuildAndPushRelease additional improvements
janhoy opened a new pull request #396: URL: https://github.com/apache/lucene/pull/396 https://issues.apache.org/jira/browse/LUCENE-10174 Makes assembleRelease OOM safe with max-workers=2 and faster by avoiding 'clean'
[jira] [Comment Edited] (LUCENE-10174) Update buildAndPushRelease.py for new gradle build
[ https://issues.apache.org/jira/browse/LUCENE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430393#comment-17430393 ] Jan Høydahl edited comment on LUCENE-10174 at 10/19/21, 8:23 AM: - When we skip the "clean" step, the task does not need to re-compile, only to assemble the tar/zip and sign. I tested locally and {{--max-workers 8}} (my default) results in OOM, but {{\--max-workers 4}} works fine. So I'll settle on {{\--max-workers 2}} which on my laptop takes less than a minute, which should be acceptable for a release. The important thing to parallelize is the tests. was (Author: janhoy): When we skip "clean" step, the task does not need to re-compile, only to assemble the tar/zip and sign. I tested locally and {{--max-workers 8}} (my default) results in OOM, but {{--max-workers 4}} works fine. So I'll settle on {{--max-workers 2}} which on my laptop takes less than a minute, which should be acceptable for a release. The important thing to parallelize is the tests. > Update buildAndPushRelease.py for new gradle build > -- > > Key: LUCENE-10174 > URL: https://issues.apache.org/jira/browse/LUCENE-10174 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: main (9.0) > > Time Spent: 40m > Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10174) Update buildAndPushRelease.py for new gradle build
[ https://issues.apache.org/jira/browse/LUCENE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430396#comment-17430396 ] Jan Høydahl commented on LUCENE-10174: -- See PR https://github.com/apache/lucene/pull/396 > Update buildAndPushRelease.py for new gradle build > -- > > Key: LUCENE-10174 > URL: https://issues.apache.org/jira/browse/LUCENE-10174 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: main (9.0) > > Time Spent: 40m > Remaining Estimate: 0h
[GitHub] [lucene] janhoy edited a comment on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
janhoy edited a comment on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-946453138

@mocobeta I got the same as you - looks like LICENSE and NOTICE are not copied into the maven jars. Strange, since they exist in the binary-release jars. It looks like the maven task rebuilds the jars. Couldn't the maven task use the pre-built jars that already have NOTICE and LICENSE?

I also notice that `MANIFEST.MF` is effectively empty in the maven jars, containing just one line, which is also wrong:

```
Manifest-Version: 1.0
```

I created https://issues.apache.org/jira/browse/LUCENE-10186 for maven artifacts. Re-opened LUCENE-10174 for the buildAndPushRelease changes, https://github.com/apache/lucene/pull/396.
[GitHub] [lucene] janhoy commented on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
janhoy commented on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-946481179 @mocobeta I think you asked whether the `./gradlew` commands are windows safe. I really don't know. I see tons of `/` in that script so perhaps python translates it? I don't use Windows so cannot test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10182) TestRamUsageEstimator asserts trivial equality
[ https://issues.apache.org/jira/browse/LUCENE-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-10182. Resolution: Fixed > TestRamUsageEstimator asserts trivial equality > -- > > Key: LUCENE-10182 > URL: https://issues.apache.org/jira/browse/LUCENE-10182 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Stefan Vodita >Assignee: Uwe Schindler >Priority: Major > Fix For: main (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > {{TestRamUsageEstimator.testStaticOverloads}} has serveral lines like: > {code:java} > assertEquals(sizeOf(array), sizeOf((Object) array)); > {code} > Both calls to {{sizeOf()}} fall back on {{RamUsageTester.sizeOf}}, making the > 2 calls identical. Instead, we would want one of the calls to go to > {{RamUsageEstimator.sizeOf}}. > > This issue came up while working on LUCENE-10129. A possible solution, as per > [~uschindler]'s suggestion, would be to remove the static import > {code:java} > import static org.apache.lucene.util.RamUsageTester.sizeOf; > {code} > Instead, we could be explicit on which method we are calling, like: > {code:java} > assertEquals(RamUsageEstimator.sizeOf(array), RamUsageTester.sizeOf(array)); > {code} > This could be replicated for other potentially confusing cases in the test > class. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10182) TestRamUsageEstimator asserts trivial equality
[ https://issues.apache.org/jira/browse/LUCENE-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430408#comment-17430408 ] Uwe Schindler commented on LUCENE-10182: Hi, I think we should not backport the changes. The last Lucene/Solr 8.11 release is on the go already, so it is not worth the trouble. The code may be untested, but that does not mean there's a bug in productive code. What I figured out when reading through the patch again: In TestRamUsageEstimator, the order of assertEquals is wrong: The expected value should come first (what RamUsageTester returns) and the value which we want to verify (the RamUsageEstimator static overload) should be second parameter. But that's just nitpicking. If you want to fix, make a PR. So I would close this issue now. > TestRamUsageEstimator asserts trivial equality > -- > > Key: LUCENE-10182 > URL: https://issues.apache.org/jira/browse/LUCENE-10182 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Stefan Vodita >Assignee: Uwe Schindler >Priority: Major > Fix For: main (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > {{TestRamUsageEstimator.testStaticOverloads}} has serveral lines like: > {code:java} > assertEquals(sizeOf(array), sizeOf((Object) array)); > {code} > Both calls to {{sizeOf()}} fall back on {{RamUsageTester.sizeOf}}, making the > 2 calls identical. Instead, we would want one of the calls to go to > {{RamUsageEstimator.sizeOf}}. > > This issue came up while working on LUCENE-10129. A possible solution, as per > [~uschindler]'s suggestion, would be to remove the static import > {code:java} > import static org.apache.lucene.util.RamUsageTester.sizeOf; > {code} > Instead, we could be explicit on which method we are calling, like: > {code:java} > assertEquals(RamUsageEstimator.sizeOf(array), RamUsageTester.sizeOf(array)); > {code} > This could be replicated for other potentially confusing cases in the test > class. 
[jira] [Commented] (LUCENE-9997) Revisit smoketester for 9.0 build
[ https://issues.apache.org/jira/browse/LUCENE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430415#comment-17430415 ] Dawid Weiss commented on LUCENE-9997: - I ran the full check with local keys and dev mode (on jan/lucene9997-smoketester-part-2). {code} SUCCESS! [0:07:30.308004] {code} Two things that are odd: - r-- permissions on all maven artifact files - these are slightly odd and prevent those files from being removed (from /tmp). - that 'rev.txt' file is annoying. I'm not sure what it's for and don't have the time to check, but it looks like a bug. > Revisit smoketester for 9.0 build > - > > Key: LUCENE-9997 > URL: https://issues.apache.org/jira/browse/LUCENE-9997 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Robert Muir >Priority: Major > Attachments: image-2021-10-12-12-47-11-480.png, > image-2021-10-12-12-48-15-373.png > > Time Spent: 7h 50m > Remaining Estimate: 0h > > Currently we have a (great) {{dev-tools/scripts/smokeTester.py}} that will > perform automated tests against a release. > This was developed with the ant build process in mind. > This issue is just about considering the automated checks we do here, maybe > some of them can be done efficiently in the gradle build in earlier places: > this would be a large improvement! > Obviously some of them (e.g. GPG release key verifications) are really > specific to the artifacts in question. These are most important to release > verification, as that is actually the only place we can check it. > Any other checks (and I do tend to think, this checker should try to be > thorough, invoking gradle etc), should be stuff we regularly test in > PRs/nightly/builds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
dweiss commented on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-946494828 I use Windows - these scripts are not compatible. I don't think we have to make it a priority to make them compatible. Too many variables to think of. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] stefanvodita opened a new pull request #397: LUCENE-10182: Order assertion parameters correctly
stefanvodita opened a new pull request #397: URL: https://github.com/apache/lucene/pull/397 # Description Reorder assert parameters in `TestRamUsageEstimator.testStaticOverloads` like `assertEquals(expected, actual)` instead of `assertEquals(actual, expected)`. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `main` branch. - [x] I have run `./gradlew check`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10182) TestRamUsageEstimator asserts trivial equality
[ https://issues.apache.org/jira/browse/LUCENE-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430430#comment-17430430 ] Stefan Vodita commented on LUCENE-10182: Might as well fix the assertion order, since it's a small change. [Here|https://github.com/apache/lucene/pull/397] is the PR for it. > TestRamUsageEstimator asserts trivial equality > -- > > Key: LUCENE-10182 > URL: https://issues.apache.org/jira/browse/LUCENE-10182 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Stefan Vodita >Assignee: Uwe Schindler >Priority: Major > Fix For: main (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > {{TestRamUsageEstimator.testStaticOverloads}} has serveral lines like: > {code:java} > assertEquals(sizeOf(array), sizeOf((Object) array)); > {code} > Both calls to {{sizeOf()}} fall back on {{RamUsageTester.sizeOf}}, making the > 2 calls identical. Instead, we would want one of the calls to go to > {{RamUsageEstimator.sizeOf}}. > > This issue came up while working on LUCENE-10129. A possible solution, as per > [~uschindler]'s suggestion, would be to remove the static import > {code:java} > import static org.apache.lucene.util.RamUsageTester.sizeOf; > {code} > Instead, we could be explicit on which method we are calling, like: > {code:java} > assertEquals(RamUsageEstimator.sizeOf(array), RamUsageTester.sizeOf(array)); > {code} > This could be replicated for other potentially confusing cases in the test > class. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
mocobeta commented on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-946544474 @janhoy @dweiss Commands with `./` do not work on Windows. Yesterday I tested it on my Windows OS; I saw that Command Prompt and PowerShell do not support `./gradlew`. After I noticed this comment in the file, I deleted the comment... sorry for the noise. https://github.com/apache/lucene/blob/6c21862a552cccbb8509e4383ac8c6d10c68137f/dev-tools/scripts/smokeTestRelease.py#L43-L45 I think it could be laborious to make the scripts OS-agnostic. As for Windows, instead of fully supporting Windows perhaps we could test it on WSL2 and then throw away Cygwin? I have little experience with it, but it seems to work just like plain Ubuntu and it's easier to install than Cygwin (its I/O performance was terrible a few years ago, but it should have improved...).
[jira] [Commented] (LUCENE-9997) Revisit smoketester for 9.0 build
[ https://issues.apache.org/jira/browse/LUCENE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430442#comment-17430442 ] Dawid Weiss commented on LUCENE-9997: - The rev.txt file is for running on a "prepared" package - it saves the git revision in a separate file because otherwise it wouldn't have any means to read it back from. {code} parser.add_argument('--no-prepare', dest='prepare', default=True, action='store_false', help='Use the already built release in the provided checkout') {code} I think we should make the git revision part of the distribution artifacts - then the smoke tester can read it directly from the distribution artifact release folder. Moreover, the git revision could also be part of the "source" distribution of Lucene - then the build scripts can be tweaked to actually work without the git clone (on the true "source" distribution) by simulating the git revision read from such a file. Thoughts? > Revisit smoketester for 9.0 build > - > > Key: LUCENE-9997 > URL: https://issues.apache.org/jira/browse/LUCENE-9997 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Robert Muir >Priority: Major > Attachments: image-2021-10-12-12-47-11-480.png, > image-2021-10-12-12-48-15-373.png > > Time Spent: 8h > Remaining Estimate: 0h > > Currently we have a (great) {{dev-tools/scripts/smokeTester.py}} that will > perform automated tests against a release. > This was developed with the ant build process in mind. > This issue is just about considering the automated checks we do here, maybe > some of them can be done efficiently in the gradle build in earlier places: > this would be a large improvement! > Obviously some of them (e.g. GPG release key verifications) are really > specific to the artifacts in question. These are most important to release > verification, as that is actually the only place we can check it. 
> Any other checks (and I do tend to think, this checker should try to be > thorough, invoking gradle etc), should be stuff we regularly test in > PRs/nightly/builds.
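A minimal Python sketch of the mechanism described in the comment above: when `--no-prepare` is given, the git revision has to be read back from somewhere, so it is kept in a small file. The `--no-prepare` argument is copied from the quoted snippet; the helper names (`build_parser`, `rev_file_path`, `save_revision`, `load_revision`) and the file name `lucene-release-rev.txt` are hypothetical and are not taken from the release scripts. The sketch assumes the file lives in the system temp directory as reported by `tempfile.gettempdir()`.

```python
import argparse
import os
import tempfile


def build_parser():
    """Build a parser with the flag quoted in the comment (other options omitted)."""
    parser = argparse.ArgumentParser(description='Hypothetical release helper')
    parser.add_argument('--no-prepare', dest='prepare', default=True,
                        action='store_false',
                        help='Use the already built release in the provided checkout')
    return parser


def rev_file_path(root=None):
    """Location of the revision file; the file name is an assumption."""
    return os.path.join(root or tempfile.gettempdir(), 'lucene-release-rev.txt')


def save_revision(rev, root=None):
    """Persist the git revision so that a later --no-prepare run can read it."""
    with open(rev_file_path(root), 'w', encoding='utf-8') as f:
        f.write(rev + '\n')


def load_revision(root=None):
    """Read back the revision written by save_revision."""
    with open(rev_file_path(root), encoding='utf-8') as f:
        return f.read().strip()
```

Shipping the revision inside the distribution artifacts, as proposed above, would make the temp file unnecessary; this sketch only shows the current file-based approach in a system-agnostic form.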
[GitHub] [lucene] mocobeta edited a comment on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
mocobeta edited a comment on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-946544474 @janhoy @dweiss Commands with `./` do not work on Windows. Yesterday I tested it on my Windows OS; I saw that Command Prompt and PowerShell do not support `./gradlew`, and Python didn't translate them for Windows. After I noticed this comment in the file, I deleted the comment... sorry for the noise. https://github.com/apache/lucene/blob/6c21862a552cccbb8509e4383ac8c6d10c68137f/dev-tools/scripts/smokeTestRelease.py#L43-L45 I think it could be laborious to make the scripts OS-agnostic. As for Windows, instead of fully supporting Windows perhaps we could test it on WSL2 and then throw away Cygwin? I have little experience with it, but it seems to work just like plain Ubuntu and it's easier to install than Cygwin (its I/O performance was terrible a few years ago, but it should have improved...).
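For illustration only, a Python sketch of what picking the Gradle wrapper per platform could look like. It is not what the smoke tester does; the function name `gradle_command` and its `os_name` parameter are hypothetical, and the sketch assumes the checkout contains both the `gradlew` and `gradlew.bat` wrapper scripts that the Gradle wrapper normally generates.

```python
import ntpath
import os
import posixpath


def gradle_command(checkout_dir, *tasks, os_name=os.name):
    """Return an argv list for the Gradle wrapper inside checkout_dir.

    os_name defaults to the running platform; it is a parameter only so
    that both branches can be exercised on any machine.
    """
    if os_name == 'nt':
        # Windows shells do not understand './gradlew'; use the batch wrapper.
        wrapper = ntpath.join(checkout_dir, 'gradlew.bat')
    else:
        wrapper = posixpath.join(checkout_dir, 'gradlew')
    return [wrapper, *tasks]
```

The returned list could be handed to `subprocess.run`. File permissions and the other platform differences mentioned in this thread are not addressed by this sketch.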
[GitHub] [lucene] dweiss commented on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
dweiss commented on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-946571894 I'm really fine with these scripts working just on Unix-ish systems. If you really want to, WSL or a virtual machine is a fine workaround for Windows users (like you or me). Like I said - too many variables to consider (file permissions are notoriously annoying to get right).
[GitHub] [lucene] apanimesh061 commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default
apanimesh061 commented on a change in pull request #362: URL: https://github.com/apache/lucene/pull/362#discussion_r731752215 ## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java ## @@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() { /** * Internally use the {@link Weight#matches(LeafReaderContext, int)} API for highlighting. It's - * more accurate to the query, though might not calculate passage relevancy as well. Use of this - * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link - * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default. + * more accurate to the query, and the snippets can be a little different for phrases because + * the whole phrase is marked up instead of each word. The passage relevancy calculation can be + * different (maybe worse?) and it's slower when highlighting many fields. Use of this flag + * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link + * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so long as the requirements Review comment: I am attaching a diff file here which contains the unit test changes for the default behavior and the changes I mentioned in the comment above: [LUCENE-9431.txt](https://github.com/apache/lucene/files/7372657/LUCENE-9431.txt). Meanwhile I am trying to figure out how to update this current pull-request. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dnhatn commented on pull request #389: LUCENE-10159: Fix invalid access in sorted set dv
dnhatn commented on pull request #389: URL: https://github.com/apache/lucene/pull/389#issuecomment-946649980 @rmuir @jpountz Thanks for reviewing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9613) Create blocks for ords when it helps in Lucene80DocValuesFormat
[ https://issues.apache.org/jira/browse/LUCENE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430493#comment-17430493 ] ASF subversion and git services commented on LUCENE-9613: - Commit 8b68bf60c9871ecb200f64c64bf55eb6ac456c0e in lucene's branch refs/heads/main from Nhat Nguyen [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8b68bf6 ] LUCENE-10159: Fix invalid access in sorted set dv (#389) We introduced invalid accesses for sorted set doc values in LUCENE-9613. However, the issue has been unnoticed because the ordinals in doc values tests aren't complex enough to use high packed bits, and the 3 padding bytes make these invalid accesses perfectly fine. To reproduce this issue, we need to use at least 20 bits per value for the ordinals. > Create blocks for ords when it helps in Lucene80DocValuesFormat > --- > > Key: LUCENE-9613 > URL: https://issues.apache.org/jira/browse/LUCENE-9613 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: main (9.0) > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently for sorted(-set) values, we always write ords using > log2(valueCount) bits per entry. However in several cases like when the field > is used in the index sort, or if one value is _very_common, splitting into > blocks like we do for numerics would help. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10159) Index corruption: IndexOutOfBoundsException for doc values
[ https://issues.apache.org/jira/browse/LUCENE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430492#comment-17430492 ] ASF subversion and git services commented on LUCENE-10159: -- Commit 8b68bf60c9871ecb200f64c64bf55eb6ac456c0e in lucene's branch refs/heads/main from Nhat Nguyen [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8b68bf6 ] LUCENE-10159: Fix invalid access in sorted set dv (#389) We introduced invalid accesses for sorted set doc values in LUCENE-9613. However, the issue has been unnoticed because the ordinals in doc values tests aren't complex enough to use high packed bits, and the 3 padding bytes make these invalid accesses perfectly fine. To reproduce this issue, we need to use at least 20 bits per value for the ordinals. > Index corruption: IndexOutOfBoundsException for doc values > -- > > Key: LUCENE-10159 > URL: https://issues.apache.org/jira/browse/LUCENE-10159 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Blocker > Time Spent: 2h 50m > Remaining Estimate: 0h > > Since we upgraded Elasticsearch to a Lucene 9 snaspshot, we have seen test > failures with the following stack trace. This looks like an issue with the > Lucene90 DocValuesFormat. > {noformat} > org.apache.lucene.index.MergePolicy$MergeException: > java.lang.IndexOutOfBoundsException > at > org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2340) > ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT] > at > org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) > ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT] > at > org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) > ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > [?:?] 
> at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > [?:?] > at java.lang.Thread.run(Thread.java:833) [?:?] > Caused by: java.lang.IndexOutOfBoundsException > at java.nio.Buffer.checkIndex(Buffer.java:749) ~[?:?] > at java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:692) ~[?:?] > at > org.apache.lucene.store.ByteBufferGuard.getInt(ByteBufferGuard.java:128) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readInt(ByteBufferIndexInput.java:591) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.util.packed.DirectReader$DirectPackedReader20.get(DirectReader.java:222) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.util.packed.DirectMonotonicReader.get(DirectMonotonicReader.java:149) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$24.set(Lucene90DocValuesProducer.java:1356) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$24.docValueCount(Lucene90DocValuesProducer.java:1348) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$25.nextDoc(Lucene90DocValuesProducer.java:1405) > 
~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:837) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:148) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$Fi
[GitHub] [lucene] dnhatn merged pull request #389: LUCENE-10159: Fix invalid access in sorted set dv
dnhatn merged pull request #389: URL: https://github.com/apache/lucene/pull/389 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10159) Index corruption: IndexOutOfBoundsException for doc values
[ https://issues.apache.org/jira/browse/LUCENE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nhat Nguyen updated LUCENE-10159: - Fix Version/s: main (9.0) Resolution: Fixed Status: Resolved (was: Patch Available) > Index corruption: IndexOutOfBoundsException for doc values > -- > > Key: LUCENE-10159 > URL: https://issues.apache.org/jira/browse/LUCENE-10159 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Blocker > Fix For: main (9.0) > > Time Spent: 3h > Remaining Estimate: 0h > > Since we upgraded Elasticsearch to a Lucene 9 snaspshot, we have seen test > failures with the following stack trace. This looks like an issue with the > Lucene90 DocValuesFormat. > {noformat} > org.apache.lucene.index.MergePolicy$MergeException: > java.lang.IndexOutOfBoundsException > at > org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$2.doRun(InternalEngine.java:2340) > ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT] > at > org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) > ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT] > at > org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) > ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > [?:?] > at java.lang.Thread.run(Thread.java:833) [?:?] > Caused by: java.lang.IndexOutOfBoundsException > at java.nio.Buffer.checkIndex(Buffer.java:749) ~[?:?] > at java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:692) ~[?:?] 
> at > org.apache.lucene.store.ByteBufferGuard.getInt(ByteBufferGuard.java:128) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readInt(ByteBufferIndexInput.java:591) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.util.packed.DirectReader$DirectPackedReader20.get(DirectReader.java:222) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.util.packed.DirectMonotonicReader.get(DirectMonotonicReader.java:149) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$24.set(Lucene90DocValuesProducer.java:1356) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$24.docValueCount(Lucene90DocValuesProducer.java:1348) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$25.nextDoc(Lucene90DocValuesProducer.java:1405) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:837) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > 
cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:148) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:154) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:168) > ~[lucene-core-9.0.0-snapshot-cb366d04d4a.jar:9.0.0-snapshot-cb366d04d4a > cb366d04d4a611faf90b36e49aa4bd9b48a6e83d - romseygeek - 2021-10-01 09:50:47] > at > org.apache.lucene.index.SegmentMerger.lambda$merge$2(SegmentMerger.ja
[GitHub] [lucene] jpountz commented on pull request #389: LUCENE-10159: Fix invalid access in sorted set dv
jpountz commented on pull request #389: URL: https://github.com/apache/lucene/pull/389#issuecomment-946681702 @rmuir Agreed with your thoughts. I wonder what you would think of reducing the padding to the strict minimum depending on the number of bits per value. This would make it harder to change how we read values in the future, but this would also make it more likely to detect out-of-bounds access in the future as we could also detect this on low numbers of bits per value. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on pull request #389: LUCENE-10159: Fix invalid access in sorted set dv
rmuir commented on pull request #389: URL: https://github.com/apache/lucene/pull/389#issuecomment-946683120 +1 to that idea as a followup issue too. it is really bad that it masked the bug here! Thanks @dnhatn for all the debugging and the fix!
[jira] [Created] (LUCENE-10187) Reduce DirectWriter's padding to a minimum
Adrien Grand created LUCENE-10187: - Summary: Reduce DirectWriter's padding to a minimum Key: LUCENE-10187 URL: https://issues.apache.org/jira/browse/LUCENE-10187 Project: Lucene - Core Issue Type: Bug Reporter: Adrien Grand This is a follow-up of LUCENE-10159 where DirectWriter's padding hid an out-of-bounds access. A consequence of DirectWriter's padding is that out-of-bounds access is completely silent until doc values use strictly more than 16 bits per ord, a situation that almost never occurs in our tests.
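To illustrate why trailing padding can hide an out-of-bounds read, here is a minimal sketch. It is a toy model, not Lucene's actual DirectWriter/DirectReader format: the class `PaddingDemo`, its one-byte-per-value layout, and the helper names are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

/** Toy model only: one byte per value. This is NOT Lucene's real on-disk format. */
public class PaddingDemo {

  /** Writes the values followed by {@code padding} zero bytes. */
  public static ByteBuffer write(byte[] values, int padding) {
    ByteBuffer buf = ByteBuffer.allocate(values.length + padding);
    buf.put(values); // the remaining bytes stay zero and act as padding
    return buf;
  }

  /** Absolute read; the only bounds check is against the buffer itself. */
  public static int read(ByteBuffer buf, int index) {
    return buf.get(index) & 0xFF;
  }

  public static void main(String[] args) {
    byte[] values = {7, 8, 9};

    // Index 3 is logically out of bounds, but with 3 padding bytes it
    // silently reads a padding zero instead of failing.
    System.out.println("padded read: " + read(write(values, 3), 3));

    // With no padding the same bad index is caught by the buffer.
    try {
      read(write(values, 0), 3);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("unpadded read: out-of-bounds detected");
    }
  }
}
```

In this sketch the less padding there is, the sooner a bad index trips the buffer's own bounds check, which is the trade-off the issue describes.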
[jira] [Created] (LUCENE-10188) Give SortedSetDocValues a docValueCount()?
Adrien Grand created LUCENE-10188: - Summary: Give SortedSetDocValues a docValueCount()? Key: LUCENE-10188 URL: https://issues.apache.org/jira/browse/LUCENE-10188 Project: Lucene - Core Issue Type: Wish Reporter: Adrien Grand Theoretically SortedSetDocValues gives more options to codecs with regard to how SORTED_SET doc values could store ords. However in practice we currently always store counts. Maybe giving SORTED_SET doc values an API that is closer to the API of SORTED_NUMERIC doc values would be a better trade-off? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10180) Remove usage of lambdas in SegmentMerger?
[ https://issues.apache.org/jira/browse/LUCENE-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430531#comment-17430531 ] Adrien Grand commented on LUCENE-10180: --- I'm not sure either, I don't think I did anything special so that this PR would not get linked here. Sorry [~vigyas]. If you're looking for an easy issue to get started, I could recommend this one: LUCENE-10084, though it's not related to merging. > Remove usage of lambdas in SegmentMerger? > - > > Key: LUCENE-10180 > URL: https://issues.apache.org/jira/browse/LUCENE-10180 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Attachments: profile.png > > > SegmentMerger now uses lambdas to share the logic around logging merging > times for all file formats. > One problem is that these lambdas get auto-generated names, and it makes it > harder to work with profilers since things that should logically end up in > the same sub tree end up in different sub trees because two instances of the > same lambda get different names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #385: LUCENE-10180: Avoid using lambdas in SegmentMerger.
jpountz merged pull request #385: URL: https://github.com/apache/lucene/pull/385
[jira] [Commented] (LUCENE-10180) Remove usage of lambdas in SegmentMerger?
[ https://issues.apache.org/jira/browse/LUCENE-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430532#comment-17430532 ] Adrien Grand commented on LUCENE-10180: --- bq. does the proposed solution (function pointers) make the profiles more consistent? Yes it does, since the method ref is always given its actual name in profiles. > Remove usage of lambdas in SegmentMerger? > - > > Key: LUCENE-10180 > URL: https://issues.apache.org/jira/browse/LUCENE-10180 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Attachments: profile.png > > > SegmentMerger now uses lambdas to share the logic around logging merging > times for all file formats. > One problem is that these lambdas get auto-generated names, and it makes it > harder to work with profilers since things that should logically end up in > the same sub tree end up in different sub trees because two instances of the > same lambda get different names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10180) Remove usage of lambdas in SegmentMerger?
[ https://issues.apache.org/jira/browse/LUCENE-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430533#comment-17430533 ] ASF subversion and git services commented on LUCENE-10180: -- Commit 1448e4739b90613d63ac9efeea1326214b720638 in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1448e47 ] LUCENE-10180: Avoid using lambdas in SegmentMerger. (#385) > Remove usage of lambdas in SegmentMerger? > - > > Key: LUCENE-10180 > URL: https://issues.apache.org/jira/browse/LUCENE-10180 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Attachments: profile.png > > > SegmentMerger now uses lambdas to share the logic around logging merging > times for all file formats. > One problem is that these lambdas get auto-generated names, and it makes it > harder to work with profilers since things that should logically end up in > the same sub tree end up in different sub trees because two instances of the > same lambda get different names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10180) Remove usage of lambdas in SegmentMerger?
[ https://issues.apache.org/jira/browse/LUCENE-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10180. --- Fix Version/s: main (9.0) Resolution: Fixed > Remove usage of lambdas in SegmentMerger? > - > > Key: LUCENE-10180 > URL: https://issues.apache.org/jira/browse/LUCENE-10180 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: main (9.0) > > Attachments: profile.png > > Time Spent: 10m > Remaining Estimate: 0h > > SegmentMerger now uses lambdas to share the logic around logging merging > times for all file formats. > One problem is that these lambdas get auto-generated names, and it makes it > harder to work with profilers since things that should logically end up in > the same sub tree end up in different sub trees because two instances of the > same lambda get different names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz opened a new pull request #398: LUCENE-10187: Reduce DirectWriter's padding.
jpountz opened a new pull request #398: URL: https://github.com/apache/lucene/pull/398 It would make us more likely to detect out-of-bounds access in the future.
[GitHub] [lucene] uschindler commented on pull request #385: LUCENE-10180: Avoid using lambdas in SegmentMerger.
uschindler commented on pull request #385: URL: https://github.com/apache/lucene/pull/385#issuecomment-946762526 Thanks! ❤️
[jira] [Commented] (LUCENE-10180) Remove usage of lambdas in SegmentMerger?
[ https://issues.apache.org/jira/browse/LUCENE-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430575#comment-17430575 ] Uwe Schindler commented on LUCENE-10180: {quote} bq. does the proposed solution (function pointers) make the profiles more consistent? Yes it does, since the method ref is always given its actual name in profiles. {quote} Background for [~sokolov]: Lambdas can't be compiled to without creating a method out of it. So {{a -> foobar(a)}} will generate a static or virtual method (depending on if access to "this" is needed) named {{lambda$XY(a)}} with the body {{return foobar(a)}. This is of course not needed but you always see the lambda method in the stack traces. So to better allow to see where something happens in "simple cases" (does not work in complex chains with Java streams): Avoid lambdas and add the bodies as methods. But always look at signatures and always prefer a method reference anywhere in code if a lambda that only calls another method with exact same parameter (with or without "this" capture). > Remove usage of lambdas in SegmentMerger? > - > > Key: LUCENE-10180 > URL: https://issues.apache.org/jira/browse/LUCENE-10180 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: main (9.0) > > Attachments: profile.png > > Time Spent: 20m > Remaining Estimate: 0h > > SegmentMerger now uses lambdas to share the logic around logging merging > times for all file formats. > One problem is that these lambdas get auto-generated names, and it makes it > harder to work with profilers since things that should logically end up in > the same sub tree end up in different sub trees because two instances of the > same lambda get different names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10180) Remove usage of lambdas in SegmentMerger?
[ https://issues.apache.org/jira/browse/LUCENE-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430575#comment-17430575 ] Uwe Schindler edited comment on LUCENE-10180 at 10/19/21, 2:21 PM: --- {quote} bq. does the proposed solution (function pointers) make the profiles more consistent? Yes it does, since the method ref is always given its actual name in profiles. {quote} Background for [~sokolov]: A lambda can't be compiled to bytecode without creating a method out of its body (the lambda bootstrap invokedynamic then refers to that method in the same way it would for method reference syntax). So {{a -> foobar(a)}} will generate a static or virtual method (depending on whether access to "this" is needed) named {{lambda$XY(a)}} with the body {{return foobar(a)}}. This extra method is of course not needed, but you always see it in the stack traces. So to make it easier to see where something happens in "simple cases" (this does not work in complex chains with Java streams): avoid lambdas and add the bodies as methods. But always look at signatures, and prefer a method reference anywhere in code where a lambda only calls another method with the exact same parameters (with or without "this" capture). > Remove usage of lambdas in SegmentMerger? > - > > Key: LUCENE-10180 > URL: https://issues.apache.org/jira/browse/LUCENE-10180 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > Fix For: main (9.0) > > Attachments: profile.png > > Time Spent: 20m > Remaining Estimate: 0h > > SegmentMerger now uses lambdas to share the logic around logging merging > times for all file formats. > One problem is that these lambdas get auto-generated names, and it makes it > harder to work with profilers since things that should logically end up in > the same sub tree end up in different sub trees because two instances of the > same lambda get different names.
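The difference described in the comment above can be observed with a small standalone program. This is only a sketch: the class `LambdaFrames` and its helpers are hypothetical names, and the exact synthetic method name (for example `lambda$viaLambda$0`) depends on the compiler, so the check only looks for the `lambda$` prefix.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical demo class; not part of Lucene. */
public class LambdaFrames {

  /** Stack captured at the point where the "real work" happens. */
  public static StackTraceElement[] captured;

  public static int foobar(int a) {
    captured = new Throwable().getStackTrace();
    return a + 1;
  }

  /** True if any frame is a compiler-generated lambda method. */
  public static boolean hasLambdaFrame(StackTraceElement[] frames) {
    for (StackTraceElement frame : frames) {
      if (frame.getMethodName().startsWith("lambda$")) {
        return true;
      }
    }
    return false;
  }

  public static boolean viaLambda() {
    // The body is compiled into an extra synthetic method that calls foobar.
    IntUnaryOperator op = a -> foobar(a);
    op.applyAsInt(1);
    return hasLambdaFrame(captured);
  }

  public static boolean viaMethodRef() {
    // The method reference points directly at foobar; no extra method is generated.
    IntUnaryOperator op = LambdaFrames::foobar;
    op.applyAsInt(1);
    return hasLambdaFrame(captured);
  }

  public static void main(String[] args) {
    System.out.println("synthetic frame with lambda:     " + viaLambda());
    System.out.println("synthetic frame with method ref: " + viaMethodRef());
  }
}
```

A profiler sees the same frames as this stack capture, which is why the method-reference form keeps the real method name in profiles.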
[GitHub] [lucene] uschindler merged pull request #397: LUCENE-10182: Order assertion parameters correctly
uschindler merged pull request #397: URL: https://github.com/apache/lucene/pull/397
[jira] [Commented] (LUCENE-10182) TestRamUsageEstimator asserts trivial equality
[ https://issues.apache.org/jira/browse/LUCENE-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430581#comment-17430581 ] ASF subversion and git services commented on LUCENE-10182: -- Commit 54c5a2ce28d35c3ff9eb98aa83a69ca6d0f69134 in lucene's branch refs/heads/main from Stefan Vodita [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=54c5a2c ] LUCENE-10182: Order assertion parameters correctly (#397) > TestRamUsageEstimator asserts trivial equality > -- > > Key: LUCENE-10182 > URL: https://issues.apache.org/jira/browse/LUCENE-10182 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Stefan Vodita >Assignee: Uwe Schindler >Priority: Major > Fix For: main (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > {{TestRamUsageEstimator.testStaticOverloads}} has several lines like: > {code:java} > assertEquals(sizeOf(array), sizeOf((Object) array)); > {code} > Both calls to {{sizeOf()}} fall back on {{RamUsageTester.sizeOf}}, making the > 2 calls identical. Instead, we would want one of the calls to go to > {{RamUsageEstimator.sizeOf}}. > > This issue came up while working on LUCENE-10129. A possible solution, as per > [~uschindler]'s suggestion, would be to remove the static import > {code:java} > import static org.apache.lucene.util.RamUsageTester.sizeOf; > {code} > Instead, we could be explicit on which method we are calling, like: > {code:java} > assertEquals(RamUsageEstimator.sizeOf(array), RamUsageTester.sizeOf(array)); > {code} > This could be replicated for other potentially confusing cases in the test > class.
[jira] [Commented] (LUCENE-10182) TestRamUsageEstimator asserts trivial equality
[ https://issues.apache.org/jira/browse/LUCENE-10182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430582#comment-17430582 ] Uwe Schindler commented on LUCENE-10182: Merged! > TestRamUsageEstimator asserts trivial equality > -- > > Key: LUCENE-10182 > URL: https://issues.apache.org/jira/browse/LUCENE-10182 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Stefan Vodita >Assignee: Uwe Schindler >Priority: Major > Fix For: main (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > {{TestRamUsageEstimator.testStaticOverloads}} has several lines like: > {code:java} > assertEquals(sizeOf(array), sizeOf((Object) array)); > {code} > Both calls to {{sizeOf()}} fall back on {{RamUsageTester.sizeOf}}, making the > 2 calls identical. Instead, we would want one of the calls to go to > {{RamUsageEstimator.sizeOf}}. > > This issue came up while working on LUCENE-10129. A possible solution, as per > [~uschindler]'s suggestion, would be to remove the static import > {code:java} > import static org.apache.lucene.util.RamUsageTester.sizeOf; > {code} > Instead, we could be explicit on which method we are calling, like: > {code:java} > assertEquals(RamUsageEstimator.sizeOf(array), RamUsageTester.sizeOf(array)); > {code} > This could be replicated for other potentially confusing cases in the test > class.
[jira] [Created] (LUCENE-10189) Optimize SortedSet/SortedNumeric doc values writers for fields that are effectively single-valued
Adrien Grand created LUCENE-10189: - Summary: Optimize SortedSet/SortedNumeric doc values writers for fields that are effectively single-valued Key: LUCENE-10189 URL: https://issues.apache.org/jira/browse/LUCENE-10189 Project: Lucene - Core Issue Type: Wish Reporter: Adrien Grand I was wondering how much overhead multi-valued doc-value types have over their single-valued counterparts, so I hacked IndexTaxis to index all doc-value fields via Sorted(Set|Numeric)DocValuesField instead of (Sorted|Numeric)DocValuesField and flush times increased by 30%. It should be easy to automatically detect such cases in the doc values writers? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10190) Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
Dawid Weiss created LUCENE-10190: Summary: Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber Key: LUCENE-10190 URL: https://issues.apache.org/jira/browse/LUCENE-10190 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss CI failure in PR at: https://github.com/apache/lucene/pull/396/checks?check_run_id=3936559246 Does not reproduce. Stack below. {code} org.apache.lucene.index.TestIndexWriter > testMaxCompletedSequenceNumber FAILED com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1840, name=Thread-1481, state=RUNNABLE, group=TGRP-TestIndexWriter] at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C:671A014F243A0EDE]:0) Caused by: java.lang.AssertionError: expected:<1> but was:<0> at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0) at org.junit.Assert.fail(Assert.java:89) at org.junit.Assert.failNotEquals(Assert.java:835) at org.junit.Assert.assertEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:633) at org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305) at java.base/java.lang.Thread.run(Thread.java:829) org.apache.lucene.index.TestIndexWriter > test suite's output saved to /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestIndexWriter.txt, copied below: 2> أكتوبر ١٩, ٢٠٢١ ١٠:٢٧:٢٧ ص com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException 2> WARNING: Uncaught exception in thread: Thread[Thread-1481,5,TGRP-TestIndexWriter] 2> java.lang.AssertionError: expected:<1> but was:<0> 2>at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0) 2>at org.junit.Assert.fail(Assert.java:89) 2>at org.junit.Assert.failNotEquals(Assert.java:835) 2>at org.junit.Assert.assertEquals(Assert.java:647) 2>at org.junit.Assert.assertEquals(Assert.java:633) 2>at 
org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305) 2>at java.base/java.lang.Thread.run(Thread.java:829) 2> > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1840, name=Thread-1481, state=RUNNABLE, group=TGRP-TestIndexWriter] > at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C:671A014F243A0EDE]:0) > > Caused by: > java.lang.AssertionError: expected:<1> but was:<0> > at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0) > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:633) > at org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305) > at java.base/java.lang.Thread.run(Thread.java:829) 2> NOTE: reproduce with: gradlew test --tests TestIndexWriter.testMaxCompletedSequenceNumber -Dtests.seed=5B8EFE8DBEFB881C -Dtests.badapples=true -Dtests.locale=ar-KW -Dtests.timezone=Africa/Lusaka -Dtests.asserts=true -Dtests.file.encoding=UTF-8 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #396: LUCENE-10174 BuildAndPushRelease additional improvements
dweiss commented on pull request #396: URL: https://github.com/apache/lucene/pull/396#issuecomment-946803083 CI failed with LUCENE-10190. I respun.
[GitHub] [lucene] janhoy commented on pull request #396: LUCENE-10174 BuildAndPushRelease additional improvements
janhoy commented on pull request #396: URL: https://github.com/apache/lucene/pull/396#issuecomment-946830386 > LGTM. I also think all gradle invocations from within the python script shouldn't fork the daemon (--no-daemon) - this prevents leaking memory and makes sure nothing is left behind in case of errors. Are you saying that `--no-daemon` would be better? I could fold that into this PR.
[GitHub] [lucene] dweiss commented on pull request #396: LUCENE-10174 BuildAndPushRelease additional improvements
dweiss commented on pull request #396: URL: https://github.com/apache/lucene/pull/396#issuecomment-946910151 Yes, I think --no-daemon would be helpful here (in addition to worker restriction). This leaves nothing behind - clean slate for a re-run.
[jira] [Commented] (LUCENE-10189) Optimize SortedSet/SortedNumeric doc values writers for fields that are effectively single-valued
[ https://issues.apache.org/jira/browse/LUCENE-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430647#comment-17430647 ] Robert Muir commented on LUCENE-10189: -- For the SortedSetCase, seems like we want to fix the IW component to use {{DocValues.singleton}}, box it up, and return it from {{SortedSetDocValues getDocValues()}} ? Then the DocValuesConsumer can simply check with {{DocValues.unwrapSingleton}}. The same codepath can work for merge and flush. > Optimize SortedSet/SortedNumeric doc values writers for fields that are > effectively single-valued > - > > Key: LUCENE-10189 > URL: https://issues.apache.org/jira/browse/LUCENE-10189 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > > I was wondering how much overhead multi-valued doc-value types have over > their single-valued counterparts, so I hacked IndexTaxis to index all > doc-value fields via Sorted(Set|Numeric)DocValuesField instead of > (Sorted|Numeric)DocValuesField and flush times increased by 30%. It should be > easy to automatically detect such cases in the doc values writers? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
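The wrap-then-unwrap idea in the comment above can be sketched without any Lucene dependency. Everything below is a hypothetical stand-in: `Single` and `Multi` are simplified placeholders for single-valued and multi-valued doc values, and the two static methods only mirror the names `DocValues.singleton` and `DocValues.unwrapSingleton` mentioned in the comment, not their real signatures.

```java
/** Hypothetical stand-in types; these are not Lucene's real doc-values classes. */
public class SingletonSketch {

  /** Single-valued view: exactly one ord per document. */
  public interface Single {
    int ord(int doc);
  }

  /** Multi-valued view: zero or more ords per document. */
  public interface Multi {
    int count(int doc);

    int ord(int doc, int i);
  }

  /** Multi-valued facade over a single-valued source. */
  private static final class SingletonMulti implements Multi {
    final Single in;

    SingletonMulti(Single in) {
      this.in = in;
    }

    @Override
    public int count(int doc) {
      return 1;
    }

    @Override
    public int ord(int doc, int i) {
      return in.ord(doc);
    }
  }

  /** Producer side: box a single-valued source as multi-valued. */
  public static Multi singleton(Single in) {
    return new SingletonMulti(in);
  }

  /** Returns the wrapped source, or null if {@code multi} did not come from singleton(). */
  public static Single unwrapSingleton(Multi multi) {
    return multi instanceof SingletonMulti ? ((SingletonMulti) multi).in : null;
  }

  /** Consumer side: take the cheaper single-valued path when it is available. */
  public static String choosePath(Multi values) {
    return unwrapSingleton(values) != null ? "single-valued path" : "multi-valued path";
  }

  public static void main(String[] args) {
    Single ords = doc -> doc * 2;
    System.out.println(choosePath(singleton(ords)));
  }
}
```

Because the consumer only inspects the object it is handed, the same check would apply whether the values come from a flush or from a merge, which is the point made in the comment.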
[jira] [Commented] (LUCENE-10147) KnnVectorQuery can produce negative scores
[ https://issues.apache.org/jira/browse/LUCENE-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430662#comment-17430662 ] Julie Tibshirani commented on LUCENE-10147: --- [~msoko...@gmail.com] you mentioned that we discussed enforcing that vectors are unit length when using {{VectorSimilarityFunction#DOT_PRODUCT}}. I'm wondering why we decided not to go that direction (I couldn't find the discussion in JIRA/ GitHub)? This is just for my context, I don't have strong feelings about the decision. > KnnVectorQuery can produce negative scores > -- > > Key: LUCENE-10147 > URL: https://issues.apache.org/jira/browse/LUCENE-10147 > Project: Lucene - Core > Issue Type: Bug >Reporter: Julie Tibshirani >Priority: Blocker > Time Spent: 1h 50m > Remaining Estimate: 0h > > The cosine similarity of two vectors falls in the range [-1, 1]. So currently > with cosine similarity, {{KnnVectorQuery}} can produce negative scores. Maybe > we should just adjust the scores in this case by adding 1, shifting them to > the range [0, 2]. > As a side note, this made me notice that > {{VectorSimilarityFunction.DOT_PRODUCT}} is really quite "expert"! Users need > to know to normalize all document and query vectors to unit length when using > this similarity. Otherwise the output is unbounded and difficult to handle in > scoring. Also dot product is not a true metric: for example, it doesn't obey > the triangle inequality. So many ANN algorithms have trouble supporting it. > As part of this issue, we could improve the documentation on > {{VectorSimilarityFunction.DOT_PRODUCT}} to clarify that normalization is > required. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
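The arithmetic behind the issue description can be checked with a short standalone sketch. The class `VectorScores` and its method names are illustrative assumptions, not Lucene API; the shift by 1 follows the proposal in the issue text and is not necessarily what was finally committed.

```java
/** Illustrative helper, not part of Lucene. */
public class VectorScores {

  public static double dot(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static double norm(double[] a) {
    return Math.sqrt(dot(a, a));
  }

  /** Cosine similarity, in the range [-1, 1]. */
  public static double cosine(double[] a, double[] b) {
    return dot(a, b) / (norm(a) * norm(b));
  }

  /** Shift proposed in the issue: adding 1 moves the range to [0, 2]. */
  public static double shiftedScore(double[] a, double[] b) {
    return 1 + cosine(a, b);
  }

  /** Scales a vector to unit length. */
  public static double[] normalize(double[] a) {
    double n = norm(a);
    double[] out = new double[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = a[i] / n;
    }
    return out;
  }

  public static void main(String[] args) {
    double[] x = {1, 0};
    double[] y = {-1, 0};
    System.out.println("cosine of opposite vectors: " + cosine(x, y));
    System.out.println("shifted score:              " + shiftedScore(x, y));

    double[] a = {3, 4};
    double[] b = {1, 2};
    // The dot product is unbounded for raw vectors but matches cosine once both are normalized.
    System.out.println("raw dot product:            " + dot(a, b));
    System.out.println("normalized dot product:     " + dot(normalize(a), normalize(b)));
    System.out.println("cosine:                     " + cosine(a, b));
  }
}
```

The last two values agree up to floating-point rounding, which is why dot product only behaves like a bounded score when every document and query vector has unit length.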
[GitHub] [lucene] janhoy merged pull request #396: LUCENE-10174 BuildAndPushRelease additional improvements
janhoy merged pull request #396: URL: https://github.com/apache/lucene/pull/396 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10174) Update buildAndPushRelease.py for new gradle build
[ https://issues.apache.org/jira/browse/LUCENE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430670#comment-17430670 ] ASF subversion and git services commented on LUCENE-10174: -- Commit f5486d13e6f440a7296c23f45cd53f0313e83e0e in lucene's branch refs/heads/main from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f5486d1 ] LUCENE-10174 BuildAndPushRelease additional improvements (#396) > Update buildAndPushRelease.py for new gradle build > -- > > Key: LUCENE-10174 > URL: https://issues.apache.org/jira/browse/LUCENE-10174 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: main (9.0) > > Time Spent: 1h 20m > Remaining Estimate: 0h > > With LUCENE-9488 and LUCENE-10173 the gradle build was polished to properly > build source and binary artifacts, and sign those using either gpg tool or a > built-in java-based signing plugin. See > [https://github.com/apache/lucene/blob/main/help/publishing.txt] > This jira will update {{buildAndPushRelease.py}} script to use the correct > build parameters. It will also add cmdline args to choose between gpg and > built-in (gpg default), and to supply the location of {{gpgHome}} if you do > not use gpg. We'll also add an option to NOT prompt for passphrase in the > python script, which will fallback to defaults (gpg-agent, env.vars or > gradle.properties). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10189) Optimize SortedSet/SortedNumeric doc values writers for fields that are effectively single-valued
[ https://issues.apache.org/jira/browse/LUCENE-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430710#comment-17430710 ] Adrien Grand commented on LUCENE-10189: --- Right, I tried to do this in the linked PR (for some reason it wasn't linked automatically, I just did it manually). > Optimize SortedSet/SortedNumeric doc values writers for fields that are > effectively single-valued > - > > Key: LUCENE-10189 > URL: https://issues.apache.org/jira/browse/LUCENE-10189 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > > I was wondering how much overhead multi-valued doc-value types have over > their single-valued counterparts, so I hacked IndexTaxis to index all > doc-value fields via Sorted(Set|Numeric)DocValuesField instead of > (Sorted|Numeric)DocValuesField and flush times increased by 30%. It should be > easy to automatically detect such cases in the doc values writers? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Closed] (LUCENE-10126) CompetitiveIterator of NumericComparator can wrongly skip documents
[ https://issues.apache.org/jira/browse/LUCENE-10126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova closed LUCENE-10126. Closing after the 8.10.1 release > CompetitiveIterator of NumericComparator can wrongly skip documents > --- > > Key: LUCENE-10126 > URL: https://issues.apache.org/jira/browse/LUCENE-10126 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.9, 8.10 >Reporter: Nhat Nguyen >Priority: Major > Fix For: 8.11, 8.10.1, 9.10 > > Time Spent: 8h > Remaining Estimate: 0h > > The ML team at Elastic reported that a large scroll with an Elasticsearch > nightly build that uses Lucene 9.0 snapshot returns fewer documents than > expected. I looked into it and found that the competitive iterator can > wrongly skip docs with a chunked bulk scorer. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Closed] (LUCENE-10119) singleSort should not be set when after is non-null
[ https://issues.apache.org/jira/browse/LUCENE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova closed LUCENE-10119. Closing after the 8.10.1 release > singleSort should not be set when after is non-null > --- > > Key: LUCENE-10119 > URL: https://issues.apache.org/jira/browse/LUCENE-10119 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: main (9.0), 8.10 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Fix For: main (9.0), 8.11, 8.10.1 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Today we set the parameter `singleSort` to true when we have a single > comparator to skip documents whose values equal the last visited value. > However, this is incorrect when the search_after parameter is non-null, > because we would then skip documents whose values are equal but whose docIDs are > greater than the docID of the `search_after` parameter. > > We found this issue in Elasticsearch after upgrading it to Lucene 8.10 and > Lucene 9. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Closed] (LUCENE-10110) MultiCollector should conditionally wrap single leaf collector
[ https://issues.apache.org/jira/browse/LUCENE-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova closed LUCENE-10110. Closing after the 8.10.1 release > MultiCollector should conditionally wrap single leaf collector > -- > > Key: LUCENE-10110 > URL: https://issues.apache.org/jira/browse/LUCENE-10110 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi >Priority: Minor > Fix For: main (9.0), 8.11, 8.10.1 > > Time Spent: 0.5h > Remaining Estimate: 0h > > MultiCollector adapts the score mode of multiple collectors so that they can > run together in a search. If a collector wants to skip low-scoring hits, this > adapter ensures that the other collectors still see all hits. However, when > all these collectors have terminated early, we allow the skipping collector > to start propagating the minimum score. This is not valid because the weight > of the query is built from the combined score mode of all collectors at the > beginning of the search. > So we should always ignore the minimum score in MultiCollector if the > combined score mode is different from TOP_SCORES. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9997) Revisit smoketester for 9.0 build
[ https://issues.apache.org/jira/browse/LUCENE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430727#comment-17430727 ] Jan Høydahl commented on LUCENE-9997: - Yey, first smoketest SUCCESS on freshly built lucene release artifacts on [PR 391|https://github.com/apache/lucene/pull/391]. {code:java} ... verify maven artifact sigs . unpack lucene-9.0.0.tgz... verify that Maven artifacts are same as in the binary distribution... verify JAR metadata/identity/no javax.* or java.* classes... SUCCESS! [0:09:23.758716]{code} > Revisit smoketester for 9.0 build > - > > Key: LUCENE-9997 > URL: https://issues.apache.org/jira/browse/LUCENE-9997 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Robert Muir >Priority: Major > Attachments: image-2021-10-12-12-47-11-480.png, > image-2021-10-12-12-48-15-373.png > > Time Spent: 8.5h > Remaining Estimate: 0h > > Currently we have a (great) {{dev-tools/scripts/smokeTester.py}} that will > perform automated tests against a release. > This was developed with the ant build process in mind. > This issue is just about considering the automated checks we do here, maybe > some of them can be done efficiently in the gradle build in earlier places: > this would be a large improvement! > Obviously some of them (e.g. GPG release key verifications) are really > specific to the artifacts in question. These are most important to release > verification, as that is actually the only place we can check it. > Any other checks (and I do tend to think, this checker should try to be > thorough, invoking gradle etc), should be stuff we regularly test in > PRs/nightly/builds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] janhoy commented on pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
janhoy commented on pull request #391: URL: https://github.com/apache/lucene/pull/391#issuecomment-947035468 ``` verify maven artifact sigs unpack lucene-9.0.0.tgz... verify that Maven artifacts are same as in the binary distribution... verify JAR metadata/identity/no javax.* or java.* classes... SUCCESS! [0:09:23.758716] ``` I also added the `--no-daemon` arg to all gradlew commands here. I'll merge this in now. Then feel free to open further PRs against LUCENE-9997 for smoke tester improvements, such as WSL support etc. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] janhoy merged pull request #391: LUCENE-9997 Second pass smoketester fixes for 9.0
janhoy merged pull request #391: URL: https://github.com/apache/lucene/pull/391 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9997) Revisit smoketester for 9.0 build
[ https://issues.apache.org/jira/browse/LUCENE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430730#comment-17430730 ] ASF subversion and git services commented on LUCENE-9997: - Commit c77e9ddf93ae872ba6556d39c48a0a32e31e91b1 in lucene's branch refs/heads/main from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c77e9dd ] LUCENE-9997 Second pass smoketester fixes for 9.0 (#391) * Java17 fixes * Add to error message that the unexpected file is in lucene/ folder * Fix gpg command utf-8 output * Add --no-daemon to all gradle calls, and skip clean Co-authored-by: Dawid Weiss Co-Authored-by: Tomoko Uchida > Revisit smoketester for 9.0 build > - > > Key: LUCENE-9997 > URL: https://issues.apache.org/jira/browse/LUCENE-9997 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Robert Muir >Priority: Major > Attachments: image-2021-10-12-12-47-11-480.png, > image-2021-10-12-12-48-15-373.png > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Currently we have a (great) {{dev-tools/scripts/smokeTester.py}} that will > perform automated tests against a release. > This was developed with the ant build process in mind. > This issue is just about considering the automated checks we do here, maybe > some of them can be done efficiently in the gradle build in earlier places: > this would be a large improvement! > Obviously some of them (e.g. GPG release key verifications) are really > specific to the artifacts in question. These are most important to release > verification, as that is actually the only place we can check it. > Any other checks (and I do tend to think, this checker should try to be > thorough, invoking gradle etc), should be stuff we regularly test in > PRs/nightly/builds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-9997) Revisit smoketester for 9.0 build
[ https://issues.apache.org/jira/browse/LUCENE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reassigned LUCENE-9997: --- Assignee: Jan Høydahl > Revisit smoketester for 9.0 build > - > > Key: LUCENE-9997 > URL: https://issues.apache.org/jira/browse/LUCENE-9997 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Robert Muir >Assignee: Jan Høydahl >Priority: Major > Attachments: image-2021-10-12-12-47-11-480.png, > image-2021-10-12-12-48-15-373.png > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Currently we have a (great) {{dev-tools/scripts/smokeTester.py}} that will > perform automated tests against a release. > This was developed with the ant build process in mind. > This issue is just about considering the automated checks we do here, maybe > some of them can be done efficiently in the gradle build in earlier places: > this would be a large improvement! > Obviously some of them (e.g. GPG release key verifications) are really > specific to the artifacts in question. These are most important to release > verification, as that is actually the only place we can check it. > Any other checks (and I do tend to think, this checker should try to be > thorough, invoking gradle etc), should be stuff we regularly test in > PRs/nightly/builds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9997) Revisit smoketester for 9.0 build
[ https://issues.apache.org/jira/browse/LUCENE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved LUCENE-9997. - Fix Version/s: main (9.0) Resolution: Fixed > Revisit smoketester for 9.0 build > - > > Key: LUCENE-9997 > URL: https://issues.apache.org/jira/browse/LUCENE-9997 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Robert Muir >Assignee: Jan Høydahl >Priority: Major > Fix For: main (9.0) > > Attachments: image-2021-10-12-12-47-11-480.png, > image-2021-10-12-12-48-15-373.png > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Currently we have a (great) {{dev-tools/scripts/smokeTester.py}} that will > perform automated tests against a release. > This was developed with the ant build process in mind. > This issue is just about considering the automated checks we do here, maybe > some of them can be done efficiently in the gradle build in earlier places: > this would be a large improvement! > Obviously some of them (e.g. GPG release key verifications) are really > specific to the artifacts in question. These are most important to release > verification, as that is actually the only place we can check it. > Any other checks (and I do tend to think, this checker should try to be > thorough, invoking gradle etc), should be stuff we regularly test in > PRs/nightly/builds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10191) Optimize vector functions by precomputing magnitudes
Julie Tibshirani created LUCENE-10191: - Summary: Optimize vector functions by precomputing magnitudes Key: LUCENE-10191 URL: https://issues.apache.org/jira/browse/LUCENE-10191 Project: Lucene - Core Issue Type: Improvement Reporter: Julie Tibshirani Both euclidean distance (L2 norm) and cosine similarity can be expressed in terms of dot product and vector magnitudes: * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2) * cosine(a, b) = (a . b) / (||a|| ||b||) We could compute and store each vector's magnitude upfront while indexing, and compute the query vector's magnitude once per query. Then we'd calculate the distance using our (very optimized) dot product method, plus the precomputed values. This is an exploratory issue: I haven't tested this out yet, so I'm not sure how much it would help. I would at least expect it to help with cosine similarity – several months ago we tried out similar ideas in Elasticsearch and were able to get a nice boost in cosine performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
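The two identities in the issue can be checked with a short Python sketch. This is only an illustration under stated assumptions, not the proposed Lucene implementation: the function names are hypothetical, and the magnitudes passed in stand for the values that would be stored at index time (documents) or computed once per query.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def magnitude(v):
    # ||v||; in the proposal this would be computed once and stored.
    return math.sqrt(dot(v, v))


def l2_distance_precomputed(a, b, mag_a, mag_b):
    # ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)
    squared = mag_a * mag_a - 2.0 * dot(a, b) + mag_b * mag_b
    # Clamp at zero: rounding can make the sum slightly negative.
    return math.sqrt(max(squared, 0.0))


def cosine_precomputed(a, b, mag_a, mag_b):
    # cosine(a, b) = (a . b) / (||a|| ||b||)
    return dot(a, b) / (mag_a * mag_b)
```

Per comparison, only one dot product remains to be computed; the square roots and the two self dot products are paid once per vector instead of once per pair.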
[jira] [Updated] (LUCENE-10141) Update releaseWizard for 8x to correctly create back-compat indices and update Version in main after repo split
[ https://issues.apache.org/jira/browse/LUCENE-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova updated LUCENE-10141: - Fix Version/s: (was: 8.10.1) > Update releaseWizard for 8x to correctly create back-compat indices and > update Version in main after repo split > --- > > Key: LUCENE-10141 > URL: https://issues.apache.org/jira/browse/LUCENE-10141 > Project: Lucene - Core > Issue Type: Task > Components: release wizard >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Fix For: 8.11 > > > Need to update the release wizard in 8x to create the back-compat indices and > update the Version info so that issues like: > https://issues.apache.org/jira/browse/LUCENE-10131 don't impact future 8x > release managers. Hopefully an 8.11 is NOT needed but release managers have > enough on their plate to get right that we should fix this if possible. If > not, we at least need to document the process of doing it manually. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10189) Optimize SortedSet/SortedNumeric doc values writers for fields that are effectively single-valued
[ https://issues.apache.org/jira/browse/LUCENE-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430736#comment-17430736 ] Adrien Grand commented on LUCENE-10189: --- With the linked PR I'm getting the same flush times for single-valued fields and multi-valued fields that are single-valued (though it doesn't mean that indexing is as fast, since the in-memory buffering might still have some more overhead in the multi-valued case). > Optimize SortedSet/SortedNumeric doc values writers for fields that are > effectively single-valued > - > > Key: LUCENE-10189 > URL: https://issues.apache.org/jira/browse/LUCENE-10189 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > > I was wondering how much overhead multi-valued doc-value types have over > their single-valued counterparts, so I hacked IndexTaxis to index all > doc-value fields via Sorted(Set|Numeric)DocValuesField instead of > (Sorted|Numeric)DocValuesField and flush times increased by 30%. It should be > easy to automatically detect such cases in the doc values writers? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani opened a new pull request #400: LUCENE-10146: Add note that dot product is preferred over cosine
jtibshirani opened a new pull request #400: URL: https://github.com/apache/lucene/pull/400 While VectorSimilarityFunction#COSINE is helpful when you need to preserve the original vectors, it is significantly slower than DOT_PRODUCT. This commit adds javadocs to COSINE explaining that dot product is the fastest option. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on pull request #366: LUCENE-10146: Add VectorSimilarityFunction.COSINE
jtibshirani commented on pull request #366: URL: https://github.com/apache/lucene/pull/366#issuecomment-947053287 @msokolov @mayya-sharipova following up: I ran benchmarks and it's indeed significantly slower (around 20% on some of the datasets we've been using). Here's what I've done: * Added a note to `VectorSimilarityFunction#COSINE` explaining that `DOT_PRODUCT` is the preferred option when you don't need to preserve the original vectors: https://github.com/apache/lucene/pull/400 * Opened https://issues.apache.org/jira/browse/LUCENE-10191 with ideas to speed up cosine similarity Happy to hear other feedback/ ideas! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10084) Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == maxDoc
[ https://issues.apache.org/jira/browse/LUCENE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430785#comment-17430785 ] Vigya Sharma commented on LUCENE-10084: --- I would like to work on this. > Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == > maxDoc > > > Key: LUCENE-10084 > URL: https://issues.apache.org/jira/browse/LUCENE-10084 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Now that we require all documents to use the same features (LUCENE-9334) we > could rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms > or points have a docCount that is equal to maxDoc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #394: LUCENE-9997: write release revision to system temp dir
mocobeta commented on pull request #394: URL: https://github.com/apache/lucene/pull/394#issuecomment-947141245 The `rev.txt` file is used to reuse the git revision from the previous run when the `--no-prepare` option is passed. https://github.com/apache/lucene/blob/c77e9ddf93ae872ba6556d39c48a0a32e31e91b1/dev-tools/scripts/buildAndPushRelease.py#L398-L402 I'm not sure what the use cases for this are, but if it's needed (for convenience?) we need an explicitly fixed path and shouldn't clean up the file after the first run. It's actually not a "temporary" file... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
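To make the "fixed path, not a temporary file" point concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the code in `buildAndPushRelease.py`: the file name `lucene-release-rev.txt` and the helper names are hypothetical. `tempfile.gettempdir()` returns the platform's temp directory, so the location is system-agnostic, while the fixed file name lets a later run find what an earlier run wrote (a randomly named `NamedTemporaryFile` would not allow that).

```python
import os
import tempfile

# Hypothetical fixed location inside the system temp directory.
REV_FILE = os.path.join(tempfile.gettempdir(), "lucene-release-rev.txt")


def save_revision(rev, path=REV_FILE):
    # Written after the prepare step; deliberately not deleted afterwards.
    with open(path, "w", encoding="utf-8") as f:
        f.write(rev)


def load_revision(path=REV_FILE):
    # Read back on a later run that skips the prepare step.
    with open(path, encoding="utf-8") as f:
        return f.read().strip()
```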
[GitHub] [lucene] mocobeta merged pull request #394: LUCENE-9997: write release revision to system temp dir
mocobeta merged pull request #394: URL: https://github.com/apache/lucene/pull/394 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9997) Revisit smoketester for 9.0 build
[ https://issues.apache.org/jira/browse/LUCENE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430792#comment-17430792 ] ASF subversion and git services commented on LUCENE-9997: - Commit 54418cef450afa8a2e45904f68c6db45e241c584 in lucene's branch refs/heads/main from Tomoko Uchida [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=54418ce ] LUCENE-9997: write release revision to system temp dir (#394) > Revisit smoketester for 9.0 build > - > > Key: LUCENE-9997 > URL: https://issues.apache.org/jira/browse/LUCENE-9997 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Robert Muir >Assignee: Jan Høydahl >Priority: Major > Fix For: main (9.0) > > Attachments: image-2021-10-12-12-47-11-480.png, > image-2021-10-12-12-48-15-373.png > > Time Spent: 9h > Remaining Estimate: 0h > > Currently we have a (great) {{dev-tools/scripts/smokeTester.py}} that will > perform automated tests against a release. > This was developed with the ant build process in mind. > This issue is just about considering the automated checks we do here, maybe > some of them can be done efficiently in the gradle build in earlier places: > this would be a large improvement! > Obviously some of them (e.g. GPG release key verifications) are really > specific to the artifacts in question. These are most important to release > verification, as that is actually the only place we can check it. > Any other checks (and I do tend to think, this checker should try to be > thorough, invoking gradle etc), should be stuff we regularly test in > PRs/nightly/builds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9997) Revisit smoketester for 9.0 build
[ https://issues.apache.org/jira/browse/LUCENE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430795#comment-17430795 ] Jan Høydahl commented on LUCENE-9997: - > r-- permissions on all maven artifact files I have noticed that too. It is done in [https://github.com/apache/lucene/blob/main/dev-tools/scripts/buildAndPushRelease.py#L234] but I don't know why > Revisit smoketester for 9.0 build > - > > Key: LUCENE-9997 > URL: https://issues.apache.org/jira/browse/LUCENE-9997 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Robert Muir >Assignee: Jan Høydahl >Priority: Major > Fix For: main (9.0) > > Attachments: image-2021-10-12-12-47-11-480.png, > image-2021-10-12-12-48-15-373.png > > Time Spent: 9h 10m > Remaining Estimate: 0h > > Currently we have a (great) {{dev-tools/scripts/smokeTester.py}} that will > perform automated tests against a release. > This was developed with the ant build process in mind. > This issue is just about considering the automated checks we do here, maybe > some of them can be done efficiently in the gradle build in earlier places: > this would be a large improvement! > Obviously some of them (e.g. GPG release key verifications) are really > specific to the artifacts in question. These are most important to release > verification, as that is actually the only place we can check it. > Any other checks (and I do tend to think, this checker should try to be > thorough, invoking gradle etc), should be stuff we regularly test in > PRs/nightly/builds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dsmiley commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default
dsmiley commented on a change in pull request #362: URL: https://github.com/apache/lucene/pull/362#discussion_r732287420 ## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java ## @@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() { /** * Internally use the {@link Weight#matches(LeafReaderContext, int)} API for highlighting. It's - * more accurate to the query, though might not calculate passage relevancy as well. Use of this - * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link - * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default. + * more accurate to the query, and the snippets can be a little different for phrases because + * the whole phrase is marked up instead of each word. The passage relevancy calculation can be + * different (maybe worse?) and it's slower when highlighting many fields. Use of this flag + * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link + * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so long as the requirements Review comment: For the test, I think you can merely instantiate the highlighter and grab the flags and inspect them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] janhoy opened a new pull request #401: LUCENE-10174 Speed up 'pushLocal'
janhoy opened a new pull request #401: URL: https://github.com/apache/lucene/pull/401 https://issues.apache.org/jira/browse/LUCENE-10174 When copying files from `lucene/distribution/build/release` to the target directory, the script uses `tar.bz2`, i.e. with compression. This is super slow and useless since the files are already compressed. This PR uses plain `tar` without compression to greatly speed up this step. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
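The difference between the two approaches can be sketched with Python's standard `tarfile` module. This is not the change made in the PR (which is not shown in this message); the helper name `copy_tree_via_tar` is hypothetical. The relevant detail is the mode string: `"w"` writes a plain, uncompressed archive, whereas `"w:bz2"` would run everything through bzip2 again.

```python
import os
import tarfile
import tempfile


def copy_tree_via_tar(src_dir, dest_dir):
    """Copy the contents of src_dir into dest_dir through an uncompressed tar."""
    os.makedirs(dest_dir, exist_ok=True)
    with tempfile.TemporaryFile() as buf:
        # Mode "w" means no compression; "w:bz2" would be the slow variant.
        with tarfile.open(fileobj=buf, mode="w") as tar:
            for name in sorted(os.listdir(src_dir)):
                tar.add(os.path.join(src_dir, name), arcname=name)
        buf.seek(0)
        with tarfile.open(fileobj=buf, mode="r") as tar:
            tar.extractall(dest_dir)
```

Since release artifacts such as `.tgz` and `.jar` files are already compressed, a second compression pass costs CPU time while saving almost no space, which is the motivation given above.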
[jira] [Commented] (LUCENE-10174) Update buildAndPushRelease.py for new gradle build
[ https://issues.apache.org/jira/browse/LUCENE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430801#comment-17430801 ] Jan Høydahl commented on LUCENE-10174: -- See [GitHub Pull Request #401|https://github.com/apache/lucene/pull/401] for a nice speedup of the last step 'pushLocal' > Update buildAndPushRelease.py for new gradle build > -- > > Key: LUCENE-10174 > URL: https://issues.apache.org/jira/browse/LUCENE-10174 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Fix For: main (9.0) > > Time Spent: 1h 40m > Remaining Estimate: 0h > > With LUCENE-9488 and LUCENE-10173 the gradle build was polished to properly > build source and binary artifacts, and sign those using either gpg tool or a > built-in java-based signing plugin. See > [https://github.com/apache/lucene/blob/main/help/publishing.txt] > This jira will update {{buildAndPushRelease.py}} script to use the correct > build parameters. It will also add cmdline args to choose between gpg and > built-in (gpg default), and to supply the location of {{gpgHome}} if you do > not use gpg. We'll also add an option to NOT prompt for passphrase in the > python script, which will fallback to defaults (gpg-agent, env.vars or > gradle.properties). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9997) Revisit smoketester for 9.0 build
[ https://issues.apache.org/jira/browse/LUCENE-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430811#comment-17430811 ] Tomoko Uchida commented on LUCENE-9997: --- bq. I think we should make the git revision part of the distribution artifacts - then the smoke tester can read it directly from the distribution artifact release folder. Moreover, the git revision could also be part of the "source" distribution of Lucene - then the build scripts can be tweaked to actually work without the git clone (on the true "source" distribution) by simulating the git revision read from such a file. +1 - if we are willing to refactor the huge smoketester script... > Revisit smoketester for 9.0 build > - > > Key: LUCENE-9997 > URL: https://issues.apache.org/jira/browse/LUCENE-9997 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Robert Muir >Assignee: Jan Høydahl >Priority: Major > Fix For: main (9.0) > > Attachments: image-2021-10-12-12-47-11-480.png, > image-2021-10-12-12-48-15-373.png > > Time Spent: 9h 10m > Remaining Estimate: 0h > > Currently we have a (great) {{dev-tools/scripts/smokeTester.py}} that will > perform automated tests against a release. > This was developed with the ant build process in mind. > This issue is just about considering the automated checks we do here, maybe > some of them can be done efficiently in the gradle build in earlier places: > this would be a large improvement! > Obviously some of them (e.g. GPG release key verifications) are really > specific to the artifacts in question. These are most important to release > verification, as that is actually the only place we can check it. > Any other checks (and I do tend to think, this checker should try to be > thorough, invoking gradle etc), should be stuff we regularly test in > PRs/nightly/builds. 
[jira] [Commented] (LUCENE-10190) Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[ https://issues.apache.org/jira/browse/LUCENE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430833#comment-17430833 ] Nhat Nguyen commented on LUCENE-10190: -- I am looking at this failure. > Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber > - > > Key: LUCENE-10190 > URL: https://issues.apache.org/jira/browse/LUCENE-10190 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Priority: Minor > > CI failure in PR at: > https://github.com/apache/lucene/pull/396/checks?check_run_id=3936559246 > Does not reproduce. Stack below. > {code} > org.apache.lucene.index.TestIndexWriter > testMaxCompletedSequenceNumber > FAILED > com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an > uncaught exception in thread: Thread[id=1840, name=Thread-1481, > state=RUNNABLE, group=TGRP-TestIndexWriter] > at > __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C:671A014F243A0EDE]:0) > Caused by: > java.lang.AssertionError: expected:<1> but was:<0> > at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0) > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:633) > at > org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305) > at java.base/java.lang.Thread.run(Thread.java:829) > org.apache.lucene.index.TestIndexWriter > test suite's output saved to > /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestIndexWriter.txt, > copied below: > 2> أكتوبر ١٩, ٢٠٢١ ١٠:٢٧:٢٧ ص > com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler > uncaughtException > 2> WARNING: Uncaught exception in thread: > Thread[Thread-1481,5,TGRP-TestIndexWriter] > 2> java.lang.AssertionError: expected:<1> but was:<0> > 2> at 
__randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0) > 2> at org.junit.Assert.fail(Assert.java:89) > 2> at org.junit.Assert.failNotEquals(Assert.java:835) > 2> at org.junit.Assert.assertEquals(Assert.java:647) > 2> at org.junit.Assert.assertEquals(Assert.java:633) > 2> at > org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305) > 2> at java.base/java.lang.Thread.run(Thread.java:829) > 2> >> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured > an uncaught exception in thread: Thread[id=1840, name=Thread-1481, > state=RUNNABLE, group=TGRP-TestIndexWriter] >> at > __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C:671A014F243A0EDE]:0) >> >> Caused by: >> java.lang.AssertionError: expected:<1> but was:<0> >> at __randomizedtesting.SeedInfo.seed([5B8EFE8DBEFB881C]:0) >> at org.junit.Assert.fail(Assert.java:89) >> at org.junit.Assert.failNotEquals(Assert.java:835) >> at org.junit.Assert.assertEquals(Assert.java:647) >> at org.junit.Assert.assertEquals(Assert.java:633) >> at > org.apache.lucene.index.TestIndexWriter.lambda$testMaxCompletedSequenceNumber$53(TestIndexWriter.java:4305) >> at java.base/java.lang.Thread.run(Thread.java:829) > 2> NOTE: reproduce with: gradlew test --tests > TestIndexWriter.testMaxCompletedSequenceNumber -Dtests.seed=5B8EFE8DBEFB881C > -Dtests.badapples=true -Dtests.locale=ar-KW -Dtests.timezone=Africa/Lusaka > -Dtests.asserts=true -Dtests.file.encoding=UTF-8 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10191) Optimize vector functions by precomputing magnitudes
[ https://issues.apache.org/jira/browse/LUCENE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430835#comment-17430835 ] Mayya Sharipova commented on LUCENE-10191: -- +1 great ideas to explore the performance boost. > Optimize vector functions by precomputing magnitudes > > > Key: LUCENE-10191 > URL: https://issues.apache.org/jira/browse/LUCENE-10191 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Julie Tibshirani >Priority: Minor > > Both euclidean distance (L2 norm) and cosine similarity can be expressed in > terms of dot product and vector magnitudes: > * l2_norm(a, b) = ||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2) > * cosine(a, b) = (a . b) / (||a|| ||b||) > We could compute and store each vector's magnitude upfront while indexing, > and compute the query vector's magnitude once per query. Then we'd calculate > the distance using our (very optimized) dot product method, plus the > precomputed values. > This is an exploratory issue: I haven't tested this out yet, so I'm not sure > how much it would help. I would at least expect it to help with cosine > similarity – several months ago we tried out similar ideas in Elasticsearch > and were able to get a nice boost in cosine performance.
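The two identities quoted in the issue can be checked with a small sketch. This is plain Python for illustration, not Lucene code; the function names are hypothetical. It assumes the magnitudes of the indexed vector and of the query vector are computed once and reused, so that each comparison needs only one dot product.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def magnitude(a: Sequence[float]) -> float:
    """Euclidean norm ||a||; this is the value that would be precomputed."""
    return math.sqrt(dot(a, a))


def l2_from_precomputed(a_dot_b: float, mag_a: float, mag_b: float) -> float:
    """||a - b|| = sqrt(||a||^2 - 2(a . b) + ||b||^2)."""
    squared = mag_a * mag_a - 2.0 * a_dot_b + mag_b * mag_b
    # Rounding can push the value slightly below zero for near-identical vectors.
    return math.sqrt(max(squared, 0.0))


def cosine_from_precomputed(a_dot_b: float, mag_a: float, mag_b: float) -> float:
    """cosine(a, b) = (a . b) / (||a|| ||b||); undefined for zero vectors."""
    return a_dot_b / (mag_a * mag_b)


if __name__ == "__main__":
    doc, query = [1.0, 2.0, 3.0], [4.0, 6.0, 8.0]
    doc_mag = magnitude(doc)      # stored at index time
    query_mag = magnitude(query)  # computed once per query
    d = dot(doc, query)
    print(l2_from_precomputed(d, doc_mag, query_mag))
    print(cosine_from_precomputed(d, doc_mag, query_mag))
```

One caveat with the expanded L2 form: when the two vectors are very close, the subtraction loses precision, so results can differ slightly from computing ||a - b|| directly.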
[jira] [Assigned] (LUCENE-10190) Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[ https://issues.apache.org/jira/browse/LUCENE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nhat Nguyen reassigned LUCENE-10190: Assignee: Nhat Nguyen > Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[jira] [Commented] (LUCENE-10190) Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[ https://issues.apache.org/jira/browse/LUCENE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430843#comment-17430843 ] Nhat Nguyen commented on LUCENE-10190: -- I can reproduce the issue by sleeping for a few ms before we increase the numDocsInRam (https://github.com/apache/lucene/commit/bbd24e865451d73fc228bd6a0b712508bd111b05). I will be working on the fix. > Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[jira] [Comment Edited] (LUCENE-10190) Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[ https://issues.apache.org/jira/browse/LUCENE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430843#comment-17430843 ] Nhat Nguyen edited comment on LUCENE-10190 at 10/20/21, 2:45 AM: - I can reproduce the issue by sleeping for a few ms before we increase the numDocsInRam (https://github.com/apache/lucene/commit/bbd24e865451d73fc228bd6a0b712508bd111b05). I will be working on the fix. > Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[jira] [Comment Edited] (LUCENE-10190) Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[ https://issues.apache.org/jira/browse/LUCENE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430843#comment-17430843 ] Nhat Nguyen edited comment on LUCENE-10190 at 10/20/21, 2:46 AM: - I can reproduce the issue by sleeping for a few ms before we increase the numDocsInRam (https://github.com/apache/lucene/commit/bbd24e865451d73fc228bd6a0b712508bd111b05). I will be working on the fix. > Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[jira] [Commented] (LUCENE-10190) Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[ https://issues.apache.org/jira/browse/LUCENE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430980#comment-17430980 ] Dawid Weiss commented on LUCENE-10190: -- Thank you, [~dnhatn]! > Assertion error in TestIndexWriter.testMaxCompletedSequenceNumber
[jira] [Created] (LUCENE-10192) Drop third-party JARs from the binary distribution
Dawid Weiss created LUCENE-10192: Summary: Drop third-party JARs from the binary distribution Key: LUCENE-10192 URL: https://issues.apache.org/jira/browse/LUCENE-10192 Project: Lucene - Core Issue Type: Task Reporter: Dawid Weiss Assignee: Dawid Weiss [~janhoy] Are we ready (with respect to scripts) for this change? I'd like to do it, but I'm not sure whether the release wizard depends on it somehow (I will handle buildAndPushRelease.py and smokeTestRelease.py if they need fixes, but I'm not sure about the releaseWizard.*).
[jira] [Commented] (LUCENE-10084) Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == maxDoc
[ https://issues.apache.org/jira/browse/LUCENE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431010#comment-17431010 ] Adrien Grand commented on LUCENE-10084: --- Please feel free to give it a try! > Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == > maxDoc > > > Key: LUCENE-10084 > URL: https://issues.apache.org/jira/browse/LUCENE-10084 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Now that we require all documents to use the same features (LUCENE-9334) we > could rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms > or points have a docCount that is equal to maxDoc.
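The rewrite condition in this issue can be modeled in a few lines. The following is a sketch in Python, not Lucene code: `LeafStats`, `field_doc_count` and `can_rewrite_to_match_all` are hypothetical names, and the sketch assumes that the condition has to hold in every segment (leaf) of the index before the query can be replaced.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LeafStats:
    """Hypothetical per-segment statistics for one field."""
    max_doc: int          # number of documents in the segment
    field_doc_count: int  # documents with at least one term/point for the field


def can_rewrite_to_match_all(leaves: Iterable[LeafStats]) -> bool:
    """True if every segment reports that all of its documents have the field.

    Assumption: an index without segments is left alone (no rewrite).
    """
    leaves = list(leaves)
    return bool(leaves) and all(
        leaf.field_doc_count == leaf.max_doc for leaf in leaves
    )
```

If the check passes, a "field exists" query matches the same documents as a match-all query, so the cheaper query can stand in for it; otherwise the original query is kept.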