[GitHub] [lucene] iverase opened a new pull request #72: LUCENE-9907: Remove packedInts#getReaderNoHeader dependency on TermsVectorFieldsFormat
iverase opened a new pull request #72: URL: https://github.com/apache/lucene/pull/72 Replaces the usages of PackedInts#getReaderNoHeader with DirectReader#getInstance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9914) Modernize Emoji regeneration scripts
[ https://issues.apache.org/jira/browse/LUCENE-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317121#comment-17317121 ] Robert Muir commented on LUCENE-9914: - FYI: For the jflex we want the unicode version to match what the rest of the jflex grammar is using. Sometimes new unicode versions have features that require new jflex versions. So we may want to add something like the following to the script to make it clear what version it was generated with: {code} import com.ibm.icu.lang.UCharacter; import com.ibm.icu.util.VersionInfo; System.out.println("// Unicode Version: " + UCharacter.getUnicodeVersion()); System.out.println("// ICU Version: " + VersionInfo.ICU_VERSION); {code} > Modernize Emoji regeneration scripts > > > Key: LUCENE-9914 > URL: https://issues.apache.org/jira/browse/LUCENE-9914 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > > These are perl scripts... I don't think they had ant tasks in 8x and they > haven't been used in a while. They don't seem too scary (for perl) - just > fetch emoji unicode descriptions and parse them into a jflex macro and a test > case. > It'd be good to convert them to use python, groovy or even java so that they > fit better in the build system. Alternatively - perhaps there is a way to get > these codepoint properties from Java directly? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9913) TestCompressingTermVectorsFormat.testMergeStability can fail assertion
[ https://issues.apache.org/jira/browse/LUCENE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317137#comment-17317137 ] Robert Muir commented on LUCENE-9913: - [~julietibs] I haven't dug into this test/failure yet, but it might be due to LUCENE-9827 merge compression changes to stored fields & vectors. The maxChunkSize parameter passed to the compression is now used as part of the decision about whether or not recompression happens at merge, and it wasn't used here before. So perhaps it confuses tests depending on various parameters. > TestCompressingTermVectorsFormat.testMergeStability can fail assertion > -- > > Key: LUCENE-9913 > URL: https://issues.apache.org/jira/browse/LUCENE-9913 > Project: Lucene - Core > Issue Type: Test >Reporter: Julie Tibshirani >Priority: Major > > This reproduces for me on {{main}}: > {code:java} > ./gradlew test --tests TestCompressingTermVectorsFormat.testMergeStability \ > -Dtests.seed=502C0E17C8769082 -Dtests.nightly=true \ > -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=gd-GB \ > -Dtests.timezone=Africa/Accra -Dtests.asserts=true \ > -Dtests.file.encoding=UTF-8 > {code} > Failure excerpt: > {code:java} > > java.lang.AssertionError: expected:<{tvd=33526, fnm=698, nvm=283, > tvm=164, tmd=826, fdm=158, pos=10508, fdt=1121, tvx=339, doc=13302, > tim=22354, tip=101, fdx=202, nvd=18983}> but was:<{tvd=33526, fnm=698, > nvm=283, tvm=163, tmd=826, fdm=157, pos=10508, fdt=1121, tvx=339, doc=13302, > tim=22354, tip=101, fdx=202, nvd=18983}> >> at > __randomizedtesting.SeedInfo.seed([502C0E17C8769082:24604838C59C9234]:0) >> at org.junit.Assert.fail(Assert.java:89) > {code}
[jira] [Commented] (LUCENE-9914) Modernize Emoji regeneration scripts
[ https://issues.apache.org/jira/browse/LUCENE-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317138#comment-17317138 ] Uwe Schindler commented on LUCENE-9914: --- See the groovy script that creates the file UnicodeData.java. Dawid touched it a few days ago. https://github.com/apache/lucene/blob/fbf9191abf2ad4acd26bae16e075cdeb79d33a39/gradle/generation/unicode-data.gradle Uwe
[GitHub] [lucene] rmuir merged pull request #70: LUCENE-9911: enable ecjLint unusedExceptionParameter
rmuir merged pull request #70: URL: https://github.com/apache/lucene/pull/70
[jira] [Commented] (LUCENE-9911) enable ecjLint unusedExceptionParameter
[ https://issues.apache.org/jira/browse/LUCENE-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317140#comment-17317140 ] ASF subversion and git services commented on LUCENE-9911: - Commit 2971f311a2b4a9139e3a74edbe76b08bc0e288a3 in lucene's branch refs/heads/main from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2971f31 ] LUCENE-9911: enable ecjLint unusedExceptionParameter (#70) Fails the linter if an exception is swallowed (e.g. variable completely unused). If this is intentional for some reason, the exception can simply be annotated with @SuppressWarnings("unused"). > enable ecjLint unusedExceptionParameter > --- > > Key: LUCENE-9911 > URL: https://issues.apache.org/jira/browse/LUCENE-9911 > Project: Lucene - Core > Issue Type: Task >Reporter: Robert Muir >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > unusedExceptionParameter is a very useful check, as it detects if you catch > an exception and do nothing with it at all. > As a library, it's important to preserve exceptions (e.g. chain the root > cause, .addSuppressed, etc). This check helps prevent exceptions from getting > swallowed inadvertently.
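The commit message above describes annotating a deliberately unused exception parameter so the lint stays quiet. A minimal sketch of that pattern (the class and method names here are illustrative, not from Lucene):

```java
// Hedged sketch of the pattern the commit message describes: a deliberately
// swallowed exception, annotated so the unusedExceptionParameter lint passes.
public class ParseCheck {
  /** Returns true if s parses as an int; the failure details are irrelevant. */
  static boolean isInt(String s) {
    try {
      Integer.parseInt(s);
      return true;
    } catch (@SuppressWarnings("unused") NumberFormatException e) {
      // intentionally ignored: only success/failure matters here
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isInt("42"));   // true
    System.out.println(isInt("oops")); // false
  }
}
```

Without the annotation, ecjLint would flag `e` as a swallowed exception; the annotation documents that the swallow is deliberate rather than an oversight.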
[jira] [Resolved] (LUCENE-9911) enable ecjLint unusedExceptionParameter
[ https://issues.apache.org/jira/browse/LUCENE-9911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-9911. - Fix Version/s: main (9.0) Resolution: Fixed
[jira] [Commented] (LUCENE-9914) Modernize Emoji regeneration scripts
[ https://issues.apache.org/jira/browse/LUCENE-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317145#comment-17317145 ] Robert Muir commented on LUCENE-9914: - yes, that one looks great: I think a similar groovy script can work here (using the above snippet). We just have to use ICU 62 for now so that we get Unicode 11 property data to match the version of Unicode that the jflex grammar uses (I think it only makes sense for the whole grammar to be self-consistent with respect to that; we shouldn't mix and match). FYI, that one could be done in a similar, more efficient way with UnicodeSet on the "White_Space" property as well, rather than looping through every codepoint. But maybe it is fast enough that no one cares :)
[jira] [Commented] (LUCENE-9914) Modernize Emoji regeneration scripts
[ https://issues.apache.org/jira/browse/LUCENE-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317153#comment-17317153 ] Dawid Weiss commented on LUCENE-9914: - It does get a bit more complex if we want to use multiple ICU versions -- there can be only one referenced directly from within build scripts. Having multiple versions requires a separate configuration/dependency and a java fork with a different classpath. Not terribly difficult, but definitely adding a layer of complexity. I'll take a look.
[GitHub] [lucene] dweiss commented on pull request #71: LUCENE-9651: Make benchmarks run again, correct javadocs
dweiss commented on pull request #71: URL: https://github.com/apache/lucene/pull/71#issuecomment-815782744 Thanks Robert. I'll go through these benchmark files and correct them so that they work. It is a bit worrying that nobody noticed they're broken. :) Anybody using these at all?
[jira] [Commented] (LUCENE-9914) Modernize Emoji regeneration scripts
[ https://issues.apache.org/jira/browse/LUCENE-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317155#comment-17317155 ] Uwe Schindler commented on LUCENE-9914: --- That's really fast. The reason I did it like that was that ICU should not be a runtime dependency, so the script just extracts the data and provides it to CharTokenizer as a Bits interface (backed by a sparse bitset). The script only takes milliseconds. Maybe we can just extend the UnicodeData class to contain Emoji codepoints in a similar way and let the jflex code depend on it. Because of my bad experience with the domain name tokenizer, I tend to think that the FSA should only contain some "best guess" like Unicode ranges, so the FSA stays small. In the jflex callback, the lookup of the exact emoji could be done, and everything that is not an emoji handed back to jflex as a non-match. IMHO the domain name handling in the standard tokenizer should maybe be done similarly: just match anything that looks like a domain and do a separate check on possible matches.
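The "extract once, expose a Bits-style lookup" idea in the comment above can be sketched roughly as follows. This is a hedged, JDK-only sketch, not Lucene's actual UnicodeData/CharTokenizer wiring: `Character.isWhitespace` stands in for real ICU property data, the class name is invented, and a plain `java.util.BitSet` stands in for the sparse bitset.

```java
import java.util.BitSet;

// Hedged sketch of the idea above: bake a Unicode property into a bitset in a
// one-time extraction pass, then answer per-codepoint lookups from the bitset
// alone, with no property library needed at runtime.
public class WhitespaceBits {
  private final BitSet bits = new BitSet(Character.MAX_CODE_POINT + 1);

  public WhitespaceBits() {
    // extraction pass over every codepoint; runs once, at "generation" time
    for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
      if (Character.isWhitespace(cp)) {
        bits.set(cp);
      }
    }
  }

  /** Bits-style lookup: is this codepoint in the property set? */
  public boolean get(int codePoint) {
    return bits.get(codePoint);
  }

  public static void main(String[] args) {
    WhitespaceBits ws = new WhitespaceBits();
    System.out.println(ws.get(' ')); // true
    System.out.println(ws.get('a')); // false
  }
}
```

The generated UnicodeProps.java plays the role of the precomputed bitset here; consumers only see the cheap `get` lookup.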
[jira] [Created] (LUCENE-9916) generateUnicodeProps doesn't work according to instructions, always SKIPPED
Robert Muir created LUCENE-9916: --- Summary: generateUnicodeProps doesn't work according to instructions, always SKIPPED Key: LUCENE-9916 URL: https://issues.apache.org/jira/browse/LUCENE-9916 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir I tried to regenerate unicode properties mentioned in LUCENE-9914 by [~uschindler] and I simply can't get it to run at all. It says in the output file: {code} // DO NOT EDIT THIS FILE! Use "gradlew generateUnicodeProps tidy" to recreate. {code} Here is what I see: {noformat} ./gradlew clean ./gradlew generateUnicodeProps tidy ... Task :lucene:analysis:common:generateUnicodeProps SKIPPED {noformat} Even if I remove the output file completely: {{rm lucene/analysis/common/src/java/org/apache/lucene/analysis/util/UnicodeProps.java}}, the task is always skipped. How to regenerate? cc [~dweiss]
[jira] [Commented] (LUCENE-9916) generateUnicodeProps doesn't work according to instructions, always SKIPPED
[ https://issues.apache.org/jira/browse/LUCENE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317159#comment-17317159 ] Dawid Weiss commented on LUCENE-9916: - Run with --rerun-tasks if you want to force regeneration. It should still run if you remove (or touch) one of the inputs/outputs - I'll take a look.
[GitHub] [lucene] gsmiller commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
gsmiller commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609667950 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java ## @@ -121,4 +167,146 @@ void skip(DataInput in) throws IOException { in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1)); } } + + /** + * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes + * there are no exceptions to apply. + */ + private static void prefixSumOfOnes(long[] longs, long base) { +System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE); +// This loop gets auto-vectorized +for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) { + longs[i] += base; +} + } + + /** + * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note + * this assumes there are no exceptions to apply. + */ + private static void prefixSumOf(long[] longs, long base, long val) { +for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { + longs[i] = (i + 1) * val + base; Review comment: @rmuir instance type swap did the trick. Thanks so much! Grabbing your command from above, here's the suspicious difference (in case this helps in the future): * instance type: m4.10xlarge ``` m4.10xl% sudo journalctl -k | grep PMU Mar 24 15:02:23 localhost kernel: Performance Events: unsupported p6 CPU model 63 no PMU driver, software events only. Mar 24 15:02:23 localhost kernel: RAPL PMU: API unit is 2^-32 Joules, 0 fixed counters, 655360 ms ovfl timer ``` * instance type: m5.12xlarge ``` m5.12xl% sudo journalctl -k | grep PMU Apr 08 04:02:43 localhost kernel: Performance Events: Skylake events, Intel PMU driver. Apr 08 04:02:43 localhost kernel: RAPL PMU: API unit is 2^-32 Joules, 0 fixed counters, 10737418240 ms ovfl timer ``` I've confirmed `perfasm` is working for me. I'll get some results updated here shortly. Thanks again! -- This is an automated message from the Apache Git Service. 
[jira] [Commented] (LUCENE-9872) Make the most painful tasks in regenerate fully incremental
[ https://issues.apache.org/jira/browse/LUCENE-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317169#comment-17317169 ] ASF subversion and git services commented on LUCENE-9872: - Commit 4c2384a1f352094a2f208dd354240f56e782da1d in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4c2384a ] LUCENE-9872: load input/output checksums prior to executing the target task, even if regenerate is not called. > Make the most painful tasks in regenerate fully incremental > --- > > Key: LUCENE-9872 > URL: https://issues.apache.org/jira/browse/LUCENE-9872 > Project: Lucene - Core > Issue Type: Sub-task >Affects Versions: main (9.0) >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > Fix For: main (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > This is particularly important for that one jflex task that is currently > a mood-killer.
[jira] [Commented] (LUCENE-9916) generateUnicodeProps doesn't work according to instructions, always SKIPPED
[ https://issues.apache.org/jira/browse/LUCENE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317173#comment-17317173 ] Dawid Weiss commented on LUCENE-9916: - I corrected the checksum configuration code - it's a bug. There's, sadly, a lot of trickery involved in making these "checksums" work because gradle task dependencies are much more relaxed than ant's - they can execute out of order and there is no mechanism to "skip" a task AND its dependencies (which makes sense since it's a graph and task A's dependencies can be a non-ignored task B's dependencies...). I don't know of a simpler way to do it though.
[jira] [Resolved] (LUCENE-9916) generateUnicodeProps doesn't work according to instructions, always SKIPPED
[ https://issues.apache.org/jira/browse/LUCENE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9916. - Resolution: Fixed
[jira] [Commented] (LUCENE-9916) generateUnicodeProps doesn't work according to instructions, always SKIPPED
[ https://issues.apache.org/jira/browse/LUCENE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317175#comment-17317175 ] Dawid Weiss commented on LUCENE-9916: - The "convention" to force-run any task(s) is to pass --rerun-tasks, by the way. The regeneration code does respect it. It's not a selective option (it'll really rerun everything).
[GitHub] [lucene] rmuir commented on pull request #71: LUCENE-9651: Make benchmarks run again, correct javadocs
rmuir commented on pull request #71: URL: https://github.com/apache/lucene/pull/71#issuecomment-815809836 > Thanks Robert. I'll go through these benchmark files and correct them so that they work. It is a bit worrying that nobody noticed they're broken. :) Anybody using these at all? I've not used this mechanism of the benchmark package to do any performance benchmarking: it seems most performance benchmarking from contributors/committers uses https://github.com/mikemccand/luceneutil, or ad-hoc benchmarks. Personally, I use this benchmarking package, but via QualityRun's main method, to measure relevance, and I always write my own parser (because every trec-like dataset differs oh-so-slightly and the generic TREC parser we supply never works), and I use it in a minimal way (generate submission.txt, then I run trec_eval etc. from the command line myself). The reason it isn't used might be the dataset: I'm unfamiliar with this reuters dataset and maybe it's not big enough for useful benchmarks? I think in general people tend to use these datasets more often for performance benchmarks, often ad-hoc: * wikipedia english * geonames * apache httpd logs * NYC Taxis * OpenStreetMap Or maybe it's just because perf issues are usually complicated? For example, to reproduce LUCENE-9827 I downloaded geonames and wrote a simple standalone .java Indexer (attached to the issue) that essentially changes IW's config (flush every doc, SerialMergeScheduler, LZ4 and DEFLATE codec compression) to keep it simple, measuring using only a single thread. It ran so slowly I had to limit the number of docs to the first N as well.
[jira] [Commented] (LUCENE-9916) generateUnicodeProps doesn't work according to instructions, always SKIPPED
[ https://issues.apache.org/jira/browse/LUCENE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317192#comment-17317192 ] Robert Muir commented on LUCENE-9916: - [~dweiss] I realize this looks like just a generic gradle feature, but I didn't know about it. Maybe a good one for help/ ? The two cases where it would have been useful to me so far are: 1. re-running a test with the same seed. 2. trying to force-regenerate content here. So maybe at least it could be a little one-liner in help/tests.txt and possibly a future help/regeneration.txt. I can follow up with it.
[GitHub] [lucene] rmuir commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
rmuir commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609691366 Review comment: @gsmiller glad to hear you are up and running! We need more eyes on this stuff, and they don't exactly make it easy!
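The two delta-decoding fast paths under review in this thread can be sketched self-contained as follows. `BLOCK_SIZE` and the identity table are assumptions standing in for ForUtil's internals (the real PForUtil also has to apply patch exceptions, which both fast paths assume away):

```java
// Hedged sketch of PForUtil's prefix-sum fast paths, assuming BLOCK_SIZE=128
// and no patch exceptions. Not the real class; for illustration only.
public class PrefixSumSketch {
  static final int BLOCK_SIZE = 128;
  static final long[] IDENTITY_PLUS_ONE = new long[BLOCK_SIZE];

  static {
    for (int i = 0; i < BLOCK_SIZE; i++) {
      IDENTITY_PLUS_ONE[i] = i + 1;
    }
  }

  /** All deltas are 1: longs[i] = base + i + 1. */
  static void prefixSumOfOnes(long[] longs, long base) {
    System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, BLOCK_SIZE);
    // simple add loop, the shape the review says auto-vectorizes
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      longs[i] += base;
    }
  }

  /** All deltas equal val: longs[i] = base + (i + 1) * val. */
  static void prefixSumOf(long[] longs, long base, long val) {
    // single multiply-add loop, the variant under discussion in the review
    for (int i = 0; i < BLOCK_SIZE; i++) {
      longs[i] = (i + 1) * val + base;
    }
  }

  public static void main(String[] args) {
    long[] docDeltas = new long[BLOCK_SIZE];
    prefixSumOf(docDeltas, 10, 3);
    System.out.println(docDeltas[0] + " " + docDeltas[127]); // 13 394
  }
}
```

The point of both shortcuts is to skip bit-unpacking entirely when a block's deltas are all-ones or all-equal, computing the running sum in closed form instead.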
[jira] [Commented] (LUCENE-9916) generateUnicodeProps doesn't work according to instructions, always SKIPPED
[ https://issues.apache.org/jira/browse/LUCENE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317196#comment-17317196 ] Dawid Weiss commented on LUCENE-9916: - I think tests.txt mentions the cleanTest convention to rerun with the same seed (which works). I agree --rerun-tasks is sometimes useful and I use it myself. The "cleanTaskName" convention is more convenient if you don't want to rebuild the world. Please go ahead and commit a clarification to the docs - it's better if it comes from you than me.
[GitHub] [lucene] gsmiller commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
gsmiller commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609697557 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java ## @@ -121,4 +167,146 @@ void skip(DataInput in) throws IOException { in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1)); } } + + /** + * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes + * there are no exceptions to apply. + */ + private static void prefixSumOfOnes(long[] longs, long base) { +System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE); +// This loop gets auto-vectorized +for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) { + longs[i] += base; +} + } + + /** + * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note + * this assumes there are no exceptions to apply. + */ + private static void prefixSumOf(long[] longs, long base, long val) { +for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { + longs[i] = (i + 1) * val + base; Review comment: Here's what I've found with `perfasm` on this microbenchmark branch: 1. The `prefixSumOf` method in question [1] is _not_ auto-vectorizing. The assembly loop is below [2]. 2. If I change the implementation of `prefixSumOf` to use two loops [3], the second "add" loop is auto-vectorizing in the same way that `prefixSumOfOnes` does [4], but the first "multiply" loop does not [5]. 3. Even though the second approach [3] gets partially vectorized, it's significantly less performant than the vanilla, single-loop approach [6]. 
[1] ``` private static void prefixSumOf(long val, long[] arr, long base) { for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { arr[i] = IDENTITY_PLUS_ONE[i] * val + base; } } ``` [2] ``` 0.45%β 0x7f37dfa02c52: mov%r9,%r8 0.30%β 0x7f37dfa02c55: movabs $0xd2816340,%rdi ; {oop([J{0xd2816340})} 3.16%β 0x7f37dfa02c5f: imul 0x10(%rdi,%r10,8),%r8 1.12%β 0x7f37dfa02c65: add%rcx,%r8 1.42%β 0x7f37dfa02c68: mov%r8,0x10(%rbx,%r10,8) 2.97%β 0x7f37dfa02c6d: mov%r9,%r8 2.62%β 0x7f37dfa02c70: imul 0x18(%rdi,%r10,8),%r8 1.37%β 0x7f37dfa02c76: add%rcx,%r8 1.40%β 0x7f37dfa02c79: mov%r8,0x18(%rbx,%r10,8) 5.71%β 0x7f37dfa02c7e: mov%r9,%r8 2.02%β 0x7f37dfa02c81: imul 0x20(%rdi,%r10,8),%r8 1.08%β 0x7f37dfa02c87: add%rcx,%r8 1.91%β 0x7f37dfa02c8a: mov%r8,0x20(%rbx,%r10,8) 4.96%β 0x7f37dfa02c8f: mov%r9,%r8 1.57%β 0x7f37dfa02c92: imul 0x28(%rdi,%r10,8),%r8 0.71%β 0x7f37dfa02c98: add%rcx,%r8 0.56%β 0x7f37dfa02c9b: mov%r8,0x28(%rbx,%r10,8) ;*lastore {reexecute=0 rethrow=0 return_oop=0} β; - jpountz.PForDeltaDecoder::prefixSumOf@24 (line 29) β; - jpountz.PForDeltaDecoder::decodeAndPrefixSum@32 (line 59) β; - jpountz.PackedIntsDeltaDecodeBenchmark::pForDeltaDecoder@42 (line 29) β; - jpountz.generated.PackedIntsDeltaDecodeBenchmark_pForDeltaDecoder_jmhTest::pForDeltaDecoder_thrpt_jmhStub@151 (line 240) 4.79%β 0x7f37dfa02ca0: add$0x4,%r10d ;*iinc {reexecute=0 rethrow=0 return_oop=0} β; - jpountz.PForDeltaDecoder::prefixSumOf@25 (line 28) β; - jpountz.PForDeltaDecoder::decodeAndPrefixSum@32 (line 59) β; - jpountz.PackedIntsDeltaDecodeBenchmark::pForDeltaDecoder@42 (line 29) β; - jpountz.generated.PackedIntsDeltaDecodeBenchmark_pForDeltaDecoder_jmhTest::pForDeltaDecoder_thrpt_jmhStub@151 (line 240) 1.22%β 0x7f37dfa02ca4: cmp$0x7d,%r10d β° 0x7f37dfa02ca8: jl 0x7f37dfa02c52 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ``` [3] ``` private static void prefixSumOfTwoLoops(long val, long[] arr, long base) { System.arraycopy(IDENTITY_PLUS_ONE, 0, arr, 0, ForUtil.BLOCK_SIZE); for (int i = 0; i < 
ForUtil.BLOCK_SIZE; i++) { arr[i] *= val; } for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { arr[i] += base; } } ``` [4] ``` 0.11% β0x7f1607a05810: vpaddq 0x10(%rbp,%r11,8),%ymm0,%ymm1 0.17% β0x7f16
[GitHub] [lucene] gsmiller commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
gsmiller commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609699872 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java ## @@ -121,4 +167,146 @@ void skip(DataInput in) throws IOException { in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1)); } } + + /** + * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes + * there are no exceptions to apply. + */ + private static void prefixSumOfOnes(long[] longs, long base) { +System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE); +// This loop gets auto-vectorized +for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) { + longs[i] += base; +} + } + + /** + * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note + * this assumes there are no exceptions to apply. + */ + private static void prefixSumOf(long[] longs, long base, long val) { +for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { + longs[i] = (i + 1) * val + base; Review comment: So, as of now, I think we leave the implementation as is and hope that we can do something better with more explicit vectorization support in the future. @jpountz / @rmuir does that seem right to you? If you have any suggestions on other ways to try to trick this compiler, I'm happy to try them out. And I know you'll call it out if you see something off in my above analysis, since I'm so new to this :)
[GitHub] [lucene] rmuir opened a new pull request #73: LUCENE-9916: add a simple regeneration help doc
rmuir opened a new pull request #73: URL: https://github.com/apache/lucene/pull/73 This probably isn't the most efficient or the best, but it's a start. Some notes: * Using these steps to "force regenerate" results in local diffs. These look to be hashmap ordering differences or similar. We should fix these so that regeneration is fully idempotent? * Might not be the most efficient, for example when using `--rerun-tasks` the tidy is rerun even when it's not necessary, which is actually quite slow. Is the `tidy` task really necessary, or is it automatically/more efficiently done as some prerequisite of `regenerate`?
[GitHub] [lucene] gsmiller commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
gsmiller commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609697557 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java ## @@ -121,4 +167,146 @@ void skip(DataInput in) throws IOException { in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1)); } } + + /** + * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes + * there are no exceptions to apply. + */ + private static void prefixSumOfOnes(long[] longs, long base) { +System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE); +// This loop gets auto-vectorized +for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) { + longs[i] += base; +} + } + + /** + * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note + * this assumes there are no exceptions to apply. + */ + private static void prefixSumOf(long[] longs, long base, long val) { +for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { + longs[i] = (i + 1) * val + base; Review comment: Here's what I've found with `perfasm` on [this microbenchmark branch](https://github.com/gsmiller/decode-128-ints-benchmark/tree/pfor-is-it-vectorizing): 1. The `prefixSumOf` method in question [1] is _not_ auto-vectorizing. The assembly loop is below [2]. 2. If I change the implementation of `prefixSumOf` to use two loops [3], the second "add" loop is auto-vectorizing in the same way that `prefixSumOfOnes` does [4], but the first "multiply" loop does not [5]. 3. Even though the second approach [3] gets partially vectorized, it's significantly less performant than the vanilla, single-loop approach (7.1 throughput vs. 6.3) [6]. 4. 
The full output of the jmh benchmark runs with `perfasm` are here (note that the first run in each of these is `prefixSumOfOnes` as a baseline, controlled by `sameVal == 1` instead of `sameVal == 2`; `sameVal == 1` triggers the special-case handling using `prefixSumOfOnes`): [single-loop.log](https://github.com/gsmiller/decode-128-ints-benchmark/blob/pfor-is-it-vectorizing/single-loop.log), [two-loops.log](https://github.com/gsmiller/decode-128-ints-benchmark/blob/pfor-is-it-vectorizing/two-loops.log) [1] ``` private static void prefixSumOf(long val, long[] arr, long base) { for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { arr[i] = IDENTITY_PLUS_ONE[i] * val + base; } } ``` [2] ``` 0.45%β 0x7f37dfa02c52: mov%r9,%r8 0.30%β 0x7f37dfa02c55: movabs $0xd2816340,%rdi ; {oop([J{0xd2816340})} 3.16%β 0x7f37dfa02c5f: imul 0x10(%rdi,%r10,8),%r8 1.12%β 0x7f37dfa02c65: add%rcx,%r8 1.42%β 0x7f37dfa02c68: mov%r8,0x10(%rbx,%r10,8) 2.97%β 0x7f37dfa02c6d: mov%r9,%r8 2.62%β 0x7f37dfa02c70: imul 0x18(%rdi,%r10,8),%r8 1.37%β 0x7f37dfa02c76: add%rcx,%r8 1.40%β 0x7f37dfa02c79: mov%r8,0x18(%rbx,%r10,8) 5.71%β 0x7f37dfa02c7e: mov%r9,%r8 2.02%β 0x7f37dfa02c81: imul 0x20(%rdi,%r10,8),%r8 1.08%β 0x7f37dfa02c87: add%rcx,%r8 1.91%β 0x7f37dfa02c8a: mov%r8,0x20(%rbx,%r10,8) 4.96%β 0x7f37dfa02c8f: mov%r9,%r8 1.57%β 0x7f37dfa02c92: imul 0x28(%rdi,%r10,8),%r8 0.71%β 0x7f37dfa02c98: add%rcx,%r8 0.56%β 0x7f37dfa02c9b: mov%r8,0x28(%rbx,%r10,8) ;*lastore {reexecute=0 rethrow=0 return_oop=0} β; - jpountz.PForDeltaDecoder::prefixSumOf@24 (line 29) β; - jpountz.PForDeltaDecoder::decodeAndPrefixSum@32 (line 59) β; - jpountz.PackedIntsDeltaDecodeBenchmark::pForDeltaDecoder@42 (line 29) β; - jpountz.generated.PackedIntsDeltaDecodeBenchmark_pForDeltaDecoder_jmhTest::pForDeltaDecoder_thrpt_jmhStub@151 (line 240) 4.79%β 0x7f37dfa02ca0: add$0x4,%r10d ;*iinc {reexecute=0 rethrow=0 return_oop=0} β; - jpountz.PForDeltaDecoder::prefixSumOf@25 (line 28) β; - jpountz.PForDeltaDecoder::decodeAndPrefixSum@32 (line 59) β; - 
jpountz.PackedIntsDeltaDecodeBenchmark::pForDeltaDecoder@42 (line 29) β; - jpountz.generated.PackedIntsDeltaDecodeBenchmark_pForDeltaDecoder_jmhTest::pForDeltaDecoder_thrpt_jmhStub@151 (line 240) 1.22%β 0x7f37dfa02ca4: cmp$0x7d,%r10d
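The two special-case decode paths being benchmarked above can be sketched in isolation. This is a hedged, standalone approximation, not the actual Lucene `PForUtil` class: the names `BLOCK_SIZE` and `IDENTITY_PLUS_ONE` mirror the diff, but the class and `main` harness here are invented for illustration.

```java
// Standalone sketch of the constant-delta prefix-sum special cases discussed
// in this thread. Mirrors the shapes in the PForUtil diff, but is not the
// real Lucene implementation.
public class PrefixSumSketch {
  static final int BLOCK_SIZE = 128;
  // IDENTITY_PLUS_ONE[i] == i + 1, precomputed once.
  static final long[] IDENTITY_PLUS_ONE = new long[BLOCK_SIZE];
  static {
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      IDENTITY_PLUS_ONE[i] = i + 1;
    }
  }

  /** All deltas are 1: longs[i] = base + (i + 1). The add loop auto-vectorizes well. */
  static void prefixSumOfOnes(long[] longs, long base) {
    System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, BLOCK_SIZE);
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      longs[i] += base;
    }
  }

  /** All deltas are val: longs[i] = base + (i + 1) * val. The mul+add loop resists vectorization. */
  static void prefixSumOf(long[] longs, long base, long val) {
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      longs[i] = (i + 1) * val + base;
    }
  }

  public static void main(String[] args) {
    long[] a = new long[BLOCK_SIZE];
    long[] b = new long[BLOCK_SIZE];
    prefixSumOfOnes(a, 100);
    prefixSumOf(b, 100, 1);
    // Both paths must agree when val == 1.
    System.out.println(java.util.Arrays.equals(a, b)); // true
  }
}
```

The point of the analysis above is that these two methods, though nearly identical in source form, get very different treatment from the JIT's auto-vectorizer.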
[GitHub] [lucene] rmuir commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
rmuir commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609724812 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java ## @@ -121,4 +167,146 @@ void skip(DataInput in) throws IOException { in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1)); } } + + /** + * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes + * there are no exceptions to apply. + */ + private static void prefixSumOfOnes(long[] longs, long base) { +System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE); +// This loop gets auto-vectorized +for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) { + longs[i] += base; +} + } + + /** + * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note + * this assumes there are no exceptions to apply. + */ + private static void prefixSumOf(long[] longs, long base, long val) { +for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { + longs[i] = (i + 1) * val + base; Review comment: I don't know enough about the auto-vectorization to trick it. But in general I wonder if we should replace some instances of `ForUtil.BLOCK_SIZE` with `array.length` etc. where possible, to make the compiler's job easier wrt bounds checks and loop processing. Arrays are being passed in as parameters and these are private static methods, so I don't know how smart it is about this today :) On the other hand, maybe it is a waste of your time and just costs a single inexpensive check up front...
[GitHub] [lucene] jpountz commented on pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
jpountz commented on pull request #69: URL: https://github.com/apache/lucene/pull/69#issuecomment-815845326 > I think we leave the implementation as is and hope that we can do something better with more explicit vectorization support in the future (Sorry, replying here as GitHub prevents me from replying on the existing thread.) +1 Let's go with whichever of `arr[i] = IDENTITY_PLUS_ONE[i] * val + base` or `arr[i] = (i+1) * val + base` runs fastest in your micro benchmark. We can still improve things later if we find a way to trick the JVM into auto-vectorizing this loop.
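The two candidate loop bodies jpountz mentions are arithmetically identical, since by construction `IDENTITY_PLUS_ONE[i] == i + 1`; the choice is purely about which form the JIT compiles better. A quick sanity check of that equivalence (the class and field here are stand-ins for the benchmark code, not the real classes):

```java
// Checks that the table-lookup form and the pure-arithmetic form from the
// discussion compute identical values. IDENTITY_PLUS_ONE and BLOCK_SIZE are
// stand-ins for the fields in the benchmark/PForUtil code.
public class FormulationCheck {
  static final int BLOCK_SIZE = 128;
  static final long[] IDENTITY_PLUS_ONE = new long[BLOCK_SIZE];
  static {
    for (int i = 0; i < BLOCK_SIZE; ++i) IDENTITY_PLUS_ONE[i] = i + 1;
  }

  public static void main(String[] args) {
    long base = 1_000L, val = 7L;
    long[] viaTable = new long[BLOCK_SIZE];
    long[] viaIndex = new long[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      viaTable[i] = IDENTITY_PLUS_ONE[i] * val + base; // table-lookup form
      viaIndex[i] = (i + 1) * val + base;              // pure-arithmetic form
    }
    System.out.println(java.util.Arrays.equals(viaTable, viaIndex)); // true
  }
}
```

Since the outputs are identical, picking between them on micro-benchmark throughput alone, as suggested above, is safe.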
[GitHub] [lucene] gsmiller commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
gsmiller commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609733110 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java ## @@ -121,4 +167,146 @@ void skip(DataInput in) throws IOException { in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1)); } } + + /** + * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes + * there are no exceptions to apply. + */ + private static void prefixSumOfOnes(long[] longs, long base) { +System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE); +// This loop gets auto-vectorized +for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) { + longs[i] += base; +} + } + + /** + * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note + * this assumes there are no exceptions to apply. + */ + private static void prefixSumOf(long[] longs, long base, long val) { +for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { + longs[i] = (i + 1) * val + base; Review comment: Interesting thought @rmuir. I'll tweak this to see what kind of difference it makes, but we can't replace `ForUtil.BLOCK_SIZE` with `array.length` in the production code. The array length is actually one more than `ForUtil.BLOCK_SIZE` (as used in `Lucene90PostingsReader`). (See [L317](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90PostingsReader.java#L317) for example). It populates that 129th value with `NO_MORE_DOCS` (i.e., `MAX_INT`) as an end-of-block marker. This was the source of a very frustrating debugging effort on my part while working on this, since early on I was actually using `array.length` instead :) UPDATE: I tried with both `array.length - 1` (which we'd need to actually use in production) as well as `array.length` (just to see if it mattered) and didn't get any auto-vectorization. The assembly looked the same to my eye. 
Thanks for the suggestion though!
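The pitfall gsmiller describes — a buffer of `BLOCK_SIZE + 1` longs whose last slot holds an end-of-block sentinel — can be sketched like this. The class is illustrative only; `NO_MORE_DOCS` and the 129-slot layout follow the description above, not the actual `Lucene90PostingsReader` source.

```java
// Illustration of why array.length cannot replace ForUtil.BLOCK_SIZE in the
// decode loops: the postings reader's doc buffer has one extra slot holding
// an end-of-block sentinel.
public class SentinelBufferSketch {
  static final int BLOCK_SIZE = 128;
  static final long NO_MORE_DOCS = Integer.MAX_VALUE; // end-of-block marker

  public static void main(String[] args) {
    long[] docBuffer = new long[BLOCK_SIZE + 1]; // 129 slots
    docBuffer[BLOCK_SIZE] = NO_MORE_DOCS;        // sentinel in the last slot

    // Decoding must only touch the first BLOCK_SIZE slots; looping to
    // docBuffer.length would overwrite the sentinel and break iteration.
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      docBuffer[i] = i; // stand-in for decoded doc IDs
    }
    System.out.println(docBuffer.length);      // 129
    System.out.println(docBuffer[BLOCK_SIZE]); // 2147483647, sentinel intact
  }
}
```

This is exactly why `array.length - 1`, not `array.length`, would be the only safe bound to test in production.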
[GitHub] [lucene] rmuir commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
rmuir commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609750704 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java ## @@ -121,4 +167,146 @@ void skip(DataInput in) throws IOException { in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1)); } } + + /** + * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes + * there are no exceptions to apply. + */ + private static void prefixSumOfOnes(long[] longs, long base) { +System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE); +// This loop gets auto-vectorized +for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) { + longs[i] += base; +} + } + + /** + * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note + * this assumes there are no exceptions to apply. + */ + private static void prefixSumOf(long[] longs, long base, long val) { +for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) { + longs[i] = (i + 1) * val + base; Review comment: OK, maybe useful for the future to look at. Perhaps we could "hold" ForUtil differently from the postings reader and avoid this. Or maybe you could try something like `array.length & ~(BLOCK_SIZE - 1)`, which is similar to what VectorSpecies.loopBound does when writing manually vectorized code. I found it was quite sensitive to this stuff.
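The masking trick rmuir suggests rounds a length down to the nearest multiple of the block size, the same idea as `VectorSpecies.loopBound` in the Vector API. A minimal sketch, assuming (as the mask requires) that `BLOCK_SIZE` is a power of two; the class and method names are invented for illustration:

```java
// Sketch of the loop-bound masking idea: round a length down to a multiple
// of BLOCK_SIZE so the main loop gets a compiler-friendly bound. Only valid
// when BLOCK_SIZE is a power of two.
public class LoopBoundSketch {
  static final int BLOCK_SIZE = 128;

  static int loopBound(int length) {
    // Clearing the low log2(BLOCK_SIZE) bits rounds down to a block multiple.
    return length & ~(BLOCK_SIZE - 1);
  }

  public static void main(String[] args) {
    System.out.println(loopBound(129)); // 128: one full block fits
    System.out.println(loopBound(128)); // 128: exact multiple
    System.out.println(loopBound(127)); // 0: no full block fits
  }
}
```

For the 129-slot buffer above, `loopBound(129)` yields 128, so the main loop would stop exactly before the sentinel slot.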
[GitHub] [lucene] uschindler commented on pull request #73: LUCENE-9916: add a simple regeneration help doc
uschindler commented on pull request #73: URL: https://github.com/apache/lucene/pull/73#issuecomment-815870339 I have a question: why do we need this "tidy" at the end of the command line? If it is always required, it could be triggered automatically? I know this is unrelated to the documentation issue, but whenever I see any of those instructions, this puts questions in my eyes: 🤔
[GitHub] [lucene] uschindler commented on a change in pull request #73: LUCENE-9916: add a simple regeneration help doc
uschindler commented on a change in pull request #73: URL: https://github.com/apache/lucene/pull/73#discussion_r609763034 ## File path: help/regeneration.txt ## @@ -0,0 +1,23 @@ +Regeneration + + +Lucene makes use of some generated code (e.g. jflex tokenizers). + +Examples below assume cwd at the gradlew script in the top directory of +the project's checkout. + + +Generic regeneration commands +-- + +Regenerate code: + +gradlew regenerate tidy Review comment: I would indent those lines. Maybe use Markdown for the whole help files?
[GitHub] [lucene] rmuir commented on a change in pull request #73: LUCENE-9916: add a simple regeneration help doc
rmuir commented on a change in pull request #73: URL: https://github.com/apache/lucene/pull/73#discussion_r609772375 ## File path: help/regeneration.txt ## @@ -0,0 +1,23 @@ +Regeneration + + +Lucene makes use of some generated code (e.g. jflex tokenizers). + +Examples below assume cwd at the gradlew script in the top directory of +the project's checkout. + + +Generic regeneration commands +-- + +Regenerate code: + +gradlew regenerate tidy Review comment: FYI I followed the style of existing help docs which do not indent, see tests.txt. I would say +1 to markdown as the current format is alien, and markdown would give good rendering on github?
[jira] [Commented] (LUCENE-9827) Small segments are slower to merge due to stored fields since 8.7
[ https://issues.apache.org/jira/browse/LUCENE-9827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317245#comment-17317245 ]

ASF subversion and git services commented on LUCENE-9827:

Commit e510ef11c2a4307dd6ecc8c8974eef2c04e3e4d6 in lucene's branch refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e510ef1 ]

LUCENE-9827: Propagate `numChunks` through bulk merges.

> Small segments are slower to merge due to stored fields since 8.7
> -
>
> Key: LUCENE-9827
> URL: https://issues.apache.org/jira/browse/LUCENE-9827
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: main (9.0)
> Attachments: Indexer.java, log-and-lucene-9827.patch, merge-count-by-num-docs.png, merge-type-by-version.png, total-merge-time-by-num-docs-on-small-segments.png, total-merge-time-by-num-docs.png
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> [~dm] and [~dimitrisli] looked into an interesting case where indexing slowed down after upgrading to 8.7. After digging we identified that this was due to the merging of stored fields, which had become slower on average.
> This is due to changes to stored fields, which now have top-level blocks that are then split into sub-blocks and compressed using shared dictionaries (one dictionary per top-level block). As the top-level blocks are larger than they were before, segments are more likely to be considered "dirty" by the merging logic. Dirty segments are segments where 1% of the data or more consists of incomplete blocks. For large segments, the size of blocks doesn't really affect the dirtiness of segments: if you flush a segment that has 100 blocks or more, it will never be considered dirty as only the last block may be incomplete. But for small segments it does: for instance if your segment is only 10 blocks, it is very likely considered dirty given that the last block is always incomplete.
> And the fact that we increased the top-level block size means that segments that used to be considered clean might now be considered dirty.
> And indeed benchmarks reported that while large stored fields merges became slightly faster after upgrading to 8.7, the smaller merges actually became slower. See the attached chart, which gives the total merge time as a function of the number of documents in the segment.
> I don't know how we can address this; it is a natural consequence of the larger block size, which is needed to achieve better compression ratios. But I wanted to open an issue about it in case someone has a bright idea for how we could make things better.
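The dirty-segment arithmetic described in LUCENE-9827 above can be sketched as follows. This is a hypothetical illustration, not Lucene's actual merge code: the class and method names are invented, and it only encodes the issue's stated assumptions (only the last block of a flushed segment may be incomplete, and a segment with 100+ blocks is never considered dirty).

```java
// Hypothetical sketch of the "dirty segment" heuristic described in the
// issue above -- NOT Lucene's actual merge code. It assumes, as the issue
// explains, that only a segment's last block may be incomplete, so roughly
// 1/numBlocks of the data lives in incomplete blocks.
public class DirtySegmentSketch {

  // A segment is "dirty" when more than 1% of its data sits in incomplete blocks.
  static boolean isDirty(int numBlocks) {
    double incompleteFraction = 1.0 / numBlocks; // one incomplete block out of numBlocks
    return incompleteFraction > 0.01;
  }

  public static void main(String[] args) {
    // 100 blocks or more: the single incomplete block is at most 1% -> clean
    System.out.println("100 blocks dirty? " + isDirty(100)); // prints false
    // A small 10-block segment: the incomplete block is 10% of the data -> dirty
    System.out.println("10 blocks dirty? " + isDirty(10)); // prints true
  }
}
```

This makes the issue's point concrete: growing the top-level block size shrinks numBlocks for a given segment, pushing small flushed segments above the dirtiness threshold.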
[jira] [Created] (LUCENE-9917) Reduce block size for BEST_COMPRESSION
Adrien Grand created LUCENE-9917:

Summary: Reduce block size for BEST_COMPRESSION
Key: LUCENE-9917
URL: https://issues.apache.org/jira/browse/LUCENE-9917
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand

As benchmarks suggested major savings and minor slowdowns with larger block sizes, I had increased them on LUCENE-9486. However it looks like this slowdown is still problematic for some users, so I plan to go back to a smaller block size, something like 10*16kB, to get closer to the amount of data we had to decompress per document when we had 16kB blocks without shared dictionaries.
[jira] [Resolved] (LUCENE-9913) TestCompressingTermVectorsFormat.testMergeStability can fail assertion
[ https://issues.apache.org/jira/browse/LUCENE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-9913.
Resolution: Fixed

Argh, it is a bug in bulk merges due to numChunks not being propagated on the optimized merge code path. I'm glad this test caught it. I just pushed a fix. I'm not setting a fixVersion since this bug wasn't released.

> TestCompressingTermVectorsFormat.testMergeStability can fail assertion
> --
>
> Key: LUCENE-9913
> URL: https://issues.apache.org/jira/browse/LUCENE-9913
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Julie Tibshirani
> Priority: Major
>
> This reproduces for me on {{main}}:
> {code:java}
> ./gradlew test --tests TestCompressingTermVectorsFormat.testMergeStability \
>   -Dtests.seed=502C0E17C8769082 -Dtests.nightly=true \
>   -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=gd-GB \
>   -Dtests.timezone=Africa/Accra -Dtests.asserts=true \
>   -Dtests.file.encoding=UTF-8
> {code}
> Failure excerpt:
> {code:java}
> java.lang.AssertionError: expected:<{tvd=33526, fnm=698, nvm=283, tvm=164, tmd=826, fdm=158, pos=10508, fdt=1121, tvx=339, doc=13302, tim=22354, tip=101, fdx=202, nvd=18983}> but was:<{tvd=33526, fnm=698, nvm=283, tvm=163, tmd=826, fdm=157, pos=10508, fdt=1121, tvx=339, doc=13302, tim=22354, tip=101, fdx=202, nvd=18983}>
>   at __randomizedtesting.SeedInfo.seed([502C0E17C8769082:24604838C59C9234]:0)
>   at org.junit.Assert.fail(Assert.java:89)
> {code}
[GitHub] [lucene] dweiss commented on a change in pull request #73: LUCENE-9916: add a simple regeneration help doc
dweiss commented on a change in pull request #73: URL: https://github.com/apache/lucene/pull/73#discussion_r609801705

## File path: help/regeneration.txt
+gradlew regenerate tidy

Review comment: I hate those markup formats and live in txt world... Also, these files are sourced (and printed) as part of helpXXX tasks which you can invoke from gradlew. Don't know if this matters (I'm sure there is a plugin somewhere that renders them into ascii console opcodes...).
[GitHub] [lucene] dweiss commented on a change in pull request #73: LUCENE-9916: add a simple regeneration help doc
dweiss commented on a change in pull request #73: URL: https://github.com/apache/lucene/pull/73#discussion_r609807318

## File path: help/regeneration.txt
+Force-regenerate code, except for one tokenizer which is extremely slow:

Review comment: Most regeneration tasks are incremental at the moment - they detect whether they need to run or not. There should be a big red "last resort" option in this help file, because in 99% of cases this should do the job: gradlew regenerate. That's it. It skips over tasks that have the same inputs/outputs, and regenerates and tidies up everything else. I've tested it on Linux and Windows and it really does work. The trouble you fell into today was caused by the fact that you used the low-level regeneration task; regenerate has all sorts of tweaks to make those tasks incremental, clean up formatting, etc.
[GitHub] [lucene] dweiss commented on a change in pull request #73: LUCENE-9916: add a simple regeneration help doc
dweiss commented on a change in pull request #73: URL: https://github.com/apache/lucene/pull/73#discussion_r609808709

## File path: help/regeneration.txt
+Force-regenerate code, except for one tokenizer which is extremely slow:

Review comment: An example of when --rerun-tasks is useful is when you tweak the code of the generation task itself (not the inputs/outputs but the task itself).
[GitHub] [lucene] uschindler commented on a change in pull request #73: LUCENE-9916: add a simple regeneration help doc
uschindler commented on a change in pull request #73: URL: https://github.com/apache/lucene/pull/73#discussion_r609817750

## File path: help/regeneration.txt
+Force-regenerate code, except for one tokenizer which is extremely slow:

Review comment: I figured out that Gradle also re-executes tasks if you change their source file (at least in the past this worked). I tested this when developing the renderJavadocs classes.
[GitHub] [lucene] uschindler commented on a change in pull request #73: LUCENE-9916: add a simple regeneration help doc
uschindler commented on a change in pull request #73: URL: https://github.com/apache/lucene/pull/73#discussion_r609819216

## File path: help/regeneration.txt
+gradlew regenerate tidy

Review comment: Markdown is a good compromise. I just think we should use as minimal a subset as possible, but e.g. put code parts inside `code` blocks or indent them, so they are blockquoted/source-formatted automatically. I don't want full-featured Markdown :-)
[GitHub] [lucene] dweiss commented on pull request #73: LUCENE-9916: add a simple regeneration help doc
dweiss commented on pull request #73: URL: https://github.com/apache/lucene/pull/73#issuecomment-815914300

Leave this patch open, Robert. There is one more non-trivial bit (checksum saving) that I need to explain there - I'll do it once I get back home.
[GitHub] [lucene-solr] janhoy closed pull request #2082: SOLR-15002: Upgrade HttpClient to 4.5.13
janhoy closed pull request #2082: URL: https://github.com/apache/lucene-solr/pull/2082
[GitHub] [lucene] gsmiller commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
gsmiller commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609945080

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java
@@ -121,4 +167,146 @@
     void skip(DataInput in) throws IOException {
       in.skipBytes(forUtil.numBytes(bitsPerValue) + (numExceptions << 1));
     }
   }
+
+  /**
+   * Fill {@code longs} with the final values for the case of all deltas being 1. Note this assumes
+   * there are no exceptions to apply.
+   */
+  private static void prefixSumOfOnes(long[] longs, long base) {
+    System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE);
+    // This loop gets auto-vectorized
+    for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
+      longs[i] += base;
+    }
+  }
+
+  /**
+   * Fill {@code longs} with the final values for the case of all deltas being {@code val}. Note
+   * this assumes there are no exceptions to apply.
+   */
+  private static void prefixSumOf(long[] longs, long base, long val) {
+    for (int i = 0; i < ForUtil.BLOCK_SIZE; i++) {
+      longs[i] = (i + 1) * val + base;

Review comment: @rmuir interesting. Yeah, I'd like to explore this further, but I wonder if it makes sense to do so in a follow-up Jira? For starters, intuitively, this case seems pretty uncommon. It will only kick in when all deltas are the same value, but aren't `1`. "Dense" blocks seem like the common case for using 0 bpv, where all deltas would be `1`, and that case is definitely optimized already (`prefixSumOfOnes`). In fact, `ForDeltaUtil` doesn't even use 0 bpv for any case other than `1` (it doesn't actually store the "same value", but rather infers that it's `1` if bpv == 0). So this is already more efficient than what `ForDeltaUtil` is doing for these cases, in the sense that `ForDeltaUtil` would actually fully encode the deltas and go through the whole dance of decoding them, etc. @rmuir / @jpountz Any concern with me creating a follow-on issue to further investigate and move forward with this PR in its current state?
[jira] [Commented] (LUCENE-9917) Reduce block size for BEST_COMPRESSION
[ https://issues.apache.org/jira/browse/LUCENE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317379#comment-17317379 ]

Robert Muir commented on LUCENE-9917:

do you mean BEST_SPEED here?
[GitHub] [lucene] rmuir commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
rmuir commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609952804

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java
+      longs[i] = (i + 1) * val + base;

Review comment: yes definitely followup or whatever. since you asked for suggestions i was just brainstorming... not necessary to be done here.
[jira] [Created] (LUCENE-9918) Can PForUtil be further auto-vectorized?
Greg Miller created LUCENE-9918:

Summary: Can PForUtil be further auto-vectorized?
Key: LUCENE-9918
URL: https://issues.apache.org/jira/browse/LUCENE-9918
Project: Lucene - Core
Issue Type: Task
Components: core/codecs
Affects Versions: main (9.0)
Reporter: Greg Miller

While working on LUCENE-9850, we discovered the loop in PForUtil::prefixSumOf is not getting auto-vectorized by the HotSpot compiler. We tried a few different tweaks to see if we could change this, but came up empty. There are some additional suggestions in the related [PR|https://github.com/apache/lucene/pull/69#discussion_r608412309] that could still be experimented with, and it may be worth doing so to see if further improvements could be squeezed out.
[GitHub] [lucene] gsmiller commented on a change in pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
gsmiller commented on a change in pull request #69: URL: https://github.com/apache/lucene/pull/69#discussion_r609959666

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/PForUtil.java
+      longs[i] = (i + 1) * val + base;

Review comment: Thanks @rmuir, I went ahead and created [LUCENE-9918](https://issues.apache.org/jira/browse/LUCENE-9918). I appreciate the additional suggestions! This stuff is super interesting and a bit out of my wheelhouse, so I love having more ideas to experiment with :)
[GitHub] [lucene] rmuir commented on pull request #73: LUCENE-9916: add a simple regeneration help doc
rmuir commented on pull request #73: URL: https://github.com/apache/lucene/pull/73#issuecomment-816036750

sure, please anyone push improvements, i just wanted to get it started.
[GitHub] [lucene] gsmiller commented on pull request #69: LUCENE-9850: Use PFOR encoding for doc IDs (instead of FOR)
gsmiller commented on pull request #69: URL: https://github.com/apache/lucene/pull/69#issuecomment-816040529

@jpountz
> +1 Let's go with whichever of `arr[i] = IDENTITY_PLUS_ONE[i] * val + base` or `arr[i] = (i+1) * val + base` runs fastest in your micro benchmark. We can still improve things later if we find a way to trick the JVM into auto-vectorizing this loop.

Perfect, thanks! I'm changing this back to `(i + 1) * val + base` because it (somewhat surprisingly maybe, but I suppose this simple addition could be more efficient than an array reference) does consistently perform slightly better in microbenchmarks (`arrayRef == 0` is this implementation while `arrayRef == 1` references `IDENTITY_PLUS_ONE[i]`):
```
Benchmark                                        (arrayRef)  (bitsPerValue)  (exceptionCount)  (sameVal)   Mode  Cnt  Score   Error  Units
PackedIntsDeltaDecodeBenchmark.pForDeltaDecoder           0               0                 0          2  thrpt   20  7.915 ± 0.008  ops/us
PackedIntsDeltaDecodeBenchmark.pForDeltaDecoder           1               0                 0          2  thrpt   20  7.695 ± 0.010  ops/us
```
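The two formulations being benchmarked produce identical prefix sums; only the loop shape (and how well HotSpot vectorizes it) differs. A minimal standalone sketch, assuming BLOCK_SIZE = 128 and re-implementing both variants outside Lucene (these are not PForUtil's actual method names):

```java
import java.util.Arrays;

// Standalone sketch of the two prefix-sum formulations discussed in the thread,
// for the case where every delta in a block equals the same value `val`.
public class PrefixSumSketch {
  static final int BLOCK_SIZE = 128; // mirrors ForUtil.BLOCK_SIZE

  // Variant kept in the PR: compute (i + 1) * val + base directly.
  static long[] direct(long base, long val) {
    long[] longs = new long[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      longs[i] = (i + 1) * val + base;
    }
    return longs;
  }

  // Alternative: copy an identity table, then multiply and add in two loops
  // (the shape whose multiplication loop fails to auto-vectorize per LUCENE-9918).
  static long[] viaIdentity(long base, long val) {
    long[] identityPlusOne = new long[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) {
      identityPlusOne[i] = i + 1;
    }
    long[] longs = new long[BLOCK_SIZE];
    System.arraycopy(identityPlusOne, 0, longs, 0, BLOCK_SIZE);
    for (int i = 0; i < BLOCK_SIZE; i++) {
      longs[i] *= val;
    }
    for (int i = 0; i < BLOCK_SIZE; i++) {
      longs[i] += base;
    }
    return longs;
  }

  public static void main(String[] args) {
    // Both variants yield the same block of prefix sums.
    System.out.println(Arrays.equals(direct(42, 3), viaIdentity(42, 3))); // prints true
  }
}
```

Since the outputs are identical, picking between them is purely a question of which loop the JIT compiles better, which is what the microbenchmark above measures.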
[GitHub] [lucene] dweiss commented on pull request #73: LUCENE-9916: add a simple regeneration help doc
dweiss commented on pull request #73: URL: https://github.com/apache/lucene/pull/73#issuecomment-816110667

I pushed a commit - sorry for being verbose. Hope this helps you (and others) understand how I think it should work. Not every task is incremental yet (and I didn't clarify that).
[GitHub] [lucene] rmuir commented on pull request #73: LUCENE-9916: add a simple regeneration help doc
rmuir commented on pull request #73: URL: https://github.com/apache/lucene/pull/73#issuecomment-816171377

super-helpful, thank you @dweiss !
[jira] [Commented] (LUCENE-9918) Can PForUtil be further auto-vectorized?
[ https://issues.apache.org/jira/browse/LUCENE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317473#comment-17317473 ]

Greg Miller commented on LUCENE-9918:

I've set up a microbenchmark project over [here|https://github.com/gsmiller/lucene-pfor-benchmark] to help explore this more easily if anyone is interested. I'll probably mess around with this a bit, but don't let that stop you from working on it if interested :)
[jira] [Commented] (LUCENE-9918) Can PForUtil be further auto-vectorized?
[ https://issues.apache.org/jira/browse/LUCENE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317493#comment-17317493 ]

Greg Miller commented on LUCENE-9918:

I think my (fairly naive) question here is mainly why the "multiplication loop" in the below code isn't able to get vectorized. Both the array copy and the "addition loop" are getting vectorized, but not the "multiplication loop." (I've put the decompiled assembly that I believe is relevant in the README in the above-referenced benchmark project.)
{code:java}
protected void prefixSumOf(long[] longs, long base, long val) {
  System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE);
  for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
    longs[i] *= val;
  }
  for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
    longs[i] += base;
  }
}
{code}
[jira] [Commented] (LUCENE-9918) Can PForUtil be further auto-vectorized?
[ https://issues.apache.org/jira/browse/LUCENE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317494#comment-17317494 ]

Greg Miller commented on LUCENE-9918:

I'll also mention that [~rcmuir] has some thoughts over [here|https://github.com/apache/lucene/pull/69#discussion_r609750704] on some other ideas to try if anyone is interested in poking around more.
[GitHub] [lucene] jtibshirani opened a new pull request #74: LUCENE-9705: Correct the format names in Lucene90StoredFieldsFormat
jtibshirani opened a new pull request #74: URL: https://github.com/apache/lucene/pull/74

We accidentally kept the old names when creating the new format.