[jira] [Resolved] (LUCENE-10165) Implement Lucene90DocValuesProducer#getMergeInstance
[ https://issues.apache.org/jira/browse/LUCENE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10165. --- Fix Version/s: main (9.0) Resolution: Fixed > Implement Lucene90DocValuesProducer#getMergeInstance > > > Key: LUCENE-10165 > URL: https://issues.apache.org/jira/browse/LUCENE-10165 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: main (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > The Lucene90 doc values producer optimizes for random access so that > selective queries only have to decode the values that they need for sorting > or faceting. > However, in the case of merging, the merging process systematically consumes > all doc IDs / values sequentially, so we could optimize for this access > pattern via the merge instance? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
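The access-pattern tradeoff described in the issue can be illustrated with a toy model. This is a hypothetical sketch, not Lucene90DocValuesProducer or the code from the patch. An in-memory array stands in for the encoded on-disk values, the block size of 128 is an assumed value, and the counters only count read calls. The random-access get(doc) reads exactly one value, which suits selective queries. The object returned by the sketch's own getMergeInstance() assumes doc IDs arrive in increasing order and reads ahead one block at a time.

```java
import java.util.Arrays;

/** Hypothetical model of random-access vs merge-time reads; not Lucene's implementation. */
final class DocValuesAccessSketch {
    static final int BLOCK_SIZE = 128; // assumed read-ahead size, for illustration only

    private final long[] storage; // stand-in for the encoded values on disk
    int singleValueReads = 0;     // random-access reads of exactly one value
    int bulkReads = 0;            // sequential reads of up to BLOCK_SIZE values

    DocValuesAccessSketch(long[] values) {
        storage = Arrays.copyOf(values, values.length);
    }

    /** Random access: fetch only the value a selective query asks for. */
    long get(int doc) {
        singleValueReads++;
        return storage[doc];
    }

    /** Merge path: docs are consumed in increasing order, so read ahead in blocks. */
    MergeReader getMergeInstance() {
        return new MergeReader();
    }

    final class MergeReader {
        private final long[] buffer = new long[BLOCK_SIZE];
        private int bufferStart = -1; // first doc ID held in the buffer, -1 if empty
        private int bufferLength = 0;

        long get(int doc) {
            if (bufferStart < 0 || doc < bufferStart || doc >= bufferStart + bufferLength) {
                // Refill starting at doc; this pays off only for (mostly) sequential access.
                bufferStart = doc;
                bufferLength = Math.min(BLOCK_SIZE, storage.length - doc);
                System.arraycopy(storage, doc, buffer, 0, bufferLength);
                bulkReads++;
            }
            return buffer[doc - bufferStart];
        }
    }
}
```

Reading 1,000 docs in order costs 1,000 single-value reads through get, but only 8 bulk reads (ceil(1000 / 128)) through the merge reader. The merge reader still returns correct values for out-of-order doc IDs; it just loses the benefit, because each jump triggers a refill.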
[GitHub] [lucene] jpountz commented on pull request #374: LUCENE-10165: Implement Lucene90DocValuesProducer#getMergeInstance.
jpountz commented on pull request #374: URL: https://github.com/apache/lucene/pull/374#issuecomment-948349796 I didn't observe hotspot confusion when benchmarking locally, but I'll be watching the nightlies to check whether they show something. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432288#comment-17432288 ] Praveen Nishchal commented on LUCENE-8739: -- Hi Mike, My codec passed all test cases with the test option -Dtests.codec=MyCodec. Now I am working on the luceneutil benchmark. Thanks for your reply in the dev community thread! > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Time Spent: 1h > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]
[jira] [Commented] (LUCENE-10165) Implement Lucene90DocValuesProducer#getMergeInstance
[ https://issues.apache.org/jira/browse/LUCENE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432315#comment-17432315 ] ASF subversion and git services commented on LUCENE-10165: -- Commit 8b6c90eccd97e6ea065558c3f3da96daf74c04bf in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8b6c90e ] LUCENE-10165: Fix test failures. > Implement Lucene90DocValuesProducer#getMergeInstance > > > Key: LUCENE-10165 > URL: https://issues.apache.org/jira/browse/LUCENE-10165 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: main (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > The Lucene90 doc values producer optimizes for random access so that > selective queries only have to decode the values that they need for sorting > or faceting. > However, in the case of merging, the merging process systematically consumes > all doc IDs / values sequentially, so we could optimize for this access > pattern via the merge instance?
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432444#comment-17432444 ] Michael McCandless commented on LUCENE-8739: {quote}My codec passed all test cases with test option -Dtests.codec=MyCodec. {quote} Aha, that is great news! Lucene's tests tend to stress out new Codecs. If you want to evil-up the tests, pass {{-Dtests.nightly=true}}. The tests will run longer but try harder to find problems. > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Time Spent: 1h > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]
[GitHub] [lucene] mikemccand commented on pull request #375: LUCENE-10093: first cut at fixing conflicting test assert and improving TMP javadocs
mikemccand commented on pull request #375: URL: https://github.com/apache/lucene/pull/375#issuecomment-948612756 > > gradlew clean only cleans active modules. packaging has been renamed and no longer exists. Wipe all old cruft with: > > git clean -xfd lucene > > Aha! Thermonuclear clean, I like it. I'll try that. Thanks @dweiss. That worked. Thanks @dweiss. I'll try to remember to try this next time. I'll push shortly.
[GitHub] [lucene] mikemccand merged pull request #375: LUCENE-10093: first cut at fixing conflicting test assert and improving TMP javadocs
mikemccand merged pull request #375: URL: https://github.com/apache/lucene/pull/375
[jira] [Commented] (LUCENE-10093) TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure
[ https://issues.apache.org/jira/browse/LUCENE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432460#comment-17432460 ] ASF subversion and git services commented on LUCENE-10093: -- Commit e3151d6c7dea187ed99d349f1435b38b31aa6dd9 in lucene's branch refs/heads/main from Michael McCandless [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e3151d6 ] LUCENE-10093: fix conflicting test assert to match how TieredMergePolicy (TMP) works; improv TMP javadocs (#375) > TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure > - > > Key: LUCENE-10093 > URL: https://issues.apache.org/jira/browse/LUCENE-10093 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This test fails periodically in our CI builds, and the failing seed repros > for me: > {noformat} > org.apache.lucene.index.TestTieredMergePolicy > test suite's output saved to > /l/trunk/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestTieredMergePolicy.txt, > copied below: > > java.lang.AssertionError > > at > __randomizedtesting.SeedInfo.seed([7B591E657503510C:C958DC291BD5CF0A]:0) > > at org.junit.Assert.fail(Assert.java:87) > > at org.junit.Assert.assertTrue(Assert.java:42) > > at org.junit.Assert.assertTrue(Assert.java:53) > > at > org.apache.lucene.index.TestTieredMergePolicy.assertMaxSize(TestTieredMergePolicy.java:497) > > at > org.apache.lucene.index.TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges(TestTieredMergePolicy.java:454) > > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78) > > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at java.base/java.lang.reflect.Method.invoke(Method.java:567) > > at > 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992) > > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44) > > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) > > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45) > > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) > > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) > > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370) > > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819) > > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887) > > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898) > > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) > > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > > at > 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) > > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) > > at > com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) >
[jira] [Resolved] (LUCENE-10093) TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure
[ https://issues.apache.org/jira/browse/LUCENE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-10093. - Fix Version/s: main (9.0) Resolution: Fixed > TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure > - > > Key: LUCENE-10093 > URL: https://issues.apache.org/jira/browse/LUCENE-10093 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Fix For: main (9.0) > > Time Spent: 1h 10m > Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10093) TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure
[ https://issues.apache.org/jira/browse/LUCENE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432495#comment-17432495 ] Michael McCandless commented on LUCENE-10093: - The above ^^ fix should resolve this. > TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure > - > > Key: LUCENE-10093 > URL: https://issues.apache.org/jira/browse/LUCENE-10093 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10165) Implement Lucene90DocValuesProducer#getMergeInstance
[ https://issues.apache.org/jira/browse/LUCENE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432502#comment-17432502 ] Adrien Grand commented on LUCENE-10165: --- Merging times for doc values went noticeably down with this change: http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#dv_merge_times. > Implement Lucene90DocValuesProducer#getMergeInstance > > > Key: LUCENE-10165 > URL: https://issues.apache.org/jira/browse/LUCENE-10165 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: main (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > The Lucene90 doc values producer optimizes for random access so that > selective queries only have to decode the values that they need for sorting > or faceting. > However, in the case of merging, the merging process systematically consumes > all doc IDs / values sequentially, so we could optimize for this access > pattern via the merge instance?
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432506#comment-17432506 ] Adrien Grand commented on LUCENE-8739: -- You might be interested in the new simple benchmark for stored fields that we added to luceneutil to compare your stored fields format against Lucene's built-in formats: https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/StoredFieldsBenchmark.java. > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Time Spent: 1h > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]
[GitHub] [lucene-solr] mikemccand merged pull request #2573: LUCENE-10008: Respect ignoreCase flag in CommonGramsFilterFactory
mikemccand merged pull request #2573: URL: https://github.com/apache/lucene-solr/pull/2573
[jira] [Resolved] (LUCENE-10008) CommonGramsFilterFactory doesn't respect ignoreCase=true when default stopwords are used
[ https://issues.apache.org/jira/browse/LUCENE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-10008. - Fix Version/s: 8.11 main (9.0) Resolution: Fixed Thanks [~vigyas]! > CommonGramsFilterFactory doesn't respect ignoreCase=true when default > stopwords are used > > > Key: LUCENE-10008 > URL: https://issues.apache.org/jira/browse/LUCENE-10008 > Project: Lucene - Core > Issue Type: Bug >Reporter: Chris M. Hostetter >Priority: Major > Fix For: main (9.0), 8.11 > > Time Spent: 40m > Remaining Estimate: 0h > > CommonGramsFilterFactory's use of the "words" and "ignoreCase" config options > is inconsistent with how StopFilterFactory uses them - leading to > "ignoreCase=true" not being respected unless "words" is specified... > StopFilterFactory... > {code:java} > public void inform(ResourceLoader loader) throws IOException { > if (stopWordFiles != null) { > ... > } else { > ... > stopWords = new CharArraySet(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET, > ignoreCase); > } > } > {code} > CommonGramsFilterFactory... > {code:java} > @Override > public void inform(ResourceLoader loader) throws IOException { > if (commonWordFiles != null) { > ... > } else { > commonWords = EnglishAnalyzer.ENGLISH_STOP_WORDS_SET; > } > } > {code}
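The inconsistency can be reproduced without Lucene. In this hypothetical JDK-only model, a TreeSet ordered by String.CASE_INSENSITIVE_ORDER stands in for new CharArraySet(..., ignoreCase), and a four-word list stands in for EnglishAnalyzer.ENGLISH_STOP_WORDS_SET. The class and method names are illustrative, not Lucene's. The buggy variant reuses the default words as-is, as the quoted CommonGramsFilterFactory code does; the fixed variant copies them into a set that honors ignoreCase, as the quoted StopFilterFactory code does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical JDK-only model of the LUCENE-10008 bug; not Lucene's factory code. */
final class CommonWordsSketch {
    // Assumed stand-in for EnglishAnalyzer.ENGLISH_STOP_WORDS_SET (a small subset).
    static final List<String> DEFAULT_WORDS = Arrays.asList("a", "an", "and", "the");

    /** Mirrors the reported bug: the defaults are used as-is, so ignoreCase has no effect. */
    static Set<String> buggyDefaults(boolean ignoreCase) {
        return new HashSet<>(DEFAULT_WORDS);
    }

    /** Mirrors StopFilterFactory: copy the defaults into a set that honors ignoreCase. */
    static Set<String> fixedDefaults(boolean ignoreCase) {
        Set<String> words;
        if (ignoreCase) {
            words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        } else {
            words = new HashSet<>();
        }
        words.addAll(DEFAULT_WORDS);
        return words;
    }
}
```

With ignoreCase=true, only the fixed variant matches a capitalized token such as "The"; with ignoreCase=false, both variants match only the lowercase form.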
[jira] [Commented] (LUCENE-10008) CommonGramsFilterFactory doesn't respect ignoreCase=true when default stopwords are used
[ https://issues.apache.org/jira/browse/LUCENE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432518#comment-17432518 ] ASF subversion and git services commented on LUCENE-10008: -- Commit 641ac0b36a9257db3a6d2f9d12f422cfe5fddbc3 in lucene-solr's branch refs/heads/branch_8x from Vigya Sharma [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=641ac0b ] LUCENE-10008: Respect ignoreCase flag in CommonGramsFilterFactory (#2573) > CommonGramsFilterFactory doesn't respect ignoreCase=true when default > stopwords are used > > > Key: LUCENE-10008 > URL: https://issues.apache.org/jira/browse/LUCENE-10008 > Project: Lucene - Core > Issue Type: Bug >Reporter: Chris M. Hostetter >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > CommonGramsFilterFactory's use of the "words" and "ignoreCase" config options > is inconsistent with how StopFilterFactory uses them - leading to > "ignoreCase=true" not being respected unless "words" is specified... > StopFilterFactory... > {code:java} > public void inform(ResourceLoader loader) throws IOException { > if (stopWordFiles != null) { > ... > } else { > ... > stopWords = new CharArraySet(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET, > ignoreCase); > } > } > {code} > CommonGramsFilterFactory... > {code:java} > @Override > public void inform(ResourceLoader loader) throws IOException { > if (commonWordFiles != null) { > ... > } else { > commonWords = EnglishAnalyzer.ENGLISH_STOP_WORDS_SET; > } > } > {code}
[GitHub] [lucene-solr] janhoy commented on pull request #1435: SOLR-14410: Switch from SysV init script to systemd service file
janhoy commented on pull request #1435: URL: https://github.com/apache/lucene-solr/pull/1435#issuecomment-948682201 @andreasbolstad Solr 8.11 is not far away. If you have the bandwidth to give this PR a spin on the main branch and verify that it works OK, I'll make sure it is merged in time for the 8.11 release.
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432589#comment-17432589 ] Praveen Nishchal commented on LUCENE-8739: -- Hi Mike, -Dtests.nightly=true ran successfully; it took more than an hour to complete! > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Time Spent: 1h > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432590#comment-17432590 ] Praveen Nishchal commented on LUCENE-8739: -- Hi Adrien, Could you please explain how to compare my stored fields format against Lucene's built-in formats? Thanks! > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Time Spent: 1h > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]
[jira] [Created] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning
Bruno Roustant created LUCENE-10196: --- Summary: Improve IntroSorter with 3-ways partitioning Key: LUCENE-10196 URL: https://issues.apache.org/jira/browse/LUCENE-10196 Project: Lucene - Core Issue Type: Improvement Reporter: Bruno Roustant I added a SorterBenchmark to evaluate the performance of the various Sorter implementations depending on the strategies defined in BaseSortTestCase (random, random-low-cardinality, ascending, descending, etc). By changing the implementation of the IntroSorter to use 3-way partitioning, we can gain a significant performance improvement when sorting low-cardinality lists, and with additional changes we can also improve the performance for all the strategies. Proposed changes: - Sort small ranges with insertion sort (instead of binary sort). - Select the quick sort pivot with medians. - Partition with the fast Bentley-McIlroy 3-way partitioning algorithm. - Replace the tail recursion with a loop.
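The proposed changes combine several classic quicksort refinements. The sketch below shows them together on a plain int[]. It is a hypothetical illustration, not the code in the LUCENE-10196 patch, and it does not use Lucene's Sorter API (compare(i, j) / swap(i, j)). The partitioning loop follows the textbook Bentley-McIlroy scheme; the insertion-sort cutoff of 16 is an assumed value; pivot selection uses a plain median-of-3; and the depth-limited heapsort fallback that makes IntroSorter an introsort is omitted.

```java
/**
 * Hypothetical sketch (not Lucene's IntroSorter): quicksort with an insertion-sort
 * cutoff, median-of-3 pivot, Bentley-McIlroy 3-way partitioning, and a loop in
 * place of tail recursion. No heapsort fallback, so this is not a full introsort.
 */
final class ThreeWayQuickSortSketch {
    // Assumed cutoff; the issue does not state the threshold used by the patch.
    static final int INSERTION_SORT_THRESHOLD = 16;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        while (hi - lo + 1 > INSERTION_SORT_THRESHOLD) {
            // Pivot: median of first, middle and last elements, moved to a[lo].
            swap(a, lo, median3(a, lo, lo + (hi - lo) / 2, hi));
            int v = a[lo];
            // Invariant: a[lo..p] == v, a[p+1..i-1] < v, a[j+1..q-1] > v, a[q..hi] == v.
            int i = lo, j = hi + 1, p = lo, q = hi + 1;
            while (true) {
                while (a[++i] < v) if (i == hi) break;
                while (v < a[--j]) if (j == lo) break;
                if (i == j && a[i] == v) swap(a, ++p, i);
                if (i >= j) break;
                swap(a, i, j);
                if (a[i] == v) swap(a, ++p, i);
                if (a[j] == v) swap(a, --q, j);
            }
            // Move the keys equal to v from both ends into the middle.
            i = j + 1;
            for (int k = lo; k <= p; k++) swap(a, k, j--);
            for (int k = hi; k >= q; k--) swap(a, k, i++);
            // Keys equal to v now sit in a[j+1..i-1] and are excluded from further work.
            // Recurse into the smaller side and loop on the larger one (bounded stack depth).
            if (j - lo < hi - i) {
                sort(a, lo, j);
                lo = i;
            } else {
                sort(a, i, hi);
                hi = j;
            }
        }
        insertionSort(a, lo, hi);
    }

    /** Returns the index of the median of a[i], a[j], a[k]. */
    private static int median3(int[] a, int i, int j, int k) {
        if (a[i] < a[j]) {
            if (a[j] < a[k]) return j;
            return a[i] < a[k] ? k : i;
        }
        if (a[k] < a[j]) return j;
        return a[k] < a[i] ? k : i;
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > x) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = x;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Keys equal to the pivot are gathered in the middle and skipped by both recursive calls, which is why low-cardinality inputs benefit most: a range holding a single distinct value is finished after one partitioning pass.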
[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning
[ https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432670#comment-17432670 ] Bruno Roustant commented on LUCENE-10196: -

Benchmark run comparing the sorter implementations on different shapes of data.

Each value is the time to complete one run. Values in the same column can be compared because every sorter receives the same input data; each column uses different random data. IntroSorter2 is the new, modified version of IntroSorter. The benchmark runs with 20K comparable entries.

Comparing IntroSorter with IntroSorter2, we mainly observe a speedup of about 5.8x for RANDOM_LOW_CARDINALITY (mean of 509 vs. 88 over the ten runs), and an improvement for every other data shape as well.

{noformat}
RANDOM
IntroSorter    445  445  459  458  453  458  460  465  452  451
IntroSorter2   394  403  401  400  401  398  400  404  396  399
TimSorter     1196 1203 1197 1206 1193 1195 1193 1204 1230 1207
MergeSorter   1462 1470 1482 1466 1463 1475 1478 1475 1466 1471

RANDOM_LOW_CARDINALITY
IntroSorter    505  513  504  490  527  499  510  512  509  525
IntroSorter2    89   84   88   88   86   90   88   89   92   88
TimSorter      511  513  508  508  513  512  521  511  524  516
MergeSorter    725  725  725  762  737  723  727  724  736  733

RANDOM_MEDIUM_CARDINALITY
IntroSorter    463  451  452  455  448  452  451  459  458  455
IntroSorter2   370  381  378  373  375  376  376  372  370  370
TimSorter     1192 1212 1197 1196 1201 1202 1196 1199 1196 1204
MergeSorter   1493 1465 1470 1480 1460 1470 1483 1464 1506 1500

ASCENDING
IntroSorter    211  205  215  213  207  206  208  214  212  211
IntroSorter2   191  188  190  193  194  191  188  187  185  188
TimSorter       17   18   18   18   19   19   18   17   18   19
MergeSorter     73   71   72   75   72   73   73   77   72   71

DESCENDING
IntroSorter    225  253  229  220  225  231  222  217  220  223
IntroSorter2   220  213  214  220  205  211  208  210  208  212
TimSorter      545  576  562  553  543  551  552  552  548  546
MergeSorter    537  537  548  538  537  536  533  530  533  545

STRICTLY_DESCENDING
IntroSorter    215  214  221  224  218  227  213  212  212  211
IntroSorter2   202  203  202  205  202  204  206  204  202  204
TimSorter       22   21   21   22   22   21   21   22   22   23
MergeSorter    534  531  533  527  531  529  526  527  528  527

ASCENDING_SEQUENCES
IntroSorter    370  366  361  376  367  369  358  364  379  376
IntroSorter2   234  235  231  236  234  245  242  239  239  236
TimSorter      686  679  745  673  694  685  673  719  682  685
MergeSorter    894  911  932  907  923  907  918  917  920  916

MOSTLY_ASCENDING
IntroSorter    284  282  282  283  285  282  278  284  283  287
IntroSorter2   254  252  249  250  255  255  249  250  252  251
TimSorter      233  233  230  235  232  234  234  233  228  238
MergeSorter    399  385  390  398  398  392  380  377  377  387
{noformat}

> Improve IntroSorter with 3-ways partitioning
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Priority: Major
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter
> implementations depending on the strategies defined in BaseSortTestCase
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways
> partitioning, we can gain a significant performance improvement when sorting
> low-cardinality lists, and we additional changes we can also improve the
> performance for all the strategies.
> Proposed changes:
> - Sort small ranges with insertion sort (instead of binary sort).
> - Select the quick sort pivot with medians.
> - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
> - Replace the tail recursion by a loop.
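The Bentley-McIlroy 3-way partitioning listed in the proposed changes moves every key equal to the pivot into the middle of the range, so those keys take part in neither recursive call; that is the likely source of the RANDOM_LOW_CARDINALITY gain. Below is a minimal sketch on a plain `int[]`, assuming the textbook formulation of the algorithm rather than the code in the PR. Lucene's `IntroSorter` works through `compare`/`swap` callbacks on indices, and this sketch takes the first element as the pivot (instead of a median) purely to stay short; the class name is hypothetical.

```java
/**
 * Sketch (not Lucene's IntroSorter): quicksort on int[] using Bentley-McIlroy
 * 3-way partitioning. Keys equal to the pivot end up in the middle and are
 * excluded from both recursive calls.
 */
final class ThreeWayQuickSort {

  static void sort(int[] a) {
    sort(a, 0, a.length - 1);
  }

  // Sorts a[lo..hi], both bounds inclusive.
  private static void sort(int[] a, int lo, int hi) {
    if (hi <= lo) return;
    int v = a[lo]; // pivot: first element, for simplicity only
    int i = lo, j = hi + 1; // scan pointers moving toward each other
    int p = lo, q = hi + 1; // a[lo..p] and a[q..hi] hold keys equal to v
    while (true) {
      while (a[++i] < v) if (i == hi) break;
      while (v < a[--j]) if (j == lo) break;
      if (i == j && a[i] == v) swap(a, ++p, i); // pointers met on an equal key
      if (i >= j) break;
      swap(a, i, j);
      if (a[i] == v) swap(a, ++p, i); // park an equal key at the left end
      if (a[j] == v) swap(a, --q, j); // park an equal key at the right end
    }
    // Swap the parked equal keys from both ends into the middle.
    i = j + 1;
    for (int k = lo; k <= p; k++) swap(a, k, j--);
    for (int k = hi; k >= q; k--) swap(a, k, i++);
    // Now a[lo..j] < v, a[j+1..i-1] == v and a[i..hi] > v.
    sort(a, lo, j);
    sort(a, i, hi);
  }

  private static void swap(int[] a, int x, int y) {
    int t = a[x];
    a[x] = a[y];
    a[y] = t;
  }
}
```

With only a handful of distinct values, each partitioning pass removes a whole block of equal keys from further work, whereas a 2-way partition keeps recursing into ranges full of duplicates.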
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432677#comment-17432677 ] Adrien Grand commented on LUCENE-8739: -
You need to download https://download.geonames.org/export/dump/allCountries.zip, unzip it, and then use it to run the above benchmark, which is a simple standalone Java class with a main method. To run it with your own codec, you will need to modify the code slightly so that it uses your codec rather than Lucene's default codec.
[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning
[ https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432685#comment-17432685 ] Bruno Roustant commented on LUCENE-10196: -
https://github.com/apache/lucene/pull/404
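The first proposed change for LUCENE-10196 swaps binary insertion sort for plain insertion sort on small ranges. Binary insertion sort finds each insertion point with a binary search, which saves comparisons but still shifts the same elements; plain insertion sort compares while shifting and stops as soon as it meets an element that is not greater than the key, which is cheap on short or nearly sorted ranges. A minimal sketch on `int[]`, using the half-open `[from, to)` convention of Lucene's `Sorter` methods; the class name is hypothetical.

```java
/** Hypothetical helper: insertion sort of a[from..to), with to exclusive. */
final class SmallRangeSort {

  // For each element, shift larger predecessors one slot to the right, then
  // drop the element into the gap. O(n^2) in the worst case, but stable and
  // fast for the short ranges a quicksort hands over near the leaves.
  static void insertionSort(int[] a, int from, int to) {
    for (int i = from + 1; i < to; i++) {
      int key = a[i];
      int j = i - 1;
      while (j >= from && a[j] > key) {
        a[j + 1] = a[j];
        j--;
      }
      a[j + 1] = key;
    }
  }
}
```

A quicksort would call this once `to - from` drops below some small threshold; the threshold the PR uses is not stated in this thread.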
[GitHub] [lucene] dweiss commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.
dweiss commented on a change in pull request #404: URL: https://github.com/apache/lucene/pull/404#discussion_r734017645

## File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java
##
@@ -35,51 +38,102 @@ public final void sort(int from, int to) {
     quicksort(from, to, 2 * MathUtil.log(to - from, 2));
   }
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.

Review comment:
"with intro sort"? Is this accurate here?

## File path: lucene/core/src/java/org/apache/lucene/util/Sorter.java
##
@@ -216,6 +219,25 @@ void binarySort(int from, int to, int i) {
     }
   }
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with insertion sort. Runs in {@code O(n^2)}.
+   * It is typically used by more sophisticated implementations as a fall-back when the numbers of

Review comment:
when the number (not numbers)

## File path: lucene/core/src/test/org/apache/lucene/util/SorterBenchmark.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.util.Random;
+import org.apache.lucene.util.BaseSortTestCase.Entry;
+import org.apache.lucene.util.BaseSortTestCase.Strategy;
+
+/**
+ * Benchmark for {@link Sorter} implementations.
+ *
+ * Run the static {@link #main(String[])} method to start the benchmark.
+ */
+public class SorterBenchmark {
+
+  private static final int ARRAY_LENGTH = 2;
+  private static final int RUNS = 10;
+  private static final int LOOPS = 100;
+
+  private enum SorterFactory {
+    INTRO_SORTER(
+        "IntroSorter",
+        (arr, s) -> {
+          return new ArrayIntroSorter<>(arr, Entry::compareTo);
+        }),
+    TIM_SORTER(
+        "TimSorter",
+        (arr, s) -> {
+          return new ArrayTimSorter<>(arr, Entry::compareTo, arr.length / 64);
+        }),
+    MERGE_SORTER(
+        "MergeSorter",
+        (arr, s) -> {
+          return new ArrayInPlaceMergeSorter<>(arr, Entry::compareTo);
+        }),
+    ;
+    final String name;
+    final Builder builder;
+
+    SorterFactory(String name, Builder builder) {
+      this.name = name;
+      this.builder = builder;
+    }
+
+    interface Builder {
+      Sorter build(Entry[] arr, Strategy strategy);
+    }
+  }
+
+  public static void main(String[] args) throws Exception {

Review comment:
You could convert it to a test and give the test an assumption on some property (or just an @Ignore). Then you'd have a seed-reproducible benchmark. :) This stuff fits JMH nicely, but I understand why you didn't want to roll out the big guns here.
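The benefit dweiss points to is reproducibility: if every input array is derived from a seed, any run, including a suspicious outlier, can be replayed exactly. Outside Lucene's randomized-testing infrastructure, the same property can be sketched with a fixed `java.util.Random` seed. The shape names loosely follow the strategies listed in the issue, but the generators themselves (including the 8-value alphabet for the low-cardinality case) are assumptions rather than `BaseSortTestCase`'s actual definitions, and `Arrays.sort` stands in for the sorter under test.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: seed-driven benchmark inputs that can be replayed exactly. */
final class SeededInputs {

  enum Shape { RANDOM, RANDOM_LOW_CARDINALITY, ASCENDING, DESCENDING }

  // The same (seed, shape, length) always yields the same array.
  static int[] generate(long seed, Shape shape, int length) {
    Random random = new Random(seed);
    int[] a = new int[length];
    for (int i = 0; i < length; i++) {
      switch (shape) {
        case RANDOM:
          a[i] = random.nextInt();
          break;
        case RANDOM_LOW_CARDINALITY:
          a[i] = random.nextInt(8); // assumption: 8 distinct values
          break;
        case ASCENDING:
          a[i] = i;
          break;
        case DESCENDING:
          a[i] = length - i;
          break;
      }
    }
    return a;
  }

  // Sorts `loops` fresh copies of the input and returns the elapsed nanoseconds.
  static long time(int[] input, int loops) {
    long start = System.nanoTime();
    for (int l = 0; l < loops; l++) {
      int[] copy = input.clone();
      Arrays.sort(copy); // stand-in for the Sorter under test
    }
    return System.nanoTime() - start;
  }
}
```

Printing the seed next to each result line is enough to regenerate the exact column that produced it.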
[jira] [Issue Comment Deleted] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning
[ https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Roustant updated LUCENE-10196:
Comment: was deleted
(was: https://github.com/apache/lucene/pull/404)

> Improve IntroSorter with 3-ways partitioning
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter
> implementations depending on the strategies defined in BaseSortTestCase
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways
> partitioning, we can gain a significant performance improvement when sorting
> low-cardinality lists, and we additional changes we can also improve the
> performance for all the strategies.
> Proposed changes:
> - Sort small ranges with insertion sort (instead of binary sort).
> - Select the quick sort pivot with medians.
> - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
> - Replace the tail recursion by a loop.
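The proposed change "Select the quick sort pivot with medians" does not say which medians. A common choice, sketched below under that assumption, is the median of three for small ranges and Tukey's ninther (the median of three medians of three) for larger ones, as in Bentley and McIlroy's "Engineering a Sort Function"; the 40-element cut-over and the helper names are likewise assumptions, not the PR's code.

```java
/** Hypothetical helper: median-based pivot selection on int[]. */
final class PivotSelection {

  // Returns whichever of the indices i, j, k holds the median of a[i], a[j], a[k].
  static int medianOf3(int[] a, int i, int j, int k) {
    if (a[i] < a[j]) {
      if (a[j] < a[k]) return j; // a[i] < a[j] < a[k]
      return a[i] < a[k] ? k : i; // a[j] is the largest
    }
    if (a[k] < a[j]) return j; // a[k] < a[j] <= a[i]
    return a[k] < a[i] ? k : i; // a[j] is the smallest
  }

  // Picks a pivot index for the half-open range [from, to), assuming to > from.
  static int pivotIndex(int[] a, int from, int to) {
    int n = to - from;
    int mid = from + n / 2;
    int last = to - 1;
    if (n <= 40) {
      return medianOf3(a, from, mid, last);
    }
    // Tukey's ninther: the median of the medians of three evenly spaced triples.
    int step = n / 8;
    int m1 = medianOf3(a, from, from + step, from + 2 * step);
    int m2 = medianOf3(a, mid - step, mid, mid + step);
    int m3 = medianOf3(a, last - 2 * step, last - step, last);
    return medianOf3(a, m1, m2, m3);
  }
}
```

Sampling from both ends and the middle keeps already sorted or reverse-sorted inputs (the ASCENDING and DESCENDING shapes above) from producing the worst-case pivot every time.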
[jira] [Updated] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning
[ https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Roustant updated LUCENE-10196:
Description:
I added a SorterBenchmark to evaluate the performance of the various Sorter implementations depending on the strategies defined in BaseSortTestCase (random, random-low-cardinality, ascending, descending, etc).
By changing the implementation of the IntroSorter to use a 3-ways partitioning, we can gain a significant performance improvement when sorting low-cardinality lists, and with additional changes we can also improve the performance for all the strategies.
Proposed changes:
- Sort small ranges with insertion sort (instead of binary sort).
- Select the quick sort pivot with medians.
- Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
- Replace the tail recursion by a loop.

was:
I added a SorterBenchmark to evaluate the performance of the various Sorter implementations depending on the strategies defined in BaseSortTestCase (random, random-low-cardinality, ascending, descending, etc).
By changing the implementation of the IntroSorter to use a 3-ways partitioning, we can gain a significant performance improvement when sorting low-cardinality lists, and we additional changes we can also improve the performance for all the strategies.
Proposed changes:
- Sort small ranges with insertion sort (instead of binary sort).
- Select the quick sort pivot with medians.
- Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
- Replace the tail recursion by a loop.
> Improve IntroSorter with 3-ways partitioning
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter
> implementations depending on the strategies defined in BaseSortTestCase
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways
> partitioning, we can gain a significant performance improvement when sorting
> low-cardinality lists, and with additional changes we can also improve the
> performance for all the strategies.
> Proposed changes:
> - Sort small ranges with insertion sort (instead of binary sort).
> - Select the quick sort pivot with medians.
> - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
> - Replace the tail recursion by a loop.
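The last proposed change, replacing the tail recursion by a loop, is usually done by recursing into one side of the partition and looping on the other. The thread does not say which side the PR recurses into; recursing into the smaller side, as in the sketch below, is the textbook way to bound the stack at O(log n). To stay short, the sketch uses Dijkstra's simpler 3-way partition around the first element instead of Bentley-McIlroy; the class name is hypothetical.

```java
/** Sketch (not Lucene's code): quicksort whose second recursive call is replaced by a loop. */
final class LoopingQuickSort {

  static void sort(int[] a) {
    sort(a, 0, a.length);
  }

  // Sorts the half-open range a[from..to).
  static void sort(int[] a, int from, int to) {
    while (to - from > 1) {
      // Dijkstra-style 3-way partition around a[from].
      int v = a[from];
      int lt = from, gt = to - 1, i = from + 1;
      while (i <= gt) {
        if (a[i] < v) swap(a, lt++, i++);
        else if (a[i] > v) swap(a, i, gt--);
        else i++;
      }
      // Now a[from..lt) < v, a[lt..gt] == v and a[gt+1..to) > v.
      // Recurse into the smaller side only, then keep looping on the larger
      // side instead of making a tail call; stack depth stays O(log n).
      if (lt - from < to - (gt + 1)) {
        sort(a, from, lt);
        from = gt + 1;
      } else {
        sort(a, gt + 1, to);
        to = lt;
      }
    }
  }

  private static void swap(int[] a, int x, int y) {
    int t = a[x];
    a[x] = a[y];
    a[y] = t;
  }
}
```

The loop reuses the current stack frame for the larger side, so even an unlucky pivot sequence costs time rather than stack space.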
[GitHub] [lucene] bruno-roustant commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.
bruno-roustant commented on a change in pull request #404: URL: https://github.com/apache/lucene/pull/404#discussion_r734035237

## File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java
##
@@ -35,51 +38,102 @@ public final void sort(int from, int to) {
     quicksort(from, to, 2 * MathUtil.log(to - from, 2));
   }
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.

Review comment:
I think so, because we still fall back to heap sort if the recursion goes too deep.
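Bruno's answer relies on the introsort safeguard: quicksort runs with a recursion-depth budget (the `2 * MathUtil.log(to - from, 2)` argument in the diff), and a range that exhausts the budget is finished with heap sort, keeping the worst case at O(n log n). A minimal sketch of that control flow on `int[]`, using a plain Lomuto partition around the middle element rather than the PR's 3-way partitioning; computing the budget as `2 * floor(log2(n))` with `Integer.numberOfLeadingZeros` is an assumption, and the class name is hypothetical.

```java
/** Sketch (not Lucene's IntroSorter): depth-limited quicksort with a heap sort fallback. */
final class IntroSortSketch {

  static void sort(int[] a) {
    int n = a.length;
    if (n < 2) return;
    int maxDepth = 2 * (31 - Integer.numberOfLeadingZeros(n)); // 2 * floor(log2(n))
    sort(a, 0, n, maxDepth);
  }

  // Sorts a[from..to); switches to heap sort once the depth budget runs out.
  static void sort(int[] a, int from, int to, int depth) {
    while (to - from > 1) {
      if (--depth < 0) {
        heapSort(a, from, to); // too many bad pivots: guarantee O(n log n)
        return;
      }
      int p = partition(a, from, to);
      sort(a, from, p, depth);
      from = p + 1; // loop on the right part instead of a tail call
    }
  }

  // Lomuto partition around the middle element; returns the pivot's final index.
  private static int partition(int[] a, int from, int to) {
    swap(a, (from + to) >>> 1, to - 1);
    int pivot = a[to - 1], store = from;
    for (int i = from; i < to - 1; i++) if (a[i] < pivot) swap(a, i, store++);
    swap(a, store, to - 1);
    return store;
  }

  private static void heapSort(int[] a, int from, int to) {
    int n = to - from;
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, from, i, n); // build a max-heap
    for (int end = n - 1; end > 0; end--) {
      swap(a, from, from + end); // move the current max to the end
      siftDown(a, from, 0, end);
    }
  }

  // Restores the max-heap property below node i in a heap of size n starting at base.
  private static void siftDown(int[] a, int base, int i, int n) {
    while (true) {
      int child = 2 * i + 1;
      if (child >= n) return;
      if (child + 1 < n && a[base + child + 1] > a[base + child]) child++;
      if (a[base + i] >= a[base + child]) return;
      swap(a, base + i, base + child);
      i = child;
    }
  }

  private static void swap(int[] a, int x, int y) {
    int t = a[x];
    a[x] = a[y];
    a[y] = t;
  }
}
```

An all-equal array is a quick way to exercise the fallback here: Lomuto peels off one element per pass on such input, so the budget runs out after a few dozen passes and heap sort finishes the range.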
[GitHub] [lucene] bruno-roustant commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.
bruno-roustant commented on a change in pull request #404: URL: https://github.com/apache/lucene/pull/404#discussion_r734039059

## File path: lucene/core/src/test/org/apache/lucene/util/SorterBenchmark.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.util.Random;
+import org.apache.lucene.util.BaseSortTestCase.Entry;
+import org.apache.lucene.util.BaseSortTestCase.Strategy;
+
+/**
+ * Benchmark for {@link Sorter} implementations.
+ *
+ * Run the static {@link #main(String[])} method to start the benchmark.
+ */
+public class SorterBenchmark {
+
+  private static final int ARRAY_LENGTH = 2;
+  private static final int RUNS = 10;
+  private static final int LOOPS = 100;
+
+  private enum SorterFactory {
+    INTRO_SORTER(
+        "IntroSorter",
+        (arr, s) -> {
+          return new ArrayIntroSorter<>(arr, Entry::compareTo);
+        }),
+    TIM_SORTER(
+        "TimSorter",
+        (arr, s) -> {
+          return new ArrayTimSorter<>(arr, Entry::compareTo, arr.length / 64);
+        }),
+    MERGE_SORTER(
+        "MergeSorter",
+        (arr, s) -> {
+          return new ArrayInPlaceMergeSorter<>(arr, Entry::compareTo);
+        }),
+    ;
+    final String name;
+    final Builder builder;
+
+    SorterFactory(String name, Builder builder) {
+      this.name = name;
+      this.builder = builder;
+    }
+
+    interface Builder {
+      Sorter build(Entry[] arr, Strategy strategy);
+    }
+  }
+
+  public static void main(String[] args) throws Exception {

Review comment:
Can we disable assertions when running a test? (Yes, I hesitated to go with JMH, but I kept it simple. Based on the many runs I did, the ratio between the sorters is reproducible.)
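Whatever the answer to Bruno's question, a benchmark can at least detect whether assertions are on, so that timings taken with `-ea` are not mistaken for real numbers. The sketch below uses the well-known side-effect-in-`assert` idiom; the class and method names are hypothetical.

```java
/** Hypothetical benchmark helper: detects whether assertions are enabled (-ea). */
final class AssertionCheck {

  static boolean assertionsEnabled() {
    boolean enabled = false;
    // The assignment only runs when assertions are enabled for this class.
    assert enabled = true;
    return enabled;
  }

  // A benchmark could print this before its results.
  static String warningOrEmpty() {
    return assertionsEnabled()
        ? "WARNING: assertions are enabled (-ea); timings include assertion overhead."
        : "";
  }
}
```

Running the JVM without `-ea` (or with `-da`) is the direct way to keep assertions off. `ClassLoader#setDefaultAssertionStatus` and `ClassLoader#setPackageAssertionStatus` only affect classes initialized after the call, so they are unlikely to help once a test run has already loaded the sorter classes.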
[GitHub] [lucene] dweiss commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.
dweiss commented on a change in pull request #404: URL: https://github.com/apache/lucene/pull/404#discussion_r734046039

## File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java
##
@@ -35,51 +38,102 @@ public final void sort(int from, int to) {
     quicksort(from, to, 2 * MathUtil.log(to - from, 2));
   }
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.

Review comment:
Ok. The quicksort method name is sort of confusing, but I see it now.
[GitHub] [lucene] bruno-roustant commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.
bruno-roustant commented on a change in pull request #404: URL: https://github.com/apache/lucene/pull/404#discussion_r734048487

## File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java
##
@@ -35,51 +38,102 @@ public final void sort(int from, int to) {
     quicksort(from, to, 2 * MathUtil.log(to - from, 2));
   }
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.

Review comment:
I'll rename the method to simply "sort".
[GitHub] [lucene] dsmiley commented on pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default
dsmiley commented on pull request #362: URL: https://github.com/apache/lucene/pull/362#issuecomment-949243034

I added your changes but made 3 edits:
* Removed your change to the randomized highlighter configuration. It was working before and didn't need any change: we want to continue testing with WEIGHT_MATCHES off, even when the other settings would allow it to be enabled.
* Thanks to the test, which failed, I realized the boolean for checking PASSAGE_RELEVANCY_OVER_SPEED was inverted.
* Enhanced the added test to check how many enum values there are, so that if we change these enums, we intentionally revisit the default assertions.

This looks ready to me.
[GitHub] [lucene] apanimesh061 commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default
apanimesh061 commented on a change in pull request #362: URL: https://github.com/apache/lucene/pull/362#discussion_r734200431

## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
##
@@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() {
   /**
    * Internally use the {@link Weight#matches(LeafReaderContext, int)} API for highlighting. It's
-   * more accurate to the query, though might not calculate passage relevancy as well. Use of this
-   * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
-   * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default.
+   * more accurate to the query, and the snippets can be a little different for phrases because
+   * the whole phrase is marked up instead of each word. The passage relevancy calculation can be
+   * different (maybe worse?) and it's slower when highlighting many fields. Use of this flag
+   * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
+   * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so long as the requirements

Review comment:
> I added your changes but made 3 edits:
>
> * Removed your change to the randomized highlighter configuration. It was working before; didn't need anything. Thus we want to continue to test with WEIGHT_MATCHES being off, even when the other settings allow for it to be enabled.
>
> * Thanks to the test, which failed, I realized the boolean for checking PASSAGE_RELEVANCY_OVER_SPEED was inverted.
>
> * Enhanced the added test to check how many enum values there are so that if we change these enums, we intentionally revisit the default assertions.
>
> This looks ready to me.

@dsmiley Thanks a lot for fixing that. I was not sure whether the tests were supposed to fail.

Just for clarification, maybe I misunderstood your earlier comments. My understanding was that WEIGHT_MATCHES should be enabled when MULTI_TERM_QUERY and PHRASES are enabled, regardless of whether PASSAGE_RELEVANCY_OVER_SPEED is enabled. Based on your modification, it looks like all 3 must be enabled for WEIGHT_MATCHES to be enabled?
[GitHub] [lucene] dsmiley commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default
dsmiley commented on a change in pull request #362: URL: https://github.com/apache/lucene/pull/362#discussion_r734203232

## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
##
@@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() {
   /**
    * Internally use the {@link Weight#matches(LeafReaderContext, int)} API for highlighting. It's
-   * more accurate to the query, though might not calculate passage relevancy as well. Use of this
-   * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
-   * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default.
+   * more accurate to the query, and the snippets can be a little different for phrases because
+   * the whole phrase is marked up instead of each word. The passage relevancy calculation can be
+   * different (maybe worse?) and it's slower when highlighting many fields. Use of this flag
+   * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
+   * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so long as the requirements

Review comment:
I may have misstated it. I just improved the JavaDocs for clarity. What's confusing is that it's theoretically possible to subclass and return WEIGHT_MATCHES without some of the other flags, so the JavaDocs were basically saying that if you do that, PASSAGE_RELEVANCY_OVER_SPEED will be ignored. But I think it's clearer to describe the 3 requirements in the same way. Perhaps PASSAGE_RELEVANCY_OVER_SPEED should become an internal/expert option; it is more of a holdover from earlier times, and its value is questionable to me.
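The default this exchange converges on, WEIGHT_MATCHES enabled only when MULTI_TERM_QUERY, PHRASES, and PASSAGE_RELEVANCY_OVER_SPEED are all present (and not opted out of), can be sketched with an `EnumSet`. Everything below is hypothetical: the stand-in enum, `DefaultFlags`, and its parameters illustrate the rule and are not the UnifiedHighlighter's code. The constant mirrors the enum-count check dsmiley added to the test.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical stand-in for the highlighter's flag enum; not Lucene's declaration. */
enum HighlightFlag {
  PHRASES,
  MULTI_TERM_QUERY,
  PASSAGE_RELEVANCY_OVER_SPEED,
  WEIGHT_MATCHES
}

final class DefaultFlags {
  // In the spirit of the enum-count test: adding or removing a flag forces
  // whoever does it to revisit this number and the defaults that depend on it.
  static final int EXPECTED_FLAG_COUNT = 4;

  private static final Set<HighlightFlag> WEIGHT_MATCHES_REQUIREMENTS =
      EnumSet.of(
          HighlightFlag.MULTI_TERM_QUERY,
          HighlightFlag.PHRASES,
          HighlightFlag.PASSAGE_RELEVANCY_OVER_SPEED);

  // WEIGHT_MATCHES is added only when all three requirements are present
  // and the caller has not opted out of it.
  static Set<HighlightFlag> defaults(
      boolean handleMtq, boolean phrasesStrictly, boolean relevancyOverSpeed, boolean weightMatches) {
    Set<HighlightFlag> flags = EnumSet.noneOf(HighlightFlag.class);
    if (handleMtq) flags.add(HighlightFlag.MULTI_TERM_QUERY);
    if (phrasesStrictly) flags.add(HighlightFlag.PHRASES);
    if (relevancyOverSpeed) flags.add(HighlightFlag.PASSAGE_RELEVANCY_OVER_SPEED);
    if (weightMatches && flags.containsAll(WEIGHT_MATCHES_REQUIREMENTS)) {
      flags.add(HighlightFlag.WEIGHT_MATCHES);
    }
    return flags;
  }
}
```

Stating the rule as one `containsAll` check over the three requirements matches dsmiley's point that the requirements are clearer when described the same way.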
[GitHub] [lucene] apanimesh061 commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default
apanimesh061 commented on a change in pull request #362: URL: https://github.com/apache/lucene/pull/362#discussion_r734204016

## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
##
@@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() {
   /**
    * Internally use the {@link Weight#matches(LeafReaderContext, int)} API for highlighting. It's
-   * more accurate to the query, though might not calculate passage relevancy as well. Use of this
-   * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
-   * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default.
+   * more accurate to the query, and the snippets can be a little different for phrases because
+   * the whole phrase is marked up instead of each word. The passage relevancy calculation can be
+   * different (maybe worse?) and it's slower when highlighting many fields. Use of this flag
+   * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
+   * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so long as the requirements

Review comment:
Okay, great. I understand this now. Thanks.

On a separate note, do we need to create a task for replacing the setters with a builder for the UnifiedHighlighter class?
[GitHub] [lucene] dsmiley commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default
dsmiley commented on a change in pull request #362: URL: https://github.com/apache/lucene/pull/362#discussion_r734210117

## File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
##
@@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() {
   /**
    * Internally use the {@link Weight#matches(LeafReaderContext, int)} API for highlighting. It's
-   * more accurate to the query, though might not calculate passage relevancy as well. Use of this
-   * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
-   * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default.
+   * more accurate to the query, and the snippets can be a little different for phrases because
+   * the whole phrase is marked up instead of each word. The passage relevancy calculation can be
+   * different (maybe worse?) and it's slower when highlighting many fields. Use of this flag
+   * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
+   * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so long as the requirements

Review comment:
Yeah, that would be a new JIRA issue; you're welcome to create one. I'm busy but happy to review.
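For the setter-to-builder idea raised here, the general shape is that settings are collected on a builder, validated once, and frozen into an immutable object. The sketch below is entirely hypothetical: the class, the method names, the four settings, and their defaults illustrate the pattern and do not describe any API Lucene adopted.

```java
/** Hypothetical sketch: immutable highlighter settings assembled by a builder. */
final class HighlighterSettings {
  final int maxLength;
  final boolean handleMultiTermQuery;
  final boolean highlightPhrasesStrictly;
  final boolean weightMatches;

  private HighlighterSettings(Builder b) {
    this.maxLength = b.maxLength;
    this.handleMultiTermQuery = b.handleMultiTermQuery;
    this.highlightPhrasesStrictly = b.highlightPhrasesStrictly;
    this.weightMatches = b.weightMatches;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Hypothetical defaults, for illustration only.
    private int maxLength = 10_000;
    private boolean handleMultiTermQuery = true;
    private boolean highlightPhrasesStrictly = true;
    private boolean weightMatches = true;

    Builder withMaxLength(int maxLength) {
      if (maxLength < 0) {
        throw new IllegalArgumentException("maxLength must be >= 0, got " + maxLength);
      }
      this.maxLength = maxLength;
      return this;
    }

    Builder withHandleMultiTermQuery(boolean value) {
      this.handleMultiTermQuery = value;
      return this;
    }

    Builder withHighlightPhrasesStrictly(boolean value) {
      this.highlightPhrasesStrictly = value;
      return this;
    }

    Builder withWeightMatches(boolean value) {
      this.weightMatches = value;
      return this;
    }

    // Produces an immutable snapshot; later builder calls do not affect it.
    HighlighterSettings build() {
      return new HighlighterSettings(this);
    }
  }
}
```

One plausible benefit over setters is that a built instance cannot change while highlighting is in progress, and checks that span several settings can live in one place, such as `build()`.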