[jira] [Resolved] (LUCENE-10165) Implement Lucene90DocValuesProducer#getMergeInstance

2021-10-21 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10165.
---
Fix Version/s: main (9.0)
   Resolution: Fixed

> Implement Lucene90DocValuesProducer#getMergeInstance
> 
>
> Key: LUCENE-10165
> URL: https://issues.apache.org/jira/browse/LUCENE-10165
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Lucene90 doc values producer optimizes for random access so that 
> selective queries only have to decode the values that they need for sorting 
> or faceting.
> However, in the case of merging, the merging process systematically consumes 
> all doc IDs / values sequentially, so we could optimize for this access 
> pattern via the merge instance?
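
The idea in the description above can be sketched with a toy producer. This is
only an illustration under stated assumptions: ToyDocValues, MergeReader and the
8-values-per-long encoding are hypothetical and are not Lucene's
Lucene90DocValuesProducer or its on-disk format. The point is the shape: get()
decodes a single value on demand, while the object returned by
getMergeInstance() reads the same data but decodes a whole word at a time
because it knows every document will be visited in order.

```java
/** Toy stand-in for a doc values producer (hypothetical, not Lucene code). */
final class ToyDocValues {
  // Assumed encoding: 8-bit values, eight of them packed into each long.
  private static final int VALUES_PER_WORD = 8;
  final long[] words;
  final int maxDoc;

  ToyDocValues(int[] values) {
    maxDoc = values.length;
    words = new long[(maxDoc + VALUES_PER_WORD - 1) / VALUES_PER_WORD];
    for (int doc = 0; doc < maxDoc; doc++) {
      words[doc / VALUES_PER_WORD] |= (values[doc] & 0xFFL) << (8 * (doc % VALUES_PER_WORD));
    }
  }

  /** Random access: decode only the value that was asked for. */
  long get(int doc) {
    return (words[doc / VALUES_PER_WORD] >>> (8 * (doc % VALUES_PER_WORD))) & 0xFF;
  }

  /** Sequential reader for merging: same data, different access strategy. */
  MergeReader getMergeInstance() {
    return new MergeReader(this);
  }

  static final class MergeReader {
    private final ToyDocValues dv;
    private final long[] buffer = new long[VALUES_PER_WORD];
    private int doc = -1;

    MergeReader(ToyDocValues dv) {
      this.dv = dv;
    }

    boolean hasNext() {
      return doc + 1 < dv.maxDoc;
    }

    /** Returns the value of the next doc; callers must check hasNext() first. */
    long next() {
      doc++;
      int slot = doc % VALUES_PER_WORD;
      if (slot == 0) {
        // Entering a new word: decode all of its values in one pass.
        long word = dv.words[doc / VALUES_PER_WORD];
        for (int i = 0; i < VALUES_PER_WORD; i++) {
          buffer[i] = (word >>> (8 * i)) & 0xFF;
        }
      }
      return buffer[slot];
    }
  }
}
```

Both readers return identical values; only the decoding granularity differs,
which is why a separate merge instance can be swapped in without changing what
the merge sees.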



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #374: LUCENE-10165: Implement Lucene90DocValuesProducer#getMergeInstance.

2021-10-21 Thread GitBox


jpountz commented on pull request #374:
URL: https://github.com/apache/lucene/pull/374#issuecomment-948349796


   I didn't observe hotspot confusion when benchmarking locally, but I'll be 
watching the nightlies to check whether they show anything.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432288#comment-17432288
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi Mike,

My codec passed all test cases with test option -Dtests.codec=MyCodec.

Now I am working on the luceneutil benchmark. Thanks for your reply in the dev 
community thread!

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]






[jira] [Commented] (LUCENE-10165) Implement Lucene90DocValuesProducer#getMergeInstance

2021-10-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432315#comment-17432315
 ] 

ASF subversion and git services commented on LUCENE-10165:
--

Commit 8b6c90eccd97e6ea065558c3f3da96daf74c04bf in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8b6c90e ]

LUCENE-10165: Fix test failures.


> Implement Lucene90DocValuesProducer#getMergeInstance
> 
>
> Key: LUCENE-10165
> URL: https://issues.apache.org/jira/browse/LUCENE-10165
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Lucene90 doc values producer optimizes for random access so that 
> selective queries only have to decode the values that they need for sorting 
> or faceting.
> However, in the case of merging, the merging process systematically consumes 
> all doc IDs / values sequentially, so we could optimize for this access 
> pattern via the merge instance?






[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432444#comment-17432444
 ] 

Michael McCandless commented on LUCENE-8739:


{quote}My codec passed all test cases with test option -Dtests.codec=MyCodec.
{quote}
Aha, that is great news!  Lucene's tests tend to stress out new Codecs.  If you 
want to evil-up the tests, pass {{-Dtests.nightly=true}}.  The tests will run 
longer but try harder to find problems.

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]






[GitHub] [lucene] mikemccand commented on pull request #375: LUCENE-10093: first cut at fixing conflicting test assert and improving TMP javadocs

2021-10-21 Thread GitBox


mikemccand commented on pull request #375:
URL: https://github.com/apache/lucene/pull/375#issuecomment-948612756


   > > gradlew clean only cleans active modules. packaging has been renamed and no longer exists. Wipe all old cruft with:
   > > git clean -xfd lucene
   > 
   > Aha! Thermonuclear clean, I like it. I'll try that. Thanks @dweiss.
   
   That worked.  Thanks @dweiss.  I'll try to remember to try this next time.
   
   I'll push shortly.





[GitHub] [lucene] mikemccand merged pull request #375: LUCENE-10093: first cut at fixing conflicting test assert and improving TMP javadocs

2021-10-21 Thread GitBox


mikemccand merged pull request #375:
URL: https://github.com/apache/lucene/pull/375


   





[jira] [Commented] (LUCENE-10093) TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure

2021-10-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432460#comment-17432460
 ] 

ASF subversion and git services commented on LUCENE-10093:
--

Commit e3151d6c7dea187ed99d349f1435b38b31aa6dd9 in lucene's branch 
refs/heads/main from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e3151d6 ]

LUCENE-10093: fix conflicting test assert to match how TieredMergePolicy (TMP) 
works; improve TMP javadocs (#375)



> TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure
> -
>
> Key: LUCENE-10093
> URL: https://issues.apache.org/jira/browse/LUCENE-10093
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This test fails periodically in our CI builds, and the failing seed repros 
> for me:
> {noformat}
> org.apache.lucene.index.TestTieredMergePolicy > test suite's output saved to 
> /l/trunk/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestTieredMergePolicy.txt,
>  copied below:
>    >     java.lang.AssertionError
>    >         at 
> __randomizedtesting.SeedInfo.seed([7B591E657503510C:C958DC291BD5CF0A]:0)
>    >         at org.junit.Assert.fail(Assert.java:87)
>    >         at org.junit.Assert.assertTrue(Assert.java:42)
>    >         at org.junit.Assert.assertTrue(Assert.java:53)
>    >         at 
> org.apache.lucene.index.TestTieredMergePolicy.assertMaxSize(TestTieredMergePolicy.java:497)
>    >         at 
> org.apache.lucene.index.TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges(TestTieredMergePolicy.java:454)
>    >         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    >         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
>    >         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>    >         at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>    >         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>    >         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>    >         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>    >         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>    >         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>    >         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>    >         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>    >         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>    >         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>    >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>    >         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>    >         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>    >         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>    >         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>    >         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>    >         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>    >         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>    >         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>    >         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>    >         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>    >         at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>    >         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>    >         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> 

[jira] [Resolved] (LUCENE-10093) TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure

2021-10-21 Thread Michael McCandless (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-10093.
-
Fix Version/s: main (9.0)
   Resolution: Fixed

> TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure
> -
>
> Key: LUCENE-10093
> URL: https://issues.apache.org/jira/browse/LUCENE-10093
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (LUCENE-10093) TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure

2021-10-21 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432495#comment-17432495
 ] 

Michael McCandless commented on LUCENE-10093:
-

The above ^^ fix should resolve this.

> TestTieredMergePolicy.testForcedMergesUseLeastNumberOfMerges test failure
> -
>
> Key: LUCENE-10093
> URL: https://issues.apache.org/jira/browse/LUCENE-10093
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>

[jira] [Commented] (LUCENE-10165) Implement Lucene90DocValuesProducer#getMergeInstance

2021-10-21 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432502#comment-17432502
 ] 

Adrien Grand commented on LUCENE-10165:
---

Merging times for doc values went noticeably down with this change: 
http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#dv_merge_times.

> Implement Lucene90DocValuesProducer#getMergeInstance
> 
>
> Key: LUCENE-10165
> URL: https://issues.apache.org/jira/browse/LUCENE-10165
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Lucene90 doc values producer optimizes for random access so that 
> selective queries only have to decode the values that they need for sorting 
> or faceting.
> However, in the case of merging, the merging process systematically consumes 
> all doc IDs / values sequentially, so we could optimize for this access 
> pattern via the merge instance?






[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432506#comment-17432506
 ] 

Adrien Grand commented on LUCENE-8739:
--

You might be interested in the new simple benchmark for stored fields that we 
added to luceneutil to compare your stored fields format against Lucene's 
built-in formats: 
https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/StoredFieldsBenchmark.java.

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]






[GitHub] [lucene-solr] mikemccand merged pull request #2573: LUCENE-10008: Respect ignoreCase flag in CommonGramsFilterFactory

2021-10-21 Thread GitBox


mikemccand merged pull request #2573:
URL: https://github.com/apache/lucene-solr/pull/2573


   





[jira] [Resolved] (LUCENE-10008) CommonGramsFilterFactory doesn't respect ignoreCase=true when default stopwords are used

2021-10-21 Thread Michael McCandless (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-10008.
-
Fix Version/s: 8.11
   main (9.0)
   Resolution: Fixed

Thanks [~vigyas]!

> CommonGramsFilterFactory doesn't respect ignoreCase=true when default 
> stopwords are used
> 
>
> Key: LUCENE-10008
> URL: https://issues.apache.org/jira/browse/LUCENE-10008
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Priority: Major
> Fix For: main (9.0), 8.11
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> CommonGramsFilterFactory's use of the "words" and "ignoreCase" config options 
> is inconsistent with how StopFilterFactory uses them - leading to 
> "ignoreCase=true" not being respected unless "words" is specified...
> StopFilterFactory...
> {code:java}
>   public void inform(ResourceLoader loader) throws IOException {
>     if (stopWordFiles != null) {
>       ...
>     } else {
>       ...
>       stopWords = new CharArraySet(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET, ignoreCase);
>     }
>   }
> {code}
> CommonGramsFilterFactory...
> {code:java}
>   @Override
>   public void inform(ResourceLoader loader) throws IOException {
>     if (commonWordFiles != null) {
>       ...
>     } else {
>       commonWords = EnglishAnalyzer.ENGLISH_STOP_WORDS_SET;
>     }
>   }
> {code}
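
The difference between the two snippets above can be reproduced with plain
java.util collections. This is a hedged stand-in, not the committed patch:
CommonWordsSketch, DEFAULT_WORDS and both method names are hypothetical, and a
TreeSet with String.CASE_INSENSITIVE_ORDER plays the role that
new CharArraySet(..., ignoreCase) plays in StopFilterFactory. It shows why
handing out the shared default set unchanged drops the ignoreCase flag, and why
copying the defaults into a set built with the requested case handling respects
it.

```java
import java.util.Set;
import java.util.TreeSet;

/** Illustration only; uses java.util sets in place of Lucene's CharArraySet. */
final class CommonWordsSketch {
  // Stand-in for EnglishAnalyzer.ENGLISH_STOP_WORDS_SET: a case-sensitive default set.
  static final Set<String> DEFAULT_WORDS = Set.of("the", "a", "of");

  /** Buggy shape: the default set is returned as-is, so ignoreCase is silently dropped. */
  static Set<String> defaultsIgnoringFlag(boolean ignoreCase) {
    return DEFAULT_WORDS;
  }

  /** Fixed shape: copy the defaults into a set built with the requested case handling. */
  static Set<String> defaultsRespectingFlag(boolean ignoreCase) {
    Set<String> copy =
        ignoreCase ? new TreeSet<String>(String.CASE_INSENSITIVE_ORDER) : new TreeSet<String>();
    copy.addAll(DEFAULT_WORDS);
    return copy;
  }
}
```

With ignoreCase=true, a lookup for "The" misses in the first variant and hits in
the second, which is the behavior difference the issue describes.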






[jira] [Commented] (LUCENE-10008) CommonGramsFilterFactory doesn't respect ignoreCase=true when default stopwords are used

2021-10-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432518#comment-17432518
 ] 

ASF subversion and git services commented on LUCENE-10008:
--

Commit 641ac0b36a9257db3a6d2f9d12f422cfe5fddbc3 in lucene-solr's branch 
refs/heads/branch_8x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=641ac0b ]

LUCENE-10008: Respect ignoreCase flag in CommonGramsFilterFactory (#2573)



> CommonGramsFilterFactory doesn't respect ignoreCase=true when default 
> stopwords are used
> 
>
> Key: LUCENE-10008
> URL: https://issues.apache.org/jira/browse/LUCENE-10008
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> CommonGramsFilterFactory's use of the "words" and "ignoreCase" config options 
> is inconsistent with how StopFilterFactory uses them - leading to 
> "ignoreCase=true" not being respected unless "words" is specified...
> StopFilterFactory...
> {code:java}
>   public void inform(ResourceLoader loader) throws IOException {
>     if (stopWordFiles != null) {
>       ...
>     } else {
>       ...
>       stopWords = new CharArraySet(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET, ignoreCase);
>     }
>   }
> {code}
> CommonGramsFilterFactory...
> {code:java}
>   @Override
>   public void inform(ResourceLoader loader) throws IOException {
>     if (commonWordFiles != null) {
>       ...
>     } else {
>       commonWords = EnglishAnalyzer.ENGLISH_STOP_WORDS_SET;
>     }
>   }
> {code}






[GitHub] [lucene-solr] janhoy commented on pull request #1435: SOLR-14410: Switch from SysV init script to systemd service file

2021-10-21 Thread GitBox


janhoy commented on pull request #1435:
URL: https://github.com/apache/lucene-solr/pull/1435#issuecomment-948682201


   @andreasbolstad Solr 9.0 is not far away. If you have the bandwidth to give 
this PR a spin on the main branch and verify that it works OK, then I'll make sure 
it is merged in time for the 9.0 release.





[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432589#comment-17432589
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi Mike,

-Dtests.nightly=true ran successfully; it took more than an hour to complete!

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]






[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Praveen Nishchal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432590#comment-17432590
 ] 

Praveen Nishchal commented on LUCENE-8739:
--

Hi Adrien,

Could you please explain how to compare my stored fields format against 
Lucene's built-in formats?

Thanks!

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]






[jira] [Created] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning

2021-10-21 Thread Bruno Roustant (Jira)
Bruno Roustant created LUCENE-10196:
---

 Summary: Improve IntroSorter with 3-ways partitioning
 Key: LUCENE-10196
 URL: https://issues.apache.org/jira/browse/LUCENE-10196
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Bruno Roustant


I added a SorterBenchmark to evaluate the performance of the various Sorter 
implementations depending on the strategies defined in BaseSortTestCase 
(random, random-low-cardinality, ascending, descending, etc).

By changing the implementation of the IntroSorter to use a 3-ways partitioning, 
we can gain a significant performance improvement when sorting low-cardinality 
lists, and with additional changes we can also improve the performance for all 
the strategies.

Proposed changes:
- Sort small ranges with insertion sort (instead of binary sort).
- Select the quick sort pivot with medians.
- Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
- Replace the tail recursion by a loop.
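
The four proposed changes can be sketched together on a plain int array. This is only an illustration under stated assumptions, not the code from the patch: the class name, the threshold of 16, and the median-of-three pivot choice are hypothetical (the description only says "medians"), and the real IntroSorter works through compare/swap callbacks rather than on an int[].

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: quicksort with Bentley-McIlroy 3-way partitioning on an int[]. */
final class ThreeWayQuickSortSketch {
  // Assumed cut-off below which insertion sort is used; not taken from the patch.
  private static final int INSERTION_SORT_THRESHOLD = 16;

  static void sort(int[] a) {
    sort(a, 0, a.length - 1);
  }

  /** Sorts a[lo..hi], both bounds inclusive. */
  static void sort(int[] a, int lo, int hi) {
    // Loop instead of tail recursion: recurse on the smaller side, iterate on the larger.
    while (hi - lo + 1 > INSERTION_SORT_THRESHOLD) {
      // Median-of-three pivot, moved to a[lo].
      int mid = lo + ((hi - lo) >>> 1);
      swap(a, lo, medianIndex(a, lo, mid, hi));
      int v = a[lo];

      // Bentley-McIlroy: keys equal to the pivot are parked at both ends first.
      int i = lo, j = hi + 1;
      int p = lo, q = hi + 1;
      while (true) {
        while (a[++i] < v) {
          if (i == hi) break;
        }
        while (v < a[--j]) {
          if (j == lo) break;
        }
        if (i == j && a[i] == v) swap(a, ++p, i);
        if (i >= j) break;
        swap(a, i, j);
        if (a[i] == v) swap(a, ++p, i);
        if (a[j] == v) swap(a, --q, j);
      }
      // Move the parked equal keys into the middle.
      i = j + 1;
      for (int k = lo; k <= p; k++) swap(a, k, j--);
      for (int k = hi; k >= q; k--) swap(a, k, i++);

      // Now a[lo..j] < v, a[j+1..i-1] == v, a[i..hi] > v.
      if (j - lo < hi - i) {
        sort(a, lo, j);
        lo = i;
      } else {
        sort(a, i, hi);
        hi = j;
      }
    }
    insertionSort(a, lo, hi);
  }

  static void insertionSort(int[] a, int lo, int hi) {
    for (int i = lo + 1; i <= hi; i++) {
      int x = a[i];
      int j = i - 1;
      while (j >= lo && a[j] > x) {
        a[j + 1] = a[j];
        j--;
      }
      a[j + 1] = x;
    }
  }

  /** Returns whichever of x, y, z indexes the median of the three values. */
  static int medianIndex(int[] a, int x, int y, int z) {
    if (a[x] < a[y]) {
      if (a[y] < a[z]) return y;
      return a[x] < a[z] ? z : x;
    }
    if (a[x] < a[z]) return x;
    return a[y] < a[z] ? z : y;
  }

  static void swap(int[] a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    int[] data = new int[1000];
    for (int i = 0; i < data.length; i++) data[i] = random.nextInt(8); // low cardinality
    int[] expected = data.clone();
    Arrays.sort(expected);
    sort(data);
    System.out.println(Arrays.equals(expected, data));
  }
}
```

With low-cardinality input, each partitioning pass removes every key equal to the pivot from further work, which is where the gain on that data shape is expected to come from.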






[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning

2021-10-21 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432670#comment-17432670
 ] 

Bruno Roustant commented on LUCENE-10196:
-

Benchmark run comparing the sorter implementations on different shapes of data:

Each value is the time to complete a run. Values in the same column can be 
compared because the same data is provided as input to the various sorters. 
Each column has different random data. IntroSorter2 is the new, modified version 
of IntroSorter.

The benchmark runs with 20K comparable entries. Comparing IntroSorter and 
IntroSorter2, we mainly observe a roughly 5x speedup for random low cardinality, 
and an improvement for each data shape.
{noformat}
RANDOM
  IntroSorter  ...  445  445  459  458  453  458  460  465  452  451
  IntroSorter2 ...  394  403  401  400  401  398  400  404  396  399
  TimSorter    ... 1196 1203 1197 1206 1193 1195 1193 1204 1230 1207
  MergeSorter  ... 1462 1470 1482 1466 1463 1475 1478 1475 1466 1471
RANDOM_LOW_CARDINALITY
  IntroSorter  ...  505  513  504  490  527  499  510  512  509  525
  IntroSorter2 ...   89   84   88   88   86   90   88   89   92   88
  TimSorter    ...  511  513  508  508  513  512  521  511  524  516
  MergeSorter  ...  725  725  725  762  737  723  727  724  736  733
RANDOM_MEDIUM_CARDINALITY
  IntroSorter  ...  463  451  452  455  448  452  451  459  458  455
  IntroSorter2 ...  370  381  378  373  375  376  376  372  370  370
  TimSorter    ... 1192 1212 1197 1196 1201 1202 1196 1199 1196 1204
  MergeSorter  ... 1493 1465 1470 1480 1460 1470 1483 1464 1506 1500
ASCENDING
  IntroSorter  ...  211  205  215  213  207  206  208  214  212  211
  IntroSorter2 ...  191  188  190  193  194  191  188  187  185  188
  TimSorter    ...   17   18   18   18   19   19   18   17   18   19
  MergeSorter  ...   73   71   72   75   72   73   73   77   72   71
DESCENDING
  IntroSorter  ...  225  253  229  220  225  231  222  217  220  223
  IntroSorter2 ...  220  213  214  220  205  211  208  210  208  212
  TimSorter    ...  545  576  562  553  543  551  552  552  548  546
  MergeSorter  ...  537  537  548  538  537  536  533  530  533  545
STRICTLY_DESCENDING
  IntroSorter  ...  215  214  221  224  218  227  213  212  212  211
  IntroSorter2 ...  202  203  202  205  202  204  206  204  202  204
  TimSorter    ...   22   21   21   22   22   21   21   22   22   23
  MergeSorter  ...  534  531  533  527  531  529  526  527  528  527
ASCENDING_SEQUENCES
  IntroSorter  ...  370  366  361  376  367  369  358  364  379  376
  IntroSorter2 ...  234  235  231  236  234  245  242  239  239  236
  TimSorter    ...  686  679  745  673  694  685  673  719  682  685
  MergeSorter  ...  894  911  932  907  923  907  918  917  920  916
MOSTLY_ASCENDING
  IntroSorter  ...  284  282  282  283  285  282  278  284  283  287
  IntroSorter2 ...  254  252  249  250  255  255  249  250  252  251
  TimSorter    ...  233  233  230  235  232  234  234  233  228  238
  MergeSorter  ...  399  385  390  398  398  392  380  377  377  387
{noformat}
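
To make the low-cardinality claim concrete, the ratio can be recomputed from the RANDOM_LOW_CARDINALITY rows above. A minimal sketch, assuming the mean of the ten runs is a fair summary; the class and method names are hypothetical and only the two rows of numbers come from the table:

```java
/** Hypothetical helper: mean-time ratio between two rows of the benchmark table. */
final class SpeedupFromBenchmarkSketch {
  static double mean(int[] times) {
    long sum = 0;
    for (int t : times) sum += t;
    return (double) sum / times.length;
  }

  /** How many times faster the candidate is than the baseline, on mean run time. */
  static double speedup(int[] baseline, int[] candidate) {
    return mean(baseline) / mean(candidate);
  }

  public static void main(String[] args) {
    // RANDOM_LOW_CARDINALITY rows copied from the table above.
    int[] introSorter = {505, 513, 504, 490, 527, 499, 510, 512, 509, 525};
    int[] introSorter2 = {89, 84, 88, 88, 86, 90, 88, 89, 92, 88};
    System.out.printf("speedup = %.2fx%n", speedup(introSorter, introSorter2));
  }
}
```

The same helper can be pointed at any other pair of rows, as long as both rows come from the same data shape so that the columns share input data.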

> Improve IntroSorter with 3-ways partitioning
> 
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter 
> implementations depending on the strategies defined in BaseSortTestCase 
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways 
> partitioning, we can gain a significant performance improvement when sorting 
> low-cardinality lists, and with additional changes we can also improve the 
> performance for all the strategies.
> Proposed changes:
> - Sort small ranges with insertion sort (instead of binary sort).
> - Select the quick sort pivot with medians.
> - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
> - Replace the tail recursion by a loop.






[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2021-10-21 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432677#comment-17432677
 ] 

Adrien Grand commented on LUCENE-8739:
--

You need to download 
https://download.geonames.org/export/dump/allCountries.zip, unzip it and then 
use it to run the above benchmark, which is a simple standalone Java class with 
a main method.

To run it with your own codec, you will need to modify the code a bit to use it 
rather than Lucene's default codec.

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]






[jira] [Commented] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning

2021-10-21 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432685#comment-17432685
 ] 

Bruno Roustant commented on LUCENE-10196:
-

https://github.com/apache/lucene/pull/404

> Improve IntroSorter with 3-ways partitioning
> 
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter 
> implementations depending on the strategies defined in BaseSortTestCase 
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways 
> partitioning, we can gain a significant performance improvement when sorting 
> low-cardinality lists, and with additional changes we can also improve the 
> performance for all the strategies.
> Proposed changes:
> - Sort small ranges with insertion sort (instead of binary sort).
> - Select the quick sort pivot with medians.
> - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
> - Replace the tail recursion by a loop.






[GitHub] [lucene] dweiss commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.

2021-10-21 Thread GitBox


dweiss commented on a change in pull request #404:
URL: https://github.com/apache/lucene/pull/404#discussion_r734017645



##
File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java
##
@@ -35,51 +38,102 @@ public final void sort(int from, int to) {
 quicksort(from, to, 2 * MathUtil.log(to - from, 2));
   }
 
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.

Review comment:
   "with intro sort"? Is this accurate here?

##
File path: lucene/core/src/java/org/apache/lucene/util/Sorter.java
##
@@ -216,6 +219,25 @@ void binarySort(int from, int to, int i) {
 }
   }
 
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with insertion sort. 
Runs in {@code O(n^2)}.
+   * It is typically used by more sophisticated implementations as a fall-back 
when the numbers of

Review comment:
   when the number (not numbers)

##
File path: lucene/core/src/test/org/apache/lucene/util/SorterBenchmark.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.util.Random;
+import org.apache.lucene.util.BaseSortTestCase.Entry;
+import org.apache.lucene.util.BaseSortTestCase.Strategy;
+
+/**
+ * Benchmark for {@link Sorter} implementations.
+ *
+ * Run the static {@link #main(String[])} method to start the benchmark.
+ */
+public class SorterBenchmark {
+
+  private static final int ARRAY_LENGTH = 2;
+  private static final int RUNS = 10;
+  private static final int LOOPS = 100;
+
+  private enum SorterFactory {
+INTRO_SORTER(
+"IntroSorter",
+(arr, s) -> {
+  return new ArrayIntroSorter<>(arr, Entry::compareTo);
+}),
+TIM_SORTER(
+"TimSorter",
+(arr, s) -> {
+  return new ArrayTimSorter<>(arr, Entry::compareTo, arr.length / 64);
+}),
+MERGE_SORTER(
+"MergeSorter",
+(arr, s) -> {
+  return new ArrayInPlaceMergeSorter<>(arr, Entry::compareTo);
+}),
+;
+final String name;
+final Builder builder;
+
+SorterFactory(String name, Builder builder) {
+  this.name = name;
+  this.builder = builder;
+}
+
+interface Builder {
+  Sorter build(Entry[] arr, Strategy strategy);
+}
+  }
+
+  public static void main(String[] args) throws Exception {

Review comment:
   You could convert it to a test and give the test an assumption on some 
property (or just an Ignore). Then you'd have a seed-reproducible-benchmark. :)
   
   This stuff fits JMH nicely but I understand why you didn't want to roll out 
the big guns here.
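
   A seed-reproducible variant could look roughly like the sketch below. It is only an illustration of the idea: it times java.util.Arrays.sort on an int[] instead of the Lucene sorters, and the class name, method names, and default seed are hypothetical. The 20,000 entries and 100 loops mirror the figures mentioned in this thread.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of a benchmark whose input depends only on an explicit seed. */
final class SeededSortBenchmarkSketch {

  /** Same seed, length and cardinality always produce the same array. */
  static int[] randomLowCardinality(long seed, int length, int distinctValues) {
    Random random = new Random(seed);
    int[] data = new int[length];
    for (int i = 0; i < length; i++) {
      data[i] = random.nextInt(distinctValues);
    }
    return data;
  }

  /** Sorts a fresh copy of the input on every loop and returns the elapsed nanoseconds. */
  static long timeSortNanos(int[] input, int loops) {
    long start = System.nanoTime();
    for (int i = 0; i < loops; i++) {
      int[] copy = input.clone();
      Arrays.sort(copy);
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    // The seed is printed so that a surprising run can be replayed on the same data.
    long seed = args.length > 0 ? Long.parseLong(args[0]) : 42L;
    int[] data = randomLowCardinality(seed, 20_000, 16);
    System.out.println("seed=" + seed + " nanos=" + timeSortNanos(data, 100));
  }
}
```

   Timings still vary from run to run; only the input data is reproducible, which is the part a seed can guarantee.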


[jira] [Issue Comment Deleted] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning

2021-10-21 Thread Bruno Roustant (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Roustant updated LUCENE-10196:

Comment: was deleted

(was: https://github.com/apache/lucene/pull/404)

> Improve IntroSorter with 3-ways partitioning
> 
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter 
> implementations depending on the strategies defined in BaseSortTestCase 
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways 
> partitioning, we can gain a significant performance improvement when sorting 
> low-cardinality lists, and with additional changes we can also improve the 
> performance for all the strategies.
> Proposed changes:
> - Sort small ranges with insertion sort (instead of binary sort).
> - Select the quick sort pivot with medians.
> - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
> - Replace the tail recursion by a loop.






[jira] [Updated] (LUCENE-10196) Improve IntroSorter with 3-ways partitioning

2021-10-21 Thread Bruno Roustant (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Roustant updated LUCENE-10196:

Description: 
I added a SorterBenchmark to evaluate the performance of the various Sorter 
implementations depending on the strategies defined in BaseSortTestCase 
(random, random-low-cardinality, ascending, descending, etc).

By changing the implementation of the IntroSorter to use a 3-ways partitioning, 
we can gain a significant performance improvement when sorting low-cardinality 
lists, and with additional changes we can also improve the performance for all 
the strategies.

Proposed changes:
 - Sort small ranges with insertion sort (instead of binary sort).
 - Select the quick sort pivot with medians.
 - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
 - Replace the tail recursion by a loop.

  was:
I added a SorterBenchmark to evaluate the performance of the various Sorter 
implementations depending on the strategies defined in BaseSortTestCase 
(random, random-low-cardinality, ascending, descending, etc).

By changing the implementation of the IntroSorter to use a 3-ways partitioning, 
we can gain a significant performance improvement when sorting low-cardinality 
lists, and we additional changes we can also improve the performance for all 
the strategies.

Proposed changes:
- Sort small ranges with insertion sort (instead of binary sort).
- Select the quick sort pivot with medians.
- Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
- Replace the tail recursion by a loop.


> Improve IntroSorter with 3-ways partitioning
> 
>
> Key: LUCENE-10196
> URL: https://issues.apache.org/jira/browse/LUCENE-10196
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I added a SorterBenchmark to evaluate the performance of the various Sorter 
> implementations depending on the strategies defined in BaseSortTestCase 
> (random, random-low-cardinality, ascending, descending, etc).
> By changing the implementation of the IntroSorter to use a 3-ways 
> partitioning, we can gain a significant performance improvement when sorting 
> low-cardinality lists, and with additional changes we can also improve the 
> performance for all the strategies.
> Proposed changes:
>  - Sort small ranges with insertion sort (instead of binary sort).
>  - Select the quick sort pivot with medians.
>  - Partition with the fast Bentley-McIlroy 3-ways partitioning algorithm.
>  - Replace the tail recursion by a loop.






[GitHub] [lucene] bruno-roustant commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.

2021-10-21 Thread GitBox


bruno-roustant commented on a change in pull request #404:
URL: https://github.com/apache/lucene/pull/404#discussion_r734035237



##
File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java
##
@@ -35,51 +38,102 @@ public final void sort(int from, int to) {
 quicksort(from, to, 2 * MathUtil.log(to - from, 2));
   }
 
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.

Review comment:
   I think yes, because we still fall back to heap sort if the recursion 
depth grows too large.
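
   For readers unfamiliar with the pattern, the depth-limited fallback can be sketched on an int[] as follows. This is a hedged illustration, not the IntroSorter code: the class name is hypothetical, the partitioning is a plain Hoare scheme, and only the depth budget of 2 * log2(n) follows the quicksort(from, to, 2 * MathUtil.log(to - from, 2)) call visible in the diff above.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: quicksort that switches to heap sort once a depth budget is spent. */
final class IntroSortDepthSketch {

  static void sort(int[] a) {
    if (a.length < 2) return;
    // Depth budget of 2 * floor(log2(n)).
    int maxDepth = 2 * (31 - Integer.numberOfLeadingZeros(a.length));
    introSort(a, 0, a.length, maxDepth);
  }

  /** Sorts a[from, to), from inclusive and to exclusive. */
  static void introSort(int[] a, int from, int to, int maxDepth) {
    if (to - from < 2) return;
    if (maxDepth-- == 0) {
      heapSort(a, from, to); // budget spent: guarantees O(n log n) for this range
      return;
    }
    int pivot = a[from + ((to - from) >>> 1)];
    int i = from, j = to - 1;
    while (i <= j) {
      while (a[i] < pivot) i++;
      while (a[j] > pivot) j--;
      if (i <= j) {
        swap(a, i, j);
        i++;
        j--;
      }
    }
    introSort(a, from, j + 1, maxDepth);
    introSort(a, i, to, maxDepth);
  }

  static void heapSort(int[] a, int from, int to) {
    int n = to - from;
    for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, from, i, n);
    for (int end = n - 1; end > 0; end--) {
      swap(a, from, from + end);
      siftDown(a, from, 0, end);
    }
  }

  /** Restores the max-heap property below root, for the heap stored at a[base, base + size). */
  static void siftDown(int[] a, int base, int root, int size) {
    while (true) {
      int child = 2 * root + 1;
      if (child >= size) return;
      if (child + 1 < size && a[base + child + 1] > a[base + child]) child++;
      if (a[base + root] >= a[base + child]) return;
      swap(a, base + root, base + child);
      root = child;
    }
  }

  static void swap(int[] a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
  }

  public static void main(String[] args) {
    Random random = new Random(1);
    int[] data = new int[1000];
    for (int i = 0; i < data.length; i++) data[i] = random.nextInt();
    int[] expected = data.clone();
    Arrays.sort(expected);
    sort(data);
    System.out.println(Arrays.equals(expected, data));
  }
}
```

   Calling introSort with a depth budget of 0 exercises the heap sort path directly, which is a convenient way to test the fallback in isolation.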







[GitHub] [lucene] bruno-roustant commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.

2021-10-21 Thread GitBox


bruno-roustant commented on a change in pull request #404:
URL: https://github.com/apache/lucene/pull/404#discussion_r734039059



##
File path: lucene/core/src/test/org/apache/lucene/util/SorterBenchmark.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.util.Random;
+import org.apache.lucene.util.BaseSortTestCase.Entry;
+import org.apache.lucene.util.BaseSortTestCase.Strategy;
+
+/**
+ * Benchmark for {@link Sorter} implementations.
+ *
+ * Run the static {@link #main(String[])} method to start the benchmark.
+ */
+public class SorterBenchmark {
+
+  private static final int ARRAY_LENGTH = 2;
+  private static final int RUNS = 10;
+  private static final int LOOPS = 100;
+
+  private enum SorterFactory {
+INTRO_SORTER(
+"IntroSorter",
+(arr, s) -> {
+  return new ArrayIntroSorter<>(arr, Entry::compareTo);
+}),
+TIM_SORTER(
+"TimSorter",
+(arr, s) -> {
+  return new ArrayTimSorter<>(arr, Entry::compareTo, arr.length / 64);
+}),
+MERGE_SORTER(
+"MergeSorter",
+(arr, s) -> {
+  return new ArrayInPlaceMergeSorter<>(arr, Entry::compareTo);
+}),
+;
+final String name;
+final Builder builder;
+
+SorterFactory(String name, Builder builder) {
+  this.name = name;
+  this.builder = builder;
+}
+
+interface Builder {
+  Sorter build(Entry[] arr, Strategy strategy);
+}
+  }
+
+  public static void main(String[] args) throws Exception {

Review comment:
   Can we disable the assertions when running a test?
   (Yes, I hesitated to go with JMH, but I kept it simple; based on the many 
runs I saw, the ratio between the sorters is reproducible.)







[GitHub] [lucene] dweiss commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.

2021-10-21 Thread GitBox


dweiss commented on a change in pull request #404:
URL: https://github.com/apache/lucene/pull/404#discussion_r734046039



##
File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java
##
@@ -35,51 +38,102 @@ public final void sort(int from, int to) {
 quicksort(from, to, 2 * MathUtil.log(to - from, 2));
   }
 
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.

Review comment:
   Ok. The quicksort method name is sort of confusing, but I see it now.







[GitHub] [lucene] bruno-roustant commented on a change in pull request #404: LUCENE-10196: Improve IntroSorter with 3-ways partitioning.

2021-10-21 Thread GitBox


bruno-roustant commented on a change in pull request #404:
URL: https://github.com/apache/lucene/pull/404#discussion_r734048487



##
File path: lucene/core/src/java/org/apache/lucene/util/IntroSorter.java
##
@@ -35,51 +38,102 @@ public final void sort(int from, int to) {
 quicksort(from, to, 2 * MathUtil.log(to - from, 2));
   }
 
+  /**
+   * Sorts between from (inclusive) and to (exclusive) with intro sort.

Review comment:
   I'll rename the method simply "sort".







[GitHub] [lucene] dsmiley commented on pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default

2021-10-21 Thread GitBox


dsmiley commented on pull request #362:
URL: https://github.com/apache/lucene/pull/362#issuecomment-949243034


   I added your changes but made 3 edits:
   * Removed your change to the randomized highlighter configuration.  It was 
working before; didn't need anything.  Thus we want to continue to test with 
WEIGHT_MATCHES being off, even when the other settings allow for it to be 
enabled.
   * Thanks to the test, which failed, I realized the boolean for checking 
PASSAGE_RELEVANCY_OVER_SPEED was inverted.
   * Enhanced the added test to check how many enum values there are so that if 
we change these enums, we intentionally revisit the default assertions.
   
   This looks ready to me.





[GitHub] [lucene] apanimesh061 commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default

2021-10-21 Thread GitBox


apanimesh061 commented on a change in pull request #362:
URL: https://github.com/apache/lucene/pull/362#discussion_r734200431



##
File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
##
@@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() {
 
 /**
  * Internally use the {@link Weight#matches(LeafReaderContext, int)} API 
for highlighting. It's
- * more accurate to the query, though might not calculate passage 
relevancy as well. Use of this
- * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
- * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default.
+ * more accurate to the query, and the snippets can be a little different 
for phrases because
+ * the whole phrase is marked up instead of each word. The passage 
relevancy calculation can be
+ * different (maybe worse?) and it's slower when highlighting many fields. 
Use of this flag
+ * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
+ * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so 
long as the requirements

Review comment:
   > I added your changes but made 3 edits:
   > 
   > * Removed your change to the randomized highlighter configuration.  It 
was working before; didn't need anything.  Thus we want to continue to test 
with WEIGHT_MATCHES being off, even when the other settings allow for it to be 
enabled.
   > 
   > * Thanks to the test, which failed, I realized the boolean for 
checking PASSAGE_RELEVANCY_OVER_SPEED was inverted.
   > 
   > * Enhanced the added test to check how many enum values there are so 
that if we change these enums, we intentionally revisit the default assertions.
   > 
   > 
   > This looks ready to me.
   
   @dsmiley Thanks a lot for fixing that. I was not sure if the tests were 
supposed to fail.
   
   Just for clarification, maybe I misunderstood your earlier comments. My 
understanding was that WEIGHT_MATCHES should be enabled when MULTI_TERM_QUERY 
and PHRASES are enabled and it does not matter if PASSAGE_RELEVANCY_OVER_SPEED 
is enabled. Based on your modification, it looks like all 3 should be enabled 
for the WEIGHT_MATCHES to be enabled?







[GitHub] [lucene] dsmiley commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default

2021-10-21 Thread GitBox


dsmiley commented on a change in pull request #362:
URL: https://github.com/apache/lucene/pull/362#discussion_r734203232



##
File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
##
@@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() {
 
 /**
  * Internally use the {@link Weight#matches(LeafReaderContext, int)} API 
for highlighting. It's
- * more accurate to the query, though might not calculate passage 
relevancy as well. Use of this
- * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
- * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default.
+ * more accurate to the query, and the snippets can be a little different 
for phrases because
+ * the whole phrase is marked up instead of each word. The passage 
relevancy calculation can be
+ * different (maybe worse?) and it's slower when highlighting many fields. 
Use of this flag
+ * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
+ * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so 
long as the requirements

Review comment:
   I may have misstated it.  I improved the JavaDocs just now for clarity.  
What's confusing is that it's theoretically possible to subclass and return 
WEIGHT_MATCHES without some of the other flags, and so the JavaDocs were saying 
basically that if you do that, then PASSAGE_RELEVANCY_OVER_SPEED will be 
ignored.  But I think it's clearer to speak of the 3 requirements in the same 
way.  And perhaps PASSAGE_RELEVANCY_OVER_SPEED should be an internal/expert 
option; it is more of a hold-over from earlier times and has questionable 
value to me.
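
   The "3 requirements" reading discussed here can be sketched with a stand-alone enum. Everything in this block is hypothetical and only mirrors the flag names from the thread: the class, the defaultFlags method, and its boolean parameters are assumptions for illustration, not the UnifiedHighlighter API.

```java
import java.util.EnumSet;

/** Hypothetical stand-in for the flag defaults discussed in this thread. */
final class HighlightFlagDefaultsSketch {

  /** Local copy of the flag names; not the real UnifiedHighlighter enum. */
  enum Flag {
    PHRASES,
    MULTI_TERM_QUERY,
    PASSAGE_RELEVANCY_OVER_SPEED,
    WEIGHT_MATCHES
  }

  /** WEIGHT_MATCHES is on by default only when all three prerequisite flags are on. */
  static EnumSet<Flag> defaultFlags(
      boolean phrases, boolean multiTermQuery, boolean passageRelevancyOverSpeed) {
    EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);
    if (phrases) flags.add(Flag.PHRASES);
    if (multiTermQuery) flags.add(Flag.MULTI_TERM_QUERY);
    if (passageRelevancyOverSpeed) flags.add(Flag.PASSAGE_RELEVANCY_OVER_SPEED);
    EnumSet<Flag> prerequisites =
        EnumSet.of(Flag.PHRASES, Flag.MULTI_TERM_QUERY, Flag.PASSAGE_RELEVANCY_OVER_SPEED);
    if (flags.containsAll(prerequisites)) {
      flags.add(Flag.WEIGHT_MATCHES);
    }
    return flags;
  }

  public static void main(String[] args) {
    System.out.println(defaultFlags(true, true, true));
    System.out.println(defaultFlags(true, true, false));
  }
}
```

   A test over such defaults can also assert on the number of enum values, so that adding or removing a flag forces the default assertions to be revisited.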







[GitHub] [lucene] apanimesh061 commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default

2021-10-21 Thread GitBox


apanimesh061 commented on a change in pull request #362:
URL: https://github.com/apache/lucene/pull/362#discussion_r734204016



##
File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
##
@@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() {
 
 /**
  * Internally use the {@link Weight#matches(LeafReaderContext, int)} API for highlighting. It's
- * more accurate to the query, though might not calculate passage relevancy as well. Use of this
- * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
- * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default.
+ * more accurate to the query, and the snippets can be a little different for phrases because
+ * the whole phrase is marked up instead of each word. The passage relevancy calculation can be
+ * different (maybe worse?) and it's slower when highlighting many fields. Use of this flag
+ * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
+ * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so long as the requirements

Review comment:
   Okay, great. I understand this now. Thanks.
   
   On a separate note, do we need to create a task for replacing the setter 
with a builder for the UnifiedHighlighter class?







[GitHub] [lucene] dsmiley commented on a change in pull request #362: LUCENE-9431: UnifiedHighlighter WEIGHT_MATCHES is now true by default

2021-10-21 Thread GitBox


dsmiley commented on a change in pull request #362:
URL: https://github.com/apache/lucene/pull/362#discussion_r734210117



##
File path: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
##
@@ -1168,9 +1174,12 @@ public CacheHelper getReaderCacheHelper() {
 
 /**
  * Internally use the {@link Weight#matches(LeafReaderContext, int)} API for highlighting. It's
- * more accurate to the query, though might not calculate passage relevancy as well. Use of this
- * flag requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
- * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. False by default.
+ * more accurate to the query, and the snippets can be a little different for phrases because
+ * the whole phrase is marked up instead of each word. The passage relevancy calculation can be
+ * different (maybe worse?) and it's slower when highlighting many fields. Use of this flag
+ * requires {@link #MULTI_TERM_QUERY} and {@link #PHRASES}. {@link
+ * #PASSAGE_RELEVANCY_OVER_SPEED} will be ignored. True by default, so long as the requirements

Review comment:
   Yeah, that'd be a new JIRA issue; you're welcome to create one.  I'm busy 
but happy to review.



