[GitHub] [lucene] bruno-roustant commented on a change in pull request #163: LUCENE-9983: Stop sorting determinize powersets unnecessarily

2021-06-03 Thread GitBox


bruno-roustant commented on a change in pull request #163:
URL: https://github.com/apache/lucene/pull/163#discussion_r644575732



##
File path: lucene/core/build.gradle
##
@@ -20,6 +20,8 @@ apply plugin: 'java-library'
 description = 'Lucene core library'
 
 dependencies {
+  implementation 'com.carrotsearch:hppc'

Review comment:
   Yes, removal is fast.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] janhoy commented on pull request #2502: SOLR-15316 Update Jetty to 9.4.41 (backport 8x)

2021-06-03 Thread GitBox


janhoy commented on pull request #2502:
URL: https://github.com/apache/lucene-solr/pull/2502#issuecomment-853890546


   Added a few more reviewers who have previously upgraded Jetty. I plan to 
merge this to 8x soon, get some Jenkins runs and then merge to release 
branch_8_9...





[jira] [Commented] (LUCENE-9976) WANDScorer assertion error in ensureConsistent

2021-06-03 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356472#comment-17356472
 ] 

Michael McCandless commented on LUCENE-9976:


Looks like it happened again: 
https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/291/

> WANDScorer assertion error in ensureConsistent
> --
>
> Key: LUCENE-9976
> URL: https://issues.apache.org/jira/browse/LUCENE-9976
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Priority: Major
>
> Build fails and is reproducible:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/283/console
> {code}
> ./gradlew test --tests TestExpressionSorts.testQueries 
> -Dtests.seed=FF571CE915A0955 -Dtests.multiplier=2 -Dtests.nightly=true 
> -Dtests.slow=true -Dtests.asserts=true -p lucene/expressions/
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (LUCENE-9379) Directory based approach for index encryption

2021-06-03 Thread Fabio Germann (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356487#comment-17356487
 ] 

Fabio Germann commented on LUCENE-9379:
---

Thanks [~broustant]/[~bruno.roustant], this is also something that I was 
looking for!

As for [~rcmuir]'s comment(s): I think the important distinction to be made is 
the goal of the usage of encryption and the guarantees you need.

If one needs tenant-based encryption at rest, OS-level encryption is a valid 
way to go. Also, if one needs maximum performance and tries to squeeze every 
last drop of performance out of their NVMe drives, OS-level encryption (or no 
encryption) would probably be best.

BUT: In today's world there are sometimes things that are more important (or 
pose a greater risk) to a project or a company: namely user privacy and data 
protection. In such cases decreased performance is certainly acceptable (if not 
already anticipated).

Many of the above arguments against this contribution can be addressed one way 
or another. What can NOT be addressed (and why [~bruno.roustant]'s contribution 
is valuable) is:
 * It allows for the stored content to only be accessible to Lucene (the 
process/thread), for the exact duration that Lucene needs to process the data, 
without any dependency on a downstream component.
 * It allows for platform interoperability/independence. For example, the 
solution can be deployed to a Linux system while being developed on 
macOS/Windows. (Sidenote: This is very important if there are large teams 
working on a solution that builds on this.)
 * It can even offer protection from passive privileged users, meaning that 
the file on the filesystem is not readable by a privileged user. In contrast, 
OS-level encryption would make such protections more complex.
 * It allows for simple deployment in container technologies (which would be 
tricky with the alternatives proposed by [~rcmuir])

 

Maybe the increased interest in this topic signals that there is something to 
be done?

Recent research has also taken note. From one abstract: 
"[...] However, currently deployed IR technologies, e.g., 
Apache Lucene - open-source search software, are insufficient when the 
information is protected or deemed to be private [...]"
(Source: 
[https://www.computer.org/csdl/journal/tq//01/08954811/1gs4XOshKHC])

> Directory based approach for index encryption
> -
>
> Key: LUCENE-9379
> URL: https://issues.apache.org/jira/browse/LUCENE-9379
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> +Important+: This Lucene Directory wrapper approach is to be considered only 
> if an OS level encryption is not possible. OS level encryption better fits 
> Lucene usage of OS cache, and thus is more performant.
> But there are some use cases where OS-level encryption is not possible. This 
> Jira issue was created to address those.
> 
>  
> The goal is to provide optional encryption of the index, with a scope limited 
> to an encryptable Lucene Directory wrapper.
> Encryption is at rest on disk, not in memory.
> This simple approach should fit any Codec as it would be orthogonal, without 
> modifying APIs as much as possible.
> Use a standard encryption method. Limit perf/memory impact as much as 
> possible.
> Determine how callers provide encryption keys. They must not be stored on 
> disk.
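
As a rough illustration of the "standard encryption method" goal in the description above, the sketch below encrypts and decrypts a byte buffer with AES in CTR mode through the JDK's javax.crypto API. The choice of AES/CTR, the class name EncryptAtRestSketch and the helper apply are assumptions made for this example only; the issue text does not name a cipher, and this is not the code from the patch. The key is supplied by the caller and is never written out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: not a Lucene class and not the LUCENE-9379 patch.
class EncryptAtRestSketch {

  // Runs AES/CTR in the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE).
  // The key comes from the caller and lives only in memory.
  static byte[] apply(int mode, byte[] key, byte[] iv, byte[] data) {
    try {
      Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
      cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
      return cipher.doFinal(data);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SecureRandom random = new SecureRandom();
    byte[] key = new byte[16]; // 128-bit key, provided by the caller
    byte[] iv = new byte[16];  // assumed to be chosen fresh per file
    random.nextBytes(key);
    random.nextBytes(iv);

    byte[] plain = "index bytes".getBytes(StandardCharsets.UTF_8);
    byte[] onDisk = apply(Cipher.ENCRYPT_MODE, key, iv, plain);    // what would be stored
    byte[] inMemory = apply(Cipher.DECRYPT_MODE, key, iv, onDisk); // what a reader would see

    System.out.println("round trip ok: " + Arrays.equals(plain, inMemory));
    System.out.println("same length: " + (plain.length == onDisk.length));
  }
}
```

CTR mode is picked here only because the ciphertext keeps the plaintext length, which looks convenient for offset-based reads; a real Directory wrapper would still have to deal with IV handling per file and with how callers hand over keys.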






[jira] [Comment Edited] (LUCENE-9379) Directory based approach for index encryption

2021-06-03 Thread Fabio Germann (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356487#comment-17356487
 ] 

Fabio Germann edited comment on LUCENE-9379 at 6/3/21, 2:52 PM:


Thanks [~broustant]/[~bruno.roustant], this is also something that I was 
looking for!

As for [~rcmuir]'s comment(s): I think the important distinction to be made is 
the goal of the usage of encryption and the guarantees you need.

If one needs tenant-based encryption at rest, OS-level encryption is a valid 
way to go. Also, if one needs maximum performance and tries to squeeze every 
last drop of performance out of their NVMe drives, OS-level encryption (or no 
encryption) would probably be best.

BUT: In today's world there are sometimes things that are more important (or 
pose a greater risk) to a project or a company: namely user privacy and data 
protection. In such cases decreased performance is certainly acceptable (if not 
already anticipated).

Many of the above arguments against this contribution can be addressed one way 
or another. What can NOT be addressed (and why [~bruno.roustant]'s contribution 
is valuable) is:
 * It allows for the stored content to only be accessible to Lucene (the 
process/thread), for the exact duration that Lucene needs to process the data, 
without any dependency on a downstream component.
 * It allows for platform interoperability/independence. For example, the 
solution can be deployed to a Linux system while being developed on 
macOS/Windows. (Sidenote: This is very important if there are large teams 
working on a solution that builds on this.)
 * It can even offer protection from passive privileged users, meaning that 
the file on the filesystem is not readable by a privileged user. In contrast, 
OS-level encryption would make such protections more complex.
 * It allows for simple deployment in container technologies (which would be 
tricky with the alternatives proposed by [~rcmuir])

 

Maybe the increased interest in this topic signals that there is something to 
be done?

Recent research has also taken note. From one abstract: 
"[...] However, currently deployed IR technologies, e.g., 
Apache Lucene - open-source search software, are insufficient when the 
information is protected or deemed to be private [...]"
(Source: 
[https://www.computer.org/csdl/journal/tq//01/08954811/1gs4XOshKHC])



[jira] [Commented] (LUCENE-9944) Implement alternative drill sideways faceting with provided CollectorManager

2021-06-03 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356501#comment-17356501
 ] 

Michael McCandless commented on LUCENE-9944:


Recent randomized test failure, maybe related to this change? 
https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/62/

> Implement alternative drill sideways faceting with provided CollectorManager
> 
>
> Key: LUCENE-9944
> URL: https://issues.apache.org/jira/browse/LUCENE-9944
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Today, if a user of {{DrillSideways}} wants to provide their own 
> {{CollectorManager}} when invoking {{search}}, they get this alternate, 
> "concurrent" implementation that creates N copies of the provided 
> {{DrillDownQuery}} (where N is the number of drill-down dimensions) and runs 
> them all concurrently. This is a very different implementation than the one a 
> user would get if providing a {{Collector}} instead. Additionally, an 
> {{ExecutorService}} must be provided when constructing a {{DrillSideways}} 
> instance if the user wants to bring their own {{CollectorManager}} 
> (otherwise, they'll get an unfriendly NPE when calling {{search}}).
> I propose adding an implementation to {{DrillSideways}} that will run the 
> "non-concurrent" algorithm in the case that a user wants to provide their own 
> {{CollectorManager}} but doesn't want to provide an {{ExecutorService}} (and 
> doesn't want the concurrent algorithm).
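
The proposal above boils down to choosing the algorithm based on whether an executor was supplied, instead of failing with an NPE. A minimal sketch of that dispatch follows. DispatchSketch and its search method are hypothetical stand-ins, not the DrillSideways API, and the "work" per dimension is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of "fall back to the non-concurrent path when no executor is given".
class DispatchSketch {
  private final ExecutorService executor; // may be null

  DispatchSketch(ExecutorService executor) {
    this.executor = executor;
  }

  // dims stands in for the number of drill-down dimensions.
  String search(int dims) {
    if (executor == null) {
      // Non-concurrent path: one pass on the calling thread, no NPE.
      return "sequential:" + dims;
    }
    // Concurrent path: one task per dimension, then wait for all of them.
    List<Future<Integer>> futures = new ArrayList<>();
    for (int i = 0; i < dims; i++) {
      final int dim = i;
      Callable<Integer> task = () -> dim; // placeholder for per-dimension work
      futures.add(executor.submit(task));
    }
    int finished = 0;
    try {
      for (Future<Integer> future : futures) {
        future.get();
        finished++;
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return "concurrent:" + finished;
  }

  public static void main(String[] args) {
    System.out.println(new DispatchSketch(null).search(3));
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      System.out.println(new DispatchSketch(pool).search(3));
    } finally {
      pool.shutdown();
    }
  }
}
```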






[jira] [Commented] (LUCENE-9944) Implement alternative drill sideways faceting with provided CollectorManager

2021-06-03 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356502#comment-17356502
 ] 

Michael McCandless commented on LUCENE-9944:


Also, do we plan to backport this to 8.x?







[jira] [Commented] (LUCENE-9905) Revise approach to specifying NN algorithm

2021-06-03 Thread Julie Tibshirani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356573#comment-17356573
 ] 

Julie Tibshirani commented on LUCENE-9905:
--

I opened [https://github.com/apache/lucene/pull/166] to move the graph 
construction parameters from field type attributes to format parameters. In my 
understanding, this is the last piece to address before we can close out the 
issue.

> Revise approach to specifying NN algorithm
> --
>
> Key: LUCENE-9905
> URL: https://issues.apache.org/jira/browse/LUCENE-9905
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: main (9.0)
>Reporter: Julie Tibshirani
>Priority: Blocker
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In LUCENE-9322 we decided that the new vectors API shouldn’t assume a 
> particular nearest-neighbor search data structure and algorithm. This 
> flexibility is important since NN search is a developing area and we'd like 
> to be able to experiment and evolve the algorithm. Right now we only have one 
> algorithm (HNSW), but we want to maintain the ability to use another.
> Currently the algorithm to use is specified through {{SearchStrategy}}, for 
> example {{SearchStrategy.EUCLIDEAN_HNSW}}. So a single format implementation 
> is expected to handle multiple algorithms. Instead we could have one format 
> implementation per algorithm. Our current implementation would be 
> HNSW-specific like {{HnswVectorFormat}}, and to experiment with another 
> algorithm you could create a new implementation like {{ClusterVectorFormat}}. 
> This would be better aligned with the codec framework, and help avoid 
> exposing algorithm details in the API.
> A concrete proposal (note many of these names will change when LUCENE-9855 is 
> addressed):
> # Rename {{Lucene90VectorFormat}} to {{Lucene90HnswVectorFormat}}. Also add 
> HNSW to name of {{Lucene90VectorWriter}} and {{Lucene90VectorReader}}.
> # Remove references to HNSW in {{SearchStrategy}}, so there is just 
> {{SearchStrategy.EUCLIDEAN}}, etc. Rename {{SearchStrategy}} to something 
> like {{SimilarityFunction}}.
> # Remove {{FieldType}} attributes related to HNSW parameters (maxConn and 
> beamWidth). Instead make these arguments to {{Lucene90HnswVectorFormat}}.
> # Introduce {{PerFieldVectorFormat}} to allow a different NN approach or 
> parameters to be configured per-field \(?\)
> One note: the current HNSW-based format includes logic for storing a numeric 
> vector per document, as well as constructing + storing a HNSW graph. When 
> adding another implementation, it’d be nice to be able to reuse logic for 
> reading/ writing numeric vectors. I don’t think we need to design for this 
> right now, but we can keep it in mind for the future?
> This issue is based on a thread [~jpountz] started: 
> [https://mail-archives.apache.org/mod_mbox/lucene-dev/202103.mbox/%3CCAPsWd%2BOuQv5y2Vw39%3DXdOuqXGtDbM4qXx5-pmYiB1X4jPEdiFQ%40mail.gmail.com%3E]
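
To make the shape of the proposal above concrete, here is a small sketch in which the similarity function is algorithm-neutral and the HNSW build parameters are constructor arguments of an HNSW-specific format. All type names (SketchSimilarity, SketchVectorFormat, SketchHnswVectorFormat) are hypothetical and are not the Lucene classes; the parameter values 16 and 100 are arbitrary.

```java
// Hypothetical sketch of "one format implementation per algorithm".
class VectorFormatSketch {
  public static void main(String[] args) {
    // maxConn and beamWidth are passed to the format, not stored as field type attributes.
    SketchVectorFormat format = new SketchHnswVectorFormat(16, 100);
    System.out.println(format.describe(SketchSimilarity.EUCLIDEAN));
  }
}

// Algorithm-neutral: no mention of HNSW here.
enum SketchSimilarity {
  EUCLIDEAN,
  DOT_PRODUCT
}

// What a codec-level vector format might expose in this sketch.
interface SketchVectorFormat {
  String describe(SketchSimilarity similarity);
}

// The HNSW-specific implementation owns its graph construction parameters.
final class SketchHnswVectorFormat implements SketchVectorFormat {
  private final int maxConn;
  private final int beamWidth;

  SketchHnswVectorFormat(int maxConn, int beamWidth) {
    this.maxConn = maxConn;
    this.beamWidth = beamWidth;
  }

  @Override
  public String describe(SketchSimilarity similarity) {
    return "hnsw(maxConn=" + maxConn + ", beamWidth=" + beamWidth
        + ", similarity=" + similarity + ")";
  }
}
```

Trying another algorithm would then mean adding a second SketchVectorFormat implementation with its own parameters, without touching the similarity enum.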






[GitHub] [lucene] jtibshirani commented on pull request #166: LUCENE-9905: Move HNSW build parameters to codec

2021-06-03 Thread GitBox


jtibshirani commented on pull request #166:
URL: https://github.com/apache/lucene/pull/166#issuecomment-854021866


   @msokolov as a heads up, I didn't have the chance to try out 
`KnnGraphTester` to double-check the behavior didn't change. However, the update 
to `KnnGraphTester` was straightforward and I don't anticipate problems.





[jira] [Created] (LUCENE-9988) Address DrillSideways bug discovered during randomized testing

2021-06-03 Thread Greg Miller (Jira)
Greg Miller created LUCENE-9988:
---

 Summary: Address DrillSideways bug discovered during randomized 
testing
 Key: LUCENE-9988
 URL: https://issues.apache.org/jira/browse/LUCENE-9988
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/facet
Affects Versions: main (9.0)
Reporter: Greg Miller
Assignee: Greg Miller


There appears to be a correctness bug in DrillSideways likely introduced in 
LUCENE-9944. Need to track it down and fix.

 

Build: [https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/62/]

1 tests failed.
FAILED:  org.apache.lucene.facet.TestDrillSideways.testRandom

Error Message:
java.lang.AssertionError

Stack Trace:
java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([ADCF6881460FEE2F:DF834D8EF76F585C]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1580)
        at org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1159)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36

[jira] [Commented] (LUCENE-9988) Address DrillSideways bug discovered during randomized testing

2021-06-03 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356617#comment-17356617
 ] 

Greg Miller commented on LUCENE-9988:
-

I think I know what's going on here. Should have a PR up soon.

> Address DrillSideways bug discovered during randomized testing
> --
>
> Key: LUCENE-9988
> URL: https://issues.apache.org/jira/browse/LUCENE-9988
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Major
>

[GitHub] [lucene] gsmiller opened a new pull request #167: LUCENE-9988: Fix DrillSideways bug discovered in randomized testing

2021-06-03 Thread GitBox


gsmiller opened a new pull request #167:
URL: https://github.com/apache/lucene/pull/167


   





[jira] [Commented] (LUCENE-9988) Address DrillSideways bug discovered during randomized testing

2021-06-03 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356631#comment-17356631
 ] 

Greg Miller commented on LUCENE-9988:
-

OK, a little context and a description of the bug:

Prior to LUCENE-9944, {{DrillSidewaysQuery}} was not safe to use for concurrent 
search. It would share instances of {{FacetsCollector}} across threads. 
LUCENE-9944 modified {{DrillSidewaysQuery}} to use {{FacetsCollectorManager}}s 
instead, instantiating new {{FacetsCollector}} instances whenever new 
{{BulkScorer}}s are created. {{DrillSidewaysQuery}} also keeps track of the 
instantiated {{FacetsCollector}}s and exposes them to {{DrillSideways}}, which 
is responsible for merging them all at the end.

The issue is that, if {{DrillSidewaysQuery#rewrite}} resulted in instantiating 
a new instance, the tracked {{FacetsCollector}}s weren't carried over. So 
{{DrillSideways}} would keep a reference to the old query and never see the 
{{FacetsCollector}}s instantiated through the rewritten one.

The actual code change to fix this is pretty simple. PR attached if someone 
wouldn't mind taking a look.
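
For readers who want to see the failure mode in isolation, here is a small sketch of the pattern described above: a query that tracks the collectors it creates, and a rewrite that either drops or carries over that tracking. SketchQuery and SketchCollector are hypothetical stand-ins and are not the Lucene classes; the carryOver flag exists only to show both behaviours side by side and is not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "rewrite loses tracked state" pattern.
class RewriteTrackingSketch {
  public static void main(String[] args) {
    // Buggy variant: the rewritten query tracks into a fresh list.
    SketchQuery original = new SketchQuery(new ArrayList<>());
    original.rewrite(false).newCollector();
    System.out.println("without carry-over: " + original.tracked().size()); // prints 0

    // Fixed variant: the rewritten query shares the original's list.
    SketchQuery fixed = new SketchQuery(new ArrayList<>());
    fixed.rewrite(true).newCollector();
    System.out.println("with carry-over: " + fixed.tracked().size()); // prints 1
  }
}

// Placeholder for a per-scorer collector.
class SketchCollector {}

class SketchQuery {
  private final List<SketchCollector> tracked;

  SketchQuery(List<SketchCollector> tracked) {
    this.tracked = tracked;
  }

  // Returns a new instance, as a rewrite may do.
  SketchQuery rewrite(boolean carryOver) {
    return new SketchQuery(carryOver ? tracked : new ArrayList<>());
  }

  // Called whenever a new scorer is created; records the collector for later merging.
  SketchCollector newCollector() {
    SketchCollector collector = new SketchCollector();
    tracked.add(collector);
    return collector;
  }

  // What the owner of the original query reads when merging at the end.
  List<SketchCollector> tracked() {
    return tracked;
  }
}
```

The sketch is single-threaded; with concurrent search the shared list would additionally need to be safe for concurrent adds.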

> Address DrillSideways bug discovered during randomized testing
> --
>
> Key: LUCENE-9988
> URL: https://issues.apache.org/jira/browse/LUCENE-9988
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There appears to be a correctness bug in DrillSideways likely introduced in 
> LUCENE-9944. Need to track it down and fix.
>  
> Build: [https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/62/]
> 1 tests failed.
> FAILED:  org.apache.lucene.facet.TestDrillSideways.testRandom
> Error Message:
> java.lang.AssertionError
> Stack Trace:
> java.lang.AssertionError
>         at 
> __randomizedtesting.SeedInfo.seed([ADCF6881460FEE2F:DF834D8EF76F585C]:0)
>         at org.junit.Assert.fail(Assert.java:87)
>         at org.junit.Assert.assertTrue(Assert.java:42)
>         at org.junit.Assert.assertTrue(Assert.java:53)
>         at 
> org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1580)
>         at 
> org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1159)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>         at 
> com.

[jira] [Comment Edited] (LUCENE-9988) Address DrillSideways bug discovered during randomized testing

2021-06-03 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356631#comment-17356631
 ] 

Greg Miller edited comment on LUCENE-9988 at 6/3/21, 6:08 PM:
--

OK, a little context and a description of the bug:

Prior to LUCENE-9944, {{DrillSidewaysQuery}} was not safe to use for concurrent 
search. It would share instances of {{FacetsCollector}} across threads. 
LUCENE-9944 modified {{DrillSidewaysQuery}} to use {{FacetsCollectorManager}}s 
instead, instantiating new {{FacetsCollector}} instances whenever new 
{{BulkScorer}}s are created. {{DrillSidewaysQuery}} also keeps track of the 
instantiated {{FacetsCollector}}s and exposes them to {{DrillSideways}}, which 
is responsible for merging them all at the end.

The issue is that, if {{DrillSidewaysQuery#rewrite}} resulted in instantiating 
a new instance, the tracked {{FacetsCollector}}s weren't carried over. So 
{{DrillSideways}} would have a reference to the old query and never see the 
instantiated {{FacetsCollector}}s. 

The actual code change to fix this is pretty simple. PR attached if someone 
wouldn't mind taking a look.


was (Author: gsmiller):
OK, a little context and a description of the bug:

Prior to LUCENE-9944, {{DrillSidewaysQuery}} was not safe to use for concurrent 
search. It would share instances of {{FacetsCollector}} across threads. 
LUCENE-9944 modified {{DrillSidewaysQuery}} to use {{FacetsCollectorManager}}s 
instead, instantiating new {{FacetsCollector}} instances whenever new 
{{BulkScorer}}s are created. {{DrillSidewaysQuery}} also keeps track of the 
instantiated {{FacetsCollector}}s and exposes them to {{DrillSideways}}, which 
is responsible for merging them all at the end.

The issue is that, if {{DrillSidewaysQuery#rewrite}} resulted in instantiating 
a new instance, the tracked {{FacetsCollector}}s weren't carried over. So 
{{DrillSideways}} would have a reference to the old query and never see the 
instantiated {{FacetsCollector}}s. 

The actual code change to fix this is pretty simple. PR attached if someone 
wouldn't mind taking a look.

> Address DrillSideways bug discovered during randomized testing
> --
>
> Key: LUCENE-9988
> URL: https://issues.apache.org/jira/browse/LUCENE-9988
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There appears to be a correctness bug in DrillSideways likely introduced in 
> LUCENE-9944. Need to track it down and fix.
>  
> Build: [https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/62/]
> 1 tests failed.
> FAILED:  org.apache.lucene.facet.TestDrillSideways.testRandom
> Error Message:
> java.lang.AssertionError
> Stack Trace:
> java.lang.AssertionError
>         at 
> __randomizedtesting.SeedInfo.seed([ADCF6881460FEE2F:DF834D8EF76F585C]:0)
>         at org.junit.Assert.fail(Assert.java:87)
>         at org.junit.Assert.assertTrue(Assert.java:42)
>         at org.junit.Assert.assertTrue(Assert.java:53)
>         at 
> org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1580)
>         at 
> org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1159)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         

[jira] [Created] (LUCENE-9989) Add Dynamic Numeric Faceting

2021-06-03 Thread Yuting Gan (Jira)
Yuting Gan created LUCENE-9989:
--

 Summary: Add Dynamic Numeric Faceting 
 Key: LUCENE-9989
 URL: https://issues.apache.org/jira/browse/LUCENE-9989
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/facet
Affects Versions: main (9.0)
Reporter: Yuting Gan


This conversation starts in 

 I am creating a Jira to track the proposal of creating a new feature in Facet 
that can automatically generate dynamic numeric ranges based on the 
distribution of the underlying data without users specifying the ranges up 
front.

 






[jira] [Commented] (LUCENE-9970) provide apps details about why TooManyClauses was thrown

2021-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356685#comment-17356685
 ] 

ASF subversion and git services commented on LUCENE-9970:
-

Commit efb7b2a5e8c1bdc19dfd65f7095f70a142343472 in lucene's branch 
refs/heads/main from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=efb7b2a ]

LUCENE-9970: Add TooManyNestedClauses extends TooManyClauses so that 
IndexSearcher.rewrite can distinguish how maxClauseCount is exceeded

This is an extension of the work done in LUCENE-8811 which added the two types 
of checks
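
A minimal sketch of why a subclass helps callers, assuming only what the commit message states (TooManyNestedClauses extends TooManyClauses). The two exception classes below are local stand-ins that borrow those names; they are not Lucene's own classes, and ClauseLimitDemo is a hypothetical helper for illustration.

```java
// Local stand-ins named after the classes in the commit message; not Lucene's own.
class TooManyClauses extends RuntimeException {
  TooManyClauses(String msg) {
    super(msg);
  }
}

class TooManyNestedClauses extends TooManyClauses {
  TooManyNestedClauses(String msg) {
    super(msg);
  }
}

class ClauseLimitDemo {
  // Classifies a failure by catching the more specific subclass first,
  // so no stack-frame inspection is needed.
  static String classify(Runnable action) {
    try {
      action.run();
      return "ok";
    } catch (TooManyNestedClauses e) {
      return "nested";
    } catch (TooManyClauses e) {
      return "direct";
    }
  }

  public static void main(String[] args) {
    System.out.println(classify(() -> { throw new TooManyClauses("direct children"); }));
    System.out.println(classify(() -> { throw new TooManyNestedClauses("whole tree"); }));
  }
}
```

Because the nested variant is a subclass, existing code that catches only TooManyClauses keeps working unchanged.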


> provide apps details about why TooManyClauses was thrown
> 
>
> Key: LUCENE-9970
> URL: https://issues.apache.org/jira/browse/LUCENE-9970
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: LUCENE-9970.patch
>
>
> Historically, if {{TooManyClauses}} was thrown it meant exactly one thing: 
> That a QueryX Builder (typically BooleanQuery, but there are a few others in 
> sandbox) was not going to allow its caller to add a clause because that 
> QueryX object already had the maxClauseCount in _direct_ children.
> LUCENE-8811 added an additional "reason" why {{TooManyClauses}} may be thrown 
> starting in 9.0: IndexSearcher may now throw this exception  if the 
> (rewritten) Query being executed has a _cumulative_ number of clauses – 
> across the entire structure of _nested_ Query objects – that exceeds the 
> maxClauseCount.
> 
> I think it would be helpful to users if it was possible to tell from the 
> {{TooManyClauses}} exception how the maxClauseCount was exceeded (because of 
> the total number of direct children during rewrite, or cumulatively across 
> the entire nested structure) w/o needing to inspect the stack frames to see 
> if the thrower was a rewrite method or a QueryVisitor method.
>  






[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery

2021-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356686#comment-17356686
 ] 

ASF subversion and git services commented on LUCENE-8811:
-

Commit efb7b2a5e8c1bdc19dfd65f7095f70a142343472 in lucene's branch 
refs/heads/main from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=efb7b2a ]

LUCENE-9970: Add TooManyNestedClauses extends TooManyClauses so that 
IndexSearcher.rewrite can distinguish how maxClauseCount is exceeded

This is an extension of the work done in LUCENE-8811 which added the two types 
of checks


> Add maximum clause count check to IndexSearcher rather than BooleanQuery
> 
>
> Key: LUCENE-8811
> URL: https://issues.apache.org/jira/browse/LUCENE-8811
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: main (9.0)
>
> Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, 
> LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch
>
>
> Currently we only check whether boolean queries have too many clauses. 
> However there are other ways that queries may have too many clauses, for 
> instance if you have boolean queries that have themselves inner boolean 
> queries.
> Could we use the new Query visitor API to move this check from BooleanQuery 
> to IndexSearcher in order to make this check more consistent across queries? 
> See for instance LUCENE-8810 where a rewrite rule caused the maximum clause 
> count to be hit even though the total number of leaf queries remained the 
> same.
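
The gap between a per-query check and a cumulative check can be sketched with a toy query tree. Everything below (QueryNode, LeafNode, BoolNode, ClauseCounter) is hypothetical and does not use Lucene's Query or QueryVisitor API. Counting leaf clauses as the cumulative measure is also an assumption made for illustration.

```java
import java.util.List;

// Hypothetical toy query tree; not Lucene's Query classes.
interface QueryNode {}

final class LeafNode implements QueryNode {}

final class BoolNode implements QueryNode {
  final List<QueryNode> children;

  BoolNode(List<QueryNode> children) {
    this.children = children;
  }
}

class ClauseCounter {
  // Largest number of direct children held by any single boolean node.
  static int maxDirect(QueryNode q) {
    if (!(q instanceof BoolNode)) {
      return 0;
    }
    BoolNode b = (BoolNode) q;
    int max = b.children.size();
    for (QueryNode c : b.children) {
      max = Math.max(max, maxDirect(c));
    }
    return max;
  }

  // Total number of leaf clauses across the whole nested structure.
  static int cumulativeLeaves(QueryNode q) {
    if (!(q instanceof BoolNode)) {
      return 1;
    }
    int total = 0;
    for (QueryNode c : ((BoolNode) q).children) {
      total += cumulativeLeaves(c);
    }
    return total;
  }

  // One boolean node holding three leaves.
  private static QueryNode inner() {
    return new BoolNode(List.of(new LeafNode(), new LeafNode(), new LeafNode()));
  }

  // An outer boolean node holding three inner boolean nodes.
  static QueryNode sample() {
    return new BoolNode(List.of(inner(), inner(), inner()));
  }

  public static void main(String[] args) {
    QueryNode q = sample();
    int limit = 4; // illustrative limit, not a Lucene default
    System.out.println("per-node check passes:   " + (maxDirect(q) <= limit));
    System.out.println("cumulative check passes: " + (cumulativeLeaves(q) <= limit));
  }
}
```

With the illustrative limit of 4, no single boolean node in the sample has too many direct children, yet the tree as a whole holds nine leaves, which is the case a per-BooleanQuery check cannot catch.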






[jira] [Resolved] (LUCENE-9970) provide apps details about why TooManyClauses was thrown

2021-06-03 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter resolved LUCENE-9970.

Fix Version/s: main (9.0)
   Resolution: Fixed

> provide apps details about why TooManyClauses was thrown
> 
>
> Key: LUCENE-9970
> URL: https://issues.apache.org/jira/browse/LUCENE-9970
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Fix For: main (9.0)
>
> Attachments: LUCENE-9970.patch
>
>
> Historically, if {{TooManyClauses}} was thrown it meant exactly one thing: 
> That a QueryX Builder (typically BooleanQuery, but there are a few others in 
> sandbox) was not going to allow its caller to add a clause because that 
> QueryX object already had the maxClauseCount in _direct_ children.
> LUCENE-8811 added an additional "reason" why {{TooManyClauses}} may be thrown 
> starting in 9.0: IndexSearcher may now throw this exception  if the 
> (rewritten) Query being executed has a _cumulative_ number of clauses – 
> across the entire structure of _nested_ Query objects – that exceeds the 
> maxClauseCount.
> 
> I think it would be helpful to users if it was possible to tell from the 
> {{TooManyClauses}} exception how the maxClauseCount was exceeded (because of 
> the total number of direct children during rewrite, or cumulatively across 
> the entire nested structure) w/o needing to inspect the stack frames to see 
> if the thrower was a rewrite method or a QueryVisitor method.
>  






[jira] [Commented] (LUCENE-9944) Implement alternative drill sideways faceting with provided CollectorManager

2021-06-03 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356690#comment-17356690
 ] 

Greg Miller commented on LUCENE-9944:
-

I opened LUCENE-9988 to track the bug found in randomized testing (along with a 
fix). Thanks [~mikemccand]!

As for a back port, I don't think it's critical to get it out in the 8.9 
release, and given how close that release seems, I wouldn't try to squeeze this 
in, especially given the bug found in randomized testing. I could work on back 
porting to branch_8x after 8.9 is finalized though. Does that sound right?

> Implement alternative drill sideways faceting with provided CollectorManager
> 
>
> Key: LUCENE-9944
> URL: https://issues.apache.org/jira/browse/LUCENE-9944
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Priority: Minor
> Fix For: main (9.0)
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Today, if a user of {{DrillSideways}} wants to provide their own 
> {{CollectorManager}} when invoking {{search}}, they get this alternate, 
> "concurrent" implementation that creates N copies of the provided 
> {{DrillDownQuery}} (where N is the number of drill-down dimensions) and runs 
> them all concurrently. This is a very different implementation than the one a 
> user would get if providing a {{Collector}} instead. Additionally, an 
> {{ExecutorService}} must be provided when constructing a {{DrillSideways}} 
> instance if the user wants to bring their own {{CollectorManager}} 
> (otherwise, they'll get an unfriendly NPE when calling {{search}}).
> I propose adding an implementation to {{DrillSideways}} that will run the 
> "non-concurrent" algorithm in the case that a user wants to provide their own 
> {{CollectorManager}} but doesn't want to provide an {{ExecutorService}} (and 
> doesn't want the concurrent algorithm).






[jira] [Commented] (LUCENE-7321) Character Mapping

2021-06-03 Thread Ivan Provalov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356696#comment-17356696
 ] 

Ivan Provalov commented on LUCENE-7321:
---

[~marcussorealheis], I have been maintaining it (bug fixes, etc.), but have not 
upgraded it to version 8 yet.  I could do that if there is any interest in 
integrating it.

 

> Character Mapping
> -
>
> Key: LUCENE-7321
> URL: https://issues.apache.org/jira/browse/LUCENE-7321
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 4.6.1, 5.4.1, 6.0, 6.0.1
>Reporter: Ivan Provalov
>Priority: Minor
>  Labels: patch
> Fix For: 6.0.1
>
> Attachments: CharacterMappingComponent.pdf, LUCENE-7321.patch
>
>
> One of the challenges in search is recall of an item with a common typing 
> variant.  These cases can be as simple as lower/upper case in most languages, 
> accented characters, or more complex morphological phenomena like prefix 
> omitting, or constructing a character with some combining mark.  This 
> component addresses the cases that are not covered by the ASCII folding 
> component, or that are more complex to design with other tools.  The idea is that a 
> linguist could provide the mappings in a tab-delimited file, which then can 
> be directly used by Solr.
> The mappings are maintained in the tab-delimited file, which could be just a 
> copy paste from Excel spreadsheet.  This gives the linguists the opportunity 
> to create the mappings, then for the developer to include them in Solr 
> configuration.  There are a few cases, when the mappings grow complex, where 
> some additional debugging may be required.  The mappings can contain any 
> sequence of characters to any other sequence of characters.
> Some of the cases I discuss in the detailed document are handling the voiced vowels 
> for Japanese; common typing substitutions for Korean, Russian, Polish; 
> transliteration for Polish, Arabic; prefix removal for Arabic; suffix folding 
> for Japanese.  In the appendix, I give an example of implementing a Russian 
> lightweight stemmer using this component.
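
A rough sketch of the tab-delimited mapping idea described above. The class CharMappingSketch and its methods are hypothetical and are not the code from the attached patch. The longest-match-first policy for overlapping source sequences is an assumption, since the description does not say how overlaps are resolved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the component from LUCENE-7321.patch.
class CharMappingSketch {
  // Parses lines of the form "source<TAB>target"; blank lines are skipped.
  // An empty target means the source sequence is removed.
  static Map<String, String> parse(String tabDelimited) {
    Map<String, String> map = new LinkedHashMap<>();
    for (String line : tabDelimited.split("\n")) {
      if (line.isEmpty()) {
        continue;
      }
      String[] parts = line.split("\t", 2);
      if (parts.length != 2 || parts[0].isEmpty()) {
        throw new IllegalArgumentException("bad mapping line: " + line);
      }
      map.put(parts[0], parts[1]);
    }
    return map;
  }

  // Scans left to right; at each position the longest matching source
  // sequence wins (assumed policy). Unmatched characters pass through.
  static String apply(Map<String, String> map, String input) {
    int maxLen = 0;
    for (String key : map.keySet()) {
      maxLen = Math.max(maxLen, key.length());
    }
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < input.length()) {
      boolean matched = false;
      for (int len = Math.min(maxLen, input.length() - i); len > 0; len--) {
        String target = map.get(input.substring(i, i + len));
        if (target != null) {
          out.append(target);
          i += len;
          matched = true;
          break;
        }
      }
      if (!matched) {
        out.append(input.charAt(i));
        i++;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> map = parse("sch\tsh\ns\tz\n");
    System.out.println(apply(map, "schs"));
  }
}
```

Mapping any sequence of characters to any other sequence, as the description puts it, is why the lookup works on substrings rather than single characters.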






[GitHub] [lucene] gsmiller merged pull request #167: LUCENE-9988: Fix DrillSideways bug discovered in randomized testing

2021-06-03 Thread GitBox


gsmiller merged pull request #167:
URL: https://github.com/apache/lucene/pull/167


   





[jira] [Commented] (LUCENE-9988) Address DrillSideways bug discovered during randomized testing

2021-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356758#comment-17356758
 ] 

ASF subversion and git services commented on LUCENE-9988:
-

Commit 7a7003c51c8c0470f04e9df2ee9cb6002e124689 in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7a7003c ]

LUCENE-9988: Fix DrillSideways bug discovered in randomized testing (#167)



> Address DrillSideways bug discovered during randomized testing
> --
>
> Key: LUCENE-9988
> URL: https://issues.apache.org/jira/browse/LUCENE-9988
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There appears to be a correctness bug in DrillSideways likely introduced in 
> LUCENE-9944. Need to track it down and fix.
>  
> Build: [https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/62/]
> 1 tests failed.
> FAILED:  org.apache.lucene.facet.TestDrillSideways.testRandom
> Error Message:
> java.lang.AssertionError
> Stack Trace:
> java.lang.AssertionError
>         at 
> __randomizedtesting.SeedInfo.seed([ADCF6881460FEE2F:DF834D8EF76F585C]:0)
>         at org.junit.Assert.fail(Assert.java:87)
>         at org.junit.Assert.assertTrue(Assert.java:42)
>         at org.junit.Assert.assertTrue(Assert.java:53)
>         at 
> org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1580)
>         at 
> org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1159)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.rules

[jira] [Commented] (LUCENE-9988) Address DrillSideways bug discovered during randomized testing

2021-06-03 Thread Greg Miller (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356760#comment-17356760
 ] 

Greg Miller commented on LUCENE-9988:
-

I went ahead and pushed the fix for this along with a new test case that 
reliably reproduces the bug. It was a small fix and I wanted to make sure the 
bug didn't continue to cause test failures. If anyone has feedback on the fix, 
I'm of course still open to that!

> Address DrillSideways bug discovered during randomized testing
> --
>
> Key: LUCENE-9988
> URL: https://issues.apache.org/jira/browse/LUCENE-9988
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There appears to be a correctness bug in DrillSideways likely introduced in 
> LUCENE-9944. Need to track it down and fix.
>  
> Build: [https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/62/]
> 1 tests failed.
> FAILED:  org.apache.lucene.facet.TestDrillSideways.testRandom
> Error Message:
> java.lang.AssertionError
> Stack Trace:
> java.lang.AssertionError
>         at 
> __randomizedtesting.SeedInfo.seed([ADCF6881460FEE2F:DF834D8EF76F585C]:0)
>         at org.junit.Assert.fail(Assert.java:87)
>         at org.junit.Assert.assertTrue(Assert.java:42)
>         at org.junit.Assert.assertTrue(Assert.java:53)
>         at 
> org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1580)
>         at 
> org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1159)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapt

[jira] [Resolved] (LUCENE-9988) Address DrillSideways bug discovered during randomized testing

2021-06-03 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller resolved LUCENE-9988.
-
Fix Version/s: main (9.0)
   Resolution: Fixed

> Address DrillSideways bug discovered during randomized testing
> --
>
> Key: LUCENE-9988
> URL: https://issues.apache.org/jira/browse/LUCENE-9988
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Affects Versions: main (9.0)
>Reporter: Greg Miller
>Assignee: Greg Miller
>Priority: Major
> Fix For: main (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There appears to be a correctness bug in DrillSideways likely introduced in 
> LUCENE-9944. Need to track it down and fix.
>  
> Build: [https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/62/]
> 1 tests failed.
> FAILED:  org.apache.lucene.facet.TestDrillSideways.testRandom
> Error Message:
> java.lang.AssertionError
> Stack Trace:
> java.lang.AssertionError
>         at 
> __randomizedtesting.SeedInfo.seed([ADCF6881460FEE2F:DF834D8EF76F585C]:0)
>         at org.junit.Assert.fail(Assert.java:87)
>         at org.junit.Assert.assertTrue(Assert.java:42)
>         at org.junit.Assert.assertTrue(Assert.java:53)
>         at 
> org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1580)
>         at 
> org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1159)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
>         at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>         at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>         at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>         at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
>         at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
>         at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>         at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>         at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(Abstrac

[GitHub] [lucene] mikemccand commented on pull request #157: LUCENE-9963 Fix issue with FlattenGraphFilter throwing exceptions from holes

2021-06-03 Thread GitBox


mikemccand commented on pull request #157:
URL: https://github.com/apache/lucene/pull/157#issuecomment-854277244


   I love all the added test cases!
   
   Maybe in the new random test we could use `TokenStreamToAutomaton` and then 
confirm there are no dead states in the result, using existing APIs?
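
A rough, self-contained sketch of what a "no dead states" check means, in case it helps the discussion. `ToyAutomaton` and its methods are hypothetical and do not use Lucene's automaton classes; in a real test one would presumably build the automaton from the token stream and rely on the existing helpers (I believe `Operations.hasDeadStates` is the relevant one, but that is an assumption worth checking). A state counts as dead here if it is unreachable from the initial state or cannot reach any accept state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy automaton for illustration; not Lucene's Automaton class. */
final class ToyAutomaton {
  private final List<List<Integer>> out = new ArrayList<>(); // forward edges
  private final List<List<Integer>> in = new ArrayList<>(); // reverse edges
  private final BitSet accept = new BitSet();

  /** Adds a state and returns its id; state 0 is treated as the initial state. */
  int addState(boolean isAccept) {
    out.add(new ArrayList<>());
    in.add(new ArrayList<>());
    int state = out.size() - 1;
    if (isAccept) {
      accept.set(state);
    }
    return state;
  }

  void addTransition(int from, int to) {
    out.get(from).add(to);
    in.get(to).add(from);
  }

  /** Breadth-first search: all states reachable from the seeds along the given edges. */
  private static BitSet reach(List<List<Integer>> edges, BitSet seeds) {
    BitSet seen = (BitSet) seeds.clone();
    Deque<Integer> queue = new ArrayDeque<>();
    for (int s = seeds.nextSetBit(0); s >= 0; s = seeds.nextSetBit(s + 1)) {
      queue.add(s);
    }
    while (!queue.isEmpty()) {
      int current = queue.poll();
      for (int next : edges.get(current)) {
        if (!seen.get(next)) {
          seen.set(next);
          queue.add(next);
        }
      }
    }
    return seen;
  }

  /** True if some state is unreachable from state 0 or cannot reach an accept state. */
  boolean hasDeadStates() {
    if (out.isEmpty()) {
      return false;
    }
    BitSet initial = new BitSet();
    initial.set(0);
    BitSet live = reach(out, initial); // reachable from the initial state
    live.and(reach(in, accept)); // ...and able to reach an accept state
    return live.cardinality() != out.size();
  }
}
```

A hole in the flattened graph would show up in this model as a state with an incoming transition but no path to an accept state, so `hasDeadStates()` would return true for it.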





[GitHub] [lucene] rmuir commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec

2021-06-03 Thread GitBox


rmuir commented on a change in pull request #166:
URL: https://github.com/apache/lucene/pull/166#discussion_r645249079



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorFormat.java
##
@@ -76,14 +79,55 @@
   static final int VERSION_START = 0;
   static final int VERSION_CURRENT = VERSION_START;
 
-  /** Sole constructor */
+  static final String BEAM_WIDTH_KEY =
+  Lucene90HnswVectorFormat.class.getSimpleName() + ".beam_width";
+  static final String MAX_CONN_KEY = Lucene90HnswVectorFormat.class.getSimpleName() + ".max_conn";
+
+  /**
+   * Controls how many of the nearest neighbor candidates are connected to the new node. See {@link
+   * HnswGraph} for details.
+   */
+  private final int maxConn;
+
+  /**
+   * The number of candidate neighbors to track while searching the graph for each newly inserted
+   * node. See {@link HnswGraph} for details.
+   */
+  private final int beamWidth;
+
   public Lucene90HnswVectorFormat() {
 super("Lucene90HnswVectorFormat");
+this.maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+this.beamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH;
+  }
+
+  public Lucene90HnswVectorFormat(int maxConn, int beamWidth) {
+super("Lucene90HnswVectorFormat");
+this.maxConn = maxConn;
+this.beamWidth = beamWidth;
   }
 
   @Override
   public VectorWriter fieldsWriter(SegmentWriteState state) throws IOException {
-return new Lucene90HnswVectorWriter(state);
+SegmentInfo segmentInfo = state.segmentInfo;
+putFormatAttribute(segmentInfo, MAX_CONN_KEY, String.valueOf(maxConn));
+putFormatAttribute(segmentInfo, BEAM_WIDTH_KEY, String.valueOf(beamWidth));
+return new Lucene90HnswVectorWriter(state, maxConn, beamWidth);
+  }
+
+  private void putFormatAttribute(SegmentInfo si, String key, String value) {
+String previousValue = si.putAttribute(key, value);
+if (previousValue != null && previousValue.equals(value) == false) {

Review comment:
   I don't think these should be written. If someone is using the per-field 
impl, and has different fields with different values, then they'd trample each 
other.
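
To make the concern concrete, here is a minimal sketch, assuming the attribute map is shared by every field in the segment. `ToySegmentAttributes` is a hypothetical stand-in and not Lucene's `SegmentInfo`; the only behaviour borrowed from the diff above is that `putAttribute` returns the previous value for the key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a segment-wide attribute map; not Lucene's SegmentInfo. */
final class ToySegmentAttributes {
  private final Map<String, String> attributes = new HashMap<>();

  /** Stores the value and returns the previous one, or null if the key was unset. */
  String putAttribute(String key, String value) {
    return attributes.put(key, value);
  }

  /**
   * Writes the value and reports whether it replaced a different existing value,
   * which is the situation the check in the diff is guarding against.
   */
  static boolean tramples(ToySegmentAttributes si, String key, String value) {
    String previous = si.putAttribute(key, value);
    return previous != null && !previous.equals(value);
  }
}
```

With a per-field setup, a format instance for one field might write `max_conn` as "16" and another instance for a second field might write "32" under the same key; the second write replaces the first, so the segment can no longer describe both fields.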







[jira] [Comment Edited] (LUCENE-9976) WANDScorer assertion error in ensureConsistent

2021-06-03 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357046#comment-17357046
 ] 

Zach Chen edited comment on LUCENE-9976 at 6/4/21, 4:13 AM:


{quote}I'm using mac, and trying with main branch head commit a6cf46dad
{quote}
Okay, I should have pulled the latest main branch before running the tests. 
After doing that, I can also reproduce this failure consistently. 
Sorry for the confusion earlier!

The failure happened at this line: 
{code:java}
assert minCompetitiveScore == 0 || tailMaxScore < minCompetitiveScore{code}
I reset to earlier commits a few times to see where it started to fail, and I 
believe it started with the performance regression fix, commit 820e63d2ddf235c from 
https://issues.apache.org/jira/browse/LUCENE-9958 . The change was:
{code:java}
diff --git a/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java b/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
index f33af6b8ee8..f5bab49fb71 100644
--- a/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
+++ b/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
@@ -548,7 +548,7 @@ final class WANDScorer extends Scorer {
 
   /** Insert an entry in 'tail' and evict the least-costly scorer if full. */
   private DisiWrapper insertTailWithOverFlow(DisiWrapper s) {
-if (tailMaxScore + s.maxScore < minCompetitiveScore) {
+if (tailMaxScore + s.maxScore < minCompetitiveScore || tailSize + 1 < minShouldMatch) {
   // we have free room for this new entry
   addTail(s);
   tailMaxScore += s.maxScore;
{code}
I think that with this change, _tailMaxScore >= minCompetitiveScore_ can now 
legitimately happen, since the block may be entered via the condition _tailSize + 1 < 
minShouldMatch_. So the assertion should be relaxed to the following 
(tested locally; the test passes):
{code:java}
assert minCompetitiveScore == 0 || tailMaxScore < minCompetitiveScore || tailSize < minShouldMatch{code}
I can raise a quick PR if that looks good?  [~jpountz]
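
For anyone following along, here is a minimal sketch of the reasoning above, assuming only the two conditions shown in the diff. `ToyTail` is a hypothetical model, not the real WANDScorer: it keeps just the running `tailMaxScore` and `tailSize`, treats scores as plain longs, and leaves out eviction entirely.

```java
/** Hypothetical model of the 'tail' bookkeeping for illustration; not the real WANDScorer. */
final class ToyTail {
  private final long minCompetitiveScore;
  private final int minShouldMatch;
  private long tailMaxScore = 0;
  private int tailSize = 0;

  ToyTail(long minCompetitiveScore, int minShouldMatch) {
    this.minCompetitiveScore = minCompetitiveScore;
    this.minShouldMatch = minShouldMatch;
  }

  /**
   * Mirrors the condition from the diff. Returns true if the entry was added to the
   * tail; the real code would evict a scorer in the other case, which is omitted here.
   */
  boolean tryAdd(long maxScore) {
    if (tailMaxScore + maxScore < minCompetitiveScore || tailSize + 1 < minShouldMatch) {
      tailMaxScore += maxScore;
      tailSize++;
      return true;
    }
    return false;
  }

  /** The assertion as it was before the proposed change. */
  boolean oldAssertionHolds() {
    return minCompetitiveScore == 0 || tailMaxScore < minCompetitiveScore;
  }

  /** The relaxed assertion proposed in this comment. */
  boolean relaxedAssertionHolds() {
    return oldAssertionHolds() || tailSize < minShouldMatch;
  }
}
```

With `minCompetitiveScore = 10` and `minShouldMatch = 3`, adding a single entry with `maxScore = 20` is accepted through the `tailSize + 1 < minShouldMatch` branch, after which the old assertion fails (20 is not below 10) while the relaxed one holds (1 is below 3).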


> WANDScorer assertion error in ensureConsistent
> --
>
> Key: LUCENE-9976
> URL: https://issues.apache.org/jira/browse/LUCENE-9976
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Priority: Major
>
> Build fails and is reproducible:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/283/console
> {code}
> ./gradlew test --tests TestExpressionSorts.testQueries 
> -Dtests.seed=FF571CE915A0955 -Dtests.multiplier=2 -Dtests.nightly=true 
> -Dtests.slow=true -Dtests.asserts=true -p lucene/expressions/
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (LUCENE-9976) WANDScorer assertion error in ensureConsistent

2021-06-03 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357046#comment-17357046
 ] 

Zach Chen commented on LUCENE-9976:
---

{quote}I'm using mac, and trying with main branch head commit a6cf46dad
{quote}
Okay, I should have pulled the latest main branch before running the tests. 
After doing that, I can also reproduce this failure consistently. 
Sorry for the confusion earlier!

The failure happened at this line: 
{code:java}
assert minCompetitiveScore == 0 || tailMaxScore < minCompetitiveScore{code}
I reset to earlier commits a few times to see where it started to fail, and I 
believe it started with the performance regression fix, commit 820e63d2ddf235c from 
https://issues.apache.org/jira/browse/LUCENE-9958 . The change was:
{code:java}
diff --git a/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java b/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
index f33af6b8ee8..f5bab49fb71 100644
--- a/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
+++ b/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
@@ -548,7 +548,7 @@ final class WANDScorer extends Scorer {
 
   /** Insert an entry in 'tail' and evict the least-costly scorer if full. */
   private DisiWrapper insertTailWithOverFlow(DisiWrapper s) {
-if (tailMaxScore + s.maxScore < minCompetitiveScore) {
+if (tailMaxScore + s.maxScore < minCompetitiveScore || tailSize + 1 < minShouldMatch) {
   // we have free room for this new entry
   addTail(s);
   tailMaxScore += s.maxScore;
{code}
I think that with this change, _tailMaxScore >= minCompetitiveScore_ can now 
legitimately happen, since the block may be entered via the condition _tailSize + 1 < 
minShouldMatch_. So the assertion should be relaxed to the following 
(tested locally; the test passes):

 
{code:java}
assert minCompetitiveScore == 0 || tailMaxScore < minCompetitiveScore || tailSize < minShouldMatch{code}
 

I can raise a quick PR if that looks good?  [~jpountz]

> WANDScorer assertion error in ensureConsistent
> --
>
> Key: LUCENE-9976
> URL: https://issues.apache.org/jira/browse/LUCENE-9976
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Priority: Major
>
> Build fails and is reproducible:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/283/console
> {code}
> ./gradlew test --tests TestExpressionSorts.testQueries 
> -Dtests.seed=FF571CE915A0955 -Dtests.multiplier=2 -Dtests.nightly=true 
> -Dtests.slow=true -Dtests.asserts=true -p lucene/expressions/
> {code}


