[GitHub] [lucene] jpountz commented on a change in pull request #413: LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0
jpountz commented on a change in pull request #413: URL: https://github.com/apache/lucene/pull/413#discussion_r736218409

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java

## @@ -60,7 +60,8 @@ public KnnVectorQuery(String field, float[] target, int k) {
   public Query rewrite(IndexReader reader) throws IOException {
     TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
     for (LeafReaderContext ctx : reader.leaves()) {
-      perLeafResults[ctx.ord] = searchLeaf(ctx, Math.min(k, reader.numDocs()));
+      int numDocs = ctx.reader().numDocs();
+      perLeafResults[ctx.ord] = numDocs > 0 ? searchLeaf(ctx, Math.min(k, numDocs)) : NO_RESULTS;

Review comment: I was thinking of passing `k` here, and moving the logic to avoid oversizing the heap to Lucene90HnswVectorsReader by doing `k = min(k, size())` (where `size()` is the number of docs that have a vector).
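A minimal sketch of the alternative described in the review comment above, assuming the clamping moves into the vectors reader so callers can pass `k` unchanged. The `size(field)` and `doSearch(...)` names and the surrounding signature are assumptions for illustration, not the actual Lucene90HnswVectorsReader code:

```java
// Hypothetical sketch: the query passes k through unchanged, and the reader
// clamps it against the number of documents that actually have a vector.
public TopDocs search(String field, float[] target, int k) throws IOException {
  int numVectors = size(field); // assumed: count of docs with a vector in this field
  k = Math.min(k, numVectors);  // avoid oversizing the result heap
  if (k <= 0) {
    return new TopDocs(new TotalHits(0, TotalHits.Relation.EQUAL_TO), new ScoreDoc[0]);
  }
  return doSearch(field, target, k); // assumed: the actual HNSW graph search
}
```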
[GitHub] [lucene] dweiss merged pull request #405: LUCENE-10198: Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
dweiss merged pull request #405: URL: https://github.com/apache/lucene/pull/405
[jira] [Commented] (LUCENE-10198) Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
[ https://issues.apache.org/jira/browse/LUCENE-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434150#comment-17434150 ]

ASF subversion and git services commented on LUCENE-10198:

Commit 780846a732b9c3f9c8b0abeae7d1d2c19df524e4 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=780846a ]

LUCENE-10198: Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies) (#405)

Co-authored-by: balmukundblr

> Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack
> and system proxies)
>
> Key: LUCENE-10198
> URL: https://issues.apache.org/jira/browse/LUCENE-10198
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Resolved] (LUCENE-10198) Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
[ https://issues.apache.org/jira/browse/LUCENE-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10198.
Fix Version/s: main (9.0)
Resolution: Fixed

Merged this in.
[GitHub] [lucene] jprinet opened a new pull request #414: LUCENE-10195: Improve Gradle build speed
jprinet opened a new pull request #414: URL: https://github.com/apache/lucene/pull/414

# Description
Improve Gradle build speed, mainly by focusing on up-to-date checks and task caching.

# Solution
Use Gradle Enterprise to identify room for improvement.

# Tests
Nightly and regression tests.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434249#comment-17434249 ]

Jerome Prinet commented on LUCENE-10195:

I mainly focused on up-to-date checks and task caching. The most spectacular improvement happens when running a clean build with a populated cache:

./gradlew clean build -Ptests.seed=deadbeef -Ptests.nightly=false -Ptests.neverUpToDate=false --scan

*Before changes* [https://gradle.com/s/kxkbukaklyiz4]
1206 tasks executed in 40 projects *in 5m 39s, with 72 avoided tasks saving 16.359s*

*After changes* [https://gradle.com/s/mfiwiheg4wxjq]
1206 tasks executed in 40 projects *in 26s, with 394 avoided tasks saving 30m 2.417s*

*Here are the changes in detail:*
* Declare outputs in your tasks to benefit from up-to-date checks (_CollectJarInfos_) - see the sketch at the end of this message
* The _validateSourcePatterns_ task should not take _.idea_ or _.gradle_ files into account
* Annotate tasks as cacheable to benefit from the cache (_EcjLint, ValidateSourcePatterns, RatTask, RenderJavadoc, checkBrokenLinks_)
* Use valid outputs rather than dummy ones (_EcjLint_)
* Do not use string representations for task inputs that are collections of files or directories (i.e. _resources, scriptResources_), as you can't benefit from caching when relocating the workspace to a different folder
* Minimize direct usage of system properties which are location- or OS-dependent, as they are part of the cache entry key
* Do not put location- or OS-related information in the _MANIFEST.MF (X-Build-OS)_

*Here is some advice for future improvements:*
* Fixing _tests.seed_ obviously helps to benefit from up-to-date checks; I get the point about randomization, but this is a trade-off against the expensive cost of resources
* Use the standard _Gradle wrapper_
* Set up a local _gradle.properties_ in the Gradle home folder rather than having an automatic generation from _gradle/generation/local-settings.gradle_
* Add _gradle.properties_ and the _Gradle wrapper_ to VCS
* Do not override the Gradle daemon TTL to 15 min; this is way too short
* Do not create / commit generated test files to the src directory (_frenchArticles.txt, Top50KWiki.utf8, CambridgeMA.utf8, Latin-dont-break-on-hyphens.rbbi_) => _:lucene:analysis:common:compileTestJava_ is not cacheable due to overlapping outputs

> Gradle build speed improvement
>
> Key: LUCENE-10195
> URL: https://issues.apache.org/jira/browse/LUCENE-10195
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Jerome Prinet
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Increase Gradle build speed with help of Gradle built-in features, mostly
> cache and up-to-date checks
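A minimal sketch of the pattern behind the first bullets above (declared inputs/outputs plus a cacheability annotation), written as a hypothetical Java task class under buildSrc. The task name and properties are illustrative, not Lucene's actual build tasks:

{code:java}
import org.gradle.api.DefaultTask;
import org.gradle.api.file.ConfigurableFileCollection;
import org.gradle.api.file.RegularFileProperty;
import org.gradle.api.tasks.CacheableTask;
import org.gradle.api.tasks.InputFiles;
import org.gradle.api.tasks.OutputFile;
import org.gradle.api.tasks.PathSensitive;
import org.gradle.api.tasks.PathSensitivity;
import org.gradle.api.tasks.TaskAction;

// Hypothetical task: declared inputs/outputs enable up-to-date checks, and
// @CacheableTask lets Gradle restore the output from the build cache.
@CacheableTask
public abstract class RenderReportTask extends DefaultTask {

  // RELATIVE path sensitivity keeps cache keys stable when the workspace
  // is relocated to a different folder.
  @InputFiles
  @PathSensitive(PathSensitivity.RELATIVE)
  public abstract ConfigurableFileCollection getSources();

  // A real output (not a dummy marker file), so it can be cached and restored.
  @OutputFile
  public abstract RegularFileProperty getReport();

  @TaskAction
  public void render() throws java.io.IOException {
    StringBuilder sb = new StringBuilder();
    for (java.io.File f : getSources().getFiles()) {
      sb.append(f.getName()).append('\n');
    }
    java.nio.file.Files.writeString(
        getReport().get().getAsFile().toPath(), sb.toString());
  }
}
{code}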
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434260#comment-17434260 ]

Jerome Prinet commented on LUCENE-10195:

Using the _com.palantir.consistent-versions_ Gradle plugin triggers a deprecation warning and makes it impossible to enable [configuration on demand|https://docs.gradle.org/current/userguide/multi_project_configuration_and_execution.html#sec:configuration_on_demand], which would help improve performance as well.

I filed an [issue|https://github.com/palantir/gradle-consistent-versions/issues/781] to get this resolved.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434267#comment-17434267 ]

Dawid Weiss commented on LUCENE-10195:

I did take a look at the patch. Thank you - some of the things look like nice improvements.

This command is not what should be the benchmark, though:
{code}
./gradlew clean build -Ptests.seed=deadbeef -Ptests.nightly=false -Ptests.neverUpToDate=false --scan
{code}
Instead, the second invocation of this (a full incremental build) should be measured (tests are a separate story):
{code}
./gradlew check -x test
{code}

I understand you're advocating for the gradle cache and I think it's great, but I don't think it should be the default setting - sorry, this is my honest opinion. Unless you have a bunch of corporate CI servers, it'll only pollute your home directory with a gazillion megabytes of data that will simply not be reused much. And we do want folks to run stuff in their own environments because this is a good regression test (different VMs, operating systems). If somebody wants a local cache, they can enable it, but it shouldn't be forced down their throats.

As for the recommendations, here are my thoughts.

> Fixing tests.seed obviously helps to benefit from up-to-date checks; I get the point about randomization, but this is a trade-off against the expensive cost of resources

Yes, it is a tradeoff we're willing to take. Again - if somebody wants a locally fixed seed, they can do it. You'd be surprised how frequently those tests fail on boundary conditions only in certain environment combinations.

> Use the standard Gradle wrapper

There is a reason why a non-standard wrapper is used - please look up the relevant issue in Jira (a source release shouldn't ship a binary artifact).

> Set up a local gradle.properties in the Gradle home folder rather than having an automatic generation from gradle/generation/local-settings.gradle

There is a reason why gradle.properties is generated (it adjusts the defaults to the local machine). I wish gradle had a mechanism for tuning, say, max-workers dynamically, but I don't think it does.

> Do not override the Gradle daemon TTL to 15 min; this is way too short

If you run the build regularly, switching VMs, then background gradle daemons eat up all your memory. So no, I don't think it's too short.

> Do not create / commit generated test files to the src directory (frenchArticles.txt, Top50KWiki.utf8, CambridgeMA.utf8, Latin-dont-break-on-hyphens.rbbi)

You don't understand why they're there. One of these generated files requires 16GB of memory and over 15 minutes on a decent server to generate. Even if you use the gradle cache, the first run on your old-ish laptop will kill your build with an OOM. Some of these resources require specific environments (like a Linux toolchain). I don't think there is a mechanism in gradle which would allow regenerating these resources only if their source input triggers actually change.

I'll go through the changes you suggested and will cherry-pick some of the improvements, thank you.
[jira] [Commented] (LUCENE-10061) CombinedFieldsQuery needs dynamic pruning support
[ https://issues.apache.org/jira/browse/LUCENE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434299#comment-17434299 ]

Adrien Grand commented on LUCENE-10061:

bq. in order to merge impacts from multiple fields for CombinedFieldsQuery, we may need to compute all the possible summation combinations of competitive {freq, norm} across all fields

I agree that there is a combinatorial explosion issue, and I fear that it's even worse than the example you gave, since we also need to consider the case when some fields do not match the query. In the examples I've seen, there's often a field that has a much higher weight than the other fields (e.g. a title field that has a 10x greater weight than a body field), so I am wondering if we could leverage this property: start from the impacts of the field that has the highest weight and see how we can cheaply incorporate impacts from the other fields, even if this would overestimate the actual maximum score for the query (see the sketch at the end of this message).

> CombinedFieldsQuery needs dynamic pruning support
>
> Key: LUCENE-10061
> URL: https://issues.apache.org/jira/browse/LUCENE-10061
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> CombinedFieldQuery's Scorer doesn't implement advanceShallow/getMaxScore,
> forcing Lucene to collect all matches in order to figure out the top-k hits.
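A toy illustration of the overestimation idea above, not Lucene's actual impacts API: bound the combined score by assuming every field matches at its maximum competitive frequency while using the smallest norm seen, relying on BM25 being non-decreasing in freq and non-increasing in norm. {{FieldImpact}} and its accessors are hypothetical:

{code:java}
import java.util.List;
import org.apache.lucene.search.similarities.Similarity;

// Hypothetical upper bound on the max score of a CombinedFieldsQuery block.
static float maxScoreUpperBound(Similarity.SimScorer scorer, List<FieldImpact> fields) {
  double maxCombinedFreq = 0;
  long minNorm = Long.MAX_VALUE;
  for (FieldImpact f : fields) {
    // Assume the field matches with its largest competitive frequency ...
    maxCombinedFreq += f.weight() * f.maxFreq();
    // ... and take the most score-friendly norm across fields.
    minNorm = Math.min(minNorm, f.minNorm());
  }
  // Never underestimates the true maximum (though it may overestimate
  // substantially), which is the safe direction for pruning.
  return scorer.score((float) maxCombinedFreq, minNorm);
}
{code}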
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434314#comment-17434314 ]

Christine Poerschke commented on LUCENE-10157:

Hi [~cvandenberg], thanks for opening this issue for additional Indri functionality!

Would you be open to contributing via a pull request to the [https://github.com/apache/lucene] {{main}} branch instead of a patch attachment? E.g. the CI would run automatically on it, and subjectively perhaps some folks would find it more convenient to review.

One suggestion from a quick look at the patch: add some test coverage for the new queries.

> Add Additional Indri Search Engine Functionality to Lucene
>
> Key: LUCENE-10157
> URL: https://issues.apache.org/jira/browse/LUCENE-10157
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/queryparser, core/search
> Reporter: Cameron VandenBerg
> Priority: Major
> Attachments: LUCENE-10157.patch
>
> In Jira issue LUCENE-9537, basic functionality from the Indri search engine
> ([http://lemurproject.org/indri.php]) was added to Lucene. With that
> functionality in place, we would love to build upon it to add additional
> Indri queries and an Indri query parser to Lucene, to broaden the Indri
> functionality within Lucene. In this patch, I have added the Indri NOT, the
> Indri OR, and the Indri WeightedSum functionality. I have also included an
> IndriQueryParser for accessing this functionality. More information on these
> query operators can be seen here:
> [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here:
> [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/]
>
> I would be very excited to work with the Lucene community again to add
> this functionality. I am open to suggestions, and I am happy to make any
> changes that might be suggested. Thank you!
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434344#comment-17434344 ]

Jerome Prinet commented on LUCENE-10195:

First, thanks Dawid for sharing some context; it is definitely helpful.

Regarding the cache: you can rely on the local cache only (in my case it went up to 28 MB), which does not prevent the build from being run/tested on different systems. You can even wipe it periodically to keep it minimal.

About Gradle configuration, I'd rather have the local settings in _~/.gradle/gradle.properties_, which takes precedence over the project's _gradle.properties_ (see [here|https://docs.gradle.org/current/userguide/build_environment.html#sec:gradle_configuration_properties]).

My bad for the files, I didn't get the point. I was probably surprised by having them colocated with some Java source files. Anyway, you're right, any IO-bound operation is most likely not to give you real benefit when cached.

Thanks for reviewing!
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434347#comment-17434347 ]

Uwe Schindler commented on LUCENE-10195:

{quote}I understand you're advocating for the gradle cache and I think it's great, but I don't think it should be the default setting - sorry, this is my honest opinion. Unless you have a bunch of corporate CI servers, it'll only pollute your home directory with a gazillion megabytes of data that will simply not be reused much. And we do want folks to run stuff in their own environments because this is a good regression test (different VMs, operating systems). If somebody wants a local cache, they can enable it, but it shouldn't be forced down their throats.{quote}

I fully agree with this. PLEASE DO NOT ENABLE THE BUILD CACHE BY DEFAULT. As a developer I want and expect the build to take longer if I run "gradlew clean". I want "gradlew clean" to forget the build and then compile everything again and, especially, I want the build to rerun all checks and tests.

As provider of the Lucene build servers: every run should do all build steps again, because we want to test out JVM problems, and this only works if the gradle build forgets everything.

I just ping [~rcmuir], because he also has a strong opinion on that.

So in short: some of the changes in the PR look fine, but everything that caches stuff on my local disk and serializes test results and checks should be avoided, sorry!

Thanks for including my opinion.
Uwe

P.S.: IMHO the Gradle build cache is a feature for streamlined projects with zillions of build servers to spare CPU resources, maybe in organizational environments where the business logic is important. But for Lucene, if we have a zillion build servers, we want all of them to redundantly run tests to find bugs in the JVMs. This is why we run the tests with different settings (compressed oops on/off, different garbage collectors). That's what Policeman's Jenkins server is there for: find bugs in garbage collectors and different JVM versions by running the build suite and tests 24/7. Also [~mikemccand] does this every night to monitor performance. Anything caching results would be a disaster!

If the build cache helps local developers, OK - but more important is to configure inputs/outputs correctly. I have a local machine with one operating system and don't need to cache results for several days. It's only me working on it.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434348#comment-17434348 ]

Uwe Schindler commented on LUCENE-10195:

bq. If you run the build regularly, switching VMs, then background gradle daemons eat up all your memory. So no, I don't think it's too short.

+1. On our build servers we disable the Gradle Daemon completely. I switch it on locally when I do quick incremental builds (trial-and-error). But for running the whole build for releases or debugging Gradle-shitbugs, it is also off locally.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434357#comment-17434357 ]

Dawid Weiss commented on LUCENE-10195:

I will review the patch one by one (thanks for splitting the commits, it helps). Please give me some time as my schedule is pretty intense.

It'd be awesome if you guys at Gradle could take a closer look at some of the issues I outlined in my e-mail on the dev list [1] - I don't know if you saw it. These are *hard* and require deeper knowledge of gradle internals (not to mention the will to perhaps change the implementation here or there).

[1] https://markmail.org/message/vjpfc2jwocroz7nd
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434362#comment-17434362 ]

Dawid Weiss commented on LUCENE-10195:

bq. I'd rather have the local settings in ~/.gradle/gradle.properties which takes precedence over the project's gradle.properties

The thing is: these are global. And many of these settings are project-specific. What works in one project wouldn't work in another. I found it irritating that something so easily solved in ant (include defaults, then local user properties from the project) is so difficult in gradle. I know I'm in no position to suggest anything, but I would love to see a way of bootstrapping with more than one project-local gradle.properties... or to have a way to compute some of the properties dynamically (so that machine settings can be fine-tuned too).
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434365#comment-17434365 ]

Robert Muir commented on LUCENE-10195:

I agree with Dawid and Uwe. I will just add one thing: I do see one potential use-case for enabling the cache: regenerating those enormous jflex DFAs (from 'regenerate'). This seems contained enough that we could possibly make it work efficiently and have all the inputs and outputs correct?

This really is a case similar to javac: we are using a third-party tool (jflex) to translate an input grammar into .java output. The end result is actually quite small (e.g. a 2MB result), but it requires gigabytes of memory and many minutes. Dawid has stuff in the build to "control" this already, so that the build fails if someone tries to edit a generated file directly.

But even so, I am wary of the current build cache. It doesn't allow me to easily bound the size: https://github.com/gradle/gradle/issues/3346 Will the cache behave correctly when it runs out of disk space? I would be happy to just configure a 10MB fixed loopback mount for this cache as a workaround, so that I generate the jflex DFA less often :)
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434372#comment-17434372 ]

Cameron VandenBerg commented on LUCENE-10157:

Hi [~cpoerschke]! Thanks for your response! I would be happy to create a pull request, and I will make sure to add tests for the new queries.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434375#comment-17434375 ]

Dawid Weiss commented on LUCENE-10195:

bq. so that I generate the jflex DFA less often

The build cache would have to fetch this from an external server, so you'd need a network connection then. Besides - what causes it to be rerun? It should be skipped in the current build (unless you're really forcing it to run)?
[jira] [Created] (LUCENE-10204) Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / ToChildBlockJoinQuery)
Greg Miller created LUCENE-10204:

Summary: Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / ToChildBlockJoinQuery)
Key: LUCENE-10204
URL: https://issues.apache.org/jira/browse/LUCENE-10204
Project: Lucene - Core
Issue Type: Improvement
Components: modules/join
Reporter: Greg Miller

It would be nice to be able to iterate over the "sub-matches" in these join queries for the purpose of faceting (or possibly other use-cases?). For example, we have a use-case where our query matches on "child" docs, using a {{ToParentBlockJoinQuery}} to "emit" the associated parents, which are ultimately added to our match set. But we want to iterate over the matching "children" for the purpose of faceting.

To make it concrete, consider searching over a product catalog where "offers" and "items" are indexed side-by-side, with the offers being represented as "children" of the parent items. An offer contains information like "condition" (new vs. used), selling price, etc. for the parent item. If we want to facet on "condition", we want to observe all children that matched the query to know whether the parent item had a "new" or "used" offer (or both). This requires iterating over the child matches when faceting, which we cannot do today since the child hit information isn't retained anywhere.

We can support this by "caching" the child hits in a bitset, but there is some complexity when multiple join queries appear in a query structure (we would need to logically combine the various "cached" bitsets using the same boolean operations as in the original query structure).
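A minimal sketch of the bitset-caching idea from the description, assuming one bitset per segment keyed by the reader's core cache key. All names here are hypothetical; this is not the ToParentBlockJoinQuery implementation:

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.FixedBitSet;

// Hypothetical cache of matching child doc IDs, recorded while the join runs
// and replayed later, e.g. to facet on the "condition" field of offers.
final class ChildHitCache {
  private final Map<Object, FixedBitSet> childHits = new HashMap<>();

  void recordChildHit(LeafReaderContext ctx, int childDoc) {
    childHits
        .computeIfAbsent(
            ctx.reader().getCoreCacheHelper().getKey(),
            k -> new FixedBitSet(ctx.reader().maxDoc()))
        .set(childDoc);
  }

  void forEachChildHit(LeafReaderContext ctx, IntConsumer consumer) throws IOException {
    FixedBitSet bits = childHits.get(ctx.reader().getCoreCacheHelper().getKey());
    if (bits == null) {
      return;
    }
    DocIdSetIterator it = new BitSetIterator(bits, bits.cardinality());
    for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      consumer.accept(doc);
    }
  }
}
{code}

As the description notes, the hard part is not this bookkeeping but combining such bitsets with the right boolean logic when several join queries appear in one query tree.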
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434379#comment-17434379 ]

Jerome Prinet commented on LUCENE-10195:

{quote}I fully agree with this. PLEASE DO NOT ENABLE THE BUILD CACHE BY DEFAULT. As a developer I want and expect the build to take longer if I run "gradlew clean". I want "gradlew clean" to forget the build and then compile everything again and, especially, I want the build to rerun all checks and tests.{quote}

Regarding tests: they will be rerun by default thanks to the tests.neverUpToDate flag. You might also be interested in the [--rerun-tasks option|https://docs.gradle.org/current/userguide/command_line_interface.html#sec:rerun_tasks], which allows ignoring up-to-date checks.

{quote}P.S.: IMHO the Gradle build cache is a feature for streamlined projects with zillions of build servers to spare CPU resources, maybe in organizational environments where the business logic is important.{quote}

We can differentiate between the local cache and the remote cache; this PR was not enabling any remote cache.

{quote}If the build cache helps local developers, OK - but more important is to configure inputs/outputs correctly. I have a local machine with one operating system and don't need to cache results for several days.{quote}

Yep, this is the tricky part: configuring inputs and outputs accurately. But once you get there, it can be super interesting not to recompute something which was already computed. This comes with a price obviously (disk space), but again it can be super beneficial in some cases.

{quote}It'd be awesome if you guys at Gradle could take a closer look at some of the issues I outlined in my e-mail on the dev list [1]{quote}

I will definitely relay that internally.

{quote}Will the cache behave correctly when it runs out of disk space?{quote}

Probably not.

{quote}I would be happy to just configure a 10MB fixed loopback mount for this cache as a workaround, so that I generate the jflex DFA less often{quote}

There is no way to do that out of the box.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434382#comment-17434382 ]

Robert Muir commented on LUCENE-10195:

Sorry, maybe I wasn't clear. It is my understanding that, by default, it could cache the 2MB in the local cache and it would persist across "gradle clean". And yes, I know the large DFA task is skipped by default, but I imagined this would make it much less annoying, and we could potentially enable it? Sure, it doesn't fix the real minimization issue that causes it to take 20 minutes of CPU + 10GB of RAM, but it would reduce the pain.
[jira] [Created] (LUCENE-10205) Should Packed64 use a byte[] plus VarHandles?
Adrien Grand created LUCENE-10205:

Summary: Should Packed64 use a byte[] plus VarHandles?
Key: LUCENE-10205
URL: https://issues.apache.org/jira/browse/LUCENE-10205
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand

By being backed by a long[], Packed64 often has to merge bits coming from two different longs. If it was backed by a byte[], it could always read a single long, which would help remove conditionals? The main downside is that we'd need paging to support high value counts with high numbers of bits per value (when value_count * bits_per_value / 8 > ArrayUtil.MAX_ARRAY_LENGTH).
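A small sketch of the byte[]-plus-VarHandle read path the description suggests, restricted to bitsPerValue <= 56 so that one (possibly unaligned) 8-byte read always covers a value; the backing array is assumed to carry 7 bytes of tail padding so the last read stays in bounds. This is an illustration of the idea, not a drop-in Packed64 replacement:

{code:java}
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class BytePackedReader {
  // Reads a little-endian long from an arbitrary byte offset of a byte[].
  private static final VarHandle LONG_AT =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  private final byte[] bytes; // sized ceil(valueCount * bitsPerValue / 8) + 7 padding bytes
  private final int bitsPerValue; // assumed <= 56 so shift (<= 7) + bits fit in one long
  private final long mask;

  BytePackedReader(byte[] bytes, int bitsPerValue) {
    this.bytes = bytes;
    this.bitsPerValue = bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
  }

  long get(int index) {
    long bitIndex = (long) index * bitsPerValue;
    int byteIndex = (int) (bitIndex >>> 3);
    int shift = (int) (bitIndex & 7); // 0..7, so a single 8-byte read suffices
    long word = (long) LONG_AT.get(bytes, byteIndex);
    return (word >>> shift) & mask; // no merging of two longs, no conditionals
  }
}
{code}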
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434390#comment-17434390 ]

Robert Muir commented on LUCENE-10195:

{quote}
> I would be happy to just configure a 10MB fixed loopback mount for this cache as a workaround, so that I generate the jflex DFA less often

There is no way to do that out of the box
{quote}

No, I mean I would do it myself, and configure {{/home/rmuir/.gradle/caches/}} in {{/etc/fstab}} to be 10MB. So it would run gradle out of disk space if it tried to write any more than that. I really don't want size-unbounded caches storing stuff or trashing my SSD. I keep all my caches on a short leash; it is pretty easy since most apps behave and store stuff under {{~/.cache}}. So I already mount this as tmpfs with a size limit. And I pass flags such as {{chromium --disk-cache-size}} when apps have a way to explicitly bound the size.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434396#comment-17434396 ]

Jerome Prinet commented on LUCENE-10195:

Just to clarify: you can limit the TTL of cache entries, but not the whole cache size. See [https://docs.gradle.org/current/userguide/build_cache.html#sec:build_cache_configure]
[jira] [Created] (LUCENE-10206) Implement O(1) count on query cache
Nik Everett created LUCENE-10206:

Summary: Implement O(1) count on query cache
Key: LUCENE-10206
URL: https://issues.apache.org/jira/browse/LUCENE-10206
Project: Lucene - Core
Issue Type: Improvement
Reporter: Nik Everett

I'd like to implement the `Weight#count` method in `LRUQueryCache` so cached queries can quickly return their counts. We already have a count for all of the bit sets we use in the query cache; we just have to store it and "plug it in".

I got here because we frequently end up wanting counts, and I saw `RoaringDocIdSet`'s iterator show up as a hot spot. I don't think it's slow or anything, but when the collector is just `count++`, the iterator overhead is substantial. It seems like we could frequently avoid the whole thing by implementing `count` in the query cache.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434401#comment-17434401 ]

Dawid Weiss commented on LUCENE-10195:

{quote}You might be interested in the [--rerun-tasks option|https://docs.gradle.org/current/userguide/command_line_interface.html#sec:rerun_tasks], which allows ignoring up-to-date checks.{quote}

This reruns all tasks in the graph, which is more of a pain than a help (in the majority of cases :)). To me the single best feature of gradle lies in incremental tasks. When things are configured correctly, the incremental-check subsystem pretty much takes care of itself. I almost never have the need to run a full 'clean'.

{quote}Sorry, maybe I wasn't clear. It is my understanding that, by default, it could cache the 2MB in the local cache and it would persist across "gradle clean".{quote}

It would still try to run this task on the first run, when the input/output information isn't locally available (assuming no external cache is provided). This means it'd run at least once. To me this is a no-go. I really wish there was a mechanism for somehow persisting the state of up-to-date checks, but there isn't.
[GitHub] [lucene] nik9000 opened a new pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 opened a new pull request #415: URL: https://github.com/apache/lucene/pull/415

# Description
When we load a query into the query cache we always calculate the count of matching documents. This change uses that count to power the new `O(1)` `Weight#count` method.

# Solution
I've tried a bunch of approaches but settled on opening this PR with the simplest one - add a new class that keeps the BitSet and the count (see the sketch below). I'm not particularly tied to it other than that it is fairly simple. I am assuming it's right to try and implement `count` here rather than do something else. It feels like that method was made for situations like this, though.

# Tests
I've added some to LRUQueryCache's unit test. I haven't done any performance testing here, a little because "it's obvious that returning a number is faster than counting stuff". Nanoseconds vs microseconds. But I'd love to do more with this if folks want me to.

# Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
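For reference, a minimal sketch of the "simplest approach" described above - a holder pairing the cached doc ID set with its precomputed count. The names are illustrative rather than the exact classes in this PR:

```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;

// Illustrative holder: the cache stores this instead of a bare DocIdSet, so
// Weight#count can answer in O(1) from the count computed while caching.
final class CacheAndCount {
  private final DocIdSet set;
  private final int count; // number of matching documents

  CacheAndCount(DocIdSet set, int count) {
    this.set = set;
    this.count = count;
  }

  DocIdSetIterator iterator() throws IOException {
    return set.iterator();
  }

  int count() {
    return count;
  }
}
```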
[GitHub] [lucene] nik9000 commented on pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 commented on pull request #415: URL: https://github.com/apache/lucene/pull/415#issuecomment-952025788

> I had an out of date `main`. I'll update.
[GitHub] [lucene] nik9000 commented on pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 commented on pull request #415: URL: https://github.com/apache/lucene/pull/415#issuecomment-952040157

> I had an out of date `main`. I'll update.

Done.
[GitHub] [lucene] jpountz commented on a change in pull request #415: LUCENE-10206 Implement O(1) count on query cache
jpountz commented on a change in pull request #415: URL: https://github.com/apache/lucene/pull/415#discussion_r736682046 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1976,4 +1989,60 @@ public void testSkipCachingForTermQuery() throws IOException { reader.close(); dir.close(); } + + public void testCacheHasFastCount() throws IOException { +Query query = new PhraseQuery("words", new BytesRef("alice"), new BytesRef("ran")); + +Directory dir = newDirectory(); +RandomIndexWriter w = +new RandomIndexWriter( +random(), dir, newIndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE)); +Document doc1 = new Document(); +doc1.add(new TextField("words", "tom ran", Store.NO)); +Document doc2 = new Document(); +doc2.add(new TextField("words", "alice ran", Store.NO)); +doc2.add(new StringField("f", "a", Store.NO)); +Document doc3 = new Document(); +doc3.add(new TextField("words", "alice ran", Store.NO)); +doc3.add(new StringField("f", "b", Store.NO)); +w.addDocuments(Arrays.asList(doc1, doc2, doc3)); + +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // Now we *do* have a fast count + assertEquals(weight.count(context), 2); +} + +w.deleteDocuments(new TermQuery(new Term("f", new BytesRef("b"; +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // We still don't have a fast count because we have deleted documents + assertEquals(weight.count(context), -1); Review comment: nit: assertEquals expects the expected value first, so that we get better error messages in case of failure ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1088,7 +1089,19 @@ public void testRandom() throws IOException { cachedSearcher.setQueryCachingPolicy(ALWAYS_CACHE); } final Query q = buildRandomQuery(0); + /* + * Counts are the same. If the query has already been cached + * this'll use the O(1) Weight#count method. + */ assertEquals(uncachedSearcher.count(q), cachedSearcher.count(q)); + /* + * Just to make sure we can iterate every time also check that the + * same docs are returned in the same order. 
+ */ + int size = 1 + random().nextInt(1000); + assertArrayEquals( + Arrays.stream(uncachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray(), + Arrays.stream(cachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray()); Review comment: you might want to use `CheckHits#checkEqual` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
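A sketch of what the suggested change could look like, assuming the `(Query, ScoreDoc[], ScoreDoc[])` overload of `CheckHits#checkEqual` from the Lucene test framework; the helper name is made up:

{code:java}
import java.io.IOException;

import org.apache.lucene.search.CheckHits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;

// Sketch: let CheckHits#checkEqual do the comparison so a mismatch reports
// both hit lists rather than two opaque int arrays.
final class CachedVsUncachedHits {
  static void assertSameHits(IndexSearcher uncached, IndexSearcher cached, Query q, int size)
      throws IOException {
    ScoreDoc[] expected = uncached.search(q, size).scoreDocs;
    ScoreDoc[] actual = cached.search(q, size).scoreDocs;
    CheckHits.checkEqual(q, expected, actual);
  }
}
{code}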
[GitHub] [lucene] nik9000 commented on a change in pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 commented on a change in pull request #415: URL: https://github.com/apache/lucene/pull/415#discussion_r736693202 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1976,4 +1989,60 @@ public void testSkipCachingForTermQuery() throws IOException { reader.close(); dir.close(); } + + public void testCacheHasFastCount() throws IOException { +Query query = new PhraseQuery("words", new BytesRef("alice"), new BytesRef("ran")); + +Directory dir = newDirectory(); +RandomIndexWriter w = +new RandomIndexWriter( +random(), dir, newIndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE)); +Document doc1 = new Document(); +doc1.add(new TextField("words", "tom ran", Store.NO)); +Document doc2 = new Document(); +doc2.add(new TextField("words", "alice ran", Store.NO)); +doc2.add(new StringField("f", "a", Store.NO)); +Document doc3 = new Document(); +doc3.add(new TextField("words", "alice ran", Store.NO)); +doc3.add(new StringField("f", "b", Store.NO)); +w.addDocuments(Arrays.asList(doc1, doc2, doc3)); + +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // Now we *do* have a fast count + assertEquals(weight.count(context), 2); +} + +w.deleteDocuments(new TermQuery(new Term("f", new BytesRef("b"; +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // We still don't have a fast count because we have deleted documents + assertEquals(weight.count(context), -1); Review comment: Bah! I leave this same comment on other people's code. And yet I make the same mistake. Will fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434471#comment-17434471 ] Jerome Prinet commented on LUCENE-10195: {quote}The thing is: these are global. And many of these settings are project-specific. What works in one project wouldn't work in another. I found it irritating that something so easily solved in ant (include defaults, then local user properties from the project) is so difficult in gradle. I know I'm in no position to suggest anything but I would love to see a way of bootstrapping with more than one project-local gradle*properties... or to have a way to compute some of the properties dynamically (so that machine settings can be fine-tuned too). {quote} You might want to take a look at init scripts, which allow you to add some conditional logic: [https://docs.gradle.org/current/userguide/init_scripts.html] > Gradle build speed improvement > -- > > Key: LUCENE-10195 > URL: https://issues.apache.org/jira/browse/LUCENE-10195 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jerome Prinet >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Increase Gradle build speed with help of Gradle built-in features, mostly > cache and up-to-date checks > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434472#comment-17434472 ] Jerome Prinet commented on LUCENE-10195: support for project-controlled up-to-date mechanism (versioned artifact checksums) that is not bypassed on the first run. This is needed for generated resources - I didn't figure out how to do it other than with ugly hacks. Did you explore [upToDateWhen()|https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/TaskOutputs.html] method? > Gradle build speed improvement > -- > > Key: LUCENE-10195 > URL: https://issues.apache.org/jira/browse/LUCENE-10195 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jerome Prinet >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Increase Gradle build speed with help of Gradle built-in features, mostly > cache and up-to-date checks > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434472#comment-17434472 ] Jerome Prinet edited comment on LUCENE-10195 at 10/26/21, 5:06 PM: --- {quote}support for project-controlled up-to-date mechanism (versioned artifact checksums) that is not bypassed on the first run. This is needed for generated resources - I didn't figure out how to do it other than with ugly hacks. {quote} Did you explore [upToDateWhen()|https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/TaskOutputs.html] method? was (Author: jeromep): support for project-controlled up-to-date mechanism (versioned artifact checksums) that is not bypassed on the first run. This is needed for generated resources - I didn't figure out how to do it other than with ugly hacks. Did you explore [upToDateWhen()|https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/TaskOutputs.html] method? > Gradle build speed improvement > -- > > Key: LUCENE-10195 > URL: https://issues.apache.org/jira/browse/LUCENE-10195 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jerome Prinet >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Increase Gradle build speed with help of Gradle built-in features, mostly > cache and up-to-date checks > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
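For context, upToDateWhen() attaches an extra predicate to a task's outputs, on top of Gradle's own input/output tracking. A rough Java sketch as a hypothetical plugin (the task name and marker-file logic are assumptions):

{code:java}
import org.gradle.api.Plugin;
import org.gradle.api.Project;

// Sketch: hang a project-controlled up-to-date predicate onto a hypothetical
// resource-generating task. Gradle then also skips the task whenever the
// predicate returns true.
public class ChecksumUpToDatePlugin implements Plugin<Project> {
  @Override
  public void apply(Project project) {
    project.getTasks().named("generateResources").configure(task ->
        task.getOutputs().upToDateWhen(t ->
            // project-specific logic; a checksum marker file as a placeholder
            t.getProject().file("checksums/ok.marker").exists()));
  }
}
{code}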
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434473#comment-17434473 ] Jerome Prinet commented on LUCENE-10195: {quote}test runner test case ordering optimization (load balancing between worker JVMs); currently the long-tail test case can slow down builds significantly. {quote} [Test distribution|https://docs.gradle.com/enterprise/test-distribution-gradle-plugin/] might be the Gradle way to tackle that. > Gradle build speed improvement > -- > > Key: LUCENE-10195 > URL: https://issues.apache.org/jira/browse/LUCENE-10195 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jerome Prinet >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Increase Gradle build speed with help of Gradle built-in features, mostly > cache and up-to-date checks > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] nik9000 commented on a change in pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 commented on a change in pull request #415: URL: https://github.com/apache/lucene/pull/415#discussion_r736786185 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1976,4 +1989,60 @@ public void testSkipCachingForTermQuery() throws IOException { reader.close(); dir.close(); } + + public void testCacheHasFastCount() throws IOException { +Query query = new PhraseQuery("words", new BytesRef("alice"), new BytesRef("ran")); + +Directory dir = newDirectory(); +RandomIndexWriter w = +new RandomIndexWriter( +random(), dir, newIndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE)); +Document doc1 = new Document(); +doc1.add(new TextField("words", "tom ran", Store.NO)); +Document doc2 = new Document(); +doc2.add(new TextField("words", "alice ran", Store.NO)); +doc2.add(new StringField("f", "a", Store.NO)); +Document doc3 = new Document(); +doc3.add(new TextField("words", "alice ran", Store.NO)); +doc3.add(new StringField("f", "b", Store.NO)); +w.addDocuments(Arrays.asList(doc1, doc2, doc3)); + +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // Now we *do* have a fast count + assertEquals(weight.count(context), 2); +} + +w.deleteDocuments(new TermQuery(new Term("f", new BytesRef("b"; +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // We still don't have a fast count because we have deleted documents + assertEquals(weight.count(context), -1); Review comment: Done. ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1088,7 +1089,19 @@ public void testRandom() throws IOException { cachedSearcher.setQueryCachingPolicy(ALWAYS_CACHE); } final Query q = buildRandomQuery(0); + /* + * Counts are the same. If the query has already been cached + * this'll use the O(1) Weight#count method. + */ assertEquals(uncachedSearcher.count(q), cachedSearcher.count(q)); + /* + * Just to make sure we can iterate every time also check that the + * same docs are returned in the same order. + */ + int size = 1 + random().nextInt(1000); + assertArrayEquals( + Arrays.stream(uncachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray(), + Arrays.stream(cachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray()); Review comment: Done. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] chatman opened a new pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
chatman opened a new pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594 A new quickstart guide that can potentially replace (or live side by side with) the Solr tutorial. This is WIP at the moment, but I would appreciate early feedback and thoughts. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
epugh commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736805151 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ Review comment: A nitpick is that the collection is `techproducts`, and we have books. Maybe we should think (separately) about renaming `techproducts` to just `products`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
epugh commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736806111 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: i don't know if introducing terms used by other search engines is useful... though maybe we want to build up a gloassary that would list "equivalent" terms from other engines? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail:
[GitHub] [lucene-solr] chatman commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
chatman commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736806880 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ Review comment: Yes, good idea. I just took those docs off the Solr tutorial (which indexes books into techproducts). But, clearly, it is time for a better example. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] chatman commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
chatman commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736807226 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: A glossary sounds like a very good idea, for people coming from different systems. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] chatman commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
chatman commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736808710 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: > i don't know if introducing terms used by other search engines is useful I feel that those coming from ES / OpenSearch backgrounds might be able to relate better. My main motivation with this document is to cut down on paragraphs of text and have more copy-paste-able snippets, esp. using JSON/V2 apis, to make Solr more appealing to those who find ES easy to use (mainly due to their superior beginner documentation). ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. S
[GitHub] [lucene-solr] epugh commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
epugh commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736816462 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: That makes sense... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
epugh commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736816706 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: "solr for ES/OS refugees" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] magibney commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
magibney commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736836224 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: Makes sense to me as a point of reference. It might be more economical to say "(also called `refresh` in ElasticSearch/OpenSearch)" ... unless there are other search engines that refer to this concept as "refresh"? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apa
[jira] [Commented] (LUCENE-10163) Review top-level *.txt and *.md files
[ https://issues.apache.org/jira/browse/LUCENE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434533#comment-17434533 ] ASF subversion and git services commented on LUCENE-10163: -- Commit 08c03566648c0b024b8160869b3d694c3cebaabd in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=08c0356 ] LUCENE-10163: clean up and remove some old cruft in readme files. Move binary release only README.md to the distribution project so that it doesn't look weird in the source tree. (#406) > Review top-level *.txt and *.md files > - > > Key: LUCENE-10163 > URL: https://issues.apache.org/jira/browse/LUCENE-10163 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Some of them contain obsolete pointers and information > (SYSTEM_REQUIREMENTS.md, etc.). > Also, move the files that are distribution-specific (lucene/README.md) to the > distribution project. Otherwise they > give odd, incorrect information like: > {code} > To review the documentation, read the main documentation page, located at: > `docs/index.html` > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10163) Review top-level *.txt and *.md files
[ https://issues.apache.org/jira/browse/LUCENE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10163. -- Fix Version/s: main (9.0) Resolution: Fixed > Review top-level *.txt and *.md files > - > > Key: LUCENE-10163 > URL: https://issues.apache.org/jira/browse/LUCENE-10163 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Fix For: main (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > Some of them contain obsolete pointers and information > (SYSTEM_REQUIREMENTS.md, etc.). > Also, move the files that are distribution-specific (lucene/README.md) to the > distribution project. Otherwise they > give odd, incorrect information like: > {code} > To review the documentation, read the main documentation page, located at: > `docs/index.html` > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss merged pull request #406: LUCENE-10163: clean up and remove some old cruft in readme files.
dweiss merged pull request #406: URL: https://github.com/apache/lucene/pull/406 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss merged pull request #407: LUCENE-10199: drop binary .zip artifact.
dweiss merged pull request #407: URL: https://github.com/apache/lucene/pull/407 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10199) Drop ZIP binary distribution from release artifacts
[ https://issues.apache.org/jira/browse/LUCENE-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434534#comment-17434534 ] ASF subversion and git services commented on LUCENE-10199: -- Commit fb6aaa7b2c28749c93553c7ffb7e5f5a372ad9b3 in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb6aaa7 ] LUCENE-10199: drop binary .zip artifact. (#407) > Drop ZIP binary distribution from release artifacts > --- > > Key: LUCENE-10199 > URL: https://issues.apache.org/jira/browse/LUCENE-10199 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10199) Drop ZIP binary distribution from release artifacts
[ https://issues.apache.org/jira/browse/LUCENE-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10199.
----------------------------------
    Fix Version/s: main (9.0)
       Resolution: Fixed

> Drop ZIP binary distribution from release artifacts
> ---------------------------------------------------
>
>                Key: LUCENE-10199
>                URL: https://issues.apache.org/jira/browse/LUCENE-10199
>            Project: Lucene - Core
>         Issue Type: Task
>           Reporter: Dawid Weiss
>           Assignee: Dawid Weiss
>           Priority: Minor
>            Fix For: main (9.0)
>
>         Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10198) Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
[ https://issues.apache.org/jira/browse/LUCENE-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434541#comment-17434541 ]

ASF subversion and git services commented on LUCENE-10198:
----------------------------------------------------------

Commit 4329450392f11303fdd8ed5352d9cfffca8dc8c1 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4329450 ]

LUCENE-10198: remove debug statement that crept in.

> Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
> -----------------------------------------------------------------------------------------------
>
>                Key: LUCENE-10198
>                URL: https://issues.apache.org/jira/browse/LUCENE-10198
>            Project: Lucene - Core
>         Issue Type: Task
>           Reporter: Dawid Weiss
>           Assignee: Dawid Weiss
>           Priority: Major
>            Fix For: main (9.0)
>
>         Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Created] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
Adrien Grand created LUCENE-10207:
-------------------------------------

             Summary: Make TermInSetQuery usable with IndexOrDocValuesQuery
                 Key: LUCENE-10207
                 URL: https://issues.apache.org/jira/browse/LUCENE-10207
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand

IndexOrDocValuesQuery is very useful to pick the right execution mode for a query depending on other bits of the query tree.

We would like to be able to use it to optimize execution of TermInSetQuery. However IndexOrDocValuesQuery only works well if the "index" query can give an estimation of the cost of the query without doing anything expensive (like looking up all terms of the TermInSetQuery in the terms dict). Maybe we could implement it for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a zipfian distribution and some terms have a much higher doc frequency than the average.

[~romseygeek] and I were discussing this a few weeks ago, and more recently [~mikemccand] and [~gsmiller] again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
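[Editor's note: a minimal sketch of the primary-key estimation idea above, using only cheap per-segment statistics from Lucene's Terms API. The helper class, method name, and fallback heuristic are illustrative assumptions, not existing Lucene code.]

{code}
import java.io.IOException;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.Terms;

// Hypothetical helper, not Lucene API: estimates TermInSetQuery cost without
// looking up any of the query's terms in the terms dictionary.
final class TermInSetCostEstimate {
  static long estimate(LeafReader reader, String field, int numQueryTerms) throws IOException {
    Terms terms = reader.terms(field);
    if (terms == null) {
      return 0; // field not indexed in this segment
    }
    long size = terms.size();                // unique term count, or -1 if unknown
    long sumDocFreq = terms.getSumDocFreq(); // total postings across all terms
    if (size >= 0 && size == sumDocFreq) {
      // primary-key-like field: every term matches exactly one document
      return numQueryTerms;
    }
    // Fallback from the description: query terms times the average postings
    // length. As noted above, this can badly underestimate on zipfian fields.
    long avgPostings = size > 0 ? Math.max(1L, sumDocFreq / size) : 1L;
    return (long) numQueryTerms * avgPostings;
  }
}
{code}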
[jira] [Commented] (LUCENE-10163) Review top-level *.txt and *.md files
[ https://issues.apache.org/jira/browse/LUCENE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434546#comment-17434546 ]

ASF subversion and git services commented on LUCENE-10163:
----------------------------------------------------------

Commit 1613355149e5fc11d0804b457742f5862e843ae2 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1613355 ]

LUCENE-10163: update smoke tester - README inside lucene/ is no longer there in the source release.

> Review top-level *.txt and *.md files
> -------------------------------------
>
>                Key: LUCENE-10163
>                URL: https://issues.apache.org/jira/browse/LUCENE-10163
>            Project: Lucene - Core
>         Issue Type: Sub-task
>           Reporter: Dawid Weiss
>           Priority: Major
>            Fix For: main (9.0)
>
>         Time Spent: 40m
> Remaining Estimate: 0h
>
> Some of them contain obsolete pointers and information (SYSTEM_REQUIREMENTS.md, etc.).
> Also, move the files that are distribution-specific (lucene/README.md) to the distribution project. Otherwise they give odd, incorrect information like:
> {code}
> To review the documentation, read the main documentation page, located at:
> `docs/index.html`
> {code}
[GitHub] [lucene-solr] epugh commented on pull request #1676: SOLR-13973: Depricate Tika support in 8.7
epugh commented on pull request #1676:
URL: https://github.com/apache/lucene-solr/pull/1676#issuecomment-952283806

    We should have merged this PR! Oh well...
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434547#comment-17434547 ]

Robert Muir commented on LUCENE-10207:
--------------------------------------

we may be able to relax it slightly by computing worst-case cost, something like:
{code}
cost = numQueryTerms * (1 + terms.sumDocFreq - terms.size)
{code}
This will overestimate the cost when the field isn't anything like a unique-key field, but it will never underestimate it. So it would always be "safe" to use the IndexOrDocValuesQuery.

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -----------------------------------------------------
>
>                Key: LUCENE-10207
>                URL: https://issues.apache.org/jira/browse/LUCENE-10207
>            Project: Lucene - Core
>         Issue Type: Improvement
>           Reporter: Adrien Grand
>           Priority: Minor
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. However IndexOrDocValuesQuery only works well if the "index" query can give an estimation of the cost of the query without doing anything expensive (like looking up all terms of the TermInSetQuery in the terms dict). Maybe we could implement it for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a zipfian distribution and some terms have a much higher doc frequency than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently [~mikemccand] and [~gsmiller] again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
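[Editor's note: the same worst-case estimate, restated against the Terms statistics getters as a hedged sketch; the wrapper class and method are hypothetical and assume terms.size() is available, which it is for the default codec.]

{code}
import java.io.IOException;
import org.apache.lucene.index.Terms;

final class WorstCaseCost {
  // Worst case: assume the query's terms soak up all of the field's
  // "non-unique" postings. Overestimates on non-key fields, never underestimates.
  static long cost(long numQueryTerms, Terms terms) throws IOException {
    long extraPostings = terms.getSumDocFreq() - terms.size(); // 0 for a primary key
    return numQueryTerms * (1 + extraPostings);
  }
}
{code}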
[GitHub] [lucene] apanimesh061 commented on a change in pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
apanimesh061 commented on a change in pull request #412:
URL: https://github.com/apache/lucene/pull/412#discussion_r736109899

## File path: lucene/highlighter/src/test/org/apache/lucene/search/uhighlight/TestUnifiedHighlighter.java
## @@ -460,6 +462,26 @@ public void testBuddhism() throws Exception {
     ir.close();
   }

+  public void testUnifiedHighlighterBuilder() throws Exception {

Review comment:
    This is not a real unit test. I only added it to demo that the builder can be sub-classed.
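[Editor's note: for readers following along, the pattern that demo test exercises looks roughly like the sketch below. All names are hypothetical stand-ins rather than the PR's actual API; the point is that configuration mutates on a builder, the built object is immutable (and therefore safely shareable across threads), and both the product and its builder can be subclassed.]

```java
// Hypothetical names, not the PR's API: illustrates an extensible builder.
class Highlighter {
  protected final int maxLength; // immutable once built

  protected Highlighter(Builder b) {
    this.maxLength = b.maxLength;
  }

  static class Builder {
    int maxLength = 10_000;

    Builder withMaxLength(int maxLength) {
      this.maxLength = maxLength;
      return this;
    }

    Highlighter build() {
      return new Highlighter(this);
    }
  }
}

// A subclass can extend both the product and its builder; the covariant
// build() override keeps call sites type-safe.
class CustomHighlighter extends Highlighter {
  CustomHighlighter(CustomBuilder b) {
    super(b);
  }

  static class CustomBuilder extends Highlighter.Builder {
    @Override
    CustomHighlighter build() {
      return new CustomHighlighter(this);
    }
  }
}
```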
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434572#comment-17434572 ]

Robert Muir commented on LUCENE-10207:
--------------------------------------

I think we can slightly tweak it (still completely safe) by doing:
{code}
cost = numQueryTerms + (terms.sumDocFreq - terms.size)
{code}
Similar to the previous comment, the cost is correct for the unique-key field. We assume that we'll match _all_ the "non-unique" postings as well, the worst case. But the overestimation is less aggressive.

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -----------------------------------------------------
>
>                Key: LUCENE-10207
>                URL: https://issues.apache.org/jira/browse/LUCENE-10207
>            Project: Lucene - Core
>         Issue Type: Improvement
>           Reporter: Adrien Grand
>           Priority: Minor
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. However IndexOrDocValuesQuery only works well if the "index" query can give an estimation of the cost of the query without doing anything expensive (like looking up all terms of the TermInSetQuery in the terms dict). Maybe we could implement it for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a zipfian distribution and some terms have a much higher doc frequency than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently [~mikemccand] and [~gsmiller] again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
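[Editor's note: a quick worked comparison of the two estimates on made-up field statistics shows how much gentler the additive form is.]

{code}
public class CostComparison {
  public static void main(String[] args) {
    // Made-up stats: 1,000,000 unique terms, 1,100,000 total postings, 10 query terms.
    long size = 1_000_000, sumDocFreq = 1_100_000, numQueryTerms = 10;

    long multiplied = numQueryTerms * (1 + sumDocFreq - size); // 10 * 100_001 = 1_000_010
    long additive = numQueryTerms + (sumDocFreq - size);       // 10 + 100_000 =   100_010

    // Both are safe upper bounds; the additive form overestimates ~10x less here.
    System.out.println(multiplied + " vs " + additive);
  }
}
{code}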
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434582#comment-17434582 ]

Robert Muir commented on LUCENE-10207:
--------------------------------------

Also, there is more we can do to better reflect the costs for these queries. For stuff like the existing SortedSetDocValuesRangeQuery, I feel like the {{matchCost}} is bogusly hardcoded for the "actually multivalued" case at:
{code}
@Override
public float matchCost() {
  return 2; // 2 comparisons
}
{code}
But this seems wrong? Matching is a loop. I feel like it should at least try to account for the multi-valued loop:
{code}
final float avgDVsPerDoc = terms.sumDocFreq / (float) terms.getDocCount();
...
@Override
public float matchCost() {
  return 2 * avgDVsPerDoc; // 2 comparisons in a loop over ordinals
}
{code}

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -----------------------------------------------------
>
>                Key: LUCENE-10207
>                URL: https://issues.apache.org/jira/browse/LUCENE-10207
>            Project: Lucene - Core
>         Issue Type: Improvement
>           Reporter: Adrien Grand
>           Priority: Minor
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. However IndexOrDocValuesQuery only works well if the "index" query can give an estimation of the cost of the query without doing anything expensive (like looking up all terms of the TermInSetQuery in the terms dict). Maybe we could implement it for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a zipfian distribution and some terms have a much higher doc frequency than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently [~mikemccand] and [~gsmiller] again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
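[Editor's note: for context on where {{matchCost}} sits, here is a self-contained sketch of a two-phase check over a multi-valued sorted-set field with the proposed per-ordinal accounting. It is an illustration only, not the actual SortedSetDocValuesRangeQuery implementation; avgDVsPerDoc would be precomputed from the field statistics as in the comment above.]

{code}
import java.io.IOException;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.search.TwoPhaseIterator;

// Sketch: verify a range of ordinals per doc; matchCost() reflects the loop.
final class OrdRangeTwoPhase extends TwoPhaseIterator {
  private final SortedSetDocValues values;
  private final long minOrd, maxOrd;
  private final float avgDVsPerDoc; // e.g. sumDocFreq / (float) docCount

  OrdRangeTwoPhase(SortedSetDocValues values, long minOrd, long maxOrd, float avgDVsPerDoc) {
    super(values); // the approximation: all docs that have a value
    this.values = values;
    this.minOrd = minOrd;
    this.maxOrd = maxOrd;
    this.avgDVsPerDoc = avgDVsPerDoc;
  }

  @Override
  public boolean matches() throws IOException {
    // loop over the current doc's ordinals, two comparisons each
    for (long ord = values.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = values.nextOrd()) {
      if (ord >= minOrd && ord <= maxOrd) {
        return true;
      }
    }
    return false;
  }

  @Override
  public float matchCost() {
    return 2 * avgDVsPerDoc; // 2 comparisons in a loop over ordinals
  }
}
{code}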
[GitHub] [lucene-solr] thelabdude opened a new pull request #2595: LUCENE-10141: Add the next minor version on Lucene's main branch in the split repo so the backcompat_master task works
thelabdude opened a new pull request #2595:
URL: https://github.com/apache/lucene-solr/pull/2595

    I think the reason the `addBackcompatIndexes.py` script failed (`backcompat_master` step) when I built 8.10 was the missing Version info for 8_11, see: https://issues.apache.org/jira/browse/LUCENE-10131

    So this PR adds a task to run the `addVersion.py` script for Lucene's main branch (in the split-out repo) so that the `backcompat_master` step works later in the release process.
[GitHub] [lucene-solr] thelabdude commented on pull request #2595: LUCENE-10141: Add the next minor version on Lucene's main branch in the split repo so the backcompat_master task works
thelabdude commented on pull request #2595:
URL: https://github.com/apache/lucene-solr/pull/2595#issuecomment-952381002

    Not sure I have all the git commands right here ...
[GitHub] [lucene] jtibshirani commented on a change in pull request #413: LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0
jtibshirani commented on a change in pull request #413:
URL: https://github.com/apache/lucene/pull/413#discussion_r737016446

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java
## @@ -60,7 +60,8 @@ public KnnVectorQuery(String field, float[] target, int k) {
   public Query rewrite(IndexReader reader) throws IOException {
     TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
     for (LeafReaderContext ctx : reader.leaves()) {
-      perLeafResults[ctx.ord] = searchLeaf(ctx, Math.min(k, reader.numDocs()));
+      int numDocs = ctx.reader().numDocs();
+      perLeafResults[ctx.ord] = numDocs > 0 ? searchLeaf(ctx, Math.min(k, numDocs)) : NO_RESULTS;

Review comment:
    This makes sense to me, I pushed a change. Instead of `Lucene90HnswVectorsReader`, I thought it could make sense to apply the bound in `HnswGraph`. But this turned out messier because there are separate concepts for `topK` and `numSeed` (we're cleaning this up as part of [LUCENE-10054](https://issues.apache.org/jira/browse/LUCENE-10054)).
[GitHub] [lucene] msokolov commented on a change in pull request #413: LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0
msokolov commented on a change in pull request #413:
URL: https://github.com/apache/lucene/pull/413#discussion_r737073221

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java
## @@ -60,7 +60,8 @@ public KnnVectorQuery(String field, float[] target, int k) {
   public Query rewrite(IndexReader reader) throws IOException {
     TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
     for (LeafReaderContext ctx : reader.leaves()) {
-      perLeafResults[ctx.ord] = searchLeaf(ctx, Math.min(k, reader.numDocs()));
+      int numDocs = ctx.reader().numDocs();
+      perLeafResults[ctx.ord] = numDocs > 0 ? searchLeaf(ctx, Math.min(k, numDocs)) : NO_RESULTS;

Review comment:
    Thanks for fixing this - it makes sense to me to use `size()` instead of `numDocs()`, or even simply `k`; I wasn't aware of the costly nature of that call. Indeed the idea here was just to avoid spending extra work on tiny segments; something I noticed all the time in tests, but which is probably not much of an issue in reality.
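[Editor's note: a minimal sketch of the bound being discussed, assuming a helper where the per-segment vector count is cheap to read. The class is illustrative; in the actual change the clamp lives inside the vectors reader.]

```java
import org.apache.lucene.index.VectorValues;

final class KnnBounds {
  // Clamp k to the number of documents that actually have a vector in this
  // segment: a larger result heap can never be filled, so oversizing it is
  // wasted work on tiny segments.
  static int boundK(int k, VectorValues vectors) {
    return Math.min(k, vectors.size());
  }
}
```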
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
noblepaul commented on a change in pull request #2594:
URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r737074614

## File path: solr/solr-ref-guide/src/quickstart.adoc
## @@ -0,0 +1,140 @@
+= Quickstart Guide
+:experimental:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+Here's a quickstart guide to start Solr, add some documents and perform some searches.
+
+== Starting Solr
+
+Start a Solr node in cluster mode (SolrCloud mode):
+
+[source,subs="verbatim,attributes+"]
+$ bin/solr -c
+
+Waiting up to 180 seconds to see Solr running on port 8983 [\]
+Started Solr server on port 8983 (pid=34942). Happy searching!
+
+To start another Solr node and have it join the cluster alongside the first node:
+
+[source,subs="verbatim,attributes+"]
+$ bin/solr -c -z localhost:9983 -p 8984
+
+An instance of the cluster coordination service, i.e. ZooKeeper, was started on port 9983 when the first node was started. To start ZooKeeper separately, please refer to .
+
+== Creating a collection
+
+Like a database system holds data in tables, Solr holds data in collections. A collection can be created as follows:
+
+[source,subs="verbatim,attributes+"]
+$ curl --request POST \
+  --url http://localhost:8983/api/collections \
+  --header 'Content-Type: application/json' \
+  --data '{
+  "create": {
+    "name": "techproducts",
+    "numShards": 1,
+    "replicationFactor": 1

Review comment:
    why no `config` attribute?