[GitHub] [lucene] jpountz commented on a change in pull request #413: LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0
jpountz commented on a change in pull request #413: URL: https://github.com/apache/lucene/pull/413#discussion_r736218409

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java

## @@ -60,7 +60,8 @@ public KnnVectorQuery(String field, float[] target, int k) {
   public Query rewrite(IndexReader reader) throws IOException {
     TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
     for (LeafReaderContext ctx : reader.leaves()) {
-      perLeafResults[ctx.ord] = searchLeaf(ctx, Math.min(k, reader.numDocs()));
+      int numDocs = ctx.reader().numDocs();
+      perLeafResults[ctx.ord] = numDocs > 0 ? searchLeaf(ctx, Math.min(k, numDocs)) : NO_RESULTS;

Review comment: I was thinking of passing `k` here, and moving the logic to avoid oversizing the heap to Lucene90HnswVectorsReader by doing `k = min(k, size())` (where `size()` is the number of docs that have a vector).
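A minimal sketch of the alternative described in the review comment above, assuming the clamping moves into the vectors reader so callers can pass `k` unchanged. The `size(field)` and `doSearch(...)` names and the surrounding signature are assumptions for illustration, not the actual Lucene90HnswVectorsReader code:

```java
// Hypothetical sketch: the query passes k through unchanged, and the reader
// clamps it against the number of documents that actually have a vector.
public TopDocs search(String field, float[] target, int k) throws IOException {
  int numVectors = size(field); // assumed: count of docs with a vector in this field
  k = Math.min(k, numVectors);  // avoid oversizing the result heap
  if (k <= 0) {
    return new TopDocs(new TotalHits(0, TotalHits.Relation.EQUAL_TO), new ScoreDoc[0]);
  }
  return doSearch(field, target, k); // assumed: the actual HNSW graph search
}
```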
[GitHub] [lucene] dweiss merged pull request #405: LUCENE-10198: Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
dweiss merged pull request #405: URL: https://github.com/apache/lucene/pull/405
[jira] [Commented] (LUCENE-10198) Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
[ https://issues.apache.org/jira/browse/LUCENE-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434150#comment-17434150 ]

ASF subversion and git services commented on LUCENE-10198:

Commit 780846a732b9c3f9c8b0abeae7d1d2c19df524e4 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=780846a ]

LUCENE-10198: Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies) (#405)

Co-authored-by: balmukundblr

> Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack
> and system proxies)
>
> Key: LUCENE-10198
> URL: https://issues.apache.org/jira/browse/LUCENE-10198
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Resolved] (LUCENE-10198) Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
[ https://issues.apache.org/jira/browse/LUCENE-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10198.
Fix Version/s: main (9.0)
Resolution: Fixed

Merged this in.
[GitHub] [lucene] jprinet opened a new pull request #414: LUCENE-10195: Improve Gradle build speed
jprinet opened a new pull request #414: URL: https://github.com/apache/lucene/pull/414

# Description
Improve Gradle build speed, mainly by focusing on up-to-date checks and task caching.

# Solution
Use Gradle Enterprise to identify room for improvement.

# Tests
Nightly and regression tests.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434249#comment-17434249 ]

Jerome Prinet commented on LUCENE-10195:

I mainly focused on up-to-date checks and task caching. The most spectacular improvement happens when running a clean build with a populated cache:

./gradlew clean build -Ptests.seed=deadbeef -Ptests.nightly=false -Ptests.neverUpToDate=false --scan

*Before changes* [https://gradle.com/s/kxkbukaklyiz4]
1206 tasks executed in 40 projects *in 5m 39s, with 72 avoided tasks saving 16.359s*

*After changes* [https://gradle.com/s/mfiwiheg4wxjq]
1206 tasks executed in 40 projects *in 26s, with 394 avoided tasks saving 30m 2.417s*

*Here are the changes in detail:*
* Declare outputs in your tasks to benefit from up-to-date checks (_CollectJarInfos_) - see the sketch at the end of this message
* The _validateSourcePatterns_ task should not take _.idea_ or _.gradle_ files into account
* Annotate tasks as cacheable to benefit from the cache (_EcjLint, ValidateSourcePatterns, RatTask, RenderJavadoc, checkBrokenLinks_)
* Use valid outputs rather than dummy ones (_EcjLint_)
* Do not use string representations for task inputs that are collections of files or directories (i.e. _resources, scriptResources_), as you can't benefit from caching when relocating the workspace to a different folder
* Minimize direct usage of system properties which are location- or OS-dependent, as they are part of the cache entry key
* Do not put location- or OS-related information in the _MANIFEST.MF (X-Build-OS)_

*Here is some advice for future improvements:*
* Fixing _tests.seed_ obviously helps to benefit from up-to-date checks; I get the point about randomization, but this is a trade-off against the expensive cost of resources
* Use the standard _Gradle wrapper_
* Set up a local _gradle.properties_ in the Gradle home folder rather than having an automatic generation from _gradle/generation/local-settings.gradle_
* Add _gradle.properties_ and the _Gradle wrapper_ to VCS
* Do not override the Gradle daemon TTL to 15 min; this is way too short
* Do not create / commit generated test files to the src directory (_frenchArticles.txt, Top50KWiki.utf8, CambridgeMA.utf8, Latin-dont-break-on-hyphens.rbbi_) => _:lucene:analysis:common:compileTestJava_ is not cacheable due to overlapping outputs

> Gradle build speed improvement
>
> Key: LUCENE-10195
> URL: https://issues.apache.org/jira/browse/LUCENE-10195
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Jerome Prinet
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Increase Gradle build speed with help of Gradle built-in features, mostly
> cache and up-to-date checks
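A minimal sketch of the pattern behind the first bullets above (declared inputs/outputs plus a cacheability annotation), written as a hypothetical Java task class under buildSrc. The task name and properties are illustrative, not Lucene's actual build tasks:

{code:java}
import org.gradle.api.DefaultTask;
import org.gradle.api.file.ConfigurableFileCollection;
import org.gradle.api.file.RegularFileProperty;
import org.gradle.api.tasks.CacheableTask;
import org.gradle.api.tasks.InputFiles;
import org.gradle.api.tasks.OutputFile;
import org.gradle.api.tasks.PathSensitive;
import org.gradle.api.tasks.PathSensitivity;
import org.gradle.api.tasks.TaskAction;

// Hypothetical task: declared inputs/outputs enable up-to-date checks, and
// @CacheableTask lets Gradle restore the output from the build cache.
@CacheableTask
public abstract class RenderReportTask extends DefaultTask {

  // RELATIVE path sensitivity keeps cache keys stable when the workspace
  // is relocated to a different folder.
  @InputFiles
  @PathSensitive(PathSensitivity.RELATIVE)
  public abstract ConfigurableFileCollection getSources();

  // A real output (not a dummy marker file), so it can be cached and restored.
  @OutputFile
  public abstract RegularFileProperty getReport();

  @TaskAction
  public void render() throws java.io.IOException {
    StringBuilder sb = new StringBuilder();
    for (java.io.File f : getSources().getFiles()) {
      sb.append(f.getName()).append('\n');
    }
    java.nio.file.Files.writeString(
        getReport().get().getAsFile().toPath(), sb.toString());
  }
}
{code}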
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434260#comment-17434260 ]

Jerome Prinet commented on LUCENE-10195:

Using the _com.palantir.consistent-versions_ Gradle plugin triggers a deprecation warning and makes it impossible to enable [configuration on demand|https://docs.gradle.org/current/userguide/multi_project_configuration_and_execution.html#sec:configuration_on_demand], which would help improve performance as well.

I filed an [issue|https://github.com/palantir/gradle-consistent-versions/issues/781] to get this resolved.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434267#comment-17434267 ]

Dawid Weiss commented on LUCENE-10195:

I did take a look at the patch. Thank you - some of the things look like nice improvements.

This command is not what should be the benchmark, though:
{code}
./gradlew clean build -Ptests.seed=deadbeef -Ptests.nightly=false -Ptests.neverUpToDate=false --scan
{code}
Instead, the second invocation of this (a full incremental build) should be measured (tests are a separate story):
{code}
./gradlew check -x test
{code}

I understand you're advocating for the gradle cache and I think it's great, but I don't think it should be the default setting - sorry, this is my honest opinion. Unless you have a bunch of corporate CI servers, it'll only pollute your home directory with a gazillion megabytes of data that will simply not be reused much. And we do want folks to run stuff in their own environments because this is a good regression test (different VMs, operating systems). If somebody wants a local cache, they can enable it, but it shouldn't be forced down their throats.

As for the recommendations, here are my thoughts.

> Fixing tests.seed obviously helps to benefit from up-to-date checks; I get the point about randomization, but this is a trade-off against the expensive cost of resources

Yes, it is a tradeoff we're willing to take. Again - if somebody wants a locally fixed seed, they can do it. You'd be surprised how frequently those tests fail on boundary conditions only in certain environment combinations.

> Use the standard Gradle wrapper

There is a reason why a non-standard wrapper is used - please look up the relevant issue in Jira (a source release shouldn't ship a binary artifact).

> Set up a local gradle.properties in the Gradle home folder rather than having an automatic generation from gradle/generation/local-settings.gradle

There is a reason why gradle.properties is generated (it adjusts the defaults to the local machine). I wish gradle had a mechanism for tuning, say, max-workers dynamically, but I don't think it does.

> Do not override the Gradle daemon TTL to 15 min; this is way too short

If you run the build regularly, switching VMs, then background gradle daemons eat up all your memory. So no, I don't think it's too short.

> Do not create / commit generated test files to the src directory (frenchArticles.txt, Top50KWiki.utf8, CambridgeMA.utf8, Latin-dont-break-on-hyphens.rbbi)

You don't understand why they're there. One of these generated files requires 16GB of memory and over 15 minutes on a decent server to generate. Even if you use the gradle cache, the first run on your old-ish laptop will kill your build with an OOM. Some of these resources require specific environments (like a Linux toolchain). I don't think there is a mechanism in gradle which would allow regenerating these resources only if their source input triggers actually change.

I'll go through the changes you suggested and will cherry-pick some of the improvements, thank you.
[jira] [Commented] (LUCENE-10061) CombinedFieldsQuery needs dynamic pruning support
[ https://issues.apache.org/jira/browse/LUCENE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434299#comment-17434299 ]

Adrien Grand commented on LUCENE-10061:

bq. in order to merge impacts from multiple fields for CombinedFieldsQuery, we may need to compute all the possible summation combinations of competitive {freq, norm} across all fields

I agree that there is a combinatorial explosion issue, and I fear that it's even worse than the example you gave, since we also need to consider the case when some fields do not match the query. In the examples I've seen, there's often a field that has a much higher weight than the other fields (e.g. a title field that has a 10x greater weight than a body field), so I am wondering if we could leverage this property: start from the impacts of the field that has the highest weight and see how we can cheaply incorporate impacts from the other fields, even if this would overestimate the actual maximum score for the query (see the sketch at the end of this message).

> CombinedFieldsQuery needs dynamic pruning support
>
> Key: LUCENE-10061
> URL: https://issues.apache.org/jira/browse/LUCENE-10061
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> CombinedFieldQuery's Scorer doesn't implement advanceShallow/getMaxScore,
> forcing Lucene to collect all matches in order to figure out the top-k hits.
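A toy illustration of the overestimation idea above, not Lucene's actual impacts API: bound the combined score by assuming every field matches at its maximum competitive frequency while using the smallest norm seen, relying on BM25 being non-decreasing in freq and non-increasing in norm. {{FieldImpact}} and its accessors are hypothetical:

{code:java}
import java.util.List;
import org.apache.lucene.search.similarities.Similarity;

// Hypothetical upper bound on the max score of a CombinedFieldsQuery block.
static float maxScoreUpperBound(Similarity.SimScorer scorer, List<FieldImpact> fields) {
  double maxCombinedFreq = 0;
  long minNorm = Long.MAX_VALUE;
  for (FieldImpact f : fields) {
    // Assume the field matches with its largest competitive frequency ...
    maxCombinedFreq += f.weight() * f.maxFreq();
    // ... and take the most score-friendly norm across fields.
    minNorm = Math.min(minNorm, f.minNorm());
  }
  // Never underestimates the true maximum (though it may overestimate
  // substantially), which is the safe direction for pruning.
  return scorer.score((float) maxCombinedFreq, minNorm);
}
{code}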
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434314#comment-17434314 ]

Christine Poerschke commented on LUCENE-10157:

Hi [~cvandenberg], thanks for opening this issue for additional Indri functionality!

Would you be open to contributing via a pull request to the [https://github.com/apache/lucene] {{main}} branch instead of a patch attachment? E.g. the CI would run automatically on it, and subjectively perhaps some folks would find it more convenient to review.

One suggestion from a quick look at the patch: add some test coverage for the new queries.

> Add Additional Indri Search Engine Functionality to Lucene
>
> Key: LUCENE-10157
> URL: https://issues.apache.org/jira/browse/LUCENE-10157
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/queryparser, core/search
> Reporter: Cameron VandenBerg
> Priority: Major
> Attachments: LUCENE-10157.patch
>
> In Jira issue LUCENE-9537, basic functionality from the Indri search engine
> ([http://lemurproject.org/indri.php]) was added to Lucene. With that
> functionality in place, we would love to build upon it to add additional
> Indri queries and an Indri query parser to Lucene, to broaden the Indri
> functionality within Lucene. In this patch, I have added the Indri NOT, the
> Indri OR, and the Indri WeightedSum functionality. I have also included an
> IndriQueryParser for accessing this functionality. More information on these
> query operators can be seen here:
> [https://sourceforge.net/p/lemur/wiki/Belief%20Operations/] and here:
> [https://sourceforge.net/p/lemur/wiki/Indri%20Query%20Language%20Reference/]
>
> I would be very excited to work with the Lucene community again to add
> this functionality. I am open to suggestions, and I am happy to make any
> changes that might be suggested. Thank you!
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434344#comment-17434344 ]

Jerome Prinet commented on LUCENE-10195:

First, thanks Dawid for sharing some context; it is definitely helpful.

Regarding the cache: you can rely on the local cache only (in my case it went up to 28 MB), which does not prevent the build from being run/tested on different systems. You can even wipe it periodically to keep it minimal.

About Gradle configuration, I'd rather have the local settings in _~/.gradle/gradle.properties_, which takes precedence over the project's _gradle.properties_ (see [here|https://docs.gradle.org/current/userguide/build_environment.html#sec:gradle_configuration_properties]).

My bad for the files, I didn't get the point. I was probably surprised by having them colocated with some Java source files. Anyway, you're right, any IO-bound operation is most likely not to give you real benefit when cached.

Thanks for reviewing!
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434347#comment-17434347 ]

Uwe Schindler commented on LUCENE-10195:

{quote}I understand you're advocating for the gradle cache and I think it's great, but I don't think it should be the default setting - sorry, this is my honest opinion. Unless you have a bunch of corporate CI servers, it'll only pollute your home directory with a gazillion megabytes of data that will simply not be reused much. And we do want folks to run stuff in their own environments because this is a good regression test (different VMs, operating systems). If somebody wants a local cache, they can enable it, but it shouldn't be forced down their throats.{quote}

I fully agree with this. PLEASE DO NOT ENABLE THE BUILD CACHE BY DEFAULT. As a developer I want and expect the build to take longer if I run "gradlew clean". I want "gradlew clean" to forget the build and then compile everything again and, especially, I want the build to rerun all checks and tests.

As provider of the Lucene build servers: every run should do all build steps again, because we want to test out JVM problems, and this only works if the gradle build forgets everything.

I just ping [~rcmuir], because he also has a strong opinion on that.

So in short: some of the changes in the PR look fine, but everything that caches stuff on my local disk and serializes test results and checks should be avoided, sorry!

Thanks for including my opinion.
Uwe

P.S.: IMHO the Gradle build cache is a feature for streamlined projects with zillions of build servers to spare CPU resources, maybe in organizational environments where the business logic is important. But for Lucene, if we have a zillion build servers, we want all of them to redundantly run tests to find bugs in the JVMs. This is why we run the tests with different settings (compressed oops on/off, different garbage collectors). That's what Policeman's Jenkins server is there for: find bugs in garbage collectors and different JVM versions by running the build suite and tests 24/7. Also [~mikemccand] does this every night to monitor performance. Anything caching results would be a disaster!

If the build cache helps local developers, OK - but more important is to configure inputs/outputs correctly. I have a local machine with one operating system and don't need to cache results for several days. It's only me working on it.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434348#comment-17434348 ]

Uwe Schindler commented on LUCENE-10195:

bq. If you run the build regularly, switching VMs, then background gradle daemons eat up all your memory. So no, I don't think it's too short.

+1. On our build servers we disable the Gradle Daemon completely. I switch it on locally when I do quick incremental builds (trial-and-error). But for running the whole build for releases or debugging Gradle-shitbugs, it is also off locally.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434357#comment-17434357 ]

Dawid Weiss commented on LUCENE-10195:

I will review the patch one by one (thanks for splitting the commits, it helps). Please give me some time as my schedule is pretty intense.

It'd be awesome if you guys at Gradle could take a closer look at some of the issues I outlined in my e-mail on the dev list [1] - I don't know if you saw it. These are *hard* and require deeper knowledge of gradle internals (not to mention the will to perhaps change the implementation here or there).

[1] https://markmail.org/message/vjpfc2jwocroz7nd
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434362#comment-17434362 ]

Dawid Weiss commented on LUCENE-10195:

bq. I'd rather have the local settings in ~/.gradle/gradle.properties which takes precedence over the project's gradle.properties

The thing is: these are global. And many of these settings are project-specific. What works in one project wouldn't work in another. I found it irritating that something so easily solved in ant (include defaults, then local user properties from the project) is so difficult in gradle. I know I'm in no position to suggest anything, but I would love to see a way of bootstrapping with more than one project-local gradle.properties... or to have a way to compute some of the properties dynamically (so that machine settings can be fine-tuned too).
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434365#comment-17434365 ]

Robert Muir commented on LUCENE-10195:

I agree with Dawid and Uwe. I will just add one thing: I do see one potential use-case for enabling the cache: regenerating those enormous jflex DFAs (from 'regenerate'). This seems contained enough that we could possibly make it work efficiently and have all the inputs and outputs correct?

This really is a case similar to javac: we are using a third-party tool (jflex) to translate an input grammar into .java output. The end result is actually quite small (e.g. a 2MB result), but it requires gigabytes of memory and many minutes. Dawid has stuff in the build to "control" this already, so that the build fails if someone tries to edit a generated file directly.

But even so, I am wary of the current build cache. It doesn't allow me to easily bound the size: https://github.com/gradle/gradle/issues/3346 Will the cache behave correctly when it runs out of disk space? I would be happy to just configure a 10MB fixed loopback mount for this cache as a workaround, so that I generate the jflex DFA less often :)
[jira] [Commented] (LUCENE-10157) Add Additional Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-10157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434372#comment-17434372 ]

Cameron VandenBerg commented on LUCENE-10157:

Hi [~cpoerschke]! Thanks for your response! I would be happy to create a pull request, and I will make sure to add tests for the new queries.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434375#comment-17434375 ]

Dawid Weiss commented on LUCENE-10195:

bq. so that I generate the jflex DFA less often

The build cache would have to fetch this from an external server, so you'd need a network connection then. Besides - what causes it to be rerun? It should be skipped in the current build (unless you're really forcing it to run)?
[jira] [Created] (LUCENE-10204) Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / ToChildBlockJoinQuery)
Greg Miller created LUCENE-10204:

Summary: Support iteration of sub-matches in join queries (ToParentBlockJoinQuery / ToChildBlockJoinQuery)
Key: LUCENE-10204
URL: https://issues.apache.org/jira/browse/LUCENE-10204
Project: Lucene - Core
Issue Type: Improvement
Components: modules/join
Reporter: Greg Miller

It would be nice to be able to iterate over the "sub-matches" in these join queries for the purpose of faceting (or possibly other use-cases?). For example, we have a use-case where our query matches on "child" docs, using a {{ToParentBlockJoinQuery}} to "emit" the associated parents, which are ultimately added to our match set. But we want to iterate over the matching "children" for the purpose of faceting.

To make it concrete, consider searching over a product catalog where "offers" and "items" are indexed side-by-side, with the offers being represented as "children" of the parent items. An offer contains information like "condition" (new vs. used), selling price, etc. for the parent item. If we want to facet on "condition", we want to observe all children that matched the query to know whether the parent item had a "new" or "used" offer (or both). This requires iterating over the child matches when faceting, which we cannot do today since the child hit information isn't retained anywhere.

We can support this by "caching" the child hits in a bitset, but there is some complexity when multiple join queries appear in a query structure (we would need to logically combine the various "cached" bitsets using the same boolean operations as in the original query structure).
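A minimal sketch of the bitset-caching idea from the description, assuming one bitset per segment keyed by the reader's core cache key. All names here are hypothetical; this is not the ToParentBlockJoinQuery implementation:

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.FixedBitSet;

// Hypothetical cache of matching child doc IDs, recorded while the join runs
// and replayed later, e.g. to facet on the "condition" field of offers.
final class ChildHitCache {
  private final Map<Object, FixedBitSet> childHits = new HashMap<>();

  void recordChildHit(LeafReaderContext ctx, int childDoc) {
    childHits
        .computeIfAbsent(
            ctx.reader().getCoreCacheHelper().getKey(),
            k -> new FixedBitSet(ctx.reader().maxDoc()))
        .set(childDoc);
  }

  void forEachChildHit(LeafReaderContext ctx, IntConsumer consumer) throws IOException {
    FixedBitSet bits = childHits.get(ctx.reader().getCoreCacheHelper().getKey());
    if (bits == null) {
      return;
    }
    DocIdSetIterator it = new BitSetIterator(bits, bits.cardinality());
    for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
      consumer.accept(doc);
    }
  }
}
{code}

As the description notes, the hard part is not this bookkeeping but combining such bitsets with the right boolean logic when several join queries appear in one query tree.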
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434379#comment-17434379 ]

Jerome Prinet commented on LUCENE-10195:

{quote}I fully agree with this. PLEASE DO NOT ENABLE THE BUILD CACHE BY DEFAULT. As a developer I want and expect the build to take longer if I run "gradlew clean". I want "gradlew clean" to forget the build and then compile everything again and, especially, I want the build to rerun all checks and tests.{quote}

Regarding tests: they will be rerun by default thanks to the tests.neverUpToDate flag. You might also be interested in the [--rerun-tasks option|https://docs.gradle.org/current/userguide/command_line_interface.html#sec:rerun_tasks], which allows ignoring up-to-date checks.

{quote}P.S.: IMHO the Gradle build cache is a feature for streamlined projects with zillions of build servers to spare CPU resources, maybe in organizational environments where the business logic is important.{quote}

We can differentiate between the local cache and the remote cache; this PR was not enabling any remote cache.

{quote}If the build cache helps local developers, OK - but more important is to configure inputs/outputs correctly. I have a local machine with one operating system and don't need to cache results for several days.{quote}

Yep, this is the tricky part: configuring inputs and outputs accurately. But once you get there, it can be super interesting not to recompute something which was already computed. This comes with a price obviously (disk space), but again it can be super beneficial in some cases.

{quote}It'd be awesome if you guys at Gradle could take a closer look at some of the issues I outlined in my e-mail on the dev list [1]{quote}

I will definitely relay that internally.

{quote}Will the cache behave correctly when it runs out of disk space?{quote}

Probably not.

{quote}I would be happy to just configure a 10MB fixed loopback mount for this cache as a workaround, so that I generate the jflex DFA less often{quote}

There is no way to do that out of the box.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434382#comment-17434382 ]

Robert Muir commented on LUCENE-10195:

Sorry, maybe I wasn't clear. It is my understanding that, by default, it could cache the 2MB in the local cache and it would persist across "gradle clean". And yes, I know the large DFA task is skipped by default, but I imagined this would make it much less annoying, and we could potentially enable it? Sure, it doesn't fix the real minimization issue that causes it to take 20 minutes of CPU + 10GB of RAM, but it would reduce the pain.
[jira] [Created] (LUCENE-10205) Should Packed64 use a byte[] plus VarHandles?
Adrien Grand created LUCENE-10205:

Summary: Should Packed64 use a byte[] plus VarHandles?
Key: LUCENE-10205
URL: https://issues.apache.org/jira/browse/LUCENE-10205
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand

By being backed by a long[], Packed64 often has to merge bits coming from two different longs. If it was backed by a byte[], it could always read a single long, which would help remove conditionals? The main downside is that we'd need paging to support high value counts with high numbers of bits per value (when value_count * bits_per_value / 8 > ArrayUtil.MAX_ARRAY_LENGTH).
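A small sketch of the byte[]-plus-VarHandle read path the description suggests, restricted to bitsPerValue <= 56 so that one (possibly unaligned) 8-byte read always covers a value; the backing array is assumed to carry 7 bytes of tail padding so the last read stays in bounds. This is an illustration of the idea, not a drop-in Packed64 replacement:

{code:java}
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class BytePackedReader {
  // Reads a little-endian long from an arbitrary byte offset of a byte[].
  private static final VarHandle LONG_AT =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  private final byte[] bytes; // sized ceil(valueCount * bitsPerValue / 8) + 7 padding bytes
  private final int bitsPerValue; // assumed <= 56 so shift (<= 7) + bits fit in one long
  private final long mask;

  BytePackedReader(byte[] bytes, int bitsPerValue) {
    this.bytes = bytes;
    this.bitsPerValue = bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
  }

  long get(int index) {
    long bitIndex = (long) index * bitsPerValue;
    int byteIndex = (int) (bitIndex >>> 3);
    int shift = (int) (bitIndex & 7); // 0..7, so a single 8-byte read suffices
    long word = (long) LONG_AT.get(bytes, byteIndex);
    return (word >>> shift) & mask; // no merging of two longs, no conditionals
  }
}
{code}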
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434390#comment-17434390 ]

Robert Muir commented on LUCENE-10195:

{quote}
> I would be happy to just configure a 10MB fixed loopback mount for this cache as a workaround, so that I generate the jflex DFA less often

There is no way to do that out of the box
{quote}

No, I mean I would do it myself, and configure {{/home/rmuir/.gradle/caches/}} in {{/etc/fstab}} to be 10MB. So it would run gradle out of disk space if it tried to write any more than that. I really don't want size-unbounded caches storing stuff or trashing my SSD. I keep all my caches on a short leash; it is pretty easy since most apps behave and store stuff under {{~/.cache}}. So I already mount this as tmpfs with a size limit. And I pass flags such as {{chromium --disk-cache-size}} when apps have a way to explicitly bound the size.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434396#comment-17434396 ]

Jerome Prinet commented on LUCENE-10195:

Just to clarify: you can limit the TTL of cache entries, but not the whole cache size. See [https://docs.gradle.org/current/userguide/build_cache.html#sec:build_cache_configure]
[jira] [Created] (LUCENE-10206) Implement O(1) count on query cache
Nik Everett created LUCENE-10206:

Summary: Implement O(1) count on query cache
Key: LUCENE-10206
URL: https://issues.apache.org/jira/browse/LUCENE-10206
Project: Lucene - Core
Issue Type: Improvement
Reporter: Nik Everett

I'd like to implement the `Weight#count` method in `LRUQueryCache` so cached queries can quickly return their counts. We already have a count for all of the bit sets we use in the query cache; we just have to store it and "plug it in".

I got here because we frequently end up wanting counts, and I saw `RoaringDocIdSet`'s iterator show up as a hot spot. I don't think it's slow or anything, but when the collector is just `count++`, the iterator overhead is substantial. It seems like we could frequently avoid the whole thing by implementing `count` in the query cache.
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434401#comment-17434401 ]

Dawid Weiss commented on LUCENE-10195:

{quote}You might be interested in the [--rerun-tasks option|https://docs.gradle.org/current/userguide/command_line_interface.html#sec:rerun_tasks], which allows ignoring up-to-date checks.{quote}

This reruns all tasks in the graph, which is more of a pain than a help (in the majority of cases :)). To me the single best feature of gradle lies in incremental tasks. When things are configured correctly, the incremental-check subsystem pretty much takes care of itself. I almost never have the need to run a full 'clean'.

{quote}Sorry, maybe I wasn't clear. It is my understanding that, by default, it could cache the 2MB in the local cache and it would persist across "gradle clean".{quote}

It would still try to run this task on the first run, when the input/output information isn't locally available (assuming no external cache is provided). This means it'd run at least once. To me this is a no-go. I really wish there was a mechanism for somehow persisting the state of up-to-date checks, but there isn't.
[GitHub] [lucene] nik9000 opened a new pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 opened a new pull request #415: URL: https://github.com/apache/lucene/pull/415

# Description
When we load a query into the query cache we always calculate the count of matching documents. This change uses that count to power the new `O(1)` `Weight#count` method.

# Solution
I've tried a bunch of approaches but settled on opening this PR with the simplest one - add a new class that keeps the BitSet and the count (see the sketch below). I'm not particularly tied to it other than that it is fairly simple. I am assuming it's right to try and implement `count` here rather than do something else. It feels like that method was made for situations like this, though.

# Tests
I've added some to LRUQueryCache's unit test. I haven't done any performance testing here, a little because "it's obvious that returning a number is faster than counting stuff". Nanoseconds vs microseconds. But I'd love to do more with this if folks want me to.

# Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `main` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
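For reference, a minimal sketch of the "simplest approach" described above - a holder pairing the cached doc ID set with its precomputed count. The names are illustrative rather than the exact classes in this PR:

```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;

// Illustrative holder: the cache stores this instead of a bare DocIdSet, so
// Weight#count can answer in O(1) from the count computed while caching.
final class CacheAndCount {
  private final DocIdSet set;
  private final int count; // number of matching documents

  CacheAndCount(DocIdSet set, int count) {
    this.set = set;
    this.count = count;
  }

  DocIdSetIterator iterator() throws IOException {
    return set.iterator();
  }

  int count() {
    return count;
  }
}
```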
[GitHub] [lucene] nik9000 commented on pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 commented on pull request #415: URL: https://github.com/apache/lucene/pull/415#issuecomment-952025788

> I had an out of date `main`. I'll update.
[GitHub] [lucene] nik9000 commented on pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 commented on pull request #415: URL: https://github.com/apache/lucene/pull/415#issuecomment-952040157

> I had an out of date `main`. I'll update.

Done.
[GitHub] [lucene] jpountz commented on a change in pull request #415: LUCENE-10206 Implement O(1) count on query cache
jpountz commented on a change in pull request #415: URL: https://github.com/apache/lucene/pull/415#discussion_r736682046 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1976,4 +1989,60 @@ public void testSkipCachingForTermQuery() throws IOException { reader.close(); dir.close(); } + + public void testCacheHasFastCount() throws IOException { +Query query = new PhraseQuery("words", new BytesRef("alice"), new BytesRef("ran")); + +Directory dir = newDirectory(); +RandomIndexWriter w = +new RandomIndexWriter( +random(), dir, newIndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE)); +Document doc1 = new Document(); +doc1.add(new TextField("words", "tom ran", Store.NO)); +Document doc2 = new Document(); +doc2.add(new TextField("words", "alice ran", Store.NO)); +doc2.add(new StringField("f", "a", Store.NO)); +Document doc3 = new Document(); +doc3.add(new TextField("words", "alice ran", Store.NO)); +doc3.add(new StringField("f", "b", Store.NO)); +w.addDocuments(Arrays.asList(doc1, doc2, doc3)); + +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // Now we *do* have a fast count + assertEquals(weight.count(context), 2); +} + +w.deleteDocuments(new TermQuery(new Term("f", new BytesRef("b"; +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // We still don't have a fast count because we have deleted documents + assertEquals(weight.count(context), -1); Review comment: nit: assertEquals expects the expected value first, so that we get better error messages in case of failure ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1088,7 +1089,19 @@ public void testRandom() throws IOException { cachedSearcher.setQueryCachingPolicy(ALWAYS_CACHE); } final Query q = buildRandomQuery(0); + /* + * Counts are the same. If the query has already been cached + * this'll use the O(1) Weight#count method. + */ assertEquals(uncachedSearcher.count(q), cachedSearcher.count(q)); + /* + * Just to make sure we can iterate every time also check that the + * same docs are returned in the same order. 
+ */ + int size = 1 + random().nextInt(1000); + assertArrayEquals( + Arrays.stream(uncachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray(), + Arrays.stream(cachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray()); Review comment: you might want to use `CheckHits#checkEqual` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
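A sketch of what the suggested change could look like, assuming the `(Query, ScoreDoc[], ScoreDoc[])` overload of `CheckHits#checkEqual` from the Lucene test framework; the helper name is made up:

{code:java}
import java.io.IOException;

import org.apache.lucene.search.CheckHits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;

// Sketch: let CheckHits#checkEqual do the comparison so a mismatch reports
// both hit lists rather than two opaque int arrays.
final class CachedVsUncachedHits {
  static void assertSameHits(IndexSearcher uncached, IndexSearcher cached, Query q, int size)
      throws IOException {
    ScoreDoc[] expected = uncached.search(q, size).scoreDocs;
    ScoreDoc[] actual = cached.search(q, size).scoreDocs;
    CheckHits.checkEqual(q, expected, actual);
  }
}
{code}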
[GitHub] [lucene] nik9000 commented on a change in pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 commented on a change in pull request #415: URL: https://github.com/apache/lucene/pull/415#discussion_r736693202 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1976,4 +1989,60 @@ public void testSkipCachingForTermQuery() throws IOException { reader.close(); dir.close(); } + + public void testCacheHasFastCount() throws IOException { +Query query = new PhraseQuery("words", new BytesRef("alice"), new BytesRef("ran")); + +Directory dir = newDirectory(); +RandomIndexWriter w = +new RandomIndexWriter( +random(), dir, newIndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE)); +Document doc1 = new Document(); +doc1.add(new TextField("words", "tom ran", Store.NO)); +Document doc2 = new Document(); +doc2.add(new TextField("words", "alice ran", Store.NO)); +doc2.add(new StringField("f", "a", Store.NO)); +Document doc3 = new Document(); +doc3.add(new TextField("words", "alice ran", Store.NO)); +doc3.add(new StringField("f", "b", Store.NO)); +w.addDocuments(Arrays.asList(doc1, doc2, doc3)); + +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // Now we *do* have a fast count + assertEquals(weight.count(context), 2); +} + +w.deleteDocuments(new TermQuery(new Term("f", new BytesRef("b"; +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // We still don't have a fast count because we have deleted documents + assertEquals(weight.count(context), -1); Review comment: Bah! I leave this same comment on other people's code. And yet I make the same mistake. Will fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434471#comment-17434471 ] Jerome Prinet commented on LUCENE-10195: {quote}The thing is: these are global. And many of these settings are project-specific. What works in one project wouldn't work in another. I found it irritating that something so easily solved in ant (include defaults, then local user properties from the project) is so difficult in gradle. I know I'm in no position to suggest anything but I would love to see a way of bootstrapping with more than one project-local gradle*properties... or to have a way to compute some of the properties dynamically (so that machine settings can be fine-tuned too). {quote} You might want to take a look at init scripts, which allow you to add some conditional logic: [https://docs.gradle.org/current/userguide/init_scripts.html] > Gradle build speed improvement > -- > > Key: LUCENE-10195 > URL: https://issues.apache.org/jira/browse/LUCENE-10195 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jerome Prinet >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Increase Gradle build speed with help of Gradle built-in features, mostly > cache and up-to-date checks > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434472#comment-17434472 ] Jerome Prinet commented on LUCENE-10195: support for project-controlled up-to-date mechanism (versioned artifact checksums) that is not bypassed on the first run. This is needed for generated resources - I didn't figure out how to do it other than with ugly hacks. Did you explore [upToDateWhen()|https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/TaskOutputs.html] method? > Gradle build speed improvement > -- > > Key: LUCENE-10195 > URL: https://issues.apache.org/jira/browse/LUCENE-10195 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jerome Prinet >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Increase Gradle build speed with help of Gradle built-in features, mostly > cache and up-to-date checks > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434472#comment-17434472 ] Jerome Prinet edited comment on LUCENE-10195 at 10/26/21, 5:06 PM: --- {quote}support for project-controlled up-to-date mechanism (versioned artifact checksums) that is not bypassed on the first run. This is needed for generated resources - I didn't figure out how to do it other than with ugly hacks. {quote} Did you explore [upToDateWhen()|https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/TaskOutputs.html] method? was (Author: jeromep): support for project-controlled up-to-date mechanism (versioned artifact checksums) that is not bypassed on the first run. This is needed for generated resources - I didn't figure out how to do it other than with ugly hacks. Did you explore [upToDateWhen()|https://docs.gradle.org/current/javadoc/org/gradle/api/tasks/TaskOutputs.html] method? > Gradle build speed improvement > -- > > Key: LUCENE-10195 > URL: https://issues.apache.org/jira/browse/LUCENE-10195 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jerome Prinet >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Increase Gradle build speed with help of Gradle built-in features, mostly > cache and up-to-date checks > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
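For context, upToDateWhen() attaches an extra predicate to a task's outputs, on top of Gradle's own input/output tracking. A rough Java sketch as a hypothetical plugin (the task name and marker-file logic are assumptions):

{code:java}
import org.gradle.api.Plugin;
import org.gradle.api.Project;

// Sketch: hang a project-controlled up-to-date predicate onto a hypothetical
// resource-generating task. Gradle then also skips the task whenever the
// predicate returns true.
public class ChecksumUpToDatePlugin implements Plugin<Project> {
  @Override
  public void apply(Project project) {
    project.getTasks().named("generateResources").configure(task ->
        task.getOutputs().upToDateWhen(t ->
            // project-specific logic; a checksum marker file as a placeholder
            t.getProject().file("checksums/ok.marker").exists()));
  }
}
{code}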
[jira] [Commented] (LUCENE-10195) Gradle build speed improvement
[ https://issues.apache.org/jira/browse/LUCENE-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434473#comment-17434473 ] Jerome Prinet commented on LUCENE-10195: {quote}test runner test case ordering optimization (load balancing between worker JVMs); currently the long-tail test case can slow down builds significantly. {quote} [Test distribution|https://docs.gradle.com/enterprise/test-distribution-gradle-plugin/] might be the Gradle way to tackle that. > Gradle build speed improvement > -- > > Key: LUCENE-10195 > URL: https://issues.apache.org/jira/browse/LUCENE-10195 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jerome Prinet >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Increase Gradle build speed with help of Gradle built-in features, mostly > cache and up-to-date checks > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] nik9000 commented on a change in pull request #415: LUCENE-10206 Implement O(1) count on query cache
nik9000 commented on a change in pull request #415: URL: https://github.com/apache/lucene/pull/415#discussion_r736786185 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1976,4 +1989,60 @@ public void testSkipCachingForTermQuery() throws IOException { reader.close(); dir.close(); } + + public void testCacheHasFastCount() throws IOException { +Query query = new PhraseQuery("words", new BytesRef("alice"), new BytesRef("ran")); + +Directory dir = newDirectory(); +RandomIndexWriter w = +new RandomIndexWriter( +random(), dir, newIndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE)); +Document doc1 = new Document(); +doc1.add(new TextField("words", "tom ran", Store.NO)); +Document doc2 = new Document(); +doc2.add(new TextField("words", "alice ran", Store.NO)); +doc2.add(new StringField("f", "a", Store.NO)); +Document doc3 = new Document(); +doc3.add(new TextField("words", "alice ran", Store.NO)); +doc3.add(new StringField("f", "b", Store.NO)); +w.addDocuments(Arrays.asList(doc1, doc2, doc3)); + +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // Now we *do* have a fast count + assertEquals(weight.count(context), 2); +} + +w.deleteDocuments(new TermQuery(new Term("f", new BytesRef("b"; +try (IndexReader reader = w.getReader()) { + IndexSearcher searcher = newSearcher(reader); + searcher.setQueryCachingPolicy(ALWAYS_CACHE); + LRUQueryCache allCache = + new LRUQueryCache(100, 1000, context -> true, Float.POSITIVE_INFINITY); + searcher.setQueryCache(allCache); + Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); + LeafReaderContext context = getOnlyLeafReader(reader).getContext(); + // We don't have a fast count before the cache is filled + assertEquals(weight.count(context), -1); + // Fetch the scorer to populate the cache + weight.scorer(context); + assertEquals(List.of(query), allCache.cachedQueries()); + // We still don't have a fast count because we have deleted documents + assertEquals(weight.count(context), -1); Review comment: Done. ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1088,7 +1089,19 @@ public void testRandom() throws IOException { cachedSearcher.setQueryCachingPolicy(ALWAYS_CACHE); } final Query q = buildRandomQuery(0); + /* + * Counts are the same. If the query has already been cached + * this'll use the O(1) Weight#count method. + */ assertEquals(uncachedSearcher.count(q), cachedSearcher.count(q)); + /* + * Just to make sure we can iterate every time also check that the + * same docs are returned in the same order. + */ + int size = 1 + random().nextInt(1000); + assertArrayEquals( + Arrays.stream(uncachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray(), + Arrays.stream(cachedSearcher.search(q, size).scoreDocs).mapToInt(d -> d.doc).toArray()); Review comment: Done. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] chatman opened a new pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
chatman opened a new pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594 A new quickstart guide that can potentially replace (or live side by side with) the Solr tutorial. This is WIP at the moment, but I would appreciate early feedback and thoughts. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
epugh commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736805151 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ Review comment: A nitpick is that the collection is `techproducts`, and we have books. Maybe we should think (separately) about renaming `techproducts` to just `products`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
epugh commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736806111 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: i don't know if introducing terms used by other search engines is useful... though maybe we want to build up a gloassary that would list "equivalent" terms from other engines? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail:
[GitHub] [lucene-solr] chatman commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
chatman commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736806880 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ Review comment: Yes, good idea. I just took those docs off the Solr tutorial (which indexes books into techproducts). But, clearly, it is time for a better example. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] chatman commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
chatman commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736807226 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: A glossary sounds like a very good idea, for people coming from different systems. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] chatman commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
chatman commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736808710 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: > i don't know if introducing terms used by other search engines is useful I feel that those coming from ES / OpenSearch backgrounds might be able to relate better. My main motivation with this document is to cut down on paragraphs of text and have more copy-paste-able snippets, esp. using JSON/V2 apis, to make Solr more appealing to those who find ES easy to use (mainly due to their superior beginner documentation). ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. S
[GitHub] [lucene-solr] epugh commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
epugh commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736816462 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: That makes sense... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] epugh commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
epugh commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736816706 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: "solr for ES/OS refugees" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] magibney commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
magibney commented on a change in pull request #2594: URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r736836224 ## File path: solr/solr-ref-guide/src/quickstart.adoc ## @@ -0,0 +1,140 @@ += Quickstart Guide +:experimental: +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Here's a quickstart guide to start Solr, add some documents and perform some searches. + +== Starting Solr + +Start a Solr node in cluster mode (SolrCloud mode) + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c + +Waiting up to 180 seconds to see Solr running on port 8983 [\] +Started Solr server on port 8983 (pid=34942). Happy searching! + + +To start another Solr node and have it join the cluster alongside the first node, + +[source,subs="verbatim,attributes+"] + +$ bin/solr -c -z localhost:9983 -p 8984 + + +An instance of the cluster coordination service, i.e. Zookeeper, was started on port 9983 when the first node was started. To start Zookeeper separately, please refer to . + +== Creating a collection + +Like a database system holds data in tables, Solr holds data in collections. 
A collection can be created as follows: + +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url http://localhost:8983/api/collections \ + --header 'Content-Type: application/json' \ + --data '{ + "create": { + "name": "techproducts", + "numShards": 1, + "replicationFactor": 1 + } +}' + + +== Indexing documents + +A single document can be indexed as: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + }' + + +Multiple documents can be indexed in the same request: +[source,subs="verbatim,attributes+"] + +$ curl --request POST \ + --url 'http://localhost:8983/api/collections/techproducts/update' \ + --header 'Content-Type: application/json' \ + --data ' [ + { +"id" : "978-0641723445", +"cat" : ["book","hardcover"], +"name" : "The Lightning Thief", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 1, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 12.50, +"pages_i" : 384 + } +, + { +"id" : "978-1423103349", +"cat" : ["book","paperback"], +"name" : "The Sea of Monsters", +"author" : "Rick Riordan", +"series_t" : "Percy Jackson and the Olympians", +"sequence_i" : 2, +"genre_s" : "fantasy", +"inStock" : true, +"price" : 6.49, +"pages_i" : 304 + } +]' + + +A file containing the documents can be indexed as follows: +[source,subs="verbatim,attributes+"] + +$ curl -X POST -d @example/exampledocs/books.json http://localhost:8983/api/collections/techproducts/update + + +== Commit +After documents are indexed into a collection, they are not immediately available for searching. In order to have them searchable, a commit operation (also called `refresh` in other search engines like OpenSearch etc.) is needed. Commits can be scheduled at periodic intervals using auto-commits as follows. Review comment: Makes sense to me as a point of reference. It might be more economical to say "(also called `refresh` in ElasticSearch/OpenSearch)" ... unless there are other search engines that refer to this concept as "refresh"? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apa
[jira] [Commented] (LUCENE-10163) Review top-level *.txt and *.md files
[ https://issues.apache.org/jira/browse/LUCENE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434533#comment-17434533 ] ASF subversion and git services commented on LUCENE-10163: -- Commit 08c03566648c0b024b8160869b3d694c3cebaabd in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=08c0356 ] LUCENE-10163: clean up and remove some old cruft in readme files. Move binary release only README.md to the distribution project so that it doesn't look weird in the source tree. (#406) > Review top-level *.txt and *.md files > - > > Key: LUCENE-10163 > URL: https://issues.apache.org/jira/browse/LUCENE-10163 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Some of them contain obsolete pointers and information > (SYSTEM_REQUIREMENTS.md, etc.). > Also, move the files that are distribution-specific (lucene/README.md) to the > distribution project. Otherwise they > give odd, incorrect information like: > {code} > To review the documentation, read the main documentation page, located at: > `docs/index.html` > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10163) Review top-level *.txt and *.md files
[ https://issues.apache.org/jira/browse/LUCENE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-10163. -- Fix Version/s: main (9.0) Resolution: Fixed > Review top-level *.txt and *.md files > - > > Key: LUCENE-10163 > URL: https://issues.apache.org/jira/browse/LUCENE-10163 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Major > Fix For: main (9.0) > > Time Spent: 40m > Remaining Estimate: 0h > > Some of them contain obsolete pointers and information > (SYSTEM_REQUIREMENTS.md, etc.). > Also, move the files that are distribution-specific (lucene/README.md) to the > distribution project. Otherwise they > give odd, incorrect information like: > {code} > To review the documentation, read the main documentation page, located at: > `docs/index.html` > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss merged pull request #406: LUCENE-10163: clean up and remove some old cruft in readme files.
dweiss merged pull request #406: URL: https://github.com/apache/lucene/pull/406 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss merged pull request #407: LUCENE-10199: drop binary .zip artifact.
dweiss merged pull request #407: URL: https://github.com/apache/lucene/pull/407 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10199) Drop ZIP binary distribution from release artifacts
[ https://issues.apache.org/jira/browse/LUCENE-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434534#comment-17434534 ] ASF subversion and git services commented on LUCENE-10199: -- Commit fb6aaa7b2c28749c93553c7ffb7e5f5a372ad9b3 in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb6aaa7 ] LUCENE-10199: drop binary .zip artifact. (#407) > Drop ZIP binary distribution from release artifacts > --- > > Key: LUCENE-10199 > URL: https://issues.apache.org/jira/browse/LUCENE-10199 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-10199) Drop ZIP binary distribution from release artifacts
[ https://issues.apache.org/jira/browse/LUCENE-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved LUCENE-10199.
----------------------------------
    Fix Version/s: main (9.0)
       Resolution: Fixed

> Drop ZIP binary distribution from release artifacts
> ---------------------------------------------------
>
>                Key: LUCENE-10199
>                URL: https://issues.apache.org/jira/browse/LUCENE-10199
>            Project: Lucene - Core
>         Issue Type: Task
>           Reporter: Dawid Weiss
>           Assignee: Dawid Weiss
>           Priority: Minor
>            Fix For: main (9.0)
>
>         Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Commented] (LUCENE-10198) Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
[ https://issues.apache.org/jira/browse/LUCENE-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434541#comment-17434541 ]

ASF subversion and git services commented on LUCENE-10198:
----------------------------------------------------------

Commit 4329450392f11303fdd8ed5352d9cfffca8dc8c1 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4329450 ]

LUCENE-10198: remove debug statement that crept in.

> Allow external JAVA_OPTS in gradlew scripts; use sane defaults (heap, stack and system proxies)
> -----------------------------------------------------------------------------------------------
>
>                Key: LUCENE-10198
>                URL: https://issues.apache.org/jira/browse/LUCENE-10198
>            Project: Lucene - Core
>         Issue Type: Task
>           Reporter: Dawid Weiss
>           Assignee: Dawid Weiss
>           Priority: Major
>            Fix For: main (9.0)
>
>         Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Created] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
Adrien Grand created LUCENE-10207:
-------------------------------------

             Summary: Make TermInSetQuery usable with IndexOrDocValuesQuery
                 Key: LUCENE-10207
                 URL: https://issues.apache.org/jira/browse/LUCENE-10207
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand

IndexOrDocValuesQuery is very useful to pick the right execution mode for a query depending on other bits of the query tree.

We would like to be able to use it to optimize execution of TermInSetQuery. However IndexOrDocValuesQuery only works well if the "index" query can give an estimation of the cost of the query without doing anything expensive (like looking up all terms of the TermInSetQuery in the terms dict). Maybe we could implement it for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a zipfian distribution and some terms have a much higher doc frequency than the average.

[~romseygeek] and I were discussing this a few weeks ago, and more recently [~mikemccand] and [~gsmiller] again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
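[Editor's note: a minimal sketch of the primary-key estimation idea above, using only cheap per-segment statistics from Lucene's Terms API. The helper class, method name, and fallback heuristic are illustrative assumptions, not existing Lucene code.]

{code}
import java.io.IOException;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.Terms;

// Hypothetical helper, not Lucene API: estimates TermInSetQuery cost without
// looking up any of the query's terms in the terms dictionary.
final class TermInSetCostEstimate {
  static long estimate(LeafReader reader, String field, int numQueryTerms) throws IOException {
    Terms terms = reader.terms(field);
    if (terms == null) {
      return 0; // field not indexed in this segment
    }
    long size = terms.size();                // unique term count, or -1 if unknown
    long sumDocFreq = terms.getSumDocFreq(); // total postings across all terms
    if (size >= 0 && size == sumDocFreq) {
      // primary-key-like field: every term matches exactly one document
      return numQueryTerms;
    }
    // Fallback from the description: query terms times the average postings
    // length. As noted above, this can badly underestimate on zipfian fields.
    long avgPostings = size > 0 ? Math.max(1L, sumDocFreq / size) : 1L;
    return (long) numQueryTerms * avgPostings;
  }
}
{code}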
[jira] [Commented] (LUCENE-10163) Review top-level *.txt and *.md files
[ https://issues.apache.org/jira/browse/LUCENE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434546#comment-17434546 ]

ASF subversion and git services commented on LUCENE-10163:
----------------------------------------------------------

Commit 1613355149e5fc11d0804b457742f5862e843ae2 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1613355 ]

LUCENE-10163: update smoke tester - README inside lucene/ is no longer there in the source release.

> Review top-level *.txt and *.md files
> -------------------------------------
>
>                Key: LUCENE-10163
>                URL: https://issues.apache.org/jira/browse/LUCENE-10163
>            Project: Lucene - Core
>         Issue Type: Sub-task
>           Reporter: Dawid Weiss
>           Priority: Major
>            Fix For: main (9.0)
>
>         Time Spent: 40m
> Remaining Estimate: 0h
>
> Some of them contain obsolete pointers and information (SYSTEM_REQUIREMENTS.md, etc.).
> Also, move the files that are distribution-specific (lucene/README.md) to the distribution project. Otherwise they give odd, incorrect information like:
> {code}
> To review the documentation, read the main documentation page, located at:
> `docs/index.html`
> {code}
[GitHub] [lucene-solr] epugh commented on pull request #1676: SOLR-13973: Depricate Tika support in 8.7
epugh commented on pull request #1676:
URL: https://github.com/apache/lucene-solr/pull/1676#issuecomment-952283806

    We should have merged this PR! Oh well...
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434547#comment-17434547 ]

Robert Muir commented on LUCENE-10207:
--------------------------------------

we may be able to relax it slightly by computing worst-case cost, something like:
{code}
cost = numQueryTerms * (1 + terms.sumDocFreq - terms.size)
{code}
This will overestimate the cost when the field isn't anything like a unique-key field, but it will never underestimate it. So it would always be "safe" to use the IndexOrDocValuesQuery.

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -----------------------------------------------------
>
>                Key: LUCENE-10207
>                URL: https://issues.apache.org/jira/browse/LUCENE-10207
>            Project: Lucene - Core
>         Issue Type: Improvement
>           Reporter: Adrien Grand
>           Priority: Minor
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. However IndexOrDocValuesQuery only works well if the "index" query can give an estimation of the cost of the query without doing anything expensive (like looking up all terms of the TermInSetQuery in the terms dict). Maybe we could implement it for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a zipfian distribution and some terms have a much higher doc frequency than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently [~mikemccand] and [~gsmiller] again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
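[Editor's note: the same worst-case estimate, restated against the Terms statistics getters as a hedged sketch; the wrapper class and method are hypothetical and assume terms.size() is available, which it is for the default codec.]

{code}
import java.io.IOException;
import org.apache.lucene.index.Terms;

final class WorstCaseCost {
  // Worst case: assume the query's terms soak up all of the field's
  // "non-unique" postings. Overestimates on non-key fields, never underestimates.
  static long cost(long numQueryTerms, Terms terms) throws IOException {
    long extraPostings = terms.getSumDocFreq() - terms.size(); // 0 for a primary key
    return numQueryTerms * (1 + extraPostings);
  }
}
{code}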
[GitHub] [lucene] apanimesh061 commented on a change in pull request #412: LUCENE-10197: UnifiedHighlighter should use builders for thread-safety
apanimesh061 commented on a change in pull request #412:
URL: https://github.com/apache/lucene/pull/412#discussion_r736109899

## File path: lucene/highlighter/src/test/org/apache/lucene/search/uhighlight/TestUnifiedHighlighter.java
## @@ -460,6 +462,26 @@ public void testBuddhism() throws Exception {
     ir.close();
   }

+  public void testUnifiedHighlighterBuilder() throws Exception {

Review comment:
    This is not a real unit test. I only added it to demo that the builder can be sub-classed.
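[Editor's note: for readers following along, the pattern that demo test exercises looks roughly like the sketch below. All names are hypothetical stand-ins rather than the PR's actual API; the point is that configuration mutates on a builder, the built object is immutable (and therefore safely shareable across threads), and both the product and its builder can be subclassed.]

```java
// Hypothetical names, not the PR's API: illustrates an extensible builder.
class Highlighter {
  protected final int maxLength; // immutable once built

  protected Highlighter(Builder b) {
    this.maxLength = b.maxLength;
  }

  static class Builder {
    int maxLength = 10_000;

    Builder withMaxLength(int maxLength) {
      this.maxLength = maxLength;
      return this;
    }

    Highlighter build() {
      return new Highlighter(this);
    }
  }
}

// A subclass can extend both the product and its builder; the covariant
// build() override keeps call sites type-safe.
class CustomHighlighter extends Highlighter {
  CustomHighlighter(CustomBuilder b) {
    super(b);
  }

  static class CustomBuilder extends Highlighter.Builder {
    @Override
    CustomHighlighter build() {
      return new CustomHighlighter(this);
    }
  }
}
```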
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434572#comment-17434572 ]

Robert Muir commented on LUCENE-10207:
--------------------------------------

I think we can slightly tweak it (still completely safe) by doing:
{code}
cost = numQueryTerms + (terms.sumDocFreq - terms.size)
{code}
Similar to the previous comment, the cost is correct for the unique-key field. We assume that we'll match _all_ the "non-unique" postings as well, the worst case. But the overestimation is less aggressive.

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -----------------------------------------------------
>
>                Key: LUCENE-10207
>                URL: https://issues.apache.org/jira/browse/LUCENE-10207
>            Project: Lucene - Core
>         Issue Type: Improvement
>           Reporter: Adrien Grand
>           Priority: Minor
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. However IndexOrDocValuesQuery only works well if the "index" query can give an estimation of the cost of the query without doing anything expensive (like looking up all terms of the TermInSetQuery in the terms dict). Maybe we could implement it for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a zipfian distribution and some terms have a much higher doc frequency than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently [~mikemccand] and [~gsmiller] again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
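[Editor's note: a quick worked comparison of the two estimates on made-up field statistics shows how much gentler the additive form is.]

{code}
public class CostComparison {
  public static void main(String[] args) {
    // Made-up stats: 1,000,000 unique terms, 1,100,000 total postings, 10 query terms.
    long size = 1_000_000, sumDocFreq = 1_100_000, numQueryTerms = 10;

    long multiplied = numQueryTerms * (1 + sumDocFreq - size); // 10 * 100_001 = 1_000_010
    long additive = numQueryTerms + (sumDocFreq - size);       // 10 + 100_000 =   100_010

    // Both are safe upper bounds; the additive form overestimates ~10x less here.
    System.out.println(multiplied + " vs " + additive);
  }
}
{code}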
[jira] [Commented] (LUCENE-10207) Make TermInSetQuery usable with IndexOrDocValuesQuery
[ https://issues.apache.org/jira/browse/LUCENE-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434582#comment-17434582 ]

Robert Muir commented on LUCENE-10207:
--------------------------------------

Also, there is more we can do to better reflect the costs for these queries. For stuff like the existing SortedSetDocValuesRangeQuery, I feel like the {{matchCost}} is bogusly hardcoded for the "actually multivalued" case at:
{code}
@Override
public float matchCost() {
  return 2; // 2 comparisons
}
{code}
But this seems wrong? Matching is a loop. I feel like it should at least try to account for the multi-valued loop:
{code}
final float avgDVsPerDoc = terms.sumDocFreq / (float) terms.getDocCount();
...
@Override
public float matchCost() {
  return 2 * avgDVsPerDoc; // 2 comparisons in a loop over ordinals
}
{code}

> Make TermInSetQuery usable with IndexOrDocValuesQuery
> -----------------------------------------------------
>
>                Key: LUCENE-10207
>                URL: https://issues.apache.org/jira/browse/LUCENE-10207
>            Project: Lucene - Core
>         Issue Type: Improvement
>           Reporter: Adrien Grand
>           Priority: Minor
>
> IndexOrDocValuesQuery is very useful to pick the right execution mode for a query depending on other bits of the query tree.
> We would like to be able to use it to optimize execution of TermInSetQuery. However IndexOrDocValuesQuery only works well if the "index" query can give an estimation of the cost of the query without doing anything expensive (like looking up all terms of the TermInSetQuery in the terms dict). Maybe we could implement it for primary keys (terms.size() == sumDocFreq) by returning the number of terms of the query? Another idea is to multiply the number of terms by the average postings length, though this could be dangerous if the field has a zipfian distribution and some terms have a much higher doc frequency than the average.
> [~romseygeek] and I were discussing this a few weeks ago, and more recently [~mikemccand] and [~gsmiller] again independently. So it looks like there is interest in this. Here is an email thread where this was recently discussed: https://lists.apache.org/thread.html/re3b20a486c9a4e66b2ca4a2646e2d3be48535a90cdd95911a8445183%40%3Cdev.lucene.apache.org%3E.
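[Editor's note: for context on where {{matchCost}} sits, here is a self-contained sketch of a two-phase check over a multi-valued sorted-set field with the proposed per-ordinal accounting. It is an illustration only, not the actual SortedSetDocValuesRangeQuery implementation; avgDVsPerDoc would be precomputed from the field statistics as in the comment above.]

{code}
import java.io.IOException;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.search.TwoPhaseIterator;

// Sketch: verify a range of ordinals per doc; matchCost() reflects the loop.
final class OrdRangeTwoPhase extends TwoPhaseIterator {
  private final SortedSetDocValues values;
  private final long minOrd, maxOrd;
  private final float avgDVsPerDoc; // e.g. sumDocFreq / (float) docCount

  OrdRangeTwoPhase(SortedSetDocValues values, long minOrd, long maxOrd, float avgDVsPerDoc) {
    super(values); // the approximation: all docs that have a value
    this.values = values;
    this.minOrd = minOrd;
    this.maxOrd = maxOrd;
    this.avgDVsPerDoc = avgDVsPerDoc;
  }

  @Override
  public boolean matches() throws IOException {
    // loop over the current doc's ordinals, two comparisons each
    for (long ord = values.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = values.nextOrd()) {
      if (ord >= minOrd && ord <= maxOrd) {
        return true;
      }
    }
    return false;
  }

  @Override
  public float matchCost() {
    return 2 * avgDVsPerDoc; // 2 comparisons in a loop over ordinals
  }
}
{code}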
[GitHub] [lucene-solr] thelabdude opened a new pull request #2595: LUCENE-10141: Add the next minor version on Lucene's main branch in the split repo so the backcompat_master task works
thelabdude opened a new pull request #2595:
URL: https://github.com/apache/lucene-solr/pull/2595

    I think the reason the `addBackcompatIndexes.py` script failed (`backcompat_master` step) when I built 8.10 was the missing Version info for 8_11, see: https://issues.apache.org/jira/browse/LUCENE-10131

    So this PR adds a task to run the `addVersion.py` script for Lucene's main branch (in the split-out repo) so that the `backcompat_master` step works later in the release process.
[GitHub] [lucene-solr] thelabdude commented on pull request #2595: LUCENE-10141: Add the next minor version on Lucene's main branch in the split repo so the backcompat_master task works
thelabdude commented on pull request #2595:
URL: https://github.com/apache/lucene-solr/pull/2595#issuecomment-952381002

    Not sure I have all the git commands right here ...
[GitHub] [lucene] jtibshirani commented on a change in pull request #413: LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0
jtibshirani commented on a change in pull request #413:
URL: https://github.com/apache/lucene/pull/413#discussion_r737016446

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java
## @@ -60,7 +60,8 @@ public KnnVectorQuery(String field, float[] target, int k) {
   public Query rewrite(IndexReader reader) throws IOException {
     TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
     for (LeafReaderContext ctx : reader.leaves()) {
-      perLeafResults[ctx.ord] = searchLeaf(ctx, Math.min(k, reader.numDocs()));
+      int numDocs = ctx.reader().numDocs();
+      perLeafResults[ctx.ord] = numDocs > 0 ? searchLeaf(ctx, Math.min(k, numDocs)) : NO_RESULTS;

Review comment:
    This makes sense to me, I pushed a change. Instead of `Lucene90HnswVectorsReader`, I thought it could make sense to apply the bound in `HnswGraph`. But this turned out messier because there are separate concepts for `topK` and `numSeed` (we're cleaning this up as part of [LUCENE-10054](https://issues.apache.org/jira/browse/LUCENE-10054)).
[GitHub] [lucene] msokolov commented on a change in pull request #413: LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0
msokolov commented on a change in pull request #413:
URL: https://github.com/apache/lucene/pull/413#discussion_r737073221

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java
## @@ -60,7 +60,8 @@ public KnnVectorQuery(String field, float[] target, int k) {
   public Query rewrite(IndexReader reader) throws IOException {
     TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
     for (LeafReaderContext ctx : reader.leaves()) {
-      perLeafResults[ctx.ord] = searchLeaf(ctx, Math.min(k, reader.numDocs()));
+      int numDocs = ctx.reader().numDocs();
+      perLeafResults[ctx.ord] = numDocs > 0 ? searchLeaf(ctx, Math.min(k, numDocs)) : NO_RESULTS;

Review comment:
    Thanks for fixing this - it makes sense to me to use `size()` instead of `numDocs()`, or even simply `k`; I wasn't aware of the costly nature of that call. Indeed the idea here was just to avoid spending extra work on tiny segments; something I noticed all the time in tests, but which is probably not much of an issue in reality.
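[Editor's note: a minimal sketch of the bound being discussed, assuming a helper where the per-segment vector count is cheap to read. The class is illustrative; in the actual change the clamp lives inside the vectors reader.]

```java
import org.apache.lucene.index.VectorValues;

final class KnnBounds {
  // Clamp k to the number of documents that actually have a vector in this
  // segment: a larger result heap can never be filled, so oversizing it is
  // wasted work on tiny segments.
  static int boundK(int k, VectorValues vectors) {
    return Math.min(k, vectors.size());
  }
}
```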
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #2594: SOLR-14726: Initial draft of a new quickstart guide
noblepaul commented on a change in pull request #2594:
URL: https://github.com/apache/lucene-solr/pull/2594#discussion_r737074614

## File path: solr/solr-ref-guide/src/quickstart.adoc
## @@ -0,0 +1,140 @@
+= Quickstart Guide
+:experimental:
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+Here's a quickstart guide to start Solr, add some documents and perform some searches.
+
+== Starting Solr
+
+Start a Solr node in cluster mode (SolrCloud mode):
+
+[source,subs="verbatim,attributes+"]
+$ bin/solr -c
+
+Waiting up to 180 seconds to see Solr running on port 8983 [\]
+Started Solr server on port 8983 (pid=34942). Happy searching!
+
+To start another Solr node and have it join the cluster alongside the first node:
+
+[source,subs="verbatim,attributes+"]
+$ bin/solr -c -z localhost:9983 -p 8984
+
+An instance of the cluster coordination service, i.e. ZooKeeper, was started on port 9983 when the first node was started. To start ZooKeeper separately, please refer to .
+
+== Creating a collection
+
+Like a database system holds data in tables, Solr holds data in collections. A collection can be created as follows:
+
+[source,subs="verbatim,attributes+"]
+$ curl --request POST \
+  --url http://localhost:8983/api/collections \
+  --header 'Content-Type: application/json' \
+  --data '{
+  "create": {
+    "name": "techproducts",
+    "numShards": 1,
+    "replicationFactor": 1

Review comment:
    why no `config` attribute?