[jira] [Created] (SOLR-14890) Refactor code to use annotations for cluster API
Noble Paul created SOLR-14890:
----------------------------------

Summary: Refactor code to use annotations for cluster API
Key: SOLR-14890
URL: https://issues.apache.org/jira/browse/SOLR-14890
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Noble Paul
Assignee: Noble Paul
[GitHub] [lucene-solr] arafalov commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode
arafalov commented on pull request #1863: URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697179182

Strong words there, "worse than useless", especially considering that this, to me, seems a strong improvement on the current schemaless mode, as it looks at more values and actually supports single/multivalued fields. In general, I was trying to implement Hoss's proposal, but I am open to other ideas if we can clarify the use case.

My understanding is that the use case is having a lot of data whose shape one does not quite know. So, they want to index it quickly, explore, and then do some manual adjustments. I am not expecting this to be anywhere near production. Schemaless mode should not have been either. I am not sure how many people will know how to do step 6, but currently they don't even have that option. Switching from single-valued to multi-valued is impossible (very hard?) once the actual values are in the index. One basically has to delete everything and start again, as happens in the films example if one misses the README. With this one, they can look at the field definitions in the Admin UI and remove or add fields as required without the underlying Lucene indexes throwing complaints.

The way I am seeing this (as well as for the other example) is to have a super-minimal learning configuration where every additional field is quite obvious. That learning schema, clearly, would not need step 2, as it would be all set up. I thought your question was about how you would test the code for yourself. Additionally, to help see what was changed, I think tagging the JIRA could be helpful. And frankly, in my imagination, it is not a cloud setup, but a simple learning one. Whether that, by itself, is a breaking point for you, we shall have to see.

Generating schema JSON raises its own questions, such as the shape of the schema it will be applied to, since guessing currently happens as a differential against the existing schema. Also, this does not seem like code that should live in this particular URP; it is more of a general utility. If one existed, maybe it would make sense to build on top of it.

In general, I am open to implementing this any way that seems most useful. I will wait for another couple of opinions rather than chasing one very strong one.
[GitHub] [lucene-solr] s1monw commented on pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader
s1monw commented on pull request #1909: URL: https://github.com/apache/lucene-solr/pull/1909#issuecomment-697184182

> I think our current DV format pulls doc values for a single field several times when flushing/merging, e.g. first to figure out whether the field is single-valued and how many bits per value are needed, and a second time to actually write data. Should we at least cache the last DVs that got pulled so that the second time you pull them, we don't re-do a lot of work?

That's correct; some of them are pulled something like 5 times. I added a very simple cache and assertions that make sure we can reuse the same instance if it's pulled more than once.
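As background for the quoted observation, here is a minimal sketch (not Lucene's actual writer code; `producer` and `field` are stand-ins) of why doc values get pulled repeatedly: the iterators are consume-once, so a stats pass and a write pass each need their own instance, which is what makes a small last-instance cache in SortingCodecReader worthwhile.

```java
import java.io.IOException;
import org.apache.lucene.codecs.DocValuesProducer;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;

class TwoPassDocValuesSketch {
  // Pass 1 decides the encoding, pass 2 writes the data; each pass needs a fresh iterator.
  static void writeField(DocValuesProducer producer, FieldInfo field) throws IOException {
    NumericDocValues stats = producer.getNumeric(field); // pull #1
    long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
    for (int doc = stats.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = stats.nextDoc()) {
      min = Math.min(min, stats.longValue());
      max = Math.max(max, stats.longValue());
    }
    // The first iterator is now exhausted, so the values must be pulled again to be written.
    NumericDocValues values = producer.getNumeric(field); // pull #2
    // ... encode each value using the representation chosen from min/max ...
  }
}
```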
[GitHub] [lucene-solr] arafalov commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode
arafalov commented on pull request #1863: URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697188648

Also, I am not even sure there is a pathway to return a non-error message from a commit that bin/post will echo to the user as a positive statement. For queries, yes. But we are talking about an Update handler with a URP chain.
[GitHub] [lucene-solr] noblepaul commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode
noblepaul commented on pull request #1863: URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697194702

> Strong words there "worse than useless", especially considering that this - to me - seems a strong improvement on the current schemaless mode as it looks at more values and actually supports single/multivalued fields.

I was referring to the current solution we have in Solr (the schemaless, guess-schema thing). It's not a comment on the new solution. The current solution is indeed worse than useless.

> Generating Schema JSON raises its own questions, such as the shape of the schema it will be applied to, as guessing is currently happening as a differential to the existing schema.

The command is only relevant for that moment. If you execute it right away, it's useful. Users will most likely just copy-paste the command (and edit it, if required).
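To make "copy-paste the command" concrete, here is a hedged sketch of issuing such a schema command through SolrJ's Schema API; the collection URL and the field attributes are made up for illustration, and the actual command the URP would emit may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class AddGuessedFieldSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical collection URL; adjust to your setup.
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/films").build()) {
      // A field definition of the kind the guesser might report; edit before applying.
      Map<String, Object> field = new LinkedHashMap<>();
      field.put("name", "genre");
      field.put("type", "string");
      field.put("multiValued", true);

      SchemaResponse.UpdateResponse rsp = new SchemaRequest.AddField(field).process(client);
      System.out.println("add-field status: " + rsp.getStatus());
    }
  }
}
```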
[GitHub] [lucene-solr] noblepaul opened a new pull request #1911: SOLR-14890: Refactor code to use annotations for cluster API
noblepaul opened a new pull request #1911: URL: https://github.com/apache/lucene-solr/pull/1911

# Description

Please provide a short description of the changes you're making with this pull request.

# Solution

Please provide a short description of the approach taken to implement your solution.

# Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

# Checklist

Please review the following and check all that apply:

- [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `master` branch.
- [ ] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader
jpountz commented on a change in pull request #1909: URL: https://github.com/apache/lucene-solr/pull/1909#discussion_r493279172

## File path: lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java

## @@ -510,4 +457,52 @@ public LeafMetaData getMetaData() {
     return metaData;
   }
+
+  // we try to cache the last used DV or Norms instance since during merge
+  // this instance is used more than once. We could in addition to this single instance
+  // also cache the fields that are used for sorting since we do the work twice for these fields
+  private String cachedField;
+  private Object cachedObject;
+  private boolean cacheIsNorms;
+
+  private <T> T getOrCreateNorms(String field, IOSupplier<T> supplier) throws IOException {
+    return getOrCreate(field, true, supplier);
+  }
+
+  @SuppressWarnings("unchecked")
+  private synchronized <T> T getOrCreate(String field, boolean norms, IOSupplier<T> supplier) throws IOException {
+    if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
+      assert assertCreatedOnlyOnce(field, norms);
+      cachedObject = supplier.get();
+      cachedField = field;
+      cacheIsNorms = norms;
+    }
+    assert cachedObject != null;
+    return (T) cachedObject;
+  }
+
+  private final Map<String, Integer> cacheStats = new HashMap<>(); // only with assertions enabled
+  private boolean assertCreatedOnlyOnce(String field, boolean norms) {
+    assert Thread.holdsLock(this);
+    // this is mainly there to make sure that if we change anything in the way we merge we realize it early
+    Integer timesCached = cacheStats.compute(field + "N:" + norms, (s, i) -> i == null ? 1 : i.intValue() + 1);
+    if (timesCached > 1) {
+      assert norms == false : "[" + field + "] norms must not be cached twice";

Review comment:
I think we might cache norms twice if full-text is indexed, as we'd pull norms once for merging norms, and another time to index impacts in postings for the same field.
[GitHub] [lucene-solr] noblepaul closed pull request #1599: SOLR-14586: replace the second function parameter in computeIfAbsent …
noblepaul closed pull request #1599: URL: https://github.com/apache/lucene-solr/pull/1599
[jira] [Created] (SOLR-14891) Upgrade Jetty to 9.4.28+ to fix Startup Warning
Bernd Wahlen created SOLR-14891:
-----------------------------------

Summary: Upgrade Jetty to 9.4.28+ to fix Startup Warning
Key: SOLR-14891
URL: https://issues.apache.org/jira/browse/SOLR-14891
Project: Solr
Issue Type: Wish
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Bernd Wahlen

Solr currently uses Jetty 9.4.27, which logs a strange warning at startup. I think this is fixed in 9.4.28: https://github.com/eclipse/jetty.project/issues/4631

2020-09-23 09:57:57.346 WARN (main) [ ] o.e.j.x.XmlConfiguration Ignored arg:
<Arg>
  <New class="com.codahale.metrics.jetty9.InstrumentedQueuedThreadPool">
    <Arg name="registry">
      <Call class="com.codahale.metrics.SharedMetricRegistries" name="getOrCreate"><Arg>solr.jetty</Arg></Call>
    </Arg>
  </New>
</Arg>
[jira] [Closed] (SOLR-14357) solrj: using insecure namedCurves
[ https://issues.apache.org/jira/browse/SOLR-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Wahlen closed SOLR-14357.
-------------------------------

> solrj: using insecure namedCurves
> ---------------------------------
>
> Key: SOLR-14357
> URL: https://issues.apache.org/jira/browse/SOLR-14357
> Project: Solr
> Issue Type: Bug
> Reporter: Bernd Wahlen
> Priority: Major
>
> I tried to run our backend with solrj 8.4.1 on JDK 14 and got the following error:
> Caused by: java.lang.IllegalArgumentException: Error in security property. Constraint unknown: c2tnb191v1
> After I removed all the X9.62 algorithms from the property jdk.disabled.namedCurves in /usr/lib/jvm/java-14-openjdk-14.0.0.36-1.rolling.el7.x86_64/conf/security/java.security, everything is running.
> This does not happen on staging (I think because there is only 1 Solr node, so the LB client is not used).
> We do not set or change any SSL settings in solr.in.sh.
> I don't know how to fix that (default config? Apache client settings?), but I think using insecure algorithms may be a security risk and not only a JDK 14 issue.
[jira] [Closed] (SOLR-13862) JDK 13+Shenandoah stability/recovery problems
[ https://issues.apache.org/jira/browse/SOLR-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Wahlen closed SOLR-13862.
-------------------------------

> JDK 13+Shenandoah stability/recovery problems
> ---------------------------------------------
>
> Key: SOLR-13862
> URL: https://issues.apache.org/jira/browse/SOLR-13862
> Project: Solr
> Issue Type: Bug
> Affects Versions: 8.2
> Reporter: Bernd Wahlen
> Priority: Major
>
> After updating my cluster (CentOS 7.7, Solr 8.2, JDK 12) to JDK 13 (3 nodes, 4 collections, 1 shard), everything was running well (with lower p95) for some hours. Then 2 nodes (not the leader) went into recovery state, failing with ~"Recovery failed: Error opening new searcher". I tried a rolling restart of the cluster, but recovery did not work. After I switched to JDK 11, recovery worked again. In summary, JDK 11 and JDK 12 were running stable, JDK 13 was not.
> This is my solr.in.sh:
> GC_TUNE="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"
> SOLR_TIMEZONE="CET"
> GC_LOG_OPTS="-Xlog:gc*:file=/var/log/solr/solr_gc.log:time:filecount=9,filesize=20M:safepoint"
> I also tried ADDREPLICA during my attempt to repair the cluster, which caused Out of Memory on JDK 13 and worked after going back to JDK 11.
[jira] [Updated] (SOLR-14891) Upgrade Jetty to 9.4.28+ to fix Startup Warning
[ https://issues.apache.org/jira/browse/SOLR-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Wahlen updated SOLR-14891:
--------------------------------
Affects Version/s: 8.6.2

> Upgrade Jetty to 9.4.28+ to fix Startup Warning
> -----------------------------------------------
>
> Key: SOLR-14891
> URL: https://issues.apache.org/jira/browse/SOLR-14891
> Project: Solr
> Issue Type: Wish
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 8.6.2
> Reporter: Bernd Wahlen
> Priority: Minor
>
> Solr currently uses Jetty 9.4.27, which logs a strange warning at startup. I think this is fixed in 9.4.28: https://github.com/eclipse/jetty.project/issues/4631
>
> 2020-09-23 09:57:57.346 WARN (main) [ ] o.e.j.x.XmlConfiguration Ignored arg:
> <Arg>
>   <New class="com.codahale.metrics.jetty9.InstrumentedQueuedThreadPool">
>     <Arg name="registry">
>       <Call class="com.codahale.metrics.SharedMetricRegistries" name="getOrCreate"><Arg>solr.jetty</Arg></Call>
>     </Arg>
>   </New>
> </Arg>
[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents
[ https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200698#comment-17200698 ]

Adrien Grand commented on LUCENE-9535:
--------------------------------------

I tried to reproduce the slowdown locally, but results do not look significant. Since I don't have as many cores as Mike's beast, only 24, I ran with half the index buffer size and half the number of threads, i.e. 1024MB of index buffer and 18 threads, on the wikimediumall corpus.

Baseline (master):
- 247GB/h, 224 flushes
- 259GB/h, 225 flushes
- 248GB/h, 226 flushes
- 262GB/h, 224 flushes

Patch (stored fields ignored in IndexingChain memory accounting):
- 256GB/h, 224 flushes
- 258GB/h, 223 flushes

While the nightly benchmarks are seeing a ~10% slowdown, I'm not seeing a significant change. I'm running out of ideas, so I will decrease the block size of stored fields later today to see whether that makes a difference for the nightly benchmarks, which might help confirm whether stored fields are actually the problem or whether it's something else.

> Investigate recent indexing slowdown for wikimedium documents
> --------------------------------------------------------------
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: cpu_profile.svg
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 9th: http://people.apache.org/~mikemccand/lucenebench/indexing.html.
> On that day, we added stored fields to DWPT accounting (LUCENE-9511), so I first thought this could be due to smaller flushed segments and more merging, but I still wonder whether there's something else. The benchmark runs with 8GB of heap, 2GB of RAM buffer and 36 indexing threads, so it's about 2GB/36 = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get full at the same time. Stored fields account for about 0.7MB of memory, or 1% of the indexing buffer size. How can a 1% reduction of buffering capacity explain a 10% indexing slowdown? I looked into this further by running indexing benchmarks locally with 8 indexing threads and 128MB of indexing buffer memory, which would make this issue even more apparent if the smaller RAM buffer were the cause, but I'm not seeing a regression, and I'm actually seeing a similar number of flushes when I disable memory accounting for stored fields.
> I ran indexing under a profiler to see whether something else could cause this slowdown, e.g. slow implementations of ramBytesUsed on stored fields writers, but nothing surprising showed up and the profile looked just like I would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.
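To ground the buffer arithmetic quoted above, here is a small self-contained sketch (not the nightly benchmark harness; the index path and analyzer are placeholders) of an IndexWriter configured with the described 2GB shared RAM buffer, and the resulting worst-case per-thread share:

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class BufferMathSketch {
  public static void main(String[] args) throws Exception {
    int indexingThreads = 36;   // as described for the nightly run
    double ramBufferMB = 2048;  // 2GB buffer shared by all DWPTs
    IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer())
        .setRAMBufferSizeMB(ramBufferMB);
    try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("/tmp/bench-idx")), cfg)) {
      // Worst case: all DWPTs fill up at the same time.
      System.out.printf("per-thread buffer: %.0f MB%n", ramBufferMB / indexingThreads); // ~57 MB
    }
  }
}
```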
[GitHub] [lucene-solr] Hronom commented on a change in pull request #1864: SOLR-14850 ExactStatsCache NullPointerException when shards.tolerant=true
Hronom commented on a change in pull request #1864: URL: https://github.com/apache/lucene-solr/pull/1864#discussion_r493408910

## File path: solr/core/src/java/org/apache/solr/search/stats/ExactStatsCache.java

## @@ -94,6 +94,12 @@ protected ShardRequest doRetrieveStatsRequest(ResponseBuilder rb) {
   protected void doMergeToGlobalStats(SolrQueryRequest req, List<ShardResponse> responses) {
     Set<String> allTerms = new HashSet<>();
     for (ShardResponse r : responses) {
+      if ("true".equalsIgnoreCase(req.getParams().get(ShardParams.SHARDS_TOLERANT)) && r.getException() != null) {

Review comment:
@sigram thank you for the details on where to put this, let me work on this, I will ping you back when it's done
[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents
[ https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200713#comment-17200713 ]

ASF subversion and git services commented on LUCENE-9535:
----------------------------------------------------------

Commit 12dd19427e4888421202115fd86d87d0bb04eae6 in lucene-solr's branch refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=12dd194 ]

LUCENE-9535: Reduce the size of compressed blocks of stored fields by 2x, in order to see whether this has any effect on nightly benchmarks.

> Investigate recent indexing slowdown for wikimedium documents
> --------------------------------------------------------------
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: cpu_profile.svg
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 9th: http://people.apache.org/~mikemccand/lucenebench/indexing.html.
> On that day, we added stored fields to DWPT accounting (LUCENE-9511), so I first thought this could be due to smaller flushed segments and more merging, but I still wonder whether there's something else. The benchmark runs with 8GB of heap, 2GB of RAM buffer and 36 indexing threads, so it's about 2GB/36 = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get full at the same time. Stored fields account for about 0.7MB of memory, or 1% of the indexing buffer size. How can a 1% reduction of buffering capacity explain a 10% indexing slowdown? I looked into this further by running indexing benchmarks locally with 8 indexing threads and 128MB of indexing buffer memory, which would make this issue even more apparent if the smaller RAM buffer were the cause, but I'm not seeing a regression, and I'm actually seeing a similar number of flushes when I disable memory accounting for stored fields.
> I ran indexing under a profiler to see whether something else could cause this slowdown, e.g. slow implementations of ramBytesUsed on stored fields writers, but nothing surprising showed up and the profile looked just like I would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.
[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents
[ https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200714#comment-17200714 ]

ASF subversion and git services commented on LUCENE-9535:
----------------------------------------------------------

Commit 12664ddbc188c4c1c7f73de7493f341befe32fd0 in lucene-solr's branch refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=12664dd ]

LUCENE-9535: Reduce the size of compressed blocks of stored fields by 2x, in order to see whether this has any effect on nightly benchmarks.

> Investigate recent indexing slowdown for wikimedium documents
> --------------------------------------------------------------
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: cpu_profile.svg
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 9th: http://people.apache.org/~mikemccand/lucenebench/indexing.html.
> On that day, we added stored fields to DWPT accounting (LUCENE-9511), so I first thought this could be due to smaller flushed segments and more merging, but I still wonder whether there's something else. The benchmark runs with 8GB of heap, 2GB of RAM buffer and 36 indexing threads, so it's about 2GB/36 = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get full at the same time. Stored fields account for about 0.7MB of memory, or 1% of the indexing buffer size. How can a 1% reduction of buffering capacity explain a 10% indexing slowdown? I looked into this further by running indexing benchmarks locally with 8 indexing threads and 128MB of indexing buffer memory, which would make this issue even more apparent if the smaller RAM buffer were the cause, but I'm not seeing a regression, and I'm actually seeing a similar number of flushes when I disable memory accounting for stored fields.
> I ran indexing under a profiler to see whether something else could cause this slowdown, e.g. slow implementations of ramBytesUsed on stored fields writers, but nothing surprising showed up and the profile looked just like I would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.
[GitHub] [lucene-solr] jpountz opened a new pull request #1912: LUCENE-9535: Try to do larger flushes.
jpountz opened a new pull request #1912: URL: https://github.com/apache/lucene-solr/pull/1912

DWPTPool currently always returns the last DWPT that was added to the pool. By returning the largest DWPT instead, we could do larger flushes by finishing DWPTs that are close to being full, instead of the last one that was added to the pool, which might be close to being empty.

When indexing wikimediumall, this change did not seem to improve the indexing rate significantly, but it didn't slow things down either, and the number of flushes went from 224-226 to 216, about 4% fewer. My expectation is that our nightly benchmarks are a best-case scenario for DWPTPool, as the same number of threads is dedicated to indexing over time; but in the case where you have e.g. a single fixed threadpool that is responsible for indexing into several indices, the number of indexing threads that contribute to a given index might vary greatly over time.
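The selection-policy change described above, reduced to a toy sketch; this is not the real DWPTPool (whose internals are package-private), just the gist of preferring the fullest pending writer, under an assumed PooledWriter interface:

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;

class LargestFirstPoolSketch {
  interface PooledWriter { long ramBytesUsed(); }

  private final Deque<PooledWriter> freeList = new ArrayDeque<>();

  void release(PooledWriter w) { freeList.addLast(w); }

  // Old policy: return the most recently released writer (may be nearly empty).
  PooledWriter takeLast() { return freeList.pollLast(); }

  // Proposed policy: return the writer closest to its flush threshold,
  // so the flush it eventually triggers writes a larger segment.
  PooledWriter takeLargest() {
    PooledWriter largest = freeList.stream()
        .max(Comparator.comparingLong(PooledWriter::ramBytesUsed))
        .orElse(null);
    if (largest != null) {
      freeList.remove(largest);
    }
    return largest;
  }
}
```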
[GitHub] [lucene-solr] jimczi commented on a change in pull request #1903: Fix bug in sort optimization
jimczi commented on a change in pull request #1903: URL: https://github.com/apache/lucene-solr/pull/1903#discussion_r493439005

## File path: lucene/core/src/test/org/apache/lucene/search/TestFieldSortOptimizationSkipping.java

## @@ -432,7 +439,48 @@ public void testDocSortOptimization() throws IOException {
       assertTrue(topDocs.totalHits.value < 10); // assert that very few docs were collected
     }
+    reader.close();
+    dir.close();
+  }
+
+  /**
+   * Test that sorting on _doc works correctly.
+   * This test goes through DefaultBulkSorter::scoreRange, where scorerIterator is BitSetIterator.
+   * As a conjunction of this BitSetIterator with DocComparator's iterator, we get BitSetConjunctionDISI.
+   * BitSetConjunctionDISI advances based on the DocComparator's iterator, and doesn't consider
+   * that its BitSetIterator may have advanced past a certain doc.

Review comment:
Should we consider this a bug in `BitSetIterator`?
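For reference, here is the invariant a conjunction iterator has to keep, shown as a simplified leapfrog sketch over two DocIdSetIterators (illustrative only, not BitSetConjunctionDISI itself): neither side may be treated as if it were behind a position it has already advanced past.

```java
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

class LeapfrogSketch {
  // Advance two iterators to their next common doc >= target.
  static int nextCommon(DocIdSetIterator a, DocIdSetIterator b, int target) throws IOException {
    int docA = a.docID() < target ? a.advance(target) : a.docID();
    while (docA != DocIdSetIterator.NO_MORE_DOCS) {
      // Never rewind b: respect how far it has already advanced.
      int docB = b.docID() < docA ? b.advance(docA) : b.docID();
      if (docB == docA) {
        return docA; // both iterators agree on this doc
      }
      docA = a.advance(docB); // leapfrog a up to b's position
    }
    return DocIdSetIterator.NO_MORE_DOCS;
  }
}
```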
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader
s1monw commented on a change in pull request #1909: URL: https://github.com/apache/lucene-solr/pull/1909#discussion_r493477140

## File path: lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java

## @@ -510,4 +457,52 @@ public LeafMetaData getMetaData() {
     return metaData;
   }
+
+  // we try to cache the last used DV or Norms instance since during merge
+  // this instance is used more than once. We could in addition to this single instance
+  // also cache the fields that are used for sorting since we do the work twice for these fields
+  private String cachedField;
+  private Object cachedObject;
+  private boolean cacheIsNorms;
+
+  private <T> T getOrCreateNorms(String field, IOSupplier<T> supplier) throws IOException {
+    return getOrCreate(field, true, supplier);
+  }
+
+  @SuppressWarnings("unchecked")
+  private synchronized <T> T getOrCreate(String field, boolean norms, IOSupplier<T> supplier) throws IOException {
+    if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
+      assert assertCreatedOnlyOnce(field, norms);
+      cachedObject = supplier.get();
+      cachedField = field;
+      cacheIsNorms = norms;
+    }
+    assert cachedObject != null;
+    return (T) cachedObject;
+  }
+
+  private final Map<String, Integer> cacheStats = new HashMap<>(); // only with assertions enabled
+  private boolean assertCreatedOnlyOnce(String field, boolean norms) {
+    assert Thread.holdsLock(this);
+    // this is mainly there to make sure that if we change anything in the way we merge we realize it early
+    Integer timesCached = cacheStats.compute(field + "N:" + norms, (s, i) -> i == null ? 1 : i.intValue() + 1);
+    if (timesCached > 1) {
+      assert norms == false : "[" + field + "] norms must not be cached twice";

Review comment:
Can you point me to the place where we do this? If that is the case, our tests are not good enough here.
[GitHub] [lucene-solr] s1monw commented on a change in pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader
s1monw commented on a change in pull request #1909: URL: https://github.com/apache/lucene-solr/pull/1909#discussion_r493482012

## File path: lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java

## @@ -510,4 +457,52 @@ public LeafMetaData getMetaData() {
     return metaData;
   }
+
+  // we try to cache the last used DV or Norms instance since during merge
+  // this instance is used more than once. We could in addition to this single instance
+  // also cache the fields that are used for sorting since we do the work twice for these fields
+  private String cachedField;
+  private Object cachedObject;
+  private boolean cacheIsNorms;
+
+  private <T> T getOrCreateNorms(String field, IOSupplier<T> supplier) throws IOException {
+    return getOrCreate(field, true, supplier);
+  }
+
+  @SuppressWarnings("unchecked")
+  private synchronized <T> T getOrCreate(String field, boolean norms, IOSupplier<T> supplier) throws IOException {
+    if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
+      assert assertCreatedOnlyOnce(field, norms);
+      cachedObject = supplier.get();
+      cachedField = field;
+      cacheIsNorms = norms;
+    }
+    assert cachedObject != null;
+    return (T) cachedObject;
+  }
+
+  private final Map<String, Integer> cacheStats = new HashMap<>(); // only with assertions enabled
+  private boolean assertCreatedOnlyOnce(String field, boolean norms) {
+    assert Thread.holdsLock(this);
+    // this is mainly there to make sure that if we change anything in the way we merge we realize it early
+    Integer timesCached = cacheStats.compute(field + "N:" + norms, (s, i) -> i == null ? 1 : i.intValue() + 1);
+    if (timesCached > 1) {
+      assert norms == false : "[" + field + "] norms must not be cached twice";

Review comment:
I think what we do here is pull the already-merged norms instance from disk instead of the one from the source reader. Is that what you mean in `PushPostingsWriterBase`?
[GitHub] [lucene-solr] noblepaul edited a comment on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode
noblepaul edited a comment on pull request #1863: URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697194702

> Strong words there "worse than useless", especially considering that this - to me - seems a strong improvement on the current schemaless mode as it looks at more values and actually supports single/multivalued fields.

I was referring to the current solution we have in Solr (the schemaless, guess-schema thing). It's not a comment on the new solution. The current schemaless is indeed worse than useless.

> Generating Schema JSON raises its own questions, such as the shape of the schema it will be applied to, as guessing is currently happening as a differential to the existing schema.

The command is only relevant for that moment. If you execute it right away, it's useful. Users will most likely just copy-paste the command (and edit it, if required).
[GitHub] [lucene-solr] noblepaul edited a comment on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode
noblepaul edited a comment on pull request #1863: URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697194702

> Strong words there "worse than useless", especially considering that this - to me - seems a strong improvement on the current schemaless mode as it looks at more values and actually supports single/multivalued fields.

I'm sorry for the confusion. I was referring to the current solution we have in Solr (the schemaless, guess-schema thing). It's not a comment on the new solution. The current schemaless is indeed worse than useless.

> Generating Schema JSON raises its own questions, such as the shape of the schema it will be applied to, as guessing is currently happening as a differential to the existing schema.

The command is only relevant for that moment. If you execute it right away, it's useful. Users will most likely just copy-paste the command (and edit it, if required).
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader
jpountz commented on a change in pull request #1909: URL: https://github.com/apache/lucene-solr/pull/1909#discussion_r493490751

## File path: lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java

## @@ -510,4 +457,52 @@ public LeafMetaData getMetaData() {
     return metaData;
  }
+
+  // we try to cache the last used DV or Norms instance since during merge
+  // this instance is used more than once. We could in addition to this single instance
+  // also cache the fields that are used for sorting since we do the work twice for these fields
+  private String cachedField;
+  private Object cachedObject;
+  private boolean cacheIsNorms;
+
+  private <T> T getOrCreateNorms(String field, IOSupplier<T> supplier) throws IOException {
+    return getOrCreate(field, true, supplier);
+  }
+
+  @SuppressWarnings("unchecked")
+  private synchronized <T> T getOrCreate(String field, boolean norms, IOSupplier<T> supplier) throws IOException {
+    if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
+      assert assertCreatedOnlyOnce(field, norms);
+      cachedObject = supplier.get();
+      cachedField = field;
+      cacheIsNorms = norms;
+    }
+    assert cachedObject != null;
+    return (T) cachedObject;
+  }
+
+  private final Map<String, Integer> cacheStats = new HashMap<>(); // only with assertions enabled
+  private boolean assertCreatedOnlyOnce(String field, boolean norms) {
+    assert Thread.holdsLock(this);
+    // this is mainly there to make sure that if we change anything in the way we merge we realize it early
+    Integer timesCached = cacheStats.compute(field + "N:" + norms, (s, i) -> i == null ? 1 : i.intValue() + 1);
+    if (timesCached > 1) {
+      assert norms == false : "[" + field + "] norms must not be cached twice";

Review comment:
Ah, I had forgotten we were doing things this way. Then ignore my comment!
[jira] [Updated] (SOLR-14890) Refactor code to use annotations for configset API
[ https://issues.apache.org/jira/browse/SOLR-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-14890:
------------------------------
Summary: Refactor code to use annotations for configset API (was: Refactor code to use annotations for cluster API)

> Refactor code to use annotations for configset API
> ---------------------------------------------------
>
> Key: SOLR-14890
> URL: https://issues.apache.org/jira/browse/SOLR-14890
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
[GitHub] [lucene-solr] noblepaul merged pull request #1911: SOLR-14890: Refactor code to use annotations for configset API
noblepaul merged pull request #1911: URL: https://github.com/apache/lucene-solr/pull/1911
[jira] [Commented] (SOLR-14890) Refactor code to use annotations for configset API
[ https://issues.apache.org/jira/browse/SOLR-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200779#comment-17200779 ]

ASF subversion and git services commented on SOLR-14890:
---------------------------------------------------------

Commit fd0c08615df9440061e5ae664dcfa3f5a7600568 in lucene-solr's branch refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd0c086 ]

SOLR-14890: Refactor code to use annotations for configset API (#1911)

> Refactor code to use annotations for configset API
> ---------------------------------------------------
>
> Key: SOLR-14890
> URL: https://issues.apache.org/jira/browse/SOLR-14890
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Assignee: Noble Paul
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
[GitHub] [lucene-solr] cpoerschke opened a new pull request #1913: SOLR-11167: Avoid $SOLR_STOP_WAIT use during 'bin/solr start' if $SOLR_START_WAIT is supplied.
cpoerschke opened a new pull request #1913: URL: https://github.com/apache/lucene-solr/pull/1913

https://issues.apache.org/jira/browse/SOLR-11167
[jira] [Commented] (SOLR-11167) bin/solr uses $SOLR_STOP_WAIT during start
[ https://issues.apache.org/jira/browse/SOLR-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200786#comment-17200786 ]

Christine Poerschke commented on SOLR-11167:
--------------------------------------------

Oops, a three-year-old ticket; not quite sure what happened here, apologies [~omar_abdelnabi]. Thanks for attaching a patch!

After all this time the patch unfortunately no longer applies to the current master branch. Hence I've replaced it with https://github.com/apache/lucene-solr/pull/1913 instead, with two small differences:
* {{solr.in.cmd}} changes are left out of scope, i.e. since {{solr.cmd}} does not yet use $SOLR_STOP_WAIT, it would be clearer to separately add $SOLR_START_WAIT and $SOLR_STOP_WAIT support for {{solr.cmd}}
* instead of initialising {{SOLR_START_WAIT=180}} (if no SOLR_START_WAIT was supplied), using {{SOLR_START_WAIT=$SOLR_STOP_WAIT}} will help ensure backwards compatibility for users that currently customise SOLR_STOP_WAIT; e.g. if anyone is currently setting {{SOLR_STOP_WAIT=42}}, they will continue to see 42s used for both stop and start even if they don't explicitly configure {{SOLR_START_WAIT=42}}

> bin/solr uses $SOLR_STOP_WAIT during start
> ------------------------------------------
>
> Key: SOLR-11167
> URL: https://issues.apache.org/jira/browse/SOLR-11167
> Project: Solr
> Issue Type: Improvement
> Components: scripts and tools
> Reporter: Christine Poerschke
> Priority: Minor
> Attachments: SOLR-11167.patch
> Time Spent: 10m
> Remaining Estimate: 0h
>
> bin/solr using $SOLR_STOP_WAIT during start is unexpected; I think it would be clearer to have a separate $SOLR_START_WAIT variable.
> A related minor thing: SOLR_STOP_WAIT is mentioned in solr.in.sh but not in the solr.in.cmd equivalent.
[jira] [Assigned] (SOLR-11167) bin/solr uses $SOLR_STOP_WAIT during start
[ https://issues.apache.org/jira/browse/SOLR-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christine Poerschke reassigned SOLR-11167:
------------------------------------------
Assignee: Christine Poerschke

> bin/solr uses $SOLR_STOP_WAIT during start
> ------------------------------------------
>
> Key: SOLR-11167
> URL: https://issues.apache.org/jira/browse/SOLR-11167
> Project: Solr
> Issue Type: Improvement
> Components: scripts and tools
> Reporter: Christine Poerschke
> Assignee: Christine Poerschke
> Priority: Minor
> Attachments: SOLR-11167.patch
> Time Spent: 10m
> Remaining Estimate: 0h
>
> bin/solr using $SOLR_STOP_WAIT during start is unexpected; I think it would be clearer to have a separate $SOLR_START_WAIT variable.
> A related minor thing: SOLR_STOP_WAIT is mentioned in solr.in.sh but not in the solr.in.cmd equivalent.
[jira] [Updated] (SOLR-11167) bin/solr uses $SOLR_STOP_WAIT during start
[ https://issues.apache.org/jira/browse/SOLR-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christine Poerschke updated SOLR-11167:
---------------------------------------
Fix Version/s: 8.7
               master (9.0)

> bin/solr uses $SOLR_STOP_WAIT during start
> ------------------------------------------
>
> Key: SOLR-11167
> URL: https://issues.apache.org/jira/browse/SOLR-11167
> Project: Solr
> Issue Type: Improvement
> Components: scripts and tools
> Reporter: Christine Poerschke
> Assignee: Christine Poerschke
> Priority: Minor
> Fix For: master (9.0), 8.7
> Attachments: SOLR-11167.patch
> Time Spent: 10m
> Remaining Estimate: 0h
>
> bin/solr using $SOLR_STOP_WAIT during start is unexpected; I think it would be clearer to have a separate $SOLR_START_WAIT variable.
> A related minor thing: SOLR_STOP_WAIT is mentioned in solr.in.sh but not in the solr.in.cmd equivalent.
[jira] [Commented] (LUCENE-9539) Improve memory footprint of SortingCodecReader
[ https://issues.apache.org/jira/browse/LUCENE-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200789#comment-17200789 ]

ASF subversion and git services commented on LUCENE-9539:
----------------------------------------------------------

Commit 17c285d61743da0c06735e06235b20bd5aac4e14 in lucene-solr's branch refs/heads/master from Simon Willnauer
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=17c285d ]

LUCENE-9539: Remove caches from SortingCodecReader (#1909)

SortingCodecReader keeps all docvalues loaded from this reader in memory. Yet, this reader should only be used for merging, which happens sequentially. This makes caching docvalues unnecessary.

Co-authored-by: Jim Ferenczi

> Improve memory footprint of SortingCodecReader
> -----------------------------------------------
>
> Key: LUCENE-9539
> URL: https://issues.apache.org/jira/browse/LUCENE-9539
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Simon Willnauer
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> SortingCodecReader is very memory heavy since it needs to re-sort and load large parts of the index into memory. We can try to make it more efficient by using more compact internal data structures and removing the caches it uses, provided we define its usage as a merge-only reader wrapper. Ultimately we need to find a way to allow the reader or some other structure to minimize its heap memory. One way is to slice existing readers and merge them in multiple steps. There will be multiple steps towards a more usable version of this class.
[GitHub] [lucene-solr] s1monw merged pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader
s1monw merged pull request #1909: URL: https://github.com/apache/lucene-solr/pull/1909
[jira] [Commented] (LUCENE-9539) Improve memory footprint of SortingCodecReader
[ https://issues.apache.org/jira/browse/LUCENE-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200795#comment-17200795 ]

ASF subversion and git services commented on LUCENE-9539:
----------------------------------------------------------

Commit 427e11c7f644a05be93bb801ca394b90dccf8df6 in lucene-solr's branch refs/heads/branch_8x from Simon Willnauer
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=427e11c ]

LUCENE-9539: Remove caches from SortingCodecReader (#1909)

SortingCodecReader keeps all docvalues loaded from this reader in memory. Yet, this reader should only be used for merging, which happens sequentially. This makes caching docvalues unnecessary.

Co-authored-by: Jim Ferenczi

> Improve memory footprint of SortingCodecReader
> -----------------------------------------------
>
> Key: LUCENE-9539
> URL: https://issues.apache.org/jira/browse/LUCENE-9539
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Simon Willnauer
> Priority: Major
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> SortingCodecReader is very memory heavy since it needs to re-sort and load large parts of the index into memory. We can try to make it more efficient by using more compact internal data structures and removing the caches it uses, provided we define its usage as a merge-only reader wrapper. Ultimately we need to find a way to allow the reader or some other structure to minimize its heap memory. One way is to slice existing readers and merge them in multiple steps. There will be multiple steps towards a more usable version of this class.
[jira] [Assigned] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
[ https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Munendra S N reassigned SOLR-14503:
-----------------------------------
Assignee: Munendra S N

> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> ------------------------------------------------------------
>
> Key: SOLR-14503
> URL: https://issues.apache.org/jira/browse/SOLR-14503
> Project: Solr
> Issue Type: Bug
> Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
> Reporter: Colvin Cowie
> Assignee: Munendra S N
> Priority: Minor
> Attachments: SOLR-14503.patch, SOLR-14503.patch
>
> When starting Solr in cloud mode, if ZooKeeper is not available within 30 seconds, then core container initialization fails and the node will not recover when ZooKeeper becomes available.
>
> I believe SOLR-5129 should have addressed this issue; however, it doesn't quite do so, for two reasons:
> # https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297 calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int zkClientConnectTimeout)}}, so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds is used even when you specify a different waitForZk value
> # bin/solr contains script to set -DwaitForZk from the SOLR_WAIT_FOR_ZK environment property (https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148), but there is no corresponding assignment in bin/solr.cmd, while SOLR_WAIT_FOR_ZK appears in solr.in.cmd as an example.
>
> I will attach a patch that fixes the above.
[jira] [Commented] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
[ https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200808#comment-17200808 ]

Munendra S N commented on SOLR-14503:
-------------------------------------

I'm planning to commit the current patch and handle the other cases of zkClientTimeout usage in a separate issue.

> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> ------------------------------------------------------------
>
> Key: SOLR-14503
> URL: https://issues.apache.org/jira/browse/SOLR-14503
> Project: Solr
> Issue Type: Bug
> Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
> Reporter: Colvin Cowie
> Assignee: Munendra S N
> Priority: Minor
> Attachments: SOLR-14503.patch, SOLR-14503.patch
>
> When starting Solr in cloud mode, if ZooKeeper is not available within 30 seconds, then core container initialization fails and the node will not recover when ZooKeeper becomes available.
>
> I believe SOLR-5129 should have addressed this issue; however, it doesn't quite do so, for two reasons:
> # https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297 calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int zkClientConnectTimeout)}}, so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds is used even when you specify a different waitForZk value
> # bin/solr contains script to set -DwaitForZk from the SOLR_WAIT_FOR_ZK environment property (https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148), but there is no corresponding assignment in bin/solr.cmd, while SOLR_WAIT_FOR_ZK appears in solr.in.cmd as an example.
>
> I will attach a patch that fixes the above.
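For illustration, the gist of the constructor fix described in point 1 of the issue, as a hedged sketch; the waitForZk property handling and the seconds-to-millis conversion here are assumptions for the example, not the actual SolrDispatchFilter code:

```java
import org.apache.solr.common.cloud.SolrZkClient;

class ZkConnectSketch {
  SolrZkClient connect(String zkServerAddress, int zkClientTimeout) {
    // waitForZk is assumed here to be given in seconds via the system property.
    int waitForZk = Integer.getInteger("waitForZk", 30);
    int zkClientConnectTimeout = waitForZk * 1000;
    // Three-arg form: without the third argument, the 30-second
    // DEFAULT_CLIENT_CONNECT_TIMEOUT applies regardless of waitForZk.
    return new SolrZkClient(zkServerAddress, zkClientTimeout, zkClientConnectTimeout);
  }
}
```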
[jira] [Assigned] (SOLR-14333) Implement toString() in CollapsingPostFilter
[ https://issues.apache.org/jira/browse/SOLR-14333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N reassigned SOLR-14333: --- Assignee: Munendra S N > Implement toString() in CollapsingPostFilter > > > Key: SOLR-14333 > URL: https://issues.apache.org/jira/browse/SOLR-14333 > Project: Solr > Issue Type: Improvement >Reporter: Munendra S N >Assignee: Munendra S N >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > {{toString()}} is not overridden in CollapsingPostFilter. The debug component > returns {{parsed_filter_queries}}; for multiple CollapsingPostFilters in a > request, the value in {{parsed_filter_queries}} is always > {{CollapsingPostFilter()}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
[ https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200811#comment-17200811 ] Colvin Cowie commented on SOLR-14503: - Hi [~munendrasn], thanks. Sorry, I've not got any time at the moment. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #1903: Fix bug in sort optimization
mayya-sharipova commented on a change in pull request #1903: URL: https://github.com/apache/lucene-solr/pull/1903#discussion_r493568956 ## File path: lucene/core/src/test/org/apache/lucene/search/TestFieldSortOptimizationSkipping.java ## @@ -432,7 +439,48 @@ public void testDocSortOptimization() throws IOException { assertTrue(topDocs.totalHits.value < 10); // assert that very few docs were collected } +reader.close(); +dir.close(); + } + + /** + * Test that sorting on _doc works correctly. + * This test goes through DefaultBulkSorter::scoreRange, where scorerIterator is BitSetIterator. + * As a conjunction of this BitSetIterator with DocComparator's iterator, we get BitSetConjunctionDISI. + * BitSetConjunctionDISI advances based on the DocComparator's iterator, and doesn't consider + * that its BitSetIterator may have advanced past a certain doc. Review comment: I will create an issue for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mayya-sharipova merged pull request #1903: Fix bug in sort optimization
mayya-sharipova merged pull request #1903: URL: https://github.com/apache/lucene-solr/pull/1903 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] munendrasn commented on pull request #1900: SOLR-14036: Remove explicit distrib=false from /terms handler
munendrasn commented on pull request #1900: URL: https://github.com/apache/lucene-solr/pull/1900#issuecomment-697360102 I have included the changes and the upgrade entry. Instead of adding the upgrade entry to `solr-upgrade-notes.adoc`, I have added it to `major-changes-in-solr-9.adoc`, as mentioned in the former doc. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] munendrasn opened a new pull request #1914: Move 9x upgrade notes out of changes.txt
munendrasn opened a new pull request #1914: URL: https://github.com/apache/lucene-solr/pull/1914 Upgrade notes have been moved out of changes.txt. While working on PR #1900, I found a few entries that were still present in changes.txt (most likely added at a later time). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser
[ https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200828#comment-17200828 ] Gus Heck commented on SOLR-14787: - I have found something interesting WRT the failing case you mention... it only fails when I run the test in my IDE. If I use the ant build it passes. I notice some interesting differences in startup for these two scenarios... build: {code:java} [junit4] Suite: org.apache.solr.search.TestPayloadCheckQParserPlugin [junit4] 2> 1454 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to test-framework derived value of '/home/gus/projects/apache/lucene-solr/fork/lucene-solr8/solr/server/solr/configsets/_default/conf' [junit4] 2> 1475 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 Created dataDir: /home/gus/projects/apache/lucene-solr/fork/lucene-solr8/solr/build/solr-core/test/J0/temp/solr.search.TestPayloadCheckQParserPlugin_AB5E0FC0380BB866-001/data-dir-1-001 [junit4] 2> 1551 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 Using TrieFields (NUMERIC_POINTS_SYSPROP=false) w/NUMERIC_DOCVALUES_SYSPROP=true [junit4] 2> 1592 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.e.j.u.log Logging initialized @1620ms to org.eclipse.jetty.util.log.Slf4jLog [junit4] 2> 1597 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (true) via: @org.apache.solr.util.RandomizeSSL(reason=, ssl=NaN, value=NaN, clientAuth=NaN) [junit4] 2> 1621 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 SecureRandom sanity checks: test.solr.allowed.securerandom=null & java.security.egd=file:/dev/./urandom [junit4] 2> 1626 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.SolrTestCaseJ4 initCore [junit4] 2> 1757 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrConfig Using Lucene MatchVersion: 8.7.0 [junit4] 2> 1901 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.s.IndexSchema Schema name=example [junit4] 2> 1931 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieIntField]. Please consult documentation how to replace it accordingly. [junit4] 2> 1936 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieFloatField]. Please consult documentation how to replace it accordingly. [junit4] 2> 1940 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieLongField]. Please consult documentation how to replace it accordingly. [junit4] 2> 1944 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieDoubleField]. Please consult documentation how to replace it accordingly. [junit4] 2> 1966 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.TrieDateField]. 
Please consult documentation how to replace it accordingly. [junit4] 2> 2202 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.GeoHashField]. Please consult documentation how to replace it accordingly. [junit4] 2> 2208 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.LatLonType]. Please consult documentation how to replace it accordingly. [junit4] 2> 2217 WARN (SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class [solr.EnumField]. Please consult documentation how to replace it accordingly. {code} IDE (Intellij) {code:java} 1172 INFO (SUITE-TestPayloadCheckQParserPlugin-seed#[5A2517E33080AEE6]-worker) [ ] o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to test-framework derived value of '/home/gus/projects/apache/lucene-solr/fork/lucene-solr/solr/server/solr/configsets/_default/conf' 1190 INFO (SUITE-TestPayloa
[GitHub] [lucene-solr] munendrasn commented on pull request #1914: Move 9x upgrade notes out of changes.txt
munendrasn commented on pull request #1914: URL: https://github.com/apache/lucene-solr/pull/1914#issuecomment-697368032 @noblepaul @sigram Please review. I have moved the entries added by you guys, so I would prefer your reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mocobeta commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common
mocobeta commented on pull request #1836: URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-697375125 @uschindler seems busy. I don't want to maintain this branch for very long (the diff is so large), but I need at least one reviewer to proceed with this. @dweiss would you take care of this, if you have some time? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9541) BitSetConjunctionDISI can advance backwards from its components
Mayya Sharipova created LUCENE-9541: --- Summary: BitSetConjunctionDISI can advance backwards from its components Key: LUCENE-9541 URL: https://issues.apache.org/jira/browse/LUCENE-9541 Project: Lucene - Core Issue Type: Bug Reporter: Mayya Sharipova Not completely sure if this is a bug. BitSetConjunctionDISI advances based on its lead – a DocIdSetIterator – and doesn't consider that another of its components – a BitSetIterator – may have already advanced past a certain doc. This may result in duplicate documents. This behaviour was exposed in this PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9541) BitSetConjunctionDISI doesn't advance based on its components
[ https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayya Sharipova updated LUCENE-9541: Summary: BitSetConjunctionDISI doesn't advance based on its components (was: BitSetConjunctionDISI can advance backwards from its components) > BitSetConjunctionDISI doesn't advance based on its components > - > > Key: LUCENE-9541 > URL: https://issues.apache.org/jira/browse/LUCENE-9541 > Project: Lucene - Core > Issue Type: Bug >Reporter: Mayya Sharipova >Priority: Minor > > Not completely sure if this is a bug. > BitSetConjunctionDISI advances based on its lead – a DocIdSetIterator – > and doesn't consider that another of its components – a BitSetIterator – may have > already advanced past a certain doc. This may result in duplicate documents. > For example, if BitSetConjunctionDISI _disi_ is composed of DocIdSetIterator > _a_ over docs [0,1] and BitSetIterator _b_ over docs [0,1]: after `b.nextDoc()` > we collect doc0, and after `disi.nextDoc()` we collect the same > doc0 again. > It seems that other conjunction iterators don't have this behaviour: if we > advance any of their components past a certain document, the whole > conjunction iterator will also be advanced past that document. > > This behaviour was exposed in this > [PR|https://github.com/apache/lucene-solr/pull/1903]. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
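To make the [0,1] example concrete, here is a small sketch of the component setup; BitSetConjunctionDISI itself is package-private, so the conjunction's behaviour is narrated in comments rather than invoked:
{code:java}
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.FixedBitSet;

public class Lucene9541Sketch {
  public static void main(String[] args) throws Exception {
    // Component b: a BitSetIterator over docs [0, 1]
    FixedBitSet bBits = new FixedBitSet(2);
    bBits.set(0);
    bBits.set(1);
    BitSetIterator b = new BitSetIterator(bBits, 2);

    int doc = b.nextDoc(); // doc == 0: doc0 is collected directly from component b

    // A BitSetConjunctionDISI built over b and a lead iterator a over [0, 1]
    // positions itself from the lead, not from b, so its first nextDoc() can
    // return doc 0 again, which is the duplicate described above. Other
    // conjunction iterators advance past any doc a component has already consumed.
    System.out.println("collected from component b: doc " + doc);
  }
}
{code}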
[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #1903: Fix bug in sort optimization
mayya-sharipova commented on a change in pull request #1903: URL: https://github.com/apache/lucene-solr/pull/1903#discussion_r493610854 ## File path: lucene/core/src/test/org/apache/lucene/search/TestFieldSortOptimizationSkipping.java ## Review comment: Issue created: https://issues.apache.org/jira/browse/LUCENE-9541 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] arafalov commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode
arafalov commented on pull request #1863: URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697406760 Ok, I am glad we are on the same page that the current (let's call it _Add_) solution is rather bad despite all the great work put into it. Let's now get onto the same page about the next step you are actually proposing. I can read the rest of your statement in one of the following ways:
1. Neither the original _Add_ nor the proposed _Guess_ solution will address the problem. **Next step: that discussion is not about code and should be taken up in the parent JIRA**. That's exactly what it is there for, and this code/PR is here to push the discussion from theoretical to practical.
2. The _Guess_ approach is ok overall, but the schema creation is still bad; could it return schema generation commands instead? I just double-checked the code and there is no way for the current architecture to return non-error feedback (from either the processCommit or the SimplePostTool side). **Next step: propose a way this could be done.** Do note that the reason we are still an URP is that any schema guessing or creation depends on the previous URPs in the chain always being enabled (e.g. for custom date formats); that is one of the things really broken with the enable/disable flag for the _Add_ solution, and why I am doing the single-URP-level flag.
3. We need some other Guess approach. **Next action: propose an alternative architecture, preferably as a straw-man implementation**. This would give people on the JIRA a chance to select from TWO ways forward; that would be amazing whether we end up with one, the other, or a merged solution.
4. ??? Use a veto and keep the status quo until somebody yet different has a much better idea than the people in the last 3 JIRAs?
5. ??? (I don't claim to read your mind, but I want to move this discussion forward in concrete non-blocking steps)
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200848#comment-17200848 ] Gus Heck commented on SOLR-8281: This seems related to something I wanted to do for a client... I had reduce with group() and I wanted to then feed the groups to an arbitrary streaming expression for further processing, and have the result show up in the groups (the result would have been a matrix). The problem I stopped on was how to express the stream to process the group without having a source (the source is the group). > Add RollupMergeStream to Streaming API > -- > > Key: SOLR-8281 > URL: https://issues.apache.org/jira/browse/SOLR-8281 > Project: Solr > Issue Type: Bug >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > > The RollupMergeStream merges the aggregate results emitted by the > RollupStream on *worker* nodes. > This is designed to be used in conjunction with the HashJoinStream to perform > rollup aggregations on the joined Tuples. The HashJoinStream will require the > tuples to be partitioned on the Join keys. To avoid needing to repartition on > the *group by* fields for the RollupStream, we can perform a merge of the > rolled up Tuples coming from the workers. > The construct would look like this: > {code} > mergeRollup (... > parallel (... > rollup (... > hashJoin ( > search(...), > search(...), > on="fieldA" > ) > ) > ) >) > {code} > The pseudo code above would push the *hashJoin* and *rollup* to the *worker* > nodes. The emitted rolled up tuples would be merged by the mergeRollup. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] HoustonPutman commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode
HoustonPutman commented on pull request #1863: URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697420014 Purely responding to the URP response part: it's definitely not possible for a URP to send non-error responses. I do think it's something we should implement though, since it will expand the use cases that URPs can solve. I'll create a JIRA for it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mayya-sharipova opened a new pull request #1915: Fix bug in sort optimization (#1903)
mayya-sharipova opened a new pull request #1915: URL: https://github.com/apache/lucene-solr/pull/1915 Fix a bug in how the iterator with skipping functionality advances and produces docs. Relates to #1725. Backport of #1903. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mayya-sharipova merged pull request #1915: Fix bug in sort optimization (#1903)
mayya-sharipova merged pull request #1915: URL: https://github.com/apache/lucene-solr/pull/1915 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dalbani opened a new pull request #1916: Fix minor typo
dalbani opened a new pull request #1916: URL: https://github.com/apache/lucene-solr/pull/1916 Ignoring the default issue template given that this PR is about a tiny fix for a typo. Right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] Hronom commented on a change in pull request #1864: SOLR-14850 ExactStatsCache NullPointerException when shards.tolerant=true
Hronom commented on a change in pull request #1864: URL: https://github.com/apache/lucene-solr/pull/1864#discussion_r493662797 ## File path: solr/core/src/java/org/apache/solr/search/stats/ExactStatsCache.java ## @@ -94,6 +94,12 @@ protected ShardRequest doRetrieveStatsRequest(ResponseBuilder rb) { protected void doMergeToGlobalStats(SolrQueryRequest req, List responses) { Set allTerms = new HashSet<>(); for (ShardResponse r : responses) { + if ("true".equalsIgnoreCase(req.getParams().get(ShardParams.SHARDS_TOLERANT)) && r.getException() != null) { Review comment: @sigram @madrob I added a test that reproduces the problem in `TestExactStatsCache`; it fails with a NullPointerException if you remove my fix. Could you adjust it (if needed) to fit nicely into the Solr test suites? I have set `Allow edits by maintainers`. The trick with this issue is that it is reproducible only when at least one shard is fully down (no healthy replica). This is why I didn't use `setDistributedParams`: it adds one working replica, so all shards are healthy and there is never a situation where one shard is completely down. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
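For readers skimming the thread, the guard sits at the top of the merge loop; a hedged sketch of how it reads in context (only the `if` condition is quoted from the diff above, the `continue` body is an assumption about the patch):

```java
// Inside ExactStatsCache#doMergeToGlobalStats (sketch)
for (ShardResponse r : responses) {
  if ("true".equalsIgnoreCase(req.getParams().get(ShardParams.SHARDS_TOLERANT))
      && r.getException() != null) {
    // shards.tolerant=true and this shard failed: skip it instead of
    // dereferencing its missing response and hitting the NullPointerException
    continue;
  }
  // ... merge this shard's term statistics into the global stats ...
}
```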
[GitHub] [lucene-solr] madrob commented on pull request #1916: Fix minor typo
madrob commented on pull request #1916: URL: https://github.com/apache/lucene-solr/pull/1916#issuecomment-697511015 Thank you for finding and correcting this! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob merged pull request #1916: Fix minor typo
madrob merged pull request #1916: URL: https://github.com/apache/lucene-solr/pull/1916 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200886#comment-17200886 ] Joel Bernstein commented on SOLR-8281: -- [~gus], feel free to send me an email to discuss. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz opened a new pull request #1917: LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time.
jpountz opened a new pull request #1917: URL: https://github.com/apache/lucene-solr/pull/1917 This is called transitively from `DocumentsWriterFlushControl#doAfterDocument`, which is synchronized and appears to be a point of contention. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
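A minimal sketch of the constant-time approach, with assumed names rather than the actual ByteBuffersDataOutput code: keep a running counter updated when pages are allocated, instead of summing page capacities on every call:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class PagedOutputSketch {
  private final List<ByteBuffer> pages = new ArrayList<>();
  private long ramBytesUsed; // maintained incrementally

  void addPage(int capacity) {
    pages.add(ByteBuffer.allocate(capacity));
    ramBytesUsed += capacity; // O(1) bookkeeping at allocation time
  }

  public long ramBytesUsed() {
    return ramBytesUsed; // constant-time; no longer iterates over all pages
  }
}
```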
[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents
[ https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200946#comment-17200946 ] Adrien Grand commented on LUCENE-9535: -- I might have found something. When profiling indexing I noticed some contention in {{DocumentsWriterFlushControl#doAfterDocument}}, which happens to transitively call {{IndexingChain#ramBytesUsed}}, which was changed in LUCENE-9511 to call {{StoredFieldsWriter#ramBytesUsed}}. And {{StoredFieldsWriter#ramBytesUsed}} calls {{ByteBuffersDataOutput#ramBytesUsed}} which is a bit slow since it iterates over all pages. So we might have increased contention on {{DocumentsWriterFlushControl#doAfterDocument}} in LUCENE-9511, and this is only noticeable on Mike's beast because of the very high number of indexing threads (36). I opened https://github.com/apache/lucene-solr/pull/1917. > Investigate recent indexing slowdown for wikimedium documents > - > > Key: LUCENE-9535 > URL: https://issues.apache.org/jira/browse/LUCENE-9535 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Attachments: cpu_profile.svg > > Time Spent: 20m > Remaining Estimate: 0h > > Nightly benchmarks report a ~10% slowdown for 1kB documents as of September > 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html]. > On that day, we added stored fields in DWPT accounting (LUCENE-9511), so I > first thought this could be due to smaller flushed segments and more merging, > but I still wonder whether there's something else. The benchmark runs with > 8GB of heap, 2GB of RAM buffer and 36 indexing threads. So it's about 2GB/36 > = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get > full at the same time. Stored fields account for about 0.7MB of memory, or 1% > of the indexing buffer size. How can a 1% reduction of buffering capacity > explain a 10% indexing slowdown? I looked into this further by running > indexing benchmarks locally with 8 indexing threads and 128MB of indexing > buffer memory, which would make this issue even more apparent if the smaller > RAM buffer was the cause, but I'm not seeing a regression and actually I'm > seeing similar number of flushes when I disabled memory accounting for stored > fields. > I ran indexing under a profiler to see whether something else could cause > this slowdown, e.g. slow implementations of ramBytesUsed on stored fields > writers, but nothing surprising showed up and the profile looked just like I > would have expected. > Another question I have is why the 4kB benchmark is not affected at all. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz merged pull request #1917: LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time.
jpountz merged pull request #1917: URL: https://github.com/apache/lucene-solr/pull/1917 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents
[ https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201008#comment-17201008 ] ASF subversion and git services commented on LUCENE-9535: - Commit d226abd4481a5bd837264a7c53d1b13f417842ad in lucene-solr's branch refs/heads/master from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d226abd ] LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time. (#1917) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents
[ https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201011#comment-17201011 ] ASF subversion and git services commented on LUCENE-9535: - Commit a83c2c2ab00fea84ea48053a53276db905f05000 in lucene-solr's branch refs/heads/branch_8x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a83c2c2 ] LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time. (#1917) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] arafalov commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode
arafalov commented on pull request #1863: URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697846565 > Purely responding to the URP response part: it's definitely not possible for a URP to send non-error responses. I do think it's something we should implement though, since it will expand the use cases that URPs can solve. I'll create a JIRA for it.

It may be possible to future-proof this implementation by making **guess-schema** a mode switch, instead of the current present/absent flag. So, maybe rename it to **guess-mode** instead, with options of:
- **update** - the current (and only) option, basically
- **show** - if/when there is a way to return the suggested JSON
- **update-all** - if we wanted to - sometimes - have specific fields even if a dynamicField definition matches; could be done now if useful
- **none** - to support tooling more easily

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
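A hypothetical sketch of what parsing such a mode switch could look like; the parameter name "guess-mode" and its values come from this comment, not from any committed code:

```java
import org.apache.solr.common.util.NamedList;

public class GuessModeSketch {
  public enum Mode { UPDATE, SHOW, UPDATE_ALL, NONE }

  // Parse the hypothetical "guess-mode" init arg of the URP factory.
  public static Mode parse(NamedList<?> args) {
    Object raw = args.get("guess-mode");
    if (raw == null) {
      return Mode.UPDATE; // default to today's behavior
    }
    switch (raw.toString()) {
      case "show":       return Mode.SHOW;
      case "update-all": return Mode.UPDATE_ALL;
      case "none":       return Mode.NONE;
      default:           return Mode.UPDATE;
    }
  }
}
```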
[GitHub] [lucene-solr] s1monw opened a new pull request #1918: LUCENE-9535: Commit DWPT bytes used before locking indexing
s1monw opened a new pull request #1918: URL: https://github.com/apache/lucene-solr/pull/1918 Currently we calculate the ramBytesUsed by the DWPT under the flushControl lock. We can do this calculation safely outside of the lock without any downside. The FlushControl lock should be used with care since it's a central part of indexing and might block all indexing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
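The pattern behind the PR title is a general one: do the potentially expensive size calculation before entering the contended critical section, and only publish the precomputed delta under the lock. A rough sketch of the idea, with simplified placeholder names rather than the actual DWPT/FlushControl code:

```java
// Sketch of "commit bytes used before locking": heavy accounting outside,
// cheap publication inside the central lock.
class FlushControlSketch {
  private final Object flushControlLock = new Object();
  private long activeBytes;

  void afterDocument(DwptSketch dwpt) {
    long delta = dwpt.commitLastBytesUsed(); // may walk many accumulators -- keep it outside
    synchronized (flushControlLock) {
      activeBytes += delta; // critical section stays tiny
    }
  }
}

class DwptSketch {
  private long lastCommittedBytesUsed;

  long commitLastBytesUsed() {
    long current = ramBytesUsed();
    long delta = current - lastCommittedBytesUsed;
    lastCommittedBytesUsed = current;
    return delta;
  }

  long ramBytesUsed() { return 0; } // stand-in for the real estimate
}
```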
[jira] [Created] (SOLR-14892) shards.info with shards.tolerant can yield an empty key
David Smiley created SOLR-14892: --- Summary: shards.info with shards.tolerant can yield an empty key Key: SOLR-14892 URL: https://issues.apache.org/jira/browse/SOLR-14892 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: search Reporter: David Smiley When using shards.tolerant=true and shards.info=true when a shard isn't available (and maybe other circumstances), the shards.info section of the response may have an empty-string key child with a value that is ambiguous as to which shard(s) couldn't be reached. This problem can be revealed by modifying org.apache.solr.cloud.TestDownShardTolerantSearch#searchingShouldFailWithoutTolerantSearchSetToTrue to add shards.info and then examine the response in a debugger. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14892) shards.info with shards.tolerant can yield an empty key
[ https://issues.apache.org/jira/browse/SOLR-14892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-14892: Attachment: solr14892.png > shards.info with shards.tolerant can yield an empty key > --- > > Key: SOLR-14892 > URL: https://issues.apache.org/jira/browse/SOLR-14892 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: David Smiley >Priority: Minor > Attachments: solr14892.png > > > When using shards.tolerant=true and shards.info=true when a shard isn't > available (and maybe other circumstances), the shards.info section of the > response may have an empty-string key child with a value that is ambiguous as > to which shard(s) couldn't be reached. > This problem can be revealed by modifying > org.apache.solr.cloud.TestDownShardTolerantSearch#searchingShouldFailWithoutTolerantSearchSetToTrue > to add shards.info and then examine the response in a debugger. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14893) Allow UpdateRequestProcessors to add non-error messages to the response
Houston Putman created SOLR-14893: - Summary: Allow UpdateRequestProcessors to add non-error messages to the response Key: SOLR-14893 URL: https://issues.apache.org/jira/browse/SOLR-14893 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: UpdateRequestProcessors Reporter: Houston Putman There are many reasons why an UpdateRequestProcessor would want to send a response back to the user: * Informing the user of the results when they use schema-guessing mode (SOLR-14701) * Building a new Processor that uses the Lucene Monitor library to alert on incoming documents that match saved queries * The language detection URPs could respond with the languages selected for each document. Currently, URPs can be passed the Response object via the URPFactory that creates them. However, when the URP is placed in the chain after the DistributedURP, the response it sends back will be dismissed by the DURP rather than merged and sent back to the user. The bulk of the work here would be to add logic in the DURP to accept custom messages in the responses of the updates it sends, and then merge those into an overall response to send to the user. Each URP could be responsible for merging its section of responses, because that will likely contain business logic for the URP that the DURP is not aware of. The SolrJ classes would also need updates to give the user an easy way to read response messages. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
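For a sense of what this enables: a URP already receives the SolrQueryResponse through its factory, so on a single node it can write a message today; the missing piece described above is the DistributedURP merge. A minimal single-node sketch, where the {{messages}} key is an arbitrary choice for illustration:
{code:java}
import java.io.IOException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class MessagingUrpFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        // "messages" is an illustrative key, not an established convention.
        rsp.add("messages", "saw doc " + cmd.getPrintableId());
        super.processAdd(cmd);
      }
    };
  }
}
{code}
When a URP like this runs after the DistributedURP, the {{rsp.add}} output on the forwarded requests is currently dropped, which is exactly the gap this issue describes.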
[jira] [Commented] (SOLR-14892) shards.info with shards.tolerant can yield an empty key
[ https://issues.apache.org/jira/browse/SOLR-14892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201038#comment-17201038 ] David Smiley commented on SOLR-14892: - I chased this down to org.apache.solr.handler.component.HttpShardHandler#createSliceShardsStr which, when given an empty list, returns an empty string. It should probably return null, but null has ripple effects in many places which assume non-null values and maybe were written without shards.tolerant in mind. Let's say it remains an empty string. SearchHandler.handleRequestBody loops over "sreq.actualShards", which can yield that empty string. I hoped that simply "continue"-ing this loop on that occurrence might help, but it led to some other mystery. The code involved in general here is awfully messy. > shards.info with shards.tolerant can yield an empty key > --- > > Key: SOLR-14892 > URL: https://issues.apache.org/jira/browse/SOLR-14892 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: David Smiley >Priority: Minor > Attachments: solr14892.png > > > When using shards.tolerant=true and shards.info=true when a shard isn't > available (and maybe other circumstances), the shards.info section of the > response may have an empty-string key child with a value that is ambiguous as > to which shard(s) couldn't be reached. > This problem can be revealed by modifying > org.apache.solr.cloud.TestDownShardTolerantSearch#searchingShouldFailWithoutTolerantSearchSetToTrue > to add shards.info and then examine the response in a debugger. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
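The mechanism is easy to reproduce in isolation: joining an empty list yields an empty string, which then ends up as an empty map key downstream. A toy illustration of the failure mode, not the actual HttpShardHandler code:
{code:java}
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmptyKeyDemo {
  public static void main(String[] args) {
    List<String> reachableReplicas = Collections.emptyList(); // the shard was down
    String sliceShardsStr = String.join("|", reachableReplicas); // -> ""
    Map<String, Object> shardsInfo = new LinkedHashMap<>();
    shardsInfo.put(sliceShardsStr, "no servers hosting shard");
    System.out.println(shardsInfo); // {=no servers hosting shard} -- the ambiguous empty key
  }
}
{code}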
[GitHub] [lucene-solr] dweiss commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common
dweiss commented on pull request #1836: URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-697932731 Hi Tomoko. The patch looks good to me (precommit doesn't pass though). I would commit it once you get precommit to work - this issue has been out there for a while and nobody objected. If there is a need for changes (on master), we'll just follow up. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2
dweiss commented on a change in pull request #1905: URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r493853135 ## File path: lucene/build.gradle ## @@ -15,8 +15,56 @@ * limitations under the License. */ +// Should we do this as :lucene:packaging similar to how Solr does it? +// Or is this fine here? + +plugins { + id 'distribution' +} + description = 'Parent project for Apache Lucene Core' subprojects { group "org.apache.lucene" -} \ No newline at end of file +} + +distributions { + main { + // This is empirically wrong, but it is mostly a copy from `ant package-zip` Review comment: Haven't forgotten about it, just busy with work. Those release scripts will have to be adjusted for Solr and Lucene being released independently in the future. Which requires independent builds, which requires the repo split. Will have to get to it, eventually. Sigh. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1900: SOLR-14036: Remove explicit distrib=false from /terms handler
dsmiley commented on a change in pull request #1900: URL: https://github.com/apache/lucene-solr/pull/1900#discussion_r493862362 ## File path: solr/solr-ref-guide/src/major-changes-in-solr-9.adoc ## @@ -128,6 +128,8 @@ _(raw; not yet edited)_ * SOLR-14510: The `writeStartDocumentList` in `TextResponseWriter` now receives an extra boolean parameter representing the "exactness" of the numFound value (exact vs approximation). Any custom response writer extending `TextResponseWriter` will need to implement this abstract method now (instead previous with the same name but without the new boolean parameter). +* SOLR-14036: Implicit /terms handler now supports distributed search by default, when running in cloud mode. Review comment: Reworded to help a user think through upgrading: ```suggestion * SOLR-14036: Implicit /terms handler now returns terms across all shards in SolrCloud instead of only the local core. Users/apps may be assuming the old behavior. A request can be modified via the standard distrib=false param to only use the local core receiving the request. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
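For users adjusting to the new default, the opt-out is the standard parameter mentioned in the suggestion. A SolrJ sketch of it (the field name is made up; a leading-slash `qt` routes the request to the /terms handler):

```java
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// Sketch: restoring the pre-SOLR-14036 single-core behavior of /terms.
public class TermsDistribFalse {
  static QueryRequest localTermsRequest() {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("qt", "/terms");       // QueryRequest uses a leading-slash qt as the request path
    params.set("terms", true);
    params.set("terms.fl", "title");  // illustrative field name
    params.set("distrib", false);     // only the core receiving the request
    return new QueryRequest(params);
  }
}
```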
[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201067#comment-17201067 ] Ishan Chattopadhyaya commented on SOLR-14354: - This doesn't have associated performance benchmarks for 8.7. bq. Would you recommend reverting from 8x? I'm not sure; it hasn't been shown to cause test failures that we can attribute here so seems safe from that end. At least where I work, it's something we'll use in our 8x fork and can serve as a canary. We need to stop treating our users as guinea pigs. -1 for 8.7 unless this is somehow made optional or there are performance benchmarks to prove its efficiency. > HttpShardHandler send requests in async > --- > > Key: SOLR-14354 > URL: https://issues.apache.org/jira/browse/SOLR-14354 > Project: Solr > Issue Type: Improvement >Reporter: Cao Manh Dat >Assignee: Cao Manh Dat >Priority: Blocker > Fix For: master (9.0), 8.7 > > Attachments: image-2020-03-23-10-04-08-399.png, > image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png > > Time Spent: 4h > Remaining Estimate: 0h > > h2. 1. Current approach (problem) of Solr > Below is the diagram describe the model on how currently handling a request. > !image-2020-03-23-10-04-08-399.png! > The main-thread that handles the search requests, will submit n requests (n > equals to number of shards) to an executor. So each request will correspond > to a thread, after sending a request that thread basically do nothing just > waiting for response from other side. That thread will be swapped out and CPU > will try to handle another thread (this is called context switch, CPU will > save the context of the current thread and switch to another one). When some > data (not all) come back, that thread will be called to parsing these data, > then it will wait until more data come back. So there will be lots of context > switching in CPU. That is quite inefficient on using threads.Basically we > want less threads and most of them must busy all the time, because threads > are not free as well as context switching. That is the main idea behind > everything, like executor > h2. 2. Async call of Jetty HttpClient > Jetty HttpClient offers async API like this. > {code:java} > httpClient.newRequest("http://domain.com/path";) > // Add request hooks > .onRequestQueued(request -> { ... }) > .onRequestBegin(request -> { ... }) > // Add response hooks > .onResponseBegin(response -> { ... }) > .onResponseHeaders(response -> { ... }) > .onResponseContent((response, buffer) -> { ... }) > .send(result -> { ... }); {code} > Therefore after calling {{send()}} the thread will return immediately without > any block. Then when the client received the header from other side, it will > call {{onHeaders()}} listeners. When the client received some {{byte[]}} (not > all response) from the data it will call {{onContent(buffer)}} listeners. > When everything finished it will call {{onComplete}} listeners. One main > thing that will must notice here is all listeners should finish quick, if the > listener block, all further data of that request won’t be handled until the > listener finish. > h2. 3. Solution 1: Sending requests async but spin one thread per response > Jetty HttpClient already provides several listeners, one of them is > InputStreamResponseListener. 
This is how it is get used > {code:java} > InputStreamResponseListener listener = new InputStreamResponseListener(); > client.newRequest(...).send(listener); > // Wait for the response headers to arrive > Response response = listener.get(5, TimeUnit.SECONDS); > if (response.getStatus() == 200) { > // Obtain the input stream on the response content > try (InputStream input = listener.getInputStream()) { > // Read the response content > } > } {code} > In this case, there will be 2 thread > * one thread trying to read the response content from InputStream > * one thread (this is a short-live task) feeding content to above > InputStream whenever some byte[] is available. Note that if this thread > unable to feed data into InputStream, this thread will wait. > By using this one, the model of HttpShardHandler can be written into > something like this > {code:java} > handler.sendReq(req, (is) -> { > executor.submit(() -> > try (is) { > // Read the content from InputStream > } > ) > }) {code} > The first diagram will be changed into this > !image-2020-03-23-10-09-10-221.png! > Notice that although “sending req to shard1” is wide, it won’t take long time > since sending req is a very quick operation. With this operation, handling > threads won’t be spin up until first bytes are sent back. Notice that i
[GitHub] [lucene-solr] madrob commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2
madrob commented on a change in pull request #1905: URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r493884949 ## File path: lucene/build.gradle ## @@ -15,8 +15,56 @@ * limitations under the License. */ +// Should we do this as :lucene:packaging similar to how Solr does it? +// Or is this fine here? + +plugins { + id 'distribution' +} + description = 'Parent project for Apache Lucene Core' subprojects { group "org.apache.lucene" -} \ No newline at end of file +} + +distributions { + main { + // This is empirically wrong, but it is mostly a copy from `ant package-zip` Review comment: My goal here with getting things releasable is to also turn the smoke tester back on so that we can hopefully catch issues before we actually go to do the release. I understand there’s going to be more split related work, but that shouldn’t stop us from working on the pieces that we can work on before that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9537) Add Indri Search Engine Functionality to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201089#comment-17201089 ] Cameron VandenBerg commented on LUCENE-9537: Hi Adrien, Unfortunately, the smoothing score that we use is document specific, so I am not sure if I could make it "transferable". I am definitely interested in brainstorming ways that we can make Indri fit into the Lucene architecture better though. Perhaps an example of how Indri smoothing scores work would be helpful. Suppose we have an index with 4 documents (so sorry for the political nature of the documents... it's just what I can easily think of at the moment): 1) Donald Trump is the president of the United States. 2) There are three branches of government. The president is the head of the executive branch. 3) Jane Doe is president of the PTO. 4) Trump was elected in the 2016 election. Say that the query is: President Trump. In this index, the term president occurs more often than the term Trump. The smoothing score acts like an idf for the query terms, so that documents with just the term Trump will be ranked higher than documents with just the term president. Consider documents 3 & 4, which have the same length and each have one search term, but Document 4 has the rarer search term. Therefore the smoothing score for the term Trump in Document 3 will be lower than the smoothing score for the term president in Document 4. The addition of the smoothing scores for the terms that don't exist allows Document 4 to get a higher score and be ranked above Document 3. Let me know whether this example makes sense. Can you see a way that I can refactor the smoothing score so that it better fits into Lucene's existing architecture? Or let me know if I misunderstood your comment and you still feel that what you suggested will work. Thank you! > Add Indri Search Engine Functionality to Lucene > --- > > Key: LUCENE-9537 > URL: https://issues.apache.org/jira/browse/LUCENE-9537 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Cameron VandenBerg >Priority: Major > Labels: patch > Attachments: LUCENE-INDRI.patch > > > Indri ([http://lemurproject.org/indri.php]) is an academic search engine > developed by The University of Massachusetts and Carnegie Mellon University. > The major difference between Lucene and Indri is that Indri will give a > "smoothing score" to a document that does not contain the search > term, which has improved the search ranking accuracy in our experiments. I > have created an Indri patch, which adds the search code needed to implement > the Indri AND logic as well as Indri's implementation of Dirichlet Smoothing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
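For readers following along, the Dirichlet-smoothed language-model estimate underlying this discussion is the standard one, where a document that does not contain query term t still contributes a score through the collection model C (mu is the smoothing parameter):
{noformat}
\hat{p}(t \mid d) = \frac{tf(t,d) + \mu \, p(t \mid C)}{|d| + \mu}
{noformat}
With tf(t,d) = 0 this reduces to mu * p(t|C) / (|d| + mu) - the "smoothing score". Because p(president|C) > p(Trump|C) in the example index, Document 3 (missing the rare term Trump) gets a smaller smoothing contribution than Document 4 (missing the common term president), which is why Document 4 ranks higher.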
[jira] [Commented] (SOLR-13682) command line option to export data to a file
[ https://issues.apache.org/jira/browse/SOLR-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201092#comment-17201092 ] David Smiley commented on SOLR-13682: - The ref-guide addition to solr-control-script-reference.adoc is nice, but I was unable to find it there as a user. I only found it using my committer sleuthing experience. My first action as a user was to search the ref guide search box for the word "export", which uncovered exporting-result-sets.adoc. That page definitely seemed like it was spot-on, yet it didn't have information about this new cool tool. Can you add a link there [~noble.paul]? > command line option to export data to a file > > > Key: SOLR-13682 > URL: https://issues.apache.org/jira/browse/SOLR-13682 > Project: Solr > Issue Type: Improvement >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Fix For: 8.3 > > Time Spent: 20m > Remaining Estimate: 0h > > example > {code:java} > bin/solr export -url http://localhost:8983/solr/gettingstarted > {code} > This will export all the docs in a collection called {{gettingstarted}} into > a file called {{gettingstarted.json}} > additional options are > * {{format}} : {{jsonl}} (default) or {{javabin}} > * {{out}} : export file name > * {{query}} : a custom query, default is {{*:*}} > * {{fields}}: a comma-separated list of fields to be exported > * {{limit}} : no. of docs; default is 100, send {{-1}} to export all the > docs > h2. Importing using {{curl}} > importing json file > {code:java} > curl -X POST -d @gettingstarted.json > http://localhost:18983/solr/gettingstarted/update/json/docs?commit=true > {code} > importing javabin format file > {code:java} > curl -X POST --header "Content-Type: application/javabin" --data-binary > @gettingstarted.javabin > http://localhost:7574/solr/gettingstarted/update?commit=true > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] uschindler commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common
uschindler commented on pull request #1836: URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-697990504 > Create fake factory base classes in o.a.l.a.util for backward compatibility (?) We do this only in Lucene 9, so it's more important to add all changes to MIGRATE.md > Fix tests I mentioned this, as the META-INF/services files are not updated. This makes renamed analyzers not load, as SPI can't find them. As said before, we need an SPI load test that ensures that all analyzer components have a factory that loads successfully with SPI. Maybe move that test (abstract) to test-framework and create a test implementation instance for each module containing factories. The test in analysis/common is not enough anymore. > Fix gradle scripts (?) The jflex regenerate task may need to be adapted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
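A sketch of the kind of SPI smoke test being suggested, using the existing factory SPI entry points (where the test should live and how it gets instantiated per module is the open question above):

```java
import org.apache.lucene.analysis.util.TokenFilterFactory;
import org.apache.lucene.util.LuceneTestCase;

// Sketch: every advertised factory name must resolve through SPI. Enumerating the
// names forces the SPI loader to run, so a stale META-INF/services entry pointing
// at a renamed or removed class blows up here.
public class TestFactorySpiLoading extends LuceneTestCase {
  public void testAllTokenFilterFactoriesResolve() {
    for (String name : TokenFilterFactory.availableTokenFilters()) {
      assertNotNull(name, TokenFilterFactory.lookupClass(name));
    }
  }
}
```

The same loop would be repeated for TokenizerFactory and CharFilterFactory; catching a factory class that is missing its services entry entirely would additionally need a classpath scan on top of this.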
[GitHub] [lucene-solr] goankur commented on a change in pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…
goankur commented on a change in pull request #1893: URL: https://github.com/apache/lucene-solr/pull/1893#discussion_r493919329 ## File path: lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyLabels.java ## @@ -0,0 +1,192 @@
```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.lucene.facet.taxonomy;

import org.apache.lucene.document.Document;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetTestCase;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsCollector.MatchingDocs;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.RandomIndexWriter;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.IOUtils;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TestTaxonomyLabels extends FacetTestCase {

  private List<Document> prepareDocuments() {
    List<Document> docs = new ArrayList<>();

    Document doc = new Document();
    doc.add(new FacetField("Author", "Bob"));
    doc.add(new FacetField("Publish Date", "2010", "10", "15"));
    docs.add(doc);

    doc = new Document();
    doc.add(new FacetField("Author", "Lisa"));
    doc.add(new FacetField("Publish Date", "2010", "10", "20"));
    docs.add(doc);

    doc = new Document();
    doc.add(new FacetField("Author", "Tom"));
    doc.add(new FacetField("Publish Date", "2012", "1", "1"));
    docs.add(doc);

    doc = new Document();
    doc.add(new FacetField("Author", "Susan"));
    doc.add(new FacetField("Publish Date", "2012", "1", "7"));
    docs.add(doc);

    doc = new Document();
    doc.add(new FacetField("Author", "Frank"));
    doc.add(new FacetField("Publish Date", "1999", "5", "5"));
    docs.add(doc);

    return docs;
  }

  private List<Integer> allDocIds(MatchingDocs m, boolean decreasingDocIds) throws IOException {
    DocIdSetIterator disi = m.bits.iterator();
    List<Integer> docIds = new ArrayList<>();
    while (disi.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
      docIds.add(disi.docID());
    }

    if (decreasingDocIds == true) {
      Collections.reverse(docIds);
    }
    return docIds;
  }

  private List<FacetLabel> lookupFacetLabels(TaxonomyFacetLabels taxoLabels,
                                             List<MatchingDocs> matchingDocs) throws IOException {
    return lookupFacetLabels(taxoLabels, matchingDocs, null, false);
  }

  private List<FacetLabel> lookupFacetLabels(TaxonomyFacetLabels taxoLabels,
                                             List<MatchingDocs> matchingDocs,
                                             String dimension) throws IOException {
    return lookupFacetLabels(taxoLabels, matchingDocs, dimension, false);
  }

  private List<FacetLabel> lookupFacetLabels(TaxonomyFacetLabels taxoLabels, List<MatchingDocs> matchingDocs, String dimension,
                                             boolean decreasingDocIds) throws IOException {
    List<FacetLabel> facetLabels = new ArrayList<>();

    for (MatchingDocs m : matchingDocs) {
      TaxonomyFacetLabels.FacetLabelReader facetLabelReader = taxoLabels.getFacetLabelReader(m.context);
      List<Integer> docIds = allDocIds(m, decreasingDocIds);
      FacetLabel facetLabel;
      for (Integer docId : docIds) {
        while (true) {
          if (dimension != null) {
            facetLabel = facetLabelReader.nextFacetLabel(docId, dimension);
          } else {
            facetLabel = facetLabelReader.nextFacetLabel(docId);
          }

          if (facetLabel == null) {
            break;
          }
          facetLabels.add(facetLabel);
        }
      }
    }

    return facetLabels;
  }


  public void testBasic() throws Exception {
```
Review comment: Done in this revision
[jira] [Updated] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml
[ https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter updated SOLR-14889: -- Attachment: SOLR-14889.patch Status: Open (was: Open) I thought this would be straightforward, but there's clearly still a lot about the gradle lifecycle / order-of-evaluation that i don't understand. The key change in the attached patch that this whole idea hinges on is..
{noformat}
-expand(templateProps)
+expand( templateProps.collectEntries({ k, v -> [k, v.replaceAll("'","''")]}) )
{noformat}
But for reasons i don't understand, this seems to bypass the changes made to {{templateProps}} in ' {{setupLazyProps.doFirst}} ', where the ivy version values are added...
{noformat}
Execution failed for task ':solr:solr-ref-guide:prepareSources'.
> Could not copy file
> '/home/hossman/lucene/dev/solr/solr-ref-guide/src/_config.yml.template' to
> '/home/hossman/lucene/dev/solr/solr-ref-guide/build/content/_config.yml'.
> Missing property (ivyCommonsCodec) for Groovy template expansion. Defined keys [javadocLink, solrGuideDraftStatus, solrRootPath, solrDocsVersion, solrGuideVersionPath, htmlSolrJavadocs, htmlLuceneJavadocs, buildDate, buildYear, out].
{noformat}
(I'm also not clear where that 'out' key is coming from, but i have no idea if that pre-dates this change) I experimented with adding a {{doFirst}} block to {{prepareSources}} that would copy the (escaped) templateProps into a newly defined Map in that task, that would be used in the {{expand(...)}} call – but that still seemed to result in the {{expand(..)}} being evaluated before the {{doFirst}} modified the map (see big commented out nocommit block in the patch for what i mean) [~uschindler] / [~dweiss] - can you help me understand what's going on here and how to do this "the right way" ? > improve templated variable escaping in ref-guide _config.yml > > > Key: SOLR-14889 > URL: https://issues.apache.org/jira/browse/SOLR-14889 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14889.patch > > > SOLR-14824 ran into windows failures when we switching from using a hardcoded > "relative" path to the solrRootPath to using groovy/project variables to get > the path. the reason for the failures was that the path us used as a > variable tempted into {{_config.yml.template}} to build the {{_config.yml}} > file, but on windows the path seperater of '\' was being parsed by > jekyll/YAML as a string escape character. > (This wasn't a problem we ran into before, even on windows, prior to the > SOLR-14824 changes, because the hardcoded relative path only used '/' > delimiters, which (j)ruby was happy to work with, even on windows. > As Uwe pointed out when hotfixing this... > {quote}Problem was that backslashes are used to escape strings, but windows > paths also have those. Fix was to add StringEscapeUtils, but I don't like > this too much. Maybe we find a better solution to make special characters in > those properties escaped correctly when used in strings inside templates.
> {quote} > ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this > one variable -- doesn't really protect other variables that might have > special charactes in them down the road, and while "escapeJava" work ok for > the "\" issue, it isn't neccessarily consistent with all YAML escapse, which > could lead to even weird bugs/cofusion down the road. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml
[ https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201157#comment-17201157 ] Uwe Schindler commented on SOLR-14889: -- That's very easy to explain: The expansion is done when the project is configured! Previously it was working because you just set a pointer to the (still changing props). Here the problem is that the collect loop is running during configuration phase. To fix this the whole expand must be delayed using lazy evaluation. It's later, will try before going to bed. > improve templated variable escaping in ref-guide _config.yml > > > Key: SOLR-14889 > URL: https://issues.apache.org/jira/browse/SOLR-14889 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14889.patch > > > SOLR-14824 ran into windows failures when we switching from using a hardcoded > "relative" path to the solrRootPath to using groovy/project variables to get > the path. the reason for the failures was that the path us used as a > variable tempted into {{_config.yml.template}} to build the {{_config.yml}} > file, but on windows the path seperater of '\' was being parsed by > jekyll/YAML as a string escape character. > (This wasn't a problem we ran into before, even on windows, prior to the > SOLR-14824 changes, because the hardcoded relative path only used '/' > delimiters, which (j)ruby was happy to work with, even on windows. > As Uwe pointed out when hotfixing this... > {quote}Problem was that backslashes are used to escape strings, but windows > paths also have those. Fix was to add StringEscapeUtils, but I don't like > this too much. Maybe we find a better solution to make special characters in > those properties escaped correctly when used in strings inside templates. > {quote} > ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this > one variable -- doesn't really protect other variables that might have > special charactes in them down the road, and while "escapeJava" work ok for > the "\" issue, it isn't neccessarily consistent with all YAML escapse, which > could lead to even weird bugs/cofusion down the road. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml
[ https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201157#comment-17201157 ] Uwe Schindler edited comment on SOLR-14889 at 9/23/20, 11:13 PM: - That's very easy to explain: The expansion is done when the project is configured! Previously it was working because you just set a pointer to the (still changing) props. Here the problem is that the collect loop is running during configuration phase and you set a pointer to the result during configuration. To fix this the whole expand must be delayed using lazy evaluation. It's later, will try before going to bed. was (Author: thetaphi): That's very easy to explain: The expansion is done when the project is configured! Previously it was working because you just set a pointer to the (still changing props). Here the problem is that the collect loop is running during configuration phase. To fix this the whole expand must be delayed using lazy evaluation. It's later, will try before going to bed. > improve templated variable escaping in ref-guide _config.yml > > > Key: SOLR-14889 > URL: https://issues.apache.org/jira/browse/SOLR-14889 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14889.patch > > > SOLR-14824 ran into windows failures when we switching from using a hardcoded > "relative" path to the solrRootPath to using groovy/project variables to get > the path. the reason for the failures was that the path us used as a > variable tempted into {{_config.yml.template}} to build the {{_config.yml}} > file, but on windows the path seperater of '\' was being parsed by > jekyll/YAML as a string escape character. > (This wasn't a problem we ran into before, even on windows, prior to the > SOLR-14824 changes, because the hardcoded relative path only used '/' > delimiters, which (j)ruby was happy to work with, even on windows. > As Uwe pointed out when hotfixing this... > {quote}Problem was that backslashes are used to escape strings, but windows > paths also have those. Fix was to add StringEscapeUtils, but I don't like > this too much. Maybe we find a better solution to make special characters in > those properties escaped correctly when used in strings inside templates. > {quote} > ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this > one variable -- doesn't really protect other variables that might have > special charactes in them down the road, and while "escapeJava" work ok for > the "\" issue, it isn't neccessarily consistent with all YAML escapse, which > could lead to even weird bugs/cofusion down the road. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml
[ https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-14889: - Attachment: SOLR-14889.patch > improve templated variable escaping in ref-guide _config.yml > > > Key: SOLR-14889 > URL: https://issues.apache.org/jira/browse/SOLR-14889 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14889.patch, SOLR-14889.patch > > > SOLR-14824 ran into windows failures when we switching from using a hardcoded > "relative" path to the solrRootPath to using groovy/project variables to get > the path. the reason for the failures was that the path us used as a > variable tempted into {{_config.yml.template}} to build the {{_config.yml}} > file, but on windows the path seperater of '\' was being parsed by > jekyll/YAML as a string escape character. > (This wasn't a problem we ran into before, even on windows, prior to the > SOLR-14824 changes, because the hardcoded relative path only used '/' > delimiters, which (j)ruby was happy to work with, even on windows. > As Uwe pointed out when hotfixing this... > {quote}Problem was that backslashes are used to escape strings, but windows > paths also have those. Fix was to add StringEscapeUtils, but I don't like > this too much. Maybe we find a better solution to make special characters in > those properties escaped correctly when used in strings inside templates. > {quote} > ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this > one variable -- doesn't really protect other variables that might have > special charactes in them down the road, and while "escapeJava" work ok for > the "\" issue, it isn't neccessarily consistent with all YAML escapse, which > could lead to even weird bugs/cofusion down the road. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml
[ https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201162#comment-17201162 ] Uwe Schindler commented on SOLR-14889: -- Here is my fix: [^SOLR-14889.patch] You need to create the expty map first and then populate it with escaped properties in doFirst. During configuration, the expand() method gets the empty map, which is populated in doFirst. This is a quick hack; I don't like it. Maybe I have an idea this night. > improve templated variable escaping in ref-guide _config.yml > > > Key: SOLR-14889 > URL: https://issues.apache.org/jira/browse/SOLR-14889 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14889.patch, SOLR-14889.patch > > > SOLR-14824 ran into windows failures when we switching from using a hardcoded > "relative" path to the solrRootPath to using groovy/project variables to get > the path. the reason for the failures was that the path us used as a > variable tempted into {{_config.yml.template}} to build the {{_config.yml}} > file, but on windows the path seperater of '\' was being parsed by > jekyll/YAML as a string escape character. > (This wasn't a problem we ran into before, even on windows, prior to the > SOLR-14824 changes, because the hardcoded relative path only used '/' > delimiters, which (j)ruby was happy to work with, even on windows. > As Uwe pointed out when hotfixing this... > {quote}Problem was that backslashes are used to escape strings, but windows > paths also have those. Fix was to add StringEscapeUtils, but I don't like > this too much. Maybe we find a better solution to make special characters in > those properties escaped correctly when used in strings inside templates. > {quote} > ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this > one variable -- doesn't really protect other variables that might have > special charactes in them down the road, and while "escapeJava" work ok for > the "\" issue, it isn't neccessarily consistent with all YAML escapse, which > could lead to even weird bugs/cofusion down the road. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml
[ https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201162#comment-17201162 ] Uwe Schindler edited comment on SOLR-14889 at 9/23/20, 11:27 PM: - Here is my fix: [^SOLR-14889.patch] You need to create the empty map first and then populate it with escaped properties in doFirst. During configuration, the expand() method gets the empty map, which is populated in doFirst. This is a quick hack; I don't like it. Maybe I have an idea this night. was (Author: thetaphi): Here is my fix: [^SOLR-14889.patch] You need to create the expty map first and then populate it with escaped properties in doFirst. During configuration, the expand() method gets the empty map, which is populated in doFirst. This is a quick hack; I don't like it. Maybe I have an idea this night. > improve templated variable escaping in ref-guide _config.yml > > > Key: SOLR-14889 > URL: https://issues.apache.org/jira/browse/SOLR-14889 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14889.patch, SOLR-14889.patch > > > SOLR-14824 ran into windows failures when we switching from using a hardcoded > "relative" path to the solrRootPath to using groovy/project variables to get > the path. the reason for the failures was that the path us used as a > variable tempted into {{_config.yml.template}} to build the {{_config.yml}} > file, but on windows the path seperater of '\' was being parsed by > jekyll/YAML as a string escape character. > (This wasn't a problem we ran into before, even on windows, prior to the > SOLR-14824 changes, because the hardcoded relative path only used '/' > delimiters, which (j)ruby was happy to work with, even on windows. > As Uwe pointed out when hotfixing this... > {quote}Problem was that backslashes are used to escape strings, but windows > paths also have those. Fix was to add StringEscapeUtils, but I don't like > this too much. Maybe we find a better solution to make special characters in > those properties escaped correctly when used in strings inside templates. > {quote} > ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this > one variable -- doesn't really protect other variables that might have > special charactes in them down the road, and while "escapeJava" work ok for > the "\" issue, it isn't neccessarily consistent with all YAML escapse, which > could lead to even weird bugs/cofusion down the road. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml
[ https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201164#comment-17201164 ] Uwe Schindler commented on SOLR-14889: -- I also changed the logger.warn to logger.lifecycle when outputting the properties. > improve templated variable escaping in ref-guide _config.yml > > > Key: SOLR-14889 > URL: https://issues.apache.org/jira/browse/SOLR-14889 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14889.patch, SOLR-14889.patch > > > SOLR-14824 ran into windows failures when we switching from using a hardcoded > "relative" path to the solrRootPath to using groovy/project variables to get > the path. the reason for the failures was that the path us used as a > variable tempted into {{_config.yml.template}} to build the {{_config.yml}} > file, but on windows the path seperater of '\' was being parsed by > jekyll/YAML as a string escape character. > (This wasn't a problem we ran into before, even on windows, prior to the > SOLR-14824 changes, because the hardcoded relative path only used '/' > delimiters, which (j)ruby was happy to work with, even on windows. > As Uwe pointed out when hotfixing this... > {quote}Problem was that backslashes are used to escape strings, but windows > paths also have those. Fix was to add StringEscapeUtils, but I don't like > this too much. Maybe we find a better solution to make special characters in > those properties escaped correctly when used in strings inside templates. > {quote} > ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this > one variable -- doesn't really protect other variables that might have > special charactes in them down the road, and while "escapeJava" work ok for > the "\" issue, it isn't neccessarily consistent with all YAML escapse, which > could lead to even weird bugs/cofusion down the road. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml
[ https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-14889: - Attachment: SOLR-14889.patch > improve templated variable escaping in ref-guide _config.yml > > > Key: SOLR-14889 > URL: https://issues.apache.org/jira/browse/SOLR-14889 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch > > > SOLR-14824 ran into windows failures when we switching from using a hardcoded > "relative" path to the solrRootPath to using groovy/project variables to get > the path. the reason for the failures was that the path us used as a > variable tempted into {{_config.yml.template}} to build the {{_config.yml}} > file, but on windows the path seperater of '\' was being parsed by > jekyll/YAML as a string escape character. > (This wasn't a problem we ran into before, even on windows, prior to the > SOLR-14824 changes, because the hardcoded relative path only used '/' > delimiters, which (j)ruby was happy to work with, even on windows. > As Uwe pointed out when hotfixing this... > {quote}Problem was that backslashes are used to escape strings, but windows > paths also have those. Fix was to add StringEscapeUtils, but I don't like > this too much. Maybe we find a better solution to make special characters in > those properties escaped correctly when used in strings inside templates. > {quote} > ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this > one variable -- doesn't really protect other variables that might have > special charactes in them down the road, and while "escapeJava" work ok for > the "\" issue, it isn't neccessarily consistent with all YAML escapse, which > could lead to even weird bugs/cofusion down the road. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml
[ https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201177#comment-17201177 ] Uwe Schindler commented on SOLR-14889: -- Small update: [^SOLR-14889.patch] "replaceAll" is wrong, must be "replace" (as we don't use a regex). Typical Java error! > improve templated variable escaping in ref-guide _config.yml > > > Key: SOLR-14889 > URL: https://issues.apache.org/jira/browse/SOLR-14889 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch > > > SOLR-14824 ran into windows failures when we switching from using a hardcoded > "relative" path to the solrRootPath to using groovy/project variables to get > the path. the reason for the failures was that the path us used as a > variable tempted into {{_config.yml.template}} to build the {{_config.yml}} > file, but on windows the path seperater of '\' was being parsed by > jekyll/YAML as a string escape character. > (This wasn't a problem we ran into before, even on windows, prior to the > SOLR-14824 changes, because the hardcoded relative path only used '/' > delimiters, which (j)ruby was happy to work with, even on windows. > As Uwe pointed out when hotfixing this... > {quote}Problem was that backslashes are used to escape strings, but windows > paths also have those. Fix was to add StringEscapeUtils, but I don't like > this too much. Maybe we find a better solution to make special characters in > those properties escaped correctly when used in strings inside templates. > {quote} > ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this > one variable -- doesn't really protect other variables that might have > special charactes in them down the road, and while "escapeJava" work ok for > the "\" issue, it isn't neccessarily consistent with all YAML escapse, which > could lead to even weird bugs/cofusion down the road. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
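The distinction matters precisely for the Windows-path case this issue started from: String.replaceAll treats its first argument as a regex (and backslash/dollar are special in its replacement too), while String.replace is purely literal:
{code:java}
public class ReplaceDemo {
  public static void main(String[] args) {
    // For a plain quote both behave the same, since ' is not a regex metacharacter:
    System.out.println("it's".replace("'", "''"));     // it''s
    System.out.println("it's".replaceAll("'", "''"));  // it''s

    String path = "C:\\Users\\solr";                   // a Windows path
    System.out.println(path.replace("\\", "/"));       // C:/Users/solr
    // path.replaceAll("\\", "/") throws PatternSyntaxException:
    // a lone backslash is an incomplete regex escape.
  }
}
{code}
So replaceAll happens to work for the single-quote escaping here, but replace states the intent and avoids the regex trap.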
[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async
[ https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201179#comment-17201179 ] Cao Manh Dat commented on SOLR-14354: - Thank Mark for your nice words. [~ichattopadhyaya] I will try to do benchmark based on your project above. If I'm not be able to finish it before 8.7 release then reverting it will be a good option. > HttpShardHandler send requests in async > --- > > Key: SOLR-14354 > URL: https://issues.apache.org/jira/browse/SOLR-14354 > Project: Solr > Issue Type: Improvement >Reporter: Cao Manh Dat >Assignee: Cao Manh Dat >Priority: Blocker > Fix For: master (9.0), 8.7 > > Attachments: image-2020-03-23-10-04-08-399.png, > image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png > > Time Spent: 4h > Remaining Estimate: 0h > > h2. 1. Current approach (problem) of Solr > Below is the diagram describe the model on how currently handling a request. > !image-2020-03-23-10-04-08-399.png! > The main-thread that handles the search requests, will submit n requests (n > equals to number of shards) to an executor. So each request will correspond > to a thread, after sending a request that thread basically do nothing just > waiting for response from other side. That thread will be swapped out and CPU > will try to handle another thread (this is called context switch, CPU will > save the context of the current thread and switch to another one). When some > data (not all) come back, that thread will be called to parsing these data, > then it will wait until more data come back. So there will be lots of context > switching in CPU. That is quite inefficient on using threads.Basically we > want less threads and most of them must busy all the time, because threads > are not free as well as context switching. That is the main idea behind > everything, like executor > h2. 2. Async call of Jetty HttpClient > Jetty HttpClient offers async API like this. > {code:java} > httpClient.newRequest("http://domain.com/path";) > // Add request hooks > .onRequestQueued(request -> { ... }) > .onRequestBegin(request -> { ... }) > // Add response hooks > .onResponseBegin(response -> { ... }) > .onResponseHeaders(response -> { ... }) > .onResponseContent((response, buffer) -> { ... }) > .send(result -> { ... }); {code} > Therefore after calling {{send()}} the thread will return immediately without > any block. Then when the client received the header from other side, it will > call {{onHeaders()}} listeners. When the client received some {{byte[]}} (not > all response) from the data it will call {{onContent(buffer)}} listeners. > When everything finished it will call {{onComplete}} listeners. One main > thing that will must notice here is all listeners should finish quick, if the > listener block, all further data of that request won’t be handled until the > listener finish. > h2. 3. Solution 1: Sending requests async but spin one thread per response > Jetty HttpClient already provides several listeners, one of them is > InputStreamResponseListener. 
This is how it is get used > {code:java} > InputStreamResponseListener listener = new InputStreamResponseListener(); > client.newRequest(...).send(listener); > // Wait for the response headers to arrive > Response response = listener.get(5, TimeUnit.SECONDS); > if (response.getStatus() == 200) { > // Obtain the input stream on the response content > try (InputStream input = listener.getInputStream()) { > // Read the response content > } > } {code} > In this case, there will be 2 thread > * one thread trying to read the response content from InputStream > * one thread (this is a short-live task) feeding content to above > InputStream whenever some byte[] is available. Note that if this thread > unable to feed data into InputStream, this thread will wait. > By using this one, the model of HttpShardHandler can be written into > something like this > {code:java} > handler.sendReq(req, (is) -> { > executor.submit(() -> > try (is) { > // Read the content from InputStream > } > ) > }) {code} > The first diagram will be changed into this > !image-2020-03-23-10-09-10-221.png! > Notice that although “sending req to shard1” is wide, it won’t take long time > since sending req is a very quick operation. With this operation, handling > threads won’t be spin up until first bytes are sent back. Notice that in this > approach we still have active threads waiting for more data from InputStream > h2. 4. Solution 2: Buffering data and handle it inside jetty’s thread. > Jetty have another listener called BufferingResponseListener. This is how it > is get used > {code:java} > client.newRequest(