[jira] [Commented] (SOLR-14345) Error messages are not properly propagated with non-default response parsers
[ https://issues.apache.org/jira/browse/SOLR-14345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069267#comment-17069267 ] Munendra S N commented on SOLR-14345:
The latest patch is in better shape. If there are no objections, I'm planning to commit it in a few days and take up NoOpResponseParser handling separately.

> Error messages are not properly propagated with non-default response parsers
> Key: SOLR-14345
> URL: https://issues.apache.org/jira/browse/SOLR-14345
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Munendra S N
> Assignee: Munendra S N
> Priority: Major
> Attachments: SOLR-14345.patch, SOLR-14345.patch, SOLR-14345.patch
>
> The default {{ResponseParser}} is {{BinaryResponseParser}}. When a non-default
> response parser is specified in the request, the error message is not
> propagated to the user. This happens in SolrCloud mode.
> I came across this problem while adding a test which uses {{SolrTestCaseHS}},
> but a similar problem exists with the SolrJ client.
> Also, the same problem exists in both HttpSolrClient and Http2SolrClient.
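For illustration, a minimal SolrJ sketch of the setup the issue describes; the collection name and the deliberately failing query are hypothetical, this only shows how a client opts into a non-default parser:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class NonDefaultParserSketch {
  public static void main(String[] args) throws Exception {
    // The default is BinaryResponseParser (javabin); switching the client to
    // XML puts requests on the code path this issue is about.
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts")
            .withResponseParser(new XMLResponseParser())
            .build()) {
      // A query that fails server-side ({!bogus} is an unknown query parser);
      // before the fix, the real error message could be lost on the way back
      // to the caller in SolrCloud mode.
      client.query(new SolrQuery("{!bogus}foo"));
    }
  }
}
```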
[jira] [Commented] (SOLR-14007) Different response format for percentile aggregation
[ https://issues.apache.org/jira/browse/SOLR-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069269#comment-17069269 ] Munendra S N commented on SOLR-14007:
[~ysee...@gmail.com] Hopefully I have answered most of your questions above. Let me know if we can commit this or if it needs further changes.

> Different response format for percentile aggregation
> Key: SOLR-14007
> URL: https://issues.apache.org/jira/browse/SOLR-14007
> Project: Solr
> Issue Type: Sub-task
> Components: Facet Module
> Reporter: Munendra S N
> Assignee: Munendra S N
> Priority: Major
> Attachments: SOLR-14007.patch
>
> For percentile, the response format in the Stats component is {{NamedList}},
> but in JSON facet the format is either an array or a single value depending on
> the number of percentiles specified.
> Even if JSON percentile doesn't use NamedList, the response format shouldn't
> change based on the number of percentiles.
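A small sketch of the inconsistency, assuming a `techproducts` collection with a numeric `price` field (both placeholders):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PercentileShapeSketch {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      SolrQuery q = new SolrQuery("*:*").setRows(0);
      // p1 asks for one percentile and comes back as a single value;
      // p3 asks for three and comes back as an array -- the shape difference
      // the issue wants removed.
      q.add("json.facet",
          "{p1:'percentile(price,50)', p3:'percentile(price,25,50,75)'}");
      QueryResponse rsp = client.query(q);
      System.out.println(rsp.getResponse().get("facets"));
    }
  }
}
```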
[jira] [Commented] (SOLR-11775) json.facet can use inconsistent Long/Integer for "count" depending on shard count
[ https://issues.apache.org/jira/browse/SOLR-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069271#comment-17069271 ] Munendra S N commented on SOLR-11775:
If there are no objections, I'm planning to commit this (master only) in the coming week.

> json.facet can use inconsistent Long/Integer for "count" depending on shard count
> Key: SOLR-11775
> URL: https://issues.apache.org/jira/browse/SOLR-11775
> Project: Solr
> Issue Type: Bug
> Components: Facet Module
> Reporter: Chris M. Hostetter
> Assignee: Munendra S N
> Priority: Major
> Attachments: SOLR-11775.patch, SOLR-11775.patch
>
> (NOTE: I noticed this while working on a test for {{type: range}}, but it's
> possible other facet types are affected as well.)
> When dealing with a single-core request -- either standalone or a collection
> with only one shard -- json.facet seems to use "Integer" objects to return
> the "count" of facet buckets; however, if the shard count is increased then
> the end client gets a "Long" object for the "count".
> (This isn't noticeable when using {{wt=json}}, but can be very problematic
> when trying to write client code using {{wt=xml}} or SolrJ.)
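Until the fix lands, a defensive client-side pattern (a sketch; the helper name is mine) is to read "count" through `Number` rather than casting:

```java
import org.apache.solr.common.util.NamedList;

public class BucketCounts {
  // Works whether the response carries an Integer (single shard)
  // or a Long (multi-shard).
  static long countOf(NamedList<?> bucket) {
    Object count = bucket.get("count");
    return count == null ? 0L : ((Number) count).longValue();
  }
}
```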
[GitHub] [lucene-solr] noblepaul opened a new pull request #1386: SOLR-14275 Policy calculations are very slow for large clusters and large operations
noblepaul opened a new pull request #1386: SOLR-14275 Policy calculations are very slow for large clusters and large operations
URL: https://github.com/apache/lucene-solr/pull/1386
[jira] [Commented] (SOLR-14365) CollapsingQParser - Avoiding always allocate int[] and float[] with size equals to number of unique values
[ https://issues.apache.org/jira/browse/SOLR-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069285#comment-17069285 ] Shalin Shekhar Mangar commented on SOLR-14365:
I think we should add another method and make it configurable.

> CollapsingQParser - Avoiding always allocating int[] and float[] with size equal to the number of unique values
> Key: SOLR-14365
> URL: https://issues.apache.org/jira/browse/SOLR-14365
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 8.4.1
> Reporter: Cao Manh Dat
> Assignee: Cao Manh Dat
> Priority: Major
>
> Since Collapsing is a PostFilter, documents that reach Collapsing must match
> all filters and queries, so the number of documents Collapsing needs to
> collect/compute scores for is a small fraction of the total number of
> documents in the index. So why do we always need to consume the memory (for
> the int[] and float[] arrays) for all unique values of the collapsed field?
> If the number of unique values of the collapsed field found in the documents
> that match the queries and filters is 300, then we only need int[] and
> float[] arrays of size 300, not 1.2 million. However, we don't know which
> values of the collapsed field will show up in the results, so we cannot use a
> smaller array.
> The easy fix for this problem is to use only as much as we need, via IntIntMap
> and IntFloatMap, which hold primitives and are much more space-efficient than
> the Java HashMap. These maps can be slower (10x or 20x) than plain int[] and
> float[] if the number of matched documents is large (almost all documents
> match the queries and other filters). But our belief is that this does not
> happen frequently (how often do we run collapsing on the entire index?).
> For this issue I propose adding 2 methods for collapsing:
> * array: the current implementation
> * hash: the new approach, which will be the default method
> Later we can add another method, {{smart}}, which automatically picks the
> method based on a comparison between the {{number of docs matching queries
> and filters}} and the {{number of unique values of the field}}.
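To make the trade-off concrete, a minimal sketch using the HPPC primitive maps Solr already bundles (com.carrotsearch.hppc, the kind of IntIntMap the description means); the cardinality and ord/doc values are made up:

```java
import com.carrotsearch.hppc.IntIntHashMap;

public class CollapseStorageSketch {
  public static void main(String[] args) {
    // "array" method (current): one slot per unique value of the collapsed
    // field, allocated up front even if few documents survive the filters.
    int numUniqueValues = 1_200_000;
    int[] ordToDoc = new int[numUniqueValues]; // ~4.8 MB regardless of matches

    // "hash" method (proposed): memory grows only with the ords actually seen,
    // at the cost of slower per-lookup access.
    IntIntHashMap ordToDocMap = new IntIntHashMap();
    ordToDocMap.put(42, 7); // ord 42 -> docid 7
    System.out.println(ordToDocMap.getOrDefault(42, -1) + " vs " + ordToDoc[42]);
  }
}
```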
[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup
[ https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069290#comment-17069290 ] Munendra S N commented on SOLR-13492:
I had suggested [~kgsdora] pick this up. So, trying to address/answer some of the above concerns, I have tried jcmd, jconsole, visualvm and [jmxterm|https://github.com/jiaqi/jmxterm], with only the master branch (Java 11).
Troubleshooting memory issues can involve either local or remote debugging. For a production system, usually only remote debugging is possible.
h2. Local Debugging
h3. jcmd
* Force-triggering GC works even with {{DisableExplicitGC}}. We can run {{GC.run}} to force GC (it overrides {{DisableExplicitGC}} only with JDK > 10, see https://bugs.openjdk.java.net/browse/JDK-8186902). There are other commands that force GC and work in Java 8 too: {{GC.class_histogram}} and {{GC.class_stats}}.
h3. jconsole and visualvm
* Both come with a GUI. They identify local processes by checking hsperfdata_{{yourusername}} in the tmp directory for pids. At present, the GC config contains {{-XX:+PerfDisableSharedMem}}, due to which pids [won't be present|http://jtuts.com/2017/02/04/jconsole-not-showing-local-processes/] in the above folder. So, with the default settings shipped with Solr, jconsole and visualvm can't identify local processes.
* I tested removing the above flag and adding {{-XX:+DisableExplicitGC}}: the "GC now" button in jconsole and visualvm doesn't work.
h3. jmxterm
* This needs JMX enabled.
h2. Remote Debugging
h3. jcmd
* jcmd needs a process id. Not sure if remote debugging is possible.
h3. jconsole and visualvm
* If the process has JMX monitoring enabled then remote debugging is possible. Solr ships with JMX disabled by default.
* I tried enabling it along with {{-XX:+DisableExplicitGC}}, and the "GC now" button won't trigger GC.
* For visualvm, there is an option to connect using jstatd, but I haven't tried it.
h4. jmxterm (terminal tool)
* This needs JMX monitoring enabled.
* It behaves similarly to jcmd for local processes: even with {{-XX:+DisableExplicitGC}}, GC can be forced.

I checked the usage of {{System.gc()}} in lucene/solr. It is used in one or two Lucene tests and the Lucene benchmark. I also checked potential problems with disabling explicit GC and found [this|https://stackoverflow.com/questions/32912702/impact-of-setting-xxdisableexplicitgc-when-nio-direct-buffers-are-used?rq=1].
With the current defaults Solr ships with, neither local nor remote debugging is possible via jconsole. With all things considered, I still think shipping with {{-XX:+DisableExplicitGC}} is a good choice; there are ways to force GC even with the above JVM flag, but I haven't yet found a GUI tool for it.
[~erickerickson] If there are still concerns or objections I would be happy to answer them. An alternative solution is to add {{-XX:+ExplicitGCInvokesConcurrent}} so that any forced GC is triggered concurrently.

> Disallow explicit GC by default during Solr startup
> Key: SOLR-13492
> URL: https://issues.apache.org/jira/browse/SOLR-13492
> Project: Solr
> Issue Type: Improvement
> Components: scripts and tools
> Reporter: Shawn Heisey
> Assignee: Shawn Heisey
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Solr should use the -XX:+DisableExplicitGC option as part of its default GC
> tuning.
> None of Solr's stock code uses explicit GCs, so that option will have no
> effect on most installs. The effective result of this is that if somebody
> adds custom code to Solr and THAT code does an explicit GC, it won't be
> allowed to function.
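A tiny demo of what the flag changes, with the jcmd commands from the comment above repeated in the header for reference:

```java
// Run with: java -XX:+DisableExplicitGC ExplicitGcDemo
// Under the flag the System.gc() below is a no-op. As noted in the comment
// above, GC can still be forced from outside the process, e.g.:
//   jcmd <pid> GC.run              (overrides DisableExplicitGC only on JDK > 10, JDK-8186902)
//   jcmd <pid> GC.class_histogram  (also triggers a full GC; works on Java 8 too)
public class ExplicitGcDemo {
  public static void main(String[] args) {
    long before = Runtime.getRuntime().freeMemory();
    System.gc(); // ignored when -XX:+DisableExplicitGC is set
    System.out.println("free before=" + before + ", after=" + Runtime.getRuntime().freeMemory());
  }
}
```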
[jira] [Commented] (LUCENE-9198) Remove news section from TLP website
[ https://issues.apache.org/jira/browse/LUCENE-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069376#comment-17069376 ] Jan Høydahl commented on LUCENE-9198:
I'm fine with removing the old combined TLP release news; do you want to do it, Alan? I hope to continue with the releaseWizard update to reflect the new procedure some time.

> Remove news section from TLP website
> Key: LUCENE-9198
> URL: https://issues.apache.org/jira/browse/LUCENE-9198
> Project: Lucene - Core
> Issue Type: Improvement
> Components: general/website
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Priority: Major
> Attachments: new-tlp-conditional-news.png, new-tlp-frontpage-layout.png, new-tlp-frontpage-layout.png
> Time Spent: 20m
> Remaining Estimate: 0h
>
> On the front page [https://lucene.apache.org] we today show a list of TLP news.
> For every release we author one news article for Solr, one news article for
> Lucene Core, and one news article for the TLP site, combining the two.
> In all these years we have never published a news item to the TLP site that is
> not a release announcement, except in 2014 when we announced that the
> OpenRelevance sub-project closed.
> I thus propose to remove this news section and replace it with widgets that
> automatically display the last 5 news headings from the Lucene Core, Solr and
> PyLucene sub-projects.
> If we have an important TLP announcement to make at some point, that can be
> done right there on the front page, no?
[jira] [Created] (SOLR-14371) Zk StatusHandler should know about dynamic zk config
Jan Høydahl created SOLR-14371:
Summary: Zk StatusHandler should know about dynamic zk config
Key: SOLR-14371
URL: https://issues.apache.org/jira/browse/SOLR-14371
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Jan Høydahl

Zk 3.5 supports dynamic reconfig, which is used by the solr-operator for Kubernetes. Solr is given a zkHost with one URL pointing to an LB (Service) in front of all zookeepers, and the zkclient then fetches the list of all zookeepers from the special znode /zookeeper/config and reconfigures itself with connections to all the zk nodes listed. You can then scale the number of zk nodes up/down dynamically without restarting Solr.
However, the Admin UI displays errors, since it believes it is connected to only one zk, which contradicts what zk itself reports. We need to make ZookeeperStatusHandler aware of dynamic reconfig so that it asks the zkclient what the current zkHost is instead of relying on the static ZK_HOST setting.
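For reference, a minimal sketch of reading the dynamic ensemble definition the issue describes; the LB address is illustrative:

```java
import org.apache.zookeeper.ZooKeeper;

public class ZkConfigPeek {
  public static void main(String[] args) throws Exception {
    // Connect through the single LB address, as the solr-operator setup does.
    ZooKeeper zk = new ZooKeeper("zk-lb.example.com:2181", 15000, event -> {});
    // /zookeeper/config is the znode a 3.5 zkclient watches to reconfigure
    // itself; it lists every current ensemble member.
    byte[] data = zk.getData("/zookeeper/config", false, null);
    System.out.println(new String(data));
    zk.close();
  }
}
```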
[jira] [Commented] (SOLR-14371) Zk StatusHandler should know about dynamic zk config
[ https://issues.apache.org/jira/browse/SOLR-14371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069380#comment-17069380 ] Jan Høydahl commented on SOLR-14371:
[~houston] FYI
[jira] [Commented] (SOLR-14356) PeerSync with hanging nodes
[ https://issues.apache.org/jira/browse/SOLR-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069426#comment-17069426 ] Shalin Shekhar Mangar commented on SOLR-14356:
Okay, yes, let's add the connect timeout exception and discuss a better fix in SOLR-14368.

> PeerSync with hanging nodes
> Key: SOLR-14356
> URL: https://issues.apache.org/jira/browse/SOLR-14356
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Cao Manh Dat
> Priority: Major
> Attachments: SOLR-14356.patch
>
> Right now in {{PeerSync}} (during leader election), in case of an exception
> when requesting versions from a node, we will skip that node if the exception
> is one of the following types:
> * ConnectTimeoutException
> * NoHttpResponseException
> * SocketException
> Sometimes the other node basically hangs but still accepts connections. In
> that case SocketTimeoutException is thrown, we consider the {{PeerSync}}
> process as failed, and the whole shard is basically leaderless forever (as
> long as the hanging node is still there).
> We can't just blindly add {{SocketTimeoutException}} to the above list, since
> [~shalin] mentioned that sometimes a timeout can happen for genuine reasons
> too, e.g. a temporary GC pause.
> I think the general idea here is to obey the {{leaderVoteWait}} restriction
> and retry syncing with others in case a connection/timeout exception happens.
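A sketch of the exception classification under discussion (the helper and its flag are mine, not PeerSync's actual code): the first three types are what PeerSync already skips; whether a SocketTimeoutException should count as "peer is down" is exactly the judgment call above (hung node vs. temporary GC pause).

```java
import java.net.SocketException;
import java.net.SocketTimeoutException;
import org.apache.http.NoHttpResponseException;
import org.apache.http.conn.ConnectTimeoutException;

final class PeerSyncErrorSketch {
  static boolean isSkippableNetworkError(Throwable t, boolean treatReadTimeoutAsDown) {
    // SocketTimeoutException extends InterruptedIOException, not
    // SocketException, so the last clause is not already covered above.
    return t instanceof ConnectTimeoutException
        || t instanceof NoHttpResponseException
        || t instanceof SocketException
        || (treatReadTimeoutAsDown && t instanceof SocketTimeoutException);
  }
}
```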
[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup
[ https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069440#comment-17069440 ] Erick Erickson commented on SOLR-13492:
[~munendrasn] Thanks for being so thorough. I should have been more explicit; I didn't mean to cause extra work. I don't particularly care whether jconsole or visualVM allow explicit GC; I do care that there's _some_ way to trigger GC without restarting Solr.
So since jcmd will do the trick, I'm back to +/-0. Having to ssh over to the machine running the Solr instance in question and executing this isn't onerous (although I think the JDK needs to be installed).
I'm still lukewarm on protecting all Solr installations from what is a naive coding error, but practically I don't see that it makes enough difference to argue about ;)
[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup
[ https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069443#comment-17069443 ] Munendra S N commented on SOLR-13492:
Thanks [~erickerickson]. We will wait a few more days for others to review. Currently, the idea is to go ahead with DisableExplicitGC.
[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup
[ https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069446#comment-17069446 ] Munendra S N commented on SOLR-13492:
[^SOLR-13492.patch] Attaching a patch to validate against the precommit build.
[jira] [Updated] (SOLR-13492) Disallow explicit GC by default during Solr startup
[ https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13492:
Status: Patch Available (was: Open)
[jira] [Updated] (SOLR-13492) Disallow explicit GC by default during Solr startup
[ https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13492:
Attachment: SOLR-13492.patch
[jira] [Updated] (LUCENE-9297) Index has about 600+ columns, average size of doc is relatively big; Lucene first obtains the original doc from disk and then merges the old and the updated columns into a
[ https://issues.apache.org/jira/browse/LUCENE-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaihe updated LUCENE-9297:
Description:
The index has about 600+ columns and the average size of a doc is relatively big. Lucene first obtains the original doc from disk, then merges the old and the updated columns into a new one, and finally flushes it to disk.
The disk IO usage rate of our 150+ nodes always reaches nearly 99% while partial-update requests are being called frequently.
I want to optimize the partial-update strategy so that only some columns, instead of all of them, are fetched and merged into a new doc when a partial-update request is made, in order to cut down the disk IO usage rate. Are there any suggestions?

> Key: LUCENE-9297
> URL: https://issues.apache.org/jira/browse/LUCENE-9297
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: kaihe
> Priority: Major
[jira] [Created] (LUCENE-9297) Index has about 600+ columns, average size of doc is relatively big; Lucene first obtains the original doc from disk and then merges the old and the updated columns into a
kaihe created LUCENE-9297:
Summary: Index has about 600+ columns, average size of doc is relatively big; Lucene first obtains the original doc from disk, then merges the old and the updated columns into a new one, and finally flushes to disk. The disk IO usage rate of our 150+ nodes always reaches
Key: LUCENE-9297
URL: https://issues.apache.org/jira/browse/LUCENE-9297
Project: Lucene - Core
Issue Type: New Feature
Reporter: kaihe
[jira] [Updated] (LUCENE-9297) partly updating strategy
[ https://issues.apache.org/jira/browse/LUCENE-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kaihe updated LUCENE-9297:
Summary: partly updating strategy (was: Index has about 600+ columns, average size of doc is relatively big; Lucene first obtains the original doc from disk, then merges the old and the updated columns into a new one, and finally flushes to disk. The disk IO usage rate of our 150+ nodes always reaches)
[jira] [Commented] (SOLR-11775) json.facet can use inconsistent Long/Integer for "count" depending on shard count
[ https://issues.apache.org/jira/browse/SOLR-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069943#comment-17069943 ] Mikhail Khludnev commented on SOLR-11775:
+1
[jira] [Resolved] (LUCENE-9297) partly updating strategy
[ https://issues.apache.org/jira/browse/LUCENE-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved LUCENE-9297.
Resolution: Invalid

Please raise questions like this on the user's list; we try to reserve JIRAs for known bugs/enhancements rather than usage questions. See [http://lucene.apache.org/solr/community.html#mailing-lists-irc]; there are links to both the Lucene and Solr user's lists. A _lot_ more people will see your question on that list and may be able to help more quickly.
If it's determined that this really is a code issue or enhancement to Lucene or Solr and not a configuration/usage problem, we can raise a new JIRA or reopen this one.
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1361: LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit
mikemccand commented on a change in pull request #1361: LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit
URL: https://github.com/apache/lucene-solr/pull/1361#discussion_r399689324

## File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java
##
@@ -435,55 +436,81 @@ private void ensureInitialized(ThreadState state) throws IOException {
   long updateDocuments(final Iterable<? extends Iterable<? extends IndexableField>> docs, final Analyzer analyzer,
                        final DocumentsWriterDeleteQueue.Node<?> delNode) throws IOException {
     boolean hasEvents = preUpdate();
-
     final ThreadState perThread = flushControl.obtainAndLock();
     DocumentsWriterPerThread flushingDWPT = null;
+    final boolean isUpdate = delNode != null && delNode.isDelete();
+    final int numDocsBefore = perThread.dwpt == null ? 0 : perThread.dwpt.getNumDocsInRAM();
     final long seqNo;
     try {
-      try {
-        // This must happen after we've pulled the ThreadState because IW.close
-        // waits for all ThreadStates to be released:
-        ensureOpen();
-        ensureInitialized(perThread);
-        assert perThread.isInitialized();
-        final DocumentsWriterPerThread dwpt = perThread.dwpt;
-        final int dwptNumDocs = dwpt.getNumDocsInRAM();
-        try {
-          seqNo = dwpt.updateDocuments(docs, analyzer, delNode, flushNotifications);
-          perThread.updateLastSeqNo(seqNo);
-        } finally {
-          // We don't know how many documents were actually
-          // counted as indexed, so we must subtract here to
-          // accumulate our separate counter:
-          numDocsInRAM.addAndGet(dwpt.getNumDocsInRAM() - dwptNumDocs);
-          if (dwpt.isAborted()) {
-            flushControl.doOnAbort(perThread);
-          } else if (dwpt.getNumDocsInRAM() > 0) {
-            // we need to check if we have at least one doc in the DWPT. This can be 0 if we fail
-            // due to exceeding total number of docs etc.
-            final boolean isUpdate = delNode != null && delNode.isDelete();
+      innerUpdateDocuments(perThread, docs, analyzer, delNode);
+      flushingDWPT = flushControl.doAfterDocument(perThread, isUpdate);
+    } catch (MaxBufferSizeExceededException ex) {
+      if (perThread.dwpt.isAborted()) {
+        throw ex;
+      } else {
+        // we hit an exception but still need to flush this DWPT
+        // let's run postUpdate to make sure we flush stuff to disk
+        // in the case we exceed ram limits etc.
+        hasEvents = doAfterDocumentRejected(perThread, isUpdate, hasEvents);
+        // we retry if the DWPT had more than one document indexed and was flushed
+        boolean shouldRetry = perThread.dwpt == null && numDocsBefore > 0;
+        if (shouldRetry) {
+          try {
+            // we retry into a brand new DWPT, if it doesn't fit in here we can't index the document

Review comment:
   Oh, I see: we create a new DWPT and send the doc there, ok.
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1361: LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit
mikemccand commented on a change in pull request #1361: LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit
URL: https://github.com/apache/lucene-solr/pull/1361#discussion_r399689335

## File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java
##
@@ -435,55 +436,81 @@ private void ensureInitialized(ThreadState state) throws IOException {
   long updateDocuments(final Iterable<? extends Iterable<? extends IndexableField>> docs, final Analyzer analyzer,
                        final DocumentsWriterDeleteQueue.Node<?> delNode) throws IOException {
     boolean hasEvents = preUpdate();
-
     final ThreadState perThread = flushControl.obtainAndLock();
     DocumentsWriterPerThread flushingDWPT = null;
+    final boolean isUpdate = delNode != null && delNode.isDelete();
+    final int numDocsBefore = perThread.dwpt == null ? 0 : perThread.dwpt.getNumDocsInRAM();
     final long seqNo;
     try {
-      try {
-        // This must happen after we've pulled the ThreadState because IW.close
-        // waits for all ThreadStates to be released:
-        ensureOpen();
-        ensureInitialized(perThread);
-        assert perThread.isInitialized();
-        final DocumentsWriterPerThread dwpt = perThread.dwpt;
-        final int dwptNumDocs = dwpt.getNumDocsInRAM();
-        try {
-          seqNo = dwpt.updateDocuments(docs, analyzer, delNode, flushNotifications);
-          perThread.updateLastSeqNo(seqNo);
-        } finally {
-          // We don't know how many documents were actually
-          // counted as indexed, so we must subtract here to
-          // accumulate our separate counter:
-          numDocsInRAM.addAndGet(dwpt.getNumDocsInRAM() - dwptNumDocs);
-          if (dwpt.isAborted()) {
-            flushControl.doOnAbort(perThread);
-          } else if (dwpt.getNumDocsInRAM() > 0) {
-            // we need to check if we have at least one doc in the DWPT. This can be 0 if we fail
-            // due to exceeding total number of docs etc.
-            final boolean isUpdate = delNode != null && delNode.isDelete();
+      innerUpdateDocuments(perThread, docs, analyzer, delNode);
+      flushingDWPT = flushControl.doAfterDocument(perThread, isUpdate);
+    } catch (MaxBufferSizeExceededException ex) {
+      if (perThread.dwpt.isAborted()) {
+        throw ex;
+      } else {
+        // we hit an exception but still need to flush this DWPT
+        // let's run postUpdate to make sure we flush stuff to disk
+        // in the case we exceed ram limits etc.
+        hasEvents = doAfterDocumentRejected(perThread, isUpdate, hasEvents);
+        // we retry if the DWPT had more than one document indexed and was flushed
+        boolean shouldRetry = perThread.dwpt == null && numDocsBefore > 0;
+        if (shouldRetry) {

Review comment:
   Got it.
[GitHub] [lucene-solr] mikemccand commented on issue #1361: LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit
mikemccand commented on issue #1361: LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit
URL: https://github.com/apache/lucene-solr/pull/1361#issuecomment-605494944

Another thing we could consider is changing DWPT's postings addressing from `int` to `long` so we don't need retry logic. We could do it always, increasing the per-unique-term memory cost. Or we could maybe find a way to do it conditionally, when a given DWPT wants to exceed the 2.1 GB limit, but that'd be trickier.
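A back-of-envelope sketch of the "always" option's per-unique-term cost; the term count is made up, and real DWPT bookkeeping has more per-term fields:

```java
public class AddressingCostSketch {
  public static void main(String[] args) {
    long uniqueTerms = 50_000_000L; // hypothetical DWPT with 50M unique terms
    // Widening each postings pointer from int to long adds 4 bytes per term.
    long extraBytes = uniqueTerms * (Long.BYTES - Integer.BYTES);
    System.out.println("extra MB: " + extraBytes / (1024 * 1024)); // ~190 MB
  }
}
```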
[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup
[ https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069974#comment-17069974 ] Lucene/Solr QA commented on SOLR-13492:
| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 0m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 0m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate ref guide {color} | {color:green} 0m 2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:black}{color} | {color:black} {color} | {color:black} 1m 23s{color} | {color:black} {color} |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13492 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12998109/SOLR-13492.patch |
| Optional Tests | validatesourcepatterns ratsources validaterefguide |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 9de68117067 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| modules | C: solr solr/solr-ref-guide U: solr |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/728/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.
[GitHub] [lucene-solr] mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-605327672

@msokolov Thanks for suggesting additional benchmarks that we can use. Below are the results on the dataset `wikimedium10m`.
First I will repeat the results from the previous round of benchmarking:

topN=10, taskRepeatCount = 20, concurrentSearchers = False

| TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
| --- | ---: | :--- | ---: | :--- |
| **TermDTSort** | 147.64 | (11.5%) | 547.80 | (6.6%) |
| HighTermMonthSort | 147.85 | (12.2%) | 239.28 | (7.3%) |
| HighTermDayOfYearSort | 74.44 | (7.7%) | 42.56 | (12.1%) |

---

topN=10, **taskRepeatCount = 500**, concurrentSearchers = False

| TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
| --- | ---: | :--- | ---: | :--- |
| **TermDTSort** | 184.60 | (8.2%) | 3046.19 | (4.4%) |
| HighTermMonthSort | 209.43 | (6.5%) | 253.90 | (10.5%) |
| HighTermDayOfYearSort | 130.97 | (5.8%) | 73.25 | (11.8%) |

This seemed to speed up all operations, and here the speedups for `TermDTSort` are even bigger: 16.5x. There also seems to be more regression for `HighTermDayOfYearSort`.

---

**topN=500**, taskRepeatCount = 20, concurrentSearchers = False

| TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
| --- | ---: | :--- | ---: | :--- |
| **TermDTSort** | 210.24 | (9.7%) | 537.65 | (6.7%) |
| HighTermMonthSort | 116.02 | (8.9%) | 189.96 | (13.5%) |
| HighTermDayOfYearSort | 42.33 | (7.6%) | 67.93 | (9.3%) |

With increased `topN` the sort optimization has smaller speedups, up to 2x, as expected, since the optimization can only kick in after `topN` docs have been collected.

---

topN=10, taskRepeatCount = 20, **concurrentSearchers = True**

| TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
| --- | ---: | :--- | ---: | :--- |
| **TermDTSort** | 132.09 | (14.3%) | 287.93 | (11.8%) |
| HighTermMonthSort | 211.01 | (12.2%) | 116.46 | (7.1%) |
| HighTermDayOfYearSort | 72.28 | (6.1%) | 68.21 | (11.4%) |

With concurrent searchers the speedups are also smaller, up to 2x. This is expected, as segments are now spread between several TopFieldCollectors/Comparators that don't exchange bottom values. As a follow-up on this PR, we can think about how to have a global bottom value, similar to how `MaxScoreAccumulator` is used to set up a global competitive min score.

---

with **indexSort='lastModNDV:long'**, topN=10, taskRepeatCount = 20, concurrentSearchers = False

| TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
| --- | ---: | :--- | ---: | :--- |
| **TermDTSort** | 314.78 | (11.6%) | 111.80 | (13.3%) |
| HighTermMonthSort | 114.77 | (13.1%) | 78.22 | (7.5%) |
| HighTermDayOfYearSort | 46.82 | (5.7%) | 33.68 | (6.1%) |
[GitHub] [lucene-solr] mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-605327672

with **indexSort='lastModNDV:long'**, topN=10, taskRepeatCount = 20, concurrentSearchers = False

| TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
| --- | ---: | :--- | ---: | :--- |
| **TermDTSort** | 321.75 | (11.5%) | 364.83 | (7.8%) |
| HighTermMonthSort | 205.20 | (5.7%) | 178.16 | (7.8%) |
| HighTermDayOfYearSort | 66.07 | (12.0%) | 58.84 | (9.3%) |
[jira] [Commented] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle
[ https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070056#comment-17070056 ] Mike Drob commented on LUCENE-9266:
[smoker] FAILED:
[smoker] ./gradle/wrapper/gradle-wrapper.jar

> ant nightly-smoke fails due to presence of build.gradle
> Key: LUCENE-9266
> URL: https://issues.apache.org/jira/browse/LUCENE-9266
> Project: Lucene - Core
> Issue Type: Sub-task
> Reporter: Mike Drob
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Seen on Jenkins -
> https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/1617/console
> Reproduced locally.
[jira] [Comment Edited] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle
[ https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070056#comment-17070056 ] Mike Drob edited comment on LUCENE-9266 at 3/28/20, 10:39 PM:
[smoker] unpack solr-9.0.0-src.tgz...
[smoker] make sure no JARs/WARs in src dist...
[smoker] FAILED:
[smoker] ./gradle/wrapper/gradle-wrapper.jar

was (Author: mdrob):
[smoker] FAILED:
[smoker] ./gradle/wrapper/gradle-wrapper.jar
[GitHub] [lucene-solr] janhoy opened a new pull request #1387: SOLR-14210: Include replica health in healthcheck handler
janhoy opened a new pull request #1387: SOLR-14210: Include replica health in healthcheck handler
URL: https://github.com/apache/lucene-solr/pull/1387
See https://issues.apache.org/jira/browse/SOLR-14210
WIP, no tests yet, not even tested manually.
[jira] [Commented] (SOLR-14210) Introduce Node-level status handler for replicas
[ https://issues.apache.org/jira/browse/SOLR-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070166#comment-17070166 ] Jan Høydahl commented on SOLR-14210:
See https://github.com/apache/lucene-solr/pull/1387 for a first attempt at this. If the param {{&failWhenRecovering=true}} is passed to {{/api/node/health}} then it will return 503 if one or more cores on the node are in the {{RECOVERY}} or {{CONSTRUCTION}} state.

> Introduce Node-level status handler for replicas
> Key: SOLR-14210
> URL: https://issues.apache.org/jira/browse/SOLR-14210
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: master (9.0), 8.5
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h2. Background
> As was brought up in SOLR-13055, in order to run Solr in a more cloud-native
> way, we need some additional features around node-level healthchecks.
> {quote}Like in Kubernetes we need 'liveliness' and 'readiness' probes,
> explained in
> [https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/],
> to determine if a node is live and ready to serve live traffic.
> {quote}
> However, there are issues around Kubernetes managing its own rolling
> restarts. With the current healthcheck setup, it's easy to envision a
> scenario in which Solr reports itself as "healthy" when all of its replicas
> are actually recovering. Therefore Kubernetes, seeing a healthy pod, would
> then go and restart the next Solr node. This can happen until all replicas
> are "recovering" and none are healthy. (Maybe the last one restarted will be
> "down", but still there are no "active" replicas.)
> h2. Proposal
> I propose we make an additional healthcheck handler that returns whether all
> replicas hosted by that Solr node are healthy and "active". That way we will
> be able to use the [default Kubernetes rolling restart logic|https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#update-strategies]
> with Solr.
> To add on to [Jan's point here|https://issues.apache.org/jira/browse/SOLR-13055?focusedCommentId=16716559&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16716559],
> this handler should be more friendly to other Content-Types and should use
> better HTTP response statuses.
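A sketch of how a readiness probe might exercise the proposed endpoint (endpoint and param are taken from the comment above; host/port are illustrative):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReadinessProbe {
  public static void main(String[] args) throws Exception {
    // Expect 200 when all local cores are healthy, 503 while any core is in
    // RECOVERY or CONSTRUCTION.
    HttpResponse<String> rsp = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(URI.create(
            "http://localhost:8983/api/node/health?failWhenRecovering=true")).build(),
        HttpResponse.BodyHandlers.ofString());
    System.out.println(rsp.statusCode() + " " + rsp.body());
  }
}
```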
[jira] [Commented] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down
[ https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070173#comment-17070173 ] ASF subversion and git services commented on SOLR-14317:
Commit 782ded2d7ab10f6eea0468a9b0e49a94b2ce6c0b in lucene-solr's branch refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=782ded2 ]
SOLR-14317: HttpClusterStateProvider throws exception when only one node down (Closes #1342)

> HttpClusterStateProvider throws exception when only one node down
> Key: SOLR-14317
> URL: https://issues.apache.org/jira/browse/SOLR-14317
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrJ
> Affects Versions: 7.7.1, 7.7.2
> Reporter: Lyle
> Assignee: Ishan Chattopadhyaya
> Priority: Major
> Attachments: SOLR-14317.patch
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When creating a CloudSolrClient with solrUrls, if the first URL in the
> solrUrls list is invalid or its server is down, the client throws an exception
> directly rather than trying the remaining URLs.
> In [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65],
> if fetchLiveNodes(initialClient) hits any IOException, then in
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648]
> the exception will be caught and rethrown as a SolrServerException to the
> upper caller, while no IOException will be caught in
> HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200).
> The SolrServerException should be caught as well in
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69],
> so that if the first node provided in solrUrls is down, we can try the second
> to fetch live nodes.
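For context, a minimal sketch of the affected setup (the URLs are placeholders): a CloudSolrClient built from Solr URLs rather than a zkHost goes through HttpClusterStateProvider, and with this fix it falls through to the next URL when the first node is down instead of failing immediately.

```java
import java.util.Arrays;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class MultiUrlClientSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Arrays.asList("http://solr1:8983/solr", "http://solr2:8983/solr")).build()) {
      // Fetching live nodes is exactly the call that used to fail when
      // the first URL was unreachable.
      System.out.println(client.getClusterStateProvider().getLiveNodes());
    }
  }
}
```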
[GitHub] [lucene-solr] asfgit closed pull request #1342: SOLR-14317: HttpClusterStateProvider throws exception when only one node down
asfgit closed pull request #1342: SOLR-14317: HttpClusterStateProvider throws exception when only one node down
URL: https://github.com/apache/lucene-solr/pull/1342
[jira] [Commented] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down
[ https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070174#comment-17070174 ] ASF subversion and git services commented on SOLR-14317: Commit 5f6efb000fb6a4b23a67eb23f8a463c2ece6706b in lucene-solr's branch refs/heads/branch_8x from Ishan Chattopadhyaya [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f6efb0 ] SOLR-14317: HttpClusterStateProvider throws exception when only one node down (Closes #1342) > HttpClusterStateProvider throws exception when only one node down > - > > Key: SOLR-14317 > URL: https://issues.apache.org/jira/browse/SOLR-14317 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 7.7.1, 7.7.2 >Reporter: Lyle >Assignee: Ishan Chattopadhyaya >Priority: Major > Attachments: SOLR-14317.patch > > Time Spent: 20m > Remaining Estimate: 0h > > When creating a CloudSolrClient with solrUrls, if the first url in the solrUrls > list is invalid or the server is down, it will throw an exception directly rather > than trying the remaining urls. > In > [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65], > if fetchLiveNodes(initialClient) throws any IOException, then in > [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648], > the exception will be caught and a SolrServerException thrown to the upper caller, > so no IOException will be caught in > HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200). > The SolrServerException should be caught as well in > [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69], > so that if the first node provided in solrUrls is down, we can try the second > to fetch live nodes. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down
[ https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya resolved SOLR-14317. - Fix Version/s: 8.6 Resolution: Fixed Thanks [~lyle_wang]! > HttpClusterStateProvider throws exception when only one node down > - > > Key: SOLR-14317 > URL: https://issues.apache.org/jira/browse/SOLR-14317 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 7.7.1, 7.7.2 >Reporter: Lyle >Assignee: Ishan Chattopadhyaya >Priority: Major > Fix For: 8.6 > > Attachments: SOLR-14317.patch > > Time Spent: 20m > Remaining Estimate: 0h > > When creating a CloudSolrClient with solrUrls, if the first url in the solrUrls > list is invalid or the server is down, it will throw an exception directly rather > than trying the remaining urls. > In > [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65], > if fetchLiveNodes(initialClient) throws any IOException, then in > [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648], > the exception will be caught and a SolrServerException thrown to the upper caller, > so no IOException will be caught in > HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200). > The SolrServerException should be caught as well in > [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69], > so that if the first node provided in solrUrls is down, we can try the second > to fetch live nodes. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down
[ https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070176#comment-17070176 ] Ishan Chattopadhyaya commented on SOLR-14317: - Haven't ported to 7.7 yet. Please attach a patch if you feel it is needed. [~noble] (since you're the RM for the next 7.7 release), do you think this should be included? > HttpClusterStateProvider throws exception when only one node down > - > > Key: SOLR-14317 > URL: https://issues.apache.org/jira/browse/SOLR-14317 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 7.7.1, 7.7.2 >Reporter: Lyle >Assignee: Ishan Chattopadhyaya >Priority: Major > Fix For: 8.6 > > Attachments: SOLR-14317.patch > > Time Spent: 20m > Remaining Estimate: 0h > > When creating a CloudSolrClient with solrUrls, if the first url in the solrUrls > list is invalid or the server is down, it will throw an exception directly rather > than trying the remaining urls. > In > [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65], > if fetchLiveNodes(initialClient) throws any IOException, then in > [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648], > the exception will be caught and a SolrServerException thrown to the upper caller, > so no IOException will be caught in > HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200). > The SolrServerException should be caught as well in > [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69], > so that if the first node provided in solrUrls is down, we can try the second > to fetch live nodes. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14170) Tag package feature as experimental
[ https://issues.apache.org/jira/browse/SOLR-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070178#comment-17070178 ] Ishan Chattopadhyaya commented on SOLR-14170: - {quote}Not yet recommended for production use {quote} I don't see why this shouldn't be recommended for production use. There are plenty of security-related warnings added to the reference guide for this feature. WDYT, [~noble.paul] ? > Tag package feature as experimental > --- > > Key: SOLR-14170 > URL: https://issues.apache.org/jira/browse/SOLR-14170 > Project: Solr > Issue Type: Test > Components: documentation >Reporter: Jan Høydahl >Assignee: Ishan Chattopadhyaya >Priority: Major > Fix For: 8.6 > > > The new package store and package installation feature introduced in 8.4 was > supposed to be tagged as lucene.experimental, with a clear warning in the > ref-guide: "Not yet recommended for production use". > Let's add that for 8.5 so there is no doubt that if you use the feature, you > know the risks. Once the APIs have stabilized and there are a number of > packages available "in the wild", we can decide to release it as a "GA" > feature, but not yet! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
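For reference, Lucene and Solr mark unstable APIs with a javadoc tag rather than a Java annotation, so "tagged as lucene.experimental" would look roughly like the sketch below; the class name and body are made up for illustration.

{code:java}
/**
 * Entry point for the package store feature (class name illustrative).
 *
 * The tag below is the Lucene/Solr javadoc convention for APIs that may
 * change incompatibly between releases; the javadoc build renders it as
 * an experimental-API warning.
 *
 * @lucene.experimental
 */
public class PackageFeatureExample {
  // package store / installation logic would live here
}
{code}

The ref-guide warning quoted above would be the user-facing half of the same change.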