[jira] [Commented] (SOLR-14345) Error messages are not properly propagated with non-default response parsers

2020-03-28 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069267#comment-17069267
 ] 

Munendra S N commented on SOLR-14345:
-

The latest patch is in better shape. If there are no objections, I'm planning to 
commit it in a few days and take up NoOpResponseParser handling separately.

> Error messages are not properly propagated with non-default response parsers
> 
>
> Key: SOLR-14345
> URL: https://issues.apache.org/jira/browse/SOLR-14345
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Munendra S N
>Assignee: Munendra S N
>Priority: Major
> Attachments: SOLR-14345.patch, SOLR-14345.patch, SOLR-14345.patch
>
>
> The default {{ResponseParser}} is {{BinaryResponseParser}}. When a non-default 
> response parser is specified in the request, the error message is not 
> propagated to the user. This happens in SolrCloud mode.
> I came across this problem when working on adding a test which uses 
> {{SolrTestCaseHS}}, but a similar problem exists with the SolrJ client.
> Also, the same problem exists in both HttpSolrClient and Http2SolrClient.






[jira] [Commented] (SOLR-14007) Difference response format for percentile aggregation

2020-03-28 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069269#comment-17069269
 ] 

Munendra S N commented on SOLR-14007:
-

[~ysee...@gmail.com]
Hopefully, I have answered most of your questions above. Let me know if we can 
commit this or if it needs further changes.

> Difference response format for percentile aggregation
> -
>
> Key: SOLR-14007
> URL: https://issues.apache.org/jira/browse/SOLR-14007
> Project: Solr
>  Issue Type: Sub-task
>  Components: Facet Module
>Reporter: Munendra S N
>Assignee: Munendra S N
>Priority: Major
> Attachments: SOLR-14007.patch
>
>
> For percentile,
> in the Stats component the response format is {{NamedList}}, but 
> in JSON facets the format is either an array or a single value, depending on the 
> number of percentiles specified.
> Even if JSON percentile doesn't use NamedList, the response format shouldn't 
> change based on the number of percentiles.
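
For illustration, a hedged sketch of the two shapes as they would appear in a JSON response (field names and values are hypothetical):

{noformat}
# Stats component: always a NamedList keyed by percentile
"percentiles": {"25.0": 3.1, "50.0": 6.2}

# JSON facet, two percentiles requested: an array
"p": [3.1, 6.2]

# JSON facet, one percentile requested: a bare value
"p": 3.1
{noformat}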






[jira] [Commented] (SOLR-11775) json.facet can use inconsistent Long/Integer for "count" depending on shard count

2020-03-28 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069271#comment-17069271
 ] 

Munendra S N commented on SOLR-11775:
-

If there are no objections, I'm planning to commit this (master only) in the 
coming week.

> json.facet can use inconsistent Long/Integer for "count" depending on shard 
> count
> -
>
> Key: SOLR-11775
> URL: https://issues.apache.org/jira/browse/SOLR-11775
> Project: Solr
>  Issue Type: Bug
>  Components: Facet Module
>Reporter: Chris M. Hostetter
>Assignee: Munendra S N
>Priority: Major
> Attachments: SOLR-11775.patch, SOLR-11775.patch
>
>
> (NOTE: I noticed this while working on a test for {{type: range}}, but it's 
> possible other facet types may be affected as well.)
> When dealing with a single core request -- either standalone or a collection 
> with only one shard -- json.facet seems to use "Integer" objects to return 
> the "count" of facet buckets; however, if the shard count is increased then 
> the end client gets a "Long" object for the "count".
> (This isn't noticeable when using {{wt=json}} but can be very problematic when 
> trying to write client code using {{wt=xml}} or SolrJ.)
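
Until this is fixed, a minimal client-side workaround sketch (the helper class is illustrative, not part of SolrJ):

{code:java}
import org.apache.solr.common.util.NamedList;

public final class FacetCounts {
  private FacetCounts() {}

  // Read the bucket count through Number so client code works whether the
  // server returned an Integer (one shard) or a Long (multiple shards).
  public static long countOf(NamedList<?> bucket) {
    Object count = bucket.get("count");
    return count == null ? 0L : ((Number) count).longValue();
  }
}
{code}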






[GitHub] [lucene-solr] noblepaul opened a new pull request #1386: SOLR-14275 Policy calculations are very slow for large clusters and large operations

2020-03-28 Thread GitBox
noblepaul opened a new pull request #1386: SOLR-14275 Policy calculations are 
very slow for large clusters and large operations
URL: https://github.com/apache/lucene-solr/pull/1386
 
 
   





[jira] [Commented] (SOLR-14365) CollapsingQParser - Avoiding always allocate int[] and float[] with size equals to number of unique values

2020-03-28 Thread Shalin Shekhar Mangar (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069285#comment-17069285
 ] 

Shalin Shekhar Mangar commented on SOLR-14365:
--

I think we should add another method and make it configurable.

> CollapsingQParser - Avoiding always allocate int[] and float[] with size 
> equals to number of unique values
> --
>
> Key: SOLR-14365
> URL: https://issues.apache.org/jira/browse/SOLR-14365
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.4.1
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Major
>
> Since Collapsing is a PostFilter, documents that reach Collapsing must match 
> all filters and queries, so the number of documents Collapsing needs to 
> collect/score is a small fraction of the total number of documents in 
> the index. So why do we need to always consume the memory (for the int[] and 
> float[] arrays) for all unique values of the collapsed field? If the number of 
> unique values of the collapsed field found in the documents that match the 
> queries and filters is 300, then we only need int[] and float[] arrays with 
> a size of 300, not 1.2 million. However, we don't know which values 
> of the collapsed field will show up in the results, so we cannot use a smaller 
> array.
> The easy fix for this problem is to use only as much as we need, via an IntIntMap 
> and an IntFloatMap that hold primitives and are much more space-efficient than 
> the Java HashMap. These maps can be slower (10x or 20x) than plain int[] and 
> float[] if the number of matched documents is large (almost all documents match 
> the queries and other filters). But our belief is that this does not happen 
> frequently (how often do we run collapsing on the entire index?).
> For this issue I propose adding 2 methods for collapsing:
> * array: the current implementation
> * hash: the new approach, which will be the default method
> Later we can add another method, {{smart}}, which automatically picks a method 
> based on a comparison between the {{number of docs matching queries and filters}} 
> and the {{number of unique values of the field}}.
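
A minimal sketch of the memory trade-off between the two proposed methods, using the HPPC primitive maps bundled with Solr (the cardinality and ordinal values are illustrative):

{code:java}
import com.carrotsearch.hppc.IntFloatHashMap;

public class CollapseStorageSketch {
  public static void main(String[] args) {
    // "array" method: one slot per unique value of the collapsed field,
    // allocated up front even if only a few hundred values are ever touched.
    float[] scoresByOrd = new float[1_200_000];
    scoresByOrd[300] = 1.5f;

    // "hash" method: allocate only for ordinals actually seen in matching
    // docs. Slower per operation, but memory is O(matched unique values).
    IntFloatHashMap sparseScores = new IntFloatHashMap();
    sparseScores.put(300, 1.5f);
    System.out.println(sparseScores.getOrDefault(300, Float.NEGATIVE_INFINITY));
  }
}
{code}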






[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup

2020-03-28 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069290#comment-17069290
 ] 

Munendra S N commented on SOLR-13492:
-

I had suggested that [~kgsdora] pick this up. So, I'm trying to address/answer some 
of the above concerns.

I have tried jcmd, jconsole, visualvm, and 
[jmxterm|https://github.com/jiaqi/jmxterm], only on the master branch (Java 11).

Troubleshooting memory issues can involve either local or remote debugging. For a 
production system, usually only remote debugging is possible.

h2. Local Debugging

h3. jcmd
* Force-triggering a GC works even with {{DisableExplicitGC}}. We can run 
{{GC.run}} to force a GC (with the flag enabled this works only on JDK > 10, 
https://bugs.openjdk.java.net/browse/JDK-8186902). There are other 
commands that force a GC and also work on Java 8: {{GC.class_histogram}} and 
{{GC.class_stats}}.
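
For example, a hedged invocation sketch (the pid 12345 is hypothetical):

{noformat}
jcmd 12345 GC.run
jcmd 12345 GC.class_histogram
{noformat}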

h3. jconsole and visualvm
* Both come with a GUI. They identify local processes by checking 
hsperfdata_{{yourusername}} in the tmp directory for pids. At present, the GC config 
contains {{-XX:+PerfDisableSharedMem}}, due to which pids [won't be present| 
http://jtuts.com/2017/02/04/jconsole-not-showing-local-processes/] in the above 
folder. So, with the default settings shipped with Solr, jconsole and visualvm can't 
identify local processes.
* I tested removing the above flag and adding {{-XX:+DisableExplicitGC}}; the 
'Perform GC' button in jconsole and visualvm doesn't work.

h3. jmxterm
* This needs JMX enabled.

h2. Remote debugging

h3. jcmd
* jcmd needs a process id; I'm not sure remote debugging is possible.

h3. jconsole and visualvm
* If the process has JMX monitoring enabled, then remote debugging is 
possible. Solr ships with JMX disabled by default.
* I tried enabling it; with {{-XX:+DisableExplicitGC}}, the 'Perform GC' button won't 
trigger a GC.
* For visualvm, there is an option to connect using jstatd, but I haven't tried it.

h3. jmxterm (terminal tool)
* This needs JMX monitoring enabled.
* This behaves similarly to jcmd locally: even with 
{{-XX:+DisableExplicitGC}}, a GC can be forced.

I checked the usage of {{System.gc()}} in Lucene/Solr. It is used in one or two 
Lucene tests and in the Lucene benchmark module. I also checked potential problems 
with disabling explicit GC and found 
[this|https://stackoverflow.com/questions/32912702/impact-of-setting-xxdisableexplicitgc-when-nio-direct-buffers-are-used?rq=1].

With the current defaults Solr ships with, neither local nor remote debugging is 
possible via jconsole. With all things considered, I still think shipping with 
{{-XX:+DisableExplicitGC}} is a good choice; there are ways to force a GC even with 
the above JVM flag, though I haven't yet found a GUI tool for this.

[~erickerickson]
If there are still concerns or objections, I would be happy to answer them. 
An alternative solution is to add {{-XX:+ExplicitGCInvokesConcurrent}} so that 
any forced GC is triggered concurrently.


 

> Disallow explicit GC by default during Solr startup
> ---
>
> Key: SOLR-13492
> URL: https://issues.apache.org/jira/browse/SOLR-13492
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Solr should use the -XX:+DisableExplicitGC option as part of its default GC 
> tuning.
> None of Solr's stock code uses explicit GCs, so that option will have no 
> effect on most installs.  The effective result of this is that if somebody 
> adds custom code to Solr and THAT code does an explicit GC, it won't be 
> allowed to function.
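
A minimal demo of what the flag changes (the class name is illustrative; on a quiet, freshly started JVM the count is typically 0 when the flag disables the explicit call):

{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// java ExplicitGcDemo                        -> prints a non-zero count
// java -XX:+DisableExplicitGC ExplicitGcDemo -> System.gc() is a no-op
public class ExplicitGcDemo {
  public static void main(String[] args) {
    System.gc();
    long collections = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      collections += Math.max(0, gc.getCollectionCount());
    }
    System.out.println("GC collections observed: " + collections);
  }
}
{code}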






[jira] [Commented] (LUCENE-9198) Remove news section from TLP website

2020-03-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069376#comment-17069376
 ] 

Jan Høydahl commented on LUCENE-9198:
-

I'm fine with removing the old combined TLP release news; do you want to do it, 
Alan? I hope to continue with the releaseWizard update to reflect the new procedure 
some time.

> Remove news section from TLP website
> 
>
> Key: LUCENE-9198
> URL: https://issues.apache.org/jira/browse/LUCENE-9198
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: new-tlp-conditional-news.png, 
> new-tlp-frontpage-layout.png, new-tlp-frontpage-layout.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> On the front page [https://lucene.apache.org|https://lucene.apache.org/] we 
> today show a list of TLP news.
> For every release we author one news article for Solr, one news article for 
> Lucene Core, and one news article for the TLP site, combining the two.
> In all these years we have never published a news item to the TLP site that is 
> not a release announcement, except in 2014 when we announced that the 
> OpenRelevance sub-project closed.
> I thus propose to remove this news section and replace it with two widgets 
> that automatically display the last 5 news headings from the Lucene Core, Solr and 
> PyLucene sub-projects.
> If we have an important TLP announcement to make at some point, that can be 
> done right there on the front page, no?






[jira] [Created] (SOLR-14371) Zk StatusHandler should know about dynamic zk config

2020-03-28 Thread Jira
Jan Høydahl created SOLR-14371:
--

 Summary: Zk StatusHandler should know about dynamic zk config
 Key: SOLR-14371
 URL: https://issues.apache.org/jira/browse/SOLR-14371
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Jan Høydahl


ZooKeeper 3.5 supports dynamic reconfig, which is used by the solr-operator 
for Kubernetes. Solr is given a zkHost with one URL pointing to a LB 
(Service) in front of all zookeepers, and the zkClient then fetches the list of 
all zookeepers from the special znode /zookeeper/config and reconfigures itself 
with connections to all listed zk nodes. You can then scale the number of 
zk nodes up/down dynamically without restarting Solr.

However, the Admin UI displays errors since it believes it is connected to only 
one zk, which contradicts what zk itself reports. We need to make 
ZookeeperStatusHandler aware of dynamic reconfig so it asks the zkClient what the 
current zkHost is instead of relying on the static ZK_HOST setting.
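
A minimal sketch of reading the dynamic ensemble with a plain ZooKeeper 3.5 client (connection string, timeout, and class name are hypothetical; this is not the ZookeeperStatusHandler code):

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public final class EnsembleConfigSketch {
  // Returns the dynamic membership stored under /zookeeper/config,
  // which is what the client actually reconfigures itself from.
  public static String currentEnsemble(String zkHost) throws Exception {
    ZooKeeper zk = new ZooKeeper(zkHost, 15000, event -> {});
    try {
      byte[] data = zk.getConfig(false, new Stat());
      return new String(data, StandardCharsets.UTF_8);
    } finally {
      zk.close();
    }
  }
}
{code}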






[jira] [Commented] (SOLR-14371) Zk StatusHandler should know about dynamic zk config

2020-03-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069380#comment-17069380
 ] 

Jan Høydahl commented on SOLR-14371:


[~houston] FYI 

> Zk StatusHandler should know about dynamic zk config
> 
>
> Key: SOLR-14371
> URL: https://issues.apache.org/jira/browse/SOLR-14371
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Jan Høydahl
>Priority: Major
>
> ZooKeeper 3.5 supports dynamic reconfig, which is used by the solr-operator 
> for Kubernetes. Solr is given a zkHost with one URL pointing to a LB 
> (Service) in front of all zookeepers, and the zkClient then fetches the list 
> of all zookeepers from the special znode /zookeeper/config and reconfigures 
> itself with connections to all listed zk nodes. You can then scale the number 
> of zk nodes up/down dynamically without restarting Solr.
> However, the Admin UI displays errors since it believes it is connected to 
> only one zk, which contradicts what zk itself reports. We need to 
> make ZookeeperStatusHandler aware of dynamic reconfig so it asks the zkClient 
> what the current zkHost is instead of relying on the static ZK_HOST setting.






[jira] [Commented] (SOLR-14356) PeerSync with hanging nodes

2020-03-28 Thread Shalin Shekhar Mangar (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069426#comment-17069426
 ] 

Shalin Shekhar Mangar commented on SOLR-14356:
--

Okay, yes let's add the connect timeout exception and discuss a better fix in 
SOLR-14368

> PeerSync with hanging nodes
> ---
>
> Key: SOLR-14356
> URL: https://issues.apache.org/jira/browse/SOLR-14356
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Priority: Major
> Attachments: SOLR-14356.patch
>
>
> Right now in {{PeerSync}} (during leader election), in case of an exception when 
> requesting versions from a node, we will skip that node if the exception is one 
> of the following types:
> * ConnectTimeoutException
> * NoHttpResponseException
> * SocketException
> Sometimes the other node basically hangs but still accepts connections. In that 
> case a SocketTimeoutException is thrown, we consider the {{PeerSync}} 
> process as failed, and the whole shard basically stays leaderless forever (as 
> long as the hanging node is still there).
> We can't just blindly add {{SocketTimeoutException}} to the above list, since 
> [~shalin] mentioned that sometimes timeouts can happen for genuine 
> reasons too, e.g. a temporary GC pause.
> I think the general idea here is that we obey the {{leaderVoteWait}} restriction 
> and retry syncing with others in case a connection/timeout exception happens.
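
A minimal sketch of the kind of exception filter being discussed (hedged; the class and method names are illustrative, not the actual PeerSync code):

{code:java}
import java.net.SocketException;
import java.net.SocketTimeoutException;

final class PeerFailurePolicy {
  private PeerFailurePolicy() {}

  // "Recoverable" here means: skip this peer and keep syncing with others.
  // Whether SocketTimeoutException belongs in this list is exactly the open
  // question above, since timeouts can also come from a genuine GC pause.
  static boolean isRecoverablePeerFailure(Throwable t) {
    return t instanceof SocketException           // includes ConnectException
        || t instanceof SocketTimeoutException    // the contested addition
        // ConnectTimeoutException / NoHttpResponseException come from Apache
        // HttpClient; matched by name to keep the sketch dependency-free.
        || "ConnectTimeoutException".equals(t.getClass().getSimpleName())
        || "NoHttpResponseException".equals(t.getClass().getSimpleName());
  }
}
{code}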






[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup

2020-03-28 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069440#comment-17069440
 ] 

Erick Erickson commented on SOLR-13492:
---

[~munendrasn] Thanks for being so thorough. I should have been more explicit; I 
didn't mean to cause extra work. I don't particularly care whether jconsole or 
visualVM allows explicit GC; I do care that there's _some_ way to trigger GC 
without restarting Solr.

So since jcmd will do the trick, I'm back to +/-0. Having to ssh over to the 
machine running the Solr instance in question and executing this isn't onerous 
(although I think the JDK needs to be installed). I'm still lukewarm about 
protecting all Solr installations from what is a naive coding error, but 
practically I don't see that it makes enough difference to argue about ;)

 

> Disallow explicit GC by default during Solr startup
> ---
>
> Key: SOLR-13492
> URL: https://issues.apache.org/jira/browse/SOLR-13492
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Solr should use the -XX:+DisableExplicitGC option as part of its default GC 
> tuning.
> None of Solr's stock code uses explicit GCs, so that option will have no 
> effect on most installs.  The effective result of this is that if somebody 
> adds custom code to Solr and THAT code does an explicit GC, it won't be 
> allowed to function.






[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup

2020-03-28 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069443#comment-17069443
 ] 

Munendra S N commented on SOLR-13492:
-

Thanks [~erickerickson]. We will wait a few more days for others to review. 
Currently, the idea is to go ahead with DisableExplicitGC.

> Disallow explicit GC by default during Solr startup
> ---
>
> Key: SOLR-13492
> URL: https://issues.apache.org/jira/browse/SOLR-13492
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Solr should use the -XX:+DisableExplicitGC option as part of its default GC 
> tuning.
> None of Solr's stock code uses explicit GCs, so that option will have no 
> effect on most installs.  The effective result of this is that if somebody 
> adds custom code to Solr and THAT code does an explicit GC, it won't be 
> allowed to function.






[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup

2020-03-28 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069446#comment-17069446
 ] 

Munendra S N commented on SOLR-13492:
-

 [^SOLR-13492.patch] 
Attaching a patch to validate it against the precommit build.

> Disallow explicit GC by default during Solr startup
> ---
>
> Key: SOLR-13492
> URL: https://issues.apache.org/jira/browse/SOLR-13492
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
>Priority: Major
> Attachments: SOLR-13492.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Solr should use the -XX:+DisableExplicitGC option as part of its default GC 
> tuning.
> None of Solr's stock code uses explicit GCs, so that option will have no 
> effect on most installs.  The effective result of this is that if somebody 
> adds custom code to Solr and THAT code does an explicit GC, it won't be 
> allowed to function.






[jira] [Updated] (SOLR-13492) Disallow explicit GC by default during Solr startup

2020-03-28 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-13492:

Status: Patch Available  (was: Open)

> Disallow explicit GC by default during Solr startup
> ---
>
> Key: SOLR-13492
> URL: https://issues.apache.org/jira/browse/SOLR-13492
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
>Priority: Major
> Attachments: SOLR-13492.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Solr should use the -XX:+DisableExplicitGC option as part of its default GC 
> tuning.
> None of Solr's stock code uses explicit GCs, so that option will have no 
> effect on most installs.  The effective result of this is that if somebody 
> adds custom code to Solr and THAT code does an explicit GC, it won't be 
> allowed to function.






[jira] [Updated] (SOLR-13492) Disallow explicit GC by default during Solr startup

2020-03-28 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-13492:

Attachment: SOLR-13492.patch

> Disallow explicit GC by default during Solr startup
> ---
>
> Key: SOLR-13492
> URL: https://issues.apache.org/jira/browse/SOLR-13492
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
>Priority: Major
> Attachments: SOLR-13492.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Solr should use the -XX:+DisableExplicitGC option as part of its default GC 
> tuning.
> None of Solr's stock code uses explicit GCs, so that option will have no 
> effect on most installs.  The effective result of this is that if somebody 
> adds custom code to Solr and THAT code does an explicit GC, it won't be 
> allowed to function.






[jira] [Updated] (LUCENE-9297) Index has about 600+ columns, average size of doc is relatively big, Lucene firstly obtain the original doc from disk and then merge the old and the updating columns to a

2020-03-28 Thread kaihe (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaihe updated LUCENE-9297:
--
Description: 
The index has about 600+ columns and the average doc size is relatively big. For a 
partial update, Lucene first obtains the original doc from disk, then merges the 
old and the updated columns into a new doc, and finally flushes it to disk.
The disk IO usage of our 150+ nodes always reaches nearly 99% when partial 
update requests arrive frequently.

I want to optimize the partial update strategy so that only some columns, instead 
of all, are fetched and merged into a new doc when a partial update request 
arrives, in order to cut down the disk IO usage.
Are there any suggestions?

> Index has about 600+ columns, average size of doc is relatively big, Lucene 
> firstly obtain the original doc from disk and then merge the old and the 
> updating columns to a new one, finally flush to disk. The disk io usage rate of 
> our 150+ nodes always reach 
> --
>
> Key: LUCENE-9297
> URL: https://issues.apache.org/jira/browse/LUCENE-9297
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: kaihe
>Priority: Major
>
> The index has about 600+ columns and the average doc size is relatively big. For 
> a partial update, Lucene first obtains the original doc from disk, then merges 
> the old and the updated columns into a new doc, and finally flushes it to disk.
> The disk IO usage of our 150+ nodes always reaches nearly 99% when partial 
> update requests arrive frequently.
> I want to optimize the partial update strategy so that only some columns, 
> instead of all, are fetched and merged into a new doc when a partial update 
> request arrives, in order to cut down the disk IO usage.
> Are there any suggestions?






[jira] [Created] (LUCENE-9297) Index has about 600+ columns, average size of doc is relatively big, Lucene firstly obtain the original doc from disk and then merge the old and the updating columns to a

2020-03-28 Thread kaihe (Jira)
kaihe created LUCENE-9297:
-

 Summary: Index has about 600+ columns, average size of doc is 
relatively big, Lucene firstly obtain the original doc from disk and then merge 
the old and the updating columns to a new one, finally flush to disk. The disk io 
usage rate of our 150+ nodes always reach 
 Key: LUCENE-9297
 URL: https://issues.apache.org/jira/browse/LUCENE-9297
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: kaihe









[jira] [Updated] (LUCENE-9297) partly updating strategy

2020-03-28 Thread kaihe (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaihe updated LUCENE-9297:
--
Summary: partly updating strategy  (was: Index has about 600+ 
columns, average size of doc is relatively big, Lucene firstly obtain the 
original doc from disk and then merge the old and the updating columns to a new 
one, finally flush to disk. The disk io usage rate of our 150+ nodes always reach 
)

> partly updating strategy
> 
>
> Key: LUCENE-9297
> URL: https://issues.apache.org/jira/browse/LUCENE-9297
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: kaihe
>Priority: Major
>
> The index has about 600+ columns and the average doc size is relatively big. For 
> a partial update, Lucene first obtains the original doc from disk, then merges 
> the old and the updated columns into a new doc, and finally flushes it to disk.
> The disk IO usage of our 150+ nodes always reaches nearly 99% when partial 
> update requests arrive frequently.
> I want to optimize the partial update strategy so that only some columns, 
> instead of all, are fetched and merged into a new doc when a partial update 
> request arrives, in order to cut down the disk IO usage.
> Are there any suggestions?






[jira] [Commented] (SOLR-11775) json.facet can use inconsistent Long/Integer for "count" depending on shard count

2020-03-28 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069943#comment-17069943
 ] 

Mikhail Khludnev commented on SOLR-11775:
-

+1

> json.facet can use inconsistent Long/Integer for "count" depending on shard 
> count
> -
>
> Key: SOLR-11775
> URL: https://issues.apache.org/jira/browse/SOLR-11775
> Project: Solr
>  Issue Type: Bug
>  Components: Facet Module
>Reporter: Chris M. Hostetter
>Assignee: Munendra S N
>Priority: Major
> Attachments: SOLR-11775.patch, SOLR-11775.patch
>
>
> (NOTE: I noticed this while working on a test for {{type: range}}, but it's 
> possible other facet types may be affected as well.)
> When dealing with a single core request -- either standalone or a collection 
> with only one shard -- json.facet seems to use "Integer" objects to return 
> the "count" of facet buckets; however, if the shard count is increased then 
> the end client gets a "Long" object for the "count".
> (This isn't noticeable when using {{wt=json}} but can be very problematic when 
> trying to write client code using {{wt=xml}} or SolrJ.)






[jira] [Resolved] (LUCENE-9297) partly updating strategy

2020-03-28 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved LUCENE-9297.

Resolution: Invalid

Please raise questions like this on the user's list; we try to reserve JIRAs 
for known bugs/enhancements rather than usage questions.

See 
[http://lucene.apache.org/solr/community.html#mailing-lists-irc]; there are 
links to both the Lucene and Solr user lists.

A _lot_ more people will see your question on that list and may be able to help 
more quickly.

If it's determined that this really is a code issue or enhancement to Lucene or 
Solr and not a configuration/usage problem, we can raise a new JIRA or reopen 
this one.

> partly updating strategy
> 
>
> Key: LUCENE-9297
> URL: https://issues.apache.org/jira/browse/LUCENE-9297
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: kaihe
>Priority: Major
>
> The index has about 600+ columns and the average doc size is relatively big. For 
> a partial update, Lucene first obtains the original doc from disk, then merges 
> the old and the updated columns into a new doc, and finally flushes it to disk.
> The disk IO usage of our 150+ nodes always reaches nearly 99% when partial 
> update requests arrive frequently.
> I want to optimize the partial update strategy so that only some columns, 
> instead of all, are fetched and merged into a new doc when a partial update 
> request arrives, in order to cut down the disk IO usage.
> Are there any suggestions?






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1361: LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit

2020-03-28 Thread GitBox
mikemccand commented on a change in pull request #1361: LUCENE-8118: Throw 
exception if DWPT grows beyond its maximum ram limit
URL: https://github.com/apache/lucene-solr/pull/1361#discussion_r399689324
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java
 ##
 @@ -435,55 +436,81 @@ private void ensureInitialized(ThreadState state) throws 
IOException {
   long updateDocuments(final Iterable<? extends Iterable<? extends IndexableField>> docs, final Analyzer analyzer,
final DocumentsWriterDeleteQueue.Node<?> delNode) 
throws IOException {
 boolean hasEvents = preUpdate();
-
 final ThreadState perThread = flushControl.obtainAndLock();
 DocumentsWriterPerThread flushingDWPT = null;
+final boolean isUpdate = delNode != null && delNode.isDelete();
+final int numDocsBefore = perThread.dwpt == null ? 0 : 
perThread.dwpt.getNumDocsInRAM();
 final long seqNo;
 try {
-  try {
-// This must happen after we've pulled the ThreadState because IW.close
-// waits for all ThreadStates to be released:
-ensureOpen();
-ensureInitialized(perThread);
-assert perThread.isInitialized();
-final DocumentsWriterPerThread dwpt = perThread.dwpt;
-final int dwptNumDocs = dwpt.getNumDocsInRAM();
-try {
-  seqNo = dwpt.updateDocuments(docs, analyzer, delNode, 
flushNotifications);
-  perThread.updateLastSeqNo(seqNo);
-} finally {
-  // We don't know how many documents were actually
-  // counted as indexed, so we must subtract here to
-  // accumulate our separate counter:
-  numDocsInRAM.addAndGet(dwpt.getNumDocsInRAM() - dwptNumDocs);
-  if (dwpt.isAborted()) {
-flushControl.doOnAbort(perThread);
-  } else if (dwpt.getNumDocsInRAM() > 0) {
-// we need to check if we have at least one doc in the DWPT. This 
can be 0 if we fail
-// due to exceeding total number of docs etc.
-final boolean isUpdate = delNode != null && delNode.isDelete();
+  innerUpdateDocuments(perThread, docs, analyzer, delNode);
+  flushingDWPT = flushControl.doAfterDocument(perThread, isUpdate);
+} catch (MaxBufferSizeExceededException ex) {
+  if (perThread.dwpt.isAborted()) {
+throw ex;
+  } else {
+// we hit an exception but still need to flush this DWPT
+// let's run postUpdate to make sure we flush stuff to disk
+// in the case we exceed ram limits etc.
+hasEvents = doAfterDocumentRejected(perThread, isUpdate, hasEvents);
+// we retry if the DWPT had at least one document indexed and was flushed
+boolean shouldRetry = perThread.dwpt == null && numDocsBefore > 0;
+if (shouldRetry) {
+  try {
+// we retry into a brand new DWPT, if it doesn't fit in here we 
can't index the document
 
 Review comment:
   Oh, I see: we create a new DWPT and send the doc there, ok.





[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1361: LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit

2020-03-28 Thread GitBox
mikemccand commented on a change in pull request #1361: LUCENE-8118: Throw 
exception if DWPT grows beyond its maximum ram limit
URL: https://github.com/apache/lucene-solr/pull/1361#discussion_r399689335
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java
 ##
 @@ -435,55 +436,81 @@ private void ensureInitialized(ThreadState state) throws 
IOException {
   long updateDocuments(final Iterable<? extends Iterable<? extends IndexableField>> docs, final Analyzer analyzer,
final DocumentsWriterDeleteQueue.Node<?> delNode) 
throws IOException {
 boolean hasEvents = preUpdate();
-
 final ThreadState perThread = flushControl.obtainAndLock();
 DocumentsWriterPerThread flushingDWPT = null;
+final boolean isUpdate = delNode != null && delNode.isDelete();
+final int numDocsBefore = perThread.dwpt == null ? 0 : 
perThread.dwpt.getNumDocsInRAM();
 final long seqNo;
 try {
-  try {
-// This must happen after we've pulled the ThreadState because IW.close
-// waits for all ThreadStates to be released:
-ensureOpen();
-ensureInitialized(perThread);
-assert perThread.isInitialized();
-final DocumentsWriterPerThread dwpt = perThread.dwpt;
-final int dwptNumDocs = dwpt.getNumDocsInRAM();
-try {
-  seqNo = dwpt.updateDocuments(docs, analyzer, delNode, 
flushNotifications);
-  perThread.updateLastSeqNo(seqNo);
-} finally {
-  // We don't know how many documents were actually
-  // counted as indexed, so we must subtract here to
-  // accumulate our separate counter:
-  numDocsInRAM.addAndGet(dwpt.getNumDocsInRAM() - dwptNumDocs);
-  if (dwpt.isAborted()) {
-flushControl.doOnAbort(perThread);
-  } else if (dwpt.getNumDocsInRAM() > 0) {
-// we need to check if we have at least one doc in the DWPT. This 
can be 0 if we fail
-// due to exceeding total number of docs etc.
-final boolean isUpdate = delNode != null && delNode.isDelete();
+  innerUpdateDocuments(perThread, docs, analyzer, delNode);
+  flushingDWPT = flushControl.doAfterDocument(perThread, isUpdate);
+} catch (MaxBufferSizeExceededException ex) {
+  if (perThread.dwpt.isAborted()) {
+throw ex;
+  } else {
+// we hit an exception but still need to flush this DWPT
+// let's run postUpdate to make sure we flush stuff to disk
+// in the case we exceed ram limits etc.
+hasEvents = doAfterDocumentRejected(perThread, isUpdate, hasEvents);
+// we retry if the DWPT had at least one document indexed and was flushed
+boolean shouldRetry = perThread.dwpt == null && numDocsBefore > 0;
+if (shouldRetry) {
 
 Review comment:
   Got it.





[GitHub] [lucene-solr] mikemccand commented on issue #1361: LUCENE-8118: Throw exception if DWPT grows beyond its maximum ram limit

2020-03-28 Thread GitBox
mikemccand commented on issue #1361: LUCENE-8118: Throw exception if DWPT grows 
beyond its maximum ram limit
URL: https://github.com/apache/lucene-solr/pull/1361#issuecomment-605494944
 
 
   Another thing we could consider is changing DWPT's postings addressing from 
`int` to `long` so we don't need retry logic.
   
   We could do it, always, increasing the per-unique-term memory cost.  Or we 
could maybe find a way to do it conditionally, when a given DWPT wants to 
exceed the 2.1 GB limit, but that'd be trickier.





[jira] [Commented] (SOLR-13492) Disallow explicit GC by default during Solr startup

2020-03-28 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17069974#comment-17069974
 ] 

Lucene/Solr QA commented on SOLR-13492:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green}  0m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green}  0m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate ref guide {color} | {color:green}  0m  2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:black}{color} | {color:black} {color} | {color:black}  1m 23s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13492 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12998109/SOLR-13492.patch |
| Optional Tests | validatesourcepatterns ratsources validaterefguide |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 9de68117067 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| modules | C: solr solr/solr-ref-guide U: solr |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/728/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Disallow explicit GC by default during Solr startup
> ---
>
> Key: SOLR-13492
> URL: https://issues.apache.org/jira/browse/SOLR-13492
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
>Priority: Major
> Attachments: SOLR-13492.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Solr should use the -XX:+DisableExplicitGC option as part of its default GC 
> tuning.
> None of Solr's stock code uses explicit GCs, so that option will have no 
> effect on most installs.  The effective result of this is that if somebody 
> adds custom code to Solr and THAT code does an explicit GC, it won't be 
> allowed to function.






[GitHub] [lucene-solr] mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-03-28 Thread GitBox
mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to 
skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-605327672
 
 
   @msokolov Thanks for suggesting additional benchmarks that we can use.
   Below are the results on the dataset `wikimedium10m`.
   
   First I will repeat the results from the previous round of benchmarking:
   
   topN=10, taskRepeatCount = 20, concurrentSearchers = False
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 147.64 | (11.5%) | 547.80 | (6.6%) |
   | HighTermMonthSort | 147.85 | (12.2%) | 239.28 | (7.3%) |
   | HighTermDayOfYearSort | 74.44 | (7.7%) | 42.56 | (12.1%) |
   
   ---
   topN=10, **taskRepeatCount = 500**, concurrentSearchers = False
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 184.60 | (8.2%) | 3046.19 | (4.4%) |
   | HighTermMonthSort | 209.43 | (6.5%) | 253.90 | (10.5%) |
   | HighTermDayOfYearSort | 130.97 | (5.8%) | 73.25 | (11.8%) |
   
   This seemed to speed up all operations, and here the speedup for `TermDTSort` is even bigger: 16.5x. There also seems to be more regression for `HighTermDayOfYearSort`.
   
   ---
   **topN=500**, taskRepeatCount = 20, concurrentSearchers = False
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 210.24 | (9.7%) | 537.65 | (6.7%) |
   | HighTermMonthSort | 116.02 | (8.9%) | 189.96 | (13.5%) |
   | HighTermDayOfYearSort | 42.33 | (7.6%) | 67.93 | (9.3%) |
   
   With the increased `topN` the sort optimization has smaller speedups, up to 2x, as expected, since it can only kick in after `topN` docs have been collected.
   
   ---
   topN=10, taskRepeatCount = 20, **concurrentSearchers = True**
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 132.09 | (14.3%) | 287.93 | (11.8%) |
   | HighTermMonthSort | 211.01 | (12.2%) | 116.46 | (7.1%) |
   | HighTermDayOfYearSort | 72.28 | (6.1%) | 68.21 | (11.4%) |
   
   With concurrent searchers the speedups are also smaller, up to 2x. This is expected, as segments are now spread between several TopFieldCollectors/comparators and they don't exchange bottom values. As a follow-up to this PR, we can think about how to have a global bottom value, similar to how `MaxScoreAccumulator` is used to set up a global competitive min score.
   
   ---
   with **indexSort='lastModNDV:long'**, topN=10, taskRepeatCount = 20, concurrentSearchers = False
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 314.78 | (11.6%) | 111.80 | (13.3%) |
   | HighTermMonthSort | 114.77 | (13.1%) | 78.22 | (7.5%) |
   | HighTermDayOfYearSort | 46.82 | (5.7%) | 33.68 | (6.1%) |





[GitHub] [lucene-solr] mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-03-28 Thread GitBox
mayya-sharipova edited a comment on issue #1351: LUCENE-9280: Collectors to 
skip noncompetitive documents
URL: https://github.com/apache/lucene-solr/pull/1351#issuecomment-605327672
 
 
   @msokolov Thanks for suggesting additional benchmarks that we can use.
   Below are the results on the dataset `wikimedium10m`.
   
   First I will repeat the results from the previous round of benchmarking:
   
   topN=10, taskRepeatCount = 20, concurrentSearchers = False
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 147.64 | (11.5%) | 547.80 | (6.6%) |
   | HighTermMonthSort | 147.85 | (12.2%) | 239.28 | (7.3%) |
   | HighTermDayOfYearSort | 74.44 | (7.7%) | 42.56 | (12.1%) |
   
   ---
   topN=10, **taskRepeatCount = 500**, concurrentSearchers = False
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 184.60 | (8.2%) | 3046.19 | (4.4%) |
   | HighTermMonthSort | 209.43 | (6.5%) | 253.90 | (10.5%) |
   | HighTermDayOfYearSort | 130.97 | (5.8%) | 73.25 | (11.8%) |
   
   This seemed to speed up all operations, and here the speedup for `TermDTSort` is even bigger: 16.5x. There also seems to be more regression for `HighTermDayOfYearSort`.
   
   ---
   **topN=500**, taskRepeatCount = 20, concurrentSearchers = False
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 210.24 | (9.7%) | 537.65 | (6.7%) |
   | HighTermMonthSort | 116.02 | (8.9%) | 189.96 | (13.5%) |
   | HighTermDayOfYearSort | 42.33 | (7.6%) | 67.93 | (9.3%) |
   
   With the increased `topN` the sort optimization has smaller speedups, up to 2x, as expected, since it can only kick in after `topN` docs have been collected.
   
   ---
   topN=10, taskRepeatCount = 20, **concurrentSearchers = True**
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 132.09 | (14.3%) | 287.93 | (11.8%) |
   | HighTermMonthSort | 211.01 | (12.2%) | 116.46 | (7.1%) |
   | HighTermDayOfYearSort | 72.28 | (6.1%) | 68.21 | (11.4%) |
   
   With concurrent searchers the speedups are also smaller, up to 2x. This is expected, as segments are now spread between several TopFieldCollectors/comparators and they don't exchange bottom values. As a follow-up to this PR, we can think about how to have a global bottom value, similar to how `MaxScoreAccumulator` is used to set up a global competitive min score.
   
   ---
   with **indexSort='lastModNDV:long'**, topN=10, taskRepeatCount = 20, concurrentSearchers = False
   
   | TaskQPS | baseline QPS | StdDevQPS | my_modified_version QPS | StdDevQPS |
   | --- | ---: | ---: | ---: | ---: |
   | **TermDTSort** | 321.75 | (11.5%) | 364.83 | (7.8%) |
   | HighTermMonthSort | 205.20 | (5.7%) | 178.16 | (7.8%) |
   | HighTermDayOfYearSort | 66.07 | (12.0%) | 58.84 | (9.3%) |





[jira] [Commented] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle

2020-03-28 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070056#comment-17070056
 ] 

Mike Drob commented on LUCENE-9266:
---

   [smoker]     FAILED:
   [smoker]       ./gradle/wrapper/gradle-wrapper.jar

> ant nightly-smoke fails due to presence of build.gradle
> ---
>
> Key: LUCENE-9266
> URL: https://issues.apache.org/jira/browse/LUCENE-9266
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Mike Drob
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Seen on Jenkins - 
> [https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/1617/console]
>  
> Reproduced locally.






[jira] [Comment Edited] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle

2020-03-28 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070056#comment-17070056
 ] 

Mike Drob edited comment on LUCENE-9266 at 3/28/20, 10:39 PM:
--

   [smoker]  [|]         [/]         [-]         [\]                  unpack 
solr-9.0.0-src.tgz...
   [smoker]     make sure no JARs/WARs in src dist...
   [smoker]     FAILED:
   [smoker]       ./gradle/wrapper/gradle-wrapper.jar


was (Author: mdrob):
   [smoker]     FAILED:
   [smoker]       ./gradle/wrapper/gradle-wrapper.jar

> ant nightly-smoke fails due to presence of build.gradle
> ---
>
> Key: LUCENE-9266
> URL: https://issues.apache.org/jira/browse/LUCENE-9266
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Mike Drob
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Seen on Jenkins - 
> [https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/1617/console]
>  
> Reproduced locally.






[GitHub] [lucene-solr] janhoy opened a new pull request #1387: SOLR-14210: Include replica health in healthcheck handler

2020-03-28 Thread GitBox
janhoy opened a new pull request #1387: SOLR-14210: Include replica health in 
healthcheck handler
URL: https://github.com/apache/lucene-solr/pull/1387
 
 
   See https://issues.apache.org/jira/browse/SOLR-14210
   
   WIP, no tests yet, not even tested manually





[jira] [Commented] (SOLR-14210) Introduce Node-level status handler for replicas

2020-03-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070166#comment-17070166
 ] 

Jan Høydahl commented on SOLR-14210:


See https://github.com/apache/lucene-solr/pull/1387 for a first attempt at 
this. If the param {{failWhenRecovering=true}} is passed to {{/api/node/health}}, 
then it will return 503 if one or more cores on the node are in the states 
{{RECOVERY}} or {{CONSTRUCTION}}.
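
A hedged usage example (host/port are illustrative; the parameter comes from the WIP PR and may change):

{noformat}
curl -i "http://localhost:8983/api/node/health?failWhenRecovering=true"
# 503 expected while any local core is in RECOVERY or CONSTRUCTION; 200 otherwise
{noformat}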

> Introduce Node-level status handler for replicas
> 
>
> Key: SOLR-14210
> URL: https://issues.apache.org/jira/browse/SOLR-14210
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (9.0), 8.5
>Reporter: Houston Putman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Background
> As was brought up in SOLR-13055, in order to run Solr in a more cloud-native 
> way, we need some additional features around node-level healthchecks.
> {quote}Like in Kubernetes we need 'liveliness' and 'readiness' probes, 
> explained in 
> [https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/],
>  to determine if a node is live and ready to serve live traffic.
> {quote}
>  
> However, there are issues around Kubernetes managing its own rolling 
> restarts. With the current healthcheck setup, it's easy to envision a 
> scenario in which Solr reports itself as "healthy" when all of its replicas 
> are actually recovering. Therefore Kubernetes, seeing a healthy pod, would 
> then go and restart the next Solr node. This can happen until all replicas 
> are "recovering" and none are healthy. (Maybe the last one restarted will be 
> "down", but still there are no "active" replicas.)
> h2. Proposal
> I propose we make an additional healthcheck handler that returns whether all 
> replicas hosted by that Solr node are healthy and "active". That way we will 
> be able to use the [default kubernetes rolling restart 
> logic|https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#update-strategies]
>  with Solr.
> To add on to [Jan's point 
> here|https://issues.apache.org/jira/browse/SOLR-13055?focusedCommentId=16716559&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16716559],
>  this handler should be more friendly to other Content-Types and should use 
> better HTTP response statuses.






[jira] [Commented] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down

2020-03-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070173#comment-17070173
 ] 

ASF subversion and git services commented on SOLR-14317:


Commit 782ded2d7ab10f6eea0468a9b0e49a94b2ce6c0b in lucene-solr's branch 
refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=782ded2 ]

SOLR-14317: HttpClusterStateProvider throws exception when only one node down 
(Closes #1342)


> HttpClusterStateProvider throws exception when only one node down
> -
>
> Key: SOLR-14317
> URL: https://issues.apache.org/jira/browse/SOLR-14317
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.1, 7.7.2
>Reporter: Lyle
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-14317.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When creating a CloudSolrClient with solrUrls, if the first URL in the 
> solrUrls list is invalid or its server is down, an exception is thrown 
> directly rather than trying the remaining URLs.
> In 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65],
>  if fetchLiveNodes(initialClient) hits an IOException, then in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648]
>  the exception is caught and rethrown as a SolrServerException to the upper 
> caller, so no IOException ever reaches the catch in 
> HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200).
> SolrServerException should be caught as well in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69],
>  so that if the first node provided in solrUrls is down, the second can be 
> tried to fetch live nodes.
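
The shape of the fix, as a self-contained sketch: catch SolrServerException 
alongside IOException so the loop advances to the next URL. The fetchLiveNodes 
helper below is a hypothetical stand-in for the real SolrJ call, which builds 
an HttpSolrClient per base URL.

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.Set;

public class LiveNodesFetcher {
    // Hypothetical stand-in for SolrJ's SolrServerException.
    static class SolrServerException extends Exception {
        SolrServerException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Hypothetical per-URL fetch; simulates the first node being down.
    static Set<String> fetchLiveNodes(String baseUrl) throws IOException, SolrServerException {
        if (baseUrl.contains("down")) {
            throw new SolrServerException("Server refused connection at: " + baseUrl,
                    new IOException("connection refused"));
        }
        return Set.of("node1:8983_solr", "node2:8983_solr");
    }

    static Set<String> fetchFromAny(List<String> solrUrls) {
        for (String url : solrUrls) {
            try {
                return fetchLiveNodes(url);
            } catch (IOException | SolrServerException e) {
                // The reported bug: only IOException was caught here, so the
                // SolrServerException wrapping a dead first node aborted the
                // loop before the remaining URLs were tried.
            }
        }
        throw new RuntimeException("Could not fetch live nodes from any of " + solrUrls);
    }

    public static void main(String[] args) {
        System.out.println(fetchFromAny(
                List.of("http://down-host:8983/solr", "http://live-host:8983/solr")));
    }
}
{code}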






[GitHub] [lucene-solr] asfgit closed pull request #1342: SOLR-14317: HttpClusterStateProvider throws exception when only one node down

2020-03-28 Thread GitBox
asfgit closed pull request #1342: SOLR-14317: HttpClusterStateProvider throws 
exception when only one node down
URL: https://github.com/apache/lucene-solr/pull/1342
 
 
   





[jira] [Commented] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down

2020-03-28 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070174#comment-17070174
 ] 

ASF subversion and git services commented on SOLR-14317:


Commit 5f6efb000fb6a4b23a67eb23f8a463c2ece6706b in lucene-solr's branch 
refs/heads/branch_8x from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f6efb0 ]

SOLR-14317: HttpClusterStateProvider throws exception when only one node down 
(Closes #1342)


> HttpClusterStateProvider throws exception when only one node down
> -
>
> Key: SOLR-14317
> URL: https://issues.apache.org/jira/browse/SOLR-14317
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.1, 7.7.2
>Reporter: Lyle
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-14317.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When creating a CloudSolrClient with solrUrls, if the first URL in the 
> solrUrls list is invalid or its server is down, an exception is thrown 
> directly rather than trying the remaining URLs.
> In 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65],
>  if fetchLiveNodes(initialClient) hits an IOException, then in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648]
>  the exception is caught and rethrown as a SolrServerException to the upper 
> caller, so no IOException ever reaches the catch in 
> HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200).
> SolrServerException should be caught as well in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69],
>  so that if the first node provided in solrUrls is down, the second can be 
> tried to fetch live nodes.






[jira] [Resolved] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down

2020-03-28 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya resolved SOLR-14317.
-
Fix Version/s: 8.6
   Resolution: Fixed

Thanks [~lyle_wang]!

> HttpClusterStateProvider throws exception when only one node down
> -
>
> Key: SOLR-14317
> URL: https://issues.apache.org/jira/browse/SOLR-14317
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.1, 7.7.2
>Reporter: Lyle
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Fix For: 8.6
>
> Attachments: SOLR-14317.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When creating a CloudSolrClient with solrUrls, if the first URL in the 
> solrUrls list is invalid or its server is down, an exception is thrown 
> directly rather than trying the remaining URLs.
> In 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65],
>  if fetchLiveNodes(initialClient) hits an IOException, then in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648]
>  the exception is caught and rethrown as a SolrServerException to the upper 
> caller, so no IOException ever reaches the catch in 
> HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200).
> SolrServerException should be caught as well in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69],
>  so that if the first node provided in solrUrls is down, the second can be 
> tried to fetch live nodes.






[jira] [Commented] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down

2020-03-28 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070176#comment-17070176
 ] 

Ishan Chattopadhyaya commented on SOLR-14317:
-

Haven't ported this to 7.7 yet. Please attach a patch if you feel it is needed. 
[~noble] (since you're the RM for the next 7.7 release), do you think this 
should be included?

> HttpClusterStateProvider throws exception when only one node down
> -
>
> Key: SOLR-14317
> URL: https://issues.apache.org/jira/browse/SOLR-14317
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.1, 7.7.2
>Reporter: Lyle
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Fix For: 8.6
>
> Attachments: SOLR-14317.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When creating a CloudSolrClient with solrUrls, if the first URL in the 
> solrUrls list is invalid or its server is down, an exception is thrown 
> directly rather than trying the remaining URLs.
> In 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65],
>  if fetchLiveNodes(initialClient) hits an IOException, then in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648]
>  the exception is caught and rethrown as a SolrServerException to the upper 
> caller, so no IOException ever reaches the catch in 
> HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200).
> SolrServerException should be caught as well in 
> [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69],
>  so that if the first node provided in solrUrls is down, the second can be 
> tried to fetch live nodes.






[jira] [Commented] (SOLR-14170) Tag package feature as experimental

2020-03-28 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070178#comment-17070178
 ] 

Ishan Chattopadhyaya commented on SOLR-14170:
-

{quote}Not yet recommended for production use
{quote}
I don't see why this shouldn't be recommended for production use. There are 
plenty of security-related warnings added to the reference guide for this 
feature. WDYT, [~noble.paul]?

> Tag package feature as experimental
> ---
>
> Key: SOLR-14170
> URL: https://issues.apache.org/jira/browse/SOLR-14170
> Project: Solr
>  Issue Type: Test
>  Components: documentation
>Reporter: Jan Høydahl
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Fix For: 8.6
>
>
> The new package store and package installation feature introduced in 8.4 was 
> supposed to be tagged as lucene.experimental, with a clear warning in the 
> ref-guide: "Not yet recommended for production use".
> Let's add that for 8.5 so there is no doubt that anyone using the feature 
> knows the risks. Once the APIs have stabilized and there are a number of 
> packages available "in the wild", we can decide to release it as a "GA" 
> feature, but not yet!


