[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting
[ https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242229#comment-17242229 ] Andrzej Bialecki commented on SOLR-4735: [~Pavithrad] please open a new Jira issue and describe the problem in more detail, including Solr version, environment and Solr logs - this issue is closed. > Improve Solr metrics reporting > -- > > Key: SOLR-4735 > URL: https://issues.apache.org/jira/browse/SOLR-4735 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Alan Woodward >Assignee: Andrzej Bialecki >Priority: Minor > Fix For: 6.4, 7.0 > > Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, > SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, screenshot-2.png > > Time Spent: 20m > Remaining Estimate: 0h > > Following on from a discussion on the mailing list: > http://search-lucene.com/m/IO0EI1qdyJF1/codahale&subj=Solr+metrics+in+Codahale+metrics+and+Graphite+ > It would be good to make Solr play more nicely with existing devops > monitoring systems, such as Graphite or Ganglia. Stats monitoring at the > moment is poll-only, either via JMX or through the admin stats page. I'd > like to refactor things a bit to make this more pluggable. > This patch is a start. It adds a new interface, InstrumentedBean, which > extends SolrInfoMBean to return a > [Metrics|http://metrics.codahale.com/manual/core/] MetricRegistry, and a > couple of MetricReporters (which basically just duplicate the JMX and admin > page reporting that's there at the moment, but which should be more > extensible). The patch includes a change to RequestHandlerBase showing how > this could work. The idea would be to eventually replace the getStatistics() > call on SolrInfoMBean with this instead. > The next step would be to allow more MetricReporters to be defined in > solrconfig.xml. The Metrics library comes with ganglia and graphite > reporting modules, and we can add contrib plugins for both of those. > There's some more general cleanup that could be done around SolrInfoMBean > (we've got two plugin handlers at /mbeans and /plugins that basically do the > same thing, and the beans themselves have some weirdly inconsistent data on > them - getVersion() returns different things for different impls, and > getSource() seems pretty useless), but maybe that's for another issue.
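For illustration, the interface sketched in the quoted description would look roughly like this (a minimal sketch assuming only the names mentioned above; the actual patch may differ):

{code:java}
import com.codahale.metrics.MetricRegistry;

// Sketch of the proposed interface: an info bean that, in addition to the
// existing SolrInfoMBean contract, exposes a Codahale MetricRegistry that
// pluggable MetricReporters can poll or push from.
public interface InstrumentedBean extends SolrInfoMBean {
  MetricRegistry getMetricRegistry();
}
{code}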
[jira] [Created] (LUCENE-9629) Use computed mask values in ForUtil
Feng Guo created LUCENE-9629: Summary: Use computed mask values in ForUtil Key: LUCENE-9629 URL: https://issues.apache.org/jira/browse/LUCENE-9629 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary. Another small fix is to change `remainingBitsPerValue > remainingBitsPerLong` to `remainingBitsPerValue >= remainingBitsPerLong`; otherwise ``` if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } ``` this code will never be executed.
[jira] [Updated] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-9629: - Description: In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary. Another small fix is to change {code:java} remainingBitsPerValue > remainingBitsPerLong{code} to {code:java} remainingBitsPerValue >= remainingBitsPerLong{code} otherwise {code:java} if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } {code} this code will never be executed. was: In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary. Another small fix is to change `remainingBitsPerValue > remainingBitsPerLong` to `remainingBitsPerValue >= remainingBitsPerLong`; otherwise ``` if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } ``` this code will never be executed. > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed.
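To make the first part of the proposal concrete, the idea is a table lookup instead of a per-call shift-and-subtract (an illustrative sketch, not the exact ForUtil internals):

{code:java}
// Illustrative sketch: fill the mask table once in a static initializer...
private static final long[] MASKS = new long[64];
static {
  for (int bpv = 0; bpv < 64; bpv++) {
    MASKS[bpv] = (1L << bpv) - 1; // a mask with the low bpv bits set
  }
}

// ...then encoding reads the precomputed value instead of recomputing it.
private static long mask(int bitsPerValue) {
  return MASKS[bitsPerValue];
}
{code}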
[GitHub] [lucene-solr] gf2121 opened a new pull request #2113: LUCENE-9629: use computed masks
gf2121 opened a new pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113 # Description In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed when encoding; maybe we can avoid this. # Solution Use the precomputed mask values.
[jira] [Commented] (SOLR-14182) Move metric reporters config from solr.xml to ZK cluster properties
[ https://issues.apache.org/jira/browse/SOLR-14182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242285#comment-17242285 ] Andrzej Bialecki commented on SOLR-14182: - I'd like to start working on this. As I see it this issue needs to address the following: * mark {{solr.xml:/solr/metrics}} as deprecated and remove in 9.1. * general metrics configuration (such as enable/disable, metric suppliers options) should move to {{/clusterprops.json:/metrics}} * metric reporters configuration should be moved to container-level plugins, i.e. {{/clusterprops.json:/plugin}} and the corresponding API. This will make the reporters easier to configure and change dynamically without restarting Solr nodes. * precedence: {{MetricsConfig}} will be initialized from {{solr.xml}} as before. Then, if any clusterprops configuration is present it will REPLACE the one from {{solr.xml}} - I don't want to attempt any fusion of these two, and I think it's easier to migrate if you don't merge these configs. This approach means that defining anything using the new locations will automatically turn off the old {{solr.xml}} config (a sketch of this precedence follows below). > Move metric reporters config from solr.xml to ZK cluster properties > --- > > Key: SOLR-14182 > URL: https://issues.apache.org/jira/browse/SOLR-14182 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.4 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > > Metric reporters are currently configured statically in solr.xml, which makes > it difficult to change dynamically or in a containerized environment. > We should move this section to ZK /cluster.properties and add a back-compat > migration shim.
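The precedence rule described above could be sketched like this ({{MetricsConfig}} is the existing class; the helper name and signature are hypothetical, for illustration only):

{code:java}
// Sketch of the proposed precedence: a /metrics section in clusterprops.json,
// when present, replaces the solr.xml config wholesale - no merging.
MetricsConfig resolveMetricsConfig(MetricsConfig fromSolrXml, java.util.Map<String, Object> clusterProps) {
  Object fromClusterProps = clusterProps.get("metrics");
  if (fromClusterProps != null) {
    return parseMetricsConfig(fromClusterProps); // hypothetical parser; new location wins entirely
  }
  return fromSolrXml; // back-compat: fall back to the deprecated solr.xml section
}
{code}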
[jira] [Commented] (SOLR-14992) TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures
[ https://issues.apache.org/jira/browse/SOLR-14992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242331#comment-17242331 ] Erick Erickson commented on SOLR-14992: --- [~mdrob] No magic, I just use Mark Miller's "beasting" script. This fails about 20% of the time for me (MBP). I really hate these: you change code, beast for a while, and are never completely sure you've found the problem... If you have a possibility for a fix, I'd be glad to beast it, even if it's just a wild stab... Next time it's nasty outside I'll be taking a closer look at it. > TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures > -- > > Key: SOLR-14992 > URL: https://issues.apache.org/jira/browse/SOLR-14992 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Minor > > I've noticed this test started failing very frequently with an error like: > {noformat} > Error Message: > Error from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. This requires 4 shards to be created > (higher than the allowed number) > Stack Trace: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1.
This requires 4 shards to be created > (higher than the allowed number) > at > __randomizedtesting.SeedInfo.seed([3D670DC4BEABD958:3550EB0C6505ADD6]:0) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) > at > org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369) > at > org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1173) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231) > at > org.apache.solr.cloud.TestPullReplicaErrorHandling.testCantConnectToPullReplica(TestPullReplicaErrorHandling.java:149) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure
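For context, the rejection in the quoted error is simple arithmetic over the collection request (values taken directly from the message above):

{noformat}
replicas requested = numShards x (nrtReplicas + tlogReplicas + pullReplicas)
                   = 2 x (1 + 0 + 1) = 4
replicas allowed   = maxShardsPerNode x live nodes = 1 x 3 = 3
4 > 3, so the CREATE call fails before the test body can run
{noformat}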
[jira] [Created] (SOLR-15023) Timeout Issue with Solr Metrics API
Dinesh Kumar created SOLR-15023: --- Summary: Timeout Issue with Solr Metrics API Key: SOLR-15023 URL: https://issues.apache.org/jira/browse/SOLR-15023 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Admin UI, metrics Affects Versions: 8.2 Reporter: Dinesh Kumar Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes 20K ms but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs:
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detailed Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes 20K ms but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: was: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes 20K ms but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > > Hi Team, > We are facing a "connection lost" error on the Solr admin page. While debugging, we > found an issue with admin/metrics API. > *Detailed Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in "Connection to Solr lost". > On the other hand, I tried to hit the same query separately in a browser; it still > takes 20K ms but I get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes too long and finally fails to load the response. > As a result, multiple concurrent calls were made to the same API, which throw > "Connection to Solr lost". > I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs:
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detailed Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes *20K ms* but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: was: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detailed Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes 20K ms but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > > Hi Team, > We are facing a "connection lost" error on the Solr admin page. While debugging, we > found an issue with admin/metrics API. > *Detailed Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in "Connection to Solr lost". > On the other hand, I tried to hit the same query separately in a browser; it still > takes *20K ms* but I get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes too long and finally fails to load the response. > As a result, multiple concurrent calls were made to the same API, which throw > "Connection to Solr lost". > I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs:
[jira] [Commented] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242358#comment-17242358 ] Adrien Grand commented on LUCENE-9629: -- Thanks for catching this unused code block. I'm unsure whether we should move forward with the other part of the change that makes sure we precompute all masks. Have you been able to measure a speedup with your change? It brings some more lines of code for the write path, which is less performance-sensitive than the read path, so we usually care less about optimizing it. This is e.g. why the read path specializes code for every number of bits per value while the write path doesn't. > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed.
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Attachment: Error.pdf > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > Attachments: Error.pdf > > > Hi Team, > We are facing a "connection lost" error on the Solr admin page. While debugging, we > found an issue with admin/metrics API. > *Detailed Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in "Connection to Solr lost". > On the other hand, I tried to hit the same query separately in a browser; it still > takes *20K ms* but I get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes too long and finally fails to load the response. > As a result, multiple concurrent calls were made to the same API, which throw > "Connection to Solr lost". > I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs:
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes *20K ms* but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". We tried a few ways to disable this API call or to increase the timeout, but nothing works. I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: was: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detailed Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes *20K ms* but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > Attachments: Error.pdf > > > Hi Team, > We are facing a "connection lost" error on the Solr admin page. While debugging, we > found an issue with admin/metrics API. > *Detail Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in "Connection to Solr lost". > On the other hand, I tried to hit the same query separately in a browser; it still > takes *20K ms* but I get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes too long and finally fails to load the response. > As a result, multiple concurrent calls were made to the same API, which throw > "Connection to Solr lost". > We tried a few ways to disable this API call or to increase the timeout, but > nothing works. > I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs:
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "Connection to Solr lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. On the other hand, we tried to hit the same query separately in a browser; it still takes *20K ms* but we get a proper response. When the admin/metrics API call happens from the admin console, the first call takes a long time and finally fails to load the response. As a result, *multiple concurrent calls were made to the same API*, which throw "Connection to Solr lost". We tried a few ways to disable this API call and to increase the timeout, but nothing works. We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs. was: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes *20K ms* but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". We tried a few ways to disable this API call or to increase the timeout, but nothing works. I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > Attachments: Error.pdf > > > Hi Team, > We are facing a "Connection to Solr lost" error on the Solr admin page. While > debugging, we found an issue with admin/metrics API. > *Detail Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. > On the other hand, we tried to hit the same query separately in a browser; it still > takes *20K ms* but we get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes a long time and finally fails to load the response. As a result, > *multiple concurrent calls were made to the same API*, which throw > "Connection to Solr lost". > We tried a few ways to disable this API call and to increase the timeout, but > nothing works. > We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs.
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "Connection to Solr lost" error on the Solr admin page. While debugging, we found an issue with the admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. On the other hand, we tried to hit the same query separately in a browser; it still takes *20K ms* but we get a proper response. When the admin/metrics API call happens from the admin console, the first call takes a long time and finally fails to load the response. As a result, *multiple concurrent calls were made to the same API*, which throw "Connection to Solr lost". We tried a few ways to disable this API call and to increase the timeout, but nothing works. We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs. was: Hi Team, We are facing a "Connection to Solr lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. On the other hand, we tried to hit the same query separately in a browser; it still takes *20K ms* but we get a proper response. When the admin/metrics API call happens from the admin console, the first call takes a long time and finally fails to load the response. As a result, *multiple concurrent calls were made to the same API*, which throw "Connection to Solr lost". We tried a few ways to disable this API call and to increase the timeout, but nothing works. We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs. > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > Attachments: Error.pdf > > > Hi Team, > We are facing a "Connection to Solr lost" error on the Solr admin page. While > debugging, we found an issue with the admin/metrics API. > *Detail Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. > On the other hand, we tried to hit the same query separately in a browser; it still > takes *20K ms* but we get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes a long time and finally fails to load the response. As a result, > *multiple concurrent calls were made to the same API*, which throw > "Connection to Solr lost". > We tried a few ways to disable this API call and to increase the timeout, but > nothing works. > We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs.
[jira] [Commented] (SOLR-9812) Implement a /admin/metrics API
[ https://issues.apache.org/jira/browse/SOLR-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242363#comment-17242363 ] Dinesh Kumar commented on SOLR-9812: Team, We are facing issues with the Solr admin/metrics API. Can you please help with this issue: SOLR-15023 - https://issues.apache.org/jira/browse/SOLR-15023 > Implement a /admin/metrics API > -- > > Key: SOLR-9812 > URL: https://issues.apache.org/jira/browse/SOLR-9812 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: 6.4, 7.0 > > Attachments: SOLR-9812.patch, SOLR-9812.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > We added a bare-bones metrics API in SOLR-9788, but due to limitations with > the metrics servlet supplied by the metrics library, it can show statistics > from only one metric registry. SOLR-4735 has added a hierarchy of metric > registries, and the /admin/metrics API should support showing all of them as > well as be able to filter metrics from a given registry name. > In this issue we will implement the improved /admin/metrics API.
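As an aside for anyone chasing the timeouts referenced above: the API supports filtering, so a client can request a small slice instead of the full payload. A SolrJ sketch (the group/prefix parameters come from the metrics API; the {{solrClient}} setup is assumed):

{code:java}
// Sketch: fetch only heap metrics from the JVM registry instead of
// the full /admin/metrics payload.
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("group", "jvm");           // restrict to one metric group
params.set("prefix", "memory.heap");  // restrict to one metric name prefix
GenericSolrRequest req =
    new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/metrics", params);
NamedList<Object> metrics = solrClient.request(req); // solrClient: an existing SolrClient
{code}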
[jira] [Comment Edited] (SOLR-9812) Implement a /admin/metrics API
[ https://issues.apache.org/jira/browse/SOLR-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242363#comment-17242363 ] Dinesh Kumar edited comment on SOLR-9812 at 12/2/20, 1:33 PM: -- Team, We are facing issues with the Solr admin/metrics API. Can you please help with this issue: SOLR-15023 was (Author: dineshkumark): Team, We are facing issues with the Solr admin/metrics API. Can you please help with this issue: SOLR-15023 - https://issues.apache.org/jira/browse/SOLR-15023 > Implement a /admin/metrics API > -- > > Key: SOLR-9812 > URL: https://issues.apache.org/jira/browse/SOLR-9812 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: 6.4, 7.0 > > Attachments: SOLR-9812.patch, SOLR-9812.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > We added a bare-bones metrics API in SOLR-9788, but due to limitations with > the metrics servlet supplied by the metrics library, it can show statistics > from only one metric registry. SOLR-4735 has added a hierarchy of metric > registries, and the /admin/metrics API should support showing all of them as > well as be able to filter metrics from a given registry name. > In this issue we will implement the improved /admin/metrics API.
[GitHub] [lucene-solr] iverase commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian
iverase commented on pull request #2094: URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-737236259 Thanks @dweiss! I think your approach is potentially more efficient but harder to get to a state where everything works. I am currently taking a different approach: increasing the version number on the codec files. The writers should therefore stay mostly untouched, and only the readers should wrap the IndexInput when the version is lower than the current one. In most cases the real change to a codec is a one-liner. Unfortunately I need to do some refactoring, and therefore the patch is bigger. I opened an issue to do the refactoring on the side as I think it is valuable even if this PR does not succeed. The only issue left is the PackedInts algorithms, as I think they need to be adapted. I have done that already for the DirectWriter.
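A rough sketch of the wrapping idea described above (the class and names below are hypothetical illustrations, not the PR's actual code): readers detect an older codec version and byte-swap multi-byte reads, so the writers stay untouched.

```java
import java.io.IOException;
import org.apache.lucene.store.IndexInput;

// Hypothetical sketch: reverse the byte order of multi-byte primitives so a
// little-endian reader can consume files written by a big-endian codec version.
final class ByteSwappingReads {
  static short readShort(IndexInput in) throws IOException { return Short.reverseBytes(in.readShort()); }
  static int readInt(IndexInput in) throws IOException { return Integer.reverseBytes(in.readInt()); }
  static long readLong(IndexInput in) throws IOException { return Long.reverseBytes(in.readLong()); }
}
```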
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534205062 ## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ## @@ -896,6 +900,9 @@ public void load() { containerHandlers.getApiBag().registerObject(containerPluginsApi.readAPI); containerHandlers.getApiBag().registerObject(containerPluginsApi.editAPI); + // get the placement plugin Review comment: I'd rather comment "get the placement plugin **factory**". And possibly specify the plugin factory lifecycle with respect to configuration? Now that everything is a bit more implicit than it previously was, I don't get (yet...) when exactly the configuration is passed. I believe comments related to this would be useful.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534209824 ## File path: solr/core/src/java/org/apache/solr/cluster/placement/PlacementPluginFactory.java ## @@ -18,14 +18,22 @@ package org.apache.solr.cluster.placement; /** - * Factory implemented by client code and configured in {@code solr.xml} allowing the creation of instances of + * Factory implemented by client code and configured in container plugins allowing the creation of instances of * {@link PlacementPlugin} to be used for replica placement computation. + * Note: configurable factory implementations should also implement + * {@link org.apache.solr.api.ConfigurablePlugin} with the appropriate configuration + * bean type. */ public interface PlacementPluginFactory { Review comment: Shouldn't this interface extend `ConfigurablePlugin` so that concrete factory classes such as `AffinityPlacementFactory` do not have to implement multiple interfaces?
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534210911 ## File path: solr/core/src/java/org/apache/solr/cluster/placement/PlacementPluginFactory.java ## @@ -18,14 +18,22 @@ package org.apache.solr.cluster.placement; /** - * Factory implemented by client code and configured in {@code solr.xml} allowing the creation of instances of + * Factory implemented by client code and configured in container plugins allowing the creation of instances of * {@link PlacementPlugin} to be used for replica placement computation. + * Note: configurable factory implementations should also implement + * {@link org.apache.solr.api.ConfigurablePlugin} with the appropriate configuration + * bean type. */ public interface PlacementPluginFactory { Review comment: That would force every placement plugin to be configurable though I guess...
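For readers following the thread, the two shapes under discussion look roughly like this (a simplified sketch; the real interfaces and the AffinityPlacement classes carry more members):

```java
// Option A (current patch): a configurable factory opts in explicitly
// by implementing both interfaces.
class AffinityPlacementFactory
    implements PlacementPluginFactory, ConfigurablePlugin<AffinityPlacementConfig> {
  private AffinityPlacementConfig config;
  @Override public void configure(AffinityPlacementConfig cfg) { this.config = cfg; }
  @Override public PlacementPlugin createPluginInstance() { return buildPlugin(config); } // buildPlugin: illustrative
}

// Option B (suggested above): fold configuration into the factory contract,
// at the cost of forcing a config type on every placement plugin.
interface ConfigurablePlacementPluginFactory<C>
    extends PlacementPluginFactory, ConfigurablePlugin<C> {}
```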
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534219720 ## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ## @@ -257,6 +260,7 @@ public CoreLoadFailure(CoreDescriptor cd, Exception loadFailure) { // initially these are the same to collect the plugin-based listeners during init private ClusterEventProducer clusterEventProducer; + private PlacementPluginFactory placementPluginFactory; Review comment: I believe we have a synchronization issue here on access to that variable. It is not `final` nor `volatile`, access is not synchronized but it is accessed from multiple threads (the command execution Overseer threads calling `getPlacementPluginFactory()`).
[GitHub] [lucene-solr] sigram commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
sigram commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534234770 ## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ## @@ -257,6 +260,7 @@ public CoreLoadFailure(CoreDescriptor cd, Exception loadFailure) { // initially these are the same to collect the plugin-based listeners during init private ClusterEventProducer clusterEventProducer; + private PlacementPluginFactory placementPluginFactory; Review comment: Since we always use `DelegatingPlacementPluginFactory` I'm going to create a final instance of this wrapper - and then it will be initialized once the plugins registry is ready.
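Something like the following (a minimal sketch of the idea, not the committed code): the CoreContainer field can then be final, and the mutable delegate is published via volatile so Overseer threads read it safely.

```java
// Minimal sketch: a final wrapper whose delegate is swapped in after the
// container plugin registry initializes; volatile gives safe publication.
public final class DelegatingPlacementPluginFactory implements PlacementPluginFactory {
  private volatile PlacementPluginFactory delegate;

  public void setDelegate(PlacementPluginFactory delegate) {
    this.delegate = delegate;
  }

  @Override
  public PlacementPlugin createPluginInstance() {
    PlacementPluginFactory d = delegate; // single volatile read
    return d == null ? null : d.createPluginInstance();
  }
}
```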
[jira] [Commented] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo commented on LUCENE-9629: -- Thanks for your reply! I can't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, and this is a somewhat hot path when indexing. So you may consider it a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested:
{code:java}
for (int time = 0; time < 100; time++) {
  Random random = new Random(System.currentTimeMillis());
  long[] nums = new long[128];
  for (int i = 0; i < 128; i++) {
    nums[i] = random.nextInt(4) + 1;
  }
  ForUtil forUtil = new ForUtil();
  Directory directory = new ByteBuffersDirectory();
  DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT);
  for (int i = 0; i < 1; i++) {
    forUtil.encode(nums, 3, dataOutput);
  }
  directory.close();
}
{code}
*result:*
||Method||before||after||
|org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%|
|org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%|
|org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%|
From my point of view, the number of code lines is less important than writing speed, but if you think the precomputation makes no sense, just tell me and I will revert this part of the change :) > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed.
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo edited comment on LUCENE-9629 at 12/2/20, 3:05 PM: Thanks for your reply! I can't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, and this is a somewhat hot path when indexing. So you may consider it a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested:
{code:java}
for (int time = 0; time < 100; time++) {
  Random random = new Random(System.currentTimeMillis());
  long[] nums = new long[128];
  for (int i = 0; i < 128; i++) {
    nums[i] = random.nextInt(4) + 1;
  }
  ForUtil forUtil = new ForUtil();
  Directory directory = new ByteBuffersDirectory();
  DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT);
  for (int i = 0; i < 1; i++) {
    forUtil.encode(nums, 3, dataOutput);
  }
  directory.close();
}
{code}
*result:*
||Method||before||after||
|org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%|
|org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%|
|org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%|
From my point of view, the number of code lines is less important than writing speed, but if you think the precomputation makes no sense, just tell me and I will revert this part of the change :) was (Author: gf2121): Thanks for your reply! I can't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, and this is a somewhat hot path when indexing. So you may consider it a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested:
{code:java}
for (int time = 0; time < 100; time++) {
  Random random = new Random(System.currentTimeMillis());
  long[] nums = new long[128];
  for (int i = 0; i < 128; i++) {
    nums[i] = random.nextInt(4) + 1;
  }
  ForUtil forUtil = new ForUtil();
  Directory directory = new ByteBuffersDirectory();
  DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT);
  for (int i = 0; i < 1; i++) {
    forUtil.encode(nums, 3, dataOutput);
  }
  directory.close();
}
{code}
*result:*
||Method||before||after||
|org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%|
|org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%|
|org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%|
From my point of view, the number of code lines is less important than writing speed, but if you think the precomputation makes no sense, just tell me and I will revert this part of the change :) > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534255115 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/AddReplicaCmd.java ## @@ -144,7 +140,8 @@ public void call(ClusterState state, ZkNodeProps message, @SuppressWarnings({"ra } } -List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount) +List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount, + ocmh.overseer.getCoreContainer().getPlacementPluginFactory().createPluginInstance()) Review comment: Unclear to me: what happens here when placement plugins are not configured and we use for example the legacy assign strategy?
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534259966 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/AddReplicaCmd.java ## @@ -144,7 +140,8 @@ public void call(ClusterState state, ZkNodeProps message, @SuppressWarnings({"ra } } -List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount) +List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount, + ocmh.overseer.getCoreContainer().getPlacementPluginFactory().createPluginInstance()) Review comment: Ok, I see that's how we tell which one is configured by this value being `null`... Can't say I really like it, but at least it should be commented (here and in `PlacementPluginFactory` as well?). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo edited comment on LUCENE-9629 at 12/2/20, 3:34 PM:
Thanks for your reply! I fully agree that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, especially since this is a somewhat hot path when indexing. So you may consider it a "fix" rather than an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested:
{code:java}
for (int time = 0; time < 100; time++) {
  Random random = new Random(System.currentTimeMillis());
  long[] nums = new long[128];
  for (int i = 0; i < 128; i++) {
    nums[i] = random.nextInt(4) + 1;
  }
  ForUtil forUtil = new ForUtil();
  Directory directory = new ByteBuffersDirectory();
  DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT);
  for (int i = 0; i < 1; i++) {
    forUtil.encode(nums, 3, dataOutput);
  }
  directory.close();
}{code}
*result:*
||method||before||after||
|org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%|
|org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%|
|org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%|
From my point of view, the number of lines of code is less important than writing speed, but if you think the precompute makes no sense, just tell me and I will revert this part of the change.
In addition, my spoken English is a bit poor and most of the words above come from translation programs. If any of them come across as offensive, please just ignore them. I really admire this amazing project and am just trying to do my best to make it better :)
> Use computed mask values in ForUtil
> ---
>
> Key: LUCENE-9629
> URL: https://issues.apache.org/jira/browse/LUCENE-9629
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Feng Guo
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary.
> Another small fix is to change
> {code:java}
> remainingBitsPerValue > remainingBitsPerLong{code}
> to
> {code:java}
> remainingBitsPerValue >= remainingBitsPerLong{code}
> otherwise this code
> {code:java}
> if (remainingBitsPerValue == 0) {
>   idx++;
>   remainingBitsPerValue = bitsPerValue;
> }
> {code}
> will never be executed.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] sigram commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
sigram commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534293404 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/AddReplicaCmd.java ## @@ -144,7 +140,8 @@ public void call(ClusterState state, ZkNodeProps message, @SuppressWarnings({"ra } } -List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount) +List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount, + ocmh.overseer.getCoreContainer().getPlacementPluginFactory().createPluginInstance()) Review comment: The plugin is null and `Assign.createAssignStrategy` provides a `LegacyAssignStrategy`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
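A hedged sketch of the behavior sigram describes: the factory returns `null` when no placement plugin is configured, and `Assign.createAssignStrategy` falls back to the legacy strategy. Everything here other than the `LegacyAssignStrategy` name is a simplified stand-in, not the real code:

```java
// Simplified stand-in types; only LegacyAssignStrategy is named in the thread.
interface PlacementPlugin {}
interface AssignStrategy {}
class LegacyAssignStrategy implements AssignStrategy {}
class PlacementPluginAssignStrategy implements AssignStrategy {
  PlacementPluginAssignStrategy(PlacementPlugin plugin) { /* delegate to the plugin */ }
}

class AssignSketch {
  // A null plugin means "no placement plugin configured" -> legacy behavior.
  static AssignStrategy createAssignStrategy(PlacementPlugin plugin) {
    return plugin == null
        ? new LegacyAssignStrategy()
        : new PlacementPluginAssignStrategy(plugin);
  }
}
```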
[GitHub] [lucene-solr] thelabdude opened a new pull request #2114: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead ~ Backport to 8x
thelabdude opened a new pull request #2114: URL: https://github.com/apache/lucene-solr/pull/2114 # Description This is the backport to 8x from master, original PR was #2010 See JIRA for description of the issue: https://issues.apache.org/jira/browse/SOLR-12182 # Solution This PR computes the `base_url` for a Replica using the stored `node_name` and a global `urlScheme` rather than storing the `base_url` in `state.json`. This avoids storing an incorrect URL scheme for replicas in persistent storage. The `base_url` is computed when read back from ZK and dropped when marshaling the Replica state to JSON. This also means we don't need a migration tool as stored state is "healed" on-the-fly when read back from ZK. The unfortunate aspect of this PR is we need to keep the URL scheme for the cluster in a global variable (so that it is available when reading from ZK). The global `urlScheme` still comes from the cluster property but is then stored in a global singleton, see: `org.apache.solr.common.cloud.UrlScheme`. Alternatively, we could just keep the `urlScheme` in a static in ZkStateReader, I felt the global singleton `UrlScheme.INSTANCE` made it clearer that this was a global thing but it also made more sense with my first implementation that tried to make rolling restart upgrades to TLS less chaotic. It's a trivial change to move all this over to ZkStateReader and remove UrlScheme. I initially tried setting a `ThreadLocal` that gives access to the `urlScheme` whenever we need to read these props from ZK. However, that ended up being problematic because we tend to read ZkNodeProps from ZK in many places. In reality, the `urlScheme` really is an immutable global variable that should be set once during initialization by reading from the cluster property stored in ZK. So I felt trying to treat this global as something that was highly dynamic made the code overly cumbersome. Put simply, we shouldn't support `urlScheme` changing in a live node after initialization, it's bad for business. I also tried to get rid of the `urlScheme` cluster property (re: https://issues.apache.org/jira/browse/SOLR-10202) but I'm not sure how SolrCloud client applications can resolve the correct `urlScheme` for the cluster without this property? On the server-side, sure we can just get the `urlScheme` from a Java System Property, but that won't be set for remote client applications that initialize via a connection to ZooKeeper. So I'm keeping the cluster property `urlScheme` for now. We also need to consider how to enable TLS on an existing cluster (with active collections) using a rolling restart process. The current `org.apache.solr.cloud.SSLMigrationTest` just stopped all test nodes at once and then brought them back with TLS enabled. Based on feedback, I've since removed the option to pull the active urlScheme from live nodes as we're not able to ensure zero-downtime when moving from `http` -> `https` for clusters with existing collections and live traffic. Put simply, the feature was a bit trappy in that it tried to reduce chaos when doing a rolling restart to enable TLS, but it made no guarantees. Thus, users just need to be sure to enable TLS before building production clusters! Lastly, I've tried to clean-up some of the places that access the baseUrl on replicas to be more consistent, so you'll see some of that in this PR as well. # Tests Many existing tests cover regression caused by these code changes. Added simple unit test for UrlScheme. 
# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
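A rough sketch of the derivation the PR describes: computing `base_url` from the stored `node_name` plus the global `urlScheme` when state is read back from ZK. The helper below is illustrative only (Solr node names look like `127.0.0.1:8983_solr`; the real code also URL-decodes the context suffix):

```java
class BaseUrlSketch {
  // e.g. baseUrlForNodeName("127.0.0.1:8983_solr", "https")
  //   -> "https://127.0.0.1:8983/solr"
  static String baseUrlForNodeName(String nodeName, String urlScheme) {
    int idx = nodeName.indexOf('_');
    String hostAndPort = idx < 0 ? nodeName : nodeName.substring(0, idx);
    String context = idx < 0 ? "" : nodeName.substring(idx + 1);
    return urlScheme + "://" + hostAndPort + (context.isEmpty() ? "" : "/" + context);
  }
}
```

Because the URL is recomputed on read, stale schemes persisted in `state.json` are effectively "healed" on the fly, which is why no migration tool is needed.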
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return different answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242546#comment-17242546 ] ASF subversion and git services commented on SOLR-14934: Commit 2e6a02394ec4eea6ba72d5bc2bf02c0139a54f39 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2e6a023 ] SOLR-14934: Refactored duplicate "Solr Home" logic into a single place to eliminate risk of tests using divergent values for a single solr node.
> Multiple Code Paths for determining "solr home" can return different answers
>
> Key: SOLR-14934
> URL: https://issues.apache.org/jira/browse/SOLR-14934
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Minor
> Attachments: SOLR-14934.poc.patch
>
> While looking into some possible ways to make our tests more closely match "real" solr installs, I realized that we currently have 2 different methods for determining the "solr home" for a node...
> * {{SolrPaths.locateSolrHome()}}
> ** static method that uses a heuristic that typically results in using {{System.getProperty("solr.solr.home");}}
> *** NOTE: the result is not stored in any static/final variables
> ** this method
> * {{SolrDispatchFilter}}
> ** starts by checking if an explicit {{ServletContext}} attribute is specified
> *** falls back to using {{SolrPaths.locateSolrHome()}}
> ** whatever value is found gets set on {{CoreContainer}}
> In a typical Solr install, the {{"solr.solr.home"}} system property is set by {{bin/solr}} and we get a consistent value for the life of the server instance regardless of code path.
> In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests (including {{MiniSolrCloudCluster}} based tests) we rely on the {{ServletContext}} attribute based approach to have a unique "Solr Home" for each node. ({{JettySolrRunner}} injects the value when wiring up the {{Server}} instance)
> This means that:
> * in jetty based tests - even if it's a single jetty instance - each of the node's CoreContainers has a unique value of "solr home", but any code paths in solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent value across all nodes (different from the value in the CoreContainer for any node)
> * although I don't think it happens now: a test could call {{System.setProperty("solr.solr.home",...)}} while a node is running, and potentially get inconsistent behavior from even a jetty node over time.
>
> In practice, I don't think that any of this is currently causing "real bugs" in actual solr code; nor do I _think_ we're seeing any "false positives" or "false failures" in tests as a result of this - but it is a huge land mine just waiting to go off if we step too close, and I think we should rectify this.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
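An illustrative sketch (stand-in code, not the actual Solr classes) of the hazard the commit above removes: two lookup paths that can silently disagree once a test mutates the system property after a node is wired up.
{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;

public class SolrHomeSketch {
  // Path 1: recomputed on every call, so it tracks the current system property.
  static Path locateSolrHome() {
    return Paths.get(System.getProperty("solr.solr.home", "/var/solr"));
  }

  // Path 2: a value injected once at wiring time (what the ServletContext
  // attribute effectively is for a jetty-based test node).
  static final Path INJECTED_HOME = Paths.get("/node1/solr");

  public static void main(String[] args) {
    System.setProperty("solr.solr.home", "/somewhere/else");
    // The two answers now diverge -- exactly the inconsistency that computing
    // "solr home" in a single place eliminates.
    System.out.println(locateSolrHome()); // /somewhere/else
    System.out.println(INJECTED_HOME);    // /node1/solr
  }
}
{code}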
[GitHub] [lucene-solr] thelabdude commented on pull request #2114: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead ~ Backport to 8x
thelabdude commented on pull request #2114: URL: https://github.com/apache/lucene-solr/pull/2114#issuecomment-737397592 precommit and solr tests pass locally so will merge and watch for CI failures This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242579#comment-17242579 ] ASF subversion and git services commented on SOLR-12182: Commit 6af56e141a52f4e616985ca5b03dda3677889bfa in lucene-solr's branch refs/heads/branch_8x from Timothy Potter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6af56e1 ] SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead ~ Backport to 8x (#2114) > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9628) Make sure to account for ScoreMode.TOP_DOCS in queries
[ https://issues.apache.org/jira/browse/LUCENE-9628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242585#comment-17242585 ] Adrien Grand commented on LUCENE-9628: --
bq. In BooleanWeight#bulkScorer, we check if score mode is TOP_SCORES and if so, force non-bulk scoring. Should we expand this to include modes like TOP_DOCS?
I think so. DefaultBulkScorer seems to be the only bulk scorer which knows how to deal with {{LeafCollector#competitiveIterator}}, so we seem to be disabling the numeric sort optimization with boolean queries today? Let's switch to ScoreMode#isExhaustive? What do you think [~mayyas]?
bq. In ConstantScoreQuery, we create the delegate weight with a hardcoded COMPLETE_NO_SCORES. I'm not sure it actually causes problems, but it seems like this doesn't handle TOP_DOCS correctly.
I suspect that this could be a problem if the wrapped query uses the ScoreMode as an indication of whether it will need to handle {{LeafCollector#competitiveIterator}} or not, which seems to be something we'd like to do for boolean queries since BS1 (BooleanScorer) only really makes sense if we know we're going to collect all matches.
I think it'd be helpful if we improved ScoreMode javadocs to be more explicit regarding the expectations we have on scorers. TOP_SCORES mentions the relationship with {{Scorer#setMinCompetitiveScore}}; we should add something similar to TOP_DOCS and TOP_DOCS_WITH_SCORES regarding bulk scorers and {{LeafCollector#competitiveIterator}}?
> Make sure to account for ScoreMode.TOP_DOCS in queries
> --
>
> Key: LUCENE-9628
> URL: https://issues.apache.org/jira/browse/LUCENE-9628
> Project: Lucene - Core
> Issue Type: Test
> Components: core/search
> Reporter: Julie Tibshirani
> Priority: Minor
>
> I noticed a few places where we directly check the {{ScoreMode}} type that should perhaps be generalized. These could affect whether numeric sort optimization is applied:
> * In {{BooleanWeight#bulkScorer}}, we check if score mode is {{TOP_SCORES}} and if so, force non-bulk scoring. Should we expand this to include modes like {{TOP_DOCS}}?
> * In {{ConstantScoreQuery}}, we create the delegate weight with a hardcoded {{COMPLETE_NO_SCORES}}. I'm not sure it actually causes problems, but it seems like this doesn't handle {{TOP_DOCS}} correctly.
> Apologies this issue isn’t more precise – I am not up-to-speed on the numeric sort optimization but wanted to raise these in case they’re helpful.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
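A runnable sketch (stand-in enum, not Lucene's {{ScoreMode}}) of the generalization being discussed: keying the bulk-scorer decision off whether the mode promises to visit every hit, rather than special-casing TOP_SCORES:
{code:java}
public class ScoreModeSketch {
  enum ScoreMode {
    COMPLETE(true), COMPLETE_NO_SCORES(true),
    TOP_SCORES(false), TOP_DOCS(false), TOP_DOCS_WITH_SCORES(false);
    final boolean exhaustive;
    ScoreMode(boolean exhaustive) { this.exhaustive = exhaustive; }
    boolean isExhaustive() { return exhaustive; }
  }

  static String pickScorer(ScoreMode mode) {
    // Non-exhaustive modes may supply LeafCollector#competitiveIterator,
    // which only the default scorer-at-a-time bulk scorer honors, so
    // BS1-style bulk scoring is only safe for exhaustive modes.
    return mode.isExhaustive() ? "bulk (BS1)" : "default scorer-at-a-time";
  }

  public static void main(String[] args) {
    for (ScoreMode m : ScoreMode.values()) {
      System.out.println(m + " -> " + pickScorer(m));
    }
  }
}
{code}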
[jira] [Created] (SOLR-15024) Admin UI doesn't show CharFilters correctly
Erick Erickson created SOLR-15024: - Summary: Admin UI doesn't show CharFilters correctly Key: SOLR-15024 URL: https://issues.apache.org/jira/browse/SOLR-15024 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Admin UI Affects Versions: master (9.0) Reporter: Erick Erickson Attachments: Screen Shot 2020-12-02 at 1.19.23 PM.png, Screen Shot 2020-12-02 at 1.19.49 PM.png
Brought up on the users' list; I verified it on trunk. The Admin UI isn't showing the data correctly for either the schema page or the analysis page. Here's the fieldType definition:
{code:java}
{code}
The transformations are correct; it's just that the display is messed up. See attached. On the analysis page, nothing is shown for the CharFilters. For the schema page, only the _last_ CharFilter is shown.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-12182: -- Fix Version/s: 8.8 > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter resolved SOLR-12182. --- Resolution: Fixed As of 8.8, we opted not to store the `base_url` in persisted state in ZK. > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-10202) Auto resolve urlScheme, remove cluster property
[ https://issues.apache.org/jira/browse/SOLR-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter reassigned SOLR-10202: - Assignee: (was: Timothy Potter)
> Auto resolve urlScheme, remove cluster property
> ---
>
> Key: SOLR-10202
> URL: https://issues.apache.org/jira/browse/SOLR-10202
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Jan Høydahl
> Priority: Major
>
> Spinoff from SOLR-9640.
> Today we need to explicitly set the {{urlScheme}} cluster property to enable SSL, at the same time as we need to set all the SSL env variables on each node. As discussed in SOLR-9640, we could be smarter about this so an admin only needs to set up {{solr.in.sh}} with a keystore to enable SSL.
> h3. How
> Perhaps simplified a bit, but in principle, at node start, if {{solr.jetty.keystore}} (one out of several possibilities) is defined then use https, else http :-) Then, if the administrator has mixed it up and failed to configure {{solr.jetty.keystore}} on one of the nodes, then that node will not be able to communicate with the others over {{http}}; it will get {{curl: (52) Empty reply from server}}. Conversely, an SSL-enabled node trying to talk over {{https}} to a Solr node that is not SSL enabled will get {{curl: (35) Unknown SSL protocol error in connection to localhost:-9847}} (not the curl error of course, but similar).
> I don't think the nodes need to tell ZK about SSL at all?
> So my claim is that this will not give a bigger risk of misconfiguration, because if you add a new node to the cluster without SSL, it will generate a lot of BUZZ in the logs and it will never receive any unencrypted data from the other nodes since connections will fail. Agree?
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10202) Auto resolve urlScheme, remove cluster property
[ https://issues.apache.org/jira/browse/SOLR-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242596#comment-17242596 ] Timothy Potter commented on SOLR-10202: --- Un-assigned myself on this one as I originally wanted to tackle it as part of SOLR-12182, but that ended up not working out. Personally, I'm fine with the first server to come up in https mode setting the cluster property based on a system property, but it seems like that isn't the consensus on how this global should be treated.
> Auto resolve urlScheme, remove cluster property
> ---
>
> Key: SOLR-10202
> URL: https://issues.apache.org/jira/browse/SOLR-10202
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Jan Høydahl
> Priority: Major
>
> Spinoff from SOLR-9640.
> Today we need to explicitly set the {{urlScheme}} cluster property to enable SSL, at the same time as we need to set all the SSL env variables on each node. As discussed in SOLR-9640, we could be smarter about this so an admin only needs to set up {{solr.in.sh}} with a keystore to enable SSL.
> h3. How
> Perhaps simplified a bit, but in principle, at node start, if {{solr.jetty.keystore}} (one out of several possibilities) is defined then use https, else http :-) Then, if the administrator has mixed it up and failed to configure {{solr.jetty.keystore}} on one of the nodes, then that node will not be able to communicate with the others over {{http}}; it will get {{curl: (52) Empty reply from server}}. Conversely, an SSL-enabled node trying to talk over {{https}} to a Solr node that is not SSL enabled will get {{curl: (35) Unknown SSL protocol error in connection to localhost:-9847}} (not the curl error of course, but similar).
> I don't think the nodes need to tell ZK about SSL at all?
> So my claim is that this will not give a bigger risk of misconfiguration, because if you add a new node to the cluster without SSL, it will generate a lot of BUZZ in the logs and it will never receive any unencrypted data from the other nodes since connections will fail. Agree?
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
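A runnable sketch of the heuristic Jan proposes: derive the scheme from whether a keystore is configured instead of from a ZK cluster property. Only the {{solr.jetty.keystore}} property name comes from the issue; the rest is illustrative:
{code:java}
public class UrlSchemeSketch {
  static String resolveUrlScheme() {
    // If a keystore is configured, assume the node should serve https.
    String keystore = System.getProperty("solr.jetty.keystore");
    return (keystore == null || keystore.isEmpty()) ? "http" : "https";
  }

  public static void main(String[] args) {
    System.out.println(resolveUrlScheme()); // "http" unless a keystore is set
    System.setProperty("solr.jetty.keystore", "/etc/solr/keystore.p12"); // hypothetical path
    System.out.println(resolveUrlScheme()); // "https"
  }
}
{code}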
[jira] [Commented] (LUCENE-9619) Move Points from a visitor API to a cursor-style API?
[ https://issues.apache.org/jira/browse/LUCENE-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242609#comment-17242609 ] Adrien Grand commented on LUCENE-9619: -- Thanks for the feedback [~ivera]! I initially wanted to remove the visitor pattern entirely, but this made it challenging to retain some optimizations we have, like doing only one comparison in case multiple documents share the same value: https://github.com/apache/lucene-solr/blob/af47cb7bcdd4eb10263a0586474c6e255307/lucene/core/src/java/org/apache/lucene/index/PointValues.java#L219-L224. As far as implementing a DocIdSetIterator is concerned, my thinking was that this API could be used to fill an int[] buffer only one leaf at a time, so we wouldn't allocate more than an int[512] with the current Points file format. This wouldn't provide skipping capabilities, but at least we wouldn't need to maintain a giant int[] or BitSet.
> Move Points from a visitor API to a cursor-style API?
> -
>
> Key: LUCENE-9619
> URL: https://issues.apache.org/jira/browse/LUCENE-9619
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> Points' visitor API works well, but there are a couple of things we could make better if we moved to a cursor API, e.g.
> - Term queries could return a DocIdSetIterator without having to materialize a BitSet.
> - Nearest-neighbor search could work on top of the regular API instead of casting to BKDReader https://github.com/apache/lucene-solr/blob/6a7131ee246d700c2436a85ddc537575de2aeacf/lucene/sandbox/src/java/org/apache/lucene/sandbox/document/FloatPointNearestNeighbor.java#L296
> - We could optimize counting the number of matches of a query by adding the number of points in a leaf without visiting documents, when there are no deleted documents and a leaf fully matches the query.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
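An illustrative sketch of the cursor idea (an invented interface, not a Lucene API): the caller drains matching doc IDs one leaf block at a time into a fixed buffer, instead of materializing a BitSet over the whole segment:
{code:java}
// Hypothetical cursor over matching points; names are assumptions.
interface PointCursor {
  /** Fills {@code buffer} with up to {@code buffer.length} matching doc IDs;
   *  returns the count, or -1 when the cursor is exhausted. */
  int nextBatch(int[] buffer);
}

class CursorConsumer {
  static long countMatches(PointCursor cursor) {
    int[] buffer = new int[512]; // matches the leaf size mentioned above
    long total = 0;
    for (int n = cursor.nextBatch(buffer); n != -1; n = cursor.nextBatch(buffer)) {
      total += n; // a real consumer would hand buffer[0..n) to a collector
    }
    return total;
  }
}
{code}
As the comment notes, this trades skipping capabilities for a bounded memory footprint per leaf.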
[jira] [Created] (SOLR-15025) MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value
Mike Drob created SOLR-15025: Summary: MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value Key: SOLR-15025 URL: https://issues.apache.org/jira/browse/SOLR-15025 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Tests Reporter: Mike Drob the api could also expand to take a time unit? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14940) ReplicationHandler memory leak through SolrCore.closeHooks
[ https://issues.apache.org/jira/browse/SOLR-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved SOLR-14940. -- Resolution: Fixed re-resolving in favor of tackling it in SOLR-14992
> ReplicationHandler memory leak through SolrCore.closeHooks
> --
>
> Key: SOLR-14940
> URL: https://issues.apache.org/jira/browse/SOLR-14940
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: replication (java)
> Environment: Solr Cloud Cluster on v.8.6.2 configured as 3 TLOG nodes with 2 cores in each JVM.
> Reporter: Anver Sotnikov
> Assignee: Mike Drob
> Priority: Major
> Fix For: 8.8, master (9.0)
> Attachments: Actual references to hooks that in turn hold references to ReplicationHandlers.png, Memory Analyzer SolrCore.closeHooks .png
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> We are experiencing a memory leak in a Solr Cloud cluster configured as 3 TLOG nodes. The leader does not seem to be affected, while followers are.
> Looking at a memory dump, we noticed that SolrCore holds lots of references to ReplicationHandler through anonymous inner classes in SolrCore.closeHooks, which in turn hold ReplicationHandlers.
> ReplicationHandler registers hooks as anonymous inner classes in SolrCore.closeHooks through ReplicationHandler.inform() -> ReplicationHandler.registerCloseHook().
> Whenever ZkController.stopReplicationFromLeader is called, it shuts down the ReplicationHandler (ReplicationHandler.shutdown()), BUT the reference to the ReplicationHandler stays in SolrCore.closeHooks. Once replication is started again on the same SolrCore, a new ReplicationHandler will be created and registered in closeHooks.
> It looks like there are a few scenarios where replication is stopped and restarted on the same core, and in our TLOG setup this shows up quite often.
> Potential solutions:
> # Allow unregistering SolrCore.closeHooks so it can be used from ReplicationHandler.shutdown
> # A hack, but easier: break the link between ReplicationHandler close hooks and the full ReplicationHandler object so the ReplicationHandler can be GCed even when hooks are still registered in SolrCore.closeHooks
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
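A self-contained sketch of the leak pattern from the report, with stand-in class names rather than Solr's real types: a hook captures its enclosing handler, and the hook list only ever grows.
{code:java}
import java.util.ArrayList;
import java.util.List;

class CoreSketch {
  final List<Runnable> closeHooks = new ArrayList<>(); // grows, never pruned

  void addCloseHook(Runnable hook) { closeHooks.add(hook); }
}

class HandlerSketch {
  HandlerSketch(CoreSketch core) {
    // The method reference captures 'this', so even after shutdown() the
    // whole handler stays reachable from core.closeHooks -- the leak above.
    core.addCloseHook(this::shutdown);
  }

  void shutdown() { /* release resources */ }
}
{code}
The two proposed fixes map onto this sketch directly: either make closeHooks removable so shutdown() can unregister the hook, or have the hook hold only what it needs (e.g. via a static nested class or a weak reference) instead of the whole handler.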
[GitHub] [lucene-solr] madrob opened a new pull request #2115: SOLR-14992 Wait for node down before checking for node up
madrob opened a new pull request #2115: URL: https://github.com/apache/lucene-solr/pull/2115 https://issues.apache.org/jira/browse/SOLR-14992 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-14992) TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures
[ https://issues.apache.org/jira/browse/SOLR-14992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob reassigned SOLR-14992: Assignee: Mike Drob > TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures > -- > > Key: SOLR-14992 > URL: https://issues.apache.org/jira/browse/SOLR-14992 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Mike Drob >Priority: Minor > > I've noticed this test started failing very frequently with an error like: > {noformat} > Error Message: > Error from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. This requires 4 shards to be created > (higher than the allowed number) > Stack Trace: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. This requires 4 shards to be created > (higher than the allowed number) > at > __randomizedtesting.SeedInfo.seed([3D670DC4BEABD958:3550EB0C6505ADD6]:0) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) > at > org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369) > at > org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1173) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231) > at > org.apache.solr.cloud.TestPullReplicaErrorHandling.testCantConnectToPullReplica(TestPullReplicaErrorHandling.java:149) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > at > 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakC
[jira] [Commented] (SOLR-14992) TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures
[ https://issues.apache.org/jira/browse/SOLR-14992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242626#comment-17242626 ] Mike Drob commented on SOLR-14992: -- Ok, I think I figured this out, the PR I opened is for master but patch should apply to 8x as well. I beasted it and was able to get failures from 5% to 0% on my machine. > TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures > -- > > Key: SOLR-14992 > URL: https://issues.apache.org/jira/browse/SOLR-14992 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Mike Drob >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > I've noticed this test started failing very frequently with an error like: > {noformat} > Error Message: > Error from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. This requires 4 shards to be created > (higher than the allowed number) > Stack Trace: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. 
This requires 4 shards to be created > (higher than the allowed number) > at > __randomizedtesting.SeedInfo.seed([3D670DC4BEABD958:3550EB0C6505ADD6]:0) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) > at > org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369) > at > org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1173) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231) > at > org.apache.solr.cloud.TestPullReplicaErrorHandling.testCantConnectToPullReplica(TestPullReplicaErrorHandling.java:149) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > com.carrotsearch.ra
[jira] [Created] (SOLR-15026) MiniSolrCloudCluster can inconsistently get confused about when it's using SSL
Chris M. Hostetter created SOLR-15026: - Summary: MiniSolrCloudCluster can inconsistently get confused about when it's using SSL Key: SOLR-15026 URL: https://issues.apache.org/jira/browse/SOLR-15026 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Chris M. Hostetter
A new test added in SOLR-14934 caused the following reproducible failure to pop up on jenkins...
{noformat}
hossman@slate:~/lucene/dev [j11] [master] $ ./gradlew -p solr/test-framework/ test --tests MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders -Dtests.seed=806A85748BD81F48 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=ln-CG -Dtests.timezone=Asia/Thimbu -Dtests.asserts=true -Dtests.file.encoding=UTF-8
Starting a Gradle Daemon (subsequent builds will be faster)

> Task :randomizationInfo
Running tests with randomization seed: tests.seed=806A85748BD81F48

> Task :solr:test-framework:test

org.apache.solr.cloud.MiniSolrCloudClusterTest > testSolrHomeAndResourceLoaders FAILED
    org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at: https://127.0.0.1:38681/solr
        at __randomizedtesting.SeedInfo.seed([806A85748BD81F48:37548FA7602CB5FD]:0)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:712)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:269)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
        at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:390)
        at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:360)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1168)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:931)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:865)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:229)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:246)
        at org.apache.solr.cloud.MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders(MiniSolrCloudClusterTest.java:125)
        ...
    Caused by:
        javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
            at java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:439)
{noformat}
The problem seems to be that even though the MiniSolrCloudCluster being instantiated isn't _intentionally_ using any SSL randomization (it just uses {{JettyConfig.builder().build()}}), the CloudSolrClient returned by {{cluster.getSolrClient()}} is evidently picking up the randomized SSL and trying to use it to talk to the cluster.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return different answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242645#comment-17242645 ] ASF subversion and git services commented on SOLR-14934: Commit 8732df8c505eec9109cd8a7bdd553e908447af5f in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8732df8 ] SOLR-14934: test workaround for SOLR-15026
> Multiple Code Paths for determining "solr home" can return different answers
>
> Key: SOLR-14934
> URL: https://issues.apache.org/jira/browse/SOLR-14934
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Minor
> Attachments: SOLR-14934.poc.patch
>
> While looking into some possible ways to make our tests more closely match "real" solr installs, I realized that we currently have 2 different methods for determining the "solr home" for a node...
> * {{SolrPaths.locateSolrHome()}}
> ** static method that uses a heuristic that typically results in using {{System.getProperty("solr.solr.home");}}
> *** NOTE: the result is not stored in any static/final variables
> ** this method
> * {{SolrDispatchFilter}}
> ** starts by checking if an explicit {{ServletContext}} attribute is specified
> *** falls back to using {{SolrPaths.locateSolrHome()}}
> ** whatever value is found gets set on {{CoreContainer}}
> In a typical Solr install, the {{"solr.solr.home"}} system property is set by {{bin/solr}} and we get a consistent value for the life of the server instance regardless of code path.
> In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests (including {{MiniSolrCloudCluster}} based tests) we rely on the {{ServletContext}} attribute based approach to have a unique "Solr Home" for each node. ({{JettySolrRunner}} injects the value when wiring up the {{Server}} instance)
> This means that:
> * in jetty based tests - even if it's a single jetty instance - each of the node's CoreContainers has a unique value of "solr home", but any code paths in solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent value across all nodes (different from the value in the CoreContainer for any node)
> * although I don't think it happens now: a test could call {{System.setProperty("solr.solr.home",...)}} while a node is running, and potentially get inconsistent behavior from even a jetty node over time.
>
> In practice, I don't think that any of this is currently causing "real bugs" in actual solr code; nor do I _think_ we're seeing any "false positives" or "false failures" in tests as a result of this - but it is a huge land mine just waiting to go off if we step too close, and I think we should rectify this.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15026) MiniSolrCloudCluster can inconsistently get confused about when it's using SSL
[ https://issues.apache.org/jira/browse/SOLR-15026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242646#comment-17242646 ] ASF subversion and git services commented on SOLR-15026: Commit 8732df8c505eec9109cd8a7bdd553e908447af5f in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8732df8 ] SOLR-14934: test workaround for SOLR-15026
> MiniSolrCloudCluster can inconsistently get confused about when it's using SSL
> --
>
> Key: SOLR-15026
> URL: https://issues.apache.org/jira/browse/SOLR-15026
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Priority: Major
>
> A new test added in SOLR-14934 caused the following reproducible failure to pop up on jenkins...
> {noformat}
> hossman@slate:~/lucene/dev [j11] [master] $ ./gradlew -p solr/test-framework/ test --tests MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders -Dtests.seed=806A85748BD81F48 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=ln-CG -Dtests.timezone=Asia/Thimbu -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> Starting a Gradle Daemon (subsequent builds will be faster)
>
> > Task :randomizationInfo
> Running tests with randomization seed: tests.seed=806A85748BD81F48
>
> > Task :solr:test-framework:test
>
> org.apache.solr.cloud.MiniSolrCloudClusterTest > testSolrHomeAndResourceLoaders FAILED
>     org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at: https://127.0.0.1:38681/solr
>         at __randomizedtesting.SeedInfo.seed([806A85748BD81F48:37548FA7602CB5FD]:0)
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:712)
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:269)
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
>         at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:390)
>         at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:360)
>         at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1168)
>         at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:931)
>         at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:865)
>         at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:229)
>         at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:246)
>         at org.apache.solr.cloud.MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders(MiniSolrCloudClusterTest.java:125)
>         ...
>     Caused by:
>         javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>             at java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:439)
> {noformat}
> The problem seems to be that even though the MiniSolrCloudCluster being instantiated isn't _intentionally_ using any SSL randomization (it just uses {{JettyConfig.builder().build()}}), the CloudSolrClient returned by {{cluster.getSolrClient()}} is evidently picking up the randomized SSL and trying to use it to talk to the cluster.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15027) TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField reproducing failure on branch_8x
Chris M. Hostetter created SOLR-15027: - Summary: TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField reproducing failure on branch_8x Key: SOLR-15027 URL: https://issues.apache.org/jira/browse/SOLR-15027 Project: Solr Issue Type: Test Security Level: Public (Default Security Level. Issues are Public) Reporter: Chris M. Hostetter {noformat} [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestInPlaceUpdateWithRouteField -Dtests.method=testUpdatingDocValuesWithRouteField -Dtests.seed=80F75127980BAE95 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=es-VE -Dtests.timezone=Asia/Sakhalin -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 1.90s | TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField <<< [junit4]> Throwable #1: java.lang.AssertionError: Lucene doc id should not be changed for In-Place Updates. [junit4]> Expected: is <21> [junit4]> but: was <30> [junit4]>at __randomizedtesting.SeedInfo.seed([80F75127980BAE95:77AF61938946C1E4]:0) [junit4]>at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) [junit4]>at org.apache.solr.update.TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField(TestInPlaceUpdateWithRouteField.java:115) [junit4]>at java.lang.Thread.run(Thread.java:748) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15025) MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value
[ https://issues.apache.org/jira/browse/SOLR-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob updated SOLR-15025: - Labels: beginner newdev (was: beginner) > MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value > - > > Key: SOLR-15025 > URL: https://issues.apache.org/jira/browse/SOLR-15025 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Priority: Major > Labels: beginner, newdev > > the api could also expand to take a time unit? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2114: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead ~ Backport to 8x
thelabdude merged pull request #2114: URL: https://github.com/apache/lucene-solr/pull/2114 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] tflobbe commented on a change in pull request #2115: SOLR-14992 Wait for node down before checking for node up
tflobbe commented on a change in pull request #2115: URL: https://github.com/apache/lucene-solr/pull/2115#discussion_r534445545 ## File path: solr/core/src/test/org/apache/solr/cloud/TestPullReplicaErrorHandling.java ## @@ -236,8 +237,9 @@ public void testCloseHooksDeletedOnReconnect() throws Exception { JettySolrRunner jetty = getJettyForReplica(s.getReplicas(EnumSet.of(Replica.Type.PULL)).get(0)); SolrCore core = jetty.getCoreContainer().getCores().iterator().next(); -for (int i = 0; i < 5; i++) { +for (int i = 0; i < (TEST_NIGHTLY ? 5 : 2); i++) { cluster.expireZkSession(jetty); + waitForState("Expecting node to be disconnected", collectionName, activeReplicaCount(1, 0, 0)); Review comment: Wouldn't it be better to actually check for the node znode `ctime`? Or maybe a bump in `/live_nodes`'s `cversion`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
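For reference, a rough sketch of the alternative check being suggested here: read the `cversion` of `/live_nodes`, which ZooKeeper increments whenever a child znode is added or removed. The `SolrZkClient` call below is from memory, so treat the exact signature as an assumption.

```java
import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.zookeeper.data.Stat;

// cversion increments on every child add/remove under a znode, so a bump on
// /live_nodes is direct evidence that a node actually dropped out or rejoined.
int liveNodesCversion(SolrZkClient zkClient) throws Exception {
  Stat stat = new Stat();
  zkClient.getData("/live_nodes", null, stat, true);
  return stat.getCversion();
}

// Usage sketch: capture the value, expire the session, then poll until it bumps.
//   int before = liveNodesCversion(zkClient);
//   cluster.expireZkSession(jetty);
//   // ...poll liveNodesCversion(zkClient) until it exceeds 'before'...
```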
[jira] [Commented] (SOLR-15027) TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField reproducing failure on branch_8x
[ https://issues.apache.org/jira/browse/SOLR-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242700#comment-17242700 ] Chris M. Hostetter commented on SOLR-15027: --- I don't really understand why/how but git bisect has identified SOLR-14641 / d52628d9facfc13d8c29a7ecaf646a3b90263f8c as the cause of this failure. [~caomanhdat] / [~mkhl] - any ideas what's going on here? > TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField > reproducing failure on branch_8x > > > Key: SOLR-15027 > URL: https://issues.apache.org/jira/browse/SOLR-15027 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Priority: Major > > {noformat} >[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestInPlaceUpdateWithRouteField > -Dtests.method=testUpdatingDocValuesWithRouteField > -Dtests.seed=80F75127980BAE95 -Dtests.nightly=true -Dtests.slow=true > -Dtests.badapples=true -Dtests.locale=es-VE -Dtests.timezone=Asia/Sakhalin > -Dtests.asserts=true -Dtests.file.encoding=UTF-8 >[junit4] FAILURE 1.90s | > TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField <<< >[junit4]> Throwable #1: java.lang.AssertionError: Lucene doc id should > not be changed for In-Place Updates. >[junit4]> Expected: is <21> >[junit4]> but: was <30> >[junit4]> at > __randomizedtesting.SeedInfo.seed([80F75127980BAE95:77AF61938946C1E4]:0) >[junit4]> at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) >[junit4]> at > org.apache.solr.update.TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField(TestInPlaceUpdateWithRouteField.java:115) >[junit4]> at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return differnet answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242702#comment-17242702 ] ASF subversion and git services commented on SOLR-14934: Commit 05a8477a362beb6b0e5a02b6ee4dfa106a2e6a76 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=05a8477 ] SOLR-14934: Fix some additional test helper methods that aren't used on master but triggered problems when when backporting to branch_8x > Multiple Code Paths for determining "solr home" can return differnet answers > > > Key: SOLR-14934 > URL: https://issues.apache.org/jira/browse/SOLR-14934 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Minor > Attachments: SOLR-14934.poc.patch > > > While looking into some possible ways to make our tests more closely match > "real" solr installs, I realized that we currently have 2 different methods > for determining the "solr home" for a node... > * {{SolrPaths.locateSolrHome()}} > ** static method that uses a hueristic that typically results in using > {{System.getProperty("solr.solr.home");}} > *** NOTE: the result is not stored in any static/final variables > ** this method > * {{SolrDispatchFilter}} > ** starts by checking if an explicit {{ServletContext}} attribute is > specified > *** falls back to using {{SolrPaths.locateSolrHome()}} > ** whatever value is found gets set on {{CoreContainer}} > In a typical Solr install, the {{"solr.solr.home"}} system property is set by > {{bin/solr}} and we get a consistent value for the life of the server > instance regardless of code path. > In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that > calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests > (including {{MiniSolrCloudCluster}} based tests) we rely on the > {{ServletContext}} attribute based approach to have a unique "Solr Home" for > each node. ({{JettySOlrRunner}} injects the value when wiring up the > {{Server}} instance) > This means that: > * in jetty based test - even if it's a single jetty instance - each of the > node's CoreContainer has a unique value of "solr home", but any code paths in > solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent > value across all nodes (different from the value in the CoreContainer for any > node) > * allthough i don't think it happens now: a test could call > {{System.setProperty("solr.solr.home",...)}} while a node is running, and > potentially get inconsistent behavior from even a jetty node over time. > > In practice, I don't think that any of this is currently causing "real bugs" > in actual solr code; nor do i _think_ we're seeing any "false positives" or > "false failures" in tests as a result of this - but it is a big huge land > mine just waiting to go off if we step too close, and i think we should > recitfy this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley merged pull request #2105: Remove obsolete dev-tools scripts
dsmiley merged pull request #2105: URL: https://github.com/apache/lucene-solr/pull/2105 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #2115: SOLR-14992 Wait for node down before checking for node up
madrob commented on a change in pull request #2115: URL: https://github.com/apache/lucene-solr/pull/2115#discussion_r534482769 ## File path: solr/core/src/test/org/apache/solr/cloud/TestPullReplicaErrorHandling.java ## @@ -236,8 +237,9 @@ public void testCloseHooksDeletedOnReconnect() throws Exception { JettySolrRunner jetty = getJettyForReplica(s.getReplicas(EnumSet.of(Replica.Type.PULL)).get(0)); SolrCore core = jetty.getCoreContainer().getCores().iterator().next(); -for (int i = 0; i < 5; i++) { +for (int i = 0; i < (TEST_NIGHTLY ? 5 : 2); i++) { cluster.expireZkSession(jetty); + waitForState("Expecting node to be disconnected", collectionName, activeReplicaCount(1, 0, 0)); Review comment: There is a window where the live node has gone away but the state is still active because it hasn't updated yet. If we're just waiting for and watching live nodes, then we can see that go away and complete the test before the cluster has quiesced. This is also how we check in testPullReplicaDisconnectsFromZooKeeper, so for consistency this felt better. There is still a different race here, where the replica could go down and come back up before we start waiting for it to be down the first time (we're expecting the overseer to be slow); I'm sure @markrmiller would be upset with me over that, but we can deal with it when he finishes the rest of his speed-up branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
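A sketch of the down-then-up ordering discussed above, using `ZkStateReader.waitForLiveNodes`; the predicate-based signature is from memory, so treat it as an assumption:

```java
import java.util.concurrent.TimeUnit;

import org.apache.solr.common.cloud.ZkStateReader;

// Wait for the node to leave /live_nodes before waiting for it to return;
// waiting only for "up" can race with a node that bounces quickly.
void waitForNodeRestart(ZkStateReader reader, String nodeName) throws Exception {
  reader.waitForLiveNodes(30, TimeUnit.SECONDS,
      (oldNodes, newNodes) -> !newNodes.contains(nodeName)); // node is down
  reader.waitForLiveNodes(30, TimeUnit.SECONDS,
      (oldNodes, newNodes) -> newNodes.contains(nodeName));  // node is back
}
```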
[GitHub] [lucene-solr] janhoy closed pull request #2102: SOLR-14977: Fix typo in solr-upgrade-notes.adoc
janhoy closed pull request #2102: URL: https://github.com/apache/lucene-solr/pull/2102 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return differnet answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242787#comment-17242787 ] ASF subversion and git services commented on SOLR-14934: Commit 5caadc12f4b00b882ec6235d317c82d823d21ff7 in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5caadc1 ] SOLR-14934: Refactored duplicate "Solr Home" logic into a single place to eliminate risk of tests using divergent values for a single solr node. (cherry picked from commit 2e6a02394ec4eea6ba72d5bc2bf02c0139a54f39) (cherry picked from commit 05a8477a362beb6b0e5a02b6ee4dfa106a2e6a76) > Multiple Code Paths for determining "solr home" can return differnet answers > > > Key: SOLR-14934 > URL: https://issues.apache.org/jira/browse/SOLR-14934 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Minor > Attachments: SOLR-14934.poc.patch > > > While looking into some possible ways to make our tests more closely match > "real" solr installs, I realized that we currently have 2 different methods > for determining the "solr home" for a node... > * {{SolrPaths.locateSolrHome()}} > ** static method that uses a hueristic that typically results in using > {{System.getProperty("solr.solr.home");}} > *** NOTE: the result is not stored in any static/final variables > ** this method > * {{SolrDispatchFilter}} > ** starts by checking if an explicit {{ServletContext}} attribute is > specified > *** falls back to using {{SolrPaths.locateSolrHome()}} > ** whatever value is found gets set on {{CoreContainer}} > In a typical Solr install, the {{"solr.solr.home"}} system property is set by > {{bin/solr}} and we get a consistent value for the life of the server > instance regardless of code path. > In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that > calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests > (including {{MiniSolrCloudCluster}} based tests) we rely on the > {{ServletContext}} attribute based approach to have a unique "Solr Home" for > each node. ({{JettySOlrRunner}} injects the value when wiring up the > {{Server}} instance) > This means that: > * in jetty based test - even if it's a single jetty instance - each of the > node's CoreContainer has a unique value of "solr home", but any code paths in > solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent > value across all nodes (different from the value in the CoreContainer for any > node) > * allthough i don't think it happens now: a test could call > {{System.setProperty("solr.solr.home",...)}} while a node is running, and > potentially get inconsistent behavior from even a jetty node over time. > > In practice, I don't think that any of this is currently causing "real bugs" > in actual solr code; nor do i _think_ we're seeing any "false positives" or > "false failures" in tests as a result of this - but it is a big huge land > mine just waiting to go off if we step too close, and i think we should > recitfy this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242792#comment-17242792 ] Chris M. Hostetter commented on SOLR-12182: --- [~thelabdude]: on master CHANGES.txt shows this as a bug fix in 9.0, but on backport to 8x CHANGES.txt lists it as a bugfix in 8.8 > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242817#comment-17242817 ] Mike Drob commented on LUCENE-9629: --- Would it make sense to shove all of these values into arrays? > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
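A minimal sketch of that idea: fill each mask into a table indexed by bit count once at class load, so encode calls become a plain array lookup. The names here are illustrative, not the actual ForUtil constants:

{code:java}
// Masks for keeping the low b bits of a long, filled in once at class load.
private static final long[] MASKS = new long[33];
static {
  for (int b = 1; b <= 32; b++) {
    MASKS[b] = (1L << b) - 1;
  }
}

// Encode-time usage then becomes a plain array lookup:
//   long masked = value & MASKS[bitsPerValue];
{code}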
[jira] [Commented] (SOLR-14182) Move metric reporters config from solr.xml to ZK cluster properties
[ https://issues.apache.org/jira/browse/SOLR-14182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242821#comment-17242821 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14182: -- {quote}metric reporters configuration should be moved to container-level plugins, ie. {{/clusterprops.json:/plugin}} and the corresponding API. This will make the reporters easier to configure and change dynamically without restarting Solr nodes. {quote} I really don't see the point in moving this configuration to clusterprops. This will be bad for people who keep configuration as code, especially if they have multiple clusters, and it requires very Solr-specific deployment processes. I.e., instead of building the Docker image and deploying it as you normally do, you need to, in addition, make this particular request to each Solr cluster, a request that's specific to this change and that you'll never have to make again unless we change this particular component again (and handle errors accordingly). I wish we could tackle SOLR-14843 before making these changes, hopefully in a way where the use case of "long-lived Solr nodes where things can be installed on them" can coexist better with other strategies, such as rolling restarts, blue-green deployments, or any kind of immutable deployment strategy. > Move metric reporters config from solr.xml to ZK cluster properties > --- > > Key: SOLR-14182 > URL: https://issues.apache.org/jira/browse/SOLR-14182 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.4 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > > Metric reporters are currently configured statically in solr.xml, which makes > it difficult to change dynamically or in a containerized environment. > We should move this section to ZK /cluster.properties and add a back-compat > migration shim. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return differnet answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242828#comment-17242828 ] ASF subversion and git services commented on SOLR-14934: Commit 5208d47e1a2030dc51396db74d42b52ba378756d in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5208d47 ] SOLR-14934: Remove redundent deprecated "solr.solr.home" logic > Multiple Code Paths for determining "solr home" can return differnet answers > > > Key: SOLR-14934 > URL: https://issues.apache.org/jira/browse/SOLR-14934 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Minor > Attachments: SOLR-14934.poc.patch > > > While looking into some possible ways to make our tests more closely match > "real" solr installs, I realized that we currently have 2 different methods > for determining the "solr home" for a node... > * {{SolrPaths.locateSolrHome()}} > ** static method that uses a hueristic that typically results in using > {{System.getProperty("solr.solr.home");}} > *** NOTE: the result is not stored in any static/final variables > ** this method > * {{SolrDispatchFilter}} > ** starts by checking if an explicit {{ServletContext}} attribute is > specified > *** falls back to using {{SolrPaths.locateSolrHome()}} > ** whatever value is found gets set on {{CoreContainer}} > In a typical Solr install, the {{"solr.solr.home"}} system property is set by > {{bin/solr}} and we get a consistent value for the life of the server > instance regardless of code path. > In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that > calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests > (including {{MiniSolrCloudCluster}} based tests) we rely on the > {{ServletContext}} attribute based approach to have a unique "Solr Home" for > each node. ({{JettySOlrRunner}} injects the value when wiring up the > {{Server}} instance) > This means that: > * in jetty based test - even if it's a single jetty instance - each of the > node's CoreContainer has a unique value of "solr home", but any code paths in > solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent > value across all nodes (different from the value in the CoreContainer for any > node) > * allthough i don't think it happens now: a test could call > {{System.setProperty("solr.solr.home",...)}} while a node is running, and > potentially get inconsistent behavior from even a jetty node over time. > > In practice, I don't think that any of this is currently causing "real bugs" > in actual solr code; nor do i _think_ we're seeing any "false positives" or > "false failures" in tests as a result of this - but it is a big huge land > mine just waiting to go off if we step too close, and i think we should > recitfy this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242826#comment-17242826 ] Timothy Potter commented on SOLR-12182: --- Yes, I'm aware of that [~hossman] ... wasn't going to backport this given the scope but changed my mind. I'll fix CHANGES.txt in master > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14934) Multiple Code Paths for determining "solr home" can return differnet answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter resolved SOLR-14934. --- Fix Version/s: master (9.0) 8.8 Resolution: Fixed > Multiple Code Paths for determining "solr home" can return differnet answers > > > Key: SOLR-14934 > URL: https://issues.apache.org/jira/browse/SOLR-14934 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Minor > Fix For: 8.8, master (9.0) > > Attachments: SOLR-14934.poc.patch > > > While looking into some possible ways to make our tests more closely match > "real" solr installs, I realized that we currently have 2 different methods > for determining the "solr home" for a node... > * {{SolrPaths.locateSolrHome()}} > ** static method that uses a hueristic that typically results in using > {{System.getProperty("solr.solr.home");}} > *** NOTE: the result is not stored in any static/final variables > ** this method > * {{SolrDispatchFilter}} > ** starts by checking if an explicit {{ServletContext}} attribute is > specified > *** falls back to using {{SolrPaths.locateSolrHome()}} > ** whatever value is found gets set on {{CoreContainer}} > In a typical Solr install, the {{"solr.solr.home"}} system property is set by > {{bin/solr}} and we get a consistent value for the life of the server > instance regardless of code path. > In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that > calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests > (including {{MiniSolrCloudCluster}} based tests) we rely on the > {{ServletContext}} attribute based approach to have a unique "Solr Home" for > each node. ({{JettySOlrRunner}} injects the value when wiring up the > {{Server}} instance) > This means that: > * in jetty based test - even if it's a single jetty instance - each of the > node's CoreContainer has a unique value of "solr home", but any code paths in > solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent > value across all nodes (different from the value in the CoreContainer for any > node) > * allthough i don't think it happens now: a test could call > {{System.setProperty("solr.solr.home",...)}} while a node is running, and > potentially get inconsistent behavior from even a jetty node over time. > > In practice, I don't think that any of this is currently causing "real bugs" > in actual solr code; nor do i _think_ we're seeing any "false positives" or > "false failures" in tests as a result of this - but it is a big huge land > mine just waiting to go off if we step too close, and i think we should > recitfy this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude opened a new pull request #2116: SOLR-12182: Fix Changes.txt
thelabdude opened a new pull request #2116: URL: https://github.com/apache/lucene-solr/pull/2116 # Description Please provide a short description of the changes you're making with this pull request. # Solution Please provide a short description of the approach taken to implement your solution. # Tests Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `./gradlew check`. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2116: SOLR-12182: Fix Changes.txt
thelabdude merged pull request #2116: URL: https://github.com/apache/lucene-solr/pull/2116 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242829#comment-17242829 ] ASF subversion and git services commented on SOLR-12182: Commit 4c100a0175e2553320ca3133bbe9170592389d9d in lucene-solr's branch refs/heads/master from Timothy Potter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4c100a0 ] SOLR-12182: Fix Changes.txt in master (#2116) > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy merged pull request #2103: Reconcile upgrade notes in master
janhoy merged pull request #2103: URL: https://github.com/apache/lucene-solr/pull/2103 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations
[ https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242833#comment-17242833 ] Noble Paul commented on SOLR-14613: --- (y) > Provide a clean API for pluggable replica assignment implementations > > > Key: SOLR-14613 > URL: https://issues.apache.org/jira/browse/SOLR-14613 > Project: Solr > Issue Type: Improvement > Components: AutoScaling >Reporter: Andrzej Bialecki >Assignee: Ilan Ginzburg >Priority: Major > Time Spent: 41h 20m > Remaining Estimate: 0h > > As described in SIP-8 the current autoscaling Policy implementation has > several limitations that make it difficult to use for very large clusters and > very large collections. SIP-8 also mentions the possible migration path by > providing alternative implementations of the placement strategies that are > less complex but more efficient in these very large environments. > We should review the existing APIs that the current autoscaling engine uses > ({{SolrCloudManager}} , {{AssignStrategy}} , {{Suggester}} and related > interfaces) to see if they provide a sufficient and minimal API for plugging > in alternative autoscaling placement strategies, and if necessary refactor > the existing APIs. > Since these APIs are internal it should be possible to do this without > breaking back-compat. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9406) Make it simpler to track IndexWriter's events
[ https://issues.apache.org/jira/browse/LUCENE-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242858#comment-17242858 ] Zach Chen commented on LUCENE-9406: --- Hi [~mikemccand], I'm trying to find a new task to work on and see this. I took a look at the comments in the PR and see that you already have some work in progress code such as the interface. Just curious, has the discussion been carried out further in any way after that PR, and if this task is ready to be picked up again at this point (following your original approach to use *IndexWriterEvents* class maybe) ? > Make it simpler to track IndexWriter's events > - > > Key: LUCENE-9406 > URL: https://issues.apache.org/jira/browse/LUCENE-9406 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > > This is the second spinoff from a [controversial PR to add a new index-time > feature to Lucene to merge small segments during > commit|https://github.com/apache/lucene-solr/pull/1552]. That change can > substantially reduce the number of small index segments to search. > In that PR, there was a new proposed interface, {{IndexWriterEvents}}, giving > the application a chance to track when {{IndexWriter}} kicked off merges > during commit, how many, how long it waited, how often it gave up waiting, > etc. > Such telemetry from production usage is really helpful when tuning settings > like which merges (e.g. a size threshold) to attempt on commit, and how long > to wait during commit, etc. > I am splitting out this issue to explore possible approaches to do this. > E.g. [~simonw] proposed using a statistics class instead, but if I understood > that correctly, I think that would put the role of aggregation inside > {{IndexWriter}}, which is not ideal. > Many interesting events, e.g. how many merges are being requested, how large > are they, how long did they take to complete or fail, etc., can be gleaned by > wrapping expert Lucene classes like {{MergePolicy}} and {{MergeScheduler}}. > But for those events that cannot (e.g. {{IndexWriter}} stopped waiting for > merges during commit), it would be very helpful to have some simple way to > track so applications can better tune. > It is also possible to subclass {{IndexWriter}} and override key methods, but > I think that is inherently risky as {{IndexWriter}}'s protected methods are > not considered to be a stable API, and the synchronization used by > {{IndexWriter}} is confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
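For discussion purposes, the general shape of such a callback interface might look like the sketch below. This is hypothetical: the method names and granularity are assumptions, not the API actually proposed in the PR.

{code:java}
// Hypothetical sketch of merge-on-commit telemetry callbacks; see the PR
// discussion for the actual proposal.
public interface IndexWriterEvents {

  /** IndexWriter started waiting for merges it kicked off during commit. */
  void beginWaitForMergeOnCommit(int runningMerges);

  /** All merges requested during commit finished within the configured wait. */
  void finishWaitForMergeOnCommit();

  /** IndexWriter gave up waiting; the given number of merges were abandoned. */
  void abandonedMergesOnCommit(int abandonedCount);
}
{code}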
[GitHub] [lucene-solr] zacharymorn commented on pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory
zacharymorn commented on pull request #2052: URL: https://github.com/apache/lucene-solr/pull/2052#issuecomment-737614580 Just want to have a quick follow up on this PR. Are there any more changes expected from my end? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9367) Using a queryText which results in zero tokens causes a query to be built as null
[ https://issues.apache.org/jira/browse/LUCENE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242868#comment-17242868 ] Zach Chen edited comment on LUCENE-9367 at 12/3/20, 2:37 AM: - Looks like there's some inconsistency about parsing, as I can get *MatchNoDocsQuery* from *SimpleQueryParser* {code:java} public void test() throws IOException { Analyzer analyzer = CustomAnalyzer.builder() .withTokenizer(StandardTokenizerFactory.class) .addTokenFilter(StopFilterFactory.class) .build(); QueryBuilder queryBuilder = new QueryBuilder(analyzer); String onlyStopWords = "the and it"; Query query = queryBuilder.createPhraseQuery("AnyField", onlyStopWords); assertNull(query); query = new SimpleQueryParser(analyzer, "AnyField").parse(onlyStopWords); assertEquals(new MatchNoDocsQuery("empty string passed to query parser"), query); } {code} I can put out a PR to change it to MatchNoDocsQuery if that's the preferred direction? I also see additional changes though to check for MatchNoDocsQuery now instead of null in this situation. was (Author: zacharymorn): Looks like there's some inconsistency about parsing, as I can get *MatchNoDocsQuery* from *SimpleQueryParser* {code:java} public void test() throws IOException { Analyzer analyzer = CustomAnalyzer.builder() .withTokenizer(StandardTokenizerFactory.class) .addTokenFilter(StopFilterFactory.class) .build(); QueryBuilder queryBuilder = new QueryBuilder(analyzer); String onlyStopWords = "the and it"; Query query = queryBuilder.createPhraseQuery("AnyField", onlyStopWords); assertNull(query); query = new SimpleQueryParser(analyzer, "AnyField").parse(onlyStopWords); assertEquals(new MatchNoDocsQuery("empty string passed to query parser"), query); } {code} I can put out a PR to change it to MatchNoDocsQuery if that's the preferred direction? I also see additional changes though to check for MatchNoDocsQuery now instead of null in this situation. > Using a queryText which results in zero tokens causes a query to be built as > null > - > > Key: LUCENE-9367 > URL: https://issues.apache.org/jira/browse/LUCENE-9367 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.2.1 >Reporter: Tim Brier >Priority: Major > > If a queryText produces zero tokens after being processed by an Analyzer, > when you try to build a Query with it the result is null. > > The following code reproduces this bug: > {code:java} > public class LuceneBug { > public Query buildQuery() throws IOException { > Analyzer analyzer = CustomAnalyzer.builder() > .withTokenizer(StandardTokenizerFactory.class) > .addTokenFilter(StopFilterFactory.class) > .build(); > QueryBuilder queryBuilder = new QueryBuilder(analyzer); > String onlyStopWords = "the and it"; > return queryBuilder.createPhraseQuery("AnyField", onlyStopWords); > } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9367) Using a queryText which results in zero tokens causes a query to be built as null
[ https://issues.apache.org/jira/browse/LUCENE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242868#comment-17242868 ] Zach Chen commented on LUCENE-9367: --- Looks like there's some inconsistency about parsing, as I can get *MatchNoDocsQuery* from *SimpleQueryParser* {code:java} public void test() throws IOException { Analyzer analyzer = CustomAnalyzer.builder() .withTokenizer(StandardTokenizerFactory.class) .addTokenFilter(StopFilterFactory.class) .build(); QueryBuilder queryBuilder = new QueryBuilder(analyzer); String onlyStopWords = "the and it"; Query query = queryBuilder.createPhraseQuery("AnyField", onlyStopWords); assertNull(query); query = new SimpleQueryParser(analyzer, "AnyField").parse(onlyStopWords); assertEquals(new MatchNoDocsQuery("empty string passed to query parser"), query); } {code} I can put out a PR to change it to MatchNoDocsQuery if that's the preferred direction? I also see additional changes though to check for MatchNoDocsQuery now instead of null in this situation. > Using a queryText which results in zero tokens causes a query to be built as > null > - > > Key: LUCENE-9367 > URL: https://issues.apache.org/jira/browse/LUCENE-9367 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.2.1 >Reporter: Tim Brier >Priority: Major > > If a queryText produces zero tokens after being processed by an Analyzer, > when you try to build a Query with it the result is null. > > The following code reproduces this bug: > {code:java} > public class LuceneBug { > public Query buildQuery() throws IOException { > Analyzer analyzer = CustomAnalyzer.builder() > .withTokenizer(StandardTokenizerFactory.class) > .addTokenFilter(StopFilterFactory.class) > .build(); > QueryBuilder queryBuilder = new QueryBuilder(analyzer); > String onlyStopWords = "the and it"; > return queryBuilder.createPhraseQuery("AnyField", onlyStopWords); > } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
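Until the two code paths agree, callers can normalize the null themselves. A small defensive sketch; the reason string is illustrative:

{code:java}
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

// QueryBuilder returns null when analysis yields no tokens (e.g. all stop
// words); mapping that to MatchNoDocsQuery mirrors SimpleQueryParser.
Query phraseOrNoDocs(QueryBuilder builder, String field, String text) {
  Query q = builder.createPhraseQuery(field, text);
  return q == null ? new MatchNoDocsQuery("analyzer produced no tokens") : q;
}
{code}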
[jira] [Created] (SOLR-15028) summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted
Amy Bai created SOLR-15028: -- Summary: summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted Key: SOLR-15028 URL: https://issues.apache.org/jira/browse/SOLR-15028 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 7.4.1 Reporter: Amy Bai I found that SolrCloud won't check the IO status if the SolrCloud process is alive. e.g. If I delete the SolrCloud data directory, no errors are reported, and I can still log in to the SolrCloud Admin UI to create/query collections. Then, index/search queries keep failing because one of the node data directories is gone, but the node is not marked as down. The replicas on the failed node are not working, but the index/search queries didn't fail over to other healthy replicas. The error message is shown below: """ curl -X POST -H 'Content-Type: application/json' 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary '{ "a": "1" }' { "responseHeader":{ "status":400, "QTime":6}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error inserting document: ", "code":400}} """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15028) summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted
[ https://issues.apache.org/jira/browse/SOLR-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amy Bai updated SOLR-15028: --- Affects Version/s: (was: 7.4.1) 7.4 > summarySolrCloud shows cluster still healthy without failover even the node > data directory is deleted > - > > Key: SOLR-15028 > URL: https://issues.apache.org/jira/browse/SOLR-15028 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.4 >Reporter: Amy Bai >Priority: Major > > I found that SolrCloud won't check the IO status if the SolrCloud process is > alive. > > e.g. If I delete the SolrCloud data directory, there are no errors report, > and I can still log in to the SolrCloud Admin UI to create/query collections. > Then, index/search queries keep failing because one of the node data > directories is gone, but the node is not marked as down. > The replicas on the failed node are not working, but the Index/search queries > didn't failover to other healthy replicas. > > The ERROR message as below shows: > """ > curl -X POST -H 'Content-Type: application/json' > 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary > ' \{ "a": "1", }' \{ "responseHeader":{ "status":400, "QTime":6}, "error":\{ > "metadata":[ "error-class","org.apache.solr.common.SolrException", > "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error > inserting document: ", "code":400}} > """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15028) summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted
[ https://issues.apache.org/jira/browse/SOLR-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amy Bai updated SOLR-15028: --- Description: I found that SolrCloud won't check the IO status if the SolrCloud process is alive. e.g. If I delete the data directory for one of the SolrCloud nodes, no errors are reported, and I can still log in to the SolrCloud Admin UI to create/query collections. The SolrCloud Admin UI shows the collections' status as green. Then, index/search queries keep failing because one of the node data directories is gone, but the node is not marked as down. The replicas on the failed node are not working, but the index/search queries didn't fail over to other healthy replicas. The error message is shown below: """ curl -X POST -H 'Content-Type: application/json' 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary '{ "a": "1" }' { "responseHeader":{ "status":400, "QTime":6}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error inserting document: ", "code":400}} """ was: I found that SolrCloud won't check the IO status if the SolrCloud process is alive. e.g. If I delete the SolrCloud data directory, no errors are reported, and I can still log in to the SolrCloud Admin UI to create/query collections. Then, index/search queries keep failing because one of the node data directories is gone, but the node is not marked as down. The replicas on the failed node are not working, but the index/search queries didn't fail over to other healthy replicas. The error message is shown below: """ curl -X POST -H 'Content-Type: application/json' 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary '{ "a": "1" }' { "responseHeader":{ "status":400, "QTime":6}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error inserting document: ", "code":400}} """ > summarySolrCloud shows cluster still healthy without failover even the node > data directory is deleted > - > > Key: SOLR-15028 > URL: https://issues.apache.org/jira/browse/SOLR-15028 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.4 >Reporter: Amy Bai >Priority: Major > > I found that SolrCloud won't check the IO status if the SolrCloud process is > alive. > > e.g. If I delete the data directory for one of the SolrCloud nodes, no errors > are reported, and I can still log in to the SolrCloud Admin UI to > create/query collections. The SolrCloud Admin UI shows the collections' > status as green. > Then, index/search queries keep failing because one of the node data > directories is gone, but the node is not marked as down. > The replicas on the failed node are not working, but the index/search queries > didn't fail over to other healthy replicas. > > The error message is shown below: > """ > curl -X POST -H 'Content-Type: application/json' > 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary > '{ "a": "1" }' > { "responseHeader":{ "status":400, "QTime":6}, "error":{ "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error > inserting document: ", "code":400}} > """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
> > The ERROR message as below shows: > """ > curl -X POST -H 'Content-Type: application/json' > 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary > ' \{ "a": "1", }' { "responseHeader": > { "status":400, "QTime":6} > , "error":\{ "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error > inserting document: ", "code":400}} > """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15028) SolrCloud shows cluster still healthy without failover even the node data directory is deleted
[ https://issues.apache.org/jira/browse/SOLR-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amy Bai updated SOLR-15028: --- Summary: SolrCloud shows cluster still healthy without failover even the node data directory is deleted (was: summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted) > SolrCloud shows cluster still healthy without failover even the node data > directory is deleted > -- > > Key: SOLR-15028 > URL: https://issues.apache.org/jira/browse/SOLR-15028 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.4 >Reporter: Amy Bai >Priority: Major > > I found that SolrCloud won't check the IO status if the SolrCloud process is > alive. > > e.g. If I delete the data directory for one of the SolrCloud node, there are > no errors report, and I can still log in to the SolrCloud Admin UI to > create/query collections. SolrCloud Admin UI shows the collections' status is > green. > Then, index/search queries keep failing because one of the node data > directories is gone, but the node is not marked as down. > The replicas on the failed node are not working, but the Index/search queries > didn't failover to other healthy replicas. > > The ERROR message as below shows: > """ > curl -X POST -H 'Content-Type: application/json' > 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary > ' \{ "a": "1", }' { "responseHeader": > { "status":400, "QTime":6} > , "error":\{ "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error > inserting document: ", "code":400}} > """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242888#comment-17242888 ] Feng Guo commented on LUCENE-9629: -- [~mdrob] Thanks for your reply! That's a really good idea. I updated my [pull request|https://github.com/apache/lucene-solr/pull/2113/files] and the code got even shorter~ > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242888#comment-17242888 ] Feng Guo edited comment on LUCENE-9629 at 12/3/20, 3:53 AM: [~mdrob] Thanks for your reply! That's a really good idea. I updated my [PR|https://github.com/apache/lucene-solr/pull/2113/files] and the lines of code got even shorter~ was (Author: gf2121): [~mdrob] Thanks for you reply! That's a really good idea, I updated my [PR|[https://github.com/apache/lucene-solr/pull/2113/files]] and find the lines of code get even less~ > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have already been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise the following code will never be executed: > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-9629: - Description: In the class ForUtil, mask values have already been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary. Another small fix is to change {code:java} remainingBitsPerValue > remainingBitsPerLong{code} to {code:java} remainingBitsPerValue >= remainingBitsPerLong{code} otherwise the following code will never be executed: {code:java} if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } {code} was: In the class ForkUtil, mask values have been computed and stored in static final vailables, but they are recomputed for every encoding, which may be unnecessary. anther small fix is that change {code:java} remainingBitsPerValue > remainingBitsPerLong{code} to {code:java} remainingBitsPerValue >= remainingBitsPerLong{code} otherwise {code:java} if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } {code} these code will never be used. > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have already been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise the following code will never be executed: > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
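For readers without the patch handy, here is a compact, self-contained sketch of both changes described above: masks precomputed once in a static initializer, and the {{>=}} boundary comparison that makes the reset branch reachable. Names like {{MaskDemo}} and {{MASKS8}} are illustrative assumptions, not the actual ForUtil code:
{code:java}
public final class MaskDemo {

  // Computed once at class load instead of once per encode call.
  private static final long[] MASKS8 = new long[8];
  static {
    for (int bits = 0; bits < 8; bits++) {
      MASKS8[bits] = (1L << bits) - 1; // e.g. MASKS8[3] == 0b111
    }
  }

  static long mask8(int bitsPerValue) {
    return MASKS8[bitsPerValue]; // table lookup, no recomputation
  }

  public static void main(String[] args) {
    // With '>' at the boundary, the exact-fit case (remainingBitsPerValue
    // == remainingBitsPerLong) is not handled in this branch, so
    // remainingBitsPerValue never reaches 0 and the reset below is dead code.
    int bitsPerValue = 8, remainingBitsPerLong = 8, idx = 0;
    int remainingBitsPerValue = bitsPerValue;
    if (remainingBitsPerValue >= remainingBitsPerLong) { // was '>'
      remainingBitsPerValue -= remainingBitsPerLong;
      if (remainingBitsPerValue == 0) {
        idx++;                               // now reachable
        remainingBitsPerValue = bitsPerValue;
      }
    }
    System.out.println("mask8(3)=" + Long.toBinaryString(mask8(3)) + ", idx=" + idx);
  }
}
{code}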
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo edited comment on LUCENE-9629 at 12/3/20, 4:02 AM: [~jpountz] Thanks for your reply! I couldn't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, not to mention this is a somewhat hot path when indexing. So you may consider it a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested~ {code:java} for (int time=0; time<100; time++) { Random random = new Random(System.currentTimeMillis()); long[] nums = new long[128]; for (int i=0;i<128;i++) { nums[i] = random.nextInt(4)+1; } ForUtil forUtil = new ForUtil(); Directory directory = new ByteBuffersDirectory(); DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT); for (int i = 0; i < 1; i++) { forUtil.encode(nums, 3, dataOutput); } directory.close(); }{code} *result:* ||method||before||after|| |org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%| |org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%| |org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%| From my point of view, the number of code lines is less important than writing speed, but if you insist that the precompute makes no sense, just tell me and I will revert this part of the change. In addition, my English is not very good and most of the words above come from translation programs; if any wording offends you, please just ignore it. I really admire this amazing project and am just trying my best to make it better:) was (Author: gf2121): Thanks for your reply! I can't agree more that write path is less performance-sensitive than the read path, and to be honest, i didn't expect this change will bring a very big improvement in writing speed. All I'm trying to do is just to reduce duplicate compute no matter where it appears, not to mention here is somewhat a hot way when indexing. So you may consider it as a "fix" instead of an "enhancement". here is a simple benchmark run with cpu profiler if your are interested~ {code:java} for (int time=0; time<100; time++) { Random random = new Random(System.currentTimeMillis()); long[] nums = new long[128]; for (int i=0;i<128;i++) { nums[i] = random.nextInt(4)+1; } ForUtil forUtil = new ForUtil(); Directory directory = new ByteBuffersDirectory(); DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT); for (int i = 0; i < 1; i++) { forUtil.encode(nums, 3, dataOutput); } directory.close(); }{code} *result:* || ||before||after|| |org.apache.lucene.store.ByteBuffersIndexOutput.writeLong org.apache.lucene.store.ForUtil.collapse8 org.apache.lucene.store.ForUtil.mask(ed)8|40.4% 15.3% 8.8%|41.2% 14.8% 3.8%| From my point of view, the number of code lines is less important than writing speed, but if you insist that precompute make no sense, just tell me and i will revert this part of change In addition, i'm a bit poor in english speaking and most of words above come from translate programs. if there are any word offending you, please just ignore it. i really admire this amazing project and just try to do my best to make it better:) > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have already been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise the following code will never be executed: > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
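For anyone who wants to reproduce the shape of this comparison without Lucene's internals on the classpath, a dependency-free sketch of "recompute the mask per call" versus "precomputed table" could look like the following. This is only a methodology illustration under those assumptions — the numbers will not match the profiler output above, and a harness like JMH would give more trustworthy results than hand-rolled timing:
{code:java}
public class MaskBench {
  private static final long[] MASKS = new long[64];
  static {
    for (int b = 1; b < 64; b++) {
      MASKS[b] = (1L << b) - 1; // MASKS[0] stays 0, matching (1L << 0) - 1
    }
  }

  static long maskRecomputed(int bits) { return (1L << bits) - 1; }
  static long maskLookup(int bits) { return MASKS[bits]; }

  public static void main(String[] args) {
    long sink = 0;
    // Warm up so the JIT compiles both paths before timing.
    for (int i = 0; i < 5_000_000; i++) {
      sink += maskRecomputed(i & 63) + maskLookup(i & 63);
    }

    long t0 = System.nanoTime();
    for (int i = 0; i < 50_000_000; i++) { sink += maskRecomputed(i & 63); }
    long t1 = System.nanoTime();
    for (int i = 0; i < 50_000_000; i++) { sink += maskLookup(i & 63); }
    long t2 = System.nanoTime();

    System.out.println("recomputed: " + (t1 - t0) / 1_000_000 + " ms");
    System.out.println("lookup:     " + (t2 - t1) / 1_000_000 + " ms");
    System.out.println(sink); // keep the JIT from eliminating the loops
  }
}
{code}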
[jira] [Created] (LUCENE-9630) Allow Shard Leader to give up leadership gracefully via shard terms
Mike Drob created LUCENE-9630: - Summary: Allow Shard Leader to give up leadership gracefully via shard terms Key: LUCENE-9630 URL: https://issues.apache.org/jira/browse/LUCENE-9630 Project: Lucene - Core Issue Type: Bug Reporter: Mike Drob Currently (via SOLR-12412), when a leader sees an index writing error during an update, it will give up leadership by deleting the replica and adding a new replica. One stated benefit of this was that because we are using the overseer and a known code path, this is done asynchronously and very efficiently. I would argue that this approach is too heavy-handed. In the case of a corrupt index exception, it makes some sense to completely delete the index dir and attempt to sync from a good peer. Even in this case, however, it might be better to let fingerprinting and other index delta mechanisms take over and allow for a more efficient data transfer. In an alternate case where the index error arises due to a disconnected file system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) and the required solution is some kind of reconnect, this approach has several shortcomings - the core delete and creations are going to fail, leaving dangling replicas. Further, the data is still present so there is no need to make so many extra copies. I propose that we bring in a mechanism to give up leadership via the existing shard terms language. I believe we would be able to set all replicas currently equal to leader term T to T+1, and then trigger a new leader election. The current leader would know it is ineligible, while the other replicas that were current before the failed update would be eligible. This improvement would entail adding an additional possible operation to the terms state machine. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
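Read literally, the proposed term bump is a small state change over the per-replica terms map. The sketch below is purely illustrative — the class and method names are hypothetical, and this is not Solr's actual shard-terms code:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class ShardTermsSketch {
  private final Map<String, Long> terms = new HashMap<>();

  /**
   * Bump every replica that is caught up to the leader's term T to T + 1,
   * leaving the current leader at T and therefore ineligible when a new
   * election is triggered.
   */
  public void giveUpLeadership(String leaderReplica) {
    long leaderTerm = terms.get(leaderReplica);
    for (Map.Entry<String, Long> e : terms.entrySet()) {
      if (!e.getKey().equals(leaderReplica) && e.getValue() == leaderTerm) {
        e.setValue(leaderTerm + 1);
      }
    }
    // A real implementation would now trigger a leader election; replicas
    // at T + 1 are eligible, the old leader at T is not.
  }

  public static void main(String[] args) {
    ShardTermsSketch s = new ShardTermsSketch();
    s.terms.put("replica1", 5L); // current leader
    s.terms.put("replica2", 5L); // caught up -> bumped to 6, eligible
    s.terms.put("replica3", 4L); // behind the failed update -> stays at 4
    s.giveUpLeadership("replica1");
    System.out.println(s.terms); // replica1=5, replica2=6, replica3=4
  }
}
{code}
Part of the appeal of reusing the terms language is visible even in this toy: ineligibility falls out of the existing highest-term-wins comparison, with no deletes, copies, or new replica states required.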
[jira] [Assigned] (SOLR-15029) Allow Shard Leader to give up leadership gracefully via shard terms
[ https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob reassigned SOLR-15029: Assignee: Mike Drob > Allow Shard Leader to give up leadership gracefully via shard terms > --- > > Key: SOLR-15029 > URL: https://issues.apache.org/jira/browse/SOLR-15029 > Project: Solr > Issue Type: Improvement >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > > Currently (via SOLR-12412), when a leader sees an index writing > error during an update, it will give up leadership by deleting the replica and > adding a new replica. One stated benefit of this was that because we are > using the overseer and a known code path, this is done asynchronously and > very efficiently. > I would argue that this approach is too heavy-handed. > In the case of a corrupt index exception, it makes some sense to completely > delete the index dir and attempt to sync from a good peer. Even in this case, > however, it might be better to let fingerprinting and other index delta > mechanisms take over and allow for a more efficient data transfer. > In an alternate case where the index error arises due to a disconnected file > system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) > and the required solution is some kind of reconnect, this approach has > several shortcomings - the core delete and creations are going to fail, > leaving dangling replicas. Further, the data is still present so there is no > need to make so many extra copies. > I propose that we bring in a mechanism to give up leadership via the existing > shard terms language. I believe we would be able to set all replicas > currently equal to leader term T to T+1, and then trigger a new leader > election. The current leader would know it is ineligible, while the other > replicas that were current before the failed update would be eligible. This > improvement would entail adding an additional possible operation to the terms > state machine. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Moved] (SOLR-15029) Allow Shard Leader to give up leadership gracefully via shard terms
[ https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob moved LUCENE-9630 to SOLR-15029: -- Key: SOLR-15029 (was: LUCENE-9630) Lucene Fields: (was: New) Issue Type: Improvement (was: Bug) Project: Solr (was: Lucene - Core) > Allow Shard Leader to give up leadership gracefully via shard terms > --- > > Key: SOLR-15029 > URL: https://issues.apache.org/jira/browse/SOLR-15029 > Project: Solr > Issue Type: Improvement >Reporter: Mike Drob >Priority: Major > > Currently (via SOLR-12412), when a leader sees an index writing > error during an update, it will give up leadership by deleting the replica and > adding a new replica. One stated benefit of this was that because we are > using the overseer and a known code path, this is done asynchronously and > very efficiently. > I would argue that this approach is too heavy-handed. > In the case of a corrupt index exception, it makes some sense to completely > delete the index dir and attempt to sync from a good peer. Even in this case, > however, it might be better to let fingerprinting and other index delta > mechanisms take over and allow for a more efficient data transfer. > In an alternate case where the index error arises due to a disconnected file > system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) > and the required solution is some kind of reconnect, this approach has > several shortcomings - the core delete and creations are going to fail, > leaving dangling replicas. Further, the data is still present so there is no > need to make so many extra copies. > I propose that we bring in a mechanism to give up leadership via the existing > shard terms language. I believe we would be able to set all replicas > currently equal to leader term T to T+1, and then trigger a new leader > election. The current leader would know it is ineligible, while the other > replicas that were current before the failed update would be eligible. This > improvement would entail adding an additional possible operation to the terms > state machine. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15029) Allow Shard Leader to give up leadership gracefully via shard terms
[ https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242904#comment-17242904 ] Mike Drob commented on SOLR-15029: -- [~tflobbe], [~caomanhdat] - you were involved with the initial implementation of giving up leadership, so I would love to hear your thoughts on this proposal. [~varun], you too, since it looks like you were battle-tested on that issue. > Allow Shard Leader to give up leadership gracefully via shard terms > --- > > Key: SOLR-15029 > URL: https://issues.apache.org/jira/browse/SOLR-15029 > Project: Solr > Issue Type: Improvement >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > > Currently (via SOLR-12412), when a leader sees an index writing > error during an update, it will give up leadership by deleting the replica and > adding a new replica. One stated benefit of this was that because we are > using the overseer and a known code path, this is done asynchronously and > very efficiently. > I would argue that this approach is too heavy-handed. > In the case of a corrupt index exception, it makes some sense to completely > delete the index dir and attempt to sync from a good peer. Even in this case, > however, it might be better to let fingerprinting and other index delta > mechanisms take over and allow for a more efficient data transfer. > In an alternate case where the index error arises due to a disconnected file > system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) > and the required solution is some kind of reconnect, this approach has > several shortcomings - the core delete and creations are going to fail, > leaving dangling replicas. Further, the data is still present so there is no > need to make so many extra copies. > I propose that we bring in a mechanism to give up leadership via the existing > shard terms language. I believe we would be able to set all replicas > currently equal to leader term T to T+1, and then trigger a new leader > election. The current leader would know it is ineligible, while the other > replicas that were current before the failed update would be eligible. This > improvement would entail adding an additional possible operation to the terms > state machine. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] tflobbe commented on a change in pull request #2115: SOLR-14992 Wait for node down before checking for node up
tflobbe commented on a change in pull request #2115: URL: https://github.com/apache/lucene-solr/pull/2115#discussion_r534702097 ## File path: solr/core/src/test/org/apache/solr/cloud/TestPullReplicaErrorHandling.java ## @@ -236,8 +237,9 @@ public void testCloseHooksDeletedOnReconnect() throws Exception { JettySolrRunner jetty = getJettyForReplica(s.getReplicas(EnumSet.of(Replica.Type.PULL)).get(0)); SolrCore core = jetty.getCoreContainer().getCores().iterator().next(); -for (int i = 0; i < 5; i++) { +for (int i = 0; i < (TEST_NIGHTLY ? 5 : 2); i++) { cluster.expireZkSession(jetty); + waitForState("Expecting node to be disconnected", collectionName, activeReplicaCount(1, 0, 0)); Review comment: > There is a window where live node has gone away but state is still active because it hasn't updated yet. Have you seen that happening? AFAIK, everywhere that we check if a replica is active we look at the state and the live nodes. > if we're just waiting for and watching live nodes, then we can see that go away and complete the test before the cluster has quiesced. We would still have the check in line 243, right? My point was: 1) wait to see a change in live nodes 2) wait for active (line 243 as it is now) Wouldn't that be safe (assuming no other, unrelated node dies just at this point)? > There is still a different race here that the replica could go down and come back up before we start waiting for it to be down the first time Right, that's the one I was concerned about. Difficult to happen, but... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
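As a dependency-free illustration of the race being discussed — the replica going down and coming back up before the test starts waiting for "down" — consider this toy sketch. It uses no Solr APIs; the thread stands in for the flapping node, and the point is only that a poll-based wait registered after the flap never observes the down state:
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class MissedTransitionDemo {
  public static void main(String[] args) throws InterruptedException {
    AtomicBoolean live = new AtomicBoolean(true);
    CountDownLatch flapped = new CountDownLatch(1);

    Thread node = new Thread(() -> {
      live.set(false);  // node goes down...
      live.set(true);   // ...and rejoins immediately
      flapped.countDown();
    });
    node.start();
    flapped.await();    // the "test" starts watching only after the flap

    // A poll-based waitForState(down) would now spin forever; a watch
    // registered *before* expiring the session (or comparing session ids
    // across the flap) is needed to observe the transition reliably.
    System.out.println("down state observed after the fact? " + !live.get()); // false
  }
}
{code}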
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo edited comment on LUCENE-9629 at 12/3/20, 6:43 AM: [~jpountz] Thanks for your reply! I couldn't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears. So you may think of it as a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler: {code:java} public static void main(String[] args) throws Exception { Random random = new Random(System.currentTimeMillis()); long[] nums = new long[128]; for (int i = 0; i < 128; i++) { nums[i] = random.nextInt(7) + 1; } ForUtil forUtil = new ForUtil(); DataOutput dataOutput = new DataOutput() { @Override public void writeLong(long i) throws IOException {} @Override public void writeByte(byte b) throws IOException {} @Override public void writeBytes(byte[] bytes, int i, int i1) throws IOException {} }; while (true){ forUtil.encode(nums, 3, dataOutput); } }{code} *result:* ||method||before||after|| |org.apache.lucene.store.ForUtil.collapse8|29.9%|31.7%| |org.apache.lucene.store.ForUtil.mask8|13.7%|2.8%| |java.lang.Long.reverseBytes|< 1%|< 1%| |org.apache.lucene.codecs.lucene84.Main$1.writeLong|< 1%|< 1%| From my point of view, the number of code lines is less important than writing speed, and ForUtil is a somewhat hot path when indexing, so it may be worth fixing. But if you insist that the precompute makes no sense, just tell me and I will revert this part of the change. In addition, my English is not very good and most of the words above come from translation programs; if any wording offends you, please just ignore it. I really admire this amazing project and am just trying my best to make it better:) was (Author: gf2121): [~jpountz] Thanks for your reply! I can't agree more that write path is less performance-sensitive than the read path, and to be honest, i didn't expect this change will bring a very big improvement in writing speed. All I'm trying to do is just to reduce duplicate compute no matter where it appears. So you may think of it as a "fix" instead of an "enhancement". here is a simple benchmark run with cpu profiler {code:java} public static void main(String[] args) throws Exception { Random random = new Random(System.currentTimeMillis()); long[] nums = new long[128]; for (int i = 0; i < 128; i++) { nums[i] = random.nextInt(7) + 1; } ForUtil forUtil = new ForUtil(); DataOutput dataOutput = new DataOutput() { @Override public void writeLong(long i) throws IOException {} @Override public void writeByte(byte b) throws IOException {} @Override public void writeBytes(byte[] bytes, int i, int i1) throws IOException {} }; while (true){ forUtil.encode(nums, 3, dataOutput); } }{code} *result:* || ||before||after|| |org.apache.lucene.store.ForUtil.collapse8 org.apache.lucene.store.ForUtil.mask8 java.lang.Long.reverseBytes org.apache.lucene.codecs.lucene84.Main$1.writeLong|29.9% 13.7% < 1% < 1%|31.7% 2.8% < 1% < 1%| From my point of view, the number of code lines is less important than writing speed, and ForUtil is somewhat a hot way when indexing, so it may be worth fixing. But if you insist that the precompute make no sense, just tell me and i will revert this part of change. In addition, i'm a bit poor in english speaking and most of words above come from translate programs. if there are any word offending you, please just ignore it. i really admire this amazing project and just try my best to make it better:) > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have already been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise the following code will never be executed: > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org