[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting
[ https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242229#comment-17242229 ] Andrzej Bialecki commented on SOLR-4735: [~Pavithrad] please open a new Jira issue and describe the problem in more detail, including Solr version, environment and Solr logs - this issue is closed. > Improve Solr metrics reporting > -- > > Key: SOLR-4735 > URL: https://issues.apache.org/jira/browse/SOLR-4735 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Alan Woodward >Assignee: Andrzej Bialecki >Priority: Minor > Fix For: 6.4, 7.0 > > Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, > SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, screenshot-2.png > > Time Spent: 20m > Remaining Estimate: 0h > > Following on from a discussion on the mailing list: > http://search-lucene.com/m/IO0EI1qdyJF1/codahale&subj=Solr+metrics+in+Codahale+metrics+and+Graphite+ > It would be good to make Solr play more nicely with existing devops > monitoring systems, such as Graphite or Ganglia. Stats monitoring at the > moment is poll-only, either via JMX or through the admin stats page. I'd > like to refactor things a bit to make this more pluggable. > This patch is a start. It adds a new interface, InstrumentedBean, which > extends SolrInfoMBean to return a > [Metrics|http://metrics.codahale.com/manual/core/] MetricRegistry, and a > couple of MetricReporters (which basically just duplicate the JMX and admin > page reporting that's there at the moment, but which should be more > extensible). The patch includes a change to RequestHandlerBase showing how > this could work. The idea would be to eventually replace the getStatistics() > call on SolrInfoMBean with this instead. > The next step would be to allow more MetricReporters to be defined in > solrconfig.xml. The Metrics library comes with ganglia and graphite > reporting modules, and we can add contrib plugins for both of those. > There's some more general cleanup that could be done around SolrInfoMBean > (we've got two plugin handlers at /mbeans and /plugins that basically do the > same thing, and the beans themselves have some weirdly inconsistent data on > them - getVersion() returns different things for different impls, and > getSource() seems pretty useless), but maybe that's for another issue.
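For illustration, the interface sketched in the quoted description would look roughly like this (a minimal sketch assuming only the names mentioned above; the actual patch may differ):

{code:java}
import com.codahale.metrics.MetricRegistry;

// Sketch of the proposed interface: an info bean that, in addition to the
// existing SolrInfoMBean contract, exposes a Codahale MetricRegistry that
// pluggable MetricReporters can poll or push from.
public interface InstrumentedBean extends SolrInfoMBean {
  MetricRegistry getMetricRegistry();
}
{code}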
[jira] [Created] (LUCENE-9629) Use computed mask values in ForUtil
Feng Guo created LUCENE-9629: Summary: Use computed mask values in ForUtil Key: LUCENE-9629 URL: https://issues.apache.org/jira/browse/LUCENE-9629 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Feng Guo In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary. Another small fix is to change `remainingBitsPerValue > remainingBitsPerLong` to `remainingBitsPerValue >= remainingBitsPerLong`; otherwise ``` if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } ``` this code will never be executed.
[jira] [Updated] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-9629: - Description: In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary. Another small fix is to change {code:java} remainingBitsPerValue > remainingBitsPerLong{code} to {code:java} remainingBitsPerValue >= remainingBitsPerLong{code} otherwise {code:java} if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } {code} this code will never be executed. was: In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary. Another small fix is to change `remainingBitsPerValue > remainingBitsPerLong` to `remainingBitsPerValue >= remainingBitsPerLong`; otherwise ``` if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } ``` this code will never be executed. > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed.
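To make the first part of the proposal concrete, the idea is a table lookup instead of a per-call shift-and-subtract (an illustrative sketch, not the exact ForUtil internals):

{code:java}
// Illustrative sketch: fill the mask table once in a static initializer...
private static final long[] MASKS = new long[64];
static {
  for (int bpv = 0; bpv < 64; bpv++) {
    MASKS[bpv] = (1L << bpv) - 1; // a mask with the low bpv bits set
  }
}

// ...then encoding reads the precomputed value instead of recomputing it.
private static long mask(int bitsPerValue) {
  return MASKS[bitsPerValue];
}
{code}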
[GitHub] [lucene-solr] gf2121 opened a new pull request #2113: LUCENE-9629: use computed masks
gf2121 opened a new pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113 # Description In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed when encoding; maybe we can avoid this. # Solution Use the precomputed mask values.
[jira] [Commented] (SOLR-14182) Move metric reporters config from solr.xml to ZK cluster properties
[ https://issues.apache.org/jira/browse/SOLR-14182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242285#comment-17242285 ] Andrzej Bialecki commented on SOLR-14182: - I'd like to start working on this. As I see it this issue needs to address the following: * mark {{solr.xml:/solr/metrics}} as deprecated and remove in 9.1. * general metrics configuration (such as enable/disable, metric suppliers options) should move to {{/clusterprops.json:/metrics}} * metric reporters configuration should be moved to container-level plugins, i.e. {{/clusterprops.json:/plugin}} and the corresponding API. This will make the reporters easier to configure and change dynamically without restarting Solr nodes. * precedence: {{MetricsConfig}} will be initialized from {{solr.xml}} as before. Then, if any clusterprops configuration is present it will REPLACE the one from {{solr.xml}} - I don't want to attempt any fusion of these two, and I think it's easier to migrate if you don't merge these configs. This approach means that defining anything using the new locations will automatically turn off the old {{solr.xml}} config (a sketch of this precedence follows below). > Move metric reporters config from solr.xml to ZK cluster properties > --- > > Key: SOLR-14182 > URL: https://issues.apache.org/jira/browse/SOLR-14182 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.4 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > > Metric reporters are currently configured statically in solr.xml, which makes > it difficult to change dynamically or in a containerized environment. > We should move this section to ZK /cluster.properties and add a back-compat > migration shim.
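The precedence rule described above could be sketched like this ({{MetricsConfig}} is the existing class; the helper name and signature are hypothetical, for illustration only):

{code:java}
// Sketch of the proposed precedence: a /metrics section in clusterprops.json,
// when present, replaces the solr.xml config wholesale - no merging.
MetricsConfig resolveMetricsConfig(MetricsConfig fromSolrXml, java.util.Map<String, Object> clusterProps) {
  Object fromClusterProps = clusterProps.get("metrics");
  if (fromClusterProps != null) {
    return parseMetricsConfig(fromClusterProps); // hypothetical parser; new location wins entirely
  }
  return fromSolrXml; // back-compat: fall back to the deprecated solr.xml section
}
{code}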
[jira] [Commented] (SOLR-14992) TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures
[ https://issues.apache.org/jira/browse/SOLR-14992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242331#comment-17242331 ] Erick Erickson commented on SOLR-14992: --- [~mdrob] No magic, I just use Mark Miller's "beasting" script. This fails about 20% of the time for me (MBP). I really hate these: you change code, beast for a while, and are never completely sure you've found the problem... If you have a possibility for a fix, I'd be glad to beast it, even if it's just a wild stab... Next time it's nasty outside I'll be taking a closer look at it. > TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures > -- > > Key: SOLR-14992 > URL: https://issues.apache.org/jira/browse/SOLR-14992 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) >Reporter: Tomas Eduardo Fernandez Lobbe >Priority: Minor > > I've noticed this test started failing very frequently with an error like: > {noformat} > Error Message: > Error from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. This requires 4 shards to be created > (higher than the allowed number) > Stack Trace: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1.
This requires 4 shards to be created > (higher than the allowed number) > at > __randomizedtesting.SeedInfo.seed([3D670DC4BEABD958:3550EB0C6505ADD6]:0) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) > at > org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369) > at > org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1173) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231) > at > org.apache.solr.cloud.TestPullReplicaErrorHandling.testCantConnectToPullReplica(TestPullReplicaErrorHandling.java:149) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure
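For context, the rejection in the quoted error is simple arithmetic over the collection request (values taken directly from the message above):

{noformat}
replicas requested = numShards x (nrtReplicas + tlogReplicas + pullReplicas)
                   = 2 x (1 + 0 + 1) = 4
replicas allowed   = maxShardsPerNode x live nodes = 1 x 3 = 3
4 > 3, so the CREATE call fails before the test body can run
{noformat}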
[jira] [Created] (SOLR-15023) Timeout Issue with Solr Metrics API
Dinesh Kumar created SOLR-15023: --- Summary: Timeout Issue with Solr Metrics API Key: SOLR-15023 URL: https://issues.apache.org/jira/browse/SOLR-15023 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Admin UI, metrics Affects Versions: 8.2 Reporter: Dinesh Kumar Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes 20K ms but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs:
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detailed Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes 20K ms but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: was: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes 20K ms but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > > Hi Team, > We are facing a "connection lost" error on the Solr admin page. While debugging, we > found an issue with admin/metrics API. > *Detailed Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in "Connection to Solr lost". > On the other hand, I tried to hit the same query separately in a browser; it still > takes 20K ms but I get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes too long and finally fails to load the response. > As a result, multiple concurrent calls were made to the same API, which throw > "Connection to Solr lost". > I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs:
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detailed Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes *20K ms* but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: was: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detailed Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes 20K ms but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > > Hi Team, > We are facing a "connection lost" error on the Solr admin page. While debugging, we > found an issue with admin/metrics API. > *Detailed Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in "Connection to Solr lost". > On the other hand, I tried to hit the same query separately in a browser; it still > takes *20K ms* but I get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes too long and finally fails to load the response. > As a result, multiple concurrent calls were made to the same API, which throw > "Connection to Solr lost". > I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs:
[jira] [Commented] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242358#comment-17242358 ] Adrien Grand commented on LUCENE-9629: -- Thanks for catching this unused code block. I'm unsure whether we should move forward with the other part of the change that makes sure we precompute all masks. Have you been able to measure a speedup with your change? It brings some more lines of code for the write path, which is less performance-sensitive than the read path, so we usually care less about optimizing it. This is e.g. why the read path specializes code for every number of bits per value while the write path doesn't. > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed.
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Attachment: Error.pdf > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > Attachments: Error.pdf > > > Hi Team, > We are facing a "connection lost" error on the Solr admin page. While debugging, we > found an issue with admin/metrics API. > *Detailed Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in "Connection to Solr lost". > On the other hand, I tried to hit the same query separately in a browser; it still > takes *20K ms* but I get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes too long and finally fails to load the response. > As a result, multiple concurrent calls were made to the same API, which throw > "Connection to Solr lost". > I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs:
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes *20K ms* but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". We tried a few ways to disable this API call or to increase the timeout, but nothing works. I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: was: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detailed Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes *20K ms* but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > Attachments: Error.pdf > > > Hi Team, > We are facing a "connection lost" error on the Solr admin page. While debugging, we > found an issue with admin/metrics API. > *Detail Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in "Connection to Solr lost". > On the other hand, I tried to hit the same query separately in a browser; it still > takes *20K ms* but I get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes too long and finally fails to load the response. > As a result, multiple concurrent calls were made to the same API, which throw > "Connection to Solr lost". > We tried a few ways to disable this API call or to increase the timeout, but > nothing works. > I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs:
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "Connection to Solr lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. On the other hand, we tried to hit the same query separately in a browser; it still takes *20K ms* but we get a proper response. When the admin/metrics API call happens from the admin console, the first call takes a long time and finally fails to load the response. As a result, *multiple concurrent calls were made to the same API*, which throw "Connection to Solr lost". We tried a few ways to disable this API call and to increase the timeout, but nothing works. We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs. was: Hi Team, We are facing a "connection lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever I try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in "Connection to Solr lost". On the other hand, I tried to hit the same query separately in a browser; it still takes *20K ms* but I get a proper response. When the admin/metrics API call happens from the admin console, the first call takes too long and finally fails to load the response. As a result, multiple concurrent calls were made to the same API, which throw "Connection to Solr lost". We tried a few ways to disable this API call or to increase the timeout, but nothing works. I could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs: > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > Attachments: Error.pdf > > > Hi Team, > We are facing a "Connection to Solr lost" error on the Solr admin page. While > debugging, we found an issue with admin/metrics API. > *Detail Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. > On the other hand, we tried to hit the same query separately in a browser; it still > takes *20K ms* but we get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes a long time and finally fails to load the response. As a result, > *multiple concurrent calls were made to the same API*, which throw > "Connection to Solr lost". > We tried a few ways to disable this API call and to increase the timeout, but > nothing works. > We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs.
[jira] [Updated] (SOLR-15023) Timeout Issue with Solr Metrics API
[ https://issues.apache.org/jira/browse/SOLR-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Kumar updated SOLR-15023: Description: Hi Team, We are facing a "Connection to Solr lost" error on the Solr admin page. While debugging, we found an issue with the admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. On the other hand, we tried to hit the same query separately in a browser; it still takes *20K ms* but we get a proper response. When the admin/metrics API call happens from the admin console, the first call takes a long time and finally fails to load the response. As a result, *multiple concurrent calls were made to the same API*, which throw "Connection to Solr lost". We tried a few ways to disable this API call and to increase the timeout, but nothing works. We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs. was: Hi Team, We are facing a "Connection to Solr lost" error on the Solr admin page. While debugging, we found an issue with admin/metrics API. *Detail Analysis:* We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. When we tried to debug this, we found that the Solr admin/metrics API called internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. On the other hand, we tried to hit the same query separately in a browser; it still takes *20K ms* but we get a proper response. When the admin/metrics API call happens from the admin console, the first call takes a long time and finally fails to load the response. As a result, *multiple concurrent calls were made to the same API*, which throw "Connection to Solr lost". We tried a few ways to disable this API call and to increase the timeout, but nothing works. We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the error msg from the Solr logs. > Timeout Issue with Solr Metrics API > --- > > Key: SOLR-15023 > URL: https://issues.apache.org/jira/browse/SOLR-15023 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Admin UI, metrics >Affects Versions: 8.2 >Reporter: Dinesh Kumar >Priority: Major > Attachments: Error.pdf > > > Hi Team, > We are facing a "Connection to Solr lost" error on the Solr admin page. While > debugging, we found an issue with the admin/metrics API. > *Detail Analysis:* > We have 200+ collections on a Solr cloud cluster with 4 Solr nodes, > among which 3 are zookeeper+Solr nodes. Whenever we try to hit the "cloud" page > on the Solr admin UI, it results in a "Connection to Solr lost" error within a few seconds. > When we tried to debug this, we found that the Solr admin/metrics API called > internally takes *20K ms* and times out, which results in the "Connection to Solr lost" error. > On the other hand, we tried to hit the same query separately in a browser; it still > takes *20K ms* but we get a proper response. > When the admin/metrics API call happens from the admin console, the first call > takes a long time and finally fails to load the response. As a result, > *multiple concurrent calls were made to the same API*, which throw > "Connection to Solr lost". > We tried a few ways to disable this API call and to increase the timeout, but > nothing works. > We could see the AdminHandlerProxy timeout warning in the Solr logs. Attached is the > error msg from the Solr logs.
[jira] [Commented] (SOLR-9812) Implement a /admin/metrics API
[ https://issues.apache.org/jira/browse/SOLR-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242363#comment-17242363 ] Dinesh Kumar commented on SOLR-9812: Team, We are facing issues with the Solr admin/metrics API. Can you please help with this issue: SOLR-15023 - https://issues.apache.org/jira/browse/SOLR-15023 > Implement a /admin/metrics API > -- > > Key: SOLR-9812 > URL: https://issues.apache.org/jira/browse/SOLR-9812 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: 6.4, 7.0 > > Attachments: SOLR-9812.patch, SOLR-9812.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > We added a bare-bones metrics API in SOLR-9788, but due to limitations with > the metrics servlet supplied by the metrics library, it can show statistics > from only one metric registry. SOLR-4735 has added a hierarchy of metric > registries, and the /admin/metrics API should support showing all of them as > well as be able to filter metrics from a given registry name. > In this issue we will implement the improved /admin/metrics API.
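As an aside for anyone chasing the timeouts referenced above: the API supports filtering, so a client can request a small slice instead of the full payload. A SolrJ sketch (the group/prefix parameters come from the metrics API; the {{solrClient}} setup is assumed):

{code:java}
// Sketch: fetch only heap metrics from the JVM registry instead of
// the full /admin/metrics payload.
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("group", "jvm");           // restrict to one metric group
params.set("prefix", "memory.heap");  // restrict to one metric name prefix
GenericSolrRequest req =
    new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/metrics", params);
NamedList<Object> metrics = solrClient.request(req); // solrClient: an existing SolrClient
{code}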
[jira] [Comment Edited] (SOLR-9812) Implement a /admin/metrics API
[ https://issues.apache.org/jira/browse/SOLR-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242363#comment-17242363 ] Dinesh Kumar edited comment on SOLR-9812 at 12/2/20, 1:33 PM: -- Team, We are facing issues with the Solr admin/metrics API. Can you please help with this issue: SOLR-15023 was (Author: dineshkumark): Team, We are facing issues with the Solr admin/metrics API. Can you please help with this issue: SOLR-15023 - https://issues.apache.org/jira/browse/SOLR-15023 > Implement a /admin/metrics API > -- > > Key: SOLR-9812 > URL: https://issues.apache.org/jira/browse/SOLR-9812 > Project: Solr > Issue Type: Improvement > Components: metrics >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: 6.4, 7.0 > > Attachments: SOLR-9812.patch, SOLR-9812.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > We added a bare-bones metrics API in SOLR-9788, but due to limitations with > the metrics servlet supplied by the metrics library, it can show statistics > from only one metric registry. SOLR-4735 has added a hierarchy of metric > registries, and the /admin/metrics API should support showing all of them as > well as be able to filter metrics from a given registry name. > In this issue we will implement the improved /admin/metrics API.
[GitHub] [lucene-solr] iverase commented on pull request #2094: LUCENE-9047: Move the Directory APIs to be little endian
iverase commented on pull request #2094: URL: https://github.com/apache/lucene-solr/pull/2094#issuecomment-737236259 Thanks @dweiss! I think your approach is potentially more efficient but harder to get to a state where everything works. I am currently taking a different approach: increasing the version number on the codec files. The writers should therefore stay mostly untouched, and only the readers should wrap the IndexInput when the version is lower than the current one. In most cases the real change to a codec is a one-liner. Unfortunately I need to do some refactoring, and therefore the patch is bigger. I opened an issue to do the refactoring on the side as I think it is valuable even if this PR does not succeed. The only issue left is the PackedInts algorithms, as I think they need to be adapted. I have done that already for the DirectWriter.
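A rough sketch of the wrapping idea described above (the class and names below are hypothetical illustrations, not the PR's actual code): readers detect an older codec version and byte-swap multi-byte reads, so the writers stay untouched.

```java
import java.io.IOException;
import org.apache.lucene.store.IndexInput;

// Hypothetical sketch: reverse the byte order of multi-byte primitives so a
// little-endian reader can consume files written by a big-endian codec version.
final class ByteSwappingReads {
  static short readShort(IndexInput in) throws IOException { return Short.reverseBytes(in.readShort()); }
  static int readInt(IndexInput in) throws IOException { return Integer.reverseBytes(in.readInt()); }
  static long readLong(IndexInput in) throws IOException { return Long.reverseBytes(in.readLong()); }
}
```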
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534205062 ## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ## @@ -896,6 +900,9 @@ public void load() { containerHandlers.getApiBag().registerObject(containerPluginsApi.readAPI); containerHandlers.getApiBag().registerObject(containerPluginsApi.editAPI); + // get the placement plugin Review comment: I'd rather comment "get the placement plugin **factory**". And possibly specify the plugin factory lifecycle with respect to configuration? Now that everything is a bit more implicit than it previously was, I don't get (yet...) when exactly the configuration is passed. I believe comments related to this would be useful.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534209824 ## File path: solr/core/src/java/org/apache/solr/cluster/placement/PlacementPluginFactory.java ## @@ -18,14 +18,22 @@ package org.apache.solr.cluster.placement; /** - * Factory implemented by client code and configured in {@code solr.xml} allowing the creation of instances of + * Factory implemented by client code and configured in container plugins allowing the creation of instances of * {@link PlacementPlugin} to be used for replica placement computation. + * Note: configurable factory implementations should also implement + * {@link org.apache.solr.api.ConfigurablePlugin} with the appropriate configuration + * bean type. */ public interface PlacementPluginFactory { Review comment: Shouldn't this interface extend `ConfigurablePlugin` so that concrete factory classes such as `AffinityPlacementFactory` do not have to implement multiple interfaces?
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534210911 ## File path: solr/core/src/java/org/apache/solr/cluster/placement/PlacementPluginFactory.java ## @@ -18,14 +18,22 @@ package org.apache.solr.cluster.placement; /** - * Factory implemented by client code and configured in {@code solr.xml} allowing the creation of instances of + * Factory implemented by client code and configured in container plugins allowing the creation of instances of * {@link PlacementPlugin} to be used for replica placement computation. + * Note: configurable factory implementations should also implement + * {@link org.apache.solr.api.ConfigurablePlugin} with the appropriate configuration + * bean type. */ public interface PlacementPluginFactory { Review comment: That would force every placement plugin to be configurable though I guess...
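For readers following the thread, the two shapes under discussion look roughly like this (a simplified sketch; the real interfaces and the AffinityPlacement classes carry more members):

```java
// Option A (current patch): a configurable factory opts in explicitly
// by implementing both interfaces.
class AffinityPlacementFactory
    implements PlacementPluginFactory, ConfigurablePlugin<AffinityPlacementConfig> {
  private AffinityPlacementConfig config;
  @Override public void configure(AffinityPlacementConfig cfg) { this.config = cfg; }
  @Override public PlacementPlugin createPluginInstance() { return buildPlugin(config); } // buildPlugin: illustrative
}

// Option B (suggested above): fold configuration into the factory contract,
// at the cost of forcing a config type on every placement plugin.
interface ConfigurablePlacementPluginFactory<C>
    extends PlacementPluginFactory, ConfigurablePlugin<C> {}
```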
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534219720 ## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ## @@ -257,6 +260,7 @@ public CoreLoadFailure(CoreDescriptor cd, Exception loadFailure) { // initially these are the same to collect the plugin-based listeners during init private ClusterEventProducer clusterEventProducer; + private PlacementPluginFactory placementPluginFactory; Review comment: I believe we have a synchronization issue here on access to that variable. It is not `final` nor `volatile`, access is not synchronized but it is accessed from multiple threads (the command execution Overseer threads calling `getPlacementPluginFactory()`).
[GitHub] [lucene-solr] sigram commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
sigram commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534234770 ## File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java ## @@ -257,6 +260,7 @@ public CoreLoadFailure(CoreDescriptor cd, Exception loadFailure) { // initially these are the same to collect the plugin-based listeners during init private ClusterEventProducer clusterEventProducer; + private PlacementPluginFactory placementPluginFactory; Review comment: Since we always use `DelegatingPlacementPluginFactory` I'm going to create a final instance of this wrapper - and then it will be initialized once the plugins registry is ready.
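Something like the following (a minimal sketch of the idea, not the committed code): the CoreContainer field can then be final, and the mutable delegate is published via volatile so Overseer threads read it safely.

```java
// Minimal sketch: a final wrapper whose delegate is swapped in after the
// container plugin registry initializes; volatile gives safe publication.
public final class DelegatingPlacementPluginFactory implements PlacementPluginFactory {
  private volatile PlacementPluginFactory delegate;

  public void setDelegate(PlacementPluginFactory delegate) {
    this.delegate = delegate;
  }

  @Override
  public PlacementPlugin createPluginInstance() {
    PlacementPluginFactory d = delegate; // single volatile read
    return d == null ? null : d.createPluginInstance();
  }
}
```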
[jira] [Commented] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo commented on LUCENE-9629: -- Thanks for your reply! I can't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, and this is a somewhat hot path when indexing. So you may consider it a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested:
{code:java}
for (int time = 0; time < 100; time++) {
  Random random = new Random(System.currentTimeMillis());
  long[] nums = new long[128];
  for (int i = 0; i < 128; i++) {
    nums[i] = random.nextInt(4) + 1;
  }
  ForUtil forUtil = new ForUtil();
  Directory directory = new ByteBuffersDirectory();
  DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT);
  for (int i = 0; i < 1; i++) {
    forUtil.encode(nums, 3, dataOutput);
  }
  directory.close();
}
{code}
*result:*
||Method||before||after||
|org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%|
|org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%|
|org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%|
From my point of view, the number of code lines is less important than writing speed, but if you think the precomputation makes no sense, just tell me and I will revert this part of the change :) > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed.
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo edited comment on LUCENE-9629 at 12/2/20, 3:05 PM: Thanks for your reply! I can't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, and this is a somewhat hot path when indexing. So you may consider it a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested:
{code:java}
for (int time = 0; time < 100; time++) {
  Random random = new Random(System.currentTimeMillis());
  long[] nums = new long[128];
  for (int i = 0; i < 128; i++) {
    nums[i] = random.nextInt(4) + 1;
  }
  ForUtil forUtil = new ForUtil();
  Directory directory = new ByteBuffersDirectory();
  DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT);
  for (int i = 0; i < 1; i++) {
    forUtil.encode(nums, 3, dataOutput);
  }
  directory.close();
}
{code}
*result:*
||Method||before||after||
|org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%|
|org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%|
|org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%|
From my point of view, the number of code lines is less important than writing speed, but if you think the precomputation makes no sense, just tell me and I will revert this part of the change :) was (Author: gf2121): Thanks for your reply! I can't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, and this is a somewhat hot path when indexing. So you may consider it a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested:
{code:java}
for (int time = 0; time < 100; time++) {
  Random random = new Random(System.currentTimeMillis());
  long[] nums = new long[128];
  for (int i = 0; i < 128; i++) {
    nums[i] = random.nextInt(4) + 1;
  }
  ForUtil forUtil = new ForUtil();
  Directory directory = new ByteBuffersDirectory();
  DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT);
  for (int i = 0; i < 1; i++) {
    forUtil.encode(nums, 3, dataOutput);
  }
  directory.close();
}
{code}
*result:*
||Method||before||after||
|org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%|
|org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%|
|org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%|
From my point of view, the number of code lines is less important than writing speed, but if you think the precomputation makes no sense, just tell me and I will revert this part of the change :) > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed.
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534255115 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/AddReplicaCmd.java ## @@ -144,7 +140,8 @@ public void call(ClusterState state, ZkNodeProps message, @SuppressWarnings({"ra } } -List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount) +List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount, + ocmh.overseer.getCoreContainer().getPlacementPluginFactory().createPluginInstance()) Review comment: Unclear to me: what happens here when placement plugins are not configured and we use for example the legacy assign strategy?
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
murblanc commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534259966 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/AddReplicaCmd.java ## @@ -144,7 +140,8 @@ public void call(ClusterState state, ZkNodeProps message, @SuppressWarnings({"ra } } -List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount) +List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount, + ocmh.overseer.getCoreContainer().getPlacementPluginFactory().createPluginInstance()) Review comment: Ok, I see that's how we tell which one is configured by this value being `null`... Can't say I really like it, but at least it should be commented (here and in `PlacementPluginFactory` as well?). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo edited comment on LUCENE-9629 at 12/2/20, 3:34 PM:
Thanks for your reply! I fully agree that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, especially since this is a somewhat hot path when indexing. So you may consider it a "fix" rather than an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested:
{code:java}
for (int time = 0; time < 100; time++) {
  Random random = new Random(System.currentTimeMillis());
  long[] nums = new long[128];
  for (int i = 0; i < 128; i++) {
    nums[i] = random.nextInt(4) + 1;
  }
  ForUtil forUtil = new ForUtil();
  Directory directory = new ByteBuffersDirectory();
  DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT);
  for (int i = 0; i < 1; i++) {
    forUtil.encode(nums, 3, dataOutput);
  }
  directory.close();
}{code}
*result:*
||method||before||after||
|org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%|
|org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%|
|org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%|
From my point of view, the number of lines of code is less important than writing speed, but if you think the precompute makes no sense, just tell me and I will revert this part of the change.
In addition, my spoken English is a bit poor and most of the words above come from translation programs. If any of them come across as offensive, please just ignore them. I really admire this amazing project and am just trying to do my best to make it better :)
> Use computed mask values in ForUtil
> ---
>
> Key: LUCENE-9629
> URL: https://issues.apache.org/jira/browse/LUCENE-9629
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Feng Guo
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In the class ForUtil, mask values have been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary.
> Another small fix is to change
> {code:java}
> remainingBitsPerValue > remainingBitsPerLong{code}
> to
> {code:java}
> remainingBitsPerValue >= remainingBitsPerLong{code}
> otherwise this code
> {code:java}
> if (remainingBitsPerValue == 0) {
>   idx++;
>   remainingBitsPerValue = bitsPerValue;
> }
> {code}
> will never be executed.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] sigram commented on a change in pull request #2101: SOLR-15016 Replica placement plugins should use container plugins API / configs
sigram commented on a change in pull request #2101: URL: https://github.com/apache/lucene-solr/pull/2101#discussion_r534293404 ## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/AddReplicaCmd.java ## @@ -144,7 +140,8 @@ public void call(ClusterState state, ZkNodeProps message, @SuppressWarnings({"ra } } -List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount) +List createReplicas = buildReplicaPositions(ocmh.cloudManager, clusterState, collectionName, message, replicaTypesVsCount, + ocmh.overseer.getCoreContainer().getPlacementPluginFactory().createPluginInstance()) Review comment: The plugin is null and `Assign.createAssignStrategy` provides a `LegacyAssignStrategy`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
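A hedged sketch of the behavior sigram describes: the factory returns `null` when no placement plugin is configured, and `Assign.createAssignStrategy` falls back to the legacy strategy. Everything here other than the `LegacyAssignStrategy` name is a simplified stand-in, not the real code:

```java
// Simplified stand-in types; only LegacyAssignStrategy is named in the thread.
interface PlacementPlugin {}
interface AssignStrategy {}
class LegacyAssignStrategy implements AssignStrategy {}
class PlacementPluginAssignStrategy implements AssignStrategy {
  PlacementPluginAssignStrategy(PlacementPlugin plugin) { /* delegate to the plugin */ }
}

class AssignSketch {
  // A null plugin means "no placement plugin configured" -> legacy behavior.
  static AssignStrategy createAssignStrategy(PlacementPlugin plugin) {
    return plugin == null
        ? new LegacyAssignStrategy()
        : new PlacementPluginAssignStrategy(plugin);
  }
}
```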
[GitHub] [lucene-solr] thelabdude opened a new pull request #2114: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead ~ Backport to 8x
thelabdude opened a new pull request #2114: URL: https://github.com/apache/lucene-solr/pull/2114 # Description This is the backport to 8x from master, original PR was #2010 See JIRA for description of the issue: https://issues.apache.org/jira/browse/SOLR-12182 # Solution This PR computes the `base_url` for a Replica using the stored `node_name` and a global `urlScheme` rather than storing the `base_url` in `state.json`. This avoids storing an incorrect URL scheme for replicas in persistent storage. The `base_url` is computed when read back from ZK and dropped when marshaling the Replica state to JSON. This also means we don't need a migration tool as stored state is "healed" on-the-fly when read back from ZK. The unfortunate aspect of this PR is we need to keep the URL scheme for the cluster in a global variable (so that it is available when reading from ZK). The global `urlScheme` still comes from the cluster property but is then stored in a global singleton, see: `org.apache.solr.common.cloud.UrlScheme`. Alternatively, we could just keep the `urlScheme` in a static in ZkStateReader, I felt the global singleton `UrlScheme.INSTANCE` made it clearer that this was a global thing but it also made more sense with my first implementation that tried to make rolling restart upgrades to TLS less chaotic. It's a trivial change to move all this over to ZkStateReader and remove UrlScheme. I initially tried setting a `ThreadLocal` that gives access to the `urlScheme` whenever we need to read these props from ZK. However, that ended up being problematic because we tend to read ZkNodeProps from ZK in many places. In reality, the `urlScheme` really is an immutable global variable that should be set once during initialization by reading from the cluster property stored in ZK. So I felt trying to treat this global as something that was highly dynamic made the code overly cumbersome. Put simply, we shouldn't support `urlScheme` changing in a live node after initialization, it's bad for business. I also tried to get rid of the `urlScheme` cluster property (re: https://issues.apache.org/jira/browse/SOLR-10202) but I'm not sure how SolrCloud client applications can resolve the correct `urlScheme` for the cluster without this property? On the server-side, sure we can just get the `urlScheme` from a Java System Property, but that won't be set for remote client applications that initialize via a connection to ZooKeeper. So I'm keeping the cluster property `urlScheme` for now. We also need to consider how to enable TLS on an existing cluster (with active collections) using a rolling restart process. The current `org.apache.solr.cloud.SSLMigrationTest` just stopped all test nodes at once and then brought them back with TLS enabled. Based on feedback, I've since removed the option to pull the active urlScheme from live nodes as we're not able to ensure zero-downtime when moving from `http` -> `https` for clusters with existing collections and live traffic. Put simply, the feature was a bit trappy in that it tried to reduce chaos when doing a rolling restart to enable TLS, but it made no guarantees. Thus, users just need to be sure to enable TLS before building production clusters! Lastly, I've tried to clean-up some of the places that access the baseUrl on replicas to be more consistent, so you'll see some of that in this PR as well. # Tests Many existing tests cover regression caused by these code changes. Added simple unit test for UrlScheme. 
# Checklist

Please review the following and check all that apply:

- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [x] I have run `./gradlew check`.
- [x] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
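A rough sketch of the derivation the PR describes: computing `base_url` from the stored `node_name` plus the global `urlScheme` when state is read back from ZK. The helper below is illustrative only (Solr node names look like `127.0.0.1:8983_solr`; the real code also URL-decodes the context suffix):

```java
class BaseUrlSketch {
  // e.g. baseUrlForNodeName("127.0.0.1:8983_solr", "https")
  //   -> "https://127.0.0.1:8983/solr"
  static String baseUrlForNodeName(String nodeName, String urlScheme) {
    int idx = nodeName.indexOf('_');
    String hostAndPort = idx < 0 ? nodeName : nodeName.substring(0, idx);
    String context = idx < 0 ? "" : nodeName.substring(idx + 1);
    return urlScheme + "://" + hostAndPort + (context.isEmpty() ? "" : "/" + context);
  }
}
```

Because the URL is recomputed on read, stale schemes persisted in `state.json` are effectively "healed" on the fly, which is why no migration tool is needed.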
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return different answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242546#comment-17242546 ] ASF subversion and git services commented on SOLR-14934: Commit 2e6a02394ec4eea6ba72d5bc2bf02c0139a54f39 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2e6a023 ] SOLR-14934: Refactored duplicate "Solr Home" logic into a single place to eliminate risk of tests using divergent values for a single solr node.
> Multiple Code Paths for determining "solr home" can return different answers
>
> Key: SOLR-14934
> URL: https://issues.apache.org/jira/browse/SOLR-14934
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Minor
> Attachments: SOLR-14934.poc.patch
>
> While looking into some possible ways to make our tests more closely match "real" solr installs, I realized that we currently have 2 different methods for determining the "solr home" for a node...
> * {{SolrPaths.locateSolrHome()}}
> ** static method that uses a heuristic that typically results in using {{System.getProperty("solr.solr.home");}}
> *** NOTE: the result is not stored in any static/final variables
> ** this method
> * {{SolrDispatchFilter}}
> ** starts by checking if an explicit {{ServletContext}} attribute is specified
> *** falls back to using {{SolrPaths.locateSolrHome()}}
> ** whatever value is found gets set on {{CoreContainer}}
> In a typical Solr install, the {{"solr.solr.home"}} system property is set by {{bin/solr}} and we get a consistent value for the life of the server instance regardless of code path.
> In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests (including {{MiniSolrCloudCluster}} based tests) we rely on the {{ServletContext}} attribute based approach to have a unique "Solr Home" for each node. ({{JettySolrRunner}} injects the value when wiring up the {{Server}} instance)
> This means that:
> * in jetty based tests - even if it's a single jetty instance - each of the node's CoreContainers has a unique value of "solr home", but any code paths in solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent value across all nodes (different from the value in the CoreContainer for any node)
> * although I don't think it happens now: a test could call {{System.setProperty("solr.solr.home",...)}} while a node is running, and potentially get inconsistent behavior from even a jetty node over time.
>
> In practice, I don't think that any of this is currently causing "real bugs" in actual solr code; nor do I _think_ we're seeing any "false positives" or "false failures" in tests as a result of this - but it is a huge land mine just waiting to go off if we step too close, and I think we should rectify this.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
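An illustrative sketch (stand-in code, not the actual Solr classes) of the hazard the commit above removes: two lookup paths that can silently disagree once a test mutates the system property after a node is wired up.
{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;

public class SolrHomeSketch {
  // Path 1: recomputed on every call, so it tracks the current system property.
  static Path locateSolrHome() {
    return Paths.get(System.getProperty("solr.solr.home", "/var/solr"));
  }

  // Path 2: a value injected once at wiring time (what the ServletContext
  // attribute effectively is for a jetty-based test node).
  static final Path INJECTED_HOME = Paths.get("/node1/solr");

  public static void main(String[] args) {
    System.setProperty("solr.solr.home", "/somewhere/else");
    // The two answers now diverge -- exactly the inconsistency that computing
    // "solr home" in a single place eliminates.
    System.out.println(locateSolrHome()); // /somewhere/else
    System.out.println(INJECTED_HOME);    // /node1/solr
  }
}
{code}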
[GitHub] [lucene-solr] thelabdude commented on pull request #2114: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead ~ Backport to 8x
thelabdude commented on pull request #2114: URL: https://github.com/apache/lucene-solr/pull/2114#issuecomment-737397592 precommit and solr tests pass locally so will merge and watch for CI failures This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242579#comment-17242579 ] ASF subversion and git services commented on SOLR-12182: Commit 6af56e141a52f4e616985ca5b03dda3677889bfa in lucene-solr's branch refs/heads/branch_8x from Timothy Potter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6af56e1 ] SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead ~ Backport to 8x (#2114) > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9628) Make sure to account for ScoreMode.TOP_DOCS in queries
[ https://issues.apache.org/jira/browse/LUCENE-9628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242585#comment-17242585 ] Adrien Grand commented on LUCENE-9628: --
bq. In BooleanWeight#bulkScorer, we check if score mode is TOP_SCORES and if so, force non-bulk scoring. Should we expand this to include modes like TOP_DOCS?
I think so. DefaultBulkScorer seems to be the only bulk scorer which knows how to deal with {{LeafCollector#competitiveIterator}}, so we seem to be disabling the numeric sort optimization with boolean queries today? Let's switch to ScoreMode#isExhaustive? What do you think [~mayyas]?
bq. In ConstantScoreQuery, we create the delegate weight with a hardcoded COMPLETE_NO_SCORES. I'm not sure it actually causes problems, but it seems like this doesn't handle TOP_DOCS correctly.
I suspect that this could be a problem if the wrapped query uses the ScoreMode as an indication of whether it will need to handle {{LeafCollector#competitiveIterator}} or not, which seems to be something we'd like to do for boolean queries since BS1 (BooleanScorer) only really makes sense if we know we're going to collect all matches.
I think it'd be helpful if we improved ScoreMode javadocs to be more explicit regarding the expectations we have on scorers. TOP_SCORES mentions the relationship with {{Scorer#setMinCompetitiveScore}}; we should add something similar to TOP_DOCS and TOP_DOCS_WITH_SCORES regarding bulk scorers and {{LeafCollector#competitiveIterator}}?
> Make sure to account for ScoreMode.TOP_DOCS in queries
> --
>
> Key: LUCENE-9628
> URL: https://issues.apache.org/jira/browse/LUCENE-9628
> Project: Lucene - Core
> Issue Type: Test
> Components: core/search
> Reporter: Julie Tibshirani
> Priority: Minor
>
> I noticed a few places where we directly check the {{ScoreMode}} type that should perhaps be generalized. These could affect whether numeric sort optimization is applied:
> * In {{BooleanWeight#bulkScorer}}, we check if score mode is {{TOP_SCORES}} and if so, force non-bulk scoring. Should we expand this to include modes like {{TOP_DOCS}}?
> * In {{ConstantScoreQuery}}, we create the delegate weight with a hardcoded {{COMPLETE_NO_SCORES}}. I'm not sure it actually causes problems, but it seems like this doesn't handle {{TOP_DOCS}} correctly.
> Apologies this issue isn’t more precise – I am not up-to-speed on the numeric sort optimization but wanted to raise these in case they’re helpful.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
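A runnable sketch (stand-in enum, not Lucene's {{ScoreMode}}) of the generalization being discussed: keying the bulk-scorer decision off whether the mode promises to visit every hit, rather than special-casing TOP_SCORES:
{code:java}
public class ScoreModeSketch {
  enum ScoreMode {
    COMPLETE(true), COMPLETE_NO_SCORES(true),
    TOP_SCORES(false), TOP_DOCS(false), TOP_DOCS_WITH_SCORES(false);
    final boolean exhaustive;
    ScoreMode(boolean exhaustive) { this.exhaustive = exhaustive; }
    boolean isExhaustive() { return exhaustive; }
  }

  static String pickScorer(ScoreMode mode) {
    // Non-exhaustive modes may supply LeafCollector#competitiveIterator,
    // which only the default scorer-at-a-time bulk scorer honors, so
    // BS1-style bulk scoring is only safe for exhaustive modes.
    return mode.isExhaustive() ? "bulk (BS1)" : "default scorer-at-a-time";
  }

  public static void main(String[] args) {
    for (ScoreMode m : ScoreMode.values()) {
      System.out.println(m + " -> " + pickScorer(m));
    }
  }
}
{code}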
[jira] [Created] (SOLR-15024) Admin UI doesn't show CharFilters correctly
Erick Erickson created SOLR-15024: - Summary: Admin UI doesn't show CharFilters correctly Key: SOLR-15024 URL: https://issues.apache.org/jira/browse/SOLR-15024 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Admin UI Affects Versions: master (9.0) Reporter: Erick Erickson Attachments: Screen Shot 2020-12-02 at 1.19.23 PM.png, Screen Shot 2020-12-02 at 1.19.49 PM.png
Brought up on the users' list; I verified it on trunk. The Admin UI isn't showing the data correctly for either the schema page or the analysis page. Here's the fieldType definition:
{code:java}
{code}
The transformations are correct; it's just that the display is messed up. See attached. On the analysis page, nothing is shown for the CharFilters. For the schema page, only the _last_ CharFilter is shown.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-12182: -- Fix Version/s: 8.8 > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter resolved SOLR-12182. --- Resolution: Fixed As of 8.8, we opted not to store the `base_url` in persisted state in ZK. > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-10202) Auto resolve urlScheme, remove cluster property
[ https://issues.apache.org/jira/browse/SOLR-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter reassigned SOLR-10202: - Assignee: (was: Timothy Potter)
> Auto resolve urlScheme, remove cluster property
> ---
>
> Key: SOLR-10202
> URL: https://issues.apache.org/jira/browse/SOLR-10202
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Jan Høydahl
> Priority: Major
>
> Spinoff from SOLR-9640.
> Today we need to explicitly set the {{urlScheme}} cluster property to enable SSL, at the same time as we need to set all the SSL env variables on each node. As discussed in SOLR-9640, we could be smarter about this so an admin only needs to set up {{solr.in.sh}} with a keystore to enable SSL.
> h3. How
> Perhaps simplified a bit, but in principle, at node start, if {{solr.jetty.keystore}} (one out of several possibilities) is defined then use https, else http :-) Then, if the administrator has mixed it up and failed to configure {{solr.jetty.keystore}} on one of the nodes, then that node will not be able to communicate with the others over {{http}}; it will get {{curl: (52) Empty reply from server}}. Conversely, an SSL-enabled node trying to talk over {{https}} to a Solr node that is not SSL enabled will get {{curl: (35) Unknown SSL protocol error in connection to localhost:-9847}} (not the curl error of course, but similar).
> I don't think the nodes need to tell ZK about SSL at all?
> So my claim is that this will not give a bigger risk of misconfiguration, because if you add a new node to the cluster without SSL, it will generate a lot of BUZZ in the logs and it will never receive any unencrypted data from the other nodes since connections will fail. Agree?
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10202) Auto resolve urlScheme, remove cluster property
[ https://issues.apache.org/jira/browse/SOLR-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242596#comment-17242596 ] Timothy Potter commented on SOLR-10202: --- Un-assigned myself on this one as I originally wanted to tackle it as part of SOLR-12182, but that ended up not working out. Personally, I'm fine with the first server to come up in https mode setting the cluster property based on a system property, but it seems like that isn't the consensus on how this global should be treated.
> Auto resolve urlScheme, remove cluster property
> ---
>
> Key: SOLR-10202
> URL: https://issues.apache.org/jira/browse/SOLR-10202
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Jan Høydahl
> Priority: Major
>
> Spinoff from SOLR-9640.
> Today we need to explicitly set the {{urlScheme}} cluster property to enable SSL, at the same time as we need to set all the SSL env variables on each node. As discussed in SOLR-9640, we could be smarter about this so an admin only needs to set up {{solr.in.sh}} with a keystore to enable SSL.
> h3. How
> Perhaps simplified a bit, but in principle, at node start, if {{solr.jetty.keystore}} (one out of several possibilities) is defined then use https, else http :-) Then, if the administrator has mixed it up and failed to configure {{solr.jetty.keystore}} on one of the nodes, then that node will not be able to communicate with the others over {{http}}; it will get {{curl: (52) Empty reply from server}}. Conversely, an SSL-enabled node trying to talk over {{https}} to a Solr node that is not SSL enabled will get {{curl: (35) Unknown SSL protocol error in connection to localhost:-9847}} (not the curl error of course, but similar).
> I don't think the nodes need to tell ZK about SSL at all?
> So my claim is that this will not give a bigger risk of misconfiguration, because if you add a new node to the cluster without SSL, it will generate a lot of BUZZ in the logs and it will never receive any unencrypted data from the other nodes since connections will fail. Agree?
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
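A runnable sketch of the heuristic Jan proposes: derive the scheme from whether a keystore is configured instead of from a ZK cluster property. Only the {{solr.jetty.keystore}} property name comes from the issue; the rest is illustrative:
{code:java}
public class UrlSchemeSketch {
  static String resolveUrlScheme() {
    // If a keystore is configured, assume the node should serve https.
    String keystore = System.getProperty("solr.jetty.keystore");
    return (keystore == null || keystore.isEmpty()) ? "http" : "https";
  }

  public static void main(String[] args) {
    System.out.println(resolveUrlScheme()); // "http" unless a keystore is set
    System.setProperty("solr.jetty.keystore", "/etc/solr/keystore.p12"); // hypothetical path
    System.out.println(resolveUrlScheme()); // "https"
  }
}
{code}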
[jira] [Commented] (LUCENE-9619) Move Points from a visitor API to a cursor-style API?
[ https://issues.apache.org/jira/browse/LUCENE-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242609#comment-17242609 ] Adrien Grand commented on LUCENE-9619: -- Thanks for the feedback [~ivera]! I initially wanted to remove the visitor pattern entirely, but this made it challenging to retain some optimizations we have, like doing only one comparison in case multiple documents share the same value: https://github.com/apache/lucene-solr/blob/af47cb7bcdd4eb10263a0586474c6e255307/lucene/core/src/java/org/apache/lucene/index/PointValues.java#L219-L224. As far as implementing a DocIdSetIterator is concerned, my thinking was that this API could be used to fill an int[] buffer only one leaf at a time, so we wouldn't allocate more than an int[512] with the current Points file format. This wouldn't provide skipping capabilities, but at least we wouldn't need to maintain a giant int[] or BitSet.
> Move Points from a visitor API to a cursor-style API?
> -
>
> Key: LUCENE-9619
> URL: https://issues.apache.org/jira/browse/LUCENE-9619
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> Points' visitor API works well, but there are a couple of things we could make better if we moved to a cursor API, e.g.
> - Term queries could return a DocIdSetIterator without having to materialize a BitSet.
> - Nearest-neighbor search could work on top of the regular API instead of casting to BKDReader https://github.com/apache/lucene-solr/blob/6a7131ee246d700c2436a85ddc537575de2aeacf/lucene/sandbox/src/java/org/apache/lucene/sandbox/document/FloatPointNearestNeighbor.java#L296
> - We could optimize counting the number of matches of a query by adding the number of points in a leaf without visiting documents, when there are no deleted documents and a leaf fully matches the query.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
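An illustrative sketch of the cursor idea (an invented interface, not a Lucene API): the caller drains matching doc IDs one leaf block at a time into a fixed buffer, instead of materializing a BitSet over the whole segment:
{code:java}
// Hypothetical cursor over matching points; names are assumptions.
interface PointCursor {
  /** Fills {@code buffer} with up to {@code buffer.length} matching doc IDs;
   *  returns the count, or -1 when the cursor is exhausted. */
  int nextBatch(int[] buffer);
}

class CursorConsumer {
  static long countMatches(PointCursor cursor) {
    int[] buffer = new int[512]; // matches the leaf size mentioned above
    long total = 0;
    for (int n = cursor.nextBatch(buffer); n != -1; n = cursor.nextBatch(buffer)) {
      total += n; // a real consumer would hand buffer[0..n) to a collector
    }
    return total;
  }
}
{code}
As the comment notes, this trades skipping capabilities for a bounded memory footprint per leaf.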
[jira] [Created] (SOLR-15025) MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value
Mike Drob created SOLR-15025: Summary: MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value Key: SOLR-15025 URL: https://issues.apache.org/jira/browse/SOLR-15025 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Tests Reporter: Mike Drob the api could also expand to take a time unit? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14940) ReplicationHandler memory leak through SolrCore.closeHooks
[ https://issues.apache.org/jira/browse/SOLR-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved SOLR-14940. -- Resolution: Fixed re-resolving in favor of tackling it in SOLR-14992
> ReplicationHandler memory leak through SolrCore.closeHooks
> --
>
> Key: SOLR-14940
> URL: https://issues.apache.org/jira/browse/SOLR-14940
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: replication (java)
> Environment: Solr Cloud Cluster on v.8.6.2 configured as 3 TLOG nodes with 2 cores in each JVM.
> Reporter: Anver Sotnikov
> Assignee: Mike Drob
> Priority: Major
> Fix For: 8.8, master (9.0)
> Attachments: Actual references to hooks that in turn hold references to ReplicationHandlers.png, Memory Analyzer SolrCore.closeHooks .png
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> We are experiencing a memory leak in a Solr Cloud cluster configured as 3 TLOG nodes. The leader does not seem to be affected, while followers are.
> Looking at a memory dump, we noticed that SolrCore holds lots of references to ReplicationHandler through anonymous inner classes in SolrCore.closeHooks, which in turn hold ReplicationHandlers.
> ReplicationHandler registers hooks as anonymous inner classes in SolrCore.closeHooks through ReplicationHandler.inform() -> ReplicationHandler.registerCloseHook().
> Whenever ZkController.stopReplicationFromLeader is called, it shuts down the ReplicationHandler (ReplicationHandler.shutdown()), BUT the reference to the ReplicationHandler stays in SolrCore.closeHooks. Once replication is started again on the same SolrCore, a new ReplicationHandler will be created and registered in closeHooks.
> It looks like there are a few scenarios where replication is stopped and restarted on the same core, and in our TLOG setup this shows up quite often.
> Potential solutions:
> # Allow unregistering SolrCore.closeHooks so it can be used from ReplicationHandler.shutdown
> # A hack, but easier: break the link between ReplicationHandler close hooks and the full ReplicationHandler object so the ReplicationHandler can be GCed even when hooks are still registered in SolrCore.closeHooks
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
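A self-contained sketch of the leak pattern from the report, with stand-in class names rather than Solr's real types: a hook captures its enclosing handler, and the hook list only ever grows.
{code:java}
import java.util.ArrayList;
import java.util.List;

class CoreSketch {
  final List<Runnable> closeHooks = new ArrayList<>(); // grows, never pruned

  void addCloseHook(Runnable hook) { closeHooks.add(hook); }
}

class HandlerSketch {
  HandlerSketch(CoreSketch core) {
    // The method reference captures 'this', so even after shutdown() the
    // whole handler stays reachable from core.closeHooks -- the leak above.
    core.addCloseHook(this::shutdown);
  }

  void shutdown() { /* release resources */ }
}
{code}
The two proposed fixes map onto this sketch directly: either make closeHooks removable so shutdown() can unregister the hook, or have the hook hold only what it needs (e.g. via a static nested class or a weak reference) instead of the whole handler.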
[GitHub] [lucene-solr] madrob opened a new pull request #2115: SOLR-14992 Wait for node down before checking for node up
madrob opened a new pull request #2115: URL: https://github.com/apache/lucene-solr/pull/2115 https://issues.apache.org/jira/browse/SOLR-14992 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-14992) TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures
[ https://issues.apache.org/jira/browse/SOLR-14992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob reassigned SOLR-14992: Assignee: Mike Drob > TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures > -- > > Key: SOLR-14992 > URL: https://issues.apache.org/jira/browse/SOLR-14992 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Mike Drob >Priority: Minor > > I've noticed this test started failing very frequently with an error like: > {noformat} > Error Message: > Error from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. This requires 4 shards to be created > (higher than the allowed number) > Stack Trace: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. This requires 4 shards to be created > (higher than the allowed number) > at > __randomizedtesting.SeedInfo.seed([3D670DC4BEABD958:3550EB0C6505ADD6]:0) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) > at > org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369) > at > org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1173) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231) > at > org.apache.solr.cloud.TestPullReplicaErrorHandling.testCantConnectToPullReplica(TestPullReplicaErrorHandling.java:149) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > at > 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakC
[jira] [Commented] (SOLR-14992) TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures
[ https://issues.apache.org/jira/browse/SOLR-14992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242626#comment-17242626 ] Mike Drob commented on SOLR-14992: -- Ok, I think I figured this out, the PR I opened is for master but patch should apply to 8x as well. I beasted it and was able to get failures from 5% to 0% on my machine. > TestPullReplicaErrorHandling.testCantConnectToPullReplica Failures > -- > > Key: SOLR-14992 > URL: https://issues.apache.org/jira/browse/SOLR-14992 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Mike Drob >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > I've noticed this test started failing very frequently with an error like: > {noformat} > Error Message: > Error from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. This requires 4 shards to be created > (higher than the allowed number) > Stack Trace: > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server at http://127.0.0.1:39037/solr: Cannot create collection > pull_replica_error_handling_test_cant_connect_to_pull_replica. Value of > maxShardsPerNode is 1, and the number of nodes currently live or live and > part of your createNodeSet is 3. This allows a maximum of 3 to be created. > Value of numShards is 2, value of nrtReplicas is 1, value of tlogReplicas is > 0 and value of pullReplicas is 1. 
This requires 4 shards to be created > (higher than the allowed number) > at > __randomizedtesting.SeedInfo.seed([3D670DC4BEABD958:3550EB0C6505ADD6]:0) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) > at > org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369) > at > org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1173) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:934) > at > org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:866) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231) > at > org.apache.solr.cloud.TestPullReplicaErrorHandling.testCantConnectToPullReplica(TestPullReplicaErrorHandling.java:149) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at > com.carrotsearch.ra
[jira] [Created] (SOLR-15026) MiniSolrCloudCluster can inconsistently get confused about when it's using SSL
Chris M. Hostetter created SOLR-15026: - Summary: MiniSolrCloudCluster can inconsistently get confused about when it's using SSL Key: SOLR-15026 URL: https://issues.apache.org/jira/browse/SOLR-15026 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Chris M. Hostetter
A new test added in SOLR-14934 caused the following reproducible failure to pop up on jenkins...
{noformat}
hossman@slate:~/lucene/dev [j11] [master] $ ./gradlew -p solr/test-framework/ test --tests MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders -Dtests.seed=806A85748BD81F48 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=ln-CG -Dtests.timezone=Asia/Thimbu -Dtests.asserts=true -Dtests.file.encoding=UTF-8
Starting a Gradle Daemon (subsequent builds will be faster)

> Task :randomizationInfo
Running tests with randomization seed: tests.seed=806A85748BD81F48

> Task :solr:test-framework:test

org.apache.solr.cloud.MiniSolrCloudClusterTest > testSolrHomeAndResourceLoaders FAILED
    org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at: https://127.0.0.1:38681/solr
        at __randomizedtesting.SeedInfo.seed([806A85748BD81F48:37548FA7602CB5FD]:0)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:712)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:269)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
        at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:390)
        at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:360)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1168)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:931)
        at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:865)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:229)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:246)
        at org.apache.solr.cloud.MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders(MiniSolrCloudClusterTest.java:125)
        ...
    Caused by:
        javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
            at java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:439)
{noformat}
The problem seems to be that even though the MiniSolrCloudCluster being instantiated isn't _intentionally_ using any SSL randomization (it just uses {{JettyConfig.builder().build()}}), the CloudSolrClient returned by {{cluster.getSolrClient()}} is evidently picking up the randomized SSL and trying to use it to talk to the cluster.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return different answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242645#comment-17242645 ] ASF subversion and git services commented on SOLR-14934: Commit 8732df8c505eec9109cd8a7bdd553e908447af5f in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8732df8 ] SOLR-14934: test workaround for SOLR-15026
> Multiple Code Paths for determining "solr home" can return different answers
>
> Key: SOLR-14934
> URL: https://issues.apache.org/jira/browse/SOLR-14934
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Minor
> Attachments: SOLR-14934.poc.patch
>
> While looking into some possible ways to make our tests more closely match "real" solr installs, I realized that we currently have 2 different methods for determining the "solr home" for a node...
> * {{SolrPaths.locateSolrHome()}}
> ** static method that uses a heuristic that typically results in using {{System.getProperty("solr.solr.home");}}
> *** NOTE: the result is not stored in any static/final variables
> ** this method
> * {{SolrDispatchFilter}}
> ** starts by checking if an explicit {{ServletContext}} attribute is specified
> *** falls back to using {{SolrPaths.locateSolrHome()}}
> ** whatever value is found gets set on {{CoreContainer}}
> In a typical Solr install, the {{"solr.solr.home"}} system property is set by {{bin/solr}} and we get a consistent value for the life of the server instance regardless of code path.
> In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests (including {{MiniSolrCloudCluster}} based tests) we rely on the {{ServletContext}} attribute based approach to have a unique "Solr Home" for each node. ({{JettySolrRunner}} injects the value when wiring up the {{Server}} instance)
> This means that:
> * in jetty based tests - even if it's a single jetty instance - each of the node's CoreContainers has a unique value of "solr home", but any code paths in solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent value across all nodes (different from the value in the CoreContainer for any node)
> * although I don't think it happens now: a test could call {{System.setProperty("solr.solr.home",...)}} while a node is running, and potentially get inconsistent behavior from even a jetty node over time.
>
> In practice, I don't think that any of this is currently causing "real bugs" in actual solr code; nor do I _think_ we're seeing any "false positives" or "false failures" in tests as a result of this - but it is a huge land mine just waiting to go off if we step too close, and I think we should rectify this.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15026) MiniSolrCloudCluster can inconsistently get confused about when it's using SSL
[ https://issues.apache.org/jira/browse/SOLR-15026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242646#comment-17242646 ] ASF subversion and git services commented on SOLR-15026: Commit 8732df8c505eec9109cd8a7bdd553e908447af5f in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8732df8 ] SOLR-14934: test workaround for SOLR-15026
> MiniSolrCloudCluster can inconsistently get confused about when it's using SSL
> --
>
> Key: SOLR-15026
> URL: https://issues.apache.org/jira/browse/SOLR-15026
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Priority: Major
>
> A new test added in SOLR-14934 caused the following reproducible failure to pop up on jenkins...
> {noformat}
> hossman@slate:~/lucene/dev [j11] [master] $ ./gradlew -p solr/test-framework/ test --tests MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders -Dtests.seed=806A85748BD81F48 -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=ln-CG -Dtests.timezone=Asia/Thimbu -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> Starting a Gradle Daemon (subsequent builds will be faster)
>
> > Task :randomizationInfo
> Running tests with randomization seed: tests.seed=806A85748BD81F48
>
> > Task :solr:test-framework:test
>
> org.apache.solr.cloud.MiniSolrCloudClusterTest > testSolrHomeAndResourceLoaders FAILED
>     org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at: https://127.0.0.1:38681/solr
>         at __randomizedtesting.SeedInfo.seed([806A85748BD81F48:37548FA7602CB5FD]:0)
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:712)
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:269)
>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
>         at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:390)
>         at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:360)
>         at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1168)
>         at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:931)
>         at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:865)
>         at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:229)
>         at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:246)
>         at org.apache.solr.cloud.MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders(MiniSolrCloudClusterTest.java:125)
>         ...
>     Caused by:
>         javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>             at java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:439)
> {noformat}
> The problem seems to be that even though the MiniSolrCloudCluster being instantiated isn't _intentionally_ using any SSL randomization (it just uses {{JettyConfig.builder().build()}}), the CloudSolrClient returned by {{cluster.getSolrClient()}} is evidently picking up the randomized SSL and trying to use it to talk to the cluster.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15027) TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField reproducing failure on branch_8x
Chris M. Hostetter created SOLR-15027: - Summary: TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField reproducing failure on branch_8x Key: SOLR-15027 URL: https://issues.apache.org/jira/browse/SOLR-15027 Project: Solr Issue Type: Test Security Level: Public (Default Security Level. Issues are Public) Reporter: Chris M. Hostetter {noformat} [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestInPlaceUpdateWithRouteField -Dtests.method=testUpdatingDocValuesWithRouteField -Dtests.seed=80F75127980BAE95 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=es-VE -Dtests.timezone=Asia/Sakhalin -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 1.90s | TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField <<< [junit4]> Throwable #1: java.lang.AssertionError: Lucene doc id should not be changed for In-Place Updates. [junit4]> Expected: is <21> [junit4]> but: was <30> [junit4]>at __randomizedtesting.SeedInfo.seed([80F75127980BAE95:77AF61938946C1E4]:0) [junit4]>at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) [junit4]>at org.apache.solr.update.TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField(TestInPlaceUpdateWithRouteField.java:115) [junit4]>at java.lang.Thread.run(Thread.java:748) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15025) MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value
[ https://issues.apache.org/jira/browse/SOLR-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob updated SOLR-15025: - Labels: beginner newdev (was: beginner) > MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value > - > > Key: SOLR-15025 > URL: https://issues.apache.org/jira/browse/SOLR-15025 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Priority: Major > Labels: beginner, newdev > > the api could also expand to take a time unit? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2114: SOLR-12182: Don't persist base_url in ZK as the scheme is variable, compute from node_name instead ~ Backport to 8x
thelabdude merged pull request #2114: URL: https://github.com/apache/lucene-solr/pull/2114 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] tflobbe commented on a change in pull request #2115: SOLR-14992 Wait for node down before checking for node up
tflobbe commented on a change in pull request #2115: URL: https://github.com/apache/lucene-solr/pull/2115#discussion_r534445545 ## File path: solr/core/src/test/org/apache/solr/cloud/TestPullReplicaErrorHandling.java ## @@ -236,8 +237,9 @@ public void testCloseHooksDeletedOnReconnect() throws Exception { JettySolrRunner jetty = getJettyForReplica(s.getReplicas(EnumSet.of(Replica.Type.PULL)).get(0)); SolrCore core = jetty.getCoreContainer().getCores().iterator().next(); -for (int i = 0; i < 5; i++) { +for (int i = 0; i < (TEST_NIGHTLY ? 5 : 2); i++) { cluster.expireZkSession(jetty); + waitForState("Expecting node to be disconnected", collectionName, activeReplicaCount(1, 0, 0)); Review comment: Wouldn't it be better to actually check for the node znode `ctime`? Or maybe a bump in `/live_nodes`'s `cversion`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
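For reference, a rough sketch of the alternative check being suggested here: read the `cversion` of `/live_nodes`, which ZooKeeper increments whenever a child znode is added or removed. The `SolrZkClient` call below is from memory, so treat the exact signature as an assumption.

```java
import org.apache.solr.common.cloud.SolrZkClient;
import org.apache.zookeeper.data.Stat;

// cversion increments on every child add/remove under a znode, so a bump on
// /live_nodes is direct evidence that a node actually dropped out or rejoined.
int liveNodesCversion(SolrZkClient zkClient) throws Exception {
  Stat stat = new Stat();
  zkClient.getData("/live_nodes", null, stat, true);
  return stat.getCversion();
}

// Usage sketch: capture the value, expire the session, then poll until it bumps.
//   int before = liveNodesCversion(zkClient);
//   cluster.expireZkSession(jetty);
//   // ...poll liveNodesCversion(zkClient) until it exceeds 'before'...
```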
[jira] [Commented] (SOLR-15027) TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField reproducing failure on branch_8x
[ https://issues.apache.org/jira/browse/SOLR-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242700#comment-17242700 ] Chris M. Hostetter commented on SOLR-15027: --- I don't really understand why/how but git bisect has identified SOLR-14641 / d52628d9facfc13d8c29a7ecaf646a3b90263f8c as the cause of this failure. [~caomanhdat] / [~mkhl] - any ideas what's going on here? > TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField > reproducing failure on branch_8x > > > Key: SOLR-15027 > URL: https://issues.apache.org/jira/browse/SOLR-15027 > Project: Solr > Issue Type: Test > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Priority: Major > > {noformat} >[junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestInPlaceUpdateWithRouteField > -Dtests.method=testUpdatingDocValuesWithRouteField > -Dtests.seed=80F75127980BAE95 -Dtests.nightly=true -Dtests.slow=true > -Dtests.badapples=true -Dtests.locale=es-VE -Dtests.timezone=Asia/Sakhalin > -Dtests.asserts=true -Dtests.file.encoding=UTF-8 >[junit4] FAILURE 1.90s | > TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField <<< >[junit4]> Throwable #1: java.lang.AssertionError: Lucene doc id should > not be changed for In-Place Updates. >[junit4]> Expected: is <21> >[junit4]> but: was <30> >[junit4]> at > __randomizedtesting.SeedInfo.seed([80F75127980BAE95:77AF61938946C1E4]:0) >[junit4]> at > org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) >[junit4]> at > org.apache.solr.update.TestInPlaceUpdateWithRouteField.testUpdatingDocValuesWithRouteField(TestInPlaceUpdateWithRouteField.java:115) >[junit4]> at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return differnet answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242702#comment-17242702 ] ASF subversion and git services commented on SOLR-14934: Commit 05a8477a362beb6b0e5a02b6ee4dfa106a2e6a76 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=05a8477 ] SOLR-14934: Fix some additional test helper methods that aren't used on master but triggered problems when when backporting to branch_8x > Multiple Code Paths for determining "solr home" can return differnet answers > > > Key: SOLR-14934 > URL: https://issues.apache.org/jira/browse/SOLR-14934 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Minor > Attachments: SOLR-14934.poc.patch > > > While looking into some possible ways to make our tests more closely match > "real" solr installs, I realized that we currently have 2 different methods > for determining the "solr home" for a node... > * {{SolrPaths.locateSolrHome()}} > ** static method that uses a hueristic that typically results in using > {{System.getProperty("solr.solr.home");}} > *** NOTE: the result is not stored in any static/final variables > ** this method > * {{SolrDispatchFilter}} > ** starts by checking if an explicit {{ServletContext}} attribute is > specified > *** falls back to using {{SolrPaths.locateSolrHome()}} > ** whatever value is found gets set on {{CoreContainer}} > In a typical Solr install, the {{"solr.solr.home"}} system property is set by > {{bin/solr}} and we get a consistent value for the life of the server > instance regardless of code path. > In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that > calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests > (including {{MiniSolrCloudCluster}} based tests) we rely on the > {{ServletContext}} attribute based approach to have a unique "Solr Home" for > each node. ({{JettySOlrRunner}} injects the value when wiring up the > {{Server}} instance) > This means that: > * in jetty based test - even if it's a single jetty instance - each of the > node's CoreContainer has a unique value of "solr home", but any code paths in > solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent > value across all nodes (different from the value in the CoreContainer for any > node) > * allthough i don't think it happens now: a test could call > {{System.setProperty("solr.solr.home",...)}} while a node is running, and > potentially get inconsistent behavior from even a jetty node over time. > > In practice, I don't think that any of this is currently causing "real bugs" > in actual solr code; nor do i _think_ we're seeing any "false positives" or > "false failures" in tests as a result of this - but it is a big huge land > mine just waiting to go off if we step too close, and i think we should > recitfy this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley merged pull request #2105: Remove obsolete dev-tools scripts
dsmiley merged pull request #2105: URL: https://github.com/apache/lucene-solr/pull/2105 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #2115: SOLR-14992 Wait for node down before checking for node up
madrob commented on a change in pull request #2115: URL: https://github.com/apache/lucene-solr/pull/2115#discussion_r534482769 ## File path: solr/core/src/test/org/apache/solr/cloud/TestPullReplicaErrorHandling.java ## @@ -236,8 +237,9 @@ public void testCloseHooksDeletedOnReconnect() throws Exception { JettySolrRunner jetty = getJettyForReplica(s.getReplicas(EnumSet.of(Replica.Type.PULL)).get(0)); SolrCore core = jetty.getCoreContainer().getCores().iterator().next(); -for (int i = 0; i < 5; i++) { +for (int i = 0; i < (TEST_NIGHTLY ? 5 : 2); i++) { cluster.expireZkSession(jetty); + waitForState("Expecting node to be disconnected", collectionName, activeReplicaCount(1, 0, 0)); Review comment: There is a window where the live node has gone away but the state is still active because it hasn't updated yet. If we're just waiting for and watching live nodes, then we can see that go away and complete the test before the cluster has quiesced. This is also how we check in testPullReplicaDisconnectsFromZooKeeper, so for consistency this felt better. There is still a different race here, where the replica could go down and come back up before we start waiting for it to be down the first time (we're expecting the overseer to be slow); I'm sure @markrmiller would be upset with me over that, but we can deal with it when he finishes the rest of his speed-up branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
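A sketch of the down-then-up ordering discussed above, using `ZkStateReader.waitForLiveNodes`; the predicate-based signature is from memory, so treat it as an assumption:

```java
import java.util.concurrent.TimeUnit;

import org.apache.solr.common.cloud.ZkStateReader;

// Wait for the node to leave /live_nodes before waiting for it to return;
// waiting only for "up" can race with a node that bounces quickly.
void waitForNodeRestart(ZkStateReader reader, String nodeName) throws Exception {
  reader.waitForLiveNodes(30, TimeUnit.SECONDS,
      (oldNodes, newNodes) -> !newNodes.contains(nodeName)); // node is down
  reader.waitForLiveNodes(30, TimeUnit.SECONDS,
      (oldNodes, newNodes) -> newNodes.contains(nodeName));  // node is back
}
```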
[GitHub] [lucene-solr] janhoy closed pull request #2102: SOLR-14977: Fix typo in solr-upgrade-notes.adoc
janhoy closed pull request #2102: URL: https://github.com/apache/lucene-solr/pull/2102 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return differnet answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242787#comment-17242787 ] ASF subversion and git services commented on SOLR-14934: Commit 5caadc12f4b00b882ec6235d317c82d823d21ff7 in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5caadc1 ] SOLR-14934: Refactored duplicate "Solr Home" logic into a single place to eliminate risk of tests using divergent values for a single solr node. (cherry picked from commit 2e6a02394ec4eea6ba72d5bc2bf02c0139a54f39) (cherry picked from commit 05a8477a362beb6b0e5a02b6ee4dfa106a2e6a76) > Multiple Code Paths for determining "solr home" can return differnet answers > > > Key: SOLR-14934 > URL: https://issues.apache.org/jira/browse/SOLR-14934 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Minor > Attachments: SOLR-14934.poc.patch > > > While looking into some possible ways to make our tests more closely match > "real" solr installs, I realized that we currently have 2 different methods > for determining the "solr home" for a node... > * {{SolrPaths.locateSolrHome()}} > ** static method that uses a hueristic that typically results in using > {{System.getProperty("solr.solr.home");}} > *** NOTE: the result is not stored in any static/final variables > ** this method > * {{SolrDispatchFilter}} > ** starts by checking if an explicit {{ServletContext}} attribute is > specified > *** falls back to using {{SolrPaths.locateSolrHome()}} > ** whatever value is found gets set on {{CoreContainer}} > In a typical Solr install, the {{"solr.solr.home"}} system property is set by > {{bin/solr}} and we get a consistent value for the life of the server > instance regardless of code path. > In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that > calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests > (including {{MiniSolrCloudCluster}} based tests) we rely on the > {{ServletContext}} attribute based approach to have a unique "Solr Home" for > each node. ({{JettySOlrRunner}} injects the value when wiring up the > {{Server}} instance) > This means that: > * in jetty based test - even if it's a single jetty instance - each of the > node's CoreContainer has a unique value of "solr home", but any code paths in > solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent > value across all nodes (different from the value in the CoreContainer for any > node) > * allthough i don't think it happens now: a test could call > {{System.setProperty("solr.solr.home",...)}} while a node is running, and > potentially get inconsistent behavior from even a jetty node over time. > > In practice, I don't think that any of this is currently causing "real bugs" > in actual solr code; nor do i _think_ we're seeing any "false positives" or > "false failures" in tests as a result of this - but it is a big huge land > mine just waiting to go off if we step too close, and i think we should > recitfy this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242792#comment-17242792 ] Chris M. Hostetter commented on SOLR-12182: --- [~thelabdude]: on master CHANGES.txt shows this as a bug fix in 9.0, but on backport to 8x CHANGES.txt lists it as a bugfix in 8.8 > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242817#comment-17242817 ] Mike Drob commented on LUCENE-9629: --- Would it make sense to shove all of these values into arrays? > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
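A minimal sketch of that idea: fill each mask into a table indexed by bit count once at class load, so encode calls become a plain array lookup. The names here are illustrative, not the actual ForUtil constants:

{code:java}
// Masks for keeping the low b bits of a long, filled in once at class load.
private static final long[] MASKS = new long[33];
static {
  for (int b = 1; b <= 32; b++) {
    MASKS[b] = (1L << b) - 1;
  }
}

// Encode-time usage then becomes a plain array lookup:
//   long masked = value & MASKS[bitsPerValue];
{code}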
[jira] [Commented] (SOLR-14182) Move metric reporters config from solr.xml to ZK cluster properties
[ https://issues.apache.org/jira/browse/SOLR-14182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242821#comment-17242821 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14182: -- {quote}metric reporters configuration should be moved to container-level plugins, ie. {{/clusterprops.json:/plugin}} and the corresponding API. This will make the reporters easier to configure and change dynamically without restarting Solr nodes. {quote} I really don't see the point in moving this configuration to clusterprops. This will be bad for people who keep configuration as code, especially if they have multiple clusters, and it requires very Solr-specific deployment processes. I.e., instead of building the Docker image and deploying it as you normally do, you need to, in addition, make this particular request to each Solr cluster, a request that's specific to this change and that you'll never have to make again unless we change this particular component again (and handle errors accordingly). I wish we could tackle SOLR-14843 before making these changes, hopefully in a way where the use case of "long-lived Solr nodes where things can be installed on them" can coexist better with other strategies, such as rolling restarts, blue-green deployments, or any kind of immutable deployment strategy. > Move metric reporters config from solr.xml to ZK cluster properties > --- > > Key: SOLR-14182 > URL: https://issues.apache.org/jira/browse/SOLR-14182 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.4 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > > Metric reporters are currently configured statically in solr.xml, which makes > it difficult to change dynamically or in a containerized environment. > We should move this section to ZK /cluster.properties and add a back-compat > migration shim. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14934) Multiple Code Paths for determining "solr home" can return differnet answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242828#comment-17242828 ] ASF subversion and git services commented on SOLR-14934: Commit 5208d47e1a2030dc51396db74d42b52ba378756d in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5208d47 ] SOLR-14934: Remove redundent deprecated "solr.solr.home" logic > Multiple Code Paths for determining "solr home" can return differnet answers > > > Key: SOLR-14934 > URL: https://issues.apache.org/jira/browse/SOLR-14934 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Minor > Attachments: SOLR-14934.poc.patch > > > While looking into some possible ways to make our tests more closely match > "real" solr installs, I realized that we currently have 2 different methods > for determining the "solr home" for a node... > * {{SolrPaths.locateSolrHome()}} > ** static method that uses a hueristic that typically results in using > {{System.getProperty("solr.solr.home");}} > *** NOTE: the result is not stored in any static/final variables > ** this method > * {{SolrDispatchFilter}} > ** starts by checking if an explicit {{ServletContext}} attribute is > specified > *** falls back to using {{SolrPaths.locateSolrHome()}} > ** whatever value is found gets set on {{CoreContainer}} > In a typical Solr install, the {{"solr.solr.home"}} system property is set by > {{bin/solr}} and we get a consistent value for the life of the server > instance regardless of code path. > In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that > calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests > (including {{MiniSolrCloudCluster}} based tests) we rely on the > {{ServletContext}} attribute based approach to have a unique "Solr Home" for > each node. ({{JettySOlrRunner}} injects the value when wiring up the > {{Server}} instance) > This means that: > * in jetty based test - even if it's a single jetty instance - each of the > node's CoreContainer has a unique value of "solr home", but any code paths in > solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent > value across all nodes (different from the value in the CoreContainer for any > node) > * allthough i don't think it happens now: a test could call > {{System.setProperty("solr.solr.home",...)}} while a node is running, and > potentially get inconsistent behavior from even a jetty node over time. > > In practice, I don't think that any of this is currently causing "real bugs" > in actual solr code; nor do i _think_ we're seeing any "false positives" or > "false failures" in tests as a result of this - but it is a big huge land > mine just waiting to go off if we step too close, and i think we should > recitfy this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242826#comment-17242826 ] Timothy Potter commented on SOLR-12182: --- Yes, I'm aware of that [~hossman] ... wasn't going to backport this given the scope but changed my mind. I'll fix CHANGES.txt in master > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14934) Multiple Code Paths for determining "solr home" can return differnet answers
[ https://issues.apache.org/jira/browse/SOLR-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter resolved SOLR-14934. --- Fix Version/s: master (9.0) 8.8 Resolution: Fixed > Multiple Code Paths for determining "solr home" can return differnet answers > > > Key: SOLR-14934 > URL: https://issues.apache.org/jira/browse/SOLR-14934 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Minor > Fix For: 8.8, master (9.0) > > Attachments: SOLR-14934.poc.patch > > > While looking into some possible ways to make our tests more closely match > "real" solr installs, I realized that we currently have 2 different methods > for determining the "solr home" for a node... > * {{SolrPaths.locateSolrHome()}} > ** static method that uses a hueristic that typically results in using > {{System.getProperty("solr.solr.home");}} > *** NOTE: the result is not stored in any static/final variables > ** this method > * {{SolrDispatchFilter}} > ** starts by checking if an explicit {{ServletContext}} attribute is > specified > *** falls back to using {{SolrPaths.locateSolrHome()}} > ** whatever value is found gets set on {{CoreContainer}} > In a typical Solr install, the {{"solr.solr.home"}} system property is set by > {{bin/solr}} and we get a consistent value for the life of the server > instance regardless of code path. > In tests, we have {{SolrTestCaseJ4}} (and a handful of other places) that > calls {{System.setProperty("solr.solr.home",...)}} *AND* in jetty based tests > (including {{MiniSolrCloudCluster}} based tests) we rely on the > {{ServletContext}} attribute based approach to have a unique "Solr Home" for > each node. ({{JettySOlrRunner}} injects the value when wiring up the > {{Server}} instance) > This means that: > * in jetty based test - even if it's a single jetty instance - each of the > node's CoreContainer has a unique value of "solr home", but any code paths in > solr that directly call {{SolrPaths.locateSolrHome()}} will get a consistent > value across all nodes (different from the value in the CoreContainer for any > node) > * allthough i don't think it happens now: a test could call > {{System.setProperty("solr.solr.home",...)}} while a node is running, and > potentially get inconsistent behavior from even a jetty node over time. > > In practice, I don't think that any of this is currently causing "real bugs" > in actual solr code; nor do i _think_ we're seeing any "false positives" or > "false failures" in tests as a result of this - but it is a big huge land > mine just waiting to go off if we step too close, and i think we should > recitfy this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude opened a new pull request #2116: SOLR-12182: Fix Changes.txt
thelabdude opened a new pull request #2116: URL: https://github.com/apache/lucene-solr/pull/2116 # Description Please provide a short description of the changes you're making with this pull request. # Solution Please provide a short description of the approach taken to implement your solution. # Tests Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `./gradlew check`. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude merged pull request #2116: SOLR-12182: Fix Changes.txt
thelabdude merged pull request #2116: URL: https://github.com/apache/lucene-solr/pull/2116 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12182) Can not switch urlScheme in 7x if there are any cores in the cluster
[ https://issues.apache.org/jira/browse/SOLR-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242829#comment-17242829 ] ASF subversion and git services commented on SOLR-12182: Commit 4c100a0175e2553320ca3133bbe9170592389d9d in lucene-solr's branch refs/heads/master from Timothy Potter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4c100a0 ] SOLR-12182: Fix Changes.txt in master (#2116) > Can not switch urlScheme in 7x if there are any cores in the cluster > > > Key: SOLR-12182 > URL: https://issues.apache.org/jira/browse/SOLR-12182 > Project: Solr > Issue Type: Bug >Affects Versions: 7.0, 7.1, 7.2 >Reporter: Anshum Gupta >Assignee: Timothy Potter >Priority: Major > Fix For: 8.8, master (9.0) > > Attachments: SOLR-12182.patch, SOLR-12182_20200423.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > I was trying to enable TLS on a cluster that was already in use i.e. had > existing collections and ended up with down cores, that wouldn't come up and > the following core init errors in the logs: > *org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: > replica with coreNodeName core_node4 exists but with a different name or > base_url.* > What is happening here is that the core/replica is defined in the > clusterstate with the urlScheme as part of it's base URL e.g. > *"base_url":"http:hostname:port/solr"*. > Switching the urlScheme in Solr breaks this convention as the host now uses > HTTPS instead. > Actually, I ran into this with an older version because I was running with > *legacyCloud=false* and then realized that we switched that to the default > behavior only in 7x i.e while most users did not hit this issue with older > versions, unless they overrode the legacyCloud value explicitly, users > running 7x are bound to run into this more often. > Switching the value of legacyCloud to true, bouncing the cluster so that the > clusterstate gets flushed, and then setting it back to false is a workaround > but a bit risky one if you don't know if you have any old cores lying around. > Ideally, I think we shouldn't prepend the urlScheme to the base_url value and > use the urlScheme on the fly to construct it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy merged pull request #2103: Reconcile upgrade notes in master
janhoy merged pull request #2103: URL: https://github.com/apache/lucene-solr/pull/2103 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations
[ https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242833#comment-17242833 ] Noble Paul commented on SOLR-14613: --- (y) > Provide a clean API for pluggable replica assignment implementations > > > Key: SOLR-14613 > URL: https://issues.apache.org/jira/browse/SOLR-14613 > Project: Solr > Issue Type: Improvement > Components: AutoScaling >Reporter: Andrzej Bialecki >Assignee: Ilan Ginzburg >Priority: Major > Time Spent: 41h 20m > Remaining Estimate: 0h > > As described in SIP-8 the current autoscaling Policy implementation has > several limitations that make it difficult to use for very large clusters and > very large collections. SIP-8 also mentions the possible migration path by > providing alternative implementations of the placement strategies that are > less complex but more efficient in these very large environments. > We should review the existing APIs that the current autoscaling engine uses > ({{SolrCloudManager}} , {{AssignStrategy}} , {{Suggester}} and related > interfaces) to see if they provide a sufficient and minimal API for plugging > in alternative autoscaling placement strategies, and if necessary refactor > the existing APIs. > Since these APIs are internal it should be possible to do this without > breaking back-compat. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9406) Make it simpler to track IndexWriter's events
[ https://issues.apache.org/jira/browse/LUCENE-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242858#comment-17242858 ] Zach Chen commented on LUCENE-9406: --- Hi [~mikemccand], I'm trying to find a new task to work on and see this. I took a look at the comments in the PR and see that you already have some work in progress code such as the interface. Just curious, has the discussion been carried out further in any way after that PR, and if this task is ready to be picked up again at this point (following your original approach to use *IndexWriterEvents* class maybe) ? > Make it simpler to track IndexWriter's events > - > > Key: LUCENE-9406 > URL: https://issues.apache.org/jira/browse/LUCENE-9406 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Priority: Major > > This is the second spinoff from a [controversial PR to add a new index-time > feature to Lucene to merge small segments during > commit|https://github.com/apache/lucene-solr/pull/1552]. That change can > substantially reduce the number of small index segments to search. > In that PR, there was a new proposed interface, {{IndexWriterEvents}}, giving > the application a chance to track when {{IndexWriter}} kicked off merges > during commit, how many, how long it waited, how often it gave up waiting, > etc. > Such telemetry from production usage is really helpful when tuning settings > like which merges (e.g. a size threshold) to attempt on commit, and how long > to wait during commit, etc. > I am splitting out this issue to explore possible approaches to do this. > E.g. [~simonw] proposed using a statistics class instead, but if I understood > that correctly, I think that would put the role of aggregation inside > {{IndexWriter}}, which is not ideal. > Many interesting events, e.g. how many merges are being requested, how large > are they, how long did they take to complete or fail, etc., can be gleaned by > wrapping expert Lucene classes like {{MergePolicy}} and {{MergeScheduler}}. > But for those events that cannot (e.g. {{IndexWriter}} stopped waiting for > merges during commit), it would be very helpful to have some simple way to > track so applications can better tune. > It is also possible to subclass {{IndexWriter}} and override key methods, but > I think that is inherently risky as {{IndexWriter}}'s protected methods are > not considered to be a stable API, and the synchronization used by > {{IndexWriter}} is confusing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
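For discussion purposes, the general shape of such a callback interface might look like the sketch below. This is hypothetical: the method names and granularity are assumptions, not the API actually proposed in the PR.

{code:java}
// Hypothetical sketch of merge-on-commit telemetry callbacks; see the PR
// discussion for the actual proposal.
public interface IndexWriterEvents {

  /** IndexWriter started waiting for merges it kicked off during commit. */
  void beginWaitForMergeOnCommit(int runningMerges);

  /** All merges requested during commit finished within the configured wait. */
  void finishWaitForMergeOnCommit();

  /** IndexWriter gave up waiting; the given number of merges were abandoned. */
  void abandonedMergesOnCommit(int abandonedCount);
}
{code}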
[GitHub] [lucene-solr] zacharymorn commented on pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory
zacharymorn commented on pull request #2052: URL: https://github.com/apache/lucene-solr/pull/2052#issuecomment-737614580 Just want to have a quick follow up on this PR. Are there any more changes expected from my end? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9367) Using a queryText which results in zero tokens causes a query to be built as null
[ https://issues.apache.org/jira/browse/LUCENE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242868#comment-17242868 ] Zach Chen edited comment on LUCENE-9367 at 12/3/20, 2:37 AM: - Looks like there's some inconsistency about parsing, as I can get *MatchNoDocsQuery* from *SimpleQueryParser* {code:java} public void test() throws IOException { Analyzer analyzer = CustomAnalyzer.builder() .withTokenizer(StandardTokenizerFactory.class) .addTokenFilter(StopFilterFactory.class) .build(); QueryBuilder queryBuilder = new QueryBuilder(analyzer); String onlyStopWords = "the and it"; Query query = queryBuilder.createPhraseQuery("AnyField", onlyStopWords); assertNull(query); query = new SimpleQueryParser(analyzer, "AnyField").parse(onlyStopWords); assertEquals(new MatchNoDocsQuery("empty string passed to query parser"), query); } {code} I can put out a PR to change it to MatchNoDocsQuery if that's the preferred direction? I also see additional changes though to check for MatchNoDocsQuery now instead of null in this situation. was (Author: zacharymorn): Looks like there's some inconsistency about parsing, as I can get *MatchNoDocsQuery* from *SimpleQueryParser* {code:java} public void test() throws IOException { Analyzer analyzer = CustomAnalyzer.builder() .withTokenizer(StandardTokenizerFactory.class) .addTokenFilter(StopFilterFactory.class) .build(); QueryBuilder queryBuilder = new QueryBuilder(analyzer); String onlyStopWords = "the and it"; Query query = queryBuilder.createPhraseQuery("AnyField", onlyStopWords); assertNull(query); query = new SimpleQueryParser(analyzer, "AnyField").parse(onlyStopWords); assertEquals(new MatchNoDocsQuery("empty string passed to query parser"), query); } {code} I can put out a PR to change it to MatchNoDocsQuery if that's the preferred direction? I also see additional changes though to check for MatchNoDocsQuery now instead of null in this situation. > Using a queryText which results in zero tokens causes a query to be built as > null > - > > Key: LUCENE-9367 > URL: https://issues.apache.org/jira/browse/LUCENE-9367 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.2.1 >Reporter: Tim Brier >Priority: Major > > If a queryText produces zero tokens after being processed by an Analyzer, > when you try to build a Query with it the result is null. > > The following code reproduces this bug: > {code:java} > public class LuceneBug { > public Query buildQuery() throws IOException { > Analyzer analyzer = CustomAnalyzer.builder() > .withTokenizer(StandardTokenizerFactory.class) > .addTokenFilter(StopFilterFactory.class) > .build(); > QueryBuilder queryBuilder = new QueryBuilder(analyzer); > String onlyStopWords = "the and it"; > return queryBuilder.createPhraseQuery("AnyField", onlyStopWords); > } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9367) Using a queryText which results in zero tokens causes a query to be built as null
[ https://issues.apache.org/jira/browse/LUCENE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242868#comment-17242868 ] Zach Chen commented on LUCENE-9367: --- Looks like there's some inconsistency about parsing, as I can get *MatchNoDocsQuery* from *SimpleQueryParser* {code:java} public void test() throws IOException { Analyzer analyzer = CustomAnalyzer.builder() .withTokenizer(StandardTokenizerFactory.class) .addTokenFilter(StopFilterFactory.class) .build(); QueryBuilder queryBuilder = new QueryBuilder(analyzer); String onlyStopWords = "the and it"; Query query = queryBuilder.createPhraseQuery("AnyField", onlyStopWords); assertNull(query); query = new SimpleQueryParser(analyzer, "AnyField").parse(onlyStopWords); assertEquals(new MatchNoDocsQuery("empty string passed to query parser"), query); } {code} I can put out a PR to change it to MatchNoDocsQuery if that's the preferred direction? I also see additional changes though to check for MatchNoDocsQuery now instead of null in this situation. > Using a queryText which results in zero tokens causes a query to be built as > null > - > > Key: LUCENE-9367 > URL: https://issues.apache.org/jira/browse/LUCENE-9367 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 7.2.1 >Reporter: Tim Brier >Priority: Major > > If a queryText produces zero tokens after being processed by an Analyzer, > when you try to build a Query with it the result is null. > > The following code reproduces this bug: > {code:java} > public class LuceneBug { > public Query buildQuery() throws IOException { > Analyzer analyzer = CustomAnalyzer.builder() > .withTokenizer(StandardTokenizerFactory.class) > .addTokenFilter(StopFilterFactory.class) > .build(); > QueryBuilder queryBuilder = new QueryBuilder(analyzer); > String onlyStopWords = "the and it"; > return queryBuilder.createPhraseQuery("AnyField", onlyStopWords); > } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
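Until the two code paths agree, callers can normalize the null themselves. A small defensive sketch; the reason string is illustrative:

{code:java}
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

// QueryBuilder returns null when analysis yields no tokens (e.g. all stop
// words); mapping that to MatchNoDocsQuery mirrors SimpleQueryParser.
Query phraseOrNoDocs(QueryBuilder builder, String field, String text) {
  Query q = builder.createPhraseQuery(field, text);
  return q == null ? new MatchNoDocsQuery("analyzer produced no tokens") : q;
}
{code}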
[jira] [Created] (SOLR-15028) summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted
Amy Bai created SOLR-15028: -- Summary: summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted Key: SOLR-15028 URL: https://issues.apache.org/jira/browse/SOLR-15028 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 7.4.1 Reporter: Amy Bai I found that SolrCloud won't check the IO status if the SolrCloud process is alive. e.g. If I delete the SolrCloud data directory, no errors are reported, and I can still log in to the SolrCloud Admin UI to create/query collections. Then, index/search queries keep failing because one of the node data directories is gone, but the node is not marked as down. The replicas on the failed node are not working, but the index/search queries didn't fail over to other healthy replicas. The error message is shown below: """ curl -X POST -H 'Content-Type: application/json' 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary '{ "a": "1" }' { "responseHeader":{ "status":400, "QTime":6}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error inserting document: ", "code":400}} """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15028) summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted
[ https://issues.apache.org/jira/browse/SOLR-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amy Bai updated SOLR-15028: --- Affects Version/s: (was: 7.4.1) 7.4 > summarySolrCloud shows cluster still healthy without failover even the node > data directory is deleted > - > > Key: SOLR-15028 > URL: https://issues.apache.org/jira/browse/SOLR-15028 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.4 >Reporter: Amy Bai >Priority: Major > > I found that SolrCloud won't check the IO status if the SolrCloud process is > alive. > > e.g. If I delete the SolrCloud data directory, there are no errors report, > and I can still log in to the SolrCloud Admin UI to create/query collections. > Then, index/search queries keep failing because one of the node data > directories is gone, but the node is not marked as down. > The replicas on the failed node are not working, but the Index/search queries > didn't failover to other healthy replicas. > > The ERROR message as below shows: > """ > curl -X POST -H 'Content-Type: application/json' > 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary > ' \{ "a": "1", }' \{ "responseHeader":{ "status":400, "QTime":6}, "error":\{ > "metadata":[ "error-class","org.apache.solr.common.SolrException", > "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error > inserting document: ", "code":400}} > """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15028) summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted
[ https://issues.apache.org/jira/browse/SOLR-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amy Bai updated SOLR-15028: --- Description: I found that SolrCloud won't check the IO status if the SolrCloud process is alive. e.g. If I delete the data directory for one of the SolrCloud nodes, no errors are reported, and I can still log in to the SolrCloud Admin UI to create/query collections. The SolrCloud Admin UI shows the collections' status as green. Then, index/search queries keep failing because one of the node data directories is gone, but the node is not marked as down. The replicas on the failed node are not working, but the index/search queries didn't fail over to other healthy replicas. The error message is shown below: """ curl -X POST -H 'Content-Type: application/json' 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary '{ "a": "1" }' { "responseHeader":{ "status":400, "QTime":6}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error inserting document: ", "code":400}} """ was: I found that SolrCloud won't check the IO status if the SolrCloud process is alive. e.g. If I delete the SolrCloud data directory, no errors are reported, and I can still log in to the SolrCloud Admin UI to create/query collections. Then, index/search queries keep failing because one of the node data directories is gone, but the node is not marked as down. The replicas on the failed node are not working, but the index/search queries didn't fail over to other healthy replicas. The error message is shown below: """ curl -X POST -H 'Content-Type: application/json' 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary '{ "a": "1" }' { "responseHeader":{ "status":400, "QTime":6}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error inserting document: ", "code":400}} """ > summarySolrCloud shows cluster still healthy without failover even the node > data directory is deleted > - > > Key: SOLR-15028 > URL: https://issues.apache.org/jira/browse/SOLR-15028 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.4 >Reporter: Amy Bai >Priority: Major > > I found that SolrCloud won't check the IO status if the SolrCloud process is > alive. > > e.g. If I delete the data directory for one of the SolrCloud nodes, no errors > are reported, and I can still log in to the SolrCloud Admin UI to > create/query collections. The SolrCloud Admin UI shows the collections' > status as green. > Then, index/search queries keep failing because one of the node data > directories is gone, but the node is not marked as down. > The replicas on the failed node are not working, but the index/search queries > didn't fail over to other healthy replicas. > > The error message is shown below: > """ > curl -X POST -H 'Content-Type: application/json' > 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary > '{ "a": "1" }' > { "responseHeader":{ "status":400, "QTime":6}, "error":{ "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error > inserting document: ", "code":400}} > """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
> > The ERROR message as below shows: > """ > curl -X POST -H 'Content-Type: application/json' > 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary > ' \{ "a": "1", }' { "responseHeader": > { "status":400, "QTime":6} > , "error":\{ "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error > inserting document: ", "code":400}} > """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15028) SolrCloud shows cluster still healthy without failover even the node data directory is deleted
[ https://issues.apache.org/jira/browse/SOLR-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amy Bai updated SOLR-15028: --- Summary: SolrCloud shows cluster still healthy without failover even the node data directory is deleted (was: summarySolrCloud shows cluster still healthy without failover even the node data directory is deleted) > SolrCloud shows cluster still healthy without failover even the node data > directory is deleted > -- > > Key: SOLR-15028 > URL: https://issues.apache.org/jira/browse/SOLR-15028 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 7.4 >Reporter: Amy Bai >Priority: Major > > I found that SolrCloud won't check the IO status if the SolrCloud process is > alive. > > e.g. If I delete the data directory for one of the SolrCloud node, there are > no errors report, and I can still log in to the SolrCloud Admin UI to > create/query collections. SolrCloud Admin UI shows the collections' status is > green. > Then, index/search queries keep failing because one of the node data > directories is gone, but the node is not marked as down. > The replicas on the failed node are not working, but the Index/search queries > didn't failover to other healthy replicas. > > The ERROR message as below shows: > """ > curl -X POST -H 'Content-Type: application/json' > 'http://localhost:18983/solr/demo.public.test/update/json/docs' --data-binary > ' \{ "a": "1", }' { "responseHeader": > { "status":400, "QTime":6} > , "error":\{ "metadata":[ > "error-class","org.apache.solr.common.SolrException", > "root-error-class","java.nio.file.NoSuchFileException"], "msg":"Error > inserting document: ", "code":400}} > """ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242888#comment-17242888 ] Feng Guo commented on LUCENE-9629: -- [~mdrob] Thanks for your reply! That's a really good idea. I updated my [pull request|https://github.com/apache/lucene-solr/pull/2113/files] and the code got even shorter~ > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > > this code will never be executed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242888#comment-17242888 ] Feng Guo edited comment on LUCENE-9629 at 12/3/20, 3:53 AM: [~mdrob] Thanks for your reply! That's a really good idea. I updated my [PR|https://github.com/apache/lucene-solr/pull/2113/files] and the lines of code got even shorter~ was (Author: gf2121): [~mdrob] Thanks for you reply! That's a really good idea, I updated my [PR|[https://github.com/apache/lucene-solr/pull/2113/files]] and find the lines of code get even less~ > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have already been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise the following code will never be executed: > > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Guo updated LUCENE-9629: - Description: In the class ForUtil, mask values have already been computed and stored in static final variables, but they are recomputed for every encoding, which may be unnecessary. Another small fix is to change {code:java} remainingBitsPerValue > remainingBitsPerLong{code} to {code:java} remainingBitsPerValue >= remainingBitsPerLong{code} otherwise the following code will never be executed: {code:java} if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } {code} was: In the class ForkUtil, mask values have been computed and stored in static final vailables, but they are recomputed for every encoding, which may be unnecessary. anther small fix is that change {code:java} remainingBitsPerValue > remainingBitsPerLong{code} to {code:java} remainingBitsPerValue >= remainingBitsPerLong{code} otherwise {code:java} if (remainingBitsPerValue == 0) { idx++; remainingBitsPerValue = bitsPerValue; } {code} these code will never be used. > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have already been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise the following code will never be executed: > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
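For readers without the patch handy, here is a compact, self-contained sketch of both changes described above: masks precomputed once in a static initializer, and the {{>=}} boundary comparison that makes the reset branch reachable. Names like {{MaskDemo}} and {{MASKS8}} are illustrative assumptions, not the actual ForUtil code:
{code:java}
public final class MaskDemo {

  // Computed once at class load instead of once per encode call.
  private static final long[] MASKS8 = new long[8];
  static {
    for (int bits = 0; bits < 8; bits++) {
      MASKS8[bits] = (1L << bits) - 1; // e.g. MASKS8[3] == 0b111
    }
  }

  static long mask8(int bitsPerValue) {
    return MASKS8[bitsPerValue]; // table lookup, no recomputation
  }

  public static void main(String[] args) {
    // With '>' at the boundary, the exact-fit case (remainingBitsPerValue
    // == remainingBitsPerLong) is not handled in this branch, so
    // remainingBitsPerValue never reaches 0 and the reset below is dead code.
    int bitsPerValue = 8, remainingBitsPerLong = 8, idx = 0;
    int remainingBitsPerValue = bitsPerValue;
    if (remainingBitsPerValue >= remainingBitsPerLong) { // was '>'
      remainingBitsPerValue -= remainingBitsPerLong;
      if (remainingBitsPerValue == 0) {
        idx++;                               // now reachable
        remainingBitsPerValue = bitsPerValue;
      }
    }
    System.out.println("mask8(3)=" + Long.toBinaryString(mask8(3)) + ", idx=" + idx);
  }
}
{code}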
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo edited comment on LUCENE-9629 at 12/3/20, 4:02 AM: [~jpountz] Thanks for your reply! I couldn't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears, not to mention this is a somewhat hot path when indexing. So you may consider it a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler, if you are interested~ {code:java} for (int time=0; time<100; time++) { Random random = new Random(System.currentTimeMillis()); long[] nums = new long[128]; for (int i=0;i<128;i++) { nums[i] = random.nextInt(4)+1; } ForUtil forUtil = new ForUtil(); Directory directory = new ByteBuffersDirectory(); DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT); for (int i = 0; i < 1; i++) { forUtil.encode(nums, 3, dataOutput); } directory.close(); }{code} *result:* ||method||before||after|| |org.apache.lucene.store.ByteBuffersIndexOutput.writeLong|40.4%|41.2%| |org.apache.lucene.store.ForUtil.collapse8|15.3%|14.8%| |org.apache.lucene.store.ForUtil.mask(ed)8|8.8%|3.8%| From my point of view, the number of code lines is less important than writing speed, but if you insist that the precompute makes no sense, just tell me and I will revert this part of the change. In addition, my English is not very good and most of the words above come from translation programs; if any wording offends you, please just ignore it. I really admire this amazing project and am just trying my best to make it better:) was (Author: gf2121): Thanks for your reply! I can't agree more that write path is less performance-sensitive than the read path, and to be honest, i didn't expect this change will bring a very big improvement in writing speed. All I'm trying to do is just to reduce duplicate compute no matter where it appears, not to mention here is somewhat a hot way when indexing. So you may consider it as a "fix" instead of an "enhancement". here is a simple benchmark run with cpu profiler if your are interested~ {code:java} for (int time=0; time<100; time++) { Random random = new Random(System.currentTimeMillis()); long[] nums = new long[128]; for (int i=0;i<128;i++) { nums[i] = random.nextInt(4)+1; } ForUtil forUtil = new ForUtil(); Directory directory = new ByteBuffersDirectory(); DataOutput dataOutput = directory.createOutput("test", IOContext.DEFAULT); for (int i = 0; i < 1; i++) { forUtil.encode(nums, 3, dataOutput); } directory.close(); }{code} *result:* || ||before||after|| |org.apache.lucene.store.ByteBuffersIndexOutput.writeLong org.apache.lucene.store.ForUtil.collapse8 org.apache.lucene.store.ForUtil.mask(ed)8|40.4% 15.3% 8.8%|41.2% 14.8% 3.8%| From my point of view, the number of code lines is less important than writing speed, but if you insist that precompute make no sense, just tell me and i will revert this part of change In addition, i'm a bit poor in english speaking and most of words above come from translate programs. if there are any word offending you, please just ignore it. i really admire this amazing project and just try to do my best to make it better:) > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have already been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise the following code will never be executed: > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
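For anyone who wants to reproduce the shape of this comparison without Lucene's internals on the classpath, a dependency-free sketch of "recompute the mask per call" versus "precomputed table" could look like the following. This is only a methodology illustration under those assumptions — the numbers will not match the profiler output above, and a harness like JMH would give more trustworthy results than hand-rolled timing:
{code:java}
public class MaskBench {
  private static final long[] MASKS = new long[64];
  static {
    for (int b = 1; b < 64; b++) {
      MASKS[b] = (1L << b) - 1; // MASKS[0] stays 0, matching (1L << 0) - 1
    }
  }

  static long maskRecomputed(int bits) { return (1L << bits) - 1; }
  static long maskLookup(int bits) { return MASKS[bits]; }

  public static void main(String[] args) {
    long sink = 0;
    // Warm up so the JIT compiles both paths before timing.
    for (int i = 0; i < 5_000_000; i++) {
      sink += maskRecomputed(i & 63) + maskLookup(i & 63);
    }

    long t0 = System.nanoTime();
    for (int i = 0; i < 50_000_000; i++) { sink += maskRecomputed(i & 63); }
    long t1 = System.nanoTime();
    for (int i = 0; i < 50_000_000; i++) { sink += maskLookup(i & 63); }
    long t2 = System.nanoTime();

    System.out.println("recomputed: " + (t1 - t0) / 1_000_000 + " ms");
    System.out.println("lookup:     " + (t2 - t1) / 1_000_000 + " ms");
    System.out.println(sink); // keep the JIT from eliminating the loops
  }
}
{code}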
[jira] [Created] (LUCENE-9630) Allow Shard Leader to give up leadership gracefully via shard terms
Mike Drob created LUCENE-9630: - Summary: Allow Shard Leader to give up leadership gracefully via shard terms Key: LUCENE-9630 URL: https://issues.apache.org/jira/browse/LUCENE-9630 Project: Lucene - Core Issue Type: Bug Reporter: Mike Drob Currently (via SOLR-12412), when a leader sees an index writing error during an update, it will give up leadership by deleting the replica and adding a new replica. One stated benefit of this was that because we are using the overseer and a known code path, this is done asynchronously and very efficiently. I would argue that this approach is too heavy-handed. In the case of a corrupt index exception, it makes some sense to completely delete the index dir and attempt to sync from a good peer. Even in this case, however, it might be better to let fingerprinting and other index delta mechanisms take over and allow for a more efficient data transfer. In an alternate case where the index error arises due to a disconnected file system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) and the required solution is some kind of reconnect, this approach has several shortcomings - the core delete and creations are going to fail, leaving dangling replicas. Further, the data is still present so there is no need to make so many extra copies. I propose that we bring in a mechanism to give up leadership via the existing shard terms language. I believe we would be able to set all replicas currently equal to leader term T to T+1, and then trigger a new leader election. The current leader would know it is ineligible, while the other replicas that were current before the failed update would be eligible. This improvement would entail adding an additional possible operation to the terms state machine. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
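Read literally, the proposed term bump is a small state change over the per-replica terms map. The sketch below is purely illustrative — the class and method names are hypothetical, and this is not Solr's actual shard-terms code:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class ShardTermsSketch {
  private final Map<String, Long> terms = new HashMap<>();

  /**
   * Bump every replica that is caught up to the leader's term T to T + 1,
   * leaving the current leader at T and therefore ineligible when a new
   * election is triggered.
   */
  public void giveUpLeadership(String leaderReplica) {
    long leaderTerm = terms.get(leaderReplica);
    for (Map.Entry<String, Long> e : terms.entrySet()) {
      if (!e.getKey().equals(leaderReplica) && e.getValue() == leaderTerm) {
        e.setValue(leaderTerm + 1);
      }
    }
    // A real implementation would now trigger a leader election; replicas
    // at T + 1 are eligible, the old leader at T is not.
  }

  public static void main(String[] args) {
    ShardTermsSketch s = new ShardTermsSketch();
    s.terms.put("replica1", 5L); // current leader
    s.terms.put("replica2", 5L); // caught up -> bumped to 6, eligible
    s.terms.put("replica3", 4L); // behind the failed update -> stays at 4
    s.giveUpLeadership("replica1");
    System.out.println(s.terms); // replica1=5, replica2=6, replica3=4
  }
}
{code}
Part of the appeal of reusing the terms language is visible even in this toy: ineligibility falls out of the existing highest-term-wins comparison, with no deletes, copies, or new replica states required.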
[jira] [Assigned] (SOLR-15029) Allow Shard Leader to give up leadership gracefully via shard terms
[ https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob reassigned SOLR-15029: Assignee: Mike Drob > Allow Shard Leader to give up leadership gracefully via shard terms > --- > > Key: SOLR-15029 > URL: https://issues.apache.org/jira/browse/SOLR-15029 > Project: Solr > Issue Type: Improvement >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > > Currently (via SOLR-12412), when a leader sees an index writing > error during an update, it will give up leadership by deleting the replica and > adding a new replica. One stated benefit of this was that because we are > using the overseer and a known code path, this is done asynchronously and > very efficiently. > I would argue that this approach is too heavy-handed. > In the case of a corrupt index exception, it makes some sense to completely > delete the index dir and attempt to sync from a good peer. Even in this case, > however, it might be better to let fingerprinting and other index delta > mechanisms take over and allow for a more efficient data transfer. > In an alternate case where the index error arises due to a disconnected file > system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) > and the required solution is some kind of reconnect, this approach has > several shortcomings - the core delete and creations are going to fail, > leaving dangling replicas. Further, the data is still present so there is no > need to make so many extra copies. > I propose that we bring in a mechanism to give up leadership via the existing > shard terms language. I believe we would be able to set all replicas > currently equal to leader term T to T+1, and then trigger a new leader > election. The current leader would know it is ineligible, while the other > replicas that were current before the failed update would be eligible. This > improvement would entail adding an additional possible operation to the terms > state machine. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Moved] (SOLR-15029) Allow Shard Leader to give up leadership gracefully via shard terms
[ https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob moved LUCENE-9630 to SOLR-15029: -- Key: SOLR-15029 (was: LUCENE-9630) Lucene Fields: (was: New) Issue Type: Improvement (was: Bug) Project: Solr (was: Lucene - Core) > Allow Shard Leader to give up leadership gracefully via shard terms > --- > > Key: SOLR-15029 > URL: https://issues.apache.org/jira/browse/SOLR-15029 > Project: Solr > Issue Type: Improvement >Reporter: Mike Drob >Priority: Major > > Currently (via SOLR-12412), when a leader sees an index writing > error during an update, it will give up leadership by deleting the replica and > adding a new replica. One stated benefit of this was that because we are > using the overseer and a known code path, this is done asynchronously and > very efficiently. > I would argue that this approach is too heavy-handed. > In the case of a corrupt index exception, it makes some sense to completely > delete the index dir and attempt to sync from a good peer. Even in this case, > however, it might be better to let fingerprinting and other index delta > mechanisms take over and allow for a more efficient data transfer. > In an alternate case where the index error arises due to a disconnected file > system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) > and the required solution is some kind of reconnect, this approach has > several shortcomings - the core delete and creations are going to fail, > leaving dangling replicas. Further, the data is still present so there is no > need to make so many extra copies. > I propose that we bring in a mechanism to give up leadership via the existing > shard terms language. I believe we would be able to set all replicas > currently equal to leader term T to T+1, and then trigger a new leader > election. The current leader would know it is ineligible, while the other > replicas that were current before the failed update would be eligible. This > improvement would entail adding an additional possible operation to the terms > state machine. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15029) Allow Shard Leader to give up leadership gracefully via shard terms
[ https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242904#comment-17242904 ] Mike Drob commented on SOLR-15029: -- [~tflobbe], [~caomanhdat] - you were involved with the initial implementation of giving up leadership, so I would love to hear your thoughts on this proposal. [~varun], you too, since it looks like you were battle-tested on that issue. > Allow Shard Leader to give up leadership gracefully via shard terms > --- > > Key: SOLR-15029 > URL: https://issues.apache.org/jira/browse/SOLR-15029 > Project: Solr > Issue Type: Improvement >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > > Currently (via SOLR-12412), when a leader sees an index writing > error during an update, it will give up leadership by deleting the replica and > adding a new replica. One stated benefit of this was that because we are > using the overseer and a known code path, this is done asynchronously and > very efficiently. > I would argue that this approach is too heavy-handed. > In the case of a corrupt index exception, it makes some sense to completely > delete the index dir and attempt to sync from a good peer. Even in this case, > however, it might be better to let fingerprinting and other index delta > mechanisms take over and allow for a more efficient data transfer. > In an alternate case where the index error arises due to a disconnected file > system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) > and the required solution is some kind of reconnect, this approach has > several shortcomings - the core delete and creations are going to fail, > leaving dangling replicas. Further, the data is still present so there is no > need to make so many extra copies. > I propose that we bring in a mechanism to give up leadership via the existing > shard terms language. I believe we would be able to set all replicas > currently equal to leader term T to T+1, and then trigger a new leader > election. The current leader would know it is ineligible, while the other > replicas that were current before the failed update would be eligible. This > improvement would entail adding an additional possible operation to the terms > state machine. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] tflobbe commented on a change in pull request #2115: SOLR-14992 Wait for node down before checking for node up
tflobbe commented on a change in pull request #2115: URL: https://github.com/apache/lucene-solr/pull/2115#discussion_r534702097 ## File path: solr/core/src/test/org/apache/solr/cloud/TestPullReplicaErrorHandling.java ## @@ -236,8 +237,9 @@ public void testCloseHooksDeletedOnReconnect() throws Exception { JettySolrRunner jetty = getJettyForReplica(s.getReplicas(EnumSet.of(Replica.Type.PULL)).get(0)); SolrCore core = jetty.getCoreContainer().getCores().iterator().next(); -for (int i = 0; i < 5; i++) { +for (int i = 0; i < (TEST_NIGHTLY ? 5 : 2); i++) { cluster.expireZkSession(jetty); + waitForState("Expecting node to be disconnected", collectionName, activeReplicaCount(1, 0, 0)); Review comment: > There is a window where live node has gone away but state is still active because it hasn't updated yet. Have you seen that happening? AFAIK, everywhere that we check if a replica is active we look at the state and the live nodes. > if we're just waiting for and watching live nodes, then we can see that go away and complete the test before the cluster has quiesced. We would still have the check in line 243, right? My point was: 1) wait to see a change in live nodes 2) wait for active (line 243 as it is now) Wouldn't that be safe (assuming no other, unrelated node dies just at this point)? > There is still a different race here that the replica could go down and come back up before we start waiting for it to be down the first time Right, that's the one I was concerned about. Difficult to happen, but... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
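As a dependency-free illustration of the race being discussed — the replica going down and coming back up before the test starts waiting for "down" — consider this toy sketch. It uses no Solr APIs; the thread stands in for the flapping node, and the point is only that a poll-based wait registered after the flap never observes the down state:
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class MissedTransitionDemo {
  public static void main(String[] args) throws InterruptedException {
    AtomicBoolean live = new AtomicBoolean(true);
    CountDownLatch flapped = new CountDownLatch(1);

    Thread node = new Thread(() -> {
      live.set(false);  // node goes down...
      live.set(true);   // ...and rejoins immediately
      flapped.countDown();
    });
    node.start();
    flapped.await();    // the "test" starts watching only after the flap

    // A poll-based waitForState(down) would now spin forever; a watch
    // registered *before* expiring the session (or comparing session ids
    // across the flap) is needed to observe the transition reliably.
    System.out.println("down state observed after the fact? " + !live.get()); // false
  }
}
{code}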
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242418#comment-17242418 ] Feng Guo edited comment on LUCENE-9629 at 12/3/20, 6:43 AM: [~jpountz] Thanks for your reply! I couldn't agree more that the write path is less performance-sensitive than the read path, and to be honest, I didn't expect this change to bring a very big improvement in writing speed. All I'm trying to do is reduce duplicate computation wherever it appears. So you may think of it as a "fix" instead of an "enhancement". Here is a simple benchmark run with a CPU profiler: {code:java} public static void main(String[] args) throws Exception { Random random = new Random(System.currentTimeMillis()); long[] nums = new long[128]; for (int i = 0; i < 128; i++) { nums[i] = random.nextInt(7) + 1; } ForUtil forUtil = new ForUtil(); DataOutput dataOutput = new DataOutput() { @Override public void writeLong(long i) throws IOException {} @Override public void writeByte(byte b) throws IOException {} @Override public void writeBytes(byte[] bytes, int i, int i1) throws IOException {} }; while (true){ forUtil.encode(nums, 3, dataOutput); } }{code} *result:* ||method||before||after|| |org.apache.lucene.store.ForUtil.collapse8|29.9%|31.7%| |org.apache.lucene.store.ForUtil.mask8|13.7%|2.8%| |java.lang.Long.reverseBytes|< 1%|< 1%| |org.apache.lucene.codecs.lucene84.Main$1.writeLong|< 1%|< 1%| From my point of view, the number of code lines is less important than writing speed, and ForUtil is a somewhat hot path when indexing, so it may be worth fixing. But if you insist that the precompute makes no sense, just tell me and I will revert this part of the change. In addition, my English is not very good and most of the words above come from translation programs; if any wording offends you, please just ignore it. I really admire this amazing project and am just trying my best to make it better:) was (Author: gf2121): [~jpountz] Thanks for your reply! I can't agree more that write path is less performance-sensitive than the read path, and to be honest, i didn't expect this change will bring a very big improvement in writing speed. All I'm trying to do is just to reduce duplicate compute no matter where it appears. So you may think of it as a "fix" instead of an "enhancement". here is a simple benchmark run with cpu profiler {code:java} public static void main(String[] args) throws Exception { Random random = new Random(System.currentTimeMillis()); long[] nums = new long[128]; for (int i = 0; i < 128; i++) { nums[i] = random.nextInt(7) + 1; } ForUtil forUtil = new ForUtil(); DataOutput dataOutput = new DataOutput() { @Override public void writeLong(long i) throws IOException {} @Override public void writeByte(byte b) throws IOException {} @Override public void writeBytes(byte[] bytes, int i, int i1) throws IOException {} }; while (true){ forUtil.encode(nums, 3, dataOutput); } }{code} *result:* || ||before||after|| |org.apache.lucene.store.ForUtil.collapse8 org.apache.lucene.store.ForUtil.mask8 java.lang.Long.reverseBytes org.apache.lucene.codecs.lucene84.Main$1.writeLong|29.9% 13.7% < 1% < 1%|31.7% 2.8% < 1% < 1%| From my point of view, the number of code lines is less important than writing speed, and ForUtil is somewhat a hot way when indexing, so it may be worth fixing. But if you insist that the precompute make no sense, just tell me and i will revert this part of change. In addition, i'm a bit poor in english speaking and most of words above come from translate programs. if there are any word offending you, please just ignore it. i really admire this amazing project and just try my best to make it better:) > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have already been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > Another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise the following code will never be executed: > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org