On 6/11/2018 9:46 AM, Joe Obernberger wrote:
> We are seeing an issue on our Solr Cloud 7.3.1 cluster where
> replication starts and pegs network interfaces so aggressively that
> other tasks cannot talk.  We will see it peg a bonded 2Gb interface.
> In some cases the replication fails over and over until it finally
> succeeds and the replica comes back up.  Usually the error is a timeout.
>
> Has anyone seen this?  We've tried adjusting the /replication
> requestHandler and setting:
>
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>          <lst name="defaults">
>           <str name="maxWriteMBPerSec">75</str>
>          </lst>
> </requestHandler>

Here's something I'd like you to try.  Open a browser and visit the URL
for the handler with some specific parameters, so we can see if that
config is actually being applied.  Substitute the correct host, port,
and collection name:

http://host:port/solr/collection/replication?command=details&echoParams=all&wt=json&indent=true

And provide the full raw JSON response.

On a Solr 7.3.0 example, I added your replication handler definition,
and this is the result of visiting a similar URL:

{
  "responseHeader":{
    "status":0,
    "QTime":5,
    "params":{
      "echoParams":"all",
      "indent":"true",
      "wt":"json",
      "command":"details",
      "maxWriteMBPerSec":"75"}},
  "details":{
    "indexSize":"6.27 KB",
    "indexPath":"C:\\Users\\sheisey\\Downloads\\solr-7.3.0\\server\\solr\\foo\\data\\index/",
    "commits":[[
        "indexVersion",1528213960436,
        "generation",4,
        "filelist",["_0.fdt",
          "_0.fdx",
          "_0.fnm",
          "_0.si",
          "_0_Lucene50_0.doc",
          "_0_Lucene50_0.tim",
          "_0_Lucene50_0.tip",
          "_0_Lucene70_0.dvd",
          "_0_Lucene70_0.dvm",
          "_1.fdt",
          "_1.fdx",
          "_1.fnm",
          "_1.nvd",
          "_1.nvm",
          "_1.si",
          "_1_Lucene50_0.doc",
          "_1_Lucene50_0.pos",
          "_1_Lucene50_0.tim",
          "_1_Lucene50_0.tip",
          "_1_Lucene70_0.dvd",
          "_1_Lucene70_0.dvm",
          "_2.fdt",
          "_2.fdx",
          "_2.fnm",
          "_2.nvd",
          "_2.nvm",
          "_2.si",
          "_2_Lucene50_0.doc",
          "_2_Lucene50_0.pos",
          "_2_Lucene50_0.tim",
          "_2_Lucene50_0.tip",
          "_2_Lucene70_0.dvd",
          "_2_Lucene70_0.dvm",
          "segments_4"]]],
    "isMaster":"true",
    "isSlave":"false",
    "indexVersion":1528213960436,
    "generation":4,
    "master":{
      "replicateAfter":["commit"],
      "replicationEnabled":"true"}}}

The maxWriteMBPerSec parameter shows up in the echoed params of the
response header, so on this system the config is actually being applied.
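If you'd rather check a saved response programmatically than by eye, here's
a minimal sketch. The throttle_param helper is just illustrative (not part
of Solr), and SAMPLE is a trimmed copy of the response header above:

```python
import json

# Trimmed copy of the responseHeader from the output above.
SAMPLE = """
{"responseHeader": {"status": 0, "QTime": 5,
  "params": {"echoParams": "all", "indent": "true", "wt": "json",
             "command": "details", "maxWriteMBPerSec": "75"}}}
"""

def throttle_param(response_text):
    """Return the echoed maxWriteMBPerSec value, or None if the
    handler default is not being applied."""
    data = json.loads(response_text)
    return data.get("responseHeader", {}).get("params", {}).get("maxWriteMBPerSec")

print(throttle_param(SAMPLE))  # prints 75
```

If this returns None against your cluster's response, the defaults in your
/replication handler definition aren't reaching the handler at all.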

Thanks,
Shawn