On 6/12/2018 10:14 PM, Joe Obernberger wrote:
> Thank you Shawn. It looks like it is being applied. This could be
> some sort of chain reaction where:
> A drive or server fails. HDFS starts to re-replicate blocks, which
> causes network congestion. Solr 7 can't talk, so it initiates a
> replication process, which causes more network congestion... which
> causes more replicas to replicate, and which eventually causes HBase
> (we run HBase+Solr on the same machines) to also be unable to talk.
> That is my running hypothesis, anyway!
I was also thinking that there was a possibility that a lot of
replications were happening at once. At 75 megabytes per second each,
it would only take a few of them to saturate a 2-gigabit link, even
if the load sharing between the bonded gigabit links is perfect (and
depending on the type of bonding in use, it might not be).
75 MB per second is 600 megabits per second of payload, closer to 700
megabits per second on the wire with protocol overhead, so if three of
those are happening at the same time and the disks can actually keep
up, it would be enough to fill a 2Gb/s link.
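The arithmetic above can be checked with a quick back-of-envelope script. The 10% protocol-overhead factor is an assumption, not a figure from the thread:

```python
# Back-of-envelope check of the link-saturation math discussed above.
# Assumptions: 1 MB = 10^6 bytes, ~10% TCP/IP framing overhead on the wire.
replication_rate_mb_s = 75                       # per-replica copy rate, MB/s
wire_mbit_s = replication_rate_mb_s * 8 * 1.1    # payload bits -> wire bits
link_mbit_s = 2000                               # bonded 2 x 1 GbE link

concurrent = link_mbit_s / wire_mbit_s
print(f"one replication uses ~{wire_mbit_s:.0f} Mb/s on the wire")
print(f"~{concurrent:.1f} concurrent replications saturate a {link_mbit_s} Mb/s link")
```

This agrees with the estimate in the message: roughly three simultaneous 75 MB/s copies are enough to fill a 2 Gb/s bond.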
> We've made a change to limit how much bandwidth HDFS can use. One
> issue that we have seen is that the replicas fail to replicate, then
> retry, over and over. I believe they are getting a timeout error; is
> that timeout adjustable?
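The thread doesn't say which HDFS knob was changed. For reference, these are the standard throttles; note that the balancer bandwidth setting only limits balancer/mover traffic, while re-replication after a node failure is paced by the NameNode's replication-stream limits:

```shell
# Cap balancer/mover traffic per DataNode at runtime (bytes/sec; 100 MB/s here).
# This does NOT throttle block re-replication after a failure.
hdfs dfsadmin -setBalancerBandwidth 104857600

# Re-replication pacing is controlled in hdfs-site.xml instead, e.g.:
#   dfs.namenode.replication.max-streams                    (default 2)
#   dfs.namenode.replication.max-streams-hard-limit         (default 4)
#   dfs.namenode.replication.work.multiplier.per.iteration  (default 2)
# Lowering these slows how aggressively HDFS re-replicates lost blocks.
```

Which of these (if any) was actually changed on the cluster in question is not stated in the thread.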
To have any idea whether it's adjustable, I would need to know exactly
what timeout is being exceeded. Can you share the full error for
anything you're seeing?
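Without the exact error it isn't certain which timeout applies, but two places such timeouts are commonly adjusted are sketched below. The values shown are illustrative, not recommendations from the thread:

```
<!-- SolrCloud: inter-node update/recovery timeouts, in solr.xml -->
<solrcloud>
  <int name="distribUpdateConnTimeout">60000</int>   <!-- ms -->
  <int name="distribUpdateSoTimeout">600000</int>    <!-- ms -->
</solrcloud>

<!-- Legacy master/slave replication: slave-side timeouts, in solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/core</str>
    <str name="httpConnTimeout">5000</str>     <!-- ms -->
    <str name="httpReadTimeout">10000</str>    <!-- ms -->
  </lst>
</requestHandler>
```

Whether either of these is the timeout being hit depends on the full error message Shawn asks for above.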
Thanks,
Shawn