Can you share with the group what the XML configuration file was? It might
help someone in the future.

Thanks for letting us know the outcome.

On Mon, Aug 5, 2019 at 6:00 PM Daniel Santos <[email protected]> wrote:

> Hello
>
> I found out the cause of the error. When I submit a job to the cluster, I
> supply an XML configuration file with properties of the cluster I am
> connecting to.
> I had to replicate some properties related to the YARN addresses in that
> configuration file.
>
> I thought that the cluster configuration would be sufficient, but no.
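A minimal sketch of what such a client-side submission file might contain, replicating the resource manager addresses from the yarn-site.xml quoted further down in this thread (the file name and the exact set of properties are assumptions; the key one is the scheduler address, since without it the client falls back to the default 0.0.0.0:8030 seen in the logs):

```xml
<!-- client-config.xml (hypothetical name), supplied at job submission -->
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoopresourcemanager</value>
    </property>
    <property>
        <!-- Without this, the client uses the default 0.0.0.0:8030 -->
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoopresourcemanager:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoopresourcemanager:8032</value>
    </property>
</configuration>
```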
>
> Thanks for your interest
> Regards
>
>
> On 5 Aug 2019, at 19:21, Jon Mack <[email protected]> wrote:
>
> It doesn't look like the client is resolving the IP address correctly
> (i.e. 0.0.0.0/0.0.0.0:8030). Try an nslookup on one of the clients (e.g.
> nslookup hadoopresourcemanager) to see what the client is resolving it
> to. Change the configuration to use the IP address instead of the hostname
> if possible.
>
> Also do a netstat -an | grep 8030 on hadoopresourcemanager to verify that
> the resource manager service is running.
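A runnable sketch of that resolution check, using getent so it works without a DNS server; localhost stands in here for the real hostname (hadoopresourcemanager), which is an assumption for demonstration only:

```shell
#!/bin/sh
# Check what this machine resolves a hostname to.
# Replace "localhost" with the actual resource manager hostname.
HOST=localhost
IP=$(getent hosts "$HOST" | awk '{print $1; exit}')
echo "$HOST resolves to $IP"
# On the resource manager itself, verify the scheduler port is listening:
# netstat -an | grep 8030
```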
>
>
> On Mon, Aug 5, 2019 at 12:38 PM Daniel Santos <[email protected]>
> wrote:
>
>> Hello,
>> I am using hosts files on all machines, centrally managed through
>> puppet. When I run the yarn startup script on the hadoopresourcemanager
>> machine it creates the node managers, one on each slave.
>>
>> Regards
>>
>> Sent from my iPhone
>>
>> On 5 Aug 2019, at 16:01, Jeff Hubbs <[email protected]> wrote:
>>
>> Does "hadoopresourcemanager" resolve to a machine that's a Hadoop
>> resource manager? In Hadoop, it's absolutely vital that all names resolve
>> correctly in both directions.
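A sketch of checking both directions, again using localhost as a stand-in for a real cluster hostname (an assumption for demonstration; run it with each node's actual name):

```shell
#!/bin/sh
# Verify that a name resolves forward (name -> IP) and back (IP -> name).
HOST=localhost
IP=$(getent hosts "$HOST" | awk '{print $1; exit}')   # forward lookup
NAME=$(getent hosts "$IP" | awk '{print $2; exit}')   # lookup by address
echo "forward: $HOST -> $IP"
echo "back:    $IP -> $NAME"
```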
>>
>> On 8/5/19 10:55 AM, Daniel Santos wrote:
>>
>> Hello Jon,
>>
>> I have the following yarn-site.xml :
>>
>> <configuration>
>>     <!-- Site specific YARN configuration properties -->
>>     <property>
>>         <name>yarn.acl.enable</name>
>>         <value>0</value>
>>     </property>
>>     <property>
>>         <name>yarn.resourcemanager.hostname</name>
>>         <value>hadoopresourcemanager</value>
>>     </property>
>>     <property>
>>         <name>yarn.nodemanager.aux-services</name>
>>         <value>mapreduce_shuffle</value>
>>     </property>
>>     <property>
>>         <name>yarn.nodemanager.resource.memory-mb</name>
>>         <value>1536</value>
>>     </property>
>>     <property>
>>         <name>yarn.scheduler.maximum-allocation-mb</name>
>>         <value>1536</value>
>>     </property>
>>     <property>
>>         <name>yarn.scheduler.minimum-allocation-mb</name>
>>         <value>128</value>
>>     </property>
>>     <property>
>>         <name>yarn.nodemanager.vmem-check-enabled</name>
>>         <value>false</value>
>>     </property>
>>     <property>
>>         <name>yarn.resourcemanager.address</name>
>>         <value>hadoopresourcemanager:8032</value>
>>     </property>
>>     <property>
>>         <name>yarn.resourcemanager.scheduler.address</name>
>>         <value>hadoopresourcemanager:8030</value>
>>     </property>
>>     <property>
>>         <name>yarn.resourcemanager.resource-tracker.address</name>
>>         <value>hadoopresourcemanager:8031</value>
>>     </property>
>> </configuration>
>>
>> So I can say I have already tried your suggestion.
>>
>> Cheers
>>
>> On 5 Aug 2019, at 15:22, Jon Mack <[email protected]> wrote:
>>
>> Looks to me like it's missing the resource manager configuration, based
>> on the port it's trying to connect to.
>>
>> On Mon, Aug 5, 2019 at 9:15 AM Daniel Santos <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> I have a cluster with one machine holding the name nodes (primary and
>>> secondary), a YARN node (resource manager), and four data nodes.
>>> I am running Hadoop 2.7.0.
>>>
>>> When I submit a job to the cluster I can see it in the scheduler
>>> webpage. If I go to the container page and check the logs, the syslog
>>> file ends with the following:
>>>
>>> 2019-08-05 14:58:05,962 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry 
>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
>>> MILLISECONDS)
>>> 2019-08-05 14:58:06,962 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry 
>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
>>> MILLISECONDS)
>>> 2019-08-05 14:58:07,963 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 4 time(s); retry 
>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
>>> MILLISECONDS)
>>> 2019-08-05 14:58:08,965 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 5 time(s); retry 
>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
>>> MILLISECONDS)
>>> 2019-08-05 14:58:09,966 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 6 time(s); retry 
>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
>>> MILLISECONDS)
>>> 2019-08-05 14:58:10,967 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 7 time(s); retry 
>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
>>> MILLISECONDS)
>>> 2019-08-05 14:58:11,968 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 8 time(s); retry 
>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
>>> MILLISECONDS)
>>> 2019-08-05 14:58:12,969 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 9 time(s); retry 
>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
>>> MILLISECONDS)
>>>
>>>
>>> I have checked the configuration of the resource manager and of the data
>>> node where the application is running, and the property
>>> yarn.resourcemanager.hostname that I have set in yarn-site.xml is shown.
>>> I have disabled IPv6 on the YARN machine, as some posts on the internet
>>> suggested. All the configuration files are the same on every node of the
>>> cluster.
>>>
>>> Still I am getting these errors, and the application ends with a timeout.
>>>
>>> What am I doing wrong ?
>>>
>>> Thanks
>>> Regards
>>>
>>
>>
>>
>
