> Markus Jelsma wrote on 01/06/12 at 06:22:54 -0800:
> >Hi,
> >
> >We sometimes see tasks failing with the exception below. There are no
> >network issues and the domain name resolves normally. Also, all nodes have
> >a local DNS caching daemon running. Any idea why we see this error? It
> >usually happens when there is more than one job running on the cluster.
> >
> >We could, of course, add all nodes to /etc/hosts, but I'd prefer not to.
> >
> >java.net.UnknownHostException: unknown host: namenode
>
> Is 'namenode' here an FQDN? Both forward and reverse lookups should
> resolve to the same name. If this is fine, then you may want to check
> your local caching resolver.

Yes, this is an FQDN. We've checked pdnsd but found nothing out of the
ordinary so far. Will keep looking.

Thanks

> >
> >	at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
> >	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1192)
> >	at org.apache.hadoop.ipc.Client.call(Client.java:1046)
> >	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
> >	at $Proxy2.getProtocolVersion(Unknown Source)
> >	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
> >	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
> >	at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:118)
> >	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:222)
> >	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:187)
> >	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
> >	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1328)
> >	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:65)
> >	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1346)
> >	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
> >	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:122)
> >	at org.apache.hadoop.mapred.Child$4.run(Child.java:254)
> >	at java.security.AccessController.doPrivileged(Native Method)
> >	at javax.security.auth.Subject.doAs(Subject.java:396)
> >	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >
> >Thanks
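
For reference, the forward/reverse check suggested above can also be run from inside a JVM on an affected node, which exercises the same resolver path the task JVMs use. The following is only a minimal sketch: the class name `DnsRoundTrip` and its helper methods are made up for illustration, and the default host `localhost` is a placeholder for the actual namenode FQDN. It relies on nothing beyond `java.net.InetAddress`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

// Hypothetical diagnostic class, not part of Hadoop.
class DnsRoundTrip {

    /** Lower-cases a host name and drops one trailing dot, if present. */
    static String normalize(String name) {
        String n = name.trim().toLowerCase(Locale.ROOT);
        return n.endsWith(".") ? n.substring(0, n.length() - 1) : n;
    }

    /** True if both names refer to the same host name after normalization. */
    static boolean sameName(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    /** Forward-resolves the host, reverse-resolves the address, compares names. */
    static boolean roundTrips(String host) throws UnknownHostException {
        // Forward lookup; throws UnknownHostException if the name does not resolve.
        InetAddress forward = InetAddress.getByName(host);

        // Rebuild the address from raw bytes so the object carries no host name
        // and getCanonicalHostName() has to perform a reverse lookup.
        InetAddress bare = InetAddress.getByAddress(forward.getAddress());

        // Falls back to the textual IP address if the reverse lookup fails.
        String reverse = bare.getCanonicalHostName();

        System.out.println(host + " -> " + forward.getHostAddress() + " -> " + reverse);
        return sameName(host, reverse);
    }

    public static void main(String[] args) {
        // Placeholder default; pass the namenode FQDN as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        try {
            System.out.println(roundTrips(host) ? "forward/reverse match" : "MISMATCH");
        } catch (UnknownHostException e) {
            System.out.println("forward lookup failed: " + e.getMessage());
        }
    }
}
```

Running it with the namenode FQDN as the only argument on a node where tasks have failed would show whether the forward and reverse names agree there. A mismatch, or a reverse result that is just the IP address, would point at the DNS records or the local caching resolver rather than at Hadoop.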
