We have an existing CDH 5.5.1 cluster with simple authentication and no 
authorization. We are building out a new cluster and plan to move to CDH 5.8.2 
with Kerberos-based authentication. We have an existing MIT Kerberos 
infrastructure which we successfully use for a variety of services 
(ssh, apache, postfix).

I am confident that our /etc/krb5.conf and name resolution are working. I 
have even used HadoopDNSVerifier-1.0.jar to verify that Java sees the same 
name canonicalization that we see.
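For anyone who wants to reproduce the kind of check I mean without the jar, here is a rough Python equivalent I put together (my own sketch, not the HadoopDNSVerifier code): forward-resolve a host to its canonical name and address, then reverse-resolve the address, and compare the two views.

```python
import socket

def canonicalize(host):
    """Forward-resolve host; return (canonical name, first address)."""
    info = socket.getaddrinfo(host, None, flags=socket.AI_CANONNAME)
    canon = info[0][3] or host   # canonical name reported by the resolver
    addr = info[0][4][0]
    return canon, addr

def reverse_name(addr):
    """Reverse-resolve an address back to a hostname."""
    return socket.gethostbyaddr(addr)[0]

if __name__ == "__main__":
    # Substitute each cluster hostname here, e.g. "aw1hdnn002.tnbsound.com".
    canon, addr = canonicalize("localhost")
    print("forward:", canon, addr)
    try:
        print("reverse:", reverse_name(addr))
    except OSError as e:
        print("reverse lookup failed:", e)
```

If the forward and reverse views disagree for any cluster host, Kerberos principal construction can disagree between hosts as well.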

I have built a test cluster and closely followed the instructions in the 
secure Hadoop install doc from the Cloudera site, making sure that all the 
conf files are properly edited and all the Kerberos keytabs contain the 
correct principals and have the correct permissions.

We are using HA NameNodes with Quorum-based JournalNodes.

I am running into a persistent problem with many Hadoop components when they 
need to talk securely to remote servers. The two examples I post here are the 
NameNode needing to talk to remote JournalNodes, and the command-line HDFS 
client needing to speak to a remote NameNode. Both give the same error:

Server has invalid Kerberos principal: 
hdfs/[email protected]; Host Details : local host is: 
"aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: 
"aw1hdnn002.tnbsound.com":8020;

There is not much on the web about this, and the error that is showing up 
leads me to believe that the issue is around the Kerberos realm being used 
in one place and not the other.
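For completeness, this is roughly the [domain_realm] mapping I would expect us to need in /etc/krb5.conf for this domain (illustrative of our setup, trimmed from the real file):

```
[domain_realm]
    .tnbsound.com = TNBSOUND.COM
    tnbsound.com  = TNBSOUND.COM
```

If that mapping were missing or wrong on some hosts, I could see the realm differing between client and server, but as far as I can tell it is consistent everywhere.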

I just cannot seem to figure out what is going on here, as I know these are 
valid principals. I have added a snippet at the end where I have enabled 
Kerberos debugging to see if that helps at all.

The weird part is that this error applies only to remote daemons. The local 
NameNode and JournalNode do not have the issue. We can "speak" locally but 
not remotely.
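Reading the Hadoop docs as best I can, my mental model of the check that fails is something like the sketch below (my own Python, not Hadoop's actual code): the client builds the server principal it expects from the configured principal, substituting _HOST with the destination hostname when the pattern contains it, and rejects the connection when the server presents anything else.

```python
import socket

def expected_server_principal(pattern, dest_host, canonicalize=socket.getfqdn):
    """Build the server principal the client will expect for dest_host."""
    service, rest = pattern.split("/", 1)
    if rest.startswith("_HOST"):
        # _HOST is replaced with the canonicalized destination hostname.
        host = canonicalize(dest_host).lower()
        return service + "/" + host + rest[len("_HOST"):]
    # A literal hostname in the pattern is expected verbatim.
    return pattern

# Stand-in canonicalizer so this runs anywhere: pretend DNS is the identity.
identity = lambda h: h

# With _HOST, the expectation tracks the destination host:
print(expected_server_principal("hdfs/[email protected]",
                                "aw1hdnn002.tnbsound.com", identity))
# hdfs/[email protected]

# With a literal hostname, the expectation is fixed no matter which
# server the client actually connects to:
print(expected_server_principal("hdfs/[email protected]",
                                "aw1hdnn002.tnbsound.com", identity))
# hdfs/[email protected]
```

If my model is right, a literal local hostname in the client-side principal config would explain exactly this local-works/remote-fails pattern, but I may be misreading how the substitution works.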

Any and all help is greatly appreciated.

#
# This is me with hdfs Kerberos credentials trying to run
# hdfs dfsadmin -refreshServiceAcl
#

hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 53$ klist
Ticket cache: FILE:/tmp/krb5cc_115
Default principal: hdfs/[email protected]
Valid starting Expires Service principal
10/20/2016 15:34:49 10/21/2016 15:34:49 krbtgt/[email protected]
renew until 10/27/2016 15:34:49

hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 54$ hdfs dfsadmin -refreshServiceAcl
Refresh service acl successful for aw1hdnn001.tnbsound.com/10.132.8.19:8020
refreshServiceAcl: Failed on local exception: java.io.IOException: 
java.lang.IllegalArgumentException: Server has invalid Kerberos principal: 
hdfs/[email protected]; Host Details : local host is: 
"aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: 
"aw1hdnn002.tnbsound.com":8020;

#
# This is the NameNode trying to start up and contact an off-server JournalNode
#
2016-10-20 16:51:40,703 WARN org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:hdfs/[email protected] 
(auth:KERBEROS) cause:java.io.IOException: java.lang.IllegalArgumentException: 
Server has invalid Kerberos principal: hdfs/[email protected]
10.132.8.21:8485: Failed on local exception: java.io.IOException: 
java.lang.IllegalArgumentException: Server has invalid Kerberos principal: 
hdfs/[email protected]; Host Details : local host is: 
"aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: 
"aw1hdrm001.tnbsound.com":8485;

#
# This is me with hdfs Kerberos credentials trying to run
# hdfs dfsadmin -refreshServiceAcl with debug info
#
hdfs@aw1hdnn001 /var/lib/hadoop-hdfs 46$ 
HADOOP_OPTS="-Dsun.security.krb5.debug=true" hdfs dfsadmin -refreshServiceAcl
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
>>>KinitOptions cache name is /tmp/krb5cc_115
>>>DEBUG <CCacheInputStream> client principal is 
>>>hdfs/[email protected]
>>>DEBUG <CCacheInputStream> server principal is 
>>>krbtgt/[email protected]
>>>DEBUG <CCacheInputStream> key type: 18
>>>DEBUG <CCacheInputStream> auth time: Thu Oct 20 16:55:42 UTC 2016
>>>DEBUG <CCacheInputStream> start time: Thu Oct 20 16:55:42 UTC 2016
>>>DEBUG <CCacheInputStream> end time: Fri Oct 21 16:55:42 UTC 2016
>>>DEBUG <CCacheInputStream> renew_till time: Thu Oct 27 16:55:42 UTC 2016
>>> CCacheInputStream: readFlags() FORWARDABLE; PROXIABLE; RENEWABLE; INITIAL; 
>>> PRE_AUTH;
>>>DEBUG <CCacheInputStream> client principal is 
>>>hdfs/[email protected]
>>>DEBUG <CCacheInputStream> server principal is 
>>>X-CACHECONF:/krb5_ccache_conf_data/fast_avail/krbtgt/[email protected]
>>>DEBUG <CCacheInputStream> key type: 0
>>>DEBUG <CCacheInputStream> auth time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG <CCacheInputStream> start time: null
>>>DEBUG <CCacheInputStream> end time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG <CCacheInputStream> renew_till time: null
>>> CCacheInputStream: readFlags()
>>>DEBUG <CCacheInputStream> client principal is 
>>>hdfs/[email protected]
>>>DEBUG <CCacheInputStream> server principal is 
>>>X-CACHECONF:/krb5_ccache_conf_data/pa_type/krbtgt/[email protected]
>>>DEBUG <CCacheInputStream> key type: 0
>>>DEBUG <CCacheInputStream> auth time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG <CCacheInputStream> start time: null
>>>DEBUG <CCacheInputStream> end time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG <CCacheInputStream> renew_till time: null
>>> CCacheInputStream: readFlags()
Found ticket for hdfs/[email protected] to go to 
krbtgt/[email protected] expiring on Fri Oct 21 16:55:42 UTC 2016
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for hdfs/[email protected] to go to 
krbtgt/[email protected] expiring on Fri Oct 21 16:55:42 UTC 2016
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23 1 3.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KdcAccessibility: reset
>>> KrbKdcReq send: kdc=dc1util003.tnbsound.com UDP:88, timeout=30000, number 
>>> of retries =3, #bytes=734
>>> KDCCommunication: kdc=dc1util003.tnbsound.com UDP:88, timeout=30000,Attempt 
>>> =1, #bytes=734
>>> KrbKdcReq send: #bytes read=721
>>> KdcAccessibility: remove dc1util003.tnbsound.com
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 561537595
Created InitSecContextToken:
0000: 01 00 6E 82 02 7F 30 82 02 7B A0 03 02 01 05 A1 ..n...0.........
0010: 03 02 01 0E A2 07 03 05 00 20 00 00 00 A3 82 01 ......... ......
0020: 7A 61 82 01 76 30 82 01 72 A0 03 02 01 05 A1 0E za..v0..r.......
0030: 1B 0C 54 4E 42 53 4F 55 4E 44 2E 43 4F 4D A2 2A ..TNBSOUND.COM.*
0040: 30 28 A0 03 02 01 00 A1 21 30 1F 1B 04 68 64 66 0(......!0...hdf
0050: 73 1B 17 61 77 31 68 64 6E 6E 30 30 31 2E 74 6E s..aw1hdnn001.tn
0060: 62 73 6F 75 6E 64 2E 63 6F 6D A3 82 01 2D 30 82 bsound.com...-0.
0070: 01 29 A0 03 02 01 12 A1 03 02 01 01 A2 82 01 1B .)..............
0080: 04 82 01 17 04 6E 26 46 08 EA 9C 61 08 80 B8 4B .....n&F...a...K
0090: AF 7C D2 CD 5E 47 19 3D A1 FB CD 8D 41 F4 C9 49 ....^G.=....A..I
00A0: 09 95 1C C7 9A D8 1B 92 0F 3C E0 5F 41 BF 99 96 .........<._A...
00B0: 42 A9 2D 17 D6 F0 AB 41 72 3E 7E F7 13 33 E2 0A B.-....Ar>...3..
00C0: 2D F5 71 AD 97 9A 9D 7F E0 EA 1A 29 7C D4 47 AB -.q........)..G.
00D0: B4 7E C1 A1 C5 28 DD 46 F1 C4 17 0B FC DB C9 D3 .....(.F........
00E0: F4 4D C2 1F 6C 59 A6 C4 9E 9D FD 56 E3 B0 31 E6 .M..lY.....V..1.
00F0: C6 6E 50 44 2C 07 44 91 40 F7 C8 6E AD 1E FB 26 .nPD,[email protected]...&
0100: EC 6D E4 ED BC F8 15 17 0B 31 B6 4B 68 64 03 E4 .m.......1.Khd..
0110: 28 9B A5 9D AE 2A DF 1B BD 0F B2 AE B3 BB E0 4D (....*.........M
0120: 14 D1 9C E0 AC 99 59 1B B6 28 22 E2 B5 55 52 58 ......Y..("..URX
0130: D2 61 39 DE 8F C8 3F E6 6F EB 41 5D E1 F2 43 40 .a9...?.o.A]..C@
0140: 8F AC 78 C8 09 35 7B BA 39 6B CD C6 01 7B 90 0B ..x..5..9k......
0150: 20 0C 49 0D 8B E5 2B F1 E6 6F 38 4E EA DF 5C A9 .I...+..o8N..\.
0160: 40 AE 11 75 AE B2 E2 35 13 A8 CE CF E7 F5 92 CB @..u...5........
0170: A5 66 53 47 92 5A EF 31 CD 60 CD 67 46 D0 B7 0D .fSG.Z.1.`.gF...
0180: B6 76 FE 09 B1 03 16 FE B8 57 6E 08 9A E6 DD F8 .v.......Wn.....
0190: D3 AA 00 54 6C D4 70 61 95 08 CF A4 81 E7 30 81 ...Tl.pa......0.
01A0: E4 A0 03 02 01 12 A2 81 DC 04 81 D9 4E 48 9E 35 ............NH.5
01B0: 57 7C 7C 54 1C 9F 41 FE F3 C0 94 07 E2 D8 EE 38 W..T..A........8
01C0: BA 4A DA 97 43 04 B5 96 F6 A9 34 FD 54 FF 7B 96 .J..C.....4.T...
01D0: DA DD A9 6F C4 7B A5 E4 50 9F 9E 1A 62 D3 F3 3C ...o....P...b..<
01E0: 50 50 E9 02 05 F2 37 52 4D BC 86 D8 2B A4 9F FE PP....7RM...+...
01F0: 97 4C 01 7F E6 B4 8B 66 1F 6E 63 FD 3F EF 57 E9 .L.....f.nc.?.W.
0200: 04 E9 BE 28 4C 03 BC 26 EB EF EC DC 8C 48 C0 51 ...(L..&.....H.Q
0210: 7B 2B 5B 0F 16 7C 83 D0 73 F9 2A 94 CF 67 F2 F8 .+[.....s.*..g..
0220: 11 CC 2B E9 0D FE 95 F5 7E 2B C4 40 19 FE FE 6F [email protected]
0230: B7 C4 B8 7E 87 D1 0A 98 8A F2 B0 1A DF FA 27 24 ..............'$
0240: C2 EE 06 FE 3F 36 57 3D 6C B9 F3 18 98 19 D6 A1 ....?6W=l.......
0250: F4 49 57 5D 58 6E 88 C9 2E 1F FA 7D 53 24 B9 67 .IW]Xn......S$.g
0260: 02 85 C2 2C 01 25 18 BA BF 0E 64 A2 C3 06 7D AC ...,.%....d.....
0270: D6 11 A6 F4 ED 47 71 22 CC D4 E8 54 08 17 51 E6 .....Gq"...T..Q.
0280: EE 6F FE 31 37 .o.17
Entered Krb5Context.initSecContext with state=STATE_IN_PROCESS
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting peerSeqNumber to: 374605590
Krb5Context.unwrap: token=[05 04 01 ff 00 0c 00 00 00 00 00 00 16 54 07 16 01 
01 00 00 c5 67 32 c5 74 d0 68 ef 82 46 a8 85 ]
Krb5Context.unwrap: data=[01 01 00 00 ]
Krb5Context.wrap: data=[01 01 00 00 ]
Krb5Context.wrap: token=[05 04 00 ff 00 0c 00 00 00 00 00 00 21 78 62 3b 01 01 
00 00 a1 51 c9 92 95 bd cd 88 66 59 b7 49 ]
Refresh service acl successful for aw1hdnn001.tnbsound.com/10.132.8.19:8020
refreshServiceAcl: Failed on local exception: java.io.IOException: 
java.lang.IllegalArgumentException: Server has invalid Kerberos principal: 
hdfs/[email protected]; Host Details : local host is: 
"aw1hdnn001.tnbsound.com/10.132.8.19"; destination host is: 
"aw1hdnn002.tnbsound.com":8020;

#
# hdfs-site.xml
#

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

    <!--               -->
    <!-- HDFS security -->
    <!--               -->

    <property>
        <name>dfs.block.access.token.enable</name>
        <value>true</value>
    </property>

    <!--              -->
    <!-- HA namespace -->
    <!--              -->

    <property>
        <name>dfs.nameservices</name>
        <value>nbs-aw1-test</value>
    </property>

    <!--              -->
    <!-- HA namenodes -->
    <!--              -->

    <property>
        <name>dfs.ha.namenodes.nbs-aw1-test</name>
        <value>nn1,nn2</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.nbs-aw1-test.nn1</name>
        <value>aw1hdnn001.tnbsound.com:8020</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.nbs-aw1-test.nn1</name>
        <value>aw1hdnn001.tnbsound.com:50070</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.nbs-aw1-test.nn2</name>
        <value>aw1hdnn002.tnbsound.com:8020</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.nbs-aw1-test.nn2</name>
        <value>aw1hdnn002.tnbsound.com:50070</value>
    </property>

    <!--              -->
    <!-- FS image dir -->
    <!--              -->

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/lib/hadoop-hdfs/dfs/name</value>
    </property>

    <!--            -->
    <!-- QJM config -->
    <!--            -->

    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        
<value>qjournal://aw1hdnn001.tnbsound.com:8485;aw1hdnn002.tnbsound.com:8485;aw1hdrm001.tnbsound.com:8485/nbs-aw1-test</value>
    </property>

    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/var/lib/hadoop-hdfs/dfs/journal</value>
    </property>

    <!--                      -->
    <!-- JournalNode security -->
    <!--                      -->

    <property>
        <name>dfs.journalnode.keytab.file</name>
        <value>/etc/krb5/hdfs.keytab</value>
    </property>

    <property>
        <name>dfs.journalnode.kerberos.principal</name>
        <value>hdfs/[email protected]</value>
    </property>

    <property>
        <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
        <value>HTTP/[email protected]</value>
    </property>

    <!--                   -->
    <!-- Namenode failover -->
    <!--                   -->

    <property>
        <name>dfs.client.failover.proxy.provider.nbs-aw1-test</name>
        
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence
        shell(/bin/true)</value>
    </property>

    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
    </property>

    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>3000</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        
<value>aw1zook001.tnbsound.com:2181,aw1zook002.tnbsound.com:2181,aw1zook003.tnbsound.com:2181</value>
    </property>

    <!--                      -->
    <!-- NameNode security -->
    <!--                      -->

    <property>
        <name>dfs.namenode.keytab.file</name>
        <value>/etc/krb5/hdfs.keytab</value>
    </property>

    <property>
        <name>dfs.namenode.kerberos.principal</name>
        <value>hdfs/[email protected]</value>
    </property>

    <property>
        <name>dfs.namenode.kerberos.internal.spnego.principal</name>
        <value>HTTP/[email protected]</value>
    </property>

    <!--          -->
    <!-- Datanode -->
    <!--          -->

    <property>
        <name>dfs.datanode.data.dir</name>
        
<value>/data01/hadoop-hdfs/dfs/data,/data02/hadoop-hdfs/dfs/data,/data03/hadoop-hdfs/dfs/data,/data04/hadoop-hdfs/dfs/data</value>
    </property>

    <property>
        <name>dfs.datanode.failed.volumes.tolerated</name>
        <value>0</value>
    </property>

    <property>
        <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
        
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
    </property>

    <property>
        
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
        <value>107374182400</value>
    </property>

    <property>
        
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
        <value>0.75</value>
    </property>

    <!--                   -->
    <!-- DataNode security -->
    <!--                   -->

    <property>
        <name>dfs.datanode.data.dir.perm</name>
        <value>700</value>
    </property>

    <property>
        <name>dfs.datanode.keytab.file</name>
        <value>/etc/krb5/hdfs.keytab</value>
    </property>

    <property>
        <name>dfs.datanode.kerberos.principal</name>
        <value>hdfs/[email protected]</value>
    </property>

    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:1004</value>
    </property>

    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:1006</value>
    </property>

    <!--      -->
    <!-- Misc -->
    <!--      -->

    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>

    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>hadoop</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.hosts.exclude</name>
        <value>/etc/hadoop/conf/hosts.exclude</value>
        <final>true</final>
    </property>

    <!--
    From O'Reilly Hadoop Operations: A general guideline for setting
    dfs.namenode.handler.count is to make it the natural logarithm of
    the number of cluster nodes times 20 (as a whole number).  python -c
    'import math ; print int(math.log(num_of_nodes) * 20)'
    -->
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>24</value>
    </property>

    <!--              -->
    <!-- Web security -->
    <!--              -->

    <property>
        <name>dfs.web.authentication.kerberos.keytab</name>
        <value>/etc/krb5/hdfs.keytab</value>
    </property>

    <property>
        <name>dfs.web.authentication.kerberos.principal</name>
        <value>HTTP/[email protected]</value>
    </property>

    <property>
        <name>dfs.http.policy</name>
        <value>HTTP_ONLY</value>
    </property>

</configuration>
