gardenia opened a new issue, #2032:
URL: https://github.com/apache/iceberg-python/issues/2032

   ### Apache Iceberg version
   
   None
   
   ### Please describe the bug 🐞
   
   Hi,
   
   I'm using the following code to connect to a Kerberized Hive metastore:
   
   ```
   from pyiceberg.catalog import load_catalog
   
   # Set up the Iceberg catalog
   catalog = load_catalog("hive", **{
           "type": "hive",
           "uri": "thrift://cluster1-hive-server:9083",
           "hive.kerberos-authentication": "true"
   })
   print("Initial Namespaces:", catalog.list_namespaces())
   ```
   
   Before running this I did a kinit:
     kinit -kt /var/keytabs/hive.keytab hiveuser/cluster1-hive-ser...@cluster1.com
   
   When I run the script I get the following error:
   
   ```
   Traceback (most recent call last):
     File "/home/sandbox-user/connect-to-hive-metastore-and-list-namespaces.py", line 20, in <module>
       print("Initial Namespaces:", catalog.list_namespaces())
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/sandbox-user/venv/lib/python3.12/site-packages/pyiceberg/catalog/hive.py", line 707, in list_namespaces
       with self._client as open_client:
     File "/home/sandbox-user/venv/lib/python3.12/site-packages/pyiceberg/catalog/hive.py", line 172, in __enter__
       self._transport.open()
     File "/home/sandbox-user/venv/lib/python3.12/site-packages/thrift/transport/TTransport.py", line 381, in open
       self.send_sasl_msg(self.OK, self.sasl.process())
                                   ^^^^^^^^^^^^^^^^^^^
     File "/usr/lib/python3/dist-packages/puresasl/client.py", line 16, in wrapped
       return f(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/lib/python3/dist-packages/puresasl/client.py", line 148, in process
       return self._chosen_mech.process(challenge)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/lib/python3/dist-packages/puresasl/mechanisms.py", line 495, in process
       kerberos.authGSSClientStep(self.context, '')
   kerberos.GSSError: (('Unspecified GSS failure.  Minor code may provide more information', 851968), ('Server hive/cluster1-hive-ser...@cluster1.com not found in Kerberos database', -1765328377))
   ```
   
   NOTE: I can connect just fine with Java Iceberg in the same situation.
   
   I then ran the script with KRB5_TRACE=/dev/stdout and captured the following additional output:
   
   ```
   [345] 1747909692.415510: ccselect module realm chose cache FILE:/tmp/krb5cc_1001 with client principal hiveuser/cluster1-hive-ser...@cluster1.com for server principal hive/cluster1-hive-ser...@cluster1.com
   [345] 1747909692.415511: Getting credentials hiveuser/cluster1-hive-ser...@cluster1.com -> hive/cluster1-hive-ser...@cluster1.com using ccache FILE:/tmp/krb5cc_1001
   [345] 1747909692.415512: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> krb5_ccache_conf_data/start_realm@X-CACHECONF: from FILE:/tmp/krb5cc_1001 with result: -1765328243/Matching credential not found (filename: /tmp/krb5cc_1001)
   [345] 1747909692.415513: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> hive/cluster1-hive-ser...@cluster1.com from FILE:/tmp/krb5cc_1001 with result: -1765328243/Matching credential not found (filename: /tmp/krb5cc_1001)
   [345] 1747909692.415514: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> krbtgt/cluster1....@cluster1.com from FILE:/tmp/krb5cc_1001 with result: 0/Success
   [345] 1747909692.415515: Starting with TGT for client realm: hiveuser/cluster1-hive-ser...@cluster1.com -> krbtgt/cluster1....@cluster1.com
   [345] 1747909692.415516: Requesting tickets for hive/cluster1-hive-ser...@cluster1.com, referrals on
   [345] 1747909692.415517: Generated subkey for TGS request: aes256-cts/6798
   [345] 1747909692.415518: etypes requested in TGS request: aes256-cts
   [345] 1747909692.415520: Encoding request body and padata into FAST request
   [345] 1747909692.415521: Sending request (1080 bytes) to CLUSTER1.COM
   [345] 1747909692.415522: Resolving hostname cluster1-kerberos-server
   [345] 1747909692.415523: Sending initial UDP request to dgram 192.168.0.5:88
   [345] 1747909692.415524: Received answer (468 bytes) from dgram 192.168.0.5:88
   [345] 1747909692.415525: Response was not from primary KDC
   [345] 1747909692.415526: Decoding FAST response
   [345] 1747909692.415527: TGS request result: -1765328377/Server hive/cluster1-hive-ser...@cluster1.com not found in Kerberos database
   [345] 1747909692.415528: Requesting tickets for hive/cluster1-hive-ser...@cluster1.com, referrals off
   [345] 1747909692.415529: Generated subkey for TGS request: aes256-cts/5F8A
   [345] 1747909692.415530: etypes requested in TGS request: aes256-cts
   [345] 1747909692.415532: Encoding request body and padata into FAST request
   [345] 1747909692.415533: Sending request (1080 bytes) to CLUSTER1.COM
   [345] 1747909692.415534: Resolving hostname cluster1-kerberos-server
   [345] 1747909692.415535: Sending initial UDP request to dgram 192.168.0.5:88
   [345] 1747909692.415536: Received answer (468 bytes) from dgram 192.168.0.5:88
   [345] 1747909692.415537: Response was not from primary KDC
   [345] 1747909692.415538: Decoding FAST response
   [345] 1747909692.415539: TGS request result: -1765328377/Server hive/cluster1-hive-ser...@cluster1.com not found in Kerberos database
   ```
   
   To me this line stands out:
   
   ```
   [345] 1747909692.415511: Getting credentials hiveuser/cluster1-hive-ser...@cluster1.com -> hive/cluster1-hive-ser...@cluster1.com using ccache FILE:/tmp/krb5cc_1001
   ```
   
   It was not clear to me why the "hiveuser" prefix in the principal was being remapped to "hive", or where that remapping was coming from. At first I thought it might be something in my krb5.conf (or perhaps something that should be there but isn't). But the fact that this works fine with Java Iceberg makes me question that.
   
   In an effort to explain the above, I looked in the pyiceberg code and found this line in pyiceberg/catalog/hive.py:
   
   ```
               return TTransport.TSaslClientTransport(socket, host=url_parts.hostname, service="hive")
   ```
   
   When I speculatively changed service="hive" to service="hiveuser" in that line of hive.py and re-ran the script, it worked as expected:
   
   ```
   [350] 1747910147.748592: ccselect module realm chose cache FILE:/tmp/krb5cc_1001 with client principal hiveuser/cluster1-hive-ser...@cluster1.com for server principal hiveuser/cluster1-hive-ser...@cluster1.com
   [350] 1747910147.748593: Getting credentials hiveuser/cluster1-hive-ser...@cluster1.com -> hiveuser/cluster1-hive-ser...@cluster1.com using ccache FILE:/tmp/krb5cc_1001
   [350] 1747910147.748594: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> krb5_ccache_conf_data/start_realm@X-CACHECONF: from FILE:/tmp/krb5cc_1001 with result: -1765328243/Matching credential not found (filename: /tmp/krb5cc_1001)
   [350] 1747910147.748595: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> hiveuser/cluster1-hive-ser...@cluster1.com from FILE:/tmp/krb5cc_1001 with result: 0/Success
   [350] 1747910147.748596: Creating authenticator for hiveuser/cluster1-hive-ser...@cluster1.com -> hiveuser/cluster1-hive-ser...@cluster1.com, seqnum 821973613, subkey aes256-cts/C292, session key aes256-cts/8A76
   [350] 1747910147.748598: Read AP-REP, time 1747910147.748597, subkey (null), seqnum 214032946
   Initial Namespaces: [('default',)]
   ```
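
The two traces are consistent with the requested server principal being derived from the `service` argument. A minimal sketch of that reading, assuming the common Kerberos convention of `service/host@REALM` for host-based services; the function name, host, and realm below are hypothetical placeholders, not values from this setup or from any library:

```python
def server_principal(service: str, host: str, realm: str) -> str:
    """Build a host-based Kerberos service principal, e.g. hive/host@REALM."""
    return f"{service}/{host}@{realm}"


# Hypothetical host and realm, for illustration only.
HOST = "metastore.example.com"
REALM = "EXAMPLE.COM"

# service="hive" would ask the KDC for a ticket to hive/<host>@<realm> ...
print(server_principal("hive", HOST, REALM))
# ... while service="hiveuser" would ask for hiveuser/<host>@<realm>.
print(server_principal("hiveuser", HOST, REALM))
```

If the metastore's principal is only registered under the second form, a ticket request for the first would be expected to fail with "not found in Kerberos database", which matches the first trace.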
   
   Obviously this band-aid is very specific to my situation, but the fact that it worked makes me wonder whether the hard-coded "hive" service name should be a parameter, be auto-detected, or otherwise not be hard-coded.
   
   My questions are:
   * Is there something I'm missing in my usage of pyiceberg that would let me avoid this problem without resorting to this band-aid?
   * If not, is an enhancement needed in pyiceberg/catalog/hive.py to make the hard-coded "hive" service name configurable?
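
A configurable service name could look roughly like the following. This is only a sketch: the property key `hive.kerberos-service-name` and the helper `kerberos_service_name` are assumed names chosen for illustration, not options confirmed by this report; the default of "hive" mirrors the value hard-coded in the hive.py line quoted above.

```python
from typing import Mapping

# Hypothetical property key (an assumption for illustration).
KERBEROS_SERVICE_NAME = "hive.kerberos-service-name"
# Mirrors the value hard-coded in pyiceberg/catalog/hive.py as quoted above.
DEFAULT_KERBEROS_SERVICE_NAME = "hive"


def kerberos_service_name(properties: Mapping[str, str]) -> str:
    """Return the configured Kerberos service name, falling back to "hive"."""
    value = properties.get(KERBEROS_SERVICE_NAME, "").strip()
    return value or DEFAULT_KERBEROS_SERVICE_NAME


# The transport line quoted above could then become something like:
#   TTransport.TSaslClientTransport(
#       socket, host=url_parts.hostname, service=kerberos_service_name(properties))
print(kerberos_service_name({"hive.kerberos-service-name": "hiveuser"}))
print(kerberos_service_name({}))
```

Keeping "hive" as the default would leave existing setups unchanged, while a deployment like the one described here could set the property to "hiveuser" in the `load_catalog` call.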
   
   
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time

