gardenia opened a new issue, #2032: URL: https://github.com/apache/iceberg-python/issues/2032
### Apache Iceberg version

None

### Please describe the bug 🐞

Hi, I'm using the following code to connect to a kerberized hive metastore:

```
from pyiceberg.catalog import load_catalog

# Set up the Iceberg catalog
catalog = load_catalog("hive", **{
    "type": "hive",
    "uri": "thrift://cluster1-hive-server:9083",
    "hive.kerberos-authentication": "true"
})

print("Initial Namespaces:", catalog.list_namespaces())
```

Before running this I did a kinit:

kinit -kt /var/keytabs/hive.keytab hiveuser/cluster1-hive-ser...@cluster1.com

When I run the script I get the following error:

```
Traceback (most recent call last):
  File "/home/sandbox-user/connect-to-hive-metastore-and-list-namespaces.py", line 20, in <module>
    print("Initial Namespaces:", catalog.list_namespaces())
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sandbox-user/venv/lib/python3.12/site-packages/pyiceberg/catalog/hive.py", line 707, in list_namespaces
    with self._client as open_client:
  File "/home/sandbox-user/venv/lib/python3.12/site-packages/pyiceberg/catalog/hive.py", line 172, in __enter__
    self._transport.open()
  File "/home/sandbox-user/venv/lib/python3.12/site-packages/thrift/transport/TTransport.py", line 381, in open
    self.send_sasl_msg(self.OK, self.sasl.process())
                                ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/puresasl/client.py", line 16, in wrapped
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/puresasl/client.py", line 148, in process
    return self._chosen_mech.process(challenge)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/puresasl/mechanisms.py", line 495, in process
    kerberos.authGSSClientStep(self.context, '')
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Server hive/cluster1-hive-ser...@cluster1.com not found in Kerberos database', -1765328377))
```

NOTE: I can connect just fine with Java Iceberg in the same situation.
I then ran the script with KRB5_TRACE=/dev/stdout and captured the following additional output:

```
[345] 1747909692.415510: ccselect module realm chose cache FILE:/tmp/krb5cc_1001 with client principal hiveuser/cluster1-hive-ser...@cluster1.com for server principal hive/cluster1-hive-ser...@cluster1.com
[345] 1747909692.415511: Getting credentials hiveuser/cluster1-hive-ser...@cluster1.com -> hive/cluster1-hive-ser...@cluster1.com using ccache FILE:/tmp/krb5cc_1001
[345] 1747909692.415512: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> krb5_ccache_conf_data/start_realm@X-CACHECONF: from FILE:/tmp/krb5cc_1001 with result: -1765328243/Matching credential not found (filename: /tmp/krb5cc_1001)
[345] 1747909692.415513: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> hive/cluster1-hive-ser...@cluster1.com from FILE:/tmp/krb5cc_1001 with result: -1765328243/Matching credential not found (filename: /tmp/krb5cc_1001)
[345] 1747909692.415514: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> krbtgt/cluster1....@cluster1.com from FILE:/tmp/krb5cc_1001 with result: 0/Success
[345] 1747909692.415515: Starting with TGT for client realm: hiveuser/cluster1-hive-ser...@cluster1.com -> krbtgt/cluster1....@cluster1.com
[345] 1747909692.415516: Requesting tickets for hive/cluster1-hive-ser...@cluster1.com, referrals on
[345] 1747909692.415517: Generated subkey for TGS request: aes256-cts/6798
[345] 1747909692.415518: etypes requested in TGS request: aes256-cts
[345] 1747909692.415520: Encoding request body and padata into FAST request
[345] 1747909692.415521: Sending request (1080 bytes) to CLUSTER1.COM
[345] 1747909692.415522: Resolving hostname cluster1-kerberos-server
[345] 1747909692.415523: Sending initial UDP request to dgram 192.168.0.5:88
[345] 1747909692.415524: Received answer (468 bytes) from dgram 192.168.0.5:88
[345] 1747909692.415525: Response was not from primary KDC
[345] 1747909692.415526: Decoding FAST response
[345] 1747909692.415527: TGS request result: -1765328377/Server hive/cluster1-hive-ser...@cluster1.com not found in Kerberos database
[345] 1747909692.415528: Requesting tickets for hive/cluster1-hive-ser...@cluster1.com, referrals off
[345] 1747909692.415529: Generated subkey for TGS request: aes256-cts/5F8A
[345] 1747909692.415530: etypes requested in TGS request: aes256-cts
[345] 1747909692.415532: Encoding request body and padata into FAST request
[345] 1747909692.415533: Sending request (1080 bytes) to CLUSTER1.COM
[345] 1747909692.415534: Resolving hostname cluster1-kerberos-server
[345] 1747909692.415535: Sending initial UDP request to dgram 192.168.0.5:88
[345] 1747909692.415536: Received answer (468 bytes) from dgram 192.168.0.5:88
[345] 1747909692.415537: Response was not from primary KDC
[345] 1747909692.415538: Decoding FAST response
[345] 1747909692.415539: TGS request result: -1765328377/Server hive/cluster1-hive-ser...@cluster1.com not found in Kerberos database
```

To me this line stands out:

```
[345] 1747909692.415511: Getting credentials hiveuser/cluster1-hive-ser...@cluster1.com -> hive/cluster1-hive-ser...@cluster1.com using ccache FILE:/tmp/krb5cc_1001
```

It was not clear to me why the "hiveuser" prefix in the principal was being remapped to "hive", or where that remapping was coming from. At first I thought it might be something in my krb5.conf (or perhaps something that should be there but isn't). But the fact that this works fine with Java Iceberg makes me question that.
To try to explain the above, I looked in the pyiceberg code and found this line in pyiceberg/catalog/hive.py:

```
return TTransport.TSaslClientTransport(socket, host=url_parts.hostname, service="hive")
```

When I speculatively changed the service="hive" part to service="hiveuser" in that code and re-ran the script, it worked as expected:

```
[350] 1747910147.748592: ccselect module realm chose cache FILE:/tmp/krb5cc_1001 with client principal hiveuser/cluster1-hive-ser...@cluster1.com for server principal hiveuser/cluster1-hive-ser...@cluster1.com
[350] 1747910147.748593: Getting credentials hiveuser/cluster1-hive-ser...@cluster1.com -> hiveuser/cluster1-hive-ser...@cluster1.com using ccache FILE:/tmp/krb5cc_1001
[350] 1747910147.748594: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> krb5_ccache_conf_data/start_realm@X-CACHECONF: from FILE:/tmp/krb5cc_1001 with result: -1765328243/Matching credential not found (filename: /tmp/krb5cc_1001)
[350] 1747910147.748595: Retrieving hiveuser/cluster1-hive-ser...@cluster1.com -> hiveuser/cluster1-hive-ser...@cluster1.com from FILE:/tmp/krb5cc_1001 with result: 0/Success
[350] 1747910147.748596: Creating authenticator for hiveuser/cluster1-hive-ser...@cluster1.com -> hiveuser/cluster1-hive-ser...@cluster1.com, seqnum 821973613, subkey aes256-cts/C292, session key aes256-cts/8A76
[350] 1747910147.748598: Read AP-REP, time 1747910147.748597, subkey (null), seqnum 214032946
Initial Namespaces: [('default',)]
```

Obviously this band-aid is very specific to my situation, but the fact that it worked makes me wonder whether the hard-coded "hive" service name should be a parameter, be auto-detected, or otherwise not be hard-coded.

My questions are:

* Is there something I'm missing in my usage of pyiceberg that would let me avoid this problem without the band-aid?
* If the answer to the above is no, is an enhancement required in pyiceberg/catalog/hive.py to make the hard-coded "hive" service name configurable?

### Willingness to contribute

- [x] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
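
To make the second question concrete, here is a minimal sketch of how the service name could be read from the catalog properties instead of being hard-coded. The property name `hive.kerberos-service-name` and the helper `sasl_transport_kwargs` are hypothetical and only illustrate the idea; they are not claimed to be existing pyiceberg API. Only the `host=` and `service=` keyword arguments come from the `TSaslClientTransport` call quoted from hive.py.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical property name, used here for illustration only.
KERBEROS_SERVICE_NAME = "hive.kerberos-service-name"
# The value that is currently hard-coded in pyiceberg/catalog/hive.py.
DEFAULT_SERVICE_NAME = "hive"


def sasl_transport_kwargs(uri: str, properties: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Build the host/service keyword arguments for the SASL transport.

    Falls back to "hive" when the (hypothetical) property is not set,
    so existing configurations would keep their current behaviour.
    """
    url_parts = urlparse(uri)
    return {
        "host": url_parts.hostname,
        "service": properties.get(KERBEROS_SERVICE_NAME, DEFAULT_SERVICE_NAME),
    }


# Example with the values from this report: the metastore principal
# starts with "hiveuser" rather than "hive".
kwargs = sasl_transport_kwargs(
    "thrift://cluster1-hive-server:9083",
    {KERBEROS_SERVICE_NAME: "hiveuser"},
)
print(kwargs)
```

In hive.py the result would then be passed on as `TTransport.TSaslClientTransport(socket, **kwargs)`, so a catalog configured without the extra property would still request a ticket for the "hive" service.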