rni-HMC opened a new issue, #1019:
URL: https://github.com/apache/iceberg-python/issues/1019

   ### Question
   
   ### Query engine
   
   HIVE
   
   ### Question
   
   I've followed the [Hive and Iceberg Quickstart](https://iceberg.apache.org/hive-quickstart/#docker-images) and have the hive4 Docker container running. I can reach the HiveServer2 Web UI at localhost:10002 and connect with `docker exec -it hive4 beeline -u 'jdbc:hive2://localhost:10000/'`.
   
   Using `pyhive`, I can run this Python script to list the databases:
   
   ```python
   from pyhive import hive
   
   # Set up the Hive connection
   conn = hive.Connection(host="localhost", port=10000)
   print("Hive Connection:", conn)
   
   # Create a cursor and execute a query
   cursor = conn.cursor()
   cursor.execute("SHOW DATABASES")
   print("Cursor:", cursor)
   
   # Fetch and print the results
   databases = cursor.fetchall()
   print("Databases:", databases)
   
   cursor.close()
   conn.close()
   ```
   
   ```
   Hive Connection: <pyhive.hive.Connection object at 0x7f2886d72b20>
   Cursor: <pyhive.hive.Cursor object at 0x7f2884aabfa0>
   Databases: [('default',), ('nyc',)]
   ```
   
   However, I'm unable to connect with `pyiceberg`. The debug code below shows that both ports are open, but the stack trace ends with a `TSocket read 0 bytes` error.
   
   ```python
   from pyiceberg.catalog.hive import HiveCatalog
   import socket
   
   HS2_PORT = 10000
   WEBUI_PORT = 10002
   HOST = "localhost"
   
   
   def check_port(host, port):
       with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
           result = sock.connect_ex((host, port))
           if result == 0:
               print(f"Port {port} on {host} is open")
           else:
               print(f"Port {port} on {host} is closed")
   
   
   check_port(HOST, HS2_PORT)
   check_port(HOST, WEBUI_PORT)
   
   
   def connect_to_hive_catalog() -> HiveCatalog:
       # Set up the Iceberg catalog
       catalog = HiveCatalog(
           "default",
           **{
               "uri": f"thrift://{HOST}:{HS2_PORT}",
           },
       )
       print("Hive Catalog:", catalog)
       return catalog
   
   
   catalog = connect_to_hive_catalog()
   print("Namespaces:", catalog.list_namespaces())
   ```
   
   ```
   Port 10000 on localhost is open
   Port 10002 on localhost is open
   Hive Catalog: default (<class 'pyiceberg.catalog.hive.HiveCatalog'>)
   Traceback (most recent call last):
     File "iceberg_poc/min_repro_issue.py", line 35, in <module>
       print("Namespaces:", catalog.list_namespaces())
     File "/home/richard/Projects/iceberg_poc/iceberg-poc/.venv/lib/python3.8/site-packages/pyiceberg/catalog/hive.py", line 644, in list_namespaces
       return list(map(self.identifier_to_tuple, open_client.get_all_databases()))
     File "/home/richard/Projects/iceberg_poc/iceberg-poc/.venv/lib/python3.8/site-packages/hive_metastore/ThriftHiveMetastore.py", line 2798, in get_all_databases
       return self.recv_get_all_databases()
     File "/home/richard/Projects/iceberg_poc/iceberg-poc/.venv/lib/python3.8/site-packages/hive_metastore/ThriftHiveMetastore.py", line 2809, in recv_get_all_databases
       (fname, mtype, rseqid) = iprot.readMessageBegin()
     File "/home/richard/Projects/iceberg_poc/iceberg-poc/.venv/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
       sz = self.readI32()
     File "/home/richard/Projects/iceberg_poc/iceberg-poc/.venv/lib/python3.8/site-packages/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
       buff = self.trans.readAll(4)
     File "/home/richard/Projects/iceberg_poc/iceberg-poc/.venv/lib/python3.8/site-packages/thrift/transport/TTransport.py", line 62, in readAll
       chunk = self.read(sz - have)
     File "/home/richard/Projects/iceberg_poc/iceberg-poc/.venv/lib/python3.8/site-packages/thrift/transport/TSocket.py", line 166, in read
       raise TTransportException(type=TTransportException.END_OF_FILE,
   thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
   ```
   
   I can also connect via [DBeaver](https://dbeaver.io/) using the JDBC URL `jdbc:hive2://localhost:10000`:
   
![image](https://github.com/user-attachments/assets/1d677cc5-6e85-4ba4-9937-bf0d76bf2981)
   
   So it seems that something is wrong with how I'm using `pyiceberg`. Can someone help me understand what's going on and how to troubleshoot it?
   
   Cross-posted from https://github.com/apache/iceberg/issues/10903
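The traceback shows that `HiveCatalog.list_namespaces()` goes through the `hive_metastore.ThriftHiveMetastore` client, so pyiceberg is speaking the Hive Metastore Thrift API, whereas beeline, `pyhive`, and DBeaver all talk to HiveServer2 on port 10000. An open port (what `check_port` tests) only shows that something accepts TCP connections, not which protocol it speaks. The sketch below is one possible way to check that, using only the standard library: it sends a bare `get_all_databases` call in the Thrift strict binary protocol, unframed, as in the traceback (`TBinaryProtocol` over a plain `TSocket`), and checks whether the first four bytes back look like a Thrift reply header. The helper names `encode_call` and `speaks_metastore_thrift` are hypothetical, and the sketch assumes the metastore is not configured for framed transport or SASL.

```python
import socket
import struct

# Thrift strict binary protocol: the high 16 bits of the first word carry the
# version marker, the low byte carries the message type.
VERSION_1 = 0x80010000
VERSION_MASK = 0xFFFF0000
TYPE_MASK = 0x000000FF
MESSAGE_CALL = 1
MESSAGE_REPLY = 2
MESSAGE_EXCEPTION = 3


def encode_call(method: str, seqid: int = 0) -> bytes:
    """Encode a Thrift CALL message for a method that takes no arguments."""
    name = method.encode("utf-8")
    return (
        struct.pack("!I", VERSION_1 | MESSAGE_CALL)  # version marker + message type
        + struct.pack("!i", len(name))  # method-name length
        + name  # method name
        + struct.pack("!i", seqid)  # sequence id
        + b"\x00"  # empty argument struct: only the STOP field marker
    )


def _recv_up_to(sock: socket.socket, n: int) -> bytes:
    """Read until n bytes have arrived or the peer closes the connection."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:  # EOF: the same situation as "TSocket read 0 bytes"
            break
        data += chunk
    return data


def speaks_metastore_thrift(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if host:port answers get_all_databases with a Thrift
    REPLY or EXCEPTION header. A refused connection still raises OSError."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            sock.sendall(encode_call("get_all_databases"))
            header = _recv_up_to(sock, 4)
        except OSError:  # connection reset, broken pipe, or timeout
            return False
    if len(header) < 4:
        return False
    (word,) = struct.unpack("!I", header)
    return (word & VERSION_MASK) == VERSION_1 and (word & TYPE_MASK) in (
        MESSAGE_REPLY,
        MESSAGE_EXCEPTION,
    )
```

For example, `speaks_metastore_thrift("localhost", 10000)` would be expected to return `False` if HiveServer2 drops the connection the same way it did in the traceback. The Hive Metastore conventionally listens on port 9083 (`hive.metastore.port`); whether the quickstart's `hive4` container exposes a standalone metastore there is an assumption to verify, and if one is reachable, the catalog `uri` would point at that port rather than at 10000.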


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org