muniatl opened a new issue, #10709:
URL: https://github.com/apache/iceberg/issues/10709

   ### Query engine
   
   _No response_
   
   ### Question
   
   I have a piece of code that works with an S3 endpoint and a SqlCatalog backed by SQLite. For testing, however, I want to be able to run it against a MinIO deployment that is hosted and running on localhost. I have tried various options with no luck. What parameters do I need to pass to SqlCatalog and create_table? My code looks like this:
   - We moved the environment from Singapore to east-us and set up a VPC so that the traffic between EC2 and S3 goes over the private network. Found no performance difference, even for a single write (hovering around 0.8 seconds).
   - Tried configuring S3 Express One Zone, but couldn't get it to work: PyIceberg uses PyArrow, and PyArrow currently seems to have a compatibility issue. Posted on the Iceberg, PyIceberg, and PyArrow forums.
   - Tried a direct large-file write to S3 using boto3: got a response time of about 0.2 seconds (in the range that Yatin heard from the AWS folks).
   - Tried a direct write of many small files to S3 using boto3: it was in the same range, about 0.2 to 0.3 seconds.
   
   When accessing S3 from within an EC2 instance, according to Rahul and some external documents, there is no need to pass the access key, secret key, and session token explicitly, but PyIceberg doesn't seem to pick them up from the environment when I omit them. This could be a configuration problem. Oddly, access errors happen intermittently when the session key was fetched a while ago, and the error goes away when I replace it with a new key. I need to research IAM settings and PyIceberg a little more with Rahul's help.
   
   postgresql+psycopg2://postgres:ph1@localhost:5433/template1
   
   MINIO_ROOT_USER=minio-user
   MINIO_ROOT_PASSWORD=minio-user
   MINIO_VOLUMES="/mnt/minio"
   
   
   
   
   catalog = SqlCatalog(
       "default",
       **{
           "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
           # "uri": f"postgresql+psycopg2://postgres:ph1@localhost:5433/template1",
           # Have also tried "s3://iceberg", "s3://127.0.0.1/iceberg", and
           # commenting out "warehouse" completely.
           "warehouse": "s3://127.0.0.1:9000/iceberg",
           "s3.endpoint": "s3://127.0.0.1:9000",
           # "minio-root-user": "admin",
           # "minio-root-password": "password",
           # "minio-domain": "minio",
           # "s3.access-key-id": "admin",
           # "s3.secret-access-key": "password",
       },
   )
   
   table = catalog.create_table(
       "default1.taxi_dataset",
       schema=df.schema,
   )
   
   This fails with:
   
   OSError: When getting information for key 'iceberg/default1.db/taxi_dataset/metadata/00000-671ce9cf-73ff-49a2-a22e-408d8758625b.metadata.json' in bucket '127.0.0.1:9000': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 6, Couldn't resolve host name.
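
One way to read this error, offered as a sketch rather than a description of PyIceberg's internals: in an `s3://` location, the component right after the scheme is treated as the bucket name. Under that reading, `s3://127.0.0.1:9000/iceberg` yields the bucket `127.0.0.1:9000` and a key starting with `iceberg/`, which matches the message above. The helper `split_s3_location` is hypothetical and uses only the standard library to illustrate the split.

```python
from urllib.parse import urlparse


def split_s3_location(location: str) -> tuple[str, str]:
    """Split an s3:// location into (bucket, key) as a generic URI parser sees it.

    Hypothetical helper for illustration; not part of PyIceberg or PyArrow.
    """
    parsed = urlparse(location)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// location, got {location!r}")
    # The authority part (host[:port]) is what ends up as the bucket name.
    bucket = parsed.netloc
    # Everything after the leading slash is the object key.
    key = parsed.path.lstrip("/")
    return bucket, key


# The warehouse from the post: the MinIO address is read as the bucket.
print(split_s3_location("s3://127.0.0.1:9000/iceberg/default1.db"))
# A warehouse that names only the bucket.
print(split_s3_location("s3://iceberg/default1.db"))
```

If this reading is right, the MinIO host and port belong in the endpoint setting, not in the warehouse location.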
   
   I am able to access the MinIO server, log in, and even upload files. Any pointers on the valid properties to pass for MinIO would be much appreciated.
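
For comparison, here is a minimal sketch of the properties commonly used to point PyIceberg at an S3-compatible store. It assumes MinIO serves plain HTTP on `127.0.0.1:9000`, that a bucket named `iceberg` already exists, and that the MinIO root user and password from the environment above can be used as the S3 access key and secret key. None of this is confirmed in the post. `minio_catalog_properties` is a hypothetical helper, and `/tmp/warehouse` is a placeholder directory.

```python
def minio_catalog_properties(
    catalog_db_dir: str,
    bucket: str = "iceberg",
    endpoint: str = "http://127.0.0.1:9000",
    access_key: str = "minio-user",
    secret_key: str = "minio-user",
) -> dict[str, str]:
    """Build SqlCatalog properties for a local MinIO (hypothetical helper)."""
    return {
        # SQLite catalog database, as in the original snippet.
        "uri": f"sqlite:///{catalog_db_dir}/pyiceberg_catalog.db",
        # Bucket name only; the host and port belong in s3.endpoint.
        "warehouse": f"s3://{bucket}",
        # An HTTP(S) URL, not an s3:// URI (assumes MinIO without TLS).
        "s3.endpoint": endpoint,
        # Assumption: the MinIO root user/password double as S3 credentials.
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }


# Placeholder directory for the SQLite catalog file.
props = minio_catalog_properties("/tmp/warehouse")
print(props["warehouse"], props["s3.endpoint"])

# With PyIceberg installed, the dict would be passed like this:
# from pyiceberg.catalog.sql import SqlCatalog
# catalog = SqlCatalog("default", **props)
```

The two differences from the snippet in the question are the `http://` scheme on `s3.endpoint` and a warehouse location that names only the bucket.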
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

