gmweaver commented on issue #19:
URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2643719088
I ran into the same issue as @cee-shubham. My initial guess is that S3
bucket authentication is not using the S3 keys configured on the Nessie server
and is instead falling back to local S3 credentials on the client.
I confirmed that the S3 keys configured on my Nessie server do have access to
the bucket by running a similar append operation via Spark.
Example code with output:
```python
>>> import numpy as np
>>> import pyarrow as pa
>>> from pyiceberg.catalog import load_catalog
>>> from pyiceberg.schema import Schema
>>> from pyiceberg.types import IntegerType, NestedField, StringType
>>>
>>> catalog = load_catalog(
...     "nessie",
...     **{
...         "uri": "http://nessie:19120/iceberg/main",
...     },
... )
>>>
>>> print(catalog.list_namespaces())
[('demo',)]
>>> schema = Schema(
...     NestedField(1, "id", IntegerType(), required=True),
...     NestedField(2, "name", StringType(), required=False),
... )
>>>
>>> catalog.create_table("demo.test_pyiceberg_table", schema)
test_pyiceberg_table(
1: id: required int,
2: name: optional string
),
partition by: [],
sort order: [],
snapshot: null
>>> table = catalog.load_table("demo.test_pyiceberg_table")
>>> table.scan().to_pandas()
Empty DataFrame
Columns: [id, name]
Index: []
>>> data = pa.Table.from_pydict(
...     {
...         "id": np.array([1, 2, 3], dtype="int32"),
...         "name": ["Alice", "Bob", "Charlie"],
...     },
...     schema=schema.as_arrow(),
... )
>>>
>>> table.append(data)
Traceback (most recent call last):
  File "/Users/garrett.weaver/Library/Caches/pypoetry/virtualenvs/testing-py3.12/lib/python3.12/site-packages/s3fs/core.py", line 114, in _error_wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/garrett.weaver/Library/Caches/pypoetry/virtualenvs/testing-py3.12/lib/python3.12/site-packages/aiobotocore/client.py", line 412, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the PutObject operation: Forbidden
```
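As a debugging workaround (not a fix), one can try passing the bucket
credentials explicitly to `load_catalog` via PyIceberg's S3 FileIO properties,
so the write path does not fall back to whatever the local botocore credential
chain resolves. This is a minimal sketch with hypothetical placeholder values;
`s3.endpoint`, `s3.access-key-id`, `s3.secret-access-key`, and `s3.region` are
standard PyIceberg properties, but the actual values depend on the deployment:

```python
# Sketch only: the endpoint and keys below are placeholders, substitute
# the real values for your object store.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "nessie",
    **{
        "uri": "http://nessie:19120/iceberg/main",
        # Explicit S3 FileIO properties; these take precedence over
        # credentials discovered in the local environment.
        "s3.endpoint": "http://s3.example.com:9000",  # hypothetical
        "s3.access-key-id": "<access-key>",
        "s3.secret-access-key": "<secret-key>",
        "s3.region": "us-east-1",
    },
)
```

If the append succeeds with explicit keys, that would support the theory that
the client is signing requests with local credentials rather than anything the
catalog hands back.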
The equivalent code run via Spark works:
```python
from pyspark.sql import SparkSession

SPARK_PACKAGES = [
    "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.7.1",
    "org.projectnessie.nessie-integrations:nessie-spark-extensions-3.4_2.12:0.99.0",
    "software.amazon.awssdk:bundle:2.20.126",
    "software.amazon.awssdk:url-connection-client:2.20.126",
]
SPARK_SQL_EXTENSIONS = [
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
]
SPARK_CONFIG = {
    "spark.jars.packages": ",".join(SPARK_PACKAGES),
    "spark.sql.extensions": ",".join(SPARK_SQL_EXTENSIONS),
    "spark.sql.catalog.nessie": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.nessie.uri": "http://nessie:19120/iceberg/main",
    "spark.sql.catalog.nessie.type": "rest",
}
spark = SparkSession.builder.config(map=SPARK_CONFIG).getOrCreate()
>>> spark.sql(
...     """
...     CREATE TABLE IF NOT EXISTS nessie.demo.test_spark_table (
...         id INTEGER,
...         name STRING
...     ) USING iceberg
...     """
... )
DataFrame[]
>>>
>>> spark.sql(
...     """
...     INSERT INTO nessie.demo.test_spark_table VALUES
...         (1, 'Alice'),
...         (2, 'Bob')
...     """
... ).show()
++
||
++
++
>>> spark.read.format("iceberg").load("nessie.demo.test_spark_table").show()
+---+-----+
| id| name|
+---+-----+
| 1|Alice|
| 2| Bob|
+---+-----+
```
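To test the local-credentials theory directly, it may also help to check what
the default botocore credential chain resolves on the client machine, since
s3fs/aiobotocore use the same chain as boto3. A small diagnostic sketch,
assuming boto3 is available in the environment:

```python
import logging

import boto3

# Inspect which credentials the default botocore chain resolves locally.
# If the resolved access key differs from the keys configured on the
# Nessie server, a 403 on PutObject would be consistent with that.
creds = boto3.Session().get_credentials()
if creds is None:
    print("no local credentials resolved")
else:
    frozen = creds.get_frozen_credentials()
    print(f"access key: {frozen.access_key}, source: {creds.method}")

# Optionally turn on botocore debug logging to see the signed requests
# that s3fs/aiobotocore actually send.
logging.basicConfig()
logging.getLogger("botocore").setLevel(logging.DEBUG)
```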