gmweaver commented on issue #19:
URL: https://github.com/apache/iceberg-python/issues/19#issuecomment-2643719088
I ran into the same issue as @cee-shubham. My initial guess is that S3
bucket authentication is not using the S3 keys configured on the Nessie server
and is instead falling back to local S3 credentials on the client.
I confirmed that the S3 keys configured on my Nessie server do have access to
the bucket by running a similar append operation via Spark.
Example code with output:
```python
>>> import numpy as np
>>> import pyarrow as pa
>>> from pyiceberg.catalog import load_catalog
>>> from pyiceberg.schema import Schema
>>> from pyiceberg.types import IntegerType, NestedField, StringType
>>>
>>> catalog = load_catalog(
...     "nessie",
...     **{
...         "uri": "http://nessie:19120/iceberg/main",
...     },
... )
>>>
>>> print(catalog.list_namespaces())
[('demo',)]
>>> schema = Schema(
...     NestedField(1, "id", IntegerType(), required=True),
...     NestedField(2, "name", StringType(), required=False),
... )
>>>
>>> catalog.create_table("demo.test_pyiceberg_table", schema)
test_pyiceberg_table(
1: id: required int,
2: name: optional string
),
partition by: [],
sort order: [],
snapshot: null
>>> table = catalog.load_table("demo.test_pyiceberg_table")
>>> table.scan().to_pandas()
Empty DataFrame
Columns: [id, name]
Index: []
>>> data = pa.Table.from_pydict(
...     {
...         "id": np.array([1, 2, 3], dtype="int32"),
...         "name": ["Alice", "Bob", "Charlie"],
...     },
...     schema=schema.as_arrow(),
... )
>>>
>>> table.append(data)
Traceback (most recent call last):
  File "/Users/garrett.weaver/Library/Caches/pypoetry/virtualenvs/testing-py3.12/lib/python3.12/site-packages/s3fs/core.py", line 114, in _error_wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/garrett.weaver/Library/Caches/pypoetry/virtualenvs/testing-py3.12/lib/python3.12/site-packages/aiobotocore/client.py", line 412, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the PutObject operation: Forbidden
```
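As a debugging workaround (not a fix), one can try passing the bucket
credentials explicitly to `load_catalog` via PyIceberg's S3 FileIO properties,
so the write path does not fall back to whatever the local botocore credential
chain resolves. This is a minimal sketch with hypothetical placeholder values;
`s3.endpoint`, `s3.access-key-id`, `s3.secret-access-key`, and `s3.region` are
standard PyIceberg properties, but the actual values depend on the deployment:

```python
# Sketch only: the endpoint and keys below are placeholders, substitute
# the real values for your object store.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "nessie",
    **{
        "uri": "http://nessie:19120/iceberg/main",
        # Explicit S3 FileIO properties; these take precedence over
        # credentials discovered in the local environment.
        "s3.endpoint": "http://s3.example.com:9000",  # hypothetical
        "s3.access-key-id": "<access-key>",
        "s3.secret-access-key": "<secret-key>",
        "s3.region": "us-east-1",
    },
)
```

If the append succeeds with explicit keys, that would support the theory that
the client is signing requests with local credentials rather than anything the
catalog hands back.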
The equivalent code run via Spark works:
```python
from pyspark.sql import SparkSession

SPARK_PACKAGES = [
    "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.7.1",
    "org.projectnessie.nessie-integrations:nessie-spark-extensions-3.4_2.12:0.99.0",
    "software.amazon.awssdk:bundle:2.20.126",
    "software.amazon.awssdk:url-connection-client:2.20.126",
]
SPARK_SQL_EXTENSIONS = [
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
]
SPARK_CONFIG = {
    "spark.jars.packages": ",".join(SPARK_PACKAGES),
    "spark.sql.extensions": ",".join(SPARK_SQL_EXTENSIONS),
    "spark.sql.catalog.nessie": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.nessie.uri": "http://nessie:19120/iceberg/main",
    "spark.sql.catalog.nessie.type": "rest",
}
spark = SparkSession.builder.config(map=SPARK_CONFIG).getOrCreate()
>>> spark.sql(
...     """
...     CREATE TABLE IF NOT EXISTS nessie.demo.test_spark_table (
...         id INTEGER,
...         name STRING
...     ) USING iceberg
...     """
... )
DataFrame[]
>>>
>>> spark.sql(
...     """
...     INSERT INTO nessie.demo.test_spark_table VALUES
...         (1, 'Alice'),
...         (2, 'Bob')
...     """
... ).show()
++
||
++
++
>>> spark.read.format("iceberg").load("nessie.demo.test_spark_table").show()
+---+-----+
| id| name|
+---+-----+
| 1|Alice|
| 2| Bob|
+---+-----+
```
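To test the local-credentials theory directly, it may also help to check what
the default botocore credential chain resolves on the client machine, since
s3fs/aiobotocore use the same chain as boto3. A small diagnostic sketch,
assuming boto3 is available in the environment:

```python
import logging

import boto3

# Inspect which credentials the default botocore chain resolves locally.
# If the resolved access key differs from the keys configured on the
# Nessie server, a 403 on PutObject would be consistent with that.
creds = boto3.Session().get_credentials()
if creds is None:
    print("no local credentials resolved")
else:
    frozen = creds.get_frozen_credentials()
    print(f"access key: {frozen.access_key}, source: {creds.method}")

# Optionally turn on botocore debug logging to see the signed requests
# that s3fs/aiobotocore actually send.
logging.basicConfig()
logging.getLogger("botocore").setLevel(logging.DEBUG)
```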