fischcheng opened a new issue, #1512:
URL: https://github.com/apache/polaris/issues/1512
### Describe the bug
### Description:
When attempting to load a Polaris catalog using PyIceberg, the call to the
GET /api/catalog/v1/config?warehouse=<warehouse_path> endpoint fails with an
HTTP 404 error. Server logs indicate the reason is "Unable to find warehouse
<warehouse_path>". However, querying the Management API (GET
/api/management/v1/catalogs/{catalog_name}) confirms that a catalog does exist
with the exact matching default-base-location property.
Why would the Catalog API endpoint GET /api/catalog/v1/config fail to find
the warehouse configuration (s3://<warehouse_path>) when the Management API
(GET /api/management/v1/catalogs/test_catalog) confirms that exact
configuration exists? Is this a potential bug in version 0.9.0, or is there
another configuration aspect or permission requirement for the /config endpoint
that might be missing?
### To Reproduce
### Steps to Reproduce:
1. Set up Docker Compose using a `docker-compose.yml` similar to the one
below, based on getting-started/eclipselink/docker-compose-minimum.yaml. The
key aspect is the polaris-setup service, which creates the initial catalog.
```
services:
  polaris:
    # IMPORTANT: the image MUST contain the Postgres JDBC driver and
    # EclipseLink dependencies, see README for instructions
    image: apache/polaris:latest
    ports:
      # API port
      - "8181:8181"
      # Management port (metrics and health checks)
      - "8182:8182"
    environment:
      polaris.persistence.type: eclipse-link
      polaris.persistence.eclipselink.configuration-file: /deployments/config/eclipselink/persistence-minimum.xml
      polaris.realm-context.realms: POLARIS
      quarkus.otel.sdk.disabled: "true"
    volumes:
      - ../assets/eclipselink/:/deployments/config/eclipselink
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/q/health"]
      interval: 2s
      timeout: 10s
      retries: 10
      start_period: 10s
```
2. After spinning up, use the Polaris CLI to create a catalog:
```
./polaris \
  --client-id root \
  --client-secret s3cr3t \
  catalogs create \
  --storage-type S3 \
  --default-base-location "s3://my-lakehouse" \
  --role-arn "arn:aws:iam::role" \
  test_catalog
```
3. docker-compose up
4. PyIceberg code:
```
from pyiceberg.catalog import load_catalog
# from pyiceberg.exceptions import NoSuchCatalogError  # Or appropriate exception

polaris_host = "<YOUR_POLARIS_EC2_DNS_OR_IP>"  # Redacted host
polaris_api_uri = f"http://{polaris_host}:8181/api/catalog"  # Correct prefix found via logs
s3_warehouse_location = "s3://my-lakehouse-bucket"  # Must match STORAGE_LOCATION used above
polaris_client_id = "root"
polaris_client_secret = "<REDACTED_SECRET>"
s3_role_arn_to_assume = "arn:aws:iam::<AWS_ACCOUNT_ID>:role/docker-iam-role"  # Redacted account ID
aws_region = "us-east-1"

catalog_properties = {
    "type": "rest",
    "uri": polaris_api_uri,
    "credential": f"{polaris_client_id}:{polaris_client_secret}",
    "warehouse": s3_warehouse_location,
    "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
    "s3.assume-role.arn": s3_role_arn_to_assume,
    "s3.assume-role.session-name": "pyiceberg-polaris-session",
    "s3.region": aws_region,
}

catalog = load_catalog(
    name="polaris_catalog",  # Logical name for this instance
    **catalog_properties,
)
```
### Actual Behavior
The load_catalog call fails. The underlying HTTP request to GET
/api/catalog/v1/config?warehouse=s3%3A%2F%2Fmy-lakehouse-bucket returns an HTTP
404 error.
```
[EL Fine]: sql: ... --SELECT ... FROM ENTITIES_ACTIVE WHERE ... NAME = ?))
    bind => [..., s3://my-lakehouse-bucket]
INFO [org.apa.pol.ser.exc.IcebergExceptionMapper] ... Handling runtimeException Unable to find warehouse s3://my-lakehouse-bucket
INFO [io.qua.htt.access-log] ... "GET /api/catalog/v1/config?warehouse=s3%3A%2F%2Fmy-lakehouse-bucket HTTP/1.1" 404 111
```
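To confirm that the path in the access log is simply the URL-encoded form of the `warehouse` property, the query string can be rebuilt with the Python standard library. This is a minimal sketch: `config_url` is a hypothetical helper (not part of PyIceberg), and `localhost` stands in for the redacted host.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def config_url(base_uri: str, warehouse: str) -> str:
    """Build the /v1/config URL for a catalog URI and a warehouse value."""
    # urlencode percent-encodes ':' and '/' in the value.
    query = urlencode({"warehouse": warehouse})
    return f"{base_uri.rstrip('/')}/v1/config?{query}"


url = config_url("http://localhost:8181/api/catalog", "s3://my-lakehouse-bucket")
print(url)

# Decoding the query recovers the original value, so the server receives
# the same string that appears in the "Unable to find warehouse" message.
decoded = parse_qs(urlsplit(url).query)["warehouse"][0]
print(decoded)
```

The encoded query matches the logged request, which suggests the 404 is not caused by an encoding mismatch on the client side.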
### Expected Behavior
The load_catalog call should succeed, returning a valid catalog object. The
underlying call to GET
/api/catalog/v1/config?warehouse=s3://my-lakehouse-bucket should return HTTP
200 OK with the catalog configuration.
### Additional context
Querying the Management API confirms the catalog configuration seems correct:
```
# Get Token (replace root:<REDACTED_SECRET>)
ACCESS_TOKEN=$(curl -s -X POST -u "root:<REDACTED_SECRET>" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials" \
  http://{HOST}:8181/api/catalog/v1/oauth/tokens | jq -r .access_token)

# Query Management API for specific catalog (replace {HOST})
curl -s -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  -H 'Accept: application/json' \
  http://{HOST}:8181/api/management/v1/catalogs/test_catalog | jq .
```

```
{
  "type": "INTERNAL",
  "name": "test_catalog",
  "properties": {
    "default-base-location": "s3://my-lakehouse-bucket" // <-- EXACT MATCH!
  },
  "createTimestamp": 1746209705140,
  "lastUpdateTimestamp": 1746209705140,
  "entityVersion": 1,
  "storageConfigInfo": {
    "roleArn": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/docker-iam-role", // Redacted
    "externalId": null,
    "userArn": null,
    "region": null,
    "storageType": "S3",
    "allowedLocations": [
      "s3://my-lakehouse-bucket"
    ]
  }
}
```
This output clearly shows the default-base-location property is correctly
set to s3://my-lakehouse-bucket for the test_catalog.
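
One way to narrow this down is to compare the `warehouse` value against each identifying field of the catalog entity. The logged SQL above filters on `NAME = ?` with the S3 URI as the bind value, so the server may be resolving `warehouse` against the catalog name rather than `default-base-location`. That is an inference from the log, not something confirmed in this report. The sketch below uses a trimmed copy of the Management API response; `matching_fields` is a hypothetical helper.

```python
# Trimmed copy of the Management API response shown above.
catalog = {
    "name": "test_catalog",
    "properties": {"default-base-location": "s3://my-lakehouse-bucket"},
}


def matching_fields(catalog: dict, warehouse: str) -> list[str]:
    """Return the names of the catalog fields whose value equals `warehouse`."""
    candidates = {
        "name": catalog["name"],
        "default-base-location": catalog["properties"]["default-base-location"],
    }
    return [field for field, value in candidates.items() if value == warehouse]


by_location = matching_fields(catalog, "s3://my-lakehouse-bucket")
by_name = matching_fields(catalog, "test_catalog")
print(by_location)  # fields matched by the S3 URI
print(by_name)      # fields matched by the catalog name
```

If the lookup does turn out to be by name, retrying `load_catalog` with `"warehouse": "test_catalog"` would be a quick experiment to confirm or rule this out.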
Troubleshooting Steps Taken:
- Verified API path prefix is /api/catalog/v1/ via server logs. Updated
PyIceberg uri accordingly.
- Verified authentication works (can get token, requests get past 401 when
token is valid).
- Verified server configuration using the Management API (output above
confirms default-base-location matches).
- Attempted deleting and recreating the catalog using curl against the
Management API, ensuring the correct default-base-location was specified in the
payload. The issue persists.
- Verified Polaris basic startup is clean (no obvious errors in startup
logs).
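
The token and Management API checks above can also be scripted so that each attempt builds the requests the same way. This is a minimal sketch using only the standard library: the function names are hypothetical, `HOST` is a placeholder, and the requests are only constructed here (pass them to `urllib.request.urlopen` to actually send them).

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def token_request(host: str, client_id: str, client_secret: str) -> Request:
    """POST to the OAuth token endpoint using HTTP Basic credentials."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return Request(
        f"http://{host}:8181/api/catalog/v1/oauth/tokens",
        data=urlencode({"grant_type": "client_credentials"}).encode(),
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def catalog_request(host: str, token: str, catalog_name: str) -> Request:
    """GET a single catalog from the Management API with a bearer token."""
    return Request(
        f"http://{host}:8181/api/management/v1/catalogs/{catalog_name}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


tok = token_request("HOST", "root", "<REDACTED_SECRET>")
cat = catalog_request("HOST", "TOKEN", "test_catalog")
print(tok.get_method(), tok.full_url)
print(cat.get_method(), cat.full_url)
```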
### System information
Polaris Version: apache/polaris:latest (0.9.0)
Deployment: Docker Compose using
getting-started/eclipselink/docker-compose-minimum.yaml structure.
Database: Postgres (implied by the EclipseLink setup; using an existing RDS
Postgres instance)
Client: PyIceberg (0.9.0) using pyiceberg.catalog.load_catalog
Python Version: 3.11
Host OS: Polaris running on an Ubuntu EC2 instance via docker-compose,
PyIceberg running on macOS.