Re: [I] SparkSessionCatalog with JDBC catalog: SHOW TABLES IN ... returns error but table exists in JDBC catalog [iceberg]

via GitHub Tue, 19 Mar 2024 21:44:24 -0700


matepek commented on issue #10003:
URL: https://github.com/apache/iceberg/issues/10003#issuecomment-2008652290


   What do you mean when you say that I'm using the JDBC catalog? I thought 
`spark.sql.catalogImplementation = hive` sets it to the hive catalog. 
   
   (I know I have a knowledge gap and I'm trying to catch up, so I'd appreciate 
it if you corrected me and explained.)
   
   My understanding of Spark catalogs is that there is always a `spark_catalog`, 
which is a `hive` catalog because of `spark.sql.catalogImplementation = 
hive`. 
   
   We also created an `iceberg_catalog` which uses 
`org.apache.iceberg.spark.SparkCatalog`. Until v1.5 it was good for managing 
iceberg tables; now it manages views too.
   
   So before v1.5 we needed to store the views and "non-managed tables" in the 
hive catalog and have them work together with the iceberg (managed) tables. For 
that we wrapped and set `spark_catalog` using 
`org.apache.iceberg.spark.SparkSessionCatalog`, which is meant to delegate 
functionality between the hive and iceberg catalogs. That worked okay. Actually, 
we needed some customisation because `SparkSessionCatalog` was unable to 
properly list items from both catalogs, so whenever we needed this functionality 
we listed the items of the two catalogs and concatenated the results. So it was 
not working properly even before. It was a necessity in order to work with 
views, tables and "non-managed tables".
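   
   As a minimal sketch of the listing workaround described above (this is not 
the actual code; `merge_listings` and the `(namespace, name)` tuple shape are 
hypothetical, and dropping duplicates is an assumption on top of the plain 
concatenation):

```python
from typing import Iterable, List, Tuple

# Hypothetical identifier shape: (namespace, table_or_view_name).
TableId = Tuple[str, str]


def merge_listings(
    hive_items: Iterable[TableId],
    iceberg_items: Iterable[TableId],
) -> List[TableId]:
    """Concatenate two catalog listings, keeping first-seen order.

    Dropping duplicates is an assumption: the same identifier may show up
    in both listings when one catalog wraps the other.
    """
    seen = set()
    merged: List[TableId] = []
    for item in list(hive_items) + list(iceberg_items):
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged
```

   The two inputs would come from listing each catalog separately, for example 
from the rows returned by `SHOW TABLES IN ...` against each one.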
   
   Since v1.5 the listing seems even less reliable (see this issue). But as we 
are talking more about it, I've started to think that I might not need 
`SparkSessionCatalog` anymore, since views are now entities managed by 
`org.apache.iceberg.spark.SparkCatalog`. 
   I can just use `iceberg_catalog` by default, and whenever there is a rare 
need for a "non-managed" table I can just specify the catalog, like 
`spark_catalog.schema_for_non_managed_tables.table_name`. And I'm good.
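   
   A rough sketch of that simplified setup, assuming `iceberg_catalog` is made 
the default through Spark's `spark.sql.defaultCatalog` property. The `qualify` 
helper is hypothetical, and the properties that point `iceberg_catalog` at its 
backing store are left out because they are not stated here:

```python
from typing import Optional

# Assumed Spark properties. Only the catalog names and the SparkCatalog class
# come from this thread; no spark.sql.catalog.spark_catalog override is set,
# so spark_catalog stays the built-in (hive) session catalog.
SPARK_CONF = {
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.catalog.iceberg_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.defaultCatalog": "iceberg_catalog",
}


def qualify(table: str, schema: str, catalog: Optional[str] = None) -> str:
    """Build a dotted table name; omit the catalog to use the default one."""
    parts = [part for part in (catalog, schema, table) if part]
    return ".".join(parts)


# Rare "non-managed" table: name the hive-backed catalog explicitly.
non_managed = qualify("table_name", "schema_for_non_managed_tables", "spark_catalog")
```

   Each entry of `SPARK_CONF` could be passed to 
`SparkSession.builder.config(key, value)` when the session is created.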
   
   So now I'm gonna try removing the definition for `spark_catalog` and I hope 
that it will make this work. BRB.
   
   
   
   
   ## REMARKS: 
   
   By "non-managed table" I mean something which is not managed by iceberg, 
i.e. a regular hive table. Ex.:
   ```sql
       create table schema_name.external_table (
               id LONG,
               dt DATE
       )
       partitioned by (dt)
       stored as PARQUET
       location 'gs://bucket/folder/'
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

