jotarada opened a new issue, #12024:
URL: https://github.com/apache/iceberg/issues/12024
### Apache Iceberg version
1.4.3
### Query engine
Spark
### Please describe the bug 🐞
We have a schema that contains a very large number of tables (8k+), and we
see timeouts when listing them through the Iceberg HiveCatalog implementation,
while Spark's default implementation is very fast.
Example:
If we run a spark session with this conf:
```
pyspark --master yarn \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.4.3 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive \
  --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg.type=hive
```
and run `spark.sql("show tables in some_schema").show()`, it takes about 15
seconds, as it uses the Spark implementation to access the Hive tables. We can
see that in our metastore logs:
```
INFO 2025-01-21T18:25:24.565000057Z map[class:HiveMetaStore.audit
log:ugi=jorge.arada ip=10.123.123.123 cmd=source:123.123.123.123 get_database:
some_schema thread:pool-12-thread-19]
INFO 2025-01-21T18:25:24.565000057Z map[class:metastore.HiveMetaStore
log:26: source:10.123.123.123 get_database: some_schema
thread:pool-12-thread-19]
INFO 2025-01-21T18:25:24.571000099Z map[class:metastore.HiveMetaStore
log:26: source:10.123.123.123 get_database: some_schema
thread:pool-12-thread-19]
INFO 2025-01-21T18:25:24.571000099Z map[class:HiveMetaStore.audit
log:ugi=jorge.arada ip=10.123.123.123 cmd=source:123.123.123.123 get_database:
some_schema thread:pool-12-thread-19]
INFO 2025-01-21T18:25:24.579999923Z map[class:HiveMetaStore.audit
log:ugi=jorge.arada ip=123.123.123.123 cmd=source:123.123.123.123 get_tables:
db=some_schema pat=* thread:pool-12-thread-19]
INFO 2025-01-21T18:25:24.579999923Z map[class:metastore.HiveMetaStore
log:26: source:123.123.123.123 get_tables: db=some_schema pat=*
thread:pool-12-thread-19]
```
But if we run `spark.sql("show tables in iceberg.some_schema").show()`, it
takes up to 5 minutes, and the logs show that a different metastore method
was called:
```
INFO 2025-01-21T18:29:49.118000030Z map[class:HiveMetaStore.audit
log:ugi=jorge.arada ip=123.123.123.123 cmd=source:123.123.123.123
get_all_tables: db=some_schema thread:pool-12-thread-129]
INFO 2025-01-21T18:29:49.118000030Z map[class:metastore.HiveMetaStore
log:135: source:123.123.123.123 get_all_tables: db=some_schema
thread:pool-12-thread-129]
```
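To compare which metastore methods each statement triggers, the log excerpts can be tallied with a small script. This is only a sketch, assuming the audit line format shown above, where the method name is followed by a colon. `count_metastore_calls` and `METHOD_RE` are hypothetical helper names, not part of Iceberg or Hive.

```python
import re
from collections import Counter

# Assumption: metastore method names in these logs start with "get_" and are
# followed by a colon, e.g. "get_database:", "get_tables:", "get_all_tables:".
METHOD_RE = re.compile(r"\b(get_[a-z_]+):")


def count_metastore_calls(log_text):
    """Count how often each get_* metastore method appears in the log text."""
    return Counter(METHOD_RE.findall(log_text))


# Two lines taken from the excerpts above, unwrapped onto single lines.
sample = (
    "log:ugi=jorge.arada ip=123.123.123.123 cmd=source:123.123.123.123 "
    "get_tables: db=some_schema pat=*\n"
    "log:135: source:123.123.123.123 get_all_tables: db=some_schema\n"
)

print(count_metastore_calls(sample))
```

Each call is logged twice in the excerpts (once by `HiveMetaStore.audit` and once by `metastore.HiveMetaStore`), so the raw counts are double the number of actual calls.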
Tested on Spark 3.3 and 3.5.
From what I could read in the Iceberg code, the behavior seems to be the same
in Iceberg 1.7.X.
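The timing difference can be measured from the same pyspark session with something like the following. This is only a sketch: `timed` and `compare_listings` are hypothetical helper names, `spark` is the session created by the pyspark shell, and `.collect()` is used instead of `.show()` so that all rows are fetched.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare_listings(spark, schema="some_schema", catalog="iceberg"):
    """Time SHOW TABLES via the session catalog and via the Iceberg catalog."""
    timings = {}
    targets = (("session", schema), ("iceberg", f"{catalog}.{schema}"))
    for label, target in targets:
        query = f"show tables in {target}"
        # collect() forces the listing to run and return every row.
        _, timings[label] = timed(lambda: spark.sql(query).collect())
    return timings
```

In the pyspark shell, `compare_listings(spark)` returns a dict with the elapsed seconds for each of the two statements.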
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time