Blake-Guo opened a new issue, #6613: URL: https://github.com/apache/iceberg/issues/6613
### Apache Iceberg version

0.12.1

### Query engine

Spark

### Please describe the bug 🐞

I have been exploring using multiple SparkSessions (to connect to different data sources/clusters) to load Iceberg tables, and I found some weird behavior. If I use a **new** SparkSession (with an incorrect parameter such as `spark.sql.catalog.mycatalog.uri`) to access a table created by the previous SparkSession, first through (1) `spark.read()...load()` and then (2) by running some SQL on that table, everything still works (even with the incorrect parameter). The full test is given below:

```java
@Test
public void multipleSparkSessions() throws AnalysisException {
  // Create the 1st SparkSession
  String endpoint = String.format("http://localhost:%s/metastore", port);
  ctx = SparkSession
      .builder()
      .master("local")
      .config("spark.ui.enabled", false)
      .config("spark.sql.catalog.mycatalog", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.mycatalog.type", "hive")
      .config("spark.sql.catalog.mycatalog.uri", endpoint)
      .config("spark.sql.catalog.mycatalog.cache-enabled", "false")
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
      .getOrCreate();

  // Create a table with the SparkSession
  String tableName = String.format("%s.%s", "test", Integer.toHexString(RANDOM.nextInt()));
  ctx.sql(String.format("CREATE TABLE mycatalog.%s USING iceberg "
      + "AS SELECT * FROM VALUES ('michael', 31), ('david', 45) AS (name, age)", tableName));

  // Create a new SparkSession with an incorrect catalog URI
  SparkSession newSession = ctx.newSession();
  newSession.conf().set("spark.sql.catalog.mycatalog.uri", "http://non_exist_address");

  // Access the table created above with the new SparkSession through session.read()...load()
  List<Row> dataset2 = newSession.read()
      .format("iceberg")
      .load(String.format("mycatalog.%s", tableName))
      .collectAsList();
  dataset2.forEach(r -> System.out.println(r));

  // Access the table through SQL
  newSession.sql(String.format("select * from mycatalog.%s", tableName)).collectAsList();
}
```

But if I use the new SparkSession to access the table through (1) `newSession.sql` first, the execution fails, and then (2) the `read()...load()` call fails as well with `java.lang.RuntimeException: Failed to get table info from metastore test.3d79f679`. IMO this makes more sense: since I provided an incorrect catalog URI, the SparkSession shouldn't be able to locate that table.

```java
@Test
public void multipleSparkSessions() throws AnalysisException {
  // ...same as above...

  // Access the table through SQL first
  assertThrows(java.lang.RuntimeException.class, () -> newSession.sql(
      String.format("select * from mycatalog.%s", tableName)).collectAsList());

  // Then access it with the new SparkSession through session.read()...load()
  assertThrows(java.lang.RuntimeException.class, () -> newSession.read()
      .format("iceberg")
      .load(String.format("mycatalog.%s", tableName))
      .collectAsList());
}
```

Any idea what could lead to these two different behaviors with `spark.read().load()` versus `spark.sql()` when they are called in different orders?
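A minimal probe along these lines (a sketch only, reusing `ctx`, `newSession`, and `tableName` from the test above; the `SHOW NAMESPACES` statement is just one assumed way to force the new session to resolve the catalog) might help show which configuration each code path actually ends up using:

```java
// Effective URI as reported by each session's runtime config
System.out.println("old session uri: " + ctx.conf().get("spark.sql.catalog.mycatalog.uri"));
System.out.println("new session uri: " + newSession.conf().get("spark.sql.catalog.mycatalog.uri"));

// Force the new session to resolve the catalog before any read()/load();
// if the bad URI is really in effect here, this should already fail.
newSession.sql("SHOW NAMESPACES IN mycatalog").show();

// Then try the DataFrameReader path again with the same session
newSession.read()
    .format("iceberg")
    .load(String.format("mycatalog.%s", tableName))
    .show();
```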