sethwhite-sf opened a new issue, #8977:
URL: https://github.com/apache/iceberg/issues/8977

   ### Apache Iceberg version
   
   1.4.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   We have found that temporary views that reference an Iceberg table become 
stale when catalog caching is enabled:  
spark.sql.catalog.catalog-name.cache-enabled=true
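
   For reference, a small plain-Java sketch of the two catalog properties 
involved. The property names follow the Iceberg Spark configuration 
(cache-enabled and cache.expiration-interval-ms); the class, the helper, and 
the catalog name passed to it are hypothetical and only illustrate how the keys 
are composed. Setting cache-enabled to false is the configuration under which 
the test later in this report does not fail.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   public class CatalogCacheConf {

     // Hypothetical helper: builds the Spark config entries that control
     // Iceberg catalog caching for the given catalog name.
     static Map<String, String> cacheSettings(
         String catalogName, boolean enabled, long expirationMs) {
       String prefix = "spark.sql.catalog." + catalogName + ".";
       Map<String, String> conf = new LinkedHashMap<>();
       conf.put(prefix + "cache-enabled", Boolean.toString(enabled));
       conf.put(prefix + "cache.expiration-interval-ms", Long.toString(expirationMs));
       return conf;
     }

     public static void main(String[] args) {
       // "my_catalog" is a placeholder catalog name.
       cacheSettings("my_catalog", false, 30_000L)
           .forEach((key, value) -> System.out.println(key + "=" + value));
     }
   }
   ```

   Each entry can be passed to Spark as a --conf key=value pair or set on the 
session builder.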
   
   Initially, when a view is created:
   
   Dataset<Row> rdd = spark.read().format("iceberg").load("table1");
   rdd.createOrReplaceTempView("view1");
   
   The view and catalog cache reference the same org.apache.iceberg.Table 
object and the view reflects any changes that the application makes when it is 
queried:
   
   spark.sql("SELECT * from view1").show();
   // query returns latest state of the table
   
   However, once cache expiry occurs (after 30 seconds by default when caching 
is enabled), subsequent updates to the table, such as
   
   spark.sql("DELETE FROM table1 AS t WHERE t.id IS NULL");
   
   cause a new entry for the table to be created in the cache. The view no 
longer sees any of the changes that are made, because it is still using the 
original org.apache.iceberg.Table object, which references an Iceberg table 
snapshot that is no longer current. The view has become stale: it and the 
cache are no longer in sync.
   
   spark.sql("SELECT * from view1").show();
   // No longer returns latest state of the table
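
   The mechanism can be reproduced without Spark in a toy model. The sketch 
below is plain Java and is not Iceberg's implementation: Store, Handle, and 
Cache are hypothetical stand-ins for the table metadata, the 
org.apache.iceberg.Table object, and the catalog cache, and the 
expire-after-access policy with a manual clock is an assumption made to keep 
the example deterministic.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   public class StaleViewSketch {

     /** Authoritative table state; stands in for the table's current metadata. */
     static final class Store {
       private List<String> rows = new ArrayList<>();
       List<String> rows() { return rows; }
       void commit(List<String> newRows) { rows = new ArrayList<>(newRows); }
     }

     /** Handle that pins the state it last loaded, like a table object pinned to a snapshot. */
     static final class Handle {
       private final Store store;
       private List<String> pinned;
       Handle(Store store) { this.store = store; this.pinned = store.rows(); }
       List<String> scan() { return pinned; }
       void delete(String row) {
         List<String> next = new ArrayList<>(store.rows());
         next.remove(row);
         store.commit(next);
         pinned = store.rows(); // only the handle that performed the write moves forward
       }
     }

     /** Single-entry cache with expire-after-access, driven by a manual clock. */
     static final class Cache {
       private final Store store;
       private final long ttlMs;
       private Handle cached;
       private long lastAccessMs;
       private long nowMs;
       Cache(Store store, long ttlMs) { this.store = store; this.ttlMs = ttlMs; }
       void advance(long ms) { nowMs += ms; }
       Handle load() {
         if (cached == null || nowMs - lastAccessMs >= ttlMs) {
           cached = new Handle(store); // expired: a fresh handle replaces the old entry
         }
         lastAccessMs = nowMs;
         return cached;
       }
     }

     /** Returns true if a "view" created before the idle period observes a later delete. */
     static boolean viewSeesDelete(long idleMs) {
       Store store = new Store();
       store.commit(Arrays.asList("1,hr", "2,hardware", "null,hr"));
       Cache cache = new Cache(store, 30_000);
       Handle view = cache.load();      // the view captures the cached handle
       cache.advance(idleMs);           // no access while the clock advances
       cache.load().delete("null,hr");  // the write goes through whatever the cache returns
       return !view.scan().contains("null,hr");
     }

     public static void main(String[] args) {
       System.out.println(viewSeesDelete(1_000));  // true: view and writer share one handle
       System.out.println(viewSeesDelete(40_000)); // false: writer got a new handle, view is stale
     }
   }
   ```

   In this model the view is only correct while it happens to share a handle 
with the writer. Once the entry expires, nothing refreshes the handle that the 
view captured.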
   
   The unit test below illustrates the problem. The test fails when the 
default catalog caching is enabled.
   
   ```
   @Test
   public void testViewConsistencyAfterCacheExpiration() throws Exception {
     Assume.assumeFalse("Avro does not support metadata delete", fileFormat.equals("avro"));
     createAndInitUnpartitionedTable();

     sql("INSERT INTO TABLE %s VALUES (1, 'hr'), (2, 'hardware'), (null, 'hr')", tableName);

     // Create a temp view over the table and check that it sees the inserted rows.
     Dataset<Row> rdd = spark.read().format("iceberg").load(tableName);
     rdd.createOrReplaceTempView("view1");
     assertEquals(
         "Should have expected rows",
         ImmutableList.of(row(null, "hr"), row(1, "hr"), row(2, "hardware")),
         sql("SELECT * FROM %s ORDER BY id", "view1"));

     // Default cache expiration is 30 seconds, so wait long enough for the entry to expire.
     Thread.sleep(40000);

     // Delete through the table, then read through the view again.
     sql("DELETE FROM %s AS t WHERE t.id IS NULL", tableName);
     assertEquals(
         "Should have expected rows",
         ImmutableList.of(row(1, "hr"), row(2, "hardware")),
         sql("SELECT * FROM %s ORDER BY id", "view1"));
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

