gaborkaszab commented on code in PR #14440:
URL: https://github.com/apache/iceberg/pull/14440#discussion_r2571733657


##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -48,6 +48,27 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = true;
 
+  /**
+   * Controls policy of caffeine cache
+   *
+   * <p>Behavior of specific values of cache.strategy:
+   *
+   * <ul>
+   *   <li>EXPIRE_AFTER_ACCESS - cache entries are never evicted as long as 
they are being accessed
+   *       frequently

Review Comment:
   This definition seems vague somewhat. Frequently is not very specific. Why 
not flip the definition and say `cache entry is removed from the cache after a 
specified time has passed since its last access or write operation`



##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -48,6 +48,27 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = true;
 
+  /**
+   * Controls policy of caffeine cache

Review Comment:
   I don't think we should express what Cache implementation we used 
internally. Instead of `caffeine cache` we can say `table cache`



##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -48,6 +48,27 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = true;
 
+  /**
+   * Controls policy of caffeine cache
+   *
+   * <p>Behavior of specific values of cache.strategy:
+   *
+   * <ul>
+   *   <li>EXPIRE_AFTER_ACCESS - cache entries are never evicted as long as 
they are being accessed
+   *       frequently
+   *   <li>EXPIRE_AFTER_WRITE - cache entries are evicted frequently after 
cache write
+   * </ul>
+   */
+  public static final String CACHE_EXPIRATION_POLICY = 
"cache.expiration-policy";
+
+  public enum CacheExpirationPolicy {
+    EXPIRE_AFTER_ACCESS,
+    EXPIRE_AFTER_WRITE
+  }
+
+  public static final CacheExpirationPolicy CACHE_EXPIRATION_POLICY_DEFAULT =
+      CacheExpirationPolicy.EXPIRE_AFTER_ACCESS;
+

Review Comment:
   The comment for `CACHE_EXPIRATION_INTERVAL_MS` seems off now. It mentions 
expiration after access but with this patch that's just one use-case.



##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java:
##########
@@ -79,6 +80,27 @@ private SparkSQLProperties() {}
   public static final String EXECUTOR_CACHE_ENABLED = 
"spark.sql.iceberg.executor-cache.enabled";
   public static final boolean EXECUTOR_CACHE_ENABLED_DEFAULT = true;
 
+  public static final String EXECUTOR_CACHE_EXPIRATION_POLICY =
+      "spark.sql.iceberg.executor-cache.expiration-policy";
+
+  public enum ExecutorCacheExpirationPolicy {
+    EXPIRE_AFTER_ACCESS,
+    EXPIRE_AFTER_WRITE;
+
+    public static ExecutorCacheExpirationPolicy fromName(String name) {
+      Preconditions.checkArgument(
+          name != null, "ExecutorCacheExpirationPolicy reader type is null");
+      try {
+        return ExecutorCacheExpirationPolicy.valueOf(name.toUpperCase());
+      } catch (IllegalArgumentException e) {
+        throw new IllegalArgumentException("Unknown parquet reader type: " + 
name);

Review Comment:
   Parquet reader type?



##########
docs/docs/spark-configuration.md:
##########
@@ -191,6 +191,7 @@ val spark = SparkSession.builder()
 | spark.sql.iceberg.locality.enabled                     | false               
                                           | Report locality information for 
Spark task placement on executors                                               
                |
 | spark.sql.iceberg.executor-cache.enabled               | true                
                                           | Enables cache for executor-side 
(currently used to cache Delete Files)                                          
                |
 | spark.sql.iceberg.executor-cache.timeout               | 10                  
                                           | Timeout in minutes for executor 
cache entries                                                                   
                |
+| spark.sql.iceberg.executor-cache.expiration-policy    | EXPIRE_AFTER_ACCESS  
                                       | Policy for expiring executor cache 
entries: `EXPIRE_AFTER_ACCESS` (default, expires after last access) or 
`EXPIRE_AFTER_WRITE` (expires after write, regardless of access). |             
| 10                                                             | Timeout in 
minutes for executor cache entries                                              
                                     |

Review Comment:
   I'm not a huge expert in the Spark part of the project, but for me it seems 
that Spark executor cache is a completely different thing. This new property is 
meant to configure CachingCatalog instead, so this doc entry seems off for me.
   
   Update: Ahh, I think I get it. You not just adding this policy to 
CachingCatalog, but to some other caches within the project. I had the 
impression so far that this PR is concentrating only on CachingCatalog. I 
believe that we should keep the scope as narrow as possible. My recommendation 
is that tackle only one task at a time and start e.g. with CachingCatalog then 
a separate PR could cover Spark Executor Cache. The motivations to introduce 
such a policy could be different based on the different purpose of these caches.



##########
core/src/main/java/org/apache/iceberg/util/PropertyUtil.java:
##########
@@ -127,6 +129,21 @@ public static String propertyAsString(
     return defaultValue;
   }
 
+  public static <E extends Enum<E>> E propertyAsEnum(

Review Comment:
   I don't think you need this. Check how SnapshotMode is initialized in 
RESTSessionCatalog:
   
   ```this.snapshotMode =
           SnapshotMode.valueOf(
               PropertyUtil.propertyAsString(
                       mergedProps,
                       RESTCatalogProperties.SNAPSHOT_LOADING_MODE,
                       RESTCatalogProperties.SNAPSHOT_LOADING_MODE_DEFAULT)
                   .toUpperCase(Locale.US));```



##########
docs/docs/configuration.md:
##########
@@ -128,15 +128,16 @@ The value of these properties are not persisted as a part 
of the table metadata.
 
 Iceberg catalogs support using catalog properties to configure catalog 
behaviors. Here is a list of commonly used catalog properties:
 
-| Property                          | Default            | Description         
                                   |
-| --------------------------------- | ------------------ | 
------------------------------------------------------ |
-| catalog-impl                      | null               | a custom `Catalog` 
implementation to use by an engine  |
-| io-impl                           | null               | a custom `FileIO` 
implementation to use in a catalog   |
-| warehouse                         | null               | the root path of 
the data warehouse                    |
-| uri                               | null               | a URI string, such 
as Hive metastore URI               |
-| clients                           | 2                  | client pool size    
                                   |
-| cache-enabled                     | true               | Whether to cache 
catalog entries |
-| cache.expiration-interval-ms      | 30000              | How long catalog 
entries are locally cached, in milliseconds; 0 disables caching, negative 
values disable expiration |
+| Property                    | Default            | Description               
                             |

Review Comment:
   Could you help me what the reason is to change the existing rows in this 
table?



##########
.palantir/revapi.yml:
##########
@@ -1456,6 +1456,12 @@ acceptedBreaks:
         old: "method void 
org.apache.iceberg.PartitionStats::deletedEntry(org.apache.iceberg.Snapshot)"
         new: "method void 
org.apache.iceberg.PartitionStats::deletedEntry(org.apache.iceberg.Snapshot)"
         justification: "Changing deprecated code"
+      - code: "java.method.numberOfParametersChanged"

Review Comment:
   Just for the record: The reason this is protected is that 
TestableCachingCatalog can use this passing in a mocked Ticker for testing. In 
my opinion it should have been annotated with `@VisibleForTesting` and be 
package private, but this is how it is now.
   
   Anyway, We shouldn't break API at any circumstances, so as long as this PR 
has recapitalized changes like this, there is still some adjustments to do to 
keep existing API intact.
   Changing the signature of a protected method still breaks the API, because 
there could be applications inheriting from this class and using the protected 
methods. They would break with this PR.



##########
core/src/main/java/org/apache/iceberg/CatalogProperties.java:
##########
@@ -48,6 +48,27 @@ private CatalogProperties() {}
 
   public static final boolean CACHE_CASE_SENSITIVE_DEFAULT = true;
 
+  /**
+   * Controls policy of caffeine cache
+   *
+   * <p>Behavior of specific values of cache.strategy:
+   *
+   * <ul>
+   *   <li>EXPIRE_AFTER_ACCESS - cache entries are never evicted as long as 
they are being accessed
+   *       frequently
+   *   <li>EXPIRE_AFTER_WRITE - cache entries are evicted frequently after 
cache write

Review Comment:
   `cache entries are evicted frequently` I'm not sure what this means TBH. 
They are evicted after a configured interval. If it's configured short, then 
frequently, if not then not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to