jackye1995 commented on code in PR #6655:
URL: https://github.com/apache/iceberg/pull/6655#discussion_r1085900059


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -67,11 +69,15 @@ public boolean caseSensitive() {
   }
 
   public boolean localityEnabled() {
-    if (table.io() instanceof HadoopFileIO) {
-      HadoopInputFile file = (HadoopInputFile) table.io().newInputFile(table.location());
-      String scheme = file.getFileSystem().getScheme();
-      boolean defaultValue = LOCALITY_WHITELIST_FS.contains(scheme);
-      return PropertyUtil.propertyAsBoolean(readOptions, SparkReadOptions.LOCALITY, defaultValue);
+    if (table.io() instanceof HadoopFileIO || table.io() instanceof ResolvingFileIO) {

Review Comment:
   I am a bit concerned about this approach for people using `ResolvingFileIO`. Previously, we would only create an input file to check locality if the table used `HadoopFileIO`. Now, if the user is using `ResolvingFileIO`, this operation will be done for every single file, even when the FileIO resolved for that specific location is not `HadoopFileIO`.
   
   I am wondering if we should make `ResolvingFileIO.implFromLocation` a public method, so we can know which FileIO is used for a location without having to create an input file.
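   A rough sketch of what a public `implFromLocation` lookup could enable follows. It is self-contained and does not use Iceberg classes: `LocalitySketch`, its scheme-to-impl map, the `hdfs`-only whitelist and the `locality` option key are hypothetical stand-ins, assumed to approximate `ResolvingFileIO`'s scheme mapping, `LOCALITY_WHITELIST_FS` and `SparkReadOptions.LOCALITY`; the real values may differ. The point is that resolving the impl is plain string parsing, so no `InputFile` has to be created. Unlike the current code, it reads the scheme from the location URI rather than from the Hadoop `FileSystem`, so a schemeless path that Hadoop would resolve through `fs.defaultFS` is treated as non-local here.

```java
import java.net.URI;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: decide locality from the location string alone,
// without creating an InputFile. Not Iceberg's actual API.
final class LocalitySketch {
  static final String HADOOP_FILE_IO = "org.apache.iceberg.hadoop.HadoopFileIO";
  static final String S3_FILE_IO = "org.apache.iceberg.aws.s3.S3FileIO";
  static final String GCS_FILE_IO = "org.apache.iceberg.gcp.gcs.GCSFileIO";

  // Assumed scheme-to-impl table, modeled loosely on ResolvingFileIO; the real map may differ.
  private static final Map<String, String> SCHEME_TO_FILE_IO =
      Map.of("s3", S3_FILE_IO, "s3a", S3_FILE_IO, "s3n", S3_FILE_IO, "gs", GCS_FILE_IO);

  // Assumed stand-in for SparkReadConf.LOCALITY_WHITELIST_FS.
  private static final Set<String> LOCALITY_WHITELIST_FS = Set.of("hdfs");

  // Hypothetical read option key standing in for SparkReadOptions.LOCALITY.
  static final String LOCALITY_OPTION = "locality";

  private LocalitySketch() {}

  // Hypothetical public equivalent of ResolvingFileIO.implFromLocation:
  // only parses the URI scheme; no FileIO or InputFile is instantiated.
  static String implFromLocation(String location) {
    String scheme = URI.create(location).getScheme();
    if (scheme == null) {
      return HADOOP_FILE_IO; // no scheme: fall back to the Hadoop impl
    }
    return SCHEME_TO_FILE_IO.getOrDefault(scheme, HADOOP_FILE_IO);
  }

  static boolean localityEnabled(String tableLocation, Map<String, String> readOptions) {
    // Only locations served by HadoopFileIO can report block locality.
    if (!HADOOP_FILE_IO.equals(implFromLocation(tableLocation))) {
      return false;
    }
    String scheme = URI.create(tableLocation).getScheme();
    boolean defaultValue = scheme != null && LOCALITY_WHITELIST_FS.contains(scheme);
    // An explicit read option overrides the scheme-based default.
    String explicit = readOptions.get(LOCALITY_OPTION);
    return explicit != null ? Boolean.parseBoolean(explicit) : defaultValue;
  }

  public static void main(String[] args) {
    System.out.println(localityEnabled("hdfs://nn:8020/warehouse/db/t", Map.of())); // true
    System.out.println(localityEnabled("s3://bucket/warehouse/db/t", Map.of())); // false
    System.out.println(
        localityEnabled("hdfs://nn:8020/warehouse/db/t", Map.of("locality", "false"))); // false
  }
}
```

   With this shape, `localityEnabled()` could take the Hadoop branch only when the impl resolved for `table.location()` is actually `HadoopFileIO`, instead of for every table backed by `ResolvingFileIO`.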



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

