jackye1995 commented on code in PR #6655: URL: https://github.com/apache/iceberg/pull/6655#discussion_r1085900059
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkReadConf.java:
##########
@@ -67,11 +69,15 @@ public boolean caseSensitive() {
   }
 
   public boolean localityEnabled() {
-    if (table.io() instanceof HadoopFileIO) {
-      HadoopInputFile file = (HadoopInputFile) table.io().newInputFile(table.location());
-      String scheme = file.getFileSystem().getScheme();
-      boolean defaultValue = LOCALITY_WHITELIST_FS.contains(scheme);
-      return PropertyUtil.propertyAsBoolean(readOptions, SparkReadOptions.LOCALITY, defaultValue);
+    if (table.io() instanceof HadoopFileIO || table.io() instanceof ResolvingFileIO) {

Review Comment:
   I am a bit concerned about this approach for people using `ResolvingFileIO`. Previously, we only created an input file to check locality when the table's FileIO was a `HadoopFileIO`. With this change, if the user is using `ResolvingFileIO`, this operation will be done for every single file, even when the FileIO resolved for that specific location is not a `HadoopFileIO`. I am wondering if we should make `ResolvingFileIO.implFromLocation` a public method, so we can find out which FileIO is used for the location without needing to open an input file.
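
   To make the suggestion concrete, here is a rough, self-contained sketch of the idea. It does not use the real Iceberg classes: `IoKind`, `SCHEME_TO_IO`, `scheme`, and `localityDefault` are hypothetical stand-ins, the scheme mapping and the whitelist contents are assumptions, and the scheme is parsed from the location string instead of being read from a Hadoop `FileSystem`. The point is only that the locality default could be decided from the location itself, without creating an input file.

```java
import java.util.Map;
import java.util.Set;

public class LocalitySketch {
  // Hypothetical stand-in for "which FileIO implementation serves this location".
  public enum IoKind { HADOOP, OBJECT_STORE }

  // Assumed mapping for illustration only; schemes not listed fall back to HADOOP.
  private static final Map<String, IoKind> SCHEME_TO_IO =
      Map.of("s3", IoKind.OBJECT_STORE, "s3a", IoKind.OBJECT_STORE, "s3n", IoKind.OBJECT_STORE);

  // Assumed whitelist contents for illustration.
  private static final Set<String> LOCALITY_WHITELIST_FS = Set.of("hdfs");

  // Returns the URI scheme of a location, or null when the location has none.
  public static String scheme(String location) {
    int idx = location.indexOf("://");
    return idx > 0 ? location.substring(0, idx) : null;
  }

  // Sketch of a public lookup: resolve the implementation from the location string alone.
  public static IoKind implFromLocation(String location) {
    String scheme = scheme(location);
    if (scheme == null) {
      return IoKind.HADOOP; // immutable maps reject null keys, so handle this case first
    }
    return SCHEME_TO_IO.getOrDefault(scheme, IoKind.HADOOP);
  }

  // Locality default: only consult the whitelist when the location resolves to Hadoop.
  public static boolean localityDefault(String tableLocation) {
    if (implFromLocation(tableLocation) != IoKind.HADOOP) {
      return false; // no input file is created for non-Hadoop locations
    }
    String scheme = scheme(tableLocation);
    return scheme != null && LOCALITY_WHITELIST_FS.contains(scheme);
  }

  public static void main(String[] args) {
    System.out.println(localityDefault("hdfs://nn:8020/warehouse/db/t"));
    System.out.println(localityDefault("s3://bucket/warehouse/db/t"));
  }
}
```

   In `SparkReadConf.localityEnabled()`, the result of such a lookup would presumably feed the existing `PropertyUtil.propertyAsBoolean(readOptions, SparkReadOptions.LOCALITY, defaultValue)` call as `defaultValue`, so an explicit read option would still take precedence.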