MisterRaindrop commented on code in PR #703:
URL: https://github.com/apache/iceberg-cpp/pull/703#discussion_r3371439007


##########
src/iceberg/arrow/arrow_io.cc:
##########
@@ -473,9 +473,14 @@ class ArrowOutputFile : public OutputFile {
 }  // namespace
 
 Result<std::string> ArrowFileSystemFileIO::ResolvePath(const std::string& 
file_location) {
-  if (file_location.find("://") != std::string::npos) {
-    ICEBERG_ARROW_ASSIGN_OR_RETURN(auto path, 
arrow_fs_->PathFromUri(file_location));
-    return path;
+  if (auto pos = file_location.find("://"); pos != std::string::npos) {
+    auto path = arrow_fs_->PathFromUri(file_location);
+    if (path.ok()) {
+      return path.ValueOrDie();
+    }
+    // PathFromUri rejects S3-compatible schemes (s3a/s3n, gs://, oss://);
+    // fall back to the scheme-less bucket/key.
+    return file_location.substr(pos + 3);

Review Comment:
    I think this fallback is too broad. As implemented, any URI scheme rejected 
by PathFromUri will be stri
     pped and interpreted as a path in the wrapped filesystem. For example, 
with an S3FileSystem, a mismatch ed URI such as file://bucket/prefix would be 
interpreted as bucket/prefix instead of failing fast.
   
   Can we make the fallback explicit and narrow?
   
     1. The scheme should be in an allowlist for the wrapped filesystem. For 
S3FileSystem, I think we can
     support well-known aliases such as s3a and s3n. Vendor schemes such as oss 
or gs should probably be opt-
     in or discussed separately, because they may also represent native OSS/GCS 
locations.
     2. The PathFromUri failure should match the expected scheme-mismatch / 
unsupported-scheme case. Other
     errors, such as malformed URI, authority handling errors, or local-path 
errors, should still be returned
     to the caller.
   
     In other words, this should be guarded by both a scheme allowlist and the 
expected error condition,
     rather than falling back for every PathFromUri failure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to