jdockerty commented on code in PR #520:
URL: https://github.com/apache/iceberg-rust/pull/520#discussion_r1703160577


##########
crates/iceberg/tests/file_io_gcs_test.rs:
##########
@@ -0,0 +1,103 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Integration tests for FileIO Google Cloud Storage (GCS).
+
+use bytes::Bytes;
+use iceberg::io::{FileIO, FileIOBuilder, GCS_BUCKET, GCS_CREDENTIAL_PATH};
+use iceberg_test_utils::set_up;
+
+// static DOCKER_COMPOSE_ENV: RwLock<Option<DockerCompose>> = RwLock::new(None);
+
+// TODO: use compose with fake-gcs-server

Review Comment:
   This TODO is here because `fake-gcs-server` allows unauthenticated requests, 
but I don't believe OpenDAL does for GCS. See my other comment for my 
understanding of it.



##########
crates/iceberg/tests/file_io_gcs_test.rs:
##########
@@ -0,0 +1,103 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Integration tests for FileIO Google Cloud Storage (GCS).
+
+use bytes::Bytes;
+use iceberg::io::{FileIO, FileIOBuilder, GCS_BUCKET, GCS_CREDENTIAL_PATH};
+use iceberg_test_utils::set_up;
+
+// static DOCKER_COMPOSE_ENV: RwLock<Option<DockerCompose>> = RwLock::new(None);
+
+// TODO: use compose with fake-gcs-server
+//#[ctor]
+//fn before_all() {
+//    let mut guard = DOCKER_COMPOSE_ENV.write().unwrap();
+//    let docker_compose = DockerCompose::new(
+//        normalize_test_name(module_path!()),
+//        format!("{}/testdata/file_io_gcs", env!("CARGO_MANIFEST_DIR")),
+//    );
+//    docker_compose.run();
+//    guard.replace(docker_compose);
+//}
+//
+//#[dtor]
+//fn after_all() {
+//    let mut guard = DOCKER_COMPOSE_ENV.write().unwrap();
+//    guard.take();
+//}
+
+async fn get_file_io_gcs() -> FileIO {
+    set_up();
+
+    FileIOBuilder::new("gcs")
+        .with_props(vec![
+            (GCS_BUCKET, std::env::var("GCS_BUCKET").unwrap().to_string()),
+            (
+                GCS_CREDENTIAL_PATH,
+                std::env::var("GCS_CREDENTIAL_PATH").unwrap().to_string(),
+            ),
+        ])
+        .build()
+        .unwrap()
+}
+
+fn get_gs_path() -> String {
+    format!(
+        "gs://{}",
+        std::env::var("GCS_BUCKET").expect("Only runs with var enabled")
+    )
+}
+
+#[tokio::test]
+#[test_with::env(GCS_BUCKET, GCS_CREDENTIAL_PATH)]

Review Comment:
   **TL;DR: Is there a preference on how to proceed here? Do I need to add 
support for unauthenticated requests in OpenDAL's `GcsConfig`? It looks like 
[`reqsign::google::TokenLoader`](https://github.com/Xuanwo/reqsign/blob/ab6c44b8675bf2a0eb496aee12eb85409dd2a071/src/google/token.rs#L120)
 has support for that, but I'm not sure how much it would help here.**
   
   ---
   
   This is not very pretty, but I thought it best to at least put this up in 
draft to get some feedback/pointers on how to proceed.
   
   I've had to fall back on using an actual GCS bucket, which does seem to 
work as intended, whereas `fake-gcs-server` via compose does not. 
I believe this is because OpenDAL always ends up signing the request 
through the `GcsConfig` approach, which in turn finally tries to fetch a 
token from GCE Metadata when all else fails, since every other route 
returns `None` for an unauthenticated request.
   
   Here is my understanding of the flow, though I could certainly use some pointers:
   
   1. Take the `write_once` for the `file_io.write(...)` path. This attempts to 
[`sign`](https://github.com/apache/opendal/blob/fac74d488ba82181330510f52be97f1338408878/core/src/services/gcs/writer.rs#L58)
 the request.
   2. The `sign` of `GcsCore` attempts to load the token via 
[`load_token`](https://github.com/apache/opendal/blob/fac74d488ba82181330510f52be97f1338408878/core/src/services/gcs/core.rs#L108-L109).
   3. The `load_token` in turn uses the current `token_loader` (a 
`GoogleTokenLoader` from `reqsign`) and calls that [`load` 
method](https://github.com/apache/opendal/blob/fac74d488ba82181330510f52be97f1338408878/core/src/services/gcs/core.rs#L76-L77).
   4. This now puts us into the `reqsign` crate to go through the [loading 
conditional 
flow](https://github.com/Xuanwo/reqsign/blob/ab6c44b8675bf2a0eb496aee12eb85409dd2a071/src/google/token.rs#L192-L196).
   5. For an unauthenticated request, all checks 
[here](https://github.com/Xuanwo/reqsign/blob/ab6c44b8675bf2a0eb496aee12eb85409dd2a071/src/google/token.rs#L207)
 are exhausted and we end up at 
[`load_via_vm_metadata`](https://github.com/Xuanwo/reqsign/blob/ab6c44b8675bf2a0eb496aee12eb85409dd2a071/src/google/token.rs#L224-L226),
 which eventually times out because we are not running in GCP.
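
To make the flow above concrete, here is a simplified model of the fallback chain as I understand it. It is only a sketch, not the actual OpenDAL or `reqsign` code: `Token`, `Sources`, `LoadError`, the order of the first two checks, and the 5-second timeout are all hypothetical stand-ins chosen for illustration.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a bearer token (not the real `reqsign` type).
#[derive(Debug, Clone, PartialEq)]
struct Token(String);

#[derive(Debug, PartialEq)]
enum LoadError {
    /// Models the metadata request timing out when not running in GCP.
    /// The duration is an assumed value, not one taken from `reqsign`.
    MetadataTimeout(Duration),
}

/// Hypothetical description of which credential sources are available.
#[derive(Default)]
struct Sources {
    static_token: Option<String>,
    credential_file_token: Option<String>,
    /// `true` models running on GCP, where the metadata server answers.
    on_gcp: bool,
}

/// Try each source in order; the VM metadata lookup is the last resort.
fn load_token(sources: &Sources) -> Result<Token, LoadError> {
    if let Some(t) = &sources.static_token {
        return Ok(Token(t.clone()));
    }
    if let Some(t) = &sources.credential_file_token {
        return Ok(Token(t.clone()));
    }
    // Every earlier route returned nothing, so fall through to metadata.
    load_via_vm_metadata(sources.on_gcp)
}

/// Simulated metadata lookup: succeeds on GCP, times out elsewhere.
fn load_via_vm_metadata(on_gcp: bool) -> Result<Token, LoadError> {
    if on_gcp {
        Ok(Token("metadata-token".to_string()))
    } else {
        Err(LoadError::MetadataTimeout(Duration::from_secs(5)))
    }
}

/// Signing always needs a token in this model, so an unauthenticated
/// request fails before it is ever sent to the emulator.
fn sign(sources: &Sources) -> Result<String, LoadError> {
    let token = load_token(sources)?;
    Ok(format!("Authorization: Bearer {}", token.0))
}
```

In this model, calling `sign(&Sources::default())` mirrors the `fake-gcs-server` case: no credential is configured, so the only remaining route is the metadata lookup, which fails. A way to skip signing altogether would avoid that last step, which is what the question about unauthenticated requests is getting at.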
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

