pvary commented on code in PR #13302:
URL: https://github.com/apache/iceberg/pull/13302#discussion_r2161041458


##########
core/src/main/java/org/apache/iceberg/actions/FileURI.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.actions;
+
+import java.net.URI;
+import java.util.Map;
+import org.apache.hadoop.fs.Path;
+import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
+import org.apache.iceberg.relocated.com.google.common.base.Strings;
+
+public class FileURI {
+
+  private String scheme;
+  private String authority;
+  private String path;
+  private String uriAsString;
+
+  public FileURI(
+      String uriAsString, Map<String, String> equalSchemes, Map<String, String> equalAuthorities) {
+    URI uri = new Path(uriAsString).toUri();
+    this.scheme = equalSchemes.getOrDefault(uri.getScheme(), uri.getScheme());
+    this.authority = equalAuthorities.getOrDefault(uri.getAuthority(), uri.getAuthority());
+    this.path = uri.getPath();
+    this.uriAsString = uriAsString;
+  }
+
+  public FileURI(String scheme, String authority, String path, String uriAsString) {
+    this.scheme = scheme;
+    this.authority = authority;
+    this.path = path;
+    this.uriAsString = uriAsString;
+  }
+
+  public FileURI() {}
+
+  public String getScheme() {

Review Comment:
   That's an interesting question to decide on.
   We have metadata location Strings and file system location Strings emitted
   from the downstream (FSList, MetadataList) operators. We need to serialize and
   shuffle them so that they are keyed by the `path` component, but we also need
   the `scheme` and the `authority` for matching, and the `uriAsString` for
   deleting.
   
   It is enough for the `uriAsString` to travel on the wire. For the key, we
   need to extract the `path` on both the emitter side and the `AntiJoin` side.
   It could be a fun exercise to check which performs better. OTOH, maintenance
   tasks are not performance sensitive, so we can postpone the optimization
   until it is needed.
   
   I would definitely opt for a Flink serializer for the `FileURI` class, and
   we can change it when needed. Also, Java serialization is one of the worst
   solutions, so I would try to avoid it whenever it is not strictly needed,
   especially since in this case `FileURI` must not change or the state will be
   corrupted.
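
To make the "only `uriAsString` travels on the wire" option concrete, here is a minimal sketch. It is not the class from the PR and not a Flink `TypeSerializer`: `FileUriWireSketch`, `toBytes`, and `fromBytes` are hypothetical names, `java.net.URI` stands in for Hadoop's `Path#toUri`, and plain `DataOutputStream`/`DataInputStream` stand in for Flink's data views, so that the example has no dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Map;

// Hypothetical sketch: only the original string is serialized; scheme,
// authority and path are re-derived from it after deserialization.
final class FileUriWireSketch {
  final String scheme;
  final String authority;
  final String path;
  final String uriAsString;

  FileUriWireSketch(
      String uriAsString, Map<String, String> equalSchemes, Map<String, String> equalAuthorities) {
    // Assumption: java.net.URI replaces Hadoop's Path#toUri to stay dependency-free.
    // URI.create rejects some strings (for example, ones with spaces) that Path accepts.
    URI uri = URI.create(uriAsString);
    this.scheme = normalize(uri.getScheme(), equalSchemes);
    this.authority = normalize(uri.getAuthority(), equalAuthorities);
    this.path = uri.getPath();
    this.uriAsString = uriAsString;
  }

  // Maps a value to its canonical equivalent; null-safe because immutable maps reject null keys.
  private static String normalize(String value, Map<String, String> equivalents) {
    return value == null ? null : equivalents.getOrDefault(value, value);
  }

  // Writes just the original string (writeUTF is limited to 65535 encoded bytes).
  static byte[] toBytes(FileUriWireSketch value) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeUTF(value.uriAsString);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  // Reads the string back and recomputes the derived components on the receiving side.
  static FileUriWireSketch fromBytes(
      byte[] bytes, Map<String, String> equalSchemes, Map<String, String> equalAuthorities) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      return new FileUriWireSketch(in.readUTF(), equalSchemes, equalAuthorities);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The trade-off in this variant is that parsing happens on both sides of the shuffle, in exchange for a smaller record and a wire format that does not depend on the field layout of the class.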



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

