anoopj commented on code in PR #16174:
URL: https://github.com/apache/iceberg/pull/16174#discussion_r3238043150
##########
core/src/main/java/org/apache/iceberg/util/LocationUtil.java:
##########
@@ -57,4 +57,59 @@ public static String tableLocation(TableIdentifier
tableIdentifier, boolean useU
return tableIdentifier.name();
}
}
+
+ /**
+ * Returns true if the location contains a URI scheme (e.g. {@code s3:},
{@code hdfs:}, {@code
+ * file:}), per <a
href="https://datatracker.ietf.org/doc/html/rfc3986#section-3.1">RFC 3986
+ * section 3.1</a>.
+ */
+ private static boolean hasScheme(String location) {
+ if (location.isEmpty()) {
+ return false;
+ }
+
+ // Early termination for relative locations since most commonly start with
/
+ if (location.charAt(0) == '/') {
+ return false;
+ }
+
+ for (int i = 0; i < location.length(); i += 1) {
+ char ch = location.charAt(i);
+ if (ch == ':') {
+ return i > 0;
+ }
+
+ if (!Character.isLetterOrDigit(ch) && ch != '+' && ch != '-' && ch !=
'.') {
+ return false;
+ }
+ }
+
+ return false;
+ }
+
+ /**
+ * Resolves a relative location against a table location. If the location
has a URI scheme, it is
+ * returned as-is. Otherwise, the location is appended to the table location
without any
+ * additional separator.
+ */
+ public static String resolveLocation(String tableLocation, String location) {
+ if (hasScheme(location)) {
+ return location;
+ }
+
+ return tableLocation + location;
+ }
+
+ /**
+ * Relativizes a location against a table location. If the location starts
with the table
+ * location, the prefix is removed and the remaining relative portion is
returned. Otherwise, the
+ * location is returned as-is.
+ */
+ public static String relativizeLocation(String tableLocation, String
location) {
+ if (location.startsWith(tableLocation)) {
Review Comment:
> Can we require that the relative portion must start with the path separator
Alternatively, we can flip it require that the table location must _end_
with a path separator. Both options solve this problem.
Implementors will find it a bit surprising that the relative path starts
with a `/`. For instance, if you use the Rust `Url` libraries you get
surprising results:
```
use url::Url;
fn main() {
let base = Url::parse("s3://bucket/mytable").unwrap();
let iceberg_url = base.join("/data/file.parquet").unwrap();
# prints out: Iceberg absolute: s3://bucket/data/file.parquet (mytable
is dropped)
println!("Iceberg absolute: {iceberg_url}");
}
```
This can be avoided if we flip it and require the location must end with the
separator instead. The downside of this approach is that we might lose the
[early](https://github.com/apache/iceberg/pull/16174/changes#diff-1058977d0d362c569bf467c7db4cc3b695924d7153a0358302f2df83d6bb5b0aR71-R73)
termination done in the current version of this PR and may have to look at
more characters for a colon instead. But honestly it may not make much of a
difference in modern CPUs if JVM is using JIT compilation with SIMD
instructions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]