github-actions[bot] commented on code in PR #63817:
URL: https://github.com/apache/doris/pull/63817#discussion_r3400648246


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/connectivity/CatalogSsrfChecker.java:
##########
@@ -0,0 +1,253 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.datasource.connectivity;
+
+import org.apache.doris.cloud.security.SecurityChecker;
+import org.apache.doris.common.DdlException;
+import org.apache.doris.datasource.property.metastore.MetastoreProperties;
+import org.apache.doris.datasource.property.storage.HdfsProperties;
+import org.apache.doris.datasource.property.storage.StorageProperties;
+import org.apache.doris.foundation.property.ConnectorProperty;
+
+import com.google.common.collect.Sets;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Validates user-supplied catalog URIs against SSRF attacks before a catalog 
is created.
+ *
+ * <p>Unlike {@link CatalogConnectivityTestCoordinator}, this checker opens no 
network
+ * connections; it only invokes {@link 
SecurityChecker#startSSRFChecking(String)} on each URI
+ * so the platform's SSRF rule engine can reject internal / private / loopback 
hosts.
+ * Because no connectivity probe is required, the check runs unconditionally 
on every CREATE
+ * CATALOG, regardless of the {@code test_connection} property.
+ *
+ * <p>Discovery is driven by the {@link ConnectorProperty#checkSsrf()} flag — 
to extend

Review Comment:
   The opt-in discovery still leaves existing catalog endpoints invisible 
unless every field is audited. Two concrete gaps, distinct from the existing 
Azure OAuth2 thread, are `IcebergRestProperties.icebergRestOauth2ServerUri` 
(`iceberg.rest.oauth2.server-uri`), which is passed to Iceberg as 
`OAuth2Properties.OAUTH2_SERVER_URI` for token fetches, and 
`IcebergJdbcMetaStoreProperties.uri` (`iceberg.jdbc.uri`), which is passed as 
`CatalogProperties.URI` to Iceberg's JDBC catalog. Neither has `checkSsrf = 
true` nor another check in this path, so after the core checker is made 
effective those endpoints can still point at loopback/private hosts. Please 
cover these fields (using the JDBC URL checker where appropriate) and add a 
negative or audit test that prevents new `uri`/`endpoint` `ConnectorProperty` 
fields from being silently missed.



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/connectivity/CatalogSsrfChecker.java:
##########
@@ -0,0 +1,253 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.datasource.connectivity;
+
+import org.apache.doris.cloud.security.SecurityChecker;
+import org.apache.doris.common.DdlException;
+import org.apache.doris.datasource.property.metastore.MetastoreProperties;
+import org.apache.doris.datasource.property.storage.HdfsProperties;
+import org.apache.doris.datasource.property.storage.StorageProperties;
+import org.apache.doris.foundation.property.ConnectorProperty;
+
+import com.google.common.collect.Sets;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Validates user-supplied catalog URIs against SSRF attacks before a catalog 
is created.
+ *
+ * <p>Unlike {@link CatalogConnectivityTestCoordinator}, this checker opens no 
network
+ * connections; it only invokes {@link 
SecurityChecker#startSSRFChecking(String)} on each URI
+ * so the platform's SSRF rule engine can reject internal / private / loopback 
hosts.
+ * Because no connectivity probe is required, the check runs unconditionally 
on every CREATE
+ * CATALOG, regardless of the {@code test_connection} property.
+ *
+ * <p>Discovery is driven by the {@link ConnectorProperty#checkSsrf()} flag — 
to extend
+ * coverage to a new property class, simply set {@code checkSsrf = true} on 
its endpoint /
+ * URI field; no change to this class is required.
+ */
+public class CatalogSsrfChecker {
+    private static final Logger LOG = 
LogManager.getLogger(CatalogSsrfChecker.class);
+
+    /** Reflection traversal only descends into property classes under this 
package prefix. */
+    private static final String PROPERTY_PACKAGE_PREFIX = 
"org.apache.doris.datasource.property.";
+
+    /**
+     * HDFS HA exposes one rpc-address per namenode under dynamic keys
+     * (e.g. {@code dfs.namenode.rpc-address.<nameservice>.<nn>}); these are 
stored in
+     * {@link HdfsProperties#getBackendConfigProperties()} rather than as 
declared fields,
+     * so {@link ConnectorProperty#checkSsrf()} cannot reach them. They are 
collected
+     * separately.
+     */
+    private static final String HDFS_NAMENODE_RPC_ADDRESS_PREFIX = 
"dfs.namenode.rpc-address.";
+
+    private CatalogSsrfChecker() {}
+
+    /**
+     * Validate every user-supplied URI on the given catalog properties.
+     *
+     * @throws DdlException if any URI fails the SSRF check
+     */
+    public static void check(String catalogName,
+                             MetastoreProperties metastoreProperties,
+                             Map<StorageProperties.Type, StorageProperties> 
storagePropertiesMap)
+            throws DdlException {
+        List<String> uris = new ArrayList<>();
+        Set<Object> visited = Sets.newIdentityHashSet();
+
+        if (metastoreProperties != null) {
+            collectAnnotatedUris(metastoreProperties, visited, uris);
+        }
+        if (storagePropertiesMap != null) {
+            for (StorageProperties sp : storagePropertiesMap.values()) {
+                collectStorageUris(sp, visited, uris);
+            }
+        }
+        for (String uri : uris) {
+            checkSingleUri(catalogName, uri);
+        }
+    }
+
+    /**
+     * Collect every URI worth SSRF-checking on a single storage property 
object.
+     *
+     * <p>Auto-fallback HDFS storage ({@code explicitlyConfigured=false}) is 
dropped wholesale
+     * — both its annotated {@code fs.defaultFS} field and any dynamic 
namenode rpc-address
+     * entries — because that {@link HdfsProperties} instance was synthesised 
by the
+     * framework for catalogs whose user never configured HDFS, and so its 
values shouldn't
+     * surface as user-supplied URIs.
+     */
+    private static void collectStorageUris(StorageProperties sp, Set<Object> 
visited, List<String> uris) {
+        if (sp instanceof HdfsProperties && !((HdfsProperties) 
sp).isExplicitlyConfigured()) {
+            return;
+        }
+        collectAnnotatedUris(sp, visited, uris);
+        collectDynamicUris(sp, uris);
+    }
+
+    /**
+     * Walk the object graph rooted at {@code root}, collecting String values 
of any field
+     * annotated with {@link ConnectorProperty#checkSsrf()}{@code  = true}. 
Recurses through
+     * fields whose declared type lives under {@link 
#PROPERTY_PACKAGE_PREFIX}; an identity
+     * set prevents revisits.
+     */
+    private static void collectAnnotatedUris(Object root, Set<Object> visited, 
List<String> uris) {
+        if (root == null || !visited.add(root)) {
+            return;
+        }
+        Class<?> clazz = root.getClass();
+        while (clazz != null && clazz != Object.class) {
+            for (Field field : clazz.getDeclaredFields()) {
+                if (Modifier.isStatic(field.getModifiers())) {
+                    continue;
+                }
+                ConnectorProperty annotation = 
field.getAnnotation(ConnectorProperty.class);
+                Object value = readField(field, root);
+                if (value == null) {
+                    continue;
+                }
+                if (annotation != null && annotation.checkSsrf() && value 
instanceof String) {
+                    String s = (String) value;
+                    if (StringUtils.isNotBlank(s)) {
+                        uris.add(s);
+                    }
+                }
+                if (isPropertyContainer(value)) {
+                    collectAnnotatedUris(value, visited, uris);
+                }
+            }
+            clazz = clazz.getSuperclass();
+        }
+    }
+
+    /**
+     * Collect URIs that live behind dynamic property keys (those not bound to 
a static
+     * field via {@code @ConnectorProperty}). Currently this means HDFS 
namenode rpc-address
+     * entries on {@link HdfsProperties}. The {@code isExplicitlyConfigured} 
gating is
+     * applied by {@link #collectStorageUris} before reaching here.
+     */
+    private static void collectDynamicUris(StorageProperties props, 
List<String> uris) {
+        if (!(props instanceof HdfsProperties)) {
+            return;
+        }
+        Map<String, String> backendConfig = ((HdfsProperties) 
props).getBackendConfigProperties();
+        if (backendConfig == null) {
+            return;
+        }
+        for (Map.Entry<String, String> e : backendConfig.entrySet()) {
+            if (e.getKey() != null && 
e.getKey().startsWith(HDFS_NAMENODE_RPC_ADDRESS_PREFIX)
+                    && StringUtils.isNotBlank(e.getValue())) {
+                uris.add(e.getValue());
+            }
+        }
+    }
+
+    private static Object readField(Field field, Object obj) {
+        try {
+            field.setAccessible(true);
+            return field.get(obj);
+        } catch (IllegalAccessException e) {
+            LOG.warn("Failed to read field {} on {}", field.getName(), 
obj.getClass().getName(), e);
+            return null;
+        }
+    }
+
+    /**
+     * True if {@code value} is itself a Doris property POJO that we should 
recurse into
+     * (e.g. {@code HMSBaseProperties} embedded inside {@code 
HiveHMSProperties}).
+     * Returns false for Strings, primitives, collections, framework types, 
etc.
+     */
+    private static boolean isPropertyContainer(Object value) {
+        if (value == null) {
+            return false;
+        }
+        if (value instanceof CharSequence || value instanceof Number || value 
instanceof Boolean
+                || value instanceof Collection || value instanceof Map) {
+            return false;
+        }
+        Class<?> c = value.getClass();
+        if (c.isPrimitive() || c.isArray() || c.isEnum()) {
+            return false;
+        }
+        String name = c.getName();
+        return name.startsWith(PROPERTY_PACKAGE_PREFIX);
+    }
+
+    /**
+     * Run a single URI through the SSRF check. URI may be a full URL ({@code 
thrift://h:p},
+     * {@code hdfs://h:p}, {@code https://...}) or a bare {@code host[:port]}. 
We normalize to
+     * {@code http://...} solely so the underlying SSRF rule engine can parse 
it; no HTTP
+     * connection is opened here.
+     *
+     * <p>HMS allows comma-separated URIs (e.g. {@code 
thrift://h1:p,thrift://h2:p}); each is
+     * validated independently so a single bad host fails the catalog creation.
+     */
+    private static void checkSingleUri(String catalogName, String uri) throws 
DdlException {
+        if (StringUtils.isBlank(uri)) {
+            return;
+        }
+        for (String token : uri.split(",")) {
+            String single = token.trim();
+            if (single.isEmpty()) {
+                continue;
+            }
+            String urlStr = normalizeToHttpUrl(single);
+            if (urlStr == null) {
+                continue;
+            }
+            try {
+                SecurityChecker.getInstance().startSSRFChecking(urlStr);

Review Comment:
   This does not actually validate the target host. 
`SecurityChecker.startSSRFChecking()` delegates to the hook-style 
`startSSRFNetHookChecking(...)`, and every existing FE caller starts the hook 
and then opens the URL before stopping it (`S3Util.validateAndTestEndpoint()`, 
`Util.getInputStreamFromUrl()`, etc.). Here the hook is immediately stopped 
without any connection, so a catalog such as 
`hive.metastore.uris=thrift://127.0.0.1:9083` only runs hook setup and can 
still be persisted with `test_connection=false`; the later HMS/HDFS/Iceberg 
client connection happens outside the hook. Please use a real no-connection 
host validation API / explicit host-resolution check, or perform the actual 
outbound check under the hook, and add a test that fails with a real 
loopback/private host rather than a mocked `startSSRFChecking()` exception.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to