github-actions[bot] commented on code in PR #63817: URL: https://github.com/apache/doris/pull/63817#discussion_r3400648246
########## fe/fe-core/src/main/java/org/apache/doris/datasource/connectivity/CatalogSsrfChecker.java: ########## @@ -0,0 +1,253 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package org.apache.doris.datasource.connectivity; + +import org.apache.doris.cloud.security.SecurityChecker; +import org.apache.doris.common.DdlException; +import org.apache.doris.datasource.property.metastore.MetastoreProperties; +import org.apache.doris.datasource.property.storage.HdfsProperties; +import org.apache.doris.datasource.property.storage.StorageProperties; +import org.apache.doris.foundation.property.ConnectorProperty; + +import com.google.common.collect.Sets; +import org.apache.commons.lang3.StringUtils; +import org.apache.logging.log4j.LogManager; +import org.apache.logging.log4j.Logger; + +import java.lang.reflect.Field; +import java.lang.reflect.Modifier; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.Map; +import java.util.Set; + +/** + * Validates user-supplied catalog URIs against SSRF attacks before a catalog is created. + * + * <p>Unlike {@link CatalogConnectivityTestCoordinator}, this checker opens no network + * connections; it only invokes {@link SecurityChecker#startSSRFChecking(String)} on each URI + * so the platform's SSRF rule engine can reject internal / private / loopback hosts. + * Because no connectivity probe is required, the check runs unconditionally on every CREATE + * CATALOG, regardless of the {@code test_connection} property. + * + * <p>Discovery is driven by the {@link ConnectorProperty#checkSsrf()} flag — to extend Review Comment: The opt-in discovery still leaves existing catalog endpoints invisible unless every field is audited. Two concrete gaps, distinct from the existing Azure OAuth2 thread, are `IcebergRestProperties.icebergRestOauth2ServerUri` (`iceberg.rest.oauth2.server-uri`), which is passed to Iceberg as `OAuth2Properties.OAUTH2_SERVER_URI` for token fetches, and `IcebergJdbcMetaStoreProperties.uri` (`iceberg.jdbc.uri`), which is passed as `CatalogProperties.URI` to Iceberg's JDBC catalog. Neither has `checkSsrf = true` nor another check in this path, so after the core checker is made effective those endpoints can still point at loopback/private hosts. Please cover these fields (using the JDBC URL checker where appropriate) and add a negative or audit test that prevents new `uri`/`endpoint` `ConnectorProperty` fields from being silently missed. ########## fe/fe-core/src/main/java/org/apache/doris/datasource/connectivity/CatalogSsrfChecker.java: ########## @@ -0,0 +1,253 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package org.apache.doris.datasource.connectivity; + +import org.apache.doris.cloud.security.SecurityChecker; +import org.apache.doris.common.DdlException; +import org.apache.doris.datasource.property.metastore.MetastoreProperties; +import org.apache.doris.datasource.property.storage.HdfsProperties; +import org.apache.doris.datasource.property.storage.StorageProperties; +import org.apache.doris.foundation.property.ConnectorProperty; + +import com.google.common.collect.Sets; +import org.apache.commons.lang3.StringUtils; +import org.apache.logging.log4j.LogManager; +import org.apache.logging.log4j.Logger; + +import java.lang.reflect.Field; +import java.lang.reflect.Modifier; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.Map; +import java.util.Set; + +/** + * Validates user-supplied catalog URIs against SSRF attacks before a catalog is created. + * + * <p>Unlike {@link CatalogConnectivityTestCoordinator}, this checker opens no network + * connections; it only invokes {@link SecurityChecker#startSSRFChecking(String)} on each URI + * so the platform's SSRF rule engine can reject internal / private / loopback hosts. + * Because no connectivity probe is required, the check runs unconditionally on every CREATE + * CATALOG, regardless of the {@code test_connection} property. + * + * <p>Discovery is driven by the {@link ConnectorProperty#checkSsrf()} flag — to extend + * coverage to a new property class, simply set {@code checkSsrf = true} on its endpoint / + * URI field; no change to this class is required. + */ +public class CatalogSsrfChecker { + private static final Logger LOG = LogManager.getLogger(CatalogSsrfChecker.class); + + /** Reflection traversal only descends into property classes under this package prefix. */ + private static final String PROPERTY_PACKAGE_PREFIX = "org.apache.doris.datasource.property."; + + /** + * HDFS HA exposes one rpc-address per namenode under dynamic keys + * (e.g. {@code dfs.namenode.rpc-address.<nameservice>.<nn>}); these are stored in + * {@link HdfsProperties#getBackendConfigProperties()} rather than as declared fields, + * so {@link ConnectorProperty#checkSsrf()} cannot reach them. They are collected + * separately. + */ + private static final String HDFS_NAMENODE_RPC_ADDRESS_PREFIX = "dfs.namenode.rpc-address."; + + private CatalogSsrfChecker() {} + + /** + * Validate every user-supplied URI on the given catalog properties. + * + * @throws DdlException if any URI fails the SSRF check + */ + public static void check(String catalogName, + MetastoreProperties metastoreProperties, + Map<StorageProperties.Type, StorageProperties> storagePropertiesMap) + throws DdlException { + List<String> uris = new ArrayList<>(); + Set<Object> visited = Sets.newIdentityHashSet(); + + if (metastoreProperties != null) { + collectAnnotatedUris(metastoreProperties, visited, uris); + } + if (storagePropertiesMap != null) { + for (StorageProperties sp : storagePropertiesMap.values()) { + collectStorageUris(sp, visited, uris); + } + } + for (String uri : uris) { + checkSingleUri(catalogName, uri); + } + } + + /** + * Collect every URI worth SSRF-checking on a single storage property object. + * + * <p>Auto-fallback HDFS storage ({@code explicitlyConfigured=false}) is dropped wholesale + * — both its annotated {@code fs.defaultFS} field and any dynamic namenode rpc-address + * entries — because that {@link HdfsProperties} instance was synthesised by the + * framework for catalogs whose user never configured HDFS, and so its values shouldn't + * surface as user-supplied URIs. + */ + private static void collectStorageUris(StorageProperties sp, Set<Object> visited, List<String> uris) { + if (sp instanceof HdfsProperties && !((HdfsProperties) sp).isExplicitlyConfigured()) { + return; + } + collectAnnotatedUris(sp, visited, uris); + collectDynamicUris(sp, uris); + } + + /** + * Walk the object graph rooted at {@code root}, collecting String values of any field + * annotated with {@link ConnectorProperty#checkSsrf()}{@code = true}. Recurses through + * fields whose declared type lives under {@link #PROPERTY_PACKAGE_PREFIX}; an identity + * set prevents revisits. + */ + private static void collectAnnotatedUris(Object root, Set<Object> visited, List<String> uris) { + if (root == null || !visited.add(root)) { + return; + } + Class<?> clazz = root.getClass(); + while (clazz != null && clazz != Object.class) { + for (Field field : clazz.getDeclaredFields()) { + if (Modifier.isStatic(field.getModifiers())) { + continue; + } + ConnectorProperty annotation = field.getAnnotation(ConnectorProperty.class); + Object value = readField(field, root); + if (value == null) { + continue; + } + if (annotation != null && annotation.checkSsrf() && value instanceof String) { + String s = (String) value; + if (StringUtils.isNotBlank(s)) { + uris.add(s); + } + } + if (isPropertyContainer(value)) { + collectAnnotatedUris(value, visited, uris); + } + } + clazz = clazz.getSuperclass(); + } + } + + /** + * Collect URIs that live behind dynamic property keys (those not bound to a static + * field via {@code @ConnectorProperty}). Currently this means HDFS namenode rpc-address + * entries on {@link HdfsProperties}. The {@code isExplicitlyConfigured} gating is + * applied by {@link #collectStorageUris} before reaching here. + */ + private static void collectDynamicUris(StorageProperties props, List<String> uris) { + if (!(props instanceof HdfsProperties)) { + return; + } + Map<String, String> backendConfig = ((HdfsProperties) props).getBackendConfigProperties(); + if (backendConfig == null) { + return; + } + for (Map.Entry<String, String> e : backendConfig.entrySet()) { + if (e.getKey() != null && e.getKey().startsWith(HDFS_NAMENODE_RPC_ADDRESS_PREFIX) + && StringUtils.isNotBlank(e.getValue())) { + uris.add(e.getValue()); + } + } + } + + private static Object readField(Field field, Object obj) { + try { + field.setAccessible(true); + return field.get(obj); + } catch (IllegalAccessException e) { + LOG.warn("Failed to read field {} on {}", field.getName(), obj.getClass().getName(), e); + return null; + } + } + + /** + * True if {@code value} is itself a Doris property POJO that we should recurse into + * (e.g. {@code HMSBaseProperties} embedded inside {@code HiveHMSProperties}). + * Returns false for Strings, primitives, collections, framework types, etc. + */ + private static boolean isPropertyContainer(Object value) { + if (value == null) { + return false; + } + if (value instanceof CharSequence || value instanceof Number || value instanceof Boolean + || value instanceof Collection || value instanceof Map) { + return false; + } + Class<?> c = value.getClass(); + if (c.isPrimitive() || c.isArray() || c.isEnum()) { + return false; + } + String name = c.getName(); + return name.startsWith(PROPERTY_PACKAGE_PREFIX); + } + + /** + * Run a single URI through the SSRF check. URI may be a full URL ({@code thrift://h:p}, + * {@code hdfs://h:p}, {@code https://...}) or a bare {@code host[:port]}. We normalize to + * {@code http://...} solely so the underlying SSRF rule engine can parse it; no HTTP + * connection is opened here. + * + * <p>HMS allows comma-separated URIs (e.g. {@code thrift://h1:p,thrift://h2:p}); each is + * validated independently so a single bad host fails the catalog creation. + */ + private static void checkSingleUri(String catalogName, String uri) throws DdlException { + if (StringUtils.isBlank(uri)) { + return; + } + for (String token : uri.split(",")) { + String single = token.trim(); + if (single.isEmpty()) { + continue; + } + String urlStr = normalizeToHttpUrl(single); + if (urlStr == null) { + continue; + } + try { + SecurityChecker.getInstance().startSSRFChecking(urlStr); Review Comment: This does not actually validate the target host. `SecurityChecker.startSSRFChecking()` delegates to the hook-style `startSSRFNetHookChecking(...)`, and every existing FE caller starts the hook and then opens the URL before stopping it (`S3Util.validateAndTestEndpoint()`, `Util.getInputStreamFromUrl()`, etc.). Here the hook is immediately stopped without any connection, so a catalog such as `hive.metastore.uris=thrift://127.0.0.1:9083` only runs hook setup and can still be persisted with `test_connection=false`; the later HMS/HDFS/Iceberg client connection happens outside the hook. Please use a real no-connection host validation API / explicit host-resolution check, or perform the actual outbound check under the hook, and add a test that fails with a real loopback/private host rather than a mocked `startSSRFChecking()` exception. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
