This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new bbe38a28bca [feat](jdbc) add jdbc function rules (#2470) bbe38a28bca is described below commit bbe38a28bca8dc5f2cd55481d716e67569a1040d Author: Mingyu Chen (Rayner) <morning...@163.com> AuthorDate: Tue Jun 17 10:36:59 2025 +0800 [feat](jdbc) add jdbc function rules (#2470) ## Versions - [x] dev - [ ] 3.0 - [ ] 2.1 - [ ] 2.0 ## Languages - [x] Chinese - [x] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --- docs/lakehouse/catalogs/jdbc-catalog-overview.md | 67 +++++++++++++++++++--- docs/lakehouse/catalogs/jdbc-clickhouse-catalog.md | 6 ++ .../lakehouse/catalogs/jdbc-catalog-overview.md | 67 +++++++++++++++++++--- .../lakehouse/catalogs/jdbc-clickhouse-catalog.md | 6 ++ 4 files changed, 132 insertions(+), 14 deletions(-) diff --git a/docs/lakehouse/catalogs/jdbc-catalog-overview.md b/docs/lakehouse/catalogs/jdbc-catalog-overview.md index c243558a6f7..bab4afc7e0a 100644 --- a/docs/lakehouse/catalogs/jdbc-catalog-overview.md +++ b/docs/lakehouse/catalogs/jdbc-catalog-overview.md @@ -140,15 +140,68 @@ FROM all_types WHERE smallint_u > 10 GROUP BY smallint_u; #### Function Pushdown -By default, predicate conditions involving functions will not be pushed down. This is because the semantics or behavior of the same function might differ between Doris and the external data source. If the user is confident about the function's semantics, they can enable function pushdown by setting the session variable `enable_ext_func_pred_pushdown` to `true`. +For predicate conditions, the semantics or behavior in Doris and external data sources may be inconsistent. Therefore, Doris restricts and controls predicate pushdown in JDBC external table queries through the following parameter variables: -Additionally, different data sources have specific function blacklists. Functions in these blacklists will not be pushed down. +> Note: Currently, Doris only supports predicate pushdown for MySQL, Clickhouse, and Oracle data sources. More data sources will be supported in the future. -| Data Source | Function Blacklist | Notes | -| ------------- | ------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | -| MySQL | - `DATE_TRUNC`<br>- `MONEY_FORMAT`<br>- `NEGTIVE` | MySQL also allows additional blacklisted functions to be configured via the FE parameter `jdbc_mysql_unsupported_pushdown_functions`, e.g., `jdbc_mysql_unsupported_pushdown_functions=func1,func2`. | -| ClickHouse | - `FROM_UNIXTIME`<br>- `UNIX_TIMESTAMP` | | -| Oracle | - `NVL`<br>- `IFNULL` | | +- `enable_jdbc_oracle_null_predicate_push_down` + + Session variable. The default is `false`. That is, when the predicate condition contains a `NULL` value, the predicate will not be pushed down to the Oracle data source. This is because, before Oracle version 21, Oracle does not support `NULL` as an operator. + + This parameter is supported since 2.1.7 and 3.0.3. + +- `enable_jdbc_cast_predicate_push_down` + + Session variable. The default is `false`. That is, when there is an explicit or implicit CAST in the predicate condition, the predicate will not be pushed down to the JDBC data source. Since the behavior of CAST is inconsistent across different databases, to ensure correctness, CAST is not pushed down by default. However, users can manually verify whether the behavior of CAST is consistent. If so, this parameter can be set to `true` to allow more predicates to be pushed down for bett [...] + + This parameter is supported since 2.1.7 and 3.0.3. + +- Function pushdown blacklist and whitelist + + Functions with the same signature may have inconsistent semantics in Doris and external data sources. Doris has predefined some blacklists and whitelists for function pushdown: + + | Data Source | Blacklist | Whitelist | Description | + | ---------- | ----- | --- | --------- | + | MySQL | - `DATE_TRUNC`<br>- `MONEY_FORMAT`<br>- `NEGTIVE` | | MySQL can also set additional blacklist items through the FE configuration item `jdbc_mysql_unsupported_pushdown_functions`, e.g.: `jdbc_mysql_unsupported_pushdown_functions==func1,func2` | + | Clickhouse | | - `FROM_UNIXTIME`<br>- `UNIX_TIMESTAMP` | | + | Oracle | | - `NVL`<br>- `IFNULL` | | + +- Function rewrite rules + + There are some functions in Doris and external data sources that have consistent behavior but different names. Doris supports rewriting these functions during function pushdown. The following rewrite rules are currently built-in: + + | Data Source | Doris Function | Target Function | + | ---------- | ----- | --- | + | MySQL | nvl | ifnull | + | MySQL | to_date | date | + | Clickhouse | from_unixtime | FROM_UNIXTIME | + | Clickhouse | unix_timestamp | toUnixTimestamp | + | Oracle | ifnull | nvl | + +- Custom function pushdown and rewrite rules + + In subsequent versions after 3.0.7, Doris supports more flexible function pushdown and rewrite rules. Users can set function pushdown and rewrite rules for a specific Catalog in the Catalog properties: + + ```sql + create catalog jdbc properties ( + ... + 'function_rules' = '{"pushdown" : {"supported": ["to_date"], "unsupported" : ["abs"]}, "rewrite" : {"to_date" : "date2"}}' + ) + ``` + + Through `function_rules`, the following rules can be specified: + + - `pushdown` + + Specifies function pushdown rules. The `supported` and `unsupported` arrays specify the function names that can and cannot be pushed down, respectively. If a function exists in both arrays, `supported` takes precedence. + + Doris will first apply the system's predefined blacklists and whitelists, and then apply the user-specified blacklists and whitelists. + + - `rewrite` + + Defines function rewrite rules. As in the example above, the function name `to_date` will be rewritten as `date2` and pushed down. + + Note: Only functions that are allowed to be pushed down will be rewritten. #### Row Count Limitation diff --git a/docs/lakehouse/catalogs/jdbc-clickhouse-catalog.md b/docs/lakehouse/catalogs/jdbc-clickhouse-catalog.md index 1c44d1f70d7..f442a423050 100644 --- a/docs/lakehouse/catalogs/jdbc-clickhouse-catalog.md +++ b/docs/lakehouse/catalogs/jdbc-clickhouse-catalog.md @@ -96,6 +96,12 @@ When mapping ClickHouse, a Database in Doris corresponds to a Database in ClickH | array | array | | | other | UNSUPPORTED | | +## Related Parameters + +- `jdbc_clickhouse_query_final` + + Session variable, default is false. When set to true, `SETTINGS final = 1` will be appended to the SQL statements sent to Clickhouse. + ## Common Issues 1. Encountering `NoClassDefFoundError: net/jpountz/lz4/LZ4Factory` error message when reading Clickhouse data diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/jdbc-catalog-overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/jdbc-catalog-overview.md index 5d35e3c2dd6..4b551d9740e 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/jdbc-catalog-overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/jdbc-catalog-overview.md @@ -141,15 +141,68 @@ FROM all_types WHERE smallint_u > 10 GROUP BY smallint_u; #### 函数下推 -默认不会下推带函数的谓词条件。因为对一同一个函数,可能在 Doris 和外部数据源中的语义或行为是不一致的。如果用户明确知晓函数的语义,可以通过将会话变量 `enable_ext_func_pred_pushdown` 设为 `true` 来开启函数下推功能。 +对于谓词条件,Doris 和外部数据源中的语义或行为可能是不一致的。所以 Doris 通过以下参数变量对 JDBC 外表查询中的谓词条件下推进行限制和控制: -同时,不同的数据源有不同的函数黑名单,黑名单中的函数不会被下推。 +> 注:目前 Doris 只支持对 MySQL、Clickhouse、Oracle 三个数据源进行谓词下推。后续将开放更多数据源。 -| 数据源 | 函数黑名单 | 说明 | -| ---------- | ----- | ---------------------------- | -| MySQL | - `DATE_TRUNC`<br>- `MONEY_FORMAT`<br>- `NEGTIVE` | MySQL 还可以通过 FE 配置项 `jdbc_mysql_unsupported_pushdown_functions` 来设置额外的黑名单,如:`jdbc_mysql_unsupported_pushdown_functions==func1,func2` | -| Clickhouse | - `FROM_UNIXTIME`<br>- `UNIX_TIMESTAMP` | | -| Oracle | - `NVL`<br>- `IFNULL` | | +- `enable_jdbc_oracle_null_predicate_push_down` + + 会话变量。默认为 `false`。即当谓词条件中包含 `NULL` 值时,该谓词条件不会下推给 Oracle 数据源。因为在 Oracle 21 版本之前,Oracle 不支持 `NULL` 作为操作符。 + + 该参数自 2.1.7 和 3.0.3 支持。 + +- `enable_jdbc_cast_predicate_push_down` + + 会话变量。默认为 `false`。即当谓词条件中存在显式或隐式 CAST 时,该谓词条件不会下推给 JDBC 数据源。因为 CAST 的行为在不同数据库中不一致,为确保结果正确性,默认不下推 CAST。但用户可以手动验证 CAST 的行为是否一致,如果一致,可以将此参数设置为 `true`,让更多的谓词下推已获得更好的性能。 + + 该参数自 2.1.7 和 3.0.3 支持。 + +- 函数下推黑白名单 + + 签名相同的函数,在 Doris 和外部数据源中的语义可能是不一致的。Doris 中预定义了一些函数下推的黑白名单: + + | 数据源 | 黑名单 | 白名单 | 说 明 | + | ---------- | ----- | --- | --------- | + | MySQL | - `DATE_TRUNC`<br>- `MONEY_FORMAT`<br>- `NEGTIVE` | | MySQL 还可以通过 FE 配置项 `jdbc_mysql_unsupported_pushdown_functions` 来设置额外的黑名单,如:`jdbc_mysql_unsupported_pushdown_functions==func1,func2` | + | Clickhouse | | - `FROM_UNIXTIME`<br>- `UNIX_TIMESTAMP` | | + | Oracle | | - `NVL`<br>- `IFNULL` | | + +- 函数改写规则 + + Doris 和外部数据源中存在一些行为一致但名称不一样函数。Doris 支持在函数下推时对这些函数进行改写。目前内置了以下改写规则: + + | 数据源 | Doris 函数 | 目的端函数 | + | ---------- | ----- | --- | + | MySQL | nvl | ifnull | + | MySQL | to_date | date | + | Clickhouse | from_unixtime | FROM_UNIXTIME | + | Clickhouse | unix_timestamp | toUnixTimestamp | + | Oracle | ifnull | nvl | + +- 自定义函数下推和改写规则 + + 3.0.7 后续版本中,Doris 支持了更加灵活的函数下推和改写规则,用户可以在 Catalog 属性中设置针对某个 Catalog 的函数下推和改写规则: + + ```sql + create catalog jdbc properties ( + ... + 'function_rules' = '{"pushdown" : {"supported": ["to_date"], "unsupported" : ["abs"]}, "rewrite" : {"to_date" : "date2"}}' + ) + ``` + + 通过 `function_rules` 可以指定以下规则: + + - `pushdown` + + 指定函数下推规则,其中 `supported` 和 `unsupported` 数组中分别指定可以下推和不能下推的函数名称。如果同时存在于两个数组中,以 `supported` 优先。 + + Doris 会先应用上述系统中预定义的黑白名单,再应用用户指定的黑白名单。 + + - `rewrite` + + 定义函数改写规则,如上述示例中,会将 `to_date` 函数名称改写为 `date2` 并下推。 + + 注意,只会改写允许下推的函数。 #### 行数限制 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/jdbc-clickhouse-catalog.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/jdbc-clickhouse-catalog.md index 3b0f8dd8dc7..1167a28d889 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/jdbc-clickhouse-catalog.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/jdbc-clickhouse-catalog.md @@ -96,6 +96,12 @@ CREATE CATALOG clickhouse PROPERTIES ( | array | array | | | other | UNSUPPORTED | | +## 相关参数 + +- `jdbc_clickhouse_query_final` + + 会话变量,默认为 false。当设置为 true 时,发送给 Clickhouse 的 SQL 语句后会添加 `SETTINGS final = 1`。 + ## 常见问题 1. 读取 Clickhouse 数据出现 `NoClassDefFoundError: net/jpountz/lz4/LZ4Factory` 错误信息 --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org