This is an automated email from the ASF dual-hosted git repository.
zclllyybb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 927a46dcad9 add docs for aggregation function datasketches_hll_union_agg (#3711)
927a46dcad9 is described below
commit 927a46dcad9e86d80936f4270528f96ab3931f2d
Author: nooneuse <[email protected]>
AuthorDate: Wed Jun 3 15:03:38 2026 +0800
add docs for aggregation function datasketches_hll_union_agg (#3711)
Please see: https://github.com/apache/doris/pull/63143
Documentation for the datasketches_hll_union_agg aggregate function.
## Versions
- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1 or older (not covered by version/language sync gate)
## Languages
- [x] Chinese
- [x] English
- [ ] Japanese candidate translation needed
## Docs Checklist
- [ ] Checked by AI
- [x] Test Cases Built
- [x] Updated required version and language counterparts, or explained why not
- [ ] If only one language changed, confirmed whether source/translation counterparts need sync
---------
Co-authored-by: yuanyuhao <[email protected]>
---
.../datasketches_hll_union_agg.md | 118 ++++++++++++++++++++
.../datasketches_hll_union_agg.md | 118 ++++++++++++++++++++
.../datasketches_hll_union_agg.md | 120 +++++++++++++++++++++
sidebars.ts | 1 +
.../datasketches_hll_union_agg.md | 120 +++++++++++++++++++++
versioned_sidebars/version-4.x-sidebars.json | 1 +
6 files changed, 478 insertions(+)
diff --git a/docs/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md b/docs/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md
new file mode 100644
index 00000000000..90eb8de3548
--- /dev/null
+++ b/docs/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md
@@ -0,0 +1,118 @@
+---
+{
+"title": "DATASKETCHES_HLL_UNION_AGG",
+"language": "en",
+"description": "The datasketches_hll_union_agg function is an aggregate function used to union multiple Apache DataSketches HLL sketches and return the estimated cardinality of the union as a DOUBLE value."
+}
+---
+
+## Description
+
+`datasketches_hll_union_agg` is an aggregate function that **unions** multiple serialized Apache DataSketches **HLL** (`hll_sketch`) values and returns the **estimated cardinality** (approximate distinct count / NDV) of the union.
+
+This function expects the input to be **serialized bytes of a DataSketches HLL sketch** (for example, generated by `hll_sketch.serialize_compact()` in the DataSketches library). It does not accept arbitrary strings.
+
+Aliases:
+
+- `ds_hll_estimate`
+- `datasketches_hll_estimate`
+
+## Syntax
+
+```sql
+datasketches_hll_union_agg(<sketch>)
+```
+
+## Parameters
+
+| Parameter | Description |
+| -- | -- |
+| `<sketch>` | The serialized bytes of an Apache DataSketches HLL sketch. Supported types: STRING / VARCHAR / VARBINARY. NULL values are ignored. Empty strings are treated as invalid input and will throw an error. |
+
+## Return Value
+
+Returns a DOUBLE (Float64) cardinality estimate value.
+If the group contains no valid data (for example, all values are NULL or the table is empty), returns 0.
+If the input bytes cannot be deserialized as a valid DataSketches HLL sketch (including an empty string), an error is thrown (typically with error code `CORRUPTION`).
+
+## Example
+
+```sql
+-- setup
+CREATE TABLE test_datasketches_hll_union_agg_tbl (
+ id INT,
+ sk STRING
+)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+-- The sketch bytes are inserted via Base64 decoding.
+INSERT INTO test_datasketches_hll_union_agg_tbl VALUES
+ (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')),
+ (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')),
+ (3, NULL);
+```
+
+```sql
+-- The function returns DOUBLE, so use ROUND/CAST if you want an integer display.
+SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT)
+FROM test_datasketches_hll_union_agg_tbl;
+```
+
+```text
++-------------------------------------------------------+
+| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) |
++-------------------------------------------------------+
+| 17 |
++-------------------------------------------------------+
+```
+
+```sql
+-- aliases
+SELECT
+ CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1,
+ CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2,
+ CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3
+FROM test_datasketches_hll_union_agg_tbl;
+```
+
+```text
++------+------+------+
+| v1 | v2 | v3 |
++------+------+------+
+| 17 | 17 | 17 |
++------+------+------+
+```
+
+```sql
+-- a group with no valid (non-NULL) sketches returns 0
+SELECT datasketches_hll_union_agg(sk)
+FROM test_datasketches_hll_union_agg_tbl
+WHERE sk IS NULL;
+```
+
+```text
++--------------------------------+
+| datasketches_hll_union_agg(sk) |
++--------------------------------+
+| 0 |
++--------------------------------+
+```
+
+```sql
+-- invalid sketch bytes will throw
+SELECT datasketches_hll_union_agg(from_base64('AA=='));
+```
+
+```text
+ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type
+```
+
+```sql
+-- empty string is invalid and will throw
+SELECT datasketches_hll_union_agg('');
+```
+
+```text
+ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input.
+```
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md
new file mode 100644
index 00000000000..b4e54b4817f
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md
@@ -0,0 +1,118 @@
+---
+{
+ "title": "DATASKETCHES_HLL_UNION_AGG",
+ "language": "zh-CN",
+ "description": "datasketches_hll_union_agg 函数是一种聚合函数,用于对多个 Apache DataSketches HLL sketch 的序列化结果进行 union 合并,并返回合并后基数的估算值(DOUBLE)。"
+}
+---
+
+## 描述
+
+`datasketches_hll_union_agg` 函数是一种聚合函数,用于对多个 **Apache DataSketches HLL sketch(hll_sketch)** 的序列化结果进行 **union 合并**,并返回合并后基数的**估算值**(近似去重数 / NDV)。
+
+该函数的输入不是普通字符串,而是 **DataSketches HLL sketch 的序列化字节串**(例如由 DataSketches 的 `hll_sketch.serialize_compact()` 生成)。
+
+别名:
+
+- `ds_hll_estimate`
+- `datasketches_hll_estimate`
+
+## 语法
+
+```sql
+datasketches_hll_union_agg(<sketch>)
+```
+
+## 参数
+
+| 参数 | 说明 |
+| -- | -- |
+| `<sketch>` | DataSketches HLL sketch 的序列化字节串。支持类型:STRING / VARCHAR / VARBINARY。NULL 会被忽略;空字符串属于非法输入,将报错。 |
+
+## 返回值
+
+返回 DOUBLE(Float64)类型的基数估算值。
+如果没有合法数据(例如全为 NULL,或表为空)则返回 0。
+若输入字节串无法反序列化为合法的 DataSketches HLL sketch(包括空字符串),将报错(通常错误码为 `CORRUPTION`)。
+
+## 举例
+
+```sql
+-- setup
+CREATE TABLE test_datasketches_hll_union_agg_tbl (
+ id INT,
+ sk STRING
+)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+-- 通过 from_base64() 将 Base64 文本解码为 sketch 字节串后写入
+INSERT INTO test_datasketches_hll_union_agg_tbl VALUES
+ (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')),
+ (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')),
+ (3, NULL);
+```
+
+```sql
+-- 该函数返回 DOUBLE,如需以整数形式展示可配合 ROUND/CAST
+SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT)
+FROM test_datasketches_hll_union_agg_tbl;
+```
+
+```text
++-------------------------------------------------------+
+| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) |
++-------------------------------------------------------+
+| 17 |
++-------------------------------------------------------+
+```
+
+```sql
+-- 别名用法
+SELECT
+ CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1,
+ CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2,
+ CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3
+FROM test_datasketches_hll_union_agg_tbl;
+```
+
+```text
++------+------+------+
+| v1 | v2 | v3 |
++------+------+------+
+| 17 | 17 | 17 |
++------+------+------+
+```
+
+```sql
+-- 组内无合法数据返回 0
+SELECT datasketches_hll_union_agg(sk)
+FROM test_datasketches_hll_union_agg_tbl
+WHERE sk IS NULL;
+```
+
+```text
++--------------------------------+
+| datasketches_hll_union_agg(sk) |
++--------------------------------+
+| 0 |
++--------------------------------+
+```
+
+```sql
+-- 非法 sketch 字节串将报错
+SELECT datasketches_hll_union_agg(from_base64('AA=='));
+```
+
+```text
+ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type
+```
+
+```sql
+-- 空字符串属于非法输入,将报错
+SELECT datasketches_hll_union_agg('');
+```
+
+```text
+ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input.
+```
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md
new file mode 100644
index 00000000000..5d1960fa907
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md
@@ -0,0 +1,120 @@
+---
+{
+ "title": "DATASKETCHES_HLL_UNION_AGG",
+ "language": "zh-CN",
+ "description": "datasketches_hll_union_agg 函数是一种聚合函数,用于对多个 Apache DataSketches HLL sketch 的序列化结果进行 union 合并,并返回合并后基数的估算值(DOUBLE)。"
+}
+---
+
+> 从 4.1.2 版本开始支持。
+
+## 描述
+
+`datasketches_hll_union_agg` 函数是一种聚合函数,用于对多个 **Apache DataSketches HLL sketch(hll_sketch)** 的序列化结果进行 **union 合并**,并返回合并后基数的**估算值**(近似去重数 / NDV)。
+
+该函数的输入不是普通字符串,而是 **DataSketches HLL sketch 的序列化字节串**(例如由 DataSketches 的 `hll_sketch.serialize_compact()` 生成)。
+
+## 别名
+
+- `ds_hll_estimate`
+- `datasketches_hll_estimate`
+
+## 语法
+
+```sql
+datasketches_hll_union_agg(<sketch>)
+```
+
+## 参数
+
+| 参数 | 说明 |
+| -- | -- |
+| `<sketch>` | DataSketches HLL sketch 的序列化字节串。支持类型:STRING / VARCHAR / VARBINARY。NULL 会被忽略;空字符串属于非法输入,将报错。 |
+
+## 返回值
+
+返回 DOUBLE(Float64)类型的基数估算值。
+如果没有合法数据(例如全为 NULL,或表为空)则返回 0。
+若输入字节串无法反序列化为合法的 DataSketches HLL sketch(包括空字符串),将报错(通常错误码为 `CORRUPTION`)。
+
+## 举例
+
+```sql
+-- setup
+CREATE TABLE test_datasketches_hll_union_agg_tbl (
+ id INT,
+ sk STRING
+)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+-- 通过 from_base64() 将 Base64 文本解码为 sketch 字节串后写入
+INSERT INTO test_datasketches_hll_union_agg_tbl VALUES
+ (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')),
+ (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')),
+ (3, NULL);
+```
+
+```sql
+-- 该函数返回 DOUBLE,如需以整数形式展示可配合 ROUND/CAST
+SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT)
+FROM test_datasketches_hll_union_agg_tbl;
+```
+
+```text
++-------------------------------------------------------+
+| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) |
++-------------------------------------------------------+
+| 17 |
++-------------------------------------------------------+
+```
+
+```sql
+-- 别名用法
+SELECT
+ CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1,
+ CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2,
+ CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3
+FROM test_datasketches_hll_union_agg_tbl;
+```
+
+```text
++------+------+------+
+| v1 | v2 | v3 |
++------+------+------+
+| 17 | 17 | 17 |
++------+------+------+
+```
+
+```sql
+-- 组内无合法数据返回 0
+SELECT datasketches_hll_union_agg(sk)
+FROM test_datasketches_hll_union_agg_tbl
+WHERE sk IS NULL;
+```
+
+```text
++--------------------------------+
+| datasketches_hll_union_agg(sk) |
++--------------------------------+
+| 0 |
++--------------------------------+
+```
+
+```sql
+-- 非法 sketch 字节串将报错
+SELECT datasketches_hll_union_agg(from_base64('AA=='));
+```
+
+```text
+ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type
+```
+
+```sql
+-- 空字符串属于非法输入,将报错
+SELECT datasketches_hll_union_agg('');
+```
+
+```text
+ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input.
+```
\ No newline at end of file
diff --git a/sidebars.ts b/sidebars.ts
index 327afb8a283..8edfc219570 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -1997,6 +1997,7 @@ const sidebars: SidebarsConfig = {
'sql-manual/sql-functions/aggregate-functions/count-by-enum',
'sql-manual/sql-functions/aggregate-functions/covar',
'sql-manual/sql-functions/aggregate-functions/covar-samp',
+ 'sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg',
'sql-manual/sql-functions/aggregate-functions/exponential-moving-average',
'sql-manual/sql-functions/aggregate-functions/group-array-intersect',
'sql-manual/sql-functions/aggregate-functions/group-array-union',
diff --git a/versioned_docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md b/versioned_docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md
new file mode 100644
index 00000000000..9c2d1dea1f0
--- /dev/null
+++ b/versioned_docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md
@@ -0,0 +1,120 @@
+---
+{
+ "title": "DATASKETCHES_HLL_UNION_AGG",
+ "language": "en",
+ "description": "The datasketches_hll_union_agg function is an aggregate function used to union multiple Apache DataSketches HLL sketches and return the estimated cardinality of the union as a DOUBLE value."
+}
+---
+
+> Supported since version 4.1.2.
+
+## Description
+
+`datasketches_hll_union_agg` is an aggregate function that **unions** multiple serialized Apache DataSketches **HLL** (`hll_sketch`) values and returns the **estimated cardinality** (approximate distinct count / NDV) of the union.
+
+This function expects the input to be **serialized bytes of a DataSketches HLL sketch** (for example, generated by `hll_sketch.serialize_compact()` in the DataSketches library). It does not accept arbitrary strings.
+
+## Alias
+
+- `ds_hll_estimate`
+- `datasketches_hll_estimate`
+
+## Syntax
+
+```sql
+datasketches_hll_union_agg(<sketch>)
+```
+
+## Parameters
+
+| Parameter | Description |
+| -- | -- |
+| `<sketch>` | The serialized bytes of an Apache DataSketches HLL sketch. Supported types: STRING / VARCHAR / VARBINARY. NULL values are ignored. Empty strings are treated as invalid input and will throw an error. |
+
+## Return Value
+
+Returns a DOUBLE (Float64) cardinality estimate value.
+If the group contains no valid data (for example, all values are NULL or the table is empty), returns 0.
+If the input bytes cannot be deserialized as a valid DataSketches HLL sketch (including an empty string), an error is thrown (typically with error code `CORRUPTION`).
+
+## Example
+
+```sql
+-- setup
+CREATE TABLE test_datasketches_hll_union_agg_tbl (
+ id INT,
+ sk STRING
+)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+-- The sketch bytes are inserted via Base64 decoding.
+INSERT INTO test_datasketches_hll_union_agg_tbl VALUES
+ (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')),
+ (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')),
+ (3, NULL);
+```
+
+```sql
+-- The function returns DOUBLE, so use ROUND/CAST if you want an integer display.
+SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT)
+FROM test_datasketches_hll_union_agg_tbl;
+```
+
+```text
++-------------------------------------------------------+
+| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) |
++-------------------------------------------------------+
+| 17 |
++-------------------------------------------------------+
+```
+
+```sql
+-- aliases
+SELECT
+ CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1,
+ CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2,
+ CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3
+FROM test_datasketches_hll_union_agg_tbl;
+```
+
+```text
++------+------+------+
+| v1 | v2 | v3 |
++------+------+------+
+| 17 | 17 | 17 |
++------+------+------+
+```
+
+```sql
+-- a group with no valid (non-NULL) sketches returns 0
+SELECT datasketches_hll_union_agg(sk)
+FROM test_datasketches_hll_union_agg_tbl
+WHERE sk IS NULL;
+```
+
+```text
++--------------------------------+
+| datasketches_hll_union_agg(sk) |
++--------------------------------+
+| 0 |
++--------------------------------+
+```
+
+```sql
+-- invalid sketch bytes will throw
+SELECT datasketches_hll_union_agg(from_base64('AA=='));
+```
+
+```text
+ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type
+```
+
+```sql
+-- empty string is invalid and will throw
+SELECT datasketches_hll_union_agg('');
+```
+
+```text
+ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input.
+```
\ No newline at end of file
diff --git a/versioned_sidebars/version-4.x-sidebars.json b/versioned_sidebars/version-4.x-sidebars.json
index a329bbc91ae..57c240ad928 100644
--- a/versioned_sidebars/version-4.x-sidebars.json
+++ b/versioned_sidebars/version-4.x-sidebars.json
@@ -2168,6 +2168,7 @@
"sql-manual/sql-functions/aggregate-functions/count-by-enum",
"sql-manual/sql-functions/aggregate-functions/covar",
"sql-manual/sql-functions/aggregate-functions/covar-samp",
+ "sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg",
"sql-manual/sql-functions/aggregate-functions/group-array-intersect",
"sql-manual/sql-functions/aggregate-functions/group-array-union",
"sql-manual/sql-functions/aggregate-functions/group-bit-and",
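
For readers trying the examples in this commit, here is a minimal Python sketch (standard library only, not part of the commit) that decodes the two Base64 samples from the INSERT statement and runs a cheap plausibility check before the bytes are loaded. It assumes the serialized DataSketches HLL preamble keeps the serial version in byte 1, the family ID (7 for HLL) in byte 2, and lg_k in byte 3. That layout and the helper names `looks_like_hll_sketch` and `describe` are assumptions for illustration. Doris performs the authoritative check when it deserializes the sketch.

```python
import base64

# Base64 samples copied from the INSERT statement in the docs diff above.
SAMPLES = [
    "AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK",
    "AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==",
]

# Assumed preamble constants (not taken from the commit).
SERIAL_VERSION = 1      # assumed value of byte 1
HLL_FAMILY_ID = 7       # assumed value of byte 2 for HLL sketches
MIN_PREAMBLE_BYTES = 8  # assumed minimum size of a serialized sketch


def looks_like_hll_sketch(data: bytes) -> bool:
    """Cheap pre-check only; it does not replace real deserialization."""
    return (
        len(data) >= MIN_PREAMBLE_BYTES
        and data[1] == SERIAL_VERSION
        and data[2] == HLL_FAMILY_ID
    )


def describe(b64_text: str) -> dict:
    """Decode Base64 text and summarize the assumed preamble fields."""
    data = base64.b64decode(b64_text)
    ok = looks_like_hll_sketch(data)
    return {
        "size_bytes": len(data),
        "plausible": ok,
        "lg_k": data[3] if ok else None,  # byte 3 is assumed to hold lg_k
    }


if __name__ == "__main__":
    # The last two inputs mirror the failing examples in the docs:
    # from_base64('AA==') and the empty string.
    for text in SAMPLES + ["AA==", ""]:
        print(repr(text[:12]), describe(text))
```

A pre-check like this can catch truncated or empty values on the client side, but only the database's own deserialization decides whether a value is accepted.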