This is an automated email from the ASF dual-hosted git repository.
zclll pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 196c19c567f [Feature](func) Support function soundex (#2846)
196c19c567f is described below
commit 196c19c567ffa5f9da7d7417606739966ca79500
Author: linrrarity <[email protected]>
AuthorDate: Tue Sep 9 19:44:00 2025 +0800
[Feature](func) Support function soundex (#2846)
## Versions
- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---------
Co-authored-by: linzhenqi <[email protected]>
---
.../scalar-functions/string-functions/soundex.md | 102 +++++++++++++++++++++
.../scalar-functions/string-functions/soundex.md | 100 ++++++++++++++++++++
sidebars.json | 1 +
3 files changed, 203 insertions(+)
diff --git
a/docs/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
b/docs/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
new file mode 100644
index 00000000000..aa5f15e64b3
--- /dev/null
+++ b/docs/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
@@ -0,0 +1,102 @@
+---
+{
+ "title": "SOUNDEX",
+ "language": "en"
+}
+---
+
+## Description
+
+The SOUNDEX function computes the [American
Soundex](https://en.wikipedia.org/wiki/Soundex) value of the input string: the
first letter followed by a three-digit sound code that represents the string's
English pronunciation.
+
+The function ignores all non-letter characters in the string.
+
+## Syntax
+
+```sql
+SOUNDEX ( <expr> )
+```
+
+## Arguments
+
+| Argument | Description |
+|----------|----------------------------|
+| `<expr>` | The string to compute the Soundex value for. Only ASCII characters are accepted. |
+
+## Return Value
+
+Returns a VARCHAR(4) string consisting of an uppercase letter followed by a
three-digit numeric sound code representing English pronunciation.
+
+If the string is empty or contains no letter characters, an empty string is
returned.
+
+If the function encounters a non-ASCII character before the four-character
code is complete, it throws an error.
+
+If the input is NULL, NULL is returned.
+
+## Examples
+
+The following table simulates a list of names.
+```sql
+CREATE TABLE IF NOT EXISTS soundex_test (
+ name VARCHAR(20)
+) DISTRIBUTED BY HASH(name) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO soundex_test (name) VALUES
+('Doris'),
+('Smith'), ('Smyth'),
+('H'), ('P'), ('Lee'),
+('Robert'), ('R@b-e123rt'),
+('123@*%'), (''),
+('Ashcraft'), ('Honeyman'), ('Pfister'), (NULL);
+```
+
+```sql
+SELECT name, soundex(name) AS IDX FROM soundex_test;
+```
+```text
++------------+------+
+| NULL       | NULL |
+|            |      |
+| 123@*%     |      |
+| Ashcraft   | A261 |
+| Doris      | D620 |
+| H          | H000 |
+| Honeyman   | H555 |
+| Lee        | L000 |
+| P          | P000 |
+| Pfister    | P236 |
+| R@b-e123rt | R163 |
+| Robert     | R163 |
+| Smith      | S530 |
+| Smyth      | S530 |
++------------+------+
+```
+
+Behavior for non-ASCII characters:
+
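+- Doris processes the input string character by character. If it encounters a
non-ASCII character before the four-character code is complete, it throws an
error. If the code is already complete, the function returns it without
raising an error, as the last example shows. Examples: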
+
+```sql
+SELECT SOUNDEX('你好');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage =
(127.0.0.1)[INVALID_ARGUMENT]soundex only supports ASCII
+```
+
+```sql
+-- After processing `Doris`, the code is D62, one digit short of a complete
4-character code.
+-- Reading the non-ASCII character `你` then raises an error.
+SELECT SOUNDEX('Doris 你好');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage =
(127.0.0.1)[INVALID_ARGUMENT]soundex only supports ASCII
+```
+
+```sql
+SELECT SOUNDEX('Apache Doris 你好');
+```
+
+```text
++--------------------------------+
+| SOUNDEX('Apache Doris 你好') |
++--------------------------------+
+| A123 |
++--------------------------------+
+```
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
new file mode 100644
index 00000000000..888ca06b7bd
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/scalar-functions/string-functions/soundex.md
@@ -0,0 +1,100 @@
+---
+{
+ "title": "SOUNDEX",
+ "language": "zh-CN"
+}
+---
+
+## 描述
+
+SOUNDEX 函数用于计算[美国 Soundex](https://zh.wikipedia.org/zh-cn/Soundex)
值,其中包括第一个字母,后跟一个 3 位数字的声音编码,
+该编码表示用户指定的字符串的英语发音。
+
+该函数会忽略所有字符串中的非字母字符。
+
+## 语法
+
+```sql
+SOUNDEX ( <expr> )
+```
+
+## 参数
+
+| 参数 | 说明 |
+|---------|-----------|
+| `<expr>` | 需要计算的字符串,仅接受 ASCII 字符。 |
+
+## 返回值
+
+返回一个 VARCHAR(4) 字符串,其中包括一个大写字母,后跟代表英语发音的三位数字声音编码。
+
+如果字符串为空,或字符串中不含任何字母字符,则返回空字符串。
+
+如果在 4 字符编码计算完成之前遇到非 ASCII 字符,函数将抛出异常。
+
+输入为 NULL 时返回 NULL。
+
+## 举例
+
+下表模拟了一个名字列表。
+```sql
+CREATE TABLE IF NOT EXISTS soundex_test (
+ name VARCHAR(20)
+) DISTRIBUTED BY HASH(name) BUCKETS 1
+PROPERTIES ("replication_num" = "1");
+
+INSERT INTO soundex_test (name) VALUES
+('Doris'),
+('Smith'), ('Smyth'),
+('H'), ('P'), ('Lee'),
+('Robert'), ('R@b-e123rt'),
+('123@*%'), (''),
+('Ashcraft'), ('Honeyman'), ('Pfister'), (NULL);
+```
+
+```sql
+SELECT name, soundex(name) AS IDX FROM soundex_test;
+```
+```text
++------------+------+
+| NULL       | NULL |
+|            |      |
+| 123@*%     |      |
+| Ashcraft   | A261 |
+| Doris      | D620 |
+| H          | H000 |
+| Honeyman   | H555 |
+| Lee        | L000 |
+| P          | P000 |
+| Pfister    | P236 |
+| R@b-e123rt | R163 |
+| Robert     | R163 |
+| Smith      | S530 |
+| Smyth      | S530 |
++------------+------+
+```
+
+对非 ASCII 字符的行为:
+
+- Doris 逐字符处理输入字符串。如果在 4 字符编码计算完成之前遇到非 ASCII 字符,会立即抛出错误;如果编码已经完成,则直接返回结果而不报错(见最后一个示例)。示例如下:
+
+```sql
+SELECT SOUNDEX('你好');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage =
(127.0.0.1)[INVALID_ARGUMENT]soundex only supports ASCII
+```
+```sql
+-- 在处理完 `Doris` 后得到 D62(还缺一位数字,未构成完整的 4 字符编码)
+-- 读到非 ASCII 字符 `你` 后,函数报错
+SELECT SOUNDEX('Doris 你好');
+-- ERROR 1105 (HY000): errCode = 2, detailMessage =
(127.0.0.1)[INVALID_ARGUMENT]soundex only supports ASCII
+```
+```sql
+SELECT SOUNDEX('Apache Doris 你好');
+```
+```text
++--------------------------------+
+| SOUNDEX('Apache Doris 你好') |
++--------------------------------+
+| A123 |
++--------------------------------+
+```
\ No newline at end of file
diff --git a/sidebars.json b/sidebars.json
index 2599e2cad2e..dd7841d6d50 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -1292,6 +1292,7 @@
"sql-manual/sql-functions/scalar-functions/string-functions/rpad",
"sql-manual/sql-functions/scalar-functions/string-functions/rtrim",
"sql-manual/sql-functions/scalar-functions/string-functions/rtrim-in",
+
"sql-manual/sql-functions/scalar-functions/string-functions/soundex",
"sql-manual/sql-functions/scalar-functions/string-functions/strleft",
"sql-manual/sql-functions/scalar-functions/string-functions/strright",
"sql-manual/sql-functions/scalar-functions/string-functions/split-by-regexp",
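
As a reading aid for the soundex.md page added in this commit, the rules it documents can be sketched in Python. This is an illustration only, not the Doris implementation: the name `soundex_sketch` is hypothetical, `ValueError` stands in for the `INVALID_ARGUMENT` error, and treating ignored non-letter characters as fully transparent (they do not separate equal digits) is an assumption the page does not confirm.

```python
from typing import Optional

# American Soundex digit for each consonant; vowels, H, W, and Y have no digit.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        _CODES[letter] = digit


def soundex_sketch(text: Optional[str]) -> Optional[str]:
    """Approximate the documented SOUNDEX behavior (hypothetical helper)."""
    if text is None:           # NULL in, NULL out
        return None
    code = ""                  # output built so far
    last_digit = ""            # digit of the previous coded letter
    for ch in text:
        if len(code) == 4:     # code complete: later characters are not examined
            break
        if ord(ch) > 127:      # non-ASCII reached before the code is complete
            raise ValueError("soundex only supports ASCII")
        if not ch.isalpha():   # digits, punctuation, and spaces are ignored
            continue           # assumption: they do not reset last_digit
        upper = ch.upper()
        digit = _CODES.get(upper, "")
        if not code:
            code = upper       # the first letter is kept as-is
        elif digit and digit != last_digit:
            code += digit
        if upper not in "HW":  # H and W do not separate equal digits
            last_digit = digit
    # Pad short codes with zeros; no letters at all yields an empty string.
    return code.ljust(4, "0") if code else ""
```

Under these assumptions the sketch agrees with the example table in the page: `soundex_sketch('Pfister')` gives `P236` and `soundex_sketch('Apache Doris 你好')` gives `A123`, while `soundex_sketch('Doris 你好')` raises because the code is still `D62` when `你` is read.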