This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new c8abb3cfbb1 [file-format] add enable_text_validate_utf8 doc (#1960)
c8abb3cfbb1 is described below

commit c8abb3cfbb11b3308bc7506a273f8215901c674e
Author: Mingyu Chen (Rayner) <morning...@163.com>
AuthorDate: Wed Feb 5 08:40:22 2025 +0800

    [file-format] add enable_text_validate_utf8 doc (#1960)
---
 docs/lakehouse/file-formats/text.md                    | 18 ++++++++++++++++++
 .../current/lakehouse/file-formats/text.md             | 17 +++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/docs/lakehouse/file-formats/text.md b/docs/lakehouse/file-formats/text.md
index e798a6d8423..a6a40f9d92f 100644
--- a/docs/lakehouse/file-formats/text.md
+++ b/docs/lakehouse/file-formats/text.md
@@ -65,3 +65,21 @@ This document introduces the support for reading and writing text file formats i
 
   Import functionality supports JSON formats. See the import documentation for details.
 
+## Character Set
+
+Currently, Doris supports only the UTF-8 character set. However, some data, such as data in Hive Text-format tables, may contain content in a non-UTF-8 encoding, which causes reads to fail with the following error:
+
+```text
+Only support csv data in utf8 codec
+```
+
+In this case, you can set the session variable as follows:
+
+```sql
+SET enable_text_validate_utf8 = false
+```
+
+This skips the UTF-8 encoding check so that the content can be read. Note that this parameter only skips the check; non-UTF-8 encoded content will still be displayed as garbled text.
+
+This parameter has been supported since version 3.0.4.
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-formats/text.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-formats/text.md
index d99a78c652d..6066b1870d0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-formats/text.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-formats/text.md
@@ -72,3 +72,20 @@ under the License.
 
  Import functionality supports JSON formats. See the import documentation for details.
 
+## Character Set
+
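+Currently, Doris supports only the UTF-8 character set. However, some data, such as data in Hive Text-format tables, may contain non-UTF-8 encoded content, which causes reads to fail with the following error: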
+
+```text
+Only support csv data in utf8 codec
+```
+
+In this case, you can set the session variable as follows:
+
+```sql
+SET enable_text_validate_utf8 = false
+```
+
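+This skips the UTF-8 encoding check so that the content can be read. Note that this parameter only skips the check; non-UTF-8 encoded content will still be displayed as garbled text.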
+
+This parameter has been supported since version 3.0.4.
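
The note that non-UTF-8 content "will still be displayed as garbled text" can be illustrated outside Doris. The following is a minimal Python sketch, not Doris code: it assumes the source data is GBK-encoded (a hypothetical choice; the commit does not say which encodings appear in practice), and the helper names `is_valid_utf8`, `read_without_validation`, and `transcode_to_utf8` are invented for illustration. It shows why a strict UTF-8 check rejects such bytes, why skipping the check yields unreadable characters, and that transcoding the data to UTF-8 beforehand avoids both problems.

```python
# Hypothetical sample: the word "中文" encoded as GBK, which is not valid UTF-8.
RAW_GBK = "中文".encode("gbk")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (a strict check)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def read_without_validation(data: bytes) -> str:
    """Decode as UTF-8 without failing; invalid bytes become U+FFFD.

    This only mimics the general effect of skipping validation. How Doris
    itself renders such bytes is not specified here.
    """
    return data.decode("utf-8", errors="replace")


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode data from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    print(is_valid_utf8(RAW_GBK))            # strict check rejects the bytes
    print(read_without_validation(RAW_GBK))  # readable text is lost
    fixed = transcode_to_utf8(RAW_GBK, "gbk")
    print(is_valid_utf8(fixed), fixed.decode("utf-8"))
```

If the source encoding is known, converting the files to UTF-8 before reading them is the only one of these options that preserves the text; disabling the check merely lets the read proceed.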

