This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new c8abb3cfbb1 [file-format] add enable_text_validate_utf8 doc (#1960)
c8abb3cfbb1 is described below

commit c8abb3cfbb11b3308bc7506a273f8215901c674e
Author: Mingyu Chen (Rayner) <morning...@163.com>
AuthorDate: Wed Feb 5 08:40:22 2025 +0800

    [file-format] add enable_text_validate_utf8 doc (#1960)
---
 docs/lakehouse/file-formats/text.md            | 18 ++++++++++++++++++
 .../current/lakehouse/file-formats/text.md     | 17 +++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/docs/lakehouse/file-formats/text.md b/docs/lakehouse/file-formats/text.md
index e798a6d8423..a6a40f9d92f 100644
--- a/docs/lakehouse/file-formats/text.md
+++ b/docs/lakehouse/file-formats/text.md
@@ -65,3 +65,21 @@ This document introduces the support for reading and writing text file formats i
 Import functionality supports JSON formats. See the import documentation for details.
 
+## Character Set
+
+Currently, Doris only supports the UTF-8 character set encoding. However, some data, such as the data in Hive Text-formatted tables, may contain content encoded in non-UTF-8 encoding, which will cause reading failures and result in the following error:
+
+```text
+Only support csv data in utf8 codec
+```
+
+In this case, you can set the session variable as follows:
+
+```text
+SET enable_text_validate_utf8 = false
+```
+
+This will ignore the UTF-8 encoding check, allowing you to read this content. Note that this parameter is only used to skip the check, and non-UTF-8 encoded content will still be displayed as garbled text.
+
+This parameter has been supported since version 3.0.4.
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-formats/text.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-formats/text.md
index d99a78c652d..6066b1870d0 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-formats/text.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/file-formats/text.md
@@ -72,3 +72,20 @@ under the License.
 The import feature supports JSON formats; see the import documentation for details.
 
+## Character Set
+
+Doris currently supports only the UTF-8 character set encoding. Some data, such as data in Hive Text-format tables, may contain non-UTF-8 encoded content, which causes reads to fail with the error:
+
+```text
+Only support csv data in utf8 codec
+```
+
+In this case, you can set the session variable:
+
+```sql
+SET enable_text_validate_utf8 = false
+```
+
+to skip the UTF-8 encoding check so that this content can be read. Note that this parameter only skips the check; non-UTF-8 encoded content will still be displayed as garbled text.
+
+This parameter is supported since version 3.0.4.
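
For readers of this commit, the behavior documented under "Character Set" can be sketched in Python. This is an illustration only, not Doris code: `is_valid_utf8` and `read_field` are hypothetical helpers, the `validate_utf8` flag merely stands in for the `enable_text_validate_utf8` session variable, and the replacement-character output is Python's behavior, which may differ from how Doris renders such bytes.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def read_field(raw: bytes, validate_utf8: bool = True) -> str:
    """Hypothetical reader: reject non-UTF-8 input unless validation is off."""
    if validate_utf8 and not is_valid_utf8(raw):
        # Error text taken from the documentation added in this commit.
        raise ValueError("Only support csv data in utf8 codec")
    # With validation skipped, invalid bytes are still not converted;
    # here they surface as U+FFFD replacement characters (garbled text).
    return raw.decode("utf-8", errors="replace")


# A field written in GBK rather than UTF-8, as might occur in a Hive Text table.
gbk_field = "中文".encode("gbk")

try:
    read_field(gbk_field)
except ValueError as err:
    print("validation on: ", err)

print("validation off:", repr(read_field(gbk_field, validate_utf8=False)))
```

The sketch shows why disabling the check lets the read succeed without making the content legible: the bytes are passed through rather than transcoded.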