This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new ecd2941af1 [faq] add parquet file tools doc (#733)
ecd2941af1 is described below

commit ecd2941af1f0b03faf5512a7298e5d85f9d384f2
Author: Mingyu Chen <morning...@163.com>
AuthorDate: Tue Jun 11 19:38:55 2024 +0800

    [faq] add parquet file tools doc (#733)
---
 docs/faq/lakehouse-faq.md                        | 13 +++++++++++++
 .../current/faq/lakehouse-faq.md                 | 13 +++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/docs/faq/lakehouse-faq.md b/docs/faq/lakehouse-faq.md
index 48a58dc7c1..d3a36d18f3 100644
--- a/docs/faq/lakehouse-faq.md
+++ b/docs/faq/lakehouse-faq.md
@@ -344,3 +344,16 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-
 1. After the Binary type is mapped to Doris, the query is garbled
 
     Doris does not natively support the Binary type, so the Binary type in various data lakes or databases is mapped to Doris, usually using the String type for mapping. The String type can only display printable characters. If you need to query the contents of Binary, you can use the `TO_BASE64()` function to convert it to Base64 encoding before proceeding to the next step.
+
+2. Analyzing Parquet file
+
+    When querying Parquet files, because the formats of Parquet files generated by different systems may vary, such as the number of RowGroups, index values, etc., it is sometimes necessary to check the metadata of the Parquet files for problem location or performance analysis. Here is a tool to help users analyze Parquet files more conveniently:
+
+    1. Download and unzip [Apache Parquet Cli 1.14.0](https://github.com/morningman/tools/releases/download/apache-parquet-cli-1.14.0/apache-parquet-cli-1.14.0.tar.xz)
+    2. Download the Parquet file to be analyzed locally, assuming the path is `/path/to/file.parquet`
+    3. Use the following command to analyze the Parquet file meta information:
+
+        `./parquet-tools meta /path/to/file.parquet`
+
+    4. For more functions, see [Apache Parquet Cli documentation](https://github.com/apache/parquet-java/tree/apache-parquet-1.14.0/parquet-cli)
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/lakehouse-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/lakehouse-faq.md
index 8c47738264..fb1f45830e 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/lakehouse-faq.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/lakehouse-faq.md
@@ -342,3 +342,16 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-
 1. Binary 类型映射到 Doris 后,查询乱码
 
     Doris 原生不支持 Binary 类型,所以各类数据湖或数据库中的 Binary 类型映射到 Doris 中,通常使用 String 类型进行映射。String 类型只能展示可打印字符。如果需要查询 Binary 的内容,可以使用 `TO_BASE64()` 函数转换为 Base64 编码后,在进行下一步处理。
+
+2. 分析 Parquet 文件
+
+    在查询 Parquet 文件时,由于不同系统生成的 Parquet 文件格式可能有所差异,比如 RowGroup 的数量,索引的值等,有时需要检查 Parquet 文件的元数据进行问题定位或性能分析。这里提供一个工具帮助用户更方便的分析 Parquet 文件:
+
+    1. 下载并解压 [Apache Parquet Cli 1.14.0](https://github.com/morningman/tools/releases/download/apache-parquet-cli-1.14.0/apache-parquet-cli-1.14.0.tar.xz)
+    2. 将需要分析的 Parquet 文件下载到本地,假设路径为 `/path/to/file.parquet`
+    3. 使用如下命令分析 Parquet 文件元信息:
+
+        `./parquet-tools meta /path/to/file.parquet`
+
+    4. 更多功能,可参阅 [Apache Parquet Cli 文档](https://github.com/apache/parquet-java/tree/apache-parquet-1.14.0/parquet-cli)
+
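
The FAQ entry added by this commit inspects Parquet metadata with `./parquet-tools meta`. Before running that tool, it can help to confirm that the downloaded file is a complete Parquet file at all. The sketch below is not part of the commit and does not replace the CLI: it only checks the `PAR1` magic bytes at both ends of the file and reads the 4-byte little-endian footer length stored just before the trailing magic. The function name `parquet_footer_length` is hypothetical, and the sketch assumes a file with a plain (unencrypted) footer.

```python
import os
import struct

MAGIC = b"PAR1"  # marker at the start and end of a Parquet file with a plain footer


def parquet_footer_length(path):
    """Return the footer metadata length in bytes (hypothetical helper).

    Raises ValueError if the file is too small, lacks the PAR1 magic
    bytes, or declares a footer larger than the file can hold.
    """
    size = os.path.getsize(path)
    # Smallest layout: leading magic (4) + footer length (4) + trailing magic (4).
    if size < 12:
        raise ValueError(f"{path}: too small to be a Parquet file")
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)  # last 8 bytes: footer length + trailing magic
        tail = f.read(8)
    if head != MAGIC or tail[4:] != MAGIC:
        raise ValueError(f"{path}: missing PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError(f"{path}: footer length {footer_len} exceeds file size")
    return footer_len
```

Calling `parquet_footer_length("/path/to/file.parquet")` on a truncated download raises `ValueError`, which is a cheaper signal than a stack trace from the CLI. Details such as the number of RowGroups still require `parquet-tools meta`, because the footer itself is Thrift-encoded and is not decoded here.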