This is an automated email from the ASF dual-hosted git repository.

zykkk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new 35f94297b42  add ngram desc (#1735)
35f94297b42 is described below

commit 35f94297b424fc4021c52ae9e6607b2357b00aa8
Author: wangtianyi2004 <376612...@qq.com>
AuthorDate: Mon Jan 13 14:55:35 2025 +0800

    add ngram desc (#1735)
---
 .../storage-compute-decoupled-deploy-manually.md     | 14 +++++++++++---
 docs/table-design/data-model/tips.md                 |  2 +-
 docs/table-design/index/ngram-bloomfilter-index.md   |  4 ++--
 docs/table-design/row-store.md                       | 11 ++++++-----
 .../storage-compute-decoupled-deploy-manually.md     | 17 +++++++++++++----
 .../table-design/index/ngram-bloomfilter-index.md    |  4 ++--
 .../current/table-design/row-store.md                |  2 +-
 .../table-design/index/ngram-bloomfilter-index.md    |  4 ++--
 .../version-2.1/table-design/row-store.md            |  2 +-
 .../storage-compute-decoupled-deploy-manually.md     | 17 +++++++++++++----
 .../table-design/index/ngram-bloomfilter-index.md    |  4 ++--
 .../version-3.0/table-design/row-store.md            |  2 +-
 .../version-2.1/table-design/data-model/tips.md      |  2 +-
 .../table-design/index/ngram-bloomfilter-index.md    |  4 ++--
 versioned_docs/version-2.1/table-design/row-store.md | 11 ++++++-----
 .../storage-compute-decoupled-deploy-manually.md     | 14 +++++++++++---
 .../version-3.0/table-design/data-model/tips.md      |  2 +-
 .../table-design/index/ngram-bloomfilter-index.md    |  4 ++--
 versioned_docs/version-3.0/table-design/row-store.md | 11 ++++++-----
 19 files changed, 84 insertions(+), 47 deletions(-)

diff --git a/docs/install/deploy-manually/storage-compute-decoupled-deploy-manually.md b/docs/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
index a2e2a4fcd47..1a354de126a 100644
--- a/docs/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
+++ b/docs/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
@@ -255,8 +255,16 @@ To add Backend nodes to the cluster, perform the following steps for each Backen
 1. Configure `be.conf`
 
    In the `be.conf` file, you need to configure the following key parameters:
+   - deploy_mode
+     - Description: Specifies the startup mode of Doris.
+     - Format: cloud indicates the storage-compute decoupled mode; any other value indicates the storage-compute integrated mode.
+     - Example: cloud
+   - file_cache_path
+     - Description: Disk paths and related parameters used for file caching, represented as an array with one entry per disk. path specifies the disk path, and total_size limits the cache size; -1 or 0 will use the entire disk space.
+     - Format: [{"path":"/path/to/file_cache", "total_size":21474836480}, {"path":"/path/to/file_cache2", "total_size":21474836480}]
+     - Example: [{"path":"/path/to/file_cache", "total_size":21474836480}, {"path":"/path/to/file_cache2", "total_size":21474836480}]
     - Default: [{"path":"${DORIS_HOME}/file_cache"}]
 
-2. Start the BE process
+3. Start the BE process
 
    Use the following command to start the Backend:
 
@@ -264,7 +272,7 @@ To add Backend nodes to the cluster, perform the following steps for each Backen
    ```
    bin/start_be.sh --daemon
    ```
 
-3. Add BE to the cluster:
+4. Add BE to the cluster:
 
    Connect to any Frontend using MySQL client and execute:
 
@@ -278,7 +286,7 @@ To add Backend nodes to the cluster, perform the following steps for each Backen
    For more detailed usage, refer to [ADD BACKEND](../../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND) and [REMOVE BACKEND](../../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-DROP-BACKEND).
 
-4. Verify BE status
+5. Verify BE status
 
    Check the Backend log files (`be.log`) to ensure it has successfully started and joined the cluster.
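For readers following along, the two parameters documented above combine into a minimal `be.conf` fragment along these lines (a sketch only — the path and the 20 GB cache size are placeholders, not recommendations):

```
# storage-compute decoupled ("cloud") mode
deploy_mode = cloud
# one entry per cache disk; total_size is in bytes
file_cache_path = [{"path":"/mnt/disk1/doris/file_cache","total_size":21474836480}]
```

Registering the started BE from any FE then uses the ADD BACKEND statement linked in the hunk above; host and port below are placeholders (9050 is the default BE heartbeat port):

```sql
ALTER SYSTEM ADD BACKEND "be_host:9050";
```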
diff --git a/docs/table-design/data-model/tips.md b/docs/table-design/data-model/tips.md
index b61542f13ab..fac0d2eb2b4 100644
--- a/docs/table-design/data-model/tips.md
+++ b/docs/table-design/data-model/tips.md
@@ -145,7 +145,7 @@ AGGREGATE KEY columns. Otherwise, `select sum (count) from table;` can only expr
 
 Another method is to add a `count` column with a value of 1 and an aggregation type of REPLACE. Then `select sum (count) from table;` and `select count (*) from table;` will produce the same results. Moreover, this method does not require the absence of the same AGGREGATE KEY columns in the import data.
 
-### Merge on write of unique model
+### MoW Unique Key Model
 
 The Merge on Write implementation in the Unique Model does not impose the same limitation as the Aggregate Model. In Merge on Write, the model adds a `delete bitmap` for each imported rowset to mark the data being overwritten or deleted. With the previous example, after Batch 1 is imported, the data status will be as follows:
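To make the `count`-column trick in the context above concrete, here is a minimal sketch of an AGGREGATE KEY table that uses it; the table, columns, and bucket count are hypothetical:

```sql
-- hypothetical aggregate table for illustration
CREATE TABLE site_visits (
    site_id    INT,
    visit_date DATE,
    pv         BIGINT SUM DEFAULT "0",
    `count`    BIGINT REPLACE DEFAULT "1"  -- every imported row carries count = 1
) AGGREGATE KEY(site_id, visit_date)
DISTRIBUTED BY HASH(site_id) BUCKETS 10
PROPERTIES ("replication_num" = "1");

-- REPLACE keeps `count` at 1 for every aggregated key, so these two queries agree:
SELECT SUM(`count`) FROM site_visits;
SELECT COUNT(*) FROM site_visits;
```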
diff --git a/docs/table-design/index/ngram-bloomfilter-index.md b/docs/table-design/index/ngram-bloomfilter-index.md
index be1a3a098c1..278054b30ef 100644
--- a/docs/table-design/index/ngram-bloomfilter-index.md
+++ b/docs/table-design/index/ngram-bloomfilter-index.md
@@ -27,7 +27,7 @@ under the License.
 
 ## Indexing Principles
 
-The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
+n-gram tokenization is a method of splitting a sentence or a piece of text into multiple groups of adjacent words. The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
 
 Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. Instead of storing the original text values, it tokenizes the text using NGram and stores each token in the BloomFilter. For LIKE queries, the pattern in LIKE '%pattern%' is also tokenized using NGram. Each token is checked against the BloomFilter, and if any token is not found, the corresponding data block does not meet the LIKE condition and can be skipped, reducing IO and accelerating th [...]
 
@@ -58,7 +58,7 @@ Explanation of the syntax:
 
 1. **`idx_column_name(column_name)`** is mandatory. `column_name` is the column to be indexed and must appear in the column definitions above. `idx_column_name` is the index name, which must be unique at the table level. It is recommended to name it with the prefix `idx_` followed by the column name.
 2. **`USING NGRAM_BF`** is mandatory and specifies that the index type is an NGram BloomFilter index.
 3. **`PROPERTIES`** is optional and is used to specify additional properties for the NGram BloomFilter index. The supported properties are:
-   - **gram_size**: The N in NGram, specifying the number of consecutive characters to form a token. For example, 'an ngram example' with N = 3 would be tokenized into 'an ', 'n n', ' ng', 'ngr', 'gra', 'ram' (6 tokens).
+   - **gram_size**: The N in NGram, specifying the number of consecutive units that form a token. For example, 'This is a simple ngram example' with N = 3 would be tokenized into 'This is a', 'is a simple', 'a simple ngram', 'simple ngram example' (4 tokens).
    - **bf_size**: The size of the BloomFilter in bits. bf_size determines the size of the index corresponding to each data block. The larger this value, the more storage space it occupies, but the lower the probability of hash collisions.
 
 It is recommended to set **gram_size** to the minimum length of the string in LIKE queries but not less than 2. Generally, "gram_size"="3", "bf_size"="1024" is recommended, then adjust based on the Query Profile.
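For orientation, the syntax these hunks document looks as follows in a complete statement (a sketch — the table, column, and query are hypothetical):

```sql
-- hypothetical log table for illustration
CREATE TABLE log_tbl (
    id  BIGINT,
    msg VARCHAR(1024),
    INDEX idx_msg (msg) USING NGRAM_BF
        PROPERTIES ("gram_size" = "3", "bf_size" = "1024")
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES ("replication_num" = "1");

-- a LIKE query the index can prune data blocks for:
SELECT * FROM log_tbl WHERE msg LIKE '%error%';
```

Per the recommendation above, gram_size (3 here) stays at or below the length of the shortest LIKE pattern and is never less than 2.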
diff --git a/docs/table-design/row-store.md b/docs/table-design/row-store.md
index acefd596ec6..171f618cad2 100644
--- a/docs/table-design/row-store.md
+++ b/docs/table-design/row-store.md
@@ -1,6 +1,6 @@
 ---
 {
-    "title": "Hybrid Storage",
+    "title": "Hybrid Row-Columnar Storage",
     "language": "en"
 }
 ---
@@ -24,11 +24,11 @@ specific language governing permissions and limitations under the License. -->
 
-## Hybrid Storage
+## Hybrid Row-Columnar Storage
 
-Doris defaults to columnar storage, where each column is stored contiguously. Columnar storage offers excellent performance for analytical scenarios (such as aggregation, filtering, sorting, etc.), as it only reads the necessary columns, reducing unnecessary IO. However, in point query scenarios (such as `SELECT *`), all columns need to be read, requiring an IO operation for each column, which can lead to IOPS becoming a bottleneck, especially for wide tables with many columns (e.g., hun [...]
+Doris uses columnar storage by default, with each column stored contiguously. Columnar storage offers excellent performance for analytical scenarios (such as aggregation, filtering, sorting, etc.), as it only reads the necessary columns, reducing unnecessary IO. However, in point query scenarios (such as `SELECT *`), all columns need to be read, requiring an IO operation for each column, which can lead to IOPS becoming a bottleneck, especially for wide tables with many columns (e.g., hun [...]
 
-To address the IOPS bottleneck in point query scenarios, starting from version 2.0.0, Doris supports hybrid storage. When users create tables, they can specify whether to enable row storage. With row storage enabled, each row only requires one IO operation for point queries (such as `SELECT *`), significantly improving performance.
+To address the IOPS bottleneck in point query scenarios, starting from version 2.0.0, Doris supports Hybrid Row-Columnar Storage. When users create tables, they can specify whether to enable row storage. With row storage enabled, each row only requires one IO operation for point queries (such as `SELECT *`), significantly improving performance.
 
 The principle of row storage is that an additional column is added during storage. This column concatenates all the columns of the corresponding row and stores them using a special binary format.
 
@@ -51,7 +51,8 @@ When creating a table, specify whether to enable row storage, which columns to e
 
 "row_store_page_size" = "16384"
 
-The page is the smallest unit of storage read/write operations, and page_size is the size of the row storage page. This means that reading one row also requires generating an IO for a page. The larger the value, the better the compression effect and the lower the storage space usage, but the higher the IO overhead for point queries (since one IO reads at least one page), and vice versa. The smaller the value, the higher the storage space, the better the point query performance. The defau [...]
+A page is the smallest unit for storage read and write operations, and `page_size` refers to the size of a row-store page. This means that reading a single row requires generating a page IO. The larger this value is, the better the compression effect and the lower the storage space usage. However, the IO overhead during point queries increases, resulting in lower performance (because each IO operation reads at least one page). Conversely, the smaller the value, the higher the storage spa [...]
+
 ## Example
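The row-store properties discussed in this diff are set at table creation time. A minimal sketch (the table and values are hypothetical; `store_row_column` is the table property for enabling row storage as I understand the Doris docs, and `row_store_page_size` is the property this diff documents):

```sql
-- hypothetical table for point-query lookups
CREATE TABLE point_lookup_tbl (
    k  BIGINT,
    v1 VARCHAR(64),
    v2 INT
) UNIQUE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 10
PROPERTIES (
    "replication_num"     = "1",
    "store_row_column"    = "true",   -- enable row storage for the whole row
    "row_store_page_size" = "16384"   -- 16 KB row-store pages (the default discussed above)
);

-- with row storage enabled, a point query like this reads a single row-store page:
SELECT * FROM point_lookup_tbl WHERE k = 1;
```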
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/deploy-manually/storage-compute-decoupled-deploy-manually.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
index a23d60a0920..be350b5b62c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
@@ -254,8 +254,17 @@ ALTER SYSTEM ADD FOLLOWER "host:port";
 1. Configure be.conf
 
    In the `be.conf` file, the following key parameters need to be configured:
-
-2. Start the BE process
+   - deploy_mode
+     - Description: specifies the Doris startup mode
+     - Format: cloud indicates the storage-compute decoupled mode; any other value indicates the storage-compute integrated mode
+     - Example: cloud
+   - file_cache_path
+     - Description: disk paths and related parameters used for file caching, represented as an array with one entry per disk. path specifies the disk path, and total_size limits the cache size; -1 or 0 will use the entire disk space.
+     - Format: [{"path":"/path/to/file_cache","total_size":21474836480},{"path":"/path/to/file_cache2","total_size":21474836480}]
+     - Example: [{"path":"/path/to/file_cache","total_size":21474836480},{"path":"/path/to/file_cache2","total_size":21474836480}]
+     - Default: [{"path":"${DORIS_HOME}/file_cache"}]
+
+3. Start the BE process
 
    Use the following command to start the Backend:
 
@@ -263,7 +272,7 @@ ALTER SYSTEM ADD FOLLOWER "host:port";
    ```
    bin/start_be.sh --daemon
    ```
 
-3. Add the BE to the cluster:
+4. Add the BE to the cluster:
 
    Connect to any Frontend with a MySQL client and execute:
 
@@ -277,7 +286,7 @@ ALTER SYSTEM ADD FOLLOWER "host:port";
    For more detailed usage, see [ADD BACKEND](../../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND) and [REMOVE BACKEND](../../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-DROP-BACKEND).
 
-4. Verify BE status
+5. Verify BE status
 
    Check the Backend log file (`be.log`) to ensure it has started successfully and joined the cluster.

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/ngram-bloomfilter-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/ngram-bloomfilter-index.md
index a9564ee82db..ae9c0221295 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/ngram-bloomfilter-index.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/index/ngram-bloomfilter-index.md
@@ -26,7 +26,7 @@ under the License.
 
 ## Indexing Principles
 
-The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
+n-gram tokenization is a method of splitting a sentence or a piece of text into multiple groups of adjacent words. The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
 
 Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. What it stores in the BloomFilter is not the original text but the NGram tokens of that text, one token per value. For a LIKE query, the pattern in LIKE '%pattern%' is also tokenized with NGram, and each token is checked against the BloomFilter; if any token is absent, the corresponding data block cannot satisfy the LIKE condition and can be skipped, reducing IO and accelerating the query.
 
@@ -64,7 +64,7 @@ The NGram BloomFilter index can only accelerate string LIKE queries, and the LIKE pattern
 
 **3. `PROPERTIES` is optional and specifies additional properties of the NGram BloomFilter index. The currently supported properties are:**
 
-- gram_size: the N in NGram, i.e., the number of consecutive characters that form one token. For example, 'an ngram example' with N = 3 is split into 'an ', 'n n', ' ng', 'ngr', 'gra', 'ram' (6 tokens).
+- gram_size: the N in NGram, i.e., the number of consecutive units that form one token. For example, 'This is a simple ngram example' with N = 3 is split into 'This is a', 'is a simple', 'a simple ngram', 'simple ngram example' (4 tokens).
 
 - bf_size: the size of the BloomFilter in bits. bf_size determines the size of the index for each data block; the larger the value, the more storage space it occupies and the lower the probability of hash collisions.

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/row-store.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/row-store.md
index 72ecbba19c1..dc91e1ac788 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/row-store.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/row-store.md
@@ -51,7 +51,7 @@ Doris uses columnar storage by default, with each column stored contiguously. In analytical scenarios (such as
 
 "row_store_page_size" = "16384"
 
-page is the smallest unit of storage reads and writes, and page_size is the size of a row-store page, meaning that reading a single row also incurs one page of IO. The larger the value, the better the compression and the lower the storage space usage, but the higher the IO overhead and the lower the performance for point queries (since one IO reads at least one page); conversely, with smaller values the storage space usage is somewhat higher and point query performance is better. The default 16KB is a balanced choice for most cases; if query performance matters more, configure a smaller value such as 4KB or even lower, and if storage space matters more, configure a larger value such as 64KB or even higher.
+page is the smallest unit of storage reads and writes, and page_size is the size of a row-store page, meaning that reading a single row also incurs one page of IO. The larger the value, the better the compression and the lower the storage space usage, but the higher the IO overhead and the lower the performance for point queries (since one IO reads at least one page); conversely, with smaller values the storage space usage is considerably higher and point query performance is better. The default 16KB is a balanced choice for most cases; if query performance matters more, configure a smaller value such as 4KB or even lower, and if storage space matters more, configure a larger value such as 64KB or even higher.
 
 ## Usage Example

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/ngram-bloomfilter-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/ngram-bloomfilter-index.md
index a9564ee82db..ae9c0221295 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/ngram-bloomfilter-index.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/index/ngram-bloomfilter-index.md
@@ -26,7 +26,7 @@ under the License.
 
 ## Indexing Principles
 
-The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
+n-gram tokenization is a method of splitting a sentence or a piece of text into multiple groups of adjacent words. The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
 
 Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. What it stores in the BloomFilter is not the original text but the NGram tokens of that text, one token per value. For a LIKE query, the pattern in LIKE '%pattern%' is also tokenized with NGram, and each token is checked against the BloomFilter; if any token is absent, the corresponding data block cannot satisfy the LIKE condition and can be skipped, reducing IO and accelerating the query.
 
@@ -64,7 +64,7 @@ The NGram BloomFilter index can only accelerate string LIKE queries, and the LIKE pattern
 
 **3. `PROPERTIES` is optional and specifies additional properties of the NGram BloomFilter index. The currently supported properties are:**
 
-- gram_size: the N in NGram, i.e., the number of consecutive characters that form one token. For example, 'an ngram example' with N = 3 is split into 'an ', 'n n', ' ng', 'ngr', 'gra', 'ram' (6 tokens).
+- gram_size: the N in NGram, i.e., the number of consecutive units that form one token. For example, 'This is a simple ngram example' with N = 3 is split into 'This is a', 'is a simple', 'a simple ngram', 'simple ngram example' (4 tokens).
 
 - bf_size: the size of the BloomFilter in bits. bf_size determines the size of the index for each data block; the larger the value, the more storage space it occupies and the lower the probability of hash collisions.

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/row-store.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/row-store.md
index 959f99cc53b..1ec00e34335 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/row-store.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/row-store.md
@@ -46,7 +46,7 @@ Doris uses columnar storage by default, with each column stored contiguously. In analytical scenarios (such as
 
 "row_store_page_size" = "16384"
 
-page is the smallest unit of storage reads and writes, and page_size is the size of a row-store page, meaning that reading a single row also incurs one page of IO. The larger the value, the better the compression and the lower the storage space usage, but the higher the IO overhead and the lower the performance for point queries (since one IO reads at least one page); conversely, with smaller values the storage space usage is somewhat higher and point query performance is better. The default 16KB is a balanced choice for most cases; if query performance matters more, configure a smaller value such as 4KB or even lower, and if storage space matters more, configure a larger value such as 64KB or even higher.
+page is the smallest unit of storage reads and writes, and page_size is the size of a row-store page, meaning that reading a single row also incurs one page of IO. The larger the value, the better the compression and the lower the storage space usage, but the higher the IO overhead and the lower the performance for point queries (since one IO reads at least one page); conversely, with smaller values the storage space usage is considerably higher and point query performance is better. The default 16KB is a balanced choice for most cases; if query performance matters more, configure a smaller value such as 4KB or even lower, and if storage space matters more, configure a larger value such as 64KB or even higher.
 
 ## Usage Example

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/install/deploy-manually/storage-compute-decoupled-deploy-manually.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
index a23d60a0920..be350b5b62c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
@@ -254,8 +254,17 @@ ALTER SYSTEM ADD FOLLOWER "host:port";
 1. Configure be.conf
 
    In the `be.conf` file, the following key parameters need to be configured:
-
-2. Start the BE process
+   - deploy_mode
+     - Description: specifies the Doris startup mode
+     - Format: cloud indicates the storage-compute decoupled mode; any other value indicates the storage-compute integrated mode
+     - Example: cloud
+   - file_cache_path
+     - Description: disk paths and related parameters used for file caching, represented as an array with one entry per disk. path specifies the disk path, and total_size limits the cache size; -1 or 0 will use the entire disk space.
+     - Format: [{"path":"/path/to/file_cache","total_size":21474836480},{"path":"/path/to/file_cache2","total_size":21474836480}]
+     - Example: [{"path":"/path/to/file_cache","total_size":21474836480},{"path":"/path/to/file_cache2","total_size":21474836480}]
+     - Default: [{"path":"${DORIS_HOME}/file_cache"}]
+
+3. Start the BE process
 
    Use the following command to start the Backend:
 
@@ -263,7 +272,7 @@ ALTER SYSTEM ADD FOLLOWER "host:port";
    ```
    bin/start_be.sh --daemon
    ```
 
-3. Add the BE to the cluster:
+4. Add the BE to the cluster:
 
    Connect to any Frontend with a MySQL client and execute:
 
@@ -277,7 +286,7 @@ ALTER SYSTEM ADD FOLLOWER "host:port";
    For more detailed usage, see [ADD BACKEND](../../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND) and [REMOVE BACKEND](../../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-DROP-BACKEND).
 
-4. Verify BE status
+5. Verify BE status
 
    Check the Backend log file (`be.log`) to ensure it has started successfully and joined the cluster.

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/ngram-bloomfilter-index.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/ngram-bloomfilter-index.md
index a9564ee82db..ae9c0221295 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/ngram-bloomfilter-index.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/index/ngram-bloomfilter-index.md
@@ -26,7 +26,7 @@ under the License.
 
 ## Indexing Principles
 
-The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
+n-gram tokenization is a method of splitting a sentence or a piece of text into multiple groups of adjacent words. The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
 
 Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. What it stores in the BloomFilter is not the original text but the NGram tokens of that text, one token per value. For a LIKE query, the pattern in LIKE '%pattern%' is also tokenized with NGram, and each token is checked against the BloomFilter; if any token is absent, the corresponding data block cannot satisfy the LIKE condition and can be skipped, reducing IO and accelerating the query.
 
@@ -64,7 +64,7 @@ The NGram BloomFilter index can only accelerate string LIKE queries, and the LIKE pattern
 
 **3. `PROPERTIES` is optional and specifies additional properties of the NGram BloomFilter index. The currently supported properties are:**
 
-- gram_size: the N in NGram, i.e., the number of consecutive characters that form one token. For example, 'an ngram example' with N = 3 is split into 'an ', 'n n', ' ng', 'ngr', 'gra', 'ram' (6 tokens).
+- gram_size: the N in NGram, i.e., the number of consecutive units that form one token. For example, 'This is a simple ngram example' with N = 3 is split into 'This is a', 'is a simple', 'a simple ngram', 'simple ngram example' (4 tokens).
 
 - bf_size: the size of the BloomFilter in bits. bf_size determines the size of the index for each data block; the larger the value, the more storage space it occupies and the lower the probability of hash collisions.

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/row-store.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/row-store.md
index 72ecbba19c1..dc91e1ac788 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/row-store.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/row-store.md
@@ -51,7 +51,7 @@ Doris uses columnar storage by default, with each column stored contiguously. In analytical scenarios (such as
 
 "row_store_page_size" = "16384"
 
-page is the smallest unit of storage reads and writes, and page_size is the size of a row-store page, meaning that reading a single row also incurs one page of IO. The larger the value, the better the compression and the lower the storage space usage, but the higher the IO overhead and the lower the performance for point queries (since one IO reads at least one page); conversely, with smaller values the storage space usage is somewhat higher and point query performance is better. The default 16KB is a balanced choice for most cases; if query performance matters more, configure a smaller value such as 4KB or even lower, and if storage space matters more, configure a larger value such as 64KB or even higher.
+page is the smallest unit of storage reads and writes, and page_size is the size of a row-store page, meaning that reading a single row also incurs one page of IO. The larger the value, the better the compression and the lower the storage space usage, but the higher the IO overhead and the lower the performance for point queries (since one IO reads at least one page); conversely, with smaller values the storage space usage is considerably higher and point query performance is better. The default 16KB is a balanced choice for most cases; if query performance matters more, configure a smaller value such as 4KB or even lower, and if storage space matters more, configure a larger value such as 64KB or even higher.
 
 ## Usage Example
diff --git a/versioned_docs/version-2.1/table-design/data-model/tips.md b/versioned_docs/version-2.1/table-design/data-model/tips.md
index b61542f13ab..fac0d2eb2b4 100644
--- a/versioned_docs/version-2.1/table-design/data-model/tips.md
+++ b/versioned_docs/version-2.1/table-design/data-model/tips.md
@@ -145,7 +145,7 @@ AGGREGATE KEY columns. Otherwise, `select sum (count) from table;` can only expr
 
 Another method is to add a `count` column with a value of 1 and an aggregation type of REPLACE. Then `select sum (count) from table;` and `select count (*) from table;` will produce the same results. Moreover, this method does not require the absence of the same AGGREGATE KEY columns in the import data.
 
-### Merge on write of unique model
+### MoW Unique Key Model
 
 The Merge on Write implementation in the Unique Model does not impose the same limitation as the Aggregate Model. In Merge on Write, the model adds a `delete bitmap` for each imported rowset to mark the data being overwritten or deleted. With the previous example, after Batch 1 is imported, the data status will be as follows:

diff --git a/versioned_docs/version-2.1/table-design/index/ngram-bloomfilter-index.md b/versioned_docs/version-2.1/table-design/index/ngram-bloomfilter-index.md
index be1a3a098c1..278054b30ef 100644
--- a/versioned_docs/version-2.1/table-design/index/ngram-bloomfilter-index.md
+++ b/versioned_docs/version-2.1/table-design/index/ngram-bloomfilter-index.md
@@ -27,7 +27,7 @@ under the License.
 
 ## Indexing Principles
 
-The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
+n-gram tokenization is a method of splitting a sentence or a piece of text into multiple groups of adjacent words. The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
 
 Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. Instead of storing the original text values, it tokenizes the text using NGram and stores each token in the BloomFilter. For LIKE queries, the pattern in LIKE '%pattern%' is also tokenized using NGram. Each token is checked against the BloomFilter, and if any token is not found, the corresponding data block does not meet the LIKE condition and can be skipped, reducing IO and accelerating th [...]
 
@@ -58,7 +58,7 @@ Explanation of the syntax:
 
 1. **`idx_column_name(column_name)`** is mandatory. `column_name` is the column to be indexed and must appear in the column definitions above. `idx_column_name` is the index name, which must be unique at the table level. It is recommended to name it with the prefix `idx_` followed by the column name.
 2. **`USING NGRAM_BF`** is mandatory and specifies that the index type is an NGram BloomFilter index.
 3. **`PROPERTIES`** is optional and is used to specify additional properties for the NGram BloomFilter index. The supported properties are:
-   - **gram_size**: The N in NGram, specifying the number of consecutive characters to form a token. For example, 'an ngram example' with N = 3 would be tokenized into 'an ', 'n n', ' ng', 'ngr', 'gra', 'ram' (6 tokens).
+   - **gram_size**: The N in NGram, specifying the number of consecutive units that form a token. For example, 'This is a simple ngram example' with N = 3 would be tokenized into 'This is a', 'is a simple', 'a simple ngram', 'simple ngram example' (4 tokens).
    - **bf_size**: The size of the BloomFilter in bits. bf_size determines the size of the index corresponding to each data block. The larger this value, the more storage space it occupies, but the lower the probability of hash collisions.
 
 It is recommended to set **gram_size** to the minimum length of the string in LIKE queries but not less than 2. Generally, "gram_size"="3", "bf_size"="1024" is recommended, then adjust based on the Query Profile.

diff --git a/versioned_docs/version-2.1/table-design/row-store.md b/versioned_docs/version-2.1/table-design/row-store.md
index 571432c59d0..156c9c69640 100644
--- a/versioned_docs/version-2.1/table-design/row-store.md
+++ b/versioned_docs/version-2.1/table-design/row-store.md
@@ -1,6 +1,6 @@
 ---
 {
-    "title": "Hybrid Storage",
+    "title": "Hybrid Row-Columnar Storage",
     "language": "en"
 }
 ---
@@ -24,11 +24,11 @@ specific language governing permissions and limitations under the License. -->
 
-## Hybrid Storage
+## Hybrid Row-Columnar Storage
 
-Doris defaults to columnar storage, where each column is stored contiguously. Columnar storage offers excellent performance for analytical scenarios (such as aggregation, filtering, sorting, etc.), as it only reads the necessary columns, reducing unnecessary IO. However, in point query scenarios (such as `SELECT *`), all columns need to be read, requiring an IO operation for each column, which can lead to IOPS becoming a bottleneck, especially for wide tables with many columns (e.g., hun [...]
+Doris uses columnar storage by default, with each column stored contiguously. Columnar storage offers excellent performance for analytical scenarios (such as aggregation, filtering, sorting, etc.), as it only reads the necessary columns, reducing unnecessary IO. However, in point query scenarios (such as `SELECT *`), all columns need to be read, requiring an IO operation for each column, which can lead to IOPS becoming a bottleneck, especially for wide tables with many columns (e.g., hun [...]
 
-To address the IOPS bottleneck in point query scenarios, starting from version 2.0.0, Doris supports hybrid storage. When users create tables, they can specify whether to enable row storage. With row storage enabled, each row only requires one IO operation for point queries (such as `SELECT *`), significantly improving performance.
+To address the IOPS bottleneck in point query scenarios, starting from version 2.0.0, Doris supports Hybrid Row-Columnar Storage. When users create tables, they can specify whether to enable row storage. With row storage enabled, each row only requires one IO operation for point queries (such as `SELECT *`), significantly improving performance.
 
 The principle of row storage is that an additional column is added during storage. This column concatenates all the columns of the corresponding row and stores them using a special binary format.
 
@@ -46,7 +46,8 @@ When creating a table, specify whether to enable row storage, and the storage co
 
 "row_store_page_size" = "16384"
 
-The page is the smallest unit of storage read/write operations, and page_size is the size of the row storage page. This means that reading one row also requires generating an IO for a page. The larger the value, the better the compression effect and the lower the storage space usage, but the higher the IO overhead for point queries (since one IO reads at least one page), and vice versa. The smaller the value, the higher the storage space, the better the point query performance. The defau [...]
+A page is the smallest unit for storage read and write operations, and `page_size` refers to the size of a row-store page. This means that reading a single row requires generating a page IO. The larger this value is, the better the compression effect and the lower the storage space usage. However, the IO overhead during point queries increases, resulting in lower performance (because each IO operation reads at least one page). Conversely, the smaller the value, the higher the storage spa [...]
+
 ## Example

diff --git a/versioned_docs/version-3.0/install/deploy-manually/storage-compute-decoupled-deploy-manually.md b/versioned_docs/version-3.0/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
index a2e2a4fcd47..1a354de126a 100644
--- a/versioned_docs/version-3.0/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
+++ b/versioned_docs/version-3.0/install/deploy-manually/storage-compute-decoupled-deploy-manually.md
@@ -255,8 +255,16 @@ To add Backend nodes to the cluster, perform the following steps for each Backen
 1. Configure `be.conf`
 
    In the `be.conf` file, you need to configure the following key parameters:
+   - deploy_mode
+     - Description: Specifies the startup mode of Doris.
+     - Format: cloud indicates the storage-compute decoupled mode; any other value indicates the storage-compute integrated mode.
+     - Example: cloud
+   - file_cache_path
+     - Description: Disk paths and related parameters used for file caching, represented as an array with one entry per disk. path specifies the disk path, and total_size limits the cache size; -1 or 0 will use the entire disk space.
+     - Format: [{"path":"/path/to/file_cache", "total_size":21474836480}, {"path":"/path/to/file_cache2", "total_size":21474836480}]
+     - Example: [{"path":"/path/to/file_cache", "total_size":21474836480}, {"path":"/path/to/file_cache2", "total_size":21474836480}]
     - Default: [{"path":"${DORIS_HOME}/file_cache"}]
 
-2. Start the BE process
+3. Start the BE process
 
    Use the following command to start the Backend:
 
@@ -264,7 +272,7 @@ To add Backend nodes to the cluster, perform the following steps for each Backen
    ```
    bin/start_be.sh --daemon
    ```
 
-3. Add BE to the cluster:
+4. Add BE to the cluster:
 
    Connect to any Frontend using MySQL client and execute:
 
@@ -278,7 +286,7 @@ To add Backend nodes to the cluster, perform the following steps for each Backen
    For more detailed usage, refer to [ADD BACKEND](../../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-BACKEND) and [REMOVE BACKEND](../../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-DROP-BACKEND).
 
-4. Verify BE status
+5. Verify BE status
 
    Check the Backend log files (`be.log`) to ensure it has successfully started and joined the cluster.

diff --git a/versioned_docs/version-3.0/table-design/data-model/tips.md b/versioned_docs/version-3.0/table-design/data-model/tips.md
index b61542f13ab..fac0d2eb2b4 100644
--- a/versioned_docs/version-3.0/table-design/data-model/tips.md
+++ b/versioned_docs/version-3.0/table-design/data-model/tips.md
@@ -145,7 +145,7 @@ AGGREGATE KEY columns. Otherwise, `select sum (count) from table;` can only expr
 
 Another method is to add a `count` column with a value of 1 and an aggregation type of REPLACE. Then `select sum (count) from table;` and `select count (*) from table;` will produce the same results. Moreover, this method does not require the absence of the same AGGREGATE KEY columns in the import data.
 
-### Merge on write of unique model
+### MoW Unique Key Model
 
 The Merge on Write implementation in the Unique Model does not impose the same limitation as the Aggregate Model. In Merge on Write, the model adds a `delete bitmap` for each imported rowset to mark the data being overwritten or deleted. With the previous example, after Batch 1 is imported, the data status will be as follows:

diff --git a/versioned_docs/version-3.0/table-design/index/ngram-bloomfilter-index.md b/versioned_docs/version-3.0/table-design/index/ngram-bloomfilter-index.md
index be1a3a098c1..278054b30ef 100644
--- a/versioned_docs/version-3.0/table-design/index/ngram-bloomfilter-index.md
+++ b/versioned_docs/version-3.0/table-design/index/ngram-bloomfilter-index.md
@@ -27,7 +27,7 @@ under the License.
 
 ## Indexing Principles
 
-The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
+n-gram tokenization is a method of splitting a sentence or a piece of text into multiple groups of adjacent words. The NGram BloomFilter index, similar to the BloomFilter index, is a skip index based on BloomFilter.
 
 Unlike the BloomFilter index, the NGram BloomFilter index is used to accelerate text LIKE queries. Instead of storing the original text values, it tokenizes the text using NGram and stores each token in the BloomFilter. For LIKE queries, the pattern in LIKE '%pattern%' is also tokenized using NGram. Each token is checked against the BloomFilter, and if any token is not found, the corresponding data block does not meet the LIKE condition and can be skipped, reducing IO and accelerating th [...]
 
@@ -58,7 +58,7 @@ Explanation of the syntax:
 
 1. **`idx_column_name(column_name)`** is mandatory. `column_name` is the column to be indexed and must appear in the column definitions above. `idx_column_name` is the index name, which must be unique at the table level. It is recommended to name it with the prefix `idx_` followed by the column name.
 2. **`USING NGRAM_BF`** is mandatory and specifies that the index type is an NGram BloomFilter index.
 3. **`PROPERTIES`** is optional and is used to specify additional properties for the NGram BloomFilter index. The supported properties are:
-   - **gram_size**: The N in NGram, specifying the number of consecutive characters to form a token. For example, 'an ngram example' with N = 3 would be tokenized into 'an ', 'n n', ' ng', 'ngr', 'gra', 'ram' (6 tokens).
+   - **gram_size**: The N in NGram, specifying the number of consecutive units that form a token. For example, 'This is a simple ngram example' with N = 3 would be tokenized into 'This is a', 'is a simple', 'a simple ngram', 'simple ngram example' (4 tokens).
    - **bf_size**: The size of the BloomFilter in bits. bf_size determines the size of the index corresponding to each data block. The larger this value, the more storage space it occupies, but the lower the probability of hash collisions.
 
 It is recommended to set **gram_size** to the minimum length of the string in LIKE queries but not less than 2. Generally, "gram_size"="3", "bf_size"="1024" is recommended, then adjust based on the Query Profile.

diff --git a/versioned_docs/version-3.0/table-design/row-store.md b/versioned_docs/version-3.0/table-design/row-store.md
index acefd596ec6..171f618cad2 100644
--- a/versioned_docs/version-3.0/table-design/row-store.md
+++ b/versioned_docs/version-3.0/table-design/row-store.md
@@ -1,6 +1,6 @@
 ---
 {
-    "title": "Hybrid Storage",
+    "title": "Hybrid Row-Columnar Storage",
     "language": "en"
 }
 ---
@@ -24,11 +24,11 @@ specific language governing permissions and limitations under the License. -->
 
-## Hybrid Storage
+## Hybrid Row-Columnar Storage
 
-Doris defaults to columnar storage, where each column is stored contiguously. Columnar storage offers excellent performance for analytical scenarios (such as aggregation, filtering, sorting, etc.), as it only reads the necessary columns, reducing unnecessary IO. However, in point query scenarios (such as `SELECT *`), all columns need to be read, requiring an IO operation for each column, which can lead to IOPS becoming a bottleneck, especially for wide tables with many columns (e.g., hun [...]
+Doris uses columnar storage by default, with each column stored contiguously. Columnar storage offers excellent performance for analytical scenarios (such as aggregation, filtering, sorting, etc.), as it only reads the necessary columns, reducing unnecessary IO. However, in point query scenarios (such as `SELECT *`), all columns need to be read, requiring an IO operation for each column, which can lead to IOPS becoming a bottleneck, especially for wide tables with many columns (e.g., hun [...]
 
-To address the IOPS bottleneck in point query scenarios, starting from version 2.0.0, Doris supports hybrid storage. When users create tables, they can specify whether to enable row storage. With row storage enabled, each row only requires one IO operation for point queries (such as `SELECT *`), significantly improving performance.
+To address the IOPS bottleneck in point query scenarios, starting from version 2.0.0, Doris supports Hybrid Row-Columnar Storage. When users create tables, they can specify whether to enable row storage. With row storage enabled, each row only requires one IO operation for point queries (such as `SELECT *`), significantly improving performance.
 
 The principle of row storage is that an additional column is added during storage. This column concatenates all the columns of the corresponding row and stores them using a special binary format.
 
@@ -51,7 +51,8 @@ When creating a table, specify whether to enable row storage, which columns to e
 
 "row_store_page_size" = "16384"
 
-The page is the smallest unit of storage read/write operations, and page_size is the size of the row storage page. This means that reading one row also requires generating an IO for a page. The larger the value, the better the compression effect and the lower the storage space usage, but the higher the IO overhead for point queries (since one IO reads at least one page), and vice versa. The smaller the value, the higher the storage space, the better the point query performance. The defau [...]
+A page is the smallest unit for storage read and write operations, and `page_size` refers to the size of a row-store page. This means that reading a single row requires generating a page IO. The larger this value is, the better the compression effect and the lower the storage space usage. However, the IO overhead during point queries increases, resulting in lower performance (because each IO operation reads at least one page). Conversely, the smaller the value, the higher the storage spa [...]
+
 ## Example

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org