[doris] branch master updated: [doc](point query) modify and refine docs (#16735)

jianliangqi Thu, 16 Feb 2023 01:36:44 -0800

This is an automated email from the ASF dual-hosted git repository.

jianliangqi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git



The following commit(s) were added to refs/heads/master by this push:
     new 262a2ea10d [doc](point query) modify and refine docs (#16735)
262a2ea10d is described below

commit 262a2ea10d38d2831ef9bb9eb9600bd10a691f37
Author: lihangyu <15605149...@163.com>
AuthorDate: Thu Feb 16 17:36:32 2023 +0800

    [doc](point query) modify and refine docs (#16735)
---
 .../docs/advanced/hight-concurrent-point-query.md  | 91 ++++++++++++++++++++++
 docs/en/docs/data-table/best-practice.md           | 60 +-------------
 docs/sidebars.json                                 |  3 +-
 .../docs/advanced/hight-concurrent-point-query.md  | 87 +++++++++++++++++++++
 docs/zh-CN/docs/data-table/best-practice.md        | 56 +------------
 5 files changed, 182 insertions(+), 115 deletions(-)

diff --git a/docs/en/docs/advanced/hight-concurrent-point-query.md b/docs/en/docs/advanced/hight-concurrent-point-query.md
new file mode 100644
index 0000000000..62c35cf820
--- /dev/null
+++ b/docs/en/docs/advanced/hight-concurrent-point-query.md
@@ -0,0 +1,91 @@
+--- 
+{
+    "title": "High-concurrency point query",
+    "language": "en"
+}
+--- 
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+  
+# High-concurrency point query based on primary key
+
+<version since="2.0.0">
+</version>
+
+## Background 
+Doris is built on a columnar storage engine. In high-concurrency serving scenarios, users often need to retrieve entire rows from the system. When tables are wide, however, the columnar format greatly amplifies random read IO. The Doris query engine and planner are also too heavy for simple queries such as point queries, so a short path needs to be planned in the FE's query plan to handle them. FE is the access-layer service for SQL queries and is written in Java. Parsing and [...]
+
+## Row Store format
+Doris supports a row format for OLAP tables to reduce the IO cost of point lookups, at the price of extra disk space for the row-format data. Currently, each row is stored in an extra column called `row column` for simplicity. Row store is disabled by default; enable it by adding the following property when creating the table:
+```
+"store_row_column" = "true"
+```
+
+## Accelerate point query for merge-on-write model
+The row store format can be used to speed up point queries on the merge-on-write model. For a point query on the primary key with `enable_unique_key_merge_on_write` enabled, the planner optimizes the query and executes it on a short path through a lightweight RPC interface. Below is an example of a point query with row store on the merge-on-write model:
+```
+CREATE TABLE `tbl_point_query` (
+  `key` int(11) NULL,
+  `v1` decimal(27, 9) NULL,
+  `v2` varchar(30) NULL,
+  `v3` varchar(30) NULL,
+  `v4` date NULL,
+  `v5` datetime NULL,
+  `v6` float NULL,
+  `v7` datev2 NULL
+) ENGINE=OLAP
+UNIQUE KEY(key)
+COMMENT 'OLAP'
+DISTRIBUTED BY HASH(key) BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1",
+"enable_unique_key_merge_on_write" = "true",
+"light_schema_change" = "true",
+"store_row_column" = "true"
+);
+```
+[NOTE]
+1. `enable_unique_key_merge_on_write` should be enabled, since the storage engine needs the primary key for fast point lookups.
+2. When the condition contains only the primary key, as in `select * from tbl_point_query where key = 123`, the query goes through the short fast path.
+3. `light_schema_change` should also be enabled, since point queries rely on the `column unique id` of each column.
+
+## Using `PreparedStatement`
+To reduce the CPU cost of parsing query SQL and SQL expressions, FE provides a `PreparedStatement` feature that is fully compatible with the MySQL protocol (currently it supports only point queries like the one above). When enabled, the PreparedStatement SQL and its expressions are computed in advance and cached in a session-level memory buffer for reuse by later queries. When CPU is the bottleneck for such queries, `PreparedStatement` can improve performance by 4x or more. Below is a JDBC example of using [...]
+
+1. Set up the JDBC URL and enable server-side prepared statements
+```
+url = jdbc:mysql://127.0.0.1:9137/ycsb?useServerPrepStmts=true
+```
+
+2. Using `PreparedStatement`
+```java
+// Use `?` as the placeholder; readStatement should be reused across queries.
+PreparedStatement readStatement = conn.prepareStatement("select * from tbl_point_query where key = ?");
+...
+// setInt takes the 1-based parameter index first, then the value.
+readStatement.setInt(1, 1234);
+ResultSet resultSet = readStatement.executeQuery();
+...
+readStatement.setInt(1, 1235);
+resultSet = readStatement.executeQuery();
+...
+```
+
+
+
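The `PreparedStatement` steps in the new document above can be put together as a minimal sketch. It assumes MySQL Connector/J is on the classpath and reuses the URL, table, and query text from the example. The class name `PointQuerySketch`, its helper methods, and the choice to read the `v2` column as a string are illustrative assumptions, not part of Doris.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Doris or the JDBC driver.
public class PointQuerySketch {

    // Builds the JDBC URL with server-side prepared statements enabled.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useServerPrepStmts=true";
    }

    // Prepares the point query once and reuses it for every key.
    static Map<Integer, String> lookupV2(Connection conn, int... keys) throws SQLException {
        Map<Integer, String> result = new LinkedHashMap<>();
        String sql = "select * from tbl_point_query where key = ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (int key : keys) {
                stmt.setInt(1, key); // parameter index is 1-based
                try (ResultSet rs = stmt.executeQuery()) {
                    // A missing key yields null rather than an exception.
                    result.put(key, rs.next() ? rs.getString("v2") : null);
                }
            }
        }
        return result;
    }
}
```

A caller would open the connection with `DriverManager.getConnection(PointQuerySketch.buildUrl("127.0.0.1", 9137, "ycsb"), user, password)` and pass it to `lookupV2(conn, 1234, 1235)`; the credentials depend on the deployment.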
diff --git a/docs/en/docs/data-table/best-practice.md b/docs/en/docs/data-table/best-practice.md
index b0711bc15f..5f531cfb0b 100644
--- a/docs/en/docs/data-table/best-practice.md
+++ b/docs/en/docs/data-table/best-practice.md
@@ -180,62 +180,4 @@ Users can modify the Schema of an existing table through the Schema Change opera
 - Adding or modifying Bloom Filter
 - Adding or removing bitmap index
 
-For details, please refer to [Schema Change](
-
-## Row Store format
-Doris supports a row format for OLAP tables to reduce the IO cost of point lookups, at the price of extra disk space for the row-format data. Currently, each row is stored in an extra column called `row column` for simplicity. Row store is disabled by default; enable it by adding the following property when creating the table:
-```
-"store_row_column" = "true"
-```
-
-## Accelerate point query for merge-on-write model
-The row store format can be used to speed up point queries on the merge-on-write model. For a point query on the primary key with `enable_unique_key_merge_on_write` enabled, the planner optimizes the query and executes it on a short path through a lightweight RPC interface. Below is an example of a point query with row store on the merge-on-write model:
-```
-CREATE TABLE `tbl_point_query` (
-  `key` int(11) NULL,
-  `v1` decimal(27, 9) NULL,
-  `v2` varchar(30) NULL,
-  `v3` varchar(30) NULL,
-  `v4` date NULL,
-  `v5` datetime NULL,
-  `v6` float NULL,
-  `v7` datev2 NULL
-) ENGINE=OLAP
-UNIQUE KEY(key)
-COMMENT 'OLAP'
-DISTRIBUTED BY HASH(key) BUCKETS 1
-PROPERTIES (
-"replication_allocation" = "tag.location.default: 1",
-"enable_unique_key_merge_on_write" = "true",
-"light_schema_change" = "true",
-"store_row_column" = "true"
-);
-```
-[NOTE]
-1. `enable_unique_key_merge_on_write` should be enabled, since the storage engine needs the primary key for fast point lookups.
-2. When the condition contains only the primary key, as in `select * from tbl_point_query where key = 123`, the query goes through the short fast path.
-3. `light_schema_change` should also be enabled, since point queries rely on the `column unique id` of each column.
-
-### Using `PreparedStatement`
-To reduce the CPU cost of parsing query SQL and SQL expressions, FE provides a `PreparedStatement` feature that is fully compatible with the MySQL protocol (currently it supports only point queries like the one above). When enabled, the PreparedStatement SQL and its expressions are computed in advance and cached in a session-level memory buffer for reuse by later queries. When CPU is the bottleneck for such queries, `PreparedStatement` can improve performance by 4x or more. Below is a JDBC example of using [...]
-
-1. Set up the JDBC URL and enable server-side prepared statements
-```
-url = jdbc:mysql://127.0.0.1:9137/ycsb?useServerPrepStmts=true
-```
-
-2. Using `PreparedStatement`
-```java
-// Use `?` as the placeholder; readStatement should be reused across queries.
-PreparedStatement readStatement = conn.prepareStatement("select * from tbl_point_query where key = ?");
-...
-// setInt takes the 1-based parameter index first, then the value.
-readStatement.setInt(1, 1234);
-ResultSet resultSet = readStatement.executeQuery();
-...
-readStatement.setInt(1, 1235);
-resultSet = readStatement.executeQuery();
-...
-```
-
-
-
+For details, please refer to [Schema Change](../advanced/alter-table/schema-change)
\ No newline at end of file
diff --git a/docs/sidebars.json b/docs/sidebars.json
index c8fe7d14f2..5495562665 100644
--- a/docs/sidebars.json
+++ b/docs/sidebars.json
@@ -179,7 +179,8 @@
                 "advanced/time-zone",
                 "advanced/small-file-mgr",
                 "advanced/cold_hot_separation",
-                "advanced/compute_node"
+                "advanced/compute_node",
+                "advanced/hight-concurrent-point-query"
             ]
         },
         {
diff --git a/docs/zh-CN/docs/advanced/hight-concurrent-point-query.md b/docs/zh-CN/docs/advanced/hight-concurrent-point-query.md
new file mode 100644
index 0000000000..6b51a66047
--- /dev/null
+++ b/docs/zh-CN/docs/advanced/hight-concurrent-point-query.md
@@ -0,0 +1,87 @@
+---
+{
+    "title": "High-concurrency point query",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# High-concurrency point query based on primary key
+
+<version since="1.2.1">
+</version>
+
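+## Background
+Doris is built on a columnar storage engine. In high-concurrency serving scenarios, users often want to retrieve entire rows from the system. However, when tables are wide, the columnar format greatly amplifies random read IO. The Doris query engine and planner are too heavy for some simple queries, such as point queries, so a short path needs to be planned in the FE's query plan to handle them. FE is the access-layer service for SQL queries and is written in Java; analyzing and parsing SQL also incurs high CPU overhead for high-concurrency queries. To solve these problems, Doris introduces row store, a short query path, and PreparedStatement. The guide below explains how to enable these optimizations.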
+
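+## Row store
+Users can enable row store mode on OLAP tables, but it requires extra space to store the row data. The current implementation encodes each row and stores it in a separate column, which keeps the implementation simple. Row store is disabled by default; to enable it, specify the following property in the table creation statement: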
+```
+"store_row_column" = "true"
+```
+
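+## Point query optimization on the merge-on-write model
+The row store described above reduces the IO cost of point queries on the merge-on-write model. When `enable_unique_key_merge_on_write` is enabled at table creation, point queries on the primary key take a short path that optimizes SQL execution, and the query completes with a single RPC. Below is an example of a point query combined with row store on the merge-on-write model: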
+```
+CREATE TABLE `tbl_point_query` (
+  `key` int(11) NULL,
+  `v1` decimal(27, 9) NULL,
+  `v2` varchar(30) NULL,
+  `v3` varchar(30) NULL,
+  `v4` date NULL,
+  `v5` datetime NULL,
+  `v6` float NULL,
+  `v7` datev2 NULL
+) ENGINE=OLAP
+UNIQUE KEY(key)
+COMMENT 'OLAP'
+DISTRIBUTED BY HASH(key) BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1",
+"enable_unique_key_merge_on_write" = "true",
+"light_schema_change" = "true",
+"store_row_column" = "true"
+);
+```
+[NOTE]
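+1. `enable_unique_key_merge_on_write` should be enabled, because the storage engine needs the primary key for fast point lookups.
+2. When the condition contains only the primary key, as in `select * from tbl_point_query where key = 123`, the query takes the short path.
+3. `light_schema_change` should be enabled, because the primary-key point query optimization relies on the `column unique id` from light schema change to locate columns.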
+
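+## Using `PreparedStatement`
+To reduce the cost of SQL parsing and expression evaluation, FE provides a `PreparedStatement` feature that is fully compatible with the MySQL protocol (currently only primary-key point queries are supported). When `PreparedStatement` is enabled in FE, the SQL and its expressions are computed in advance and cached in a session-level memory cache, and subsequent queries use the cached objects directly. When CPU is the bottleneck for primary-key point queries, enabling `PreparedStatement` brings a performance improvement of 4x or more. Below is an example of using `PreparedStatement` in JDBC:
+1. Set up the JDBC URL and enable server-side prepared statements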
+```
+url = jdbc:mysql://127.0.0.1:9137/ycsb?useServerPrepStmts=true
+```
+
+2. Using `PreparedStatement`
+```java
+// Use `?` as the placeholder; readStatement should be reused across queries.
+PreparedStatement readStatement = conn.prepareStatement("select * from tbl_point_query where key = ?");
+...
+// setInt takes the 1-based parameter index first, then the value.
+readStatement.setInt(1, 1234);
+ResultSet resultSet = readStatement.executeQuery();
+...
+readStatement.setInt(1, 1235);
+resultSet = readStatement.executeQuery();
+...
+```
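The Background section's point about wide tables can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class `ReadAmplificationSketch`, the comma-separated row encoding, and the idea of counting one "read" per array access are all hypothetical simplifications and do not reflect Doris's actual storage format.

```java
// Toy model for illustration only; it does not reflect Doris's on-disk format.
public class ReadAmplificationSketch {

    // Columnar layout: columns[c][r]. Assembling one row touches every column.
    static int readRowColumnar(String[][] columns, int row, String[] out) {
        int reads = 0;
        for (int c = 0; c < columns.length; c++) {
            out[c] = columns[c][row]; // one simulated random read per column
            reads++;
        }
        return reads;
    }

    // Row layout: each row is encoded into one value (comma-separated here).
    static int readRowStore(String[] encodedRows, int row, String[] out) {
        // One simulated read fetches the encoded row, which is then decoded.
        String[] values = encodedRows[row].split(",", -1);
        System.arraycopy(values, 0, out, 0, values.length);
        return 1;
    }
}
```

In this model the number of simulated reads for a columnar lookup grows with the number of columns, while the row-encoded lookup stays at one, which is the trade-off that `store_row_column` pays for with extra storage.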
diff --git a/docs/zh-CN/docs/data-table/best-practice.md b/docs/zh-CN/docs/data-table/best-practice.md
index eed0a125e9..4d63b7ef57 100644
--- a/docs/zh-CN/docs/data-table/best-practice.md
+++ b/docs/zh-CN/docs/data-table/best-practice.md
@@ -179,58 +179,4 @@ ALTER TABLE session_data ADD ROLLUP rollup_brower(brower,province,ip,url) DUPLIC
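 - Adding or modifying Bloom Filter
 - Adding or removing bitmap index
 
-For details, please refer to [Schema Change](../advanced/alter-table/schema-change)
-
-## Row store
-Users can enable row store mode on OLAP tables, but it requires extra space to store the row data. The current implementation encodes each row and stores it in a separate column, which keeps the implementation simple. Row store is disabled by default; to enable it, specify the following property in the table creation statement: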
-```
-"store_row_column" = "true"
-```
-
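-## Point query optimization on the merge-on-write model
-The row store described above reduces the IO cost of point queries on the merge-on-write model. When `enable_unique_key_merge_on_write` is enabled at table creation, point queries on the primary key take a short path that optimizes SQL execution, and the query completes with a single RPC. Below is an example of a point query combined with row store on the merge-on-write model: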
-```
-CREATE TABLE `tbl_point_query` (
-  `key` int(11) NULL,
-  `v1` decimal(27, 9) NULL,
-  `v2` varchar(30) NULL,
-  `v3` varchar(30) NULL,
-  `v4` date NULL,
-  `v5` datetime NULL,
-  `v6` float NULL,
-  `v7` datev2 NULL
-) ENGINE=OLAP
-UNIQUE KEY(key)
-COMMENT 'OLAP'
-DISTRIBUTED BY HASH(key) BUCKETS 1
-PROPERTIES (
-"replication_allocation" = "tag.location.default: 1",
-"enable_unique_key_merge_on_write" = "true",
-"light_schema_change" = "true",
-"store_row_column" = "true"
-);
-```
-[NOTE]
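-1. `enable_unique_key_merge_on_write` should be enabled, because the storage engine needs the primary key for fast point lookups.
-2. When the condition contains only the primary key, as in `select * from tbl_point_query where key = 123`, the query takes the short path.
-3. `light_schema_change` should be enabled, because the primary-key point query optimization relies on the `column unique id` from light schema change to locate columns.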
-
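-### Using `PreparedStatement`
-To reduce the cost of SQL parsing and expression evaluation, FE provides a `PreparedStatement` feature that is fully compatible with the MySQL protocol (currently only primary-key point queries are supported). When `PreparedStatement` is enabled in FE, the SQL and its expressions are computed in advance and cached in a session-level memory cache, and subsequent queries use the cached objects directly. When CPU is the bottleneck for primary-key point queries, enabling `PreparedStatement` brings a performance improvement of 4x or more. Below is an example of using `PreparedStatement` in JDBC:
-1. Set up the JDBC URL and enable server-side prepared statements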
-```
-url = jdbc:mysql://127.0.0.1:9137/ycsb?useServerPrepStmts=true
-```
-
-2. Using `PreparedStatement`
-```java
-// Use `?` as the placeholder; readStatement should be reused across queries.
-PreparedStatement readStatement = conn.prepareStatement("select * from tbl_point_query where key = ?");
-...
-// setInt takes the 1-based parameter index first, then the value.
-readStatement.setInt(1, 1234);
-ResultSet resultSet = readStatement.executeQuery();
-...
-readStatement.setInt(1, 1235);
-resultSet = readStatement.executeQuery();
-...
-```
+For details, please refer to [Schema Change](../advanced/alter-table/schema-change)
\ No newline at end of file


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
