This is an automated email from the ASF dual-hosted git repository. zykkk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push: new 65d81c04e6 [Docs](inverted index) update docs for build index (#21184) 65d81c04e6 is described below commit 65d81c04e64e46e2883f0cc89843f599ae3091c6 Author: YueW <45946325+tany...@users.noreply.github.com> AuthorDate: Mon Jun 26 19:24:44 2023 +0800 [Docs](inverted index) update docs for build index (#21184) --- docs/en/docs/data-table/index/inverted-index.md | 63 ++++++++++++++++++++++ docs/zh-CN/docs/data-table/index/inverted-index.md | 63 ++++++++++++++++++++++ 2 files changed, 126 insertions(+) diff --git a/docs/en/docs/data-table/index/inverted-index.md b/docs/en/docs/data-table/index/inverted-index.md index 331fa90491..06102632f8 100644 --- a/docs/en/docs/data-table/index/inverted-index.md +++ b/docs/en/docs/data-table/index/inverted-index.md @@ -99,6 +99,8 @@ table_properties; ``` - add an inverted index to existed table + +**Before version 2.0-beta:** ```sql -- syntax 1 CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|chinese|unicode")] [COMMENT 'your comment']; @@ -106,6 +108,26 @@ CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("par ALTER TABLE table_name ADD INDEX idx_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|chinese|unicode")] [COMMENT 'your comment']; ``` +**After version 2.0-beta (including 2.0-beta):** + +The above 'create/add index' operation only generates inverted index for incremental data. The syntax of build index is added to add inverted index to stock data: +```sql +-- syntax 1, add inverted index to the stock data of the whole table by default +BUILD INDEX index_name ON table_name; +-- syntax 2, partition can be specified, and one or more can be specified +BUILD INDEX index_name ON table_name PARTITIONS(partition_name1, partition_name2); +``` +(**The above 'create/add index' operation needs to be executed before executing the build index**) + +To view the progress of the `build index`, you can use the following statement +```sql +show build index [FROM db_name]; +-- Example 1: Viewing the progress of all build index tasks +show build index; +-- Example 2: Viewing the progress of the build index task for a specified table +show build index where TableName = "table1"; +``` + - drop an inverted index ```sql -- syntax 1 @@ -327,6 +349,11 @@ mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00' mysql> CREATE INDEX idx_timestamp ON hackernews_1m(timestamp) USING INVERTED; Query OK, 0 rows affected (0.03 sec) ``` +**After 2.0-beta (including 2.0-beta), you need to execute `build index` to add inverted index to the stock data:** +```sql +mysql> BUILD INDEX idx_timestamp ON hackernews_1m; +Query OK, 0 rows affected (0.01 sec) +``` - progress of building index can be view by SQL. It just costs 1s (compare FinishTime and CreateTime) to build index for timestamp column with 1 million rows. ```sql @@ -339,6 +366,18 @@ mysql> SHOW ALTER TABLE COLUMN; 1 row in set (0.00 sec) ``` +**After 2.0-beta (including 2.0-beta), you can view the progress of stock data creating index by `show build index`:** +```sql +-- If the table has no partitions, the PartitionName defaults to TableName +mysql> show build index; ++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress | ++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| 10191 | hackernews_1m | hackernews_1m | [ADD INDEX idx_timestamp (`timestamp`) USING INVERTED], | 2023-06-26 15:32:33.894 | 2023-06-26 15:32:34.847 | 3 | FINISHED | | NULL | ++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +1 row in set (0.04 sec) +``` + - after the index is build, Doris will automaticaly use index for range query, but the performance is almost the same since it's already fast on the small dataset ```sql mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00'; @@ -365,6 +404,10 @@ mysql> SELECT count() FROM hackernews_1m WHERE parent = 11189; mysql> ALTER TABLE hackernews_1m ADD INDEX idx_parent(parent) USING INVERTED; Query OK, 0 rows affected (0.01 sec) +-- After 2.0-beta (including 2.0-beta), you need to execute `build index` to add inverted index to the stock data: +mysql> BUILD INDEX idx_parent ON hackernews_1m; +Query OK, 0 rows affected (0.01 sec) + mysql> SHOW ALTER TABLE COLUMN; +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ | JobId | TableName | CreateTime | FinishTime | IndexName | IndexId | OriginIndexId | SchemaVersion | TransactionId | State | Msg | Progress | Timeout | @@ -373,6 +416,14 @@ mysql> SHOW ALTER TABLE COLUMN; | 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 | hackernews_1m | 10054 | 10008 | 1:378856428 | 4 | FINISHED | | NULL | 2592000 | +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ +mysql> show build index; ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| 11005 | hackernews_1m | hackernews_1m | [ADD INDEX idx_parent (`parent`) USING INVERTED], | 2023-06-26 16:25:10.167 | 2023-06-26 16:25:10.838 | 1002 | FINISHED | | NULL | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +1 row in set (0.01 sec) + mysql> SELECT count() FROM hackernews_1m WHERE parent = 11189; +---------+ | count() | @@ -396,6 +447,10 @@ mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster'; mysql> ALTER TABLE hackernews_1m ADD INDEX idx_author(author) USING INVERTED; Query OK, 0 rows affected (0.01 sec) +-- After 2.0-beta (including 2.0-beta), you need to execute `build index` to add inverted index to the stock data: +mysql> BUILD INDEX idx_author ON hackernews_1m; +Query OK, 0 rows affected (0.01 sec) + -- costs 1.5s to build index for author column with 1 million rows. mysql> SHOW ALTER TABLE COLUMN; +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ @@ -406,6 +461,14 @@ mysql> SHOW ALTER TABLE COLUMN; | 10076 | hackernews_1m | 2023-02-10 19:54:20.046 | 2023-02-10 19:54:21.521 | hackernews_1m | 10077 | 10008 | 1:1335127701 | 5 | FINISHED | | NULL | 2592000 | +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ +mysql> show build index order by CreateTime desc limit 1; ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| 13006 | hackernews_1m | hackernews_1m | [ADD INDEX idx_author (`author`) USING INVERTED], | 2023-06-26 17:23:02.610 | 2023-06-26 17:23:03.755 | 3004 | FINISHED | | NULL | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +1 row in set (0.01 sec) + -- equal qury on text field autor get 3x speedup mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster'; +---------+ diff --git a/docs/zh-CN/docs/data-table/index/inverted-index.md b/docs/zh-CN/docs/data-table/index/inverted-index.md index 15f7485d8e..e78ce537bc 100644 --- a/docs/zh-CN/docs/data-table/index/inverted-index.md +++ b/docs/zh-CN/docs/data-table/index/inverted-index.md @@ -97,6 +97,8 @@ table_properties; ``` - 已有表增加倒排索引 + +**2.0-beta版本之前:** ```sql -- 语法1 CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese")] [COMMENT 'your comment']; @@ -104,6 +106,26 @@ CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("par ALTER TABLE table_name ADD INDEX idx_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|unicode|chinese")] [COMMENT 'your comment']; ``` +**2.0-beta版本(含2.0-beta)之后:** + +上述`create/add index`操作只对增量数据生成倒排索引,增加了build index的语法用于对存量数据加倒排索引: +```sql +-- 语法1,默认给全表的存量数据加上倒排索引 +BUILD INDEX index_name ON table_name; +-- 语法2,可指定partition,可指定一个或多个 +BUILD INDEX index_name ON table_name PARTITIONS(partition_name1, partition_name2); +``` +(**在执行build index之前需要已经执行了以上`create/add index`的操作**) + +查看`build index`进展,可通过以下语句进行查看: +```sql +show build index [FROM db_name]; +-- 示例1,查看所有的build index任务进展 +show build index; +-- 示例2,查看指定table的build index任务进展 +show build index where TableName = "table1"; +``` + - 删除倒排索引 ```sql -- 语法1 @@ -324,6 +346,11 @@ mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00' -- CREATE INDEX 是第一种建索引的语法,另外一种在后面展示 mysql> CREATE INDEX idx_timestamp ON hackernews_1m(timestamp) USING INVERTED; Query OK, 0 rows affected (0.03 sec) +``` + **2.0-beta(含2.0-beta)后,需要再执行`build index`才能给存量数据加上倒排索引:** +```sql +mysql> BUILD INDEX idx_timestamp ON hackernews_1m; +Query OK, 0 rows affected (0.01 sec) ``` - 查看索引创建进度,通过FinishTime和CreateTime的差值,可以看到100万条数据对timestamp列建倒排索引只用了1s @@ -337,6 +364,18 @@ mysql> SHOW ALTER TABLE COLUMN; 1 row in set (0.00 sec) ``` +**2.0-beta(含2.0-beta)后,可通过`show builde index`来查看存量数据创建索引进展:** +```sql +-- 若table没有分区,PartitionName默认就是TableName +mysql> show build index; ++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress | ++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| 10191 | hackernews_1m | hackernews_1m | [ADD INDEX idx_timestamp (`timestamp`) USING INVERTED], | 2023-06-26 15:32:33.894 | 2023-06-26 15:32:34.847 | 3 | FINISHED | | NULL | ++-------+---------------+---------------+----------------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +1 row in set (0.04 sec) +``` + - 索引创建后,范围查询用同样的查询方式,Doris会自动识别索引进行优化,但是这里由于数据量小性能差别不大 ```sql mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00'; @@ -363,6 +402,10 @@ mysql> SELECT count() FROM hackernews_1m WHERE parent = 11189; mysql> ALTER TABLE hackernews_1m ADD INDEX idx_parent(parent) USING INVERTED; Query OK, 0 rows affected (0.01 sec) +-- 2.0-beta(含2.0-beta)后,需要再执行build index才能给存量数据加上倒排索引: +mysql> BUILD INDEX idx_parent ON hackernews_1m; +Query OK, 0 rows affected (0.01 sec) + mysql> SHOW ALTER TABLE COLUMN; +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ | JobId | TableName | CreateTime | FinishTime | IndexName | IndexId | OriginIndexId | SchemaVersion | TransactionId | State | Msg | Progress | Timeout | @@ -371,6 +414,14 @@ mysql> SHOW ALTER TABLE COLUMN; | 10053 | hackernews_1m | 2023-02-10 19:49:32.893 | 2023-02-10 19:49:33.982 | hackernews_1m | 10054 | 10008 | 1:378856428 | 4 | FINISHED | | NULL | 2592000 | +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ +mysql> show build index; ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| 11005 | hackernews_1m | hackernews_1m | [ADD INDEX idx_parent (`parent`) USING INVERTED], | 2023-06-26 16:25:10.167 | 2023-06-26 16:25:10.838 | 1002 | FINISHED | | NULL | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +1 row in set (0.01 sec) + mysql> SELECT count() FROM hackernews_1m WHERE parent = 11189; +---------+ | count() | @@ -394,6 +445,10 @@ mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster'; mysql> ALTER TABLE hackernews_1m ADD INDEX idx_author(author) USING INVERTED; Query OK, 0 rows affected (0.01 sec) +-- 2.0-beta(含2.0-beta)后,需要再执行build index才能给存量数据加上倒排索引: +mysql> BUILD INDEX idx_author ON hackernews_1m; +Query OK, 0 rows affected (0.01 sec) + -- 100万条author数据增量建索引仅消耗1.5s mysql> SHOW ALTER TABLE COLUMN; +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ @@ -404,6 +459,14 @@ mysql> SHOW ALTER TABLE COLUMN; | 10076 | hackernews_1m | 2023-02-10 19:54:20.046 | 2023-02-10 19:54:21.521 | hackernews_1m | 10077 | 10008 | 1:1335127701 | 5 | FINISHED | | NULL | 2592000 | +-------+---------------+-------------------------+-------------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+ +mysql> show build index order by CreateTime desc limit 1; ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| JobId | TableName | PartitionName | AlterInvertedIndexes | CreateTime | FinishTime | TransactionId | State | Msg | Progress | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +| 13006 | hackernews_1m | hackernews_1m | [ADD INDEX idx_author (`author`) USING INVERTED], | 2023-06-26 17:23:02.610 | 2023-06-26 17:23:03.755 | 3004 | FINISHED | | NULL | ++-------+---------------+---------------+----------------------------------------------------+-------------------------+-------------------------+---------------+----------+------+----------+ +1 row in set (0.01 sec) + -- 创建索引后,字符串等值匹配也有明显加速 mysql> SELECT count() FROM hackernews_1m WHERE author = 'faster'; +---------+ --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org