(doris-website) branch master updated: [docs]Update En docs (#440)

luzhijing Wed, 20 Mar 2024 07:45:47 -0700

This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new b6ecdcd0cf19 [docs]Update En docs (#440)
b6ecdcd0cf19 is described below

commit b6ecdcd0cf19fa52464b4d7c4d2a3b74cf5a2eb5
Author: Luzhijing <[email protected]>
AuthorDate: Wed Mar 20 22:43:34 2024 +0800

    [docs]Update En docs (#440)
---
 .../version-2.0/query/query-data/select.md         |   2 +-
 .../version-2.0/query/query-data/select.md         | 331 ++++++++++++++++++++-
 .../version-2.0/query/query-data/subquery.md       |  21 +-
 .../query/query-data/window-function.md            |  82 ++++-
 .../query/view-materialized-view/create-view.md    |  57 +++-
 .../version-2.0/table-design/index/bitmap-index.md |  59 +++-
 .../version-2.0/table-design/index/prefix-index.md |  39 ++-
 7 files changed, 584 insertions(+), 7 deletions(-)

diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/query/query-data/select.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/query/query-data/select.md
index 2b60ba59934c..d5e1d9fbac92 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/query/query-data/select.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/query/query-data/select.md
@@ -330,7 +330,7 @@ SELECT * FROM t1 TABLET(10001) TABLESAMPLE(1000 ROWS) 
REPEATABLE 2 limit 1000;
   
   - FROM 后的 table_references 指示参与查询的一个或者多个表。如果列出了多个表，就会执行 JOIN 
操作。而对于每一个指定表，都可以为其定义别名
   
-  - SELECT 后被选择的列，可以在 ORDER IN 和 GROUP BY 中，通过列名、列别名或者代表列位置的整数（从 1 开始）来引用
+  - SELECT 后被选择的列，可以在 ORDER BY 和 GROUP BY 中，通过列名、列别名或者代表列位置的整数（从 1 开始）来引用
   
     ```Plain
     SELECT college, region, seed FROM tournament
diff --git a/versioned_docs/version-2.0/query/query-data/select.md 
b/versioned_docs/version-2.0/query/query-data/select.md
index 7b4297ee12ae..db96b2cfe881 100644
--- a/versioned_docs/version-2.0/query/query-data/select.md
+++ b/versioned_docs/version-2.0/query/query-data/select.md
@@ -24,4 +24,333 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Coming Soon
+## Select
+
+Select syntax
+
+```
+SELECT
+    [hint_statement, ...]
+    [ALL | DISTINCT | DISTINCTROW | ALL EXCEPT ( col_name1 [, col_name2, 
col_name3, ...] )]
+    select_expr [, select_expr ...]
+    [FROM table_references
+      [PARTITION partition_list]
+      [TABLET tabletid_list]
+      [TABLESAMPLE sample_value [ROWS | PERCENT]
+        [REPEATABLE pos_seek]]
+    [WHERE where_condition]
+    [GROUP BY [GROUPING SETS | ROLLUP | CUBE] {col_name | expr | position}]
+    [HAVING where_condition]
+    [ORDER BY {col_name | expr | position}
+      [ASC | DESC], ...]
+    [LIMIT {[offset,] row_count | row_count OFFSET offset}]
+    [INTO OUTFILE 'file_name']
+```
+
+## Syntax explanation
+
+- `select_expr, ...`Specifies the columns to retrieve and display in the 
result set. Aliases can be used, and `as` is optional.
+- `table_references`Specifies the target tables for retrieval, which can be 
one or more tables (including temporary tables generated by subqueries).
+- `where_definition`Specifies the retrieval conditions (expressions). If a 
WHERE clause exists, its conditions filter the row data. The `where_condition` 
is an expression that evaluates to true for each row to be selected. If there 
is no WHERE clause, the statement selects all rows. In a WHERE expression, you 
can use any MySQL-supported functions and operators except aggregate functions.
+- `ALL | DISTINCT`Filters the result set. `ALL` selects all rows, while 
`DISTINCT` or `DISTINCTROW` filters out duplicate rows. The default is `ALL`.
+- `ALL EXCEPT`Filters the result set from `ALL` by specifying one or more 
column names to exclude from the full result set. All matching column names 
will be ignored in the output.
+- `INTO OUTFILE 'file_name'`Saves the result set to a new file (which must not 
exist beforehand), with differences in the saved format.
+- `Group by having`Groups the result set by one or more columns. If `HAVING` 
is present, it filters the groups produced by `GROUP BY`. Extensions to `GROUP 
BY` such as `GROUPING SETS`, `ROLLUP`, and `CUBE` are available and detailed in 
the [GROUPING 
SETS](../../../../docusaurus-plugin-content-docs-community/current/design/grouping_sets_design.md).
+- `Order by`Sorts the final result set. `ORDER BY` sorts the result set by 
comparing values in one or more columns. Sorting operations can be 
time-consuming and resource-intensive because all data needs to be sent to a 
single node for sorting. Sorting requires more memory compared to non-sorted 
operations. If you need to return the top N sorted results, use the `LIMIT` 
clause. 
+- `Limit n`Limits the number of rows in the output result set. `LIMIT m,n` 
means to start outputting from the mth row and return n records. Using `LIMIT 
m,n` is meaningful only when combined with `ORDER BY`, otherwise the data 
returned may be inconsistent each time the query is executed.
+- `Having`The `HAVING` clause does not filter rows in the table but filters 
the results produced by aggregate functions. Typically, `HAVING` is used with 
aggregate functions (such as `COUNT()`, `SUM()`, `AVG()`, `MIN()`, `MAX()`) and 
the `GROUP BY` clause.
+- `SELECT` supports explicit partition selection using `PARTITION`, which 
includes a list of partitions or subpartitions (or both) following the table 
name in `table_reference`.
+- `[TABLET tids] TABLESAMPLE n [ROWS | PERCENT] [REPEATABLE seek]`Limits the 
number of rows read from a table in the `FROM` clause by pseudo-randomly 
selecting several tablets based on the specified number of rows or percentage. 
`REPEATABLE` with a specified seed allows the same sample to be returned again. 
Alternatively, Tablet IDs can be manually specified, but this is only 
applicable to OLAP tables.
+- `hint_statement`Using hints before the select list can influence the 
optimizer's behavior to obtain a desired execution plan. For more information, 
refer to the [joinHint Document](../join-optimization/join-hint).
+
+## Syntax constraints
+
+- `SELECT` can also be used to retrieve calculated rows without referencing 
any tables.
+- All clauses must strictly follow the above format. A `HAVING` clause must 
come after the `GROUP BY` clause and before the `ORDER BY` clause.
+- The alias keyword `AS` is optional. Aliases can be used in `GROUP BY`, 
`ORDER BY,` and `HAVING`.
+- `WHERE` clause: Executes the `WHERE` statement to determine which rows 
should be included in the `GROUP BY` section, while `HAVING` is used to 
determine which rows from the result set should be used.
+- The `HAVING` clause can reference aggregate functions, such as `count, sum, 
max, min, avg`, while the `WHERE` clause cannot. However, the `WHERE` clause 
can reference other functions besides aggregate functions. Column aliases 
cannot be used in the `WHERE` clause to define conditions.
+- Following `GROUP BY` with `WITH ROLLUP` allows for one or more aggregations 
of the results.
+
+## Join syntax
+
+Doris supports the following JOIN syntax.
+
+```
+JOIN
+table_references:
+    table_reference [, table_reference] …
+table_reference:
+    table_factor
+  | join_table
+table_factor:
+    tbl_name [[AS] alias]
+        [{USE|IGNORE|FORCE} INDEX (key_list)]
+  | ( table_references )
+  | { OJ table_reference LEFT OUTER JOIN table_reference
+        ON conditional_expr }
+join_table:
+    table_reference [INNER | CROSS] JOIN table_factor [join_condition]
+  | table_reference LEFT [OUTER] JOIN table_reference join_condition
+  | table_reference NATURAL [LEFT [OUTER]] JOIN table_factor
+  | table_reference RIGHT [OUTER] JOIN table_reference join_condition
+  | table_reference NATURAL [RIGHT [OUTER]] JOIN table_factor
+join_condition:
+    ON conditional_expr
+```
+
+UNION:
+
+```
+SELECT ...
+UNION [ALL| DISTINCT] SELECT ......
+[UNION [ALL| DISTINCT] SELECT ...]
+```
+
+`UNION` is used to combine the results of multiple `SELECT` statements into a 
single result set. The column names from the first `SELECT` statement are used 
as the column names for the returned result. The selected columns listed in the 
corresponding positions of each `SELECT` statement should have the same data 
type. (For example, the first column selected in the first statement should 
have the same type as the first column selected in the other statements.)
+
+By default, `UNION` removes duplicate rows from the result. The optional 
`DISTINCT` keyword has no effect beyond the default, as it also specifies 
duplicate row removal. Using the optional `ALL` keyword, no duplicate row 
removal occurs, and the result includes all matching rows from all `SELECT` 
statements.
+
+WITH:
+
+To specify a common table expression, use a `WITH` clause with one or more 
comma-separated subclauses. Each subclause provides a subquery that generates a 
result set and associates a name with the subquery. The following example 
defines CTEs named `cte1` and `cte2` in the `WITH` clause, and refers to them 
in the top-level `SELECT `following the WITH clause.
+
+```
+WITH
+  cte1 AS (SELECT a，b FROM table1),
+  cte2 AS (SELECT c，d FROM table2)
+SELECT b，d FROM cte1 JOIN cte2
+WHERE cte1.a = cte2.c;
+```
+
+In statements that include this `WITH` clause, each CTE name can be referenced 
to access the corresponding CTE result set. CTE names can be referenced in 
other CTEs, allowing CTEs to be defined based on other CTEs. Currently, 
recursive CTEs are not supported.
+
+## Example
+
+- Query the names of students whose ages are 18, 20, and 25.
+
+```
+select Name from student where age in (18,20,25);
+```
+
+- ALL EXCEPT 
+
+```
+-- Query all information except for the age of the students.
+select * except(age) from student; 
+```
+
+- GROUP BY 
+
+```
+--Query the tb_book table, group by type, and calculate the average price for 
each category of books.
+select type,avg(price) from tb_book group by type;
+```
+
+- DISTINCT 
+
+```
+--Query the tb_book table and remove duplicate type data.
+select distinct type from tb_book;
+```
+
+- ORDER BY 
+
+Sort the query results in ascending order (by default) or descending order 
(DESC). In ascending order, NULL values should appear at the beginning, and in 
descending order, NULL values should appear at the end.
+
+```
+--Query all records from the tb_book table, sort them in descending order by 
id, and display only the first three records.
+select * from tb_book order by id desc limit 3;
+```
+
+- LIKE 
+
+`LIKE` can perform fuzzy queries with two wildcards: `%` and `_`. The `%` 
wildcard matches one or more characters, while the `_` wildcard matches a 
single character.
+
+```
+-- Find all books where the second character is 'h'.
+select * from tb_book where name like('_h%');
+```
+
+- LIMIT (Limit the number of result rows.)
+
+```
+-- Display 3 records in descending order.
+select * from tb_book order by price desc limit 3;
+
+Display 4 records starting from id=1
+select * from tb_book where id limit 1,4;
+```
+
+- CONCAT (Concatenate multiple columns
+
+```
+--Concatenate 'name' and 'price' into a new string for output.
+select id,concat(name,":",price) as info,type from tb_book;
+```
+
+- Functions and expressions
+
+```
+--Calculate the total price of each category of books in the tb_book table.
+select sum(price) as total,type from tb_book group by type;
+--20% off the price
+select *,(price * 0.8) as "20% off" from tb_book;
+```
+
+- UNION 
+
+```
+SELECT a FROM t1 WHERE a = 10 AND B = 1 ORDER by a LIMIT 10
+UNION
+SELECT a FROM t2 WHERE a = 11 AND B = 2 ORDER by a LIMIT 10;
+```
+
+- WITH clause
+
+```
+WITH cte AS
+(
+  SELECT 1 AS col1, 2 AS col2
+  UNION ALL
+  SELECT 3, 4
+)
+SELECT col1, col2 FROM cte;
+```
+
+- JOIN 
+
+```
+SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
+                 ON (t2.a = t1.a AND t3.b = t1.b AND t4.c = t1.c)
+```
+
+the same as
+
+```
+SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
+                 ON (t2.a = t1.a AND t3.b = t1.b AND t4.c = t1.c)
+```
+
+- INNER JOIN
+
+```
+SELECT t1.name, t2.salary
+  FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name;
+
+SELECT t1.name, t2.salary
+  FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;
+```
+
+- LEFT JOIN
+
+```
+SELECT left_tbl.*
+  FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id
+  WHERE right_tbl.id IS NULL;
+```
+
+- RIGHT JOIN
+
+```
+mysql SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a = t2.a);
++------+------+------+------+
+| a    | b    | a    | c    |
++------+------+------+------+
+|    2 | y    |    2 | z    |
+| NULL | NULL |    3 | w    |
++------+------+------+------+
+```
+
+- TABLESAMPLE
+
+```
+--Randomly sample 1000 rows in t1 using pseudo-random method. Note that the 
actual process is to select several Tablets based on the statistical 
information of the table, and the total number of rows in the selected Tablets 
may be greater than 1000. Therefore, if you want to return exactly 1000 rows, 
you need to add a Limit clause.
+SELECT * FROM t1 TABLET(10001) TABLESAMPLE(1000 ROWS) REPEATABLE 2 limit 1000;
+```
+
+## Best practice
+
+- Additional explanation about the SELECT clause:
+  - An alias can be specified for `select_expr` using `AS alias_name`. The 
alias serves as the column name for the expression and can be used in `GROUP 
BY`, `ORDER BY`, or `HAVING` clauses.
+  - The `table_references` after `FROM` indicate one or multiple tables 
involved in the query. If multiple tables are listed, a `JOIN` operation will 
be performed. Each specified table can be assigned an alias.
+  - The selected columns after `SELECT` can be referenced in `ORDER BY` and 
`GROUP BY` clauses using column names, column aliases, or integers representing 
the column position (starting from 1).
+
+```
+SELECT college, region, seed FROM tournament
+  ORDER BY region, seed;
+
+SELECT college, region AS r, seed AS s FROM tournament
+  ORDER BY r, s;
+
+SELECT college, region, seed FROM tournament
+  ORDER BY 2, 3;
+```
+
+- If `ORDER BY` appears in a subquery and is also applied to the outer query, 
the outermost `ORDER BY` takes precedence.
+- When using `GROUP BY`, the grouped columns are automatically sorted in 
ascending order (as if an `ORDER BY` clause followed with the same columns). To 
avoid the overhead caused by the automatic sorting of `GROUP BY`, adding `ORDER 
BY NULL` can solve the issue:
+
+```
+SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
+```
+
+- When sorting columns in a `SELECT` statement using `ORDER BY` or `GROUP BY`, 
the server only sorts the values using the initial number of bytes indicated by 
the `max_sort_length` system variable.
+- The `HAVING` clause is typically applied at the end, just before the result 
set is returned to the client, and it is not optimized (whereas `LIMIT` is 
applied after `HAVING`).
+- According to the SQL standard, `HAVING` must reference columns that are 
either in the `GROUP BY` list or used in aggregate functions. However, MySQL 
extends this by allowing `HAVING` to reference columns from the `SELECT` clause 
list and columns from outer subqueries.
+- If a column referenced in `HAVING` is ambiguous, a warning will be 
generated. In the following statement, `col2` is ambiguous:
+
+```
+SELECT COUNT(col1) AS col2 FROM t GROUP BY col2 HAVING col2 = 2;
+```
+
+Do not use `HAVING` where `WHERE` should be used. `HAVING` is intended to be 
used with `GROUP BY`.
+
+The `HAVING` clause can reference aggregate functions, whereas `WHERE` cannot.
+
+```
+SELECT user, MAX(salary) FROM users
+  GROUP BY user HAVING MAX(salary)  10;
+```
+
+- The `LIMIT` clause can be used to restrict the number of rows returned by a 
`SELECT` statement. `LIMIT` can have one or two parameters, both of which must 
be non-negative integers.
+
+```
+-- Retrieve rows 6 to 15 from the result set.
+SELECT * FROM tbl LIMIT 5,10;
+-- If you want to retrieve all rows starting from a certain offset, you can 
set a very large constant as the second parameter. The following query 
retrieves all data starting from the 96th row:
+SELECT * FROM tbl LIMIT 95,18446744073709551615;
+-- If LIMIT has only one parameter, then the parameter specifies the number of 
rows that should be retrieved, and the offset is defaulted to 0, which means 
starting from the first row.
+```
+
+- `SELECT...INTO` allows the query results to be written to a file.
+- Modifiers for the `SELECT` keyword:
+  - Primarily used for removing duplicates.
+  - The `ALL` and `DISTINCT` modifiers specify whether to remove duplicate 
rows (not a specific column) from the result set.
+  - `ALL` is the default modifier, meaning all rows that meet the criteria 
will be retrieved.
+  - `DISTINCT `removes duplicate rows.
+- Key advantages of subqueries:
+  - Subqueries enable structured queries, allowing each part of a statement to 
be isolated.
+  - Some operations require complex joins and associations. Subqueries provide 
alternative methods to perform these operations.
+- Accelerating queries:
+  - Utilize Doris's partitioning and bucketing as data filtering conditions to 
reduce the data scanning range as much as possible.
+  - Make full use of Doris's prefix index fields as data filtering conditions 
to speed up query performance.
+- UNION:Using only the `union` keyword has the same effect as using `union 
distinct`. Since deduplication can be memory-intensive, using `union all` for 
queries can result in faster performance and reduced memory consumption. If 
users want to perform `order by` and `limit` operations on the returned result 
set, they should place the `union` operation within a subquery, then select 
from that subquery, and finally, place the subquery along with `order by` 
outside.
+
+```
+select * from (select age from student_01 union all select age from 
student_02) as t1 
+order by age limit 4;
+
++-------------+
+|     age     |
++-------------+
+|      18     |
+|      19     |
+|      20     |
+|      21     |
++-------------+
+4 rows in set (0.01 sec)
+```
+
+- JOIN
+  - In addition to supporting equi-join in inner join conditions, 
non-equi-join is also supported. However, for performance considerations, it is 
recommended to use equi-join.
+  - Other types of joins only support equi-join.
\ No newline at end of file
diff --git a/versioned_docs/version-2.0/query/query-data/subquery.md 
b/versioned_docs/version-2.0/query/query-data/subquery.md
index 90bd09fb22f2..a470b1575242 100644
--- a/versioned_docs/version-2.0/query/query-data/subquery.md
+++ b/versioned_docs/version-2.0/query/query-data/subquery.md
@@ -24,4 +24,23 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Coming Soon
+A subquery is a `SELECT` statement that is nested within another `SELECT` 
statement. The nested subquery is often referred to as the inner query, while 
the containing query is known as the outer query or the outer query block. The 
subquery returns data that is used as a condition by the outer query to 
determine which data needs to be retrieved. There is no limit to the number of 
nested subqueries you can create.
+
+Like any query, a subquery can return records from a table as single-column 
single-record, single-column multiple-record, or multi-column multiple-record.
+
+## Subquery example in Where clause
+
+```
+SELECT * FROM sub_query_correlated_subquery1 WHERE k1 > (SELECT AVG(k1) FROM 
sub_query_correlated_subquery3) OR k1 < 10 order by k1, k2;
+```
+
+```
+select * from sub_query_correlated_subquery1 where 
sub_query_correlated_subquery1.k1 not in (select 
sub_query_correlated_subquery3.k3 from sub_query_correlated_subquery3 where 
sub_query_correlated_subquery3.v2 = sub_query_correlated_subquery1.k2) or k1 < 
10 order by k1, k2
+```
+
+##  Subquery example in Join clause
+
+```
+select t1.* from t1 left join t2 on t1.k2 = t2.k3 and t1.k1 in ( select t3.k1 
from t3 where t1.k2 = t3.k2 ) or t1.k1 < 10 order by t1.k1, t1.k2;
+select t1.* from t1 left join t2 on t1.k2 = t2.k3 and exists ( select t3.k1 
from t3 where t1.k2 = t3.k2 ) or t1.k1 < 10 order by t1.k1, t1.k2;
+```
\ No newline at end of file
diff --git a/versioned_docs/version-2.0/query/query-data/window-function.md 
b/versioned_docs/version-2.0/query/query-data/window-function.md
index da933b403fef..6114a7dc7813 100644
--- a/versioned_docs/version-2.0/query/query-data/window-function.md
+++ b/versioned_docs/version-2.0/query/query-data/window-function.md
@@ -24,4 +24,84 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Coming Soon
+## Window function
+
+Window functions are a special type of built-in functions in databases. 
Similar to aggregate functions, window functions perform calculations on 
multiple input rows to obtain a single data value. However, the difference lies 
in the fact that window functions process the input data within a specific 
window, rather than grouping and calculating based on the `GROUP BY` clause. 
The data within each window can be sorted and grouped using the `OVER()` 
clause. Window functions calculate a separ [...]
+
+The syntax of window functions as follows:
+
+```
+function(args) OVER(partition_by_clause order_by_clause [window_clause])    
+partition_by_clause ::= PARTITION BY expr [, expr ...]    
+order_by_clause ::= ORDER BY expr [ASC | DESC] [, expr [ASC | DESC] ...]
+```
+
+#### Function
+
+The currently supported functions include `AVG(), COUNT(), DENSE_RANK(), 
FIRST_VALUE(), LAG(), LAST_VALUE(), LEAD(), MAX(), MIN(), RANK(), ROW_NUMBER(), 
SUM().`
+
+#### PARTITION BY clause
+
+The `Partition By` clause is similar to `Group By`. It groups input rows based 
on the specified one or more columns, where rows with the same values are 
placed in the same group.
+
+#### ORDER BY clause
+
+The `Order By` clause within a window function behaves similarly to the 
outer-level `Order By`. It defines the arrangement of input rows, and when 
`Partition By` is specified, the `Order By` determines the order within each 
partition. The only difference from the outer `Order By` is that within the 
`OVER` clause, using `Order By n` (where n is a positive integer) effectively 
does nothing, whereas in the outer context, `Order By n` signifies sorting 
based on the nth column.
+
+This example demonstrates adding an `id` column to the select list, where its 
values are 1, 2, 3, and so on, sorted according to the `date_and_time` column 
in the `events` table.
+
+```
+SELECT   
+row_number() OVER (ORDER BY date_and_time) AS id,   
+c1, c2, c3, c4   
+FROM events;
+```
+
+#### Window clause
+
+The Window clause is used to specify a computational range for window 
functions. It considers the current row and a specified number of rows before 
and after it as the target for the window function's operation. The methods 
supported by the Window clause include: AVG(), COUNT(), FIRST_VALUE(), 
LAST_VALUE(), and SUM(). For MAX() and MIN(), the Window clause can specify a 
starting range of UNBOUNDED PRECEDING.
+
+```
+ROWS BETWEEN [ { m | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | 
{ UNBOUNDED | n } FOLLOWING] ]
+```
+
+#### Example 
+
+Taking the following stock data as an example, the stock code is JDR, and the 
"closing price" refers to the daily closing quotation.
+
+```
+create table stock_ticker (stock_symbol string, closing_price decimal(8,2), 
closing_date timestamp);    
+...load some data...    
+select * from stock_ticker order by stock_symbol, closing_date
+ | stock_symbol | closing_price | closing_date        |
+ |--------------|---------------|---------------------|
+ | JDR          | 12.86         | 2014-10-02 00:00:00 |
+ | JDR          | 12.89         | 2014-10-03 00:00:00 |
+ | JDR          | 12.94         | 2014-10-04 00:00:00 |
+ | JDR          | 12.55         | 2014-10-05 00:00:00 |
+ | JDR          | 14.03         | 2014-10-06 00:00:00 |
+ | JDR          | 14.75         | 2014-10-07 00:00:00 |
+ | JDR          | 13.98         | 2014-10-08 00:00:00 |
+```
+
+This query utilizes a window function to generate the `moving_average `column, 
which calculates the average stock price over a three-day span, specifically 
the previous day, the current day, and the next day. Since there is no previous 
day's data for the first day and no next day's data for the last day, the 
average is only calculated based on two days for those rows. In this case, the 
`Partition By` clause is not relevant because all the data pertains to the 
stock JDR. However, if there [...]
+
+```
+select stock_symbol, closing_date, closing_price,    
+avg(closing_price) over (partition by stock_symbol order by closing_date    
+rows between 1 preceding and 1 following) as moving_average    
+from stock_ticker;
+ | stock_symbol | closing_date        | closing_price | moving_average |
+ |--------------|---------------------|---------------|----------------|
+ | JDR          | 2014-10-02 00:00:00 | 12.86         | 12.87          |
+ | JDR          | 2014-10-03 00:00:00 | 12.89         | 12.89          |
+ | JDR          | 2014-10-04 00:00:00 | 12.94         | 12.79          |
+ | JDR          | 2014-10-05 00:00:00 | 12.55         | 13.17          |
+ | JDR          | 2014-10-06 00:00:00 | 14.03         | 13.77          |
+ | JDR          | 2014-10-07 00:00:00 | 14.75         | 14.25          |
+ | JDR          | 2014-10-08 00:00:00 | 13.98         | 14.36          |
+```
+
+#### See more
+
+For more window functions, refer to [Window 
Functions](../../sql-manual/sql-functions/window-functions/WINDOW-FUNCTION).
\ No newline at end of file
diff --git 
a/versioned_docs/version-2.0/query/view-materialized-view/create-view.md 
b/versioned_docs/version-2.0/query/view-materialized-view/create-view.md
index 2da5b625b142..b5783ebd43a0 100644
--- a/versioned_docs/version-2.0/query/view-materialized-view/create-view.md
+++ b/versioned_docs/version-2.0/query/view-materialized-view/create-view.md
@@ -24,4 +24,59 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Coming Soon
\ No newline at end of file
+## View
+
+Views (logical views) are stored queries that encapsulate one or multiple 
SELECT statements. Views dynamically access and compute database data when 
executed. Views are read-only and can reference any combination of tables and 
other views.
+
+Views can be used for the following purposes:
+
+- To simplify access or provide secure access by hiding complex SELECT 
statements from users. For example, you can create a view that displays only 
the data users need from various tables while hiding sensitive data in those 
tables.
+- To encapsulate details of table structures that may change over time behind 
a consistent user interface.
+
+Unlike materialized views, logical views are not materialized, which means 
they do not store data on disk. Therefore, they have the following limitations:
+
+- When the underlying table data changes, Doris does not need to refresh the 
view data. However, accessing and computing data through views can incur some 
overhead.
+- Views do not support insert, delete, or update operations.
+
+## Creating View
+
+The syntax for creating a logical view is as follows:
+
+```
+CREATE VIEW [IF NOT EXISTS]
+ [db_name.]view_name
+ (column1[ COMMENT "col comment"][, column2, ...])
+AS query_stmt
+```
+
+Explanation:
+
+- Views are logical and have no physical storage. All queries on views are 
equivalent to queries on the corresponding subquery of the view.
+- query_stmt is any supported SQL statement.
+
+Example:
+
+- Creating a view named `example_view` in the `example_db` database:
+
+```
+CREATE VIEW example_db.example_view (k1, k2, k3, v1)
+AS
+SELECT c1 as k1, k2, k3, SUM(v1) FROM example_table
+WHERE k1 = 20160112 GROUP BY k1,k2,k3;
+```
+
+- Creating a view with comments:
+
+```
+CREATE VIEW example_db.example_view
+(
+    k1 COMMENT "first key",
+    k2 COMMENT "second key",
+    k3 COMMENT "third key",
+    v1 COMMENT "first value"
+)
+COMMENT "my first view"
+AS
+SELECT c1 as k1, k2, k3, SUM(v1) FROM example_table
+WHERE k1 = 20160112 GROUP BY k1,k2,k3;
+```
\ No newline at end of file
diff --git a/versioned_docs/version-2.0/table-design/index/bitmap-index.md 
b/versioned_docs/version-2.0/table-design/index/bitmap-index.md
index a8057a0b19d5..97385c429d2d 100644
--- a/versioned_docs/version-2.0/table-design/index/bitmap-index.md
+++ b/versioned_docs/version-2.0/table-design/index/bitmap-index.md
@@ -24,4 +24,61 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Coming Soon
\ No newline at end of file
+# Bitmap index
+
+Bitmap Index is an index represented by bitmaps, where a bitmap is created for 
each key value in a column. Compared to other indexes, it occupies very little 
storage space and is very fast to create and use. However, it has a 
disadvantage of having a large lock granularity for modification operations, 
making it unsuitable for frequent updates.
+
+![bitmap index](/images/Bitmap-index.png)
+
+### Applicable scenarios
+
+- Columns with high value repetition, recommended to be between 100 and 
100,000, such as occupation, city, etc. If the repetition is too high, there is 
no significant advantage compared to other types of indexes; if the repetition 
is too low, space efficiency and performance will be greatly reduced.
+- Specific types of queries such as `count`, `or`, `and` logical operations 
that only require bitwise operations. For example: querying with multiple 
conditions combined, `select count(*) from table where city = 'Nanjing' and job 
= 'Doctor' and phonetype = 'iphone' and gender = 'Male'.` If bitmap indexes are 
established on each query condition column, the database can perform efficient 
bit operations, accurately locate the required data, reduce disk IO, and the 
smaller the filtered resul [...]
+- Suitable for ad-hoc queries, multi-dimensional analysis, and other 
analytical scenarios. If a table has 100 columns and users use 20 of them as 
query conditions (arbitrarily using several columns from these 20 columns), 
creating 20 bitmap indexes on these columns will allow all queries to utilize 
the indexes.
+
+### Inapplicable scenarios
+
+- Columns with low value repetition, such as ID cards, phone numbers, etc.
+- Columns with excessively high repetition, such as gender. Although bitmap 
indexes can be established, it is not recommended to use them as query 
conditions alone. It is recommended to filter them together with other 
conditions.
+- Columns that often need to be updated.
+
+### Creating bitmap index
+
+Creating a bitmap index named index_name on the column siteid in the table 
named table_name:
+
+```
+CREATE INDEX [IF NOT EXISTS] index_name ON table1 (siteid) USING BITMAP;
+```
+
+## Viewing bitmap index
+
+Displaying indexes under the specified table_name:
+
+```
+SHOW INDEX FROM table_name;
+```
+
+### Deleting index
+
+Deleting the index named index_name under the specified table_name:
+
+```
+DROP INDEX [IF EXISTS] index_name ON table_name;
+```
+
+### Notes:
+
+- Bitmap indexes are only created on single columns.
+- Bitmap indexes can be applied to all columns in Duplicate and Uniq data 
models and key columns in the Aggregate model.
+- The data types supported by bitmap indexes are as follows: 
+  - `TINYINT`
+  - `SMALLINT`
+  - `INT`
+  - `BIGINT`
+  - `CHAR`
+  - `VARCHAR`
+  - `DATE`
+  - `DATETIME`
+  - `LARGEINT`
+  - `DECIMAL`
+  - `BOOL`
\ No newline at end of file
diff --git a/versioned_docs/version-2.0/table-design/index/prefix-index.md 
b/versioned_docs/version-2.0/table-design/index/prefix-index.md
index c21f5a1fce78..2eb2b943538b 100644
--- a/versioned_docs/version-2.0/table-design/index/prefix-index.md
+++ b/versioned_docs/version-2.0/table-design/index/prefix-index.md
@@ -24,5 +24,42 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+## Sort key and prefix index
 
-Coming Soon
\ No newline at end of file
+Doris stores data in a data structure similar to SSTable (Sorted String 
Table), which is an ordered data structure that can sort and store data 
according to specified columns. Performing queries based on sorted columns in 
this data structure is highly efficient.
+
+In the three data models of Aggregate, Unique, and Duplicate, the underlying 
data storage is sorted and stored according to the columns specified in the 
AGGREGATE KEY, UNIQUE KEY, and DUPLICATE KEY of their respective table creation 
statements. These keys are referred to as Sort Keys. With the help of Sort 
Keys, Doris can quickly find the data to be processed by specifying conditions 
on the sorted columns during queries, reducing the complexity of searching and 
thus accelerating the quer [...]
+
+Based on the Sort Keys, Prefix Indexes are introduced. A Prefix Index is a 
sparse index. In the table, a logical data block is formed according to the 
corresponding number of rows. Each logical data block stores an index entry in 
the Prefix Index table. The length of the index entry does not exceed 36 bytes, 
and its content is the prefix composed of the sorted columns of the first row 
of data in the data block. When searching the Prefix Index table, it can help 
determine the starting row [...]
+
+:::tip
+The first 36 bytes of a row of data in a data block serve as the prefix index 
for that row. When encountering a VARCHAR type, the prefix index will be 
truncated directly. If the first column is of the VARCHAR type, truncation will 
occur even if the length does not reach 36 bytes.
+:::
+
+## Example
+
+- If the sort keys of the table are as follows: 5 columns, then the prefix 
index would be: user_id (8 Bytes), age (4 Bytes), message (prefix 20 Bytes).
+
+ColumnNameTypeuser_idBIGINTageINTmessageVARCHAR(100)max_dwell_timeDATETIMEmin_dwell_timeDATETIME
+
+- If the sort keys of the table consist of 5 columns and the first column is 
`user_name` of the VARCHAR type, then the prefix index would be `user_name` 
(truncated to 20 Bytes). Even though the total size of the prefix index has not 
reached 36 bytes, truncation occurs because it encounters a VARCHAR column, and 
no further columns are included.
+
+ColumnNameTypeuser_nameVARCHAR(20)ageINTmessageVARCHAR(100)max_dwell_timeDATETIMEmin_dwell_timeDATETIME
+
+- When our query conditions match the prefix index, it can greatly accelerate 
the query speed. For example, in the first case, executing the following query:
+
+```
+SELECT * FROM table WHERE user_id=1829239 and age=20；
+```
+
+The efficiency of that query would be much higher than the following query:
+
+```
+SELECT * FROM table WHERE age=20；
+```
+
+Therefore, when creating a table, selecting the correct order of columns can 
greatly enhance query efficiency.
+
+## Multiple prefix indexes
+
+Due to the specified column order during table creation, a table typically has 
only one type of prefix index. This may not meet the efficiency requirements 
for queries that use other columns as conditions, which do not hit the prefix 
index. In such cases, multiple prefix indexes can be indirectly implemented by 
creating corresponding strongly consistent materialized views of the single 
table with adjusted column orders. For more details, please refer to  
[Materialized Views](../../query/ [...]
\ No newline at end of file


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(doris-website) branch master updated: [docs]Update En docs (#440)

Reply via email to