This is an automated email from the ASF dual-hosted git repository.

kxiao pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-2.0 by this push:
     new cb3b29ddcce [docs & fix](stats) Fix tablesample init failed and some outdated contents in docs #25603 (#26250)
cb3b29ddcce is described below

commit cb3b29ddcce1ca1cc4ca652abbbee08309cb7404
Author: AKIRA <33112463+kikyou1...@users.noreply.github.com>
AuthorDate: Fri Nov 3 17:46:58 2023 +0800

    [docs & fix](stats) Fix tablesample init failed and some outdated contents in docs #25603 (#26250)
---
 docs/en/docs/query-acceleration/statistics.md      | 846 +++------------------
 docs/zh-CN/docs/query-acceleration/statistics.md   | 650 ++--------------
 .../apache/doris/statistics/OlapAnalysisTask.java  |   4 +-
 3 files changed, 209 insertions(+), 1291 deletions(-)

diff --git a/docs/en/docs/query-acceleration/statistics.md b/docs/en/docs/query-acceleration/statistics.md
index 28795d01deb..069c25fb1a8 100644
--- a/docs/en/docs/query-acceleration/statistics.md
+++ b/docs/en/docs/query-acceleration/statistics.md
@@ -28,55 +28,27 @@ under the License.
 
 ## Introduction to statistics information
 
-In SQL databases, the quality of the query optimizer has a significant impact 
on system performance. The optimizer needs to estimate the query cost according 
to the statistics information, especially in the equal-value query scenario, it 
is very important to estimate the cardinality accurately, which can help the 
optimizer to select the optimal query plan, thereby improving the query 
performance.
+Collecting statistics helps the optimizer understand data distribution 
characteristics. When performing Cost-Based Optimization (CBO), the optimizer 
utilizes these statistics to calculate the selectivity of predicates and 
estimate the cost of each execution plan. This enables the selection of more 
efficient plans, significantly improving query performance.
 
-When executing a query, an insufficiently optimized execution plan and an 
optimized execution plan can result in a large difference in execution time, 
which can be several times greater. Therefore, it is very important for the SQL 
query optimizer to collect and analyze statistics so that the optimizer can 
accurately evaluate the cost of different execution plans and select the best 
one.
+Currently, the collected column-level information includes:
 
-The Doris query optimizer uses statistics to determine the most efficient 
execution plan for a query. Statistics maintained by Doris include table-level 
statistics and column-level statistics.
+| Information      | Description              |
+| :--------------- | :------------------------ |
+| `row_count`      | Total number of rows     |
+| `data_size`      | Total data size          |
+| `avg_size_byte`  | Average length of values |
+| `ndv`            | Number of distinct values |
+| `min`            | Minimum value            |
+| `max`            | Maximum value            |
+| `null_count`     | Number of null values    |
 
-Table Statistics:
+## Collecting Statistics
 
-| Information         | Description                                            
            |
-| :------------------ | 
:----------------------------------------------------------------- |
-| `row_count`         | Number of rows in the table                            
            |
-| `data_size`         | Table size (in bytes)                                  
            |
-| `update_rows`       | The number of rows updated after collecting statistics 
information |
-| `healthy`           | The health of the table                                
            |
-| `update_time`       | The time of the latest update                          
            |
-| `last_analyze_time` | The time when the last statistics information was 
collected        |
+### Using the ANALYZE Statement
 
-> Table Health: Indicates the health of the table statistics. When it 
`update_rows` is greater than or equal to `row_count`, the health degree is 0; 
when it `update_rows` is less than `row_count`, the health degree is `100 * (1 
- update_rows/ row_count)`.
+Doris allows users to trigger the collection and updating of statistics by submitting an `ANALYZE` statement.
 
-Column Statistics:
-
-| Information     | Description                           |
-| :-------------- | :------------------------------------ |
-| `row_count`     | Total number of rows for the column   |
-| `data_size`     | Total degree of the column in bytes   |
-| `avg_size_byte` | Average degree of the column in bytes |
-| `ndv`           | Column num distinct value             |
-| `min`           | Column Minimum                        |
-| `max`           | Column Max Value                      |
-| `null_count`    | Number of columns null                |
-| `histogram`     | Column Histogram                      |
-
-Next, we will briefly introduce the histogram and other data structures, as 
well as the collection and maintenance of statistics information in detail.
-
-## Introduction to Histograms
-
-A histogram is a tool used to describe the distribution of data. It divides 
the data into several intervals (buckets) according to the size, and uses 
simple statistics to represent the characteristics of the data in each 
interval. Is an important statistic in a database that describes the 
distribution of data in a column. The most typical application scenario of 
histogram is to help the optimizer choose the optimal execution plan by 
estimating the selectivity of query predicates.
-
-In Doris, an equi-height Histogram is built for each table-specific column. 
The histogram comprises a series of buckets, wherein the statistics of each 
bucket comprises the upper and lower bounds of the bucket, the number of 
elements contained in the bucket, the number of all elements in the previous 
bucket, and the number of different values in the bucket. For details, please 
refer to the SQL function `histogram` or `hist` its instructions.
-
-> Using the bucket method of contour histogram, the sum of numerical frequency 
in each bucket should be close to the total number of `1/N` rows. However, if 
the principle of equal height is strictly followed, some values will fall on 
the boundary of the bucket, resulting in the same value appearing in two 
different buckets. This situation can interfere with the estimation of the 
selection rate. Therefore, in the implementation, Doris modifies the bucketting 
method of the contour histogra [...]
-
-## Collect statistics
-
-### Manual collection
-
-The user triggers a manual collection job through a statement `ANALYZE` to 
collect statistics for the specified table or column based on the supplied 
parameters.
-
-Column statistics collection syntax:
+Syntax:
 
 ```SQL
 ANALYZE < TABLE | DATABASE table_name | db_name >
@@ -86,43 +58,7 @@ ANALYZE < TABLE | DATABASE table_name | db_name >
     [ PROPERTIES ("key" = "value", ...) ];
 ```
 
-Explanation:
-
-- Table_name: The target table for the specified. It can be a 
`db_name.table_name` form.
-- Column_name: The specified target column. Must be `table_name` a column that 
exists in. Multiple column names are separated by commas.
-- Sync: Synchronizes the collection of statistics. Return after collection. If 
not specified, it will be executed asynchronously and the job ID will be 
returned.
-- Incremental: Incrementally gather statistics. Incremental collection of 
histogram statistics is not supported.
-- Period: Collect statistics periodically. The unit is seconds, and when 
specified, the appropriate statistics are collected periodically.
-- Sample percent | rows: Sample collection statistics. You can specify a 
sampling ratio or the number of rows to sample.
-- Buckets: Specifies the maximum number of buckets generated when collecting 
histogram statistics. The default is 128 when not specified.
-- Properties: used to configure statistics job. Currently, only the following 
configuration items are supported
-  - `"sync" = "true"`: Equivalent `with sync`
-  - `"incremental" = "true"`: Equivalent `with incremental`
-  - `"sample.percent" = "50"`: Equivalent `with percent 50`
-  - `"sample.rows" = "1000"`: Equivalent `with rows 1000`
-  - `"num.buckets" = "10"`: Equivalent `with buckets 10`
-  - `"period.seconds" = "300"`: Equivalent `with period 300`
-
-Next, we will use a table `stats_test.example_tbl` as an example to explain 
how to collect statistics. `stats_test.example_tbl` The structure is as follows:
-
-| Column Name     | Type        | AggregationType | Comment                 |
-| --------------- | ----------- | --------------- | ----------------------- |
-| user_id         | LARGEINT    |                 | User ID                 |
-| imp_date        | DATEV2      |                 | Data import date        |
-| city            | VARCHAR(20) |                 | User city               |
-| age             | SMALLINT    |                 | User age                |
-| sex             | TINYINT     |                 | User gender             |
-| last_visit_date | DATETIME    | REPLACE         | User last visit time    |
-| cost            | BIGINT      | SUM             | User total cost         |
-| max_dwell_time  | INT         | MAX             | User maximum dwell time |
-| min_dwell_time  | INT         | MIN             | User minimum dwell time |
-
-Connect Doris:
-
-````Bash
-mysql -uroot -P9030 -h192.168.xxx.xxx```
-
-Create a data table:
+Where:
 
 - `table_name`: Specifies the target table. It can be in the 
`db_name.table_name` format.
- `partition_name`: The specified target partitions (for Hive external tables only). They must be partitions that exist in `table_name`; multiple partition names are separated by commas. For example, for a single-level partition: PARTITIONS(`event_date=20230706`); for a multi-level partition: PARTITIONS(`nation=US/city=Washington`). PARTITIONS(*) specifies all partitions, and PARTITIONS WITH RECENT 30 specifies the latest 30 partitions.
@@ -131,384 +67,53 @@ Create a data table:
 - `sample percent | rows`: Collect statistics using sampling. You can specify 
either the sampling percentage or the number of sampled rows.
- `sql`: Collect statistics for external table partition columns by running SQL. By default, partition column statistics are derived from metadata, which is faster but may be inaccurate for row count and data size; using SQL collects accurate statistics.
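+
+For example (database, table, and column names are illustrative, matching the sample table used elsewhere in these docs):
+
+```sql
+-- collect statistics on three columns asynchronously, sampling 50% of the rows
+ANALYZE TABLE stats_test.example_tbl(city, age, sex) WITH SAMPLE PERCENT 50;
+-- collect statistics for the whole table and wait for completion
+ANALYZE TABLE stats_test.example_tbl WITH SYNC;
+```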
 
-mysql> CREATE TABLE IF NOT EXISTS stats_test.example_tbl (
-        `user_id` LARGEINT NOT NULL,        `date` DATEV2 NOT NULL,        
`city` VARCHAR(20),        `age` SMALLINT,        `sex` TINYINT,        
`last_visit_date` DATETIME REPLACE,        `cost` BIGINT SUM,        
`max_dwell_time` INT MAX,        `min_dwell_time` INT MIN    ) ENGINE=OLAP    
AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`)    PARTITION BY 
LIST(`date`)    (        PARTITION `p_201701` VALUES IN ("2017-10-01"),        
PARTITION `p_201702` VALUES IN ("2017-10-02"),   [...]
-````
-
-Import data:
-
-```SQL
-mysql> INSERT INTO stats_test.example_tbl (`user_id`, `date`, `city`, `age`,
-                                    `sex`, `last_visit_date`, `cost`,          
                          `max_dwell_time`, `min_dwell_time`)    VALUES (10000, 
"2017-10-01", "Beijing", 20, 0, "2017-10-01 07:00:00", 15, 2, 2),        
(10000, "2017-10-01", "Beijing", 20, 0, "2017-10-01 06:00:00", 20, 10, 10),     
   (10001, "2017-10-01", "Beijing", 30, 1, "2017-10-01 17:05:45", 2, 22, 22),   
     (10002, "2017-10-02", "Shanghai", 20, 1, "2017-10-02 12:59:12", 200, 5, 
5),        (10003, "201 [...]
-```
-
-To view data results:
-
-```SQL
-mysql> SELECT * FROM stats_test.example_tbl;
-+---------+------------+-----------+------+------+---------------------+------+----------------+----------------+
-| user_id | date       | city      | age  | sex  | last_visit_date     | cost 
| max_dwell_time | min_dwell_time |
-+---------+------------+-----------+------+------+---------------------+------+----------------+----------------+
-| 10004   | 2017-10-03 | Shenzhen  |   35 |    0 | 2017-10-03 10:20:22 |   11 
|              6 |              6 |
-| 10000   | 2017-10-01 | Beijing   |   20 |    0 | 2017-10-01 06:00:00 |   35 
|             10 |              2 |
-| 10001   | 2017-10-01 | Beijing   |   30 |    1 | 2017-10-01 17:05:45 |    2 
|             22 |             22 |
-| 10004   | 2017-10-01 | Shenzhen  |   35 |    0 | 2017-10-01 10:00:15 |  100 
|              3 |              3 |
-| 10002   | 2017-10-02 | Shanghai  |   20 |    1 | 2017-10-02 12:59:12 |  200 
|              5 |              5 |
-| 10003   | 2017-10-02 | Guangzhou |   32 |    0 | 2017-10-02 11:20:00 |   30 
|             11 |             11 |
-+---------+------------+-----------+------+------+---------------------+------+----------------+----------------+
-```
-
-For the convenience of description, column statistics information is 
hereinafter referred to as statistics information, which stores the number of 
rows, the maximum value, the minimum value, the number of NULL values, and the 
like of a column; and a column histogram is referred to as histogram statistics 
information.
-
-#### Full collection
-
-##### Collect column statistic
-
-Column statistics mainly include the number of rows, the maximum value, the 
minimum value, and the number of NULL values of a column, which are collected 
through `ANALYZE TABLE` statements.
-
-When executing SQL statements, the optimizer will, in most cases, only use 
statistics for some of the columns (for example, `WHERE` the columns that 
appear in the, `JOIN`, `ORDER BY`, `GROUP BY` clauses). If a table has many 
columns, collecting statistics for all columns can be expensive. To reduce 
overhead, you can collect statistics for specific columns only for use by the 
optimizer.
-
-Example:
-
-- Collect `example_tbl` statistics for all columns of a table, using the 
following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl;
-+--------+
-| job_id |
-+--------+
-| 51730  |
-+--------+
-```
-
-- Collect `example_tbl` statistics for table `city` `age` `sex` columns, using 
the following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl(city, age, sex);
-+--------+
-| job_id |
-+--------+
-| 51808  |
-+--------+
-```
-
-##### Collect histogram information
-
-Column histogram information is used to describe the distribution of columns. 
It divides the data into several intervals (buckets) according to the size, and 
uses simple statistics to represent the characteristics of the data in each 
interval. Collected by `ANALYZE TABLE` statement fit `UPDATE HISTOGRAM`.
-
-Columns can be specified to collect their histogram information in the same 
way that normal statistics are collected. Collecting histogram information 
takes longer than normal statistics, so to reduce overhead, we can just collect 
histogram information for specific columns for the optimizer to use.
-
-Example:
-
-- Collects `example_tbl` histograms for all columns of a table, using the 
following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM;
-+--------+
-| job_id |
-+--------+
-| 51838  |
-+--------+
-```
-
-- Collect `example_tbl` histograms for table `city` `age` `sex` columns, using 
the following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl(city, age, sex) UPDATE HISTOGRAM;
-+--------+
-| job_id |
-+--------+
-| 51889  |
-+--------+
-```
-
-- Collect `example_tbl` histograms for all columns of the table and set the 
maximum number of buckets, using the following syntax:
-
-```SQL
--- use with buckets
-mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH BUCKETS 2;
-+--------+
-| job_id |
-+--------+
-| 52018  |
-+--------+
-
--- configure num.buckets
-mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM 
PROPERTIES("num.buckets" = "2");
-+--------+
-| job_id |
-+--------+
-| 52069  |
-+--------+
-```
-
-#### Incremental collection
-
-For partitioned tables, incremental collection can be used to improve the 
speed of statistics collection if partitions are added or deleted after full 
collection.
-
-When using incremental collection, the system automatically checks for new or 
deleted partitions. There are three situations:
-
-- For newly added partitions, the statistics of the newly added partitions are 
collected and merged/summarized with the historical statistics.
-- Refresh historical statistics for deleted partitions.
-- No new/deleted partition. Do not do anything.
-
-Incremental collection is appropriate for tables with monotonic non-decreasing 
columns such as time columns as partitions, or tables where historical 
partition data is not updated.
-
-Notice:
-
-- Histogram statistics do not support incremental collection.
-- When using incremental collection, you must ensure that the statistics 
information of table inventory is available (that is, other historical 
partition data does not change). Otherwise, the statistics information will be 
inaccurate.
-
-Example:
-
-- Incrementally collect `example_tbl` statistics for a table, using the 
following syntax:
-
-```SQL
--- use with incremental
-mysql> ANALYZE TABLE stats_test.example_tbl WITH INCREMENTAL;
-+--------+
-| job_id |
-+--------+
-| 51910  |
-+--------+
-
--- configure incremental
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("incremental" = "true");
-+--------+
-| job_id |
-+--------+
-| 51910  |
-+--------+
-```
-
-- Incrementally collect `example_tbl` statistics for table `city` `age` `sex` 
columns, using the following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl(city, age, sex) WITH INCREMENTAL;
-+--------+
-| job_id |
-+--------+
-| 51988  |
-+--------+
-```
-
-#### Sampling collection
-
-When the amount of table data is large, the system may take time to collect 
statistics. You can use sampling collection to speed up the collection of 
statistics. Specify the proportion of sampling or the number of rows to be 
sampled according to the actual situation.
-
-Example:
-
-- Sampling collects `example_tbl` statistics from a table, using the following 
syntax:
+### Automatic Statistics Collection
 
-```SQL
--- use with sample rows
-mysql> ANALYZE TABLE stats_test.example_tbl WITH SAMPLE ROWS 5;
-+--------+
-| job_id |
-+--------+
-| 52120  |
-+--------+
-
--- use with sample percent
-mysql> ANALYZE TABLE stats_test.example_tbl WITH SAMPLE PERCENT 50;
-+--------+
-| job_id |
-+--------+
-| 52201  |
-+--------+
-
--- configure sample.row
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("sample.rows" = "5");
-+--------+
-| job_id |
-+--------+
-| 52279  |
-+--------+
-
--- configure sample.percent
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("sample.percent" = 
"50");
-+--------+
-| job_id |
-+--------+
-| 52282  |
-+--------+
-```
-
-- Samples collect `example_tbl` histogram information for a table, similar to 
normal statistics, using the following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH SAMPLE ROWS 
5;
-+--------+
-| job_id |
-+--------+
-| 52357  |
-+--------+
-```
-
-#### Synchronous collection
-
-Generally, after executing `ANALYZE` the statement, the system will start an 
asynchronous job to collect statistics and return the statistics job ID 
immediately. If you want to wait for the statistics collection to finish and 
return, you can use synchronous collection.
-
-Example:
-
-- Sampling collects `example_tbl` statistics from a table, using the following 
syntax:
-
-```SQL
--- use with sync
-mysql> ANALYZE TABLE stats_test.example_tbl WITH SYNC;
-
--- configure sync
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("sync" = "true");
-```
-
-- Samples collect `example_tbl` histogram information for a table, similar to 
normal statistics, using the following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH SYNC;
-```
-
-### Automatic collection
+Users can enable this feature by setting `enable_full_auto_analyze = true` (set globally with `SET GLOBAL`, per the Configuration Options section). Once enabled, statistics on qualifying tables and columns are collected automatically within a specified time window. Users can set this window through the `full_auto_analyze_start_time` (default 00:00:00) and `full_auto_analyze_end_time` (default 02:00:00) parameters.
 
-Automatic collection means that the system will automatically generate a job 
to collect statistics when the user specifies `PERIOD` `AUTO` keywords or 
performs related configuration when executing `ANALYZE` a statement.
+This feature collects statistics only for tables and columns that have no statistics or whose statistics are outdated. Doris considers a table's statistics outdated once more than 20% of its data has been updated since the last collection; this threshold is (100 - `table_stats_health_threshold`)%, and the parameter defaults to 80. For example, with the default setting, a table of 1,000,000 rows is considered outdated after more than 200,000 rows have changed.
 
-#### Periodic collection
+For tables with a large amount of data (larger than 5GiB by default), Doris automatically uses sampling to collect statistics, reducing the impact on the system and completing the collection job as quickly as possible. Users can adjust this threshold by setting the `huge_table_lower_bound_size_in_bytes` FE parameter. If you want to collect statistics for all tables in full, you can set the `enable_auto_sample` FE parameter to false. For tables with data size greater than `huge_table_lower_bound_s [...]
 
-Periodic collection means that the corresponding statistics of a table are 
re-collected at a certain time interval.
+The default sample size for automatic sampling is 4194304 (2^22) rows, but the actual sample size may be larger for implementation reasons. If you want to sample more rows to obtain more accurate data distribution information, configure the `huge_table_default_sample_rows` FE parameter.
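+
+A minimal sketch of enabling automatic collection and shifting its window away from peak hours (times are illustrative; per the note under Configuration Options, these session variables must be set globally):
+
+```sql
+SET GLOBAL enable_full_auto_analyze = true;
+-- collect between 01:00 and 05:00 instead of the default 00:00-02:00
+SET GLOBAL full_auto_analyze_start_time = "01:00:00";
+SET GLOBAL full_auto_analyze_end_time = "05:00:00";
+```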
 
-Example:
+### Task Management
 
-- Collect `example_tbl` statistics for a table periodically (every other day), 
using the following syntax:
+#### Viewing Analyze Tasks
 
-```SQL
--- use with period
-mysql> ANALYZE TABLE stats_test.example_tbl WITH PERIOD 86400;
-+--------+
-| job_id |
-+--------+
-| 52409  |
-+--------+
-
--- configure period.seconds
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("period.seconds" = 
"86400");
-+--------+
-| job_id |
-+--------+
-| 52535  |
-+--------+
-```
+You can use `SHOW ANALYZE` to view information about statistics collection 
tasks.
 
-- Collects `example_tbl` histogram information for a table periodically (every 
other day), similar to normal statistics, using the following syntax:
+Syntax:
 
 ```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH PERIOD 86400;
-+--------+
-| job_id |
-+--------+
-| 52684  |
-+--------+
-```
-
-#### Automatic collection
-
-Statistics can be "invalidated" when tables are changed, which can cause the 
optimizer to select the wrong execution plan.
-
-Table statistics may become invalid due to the following causes:
-
-- New field: The new field has no statistics
-- Field change: Original statistics are unavailable
-- Added zone: The new zone has no statistics
-- Zone change: The original statistics are invalid
-- data changes (insert data delete data | | change data) : the statistical 
information is error
-
-The main operations involved include:
-
-- update: updates the data
-- delete: deletes data
-- drop: deletes a partition
-- load: import data and add partitions
-- insert: inserts data and adds partitions
-- alter: Field change, partition change, or new partition
-
-Database, table, partition, field deletion, internal will automatically clear 
these invalid statistics. Adjusting the column order and changing the column 
type do not affect.
-
-The system determines whether to collect statistics again based on the health 
of the table (as defined above). By setting the health threshold, the system 
collects statistics about the table again when the health is lower than a 
certain value. To put it simply, if statistics are collected on a table and the 
data of a partition becomes more or less, or a partition is added or deleted, 
the statistics may be automatically collected. After the statistics are 
collected again, the statistics a [...]
-
-Currently, only tables that are configured by the user to automatically 
collect statistics will be collected, and statistics will not be automatically 
collected for other tables.
-
-Example:
-
-- Automatically analysis statistics for the 'example_tbl' table using the 
following syntax:
-
-```SQL
--- use with auto
-mysql> ANALYZE TABLE stats_test.example_tbl WITH AUTO;
-+--------+
-| job_id |
-+--------+
-| 52539  |
-+--------+
-
--- configure automatic
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("automatic" = "true");
-+--------+
-| job_id |
-+--------+
-| 52565  |
-+--------+
-```
-
-### Manage job
-
-#### View statistics job
-
-Collect information for the job by `SHOW ANALYZE` viewing the statistics.
-
-The syntax is as follows:
-
-```SQL
-SHOW ANALYZE [ table_name | job_id ]
+SHOW ANALYZE < table_name | job_id >
     [ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ];
 ```
 
-Explanation:
-
-- Table_name: The table name. After it is specified, the statistics job 
information corresponding to the table can be viewed. It can be a 
`db_name.table_name` form. Return all statistics job information if not 
specified.
-- Job_ID: The statistics job ID `ANALYZE`. The value returned when the 
asynchronous collection of statistics is performed. Return all statistics job 
information if not specified.
-
-Currently `SHOW ANALYZE`, 11 columns are output, as follows:
-
-| Column Name            | Description         |
-| :--------------------- | :------------------ |
-| `job_id`               | statistics job ID   |
-| `catalog_name`         | Catalog name        |
-| `db_name`              | Database name       |
-| `tbl_name`             | Variable name       |
-| `col_name`             | Column name         |
-| `job_type`             | job type            |
-| `analysis_type`        | statistics type     |
-| `message`              | job information     |
-| `last_exec_time_in_ms` | Last execution time |
-| `state`                | job state           |
-| `schedule_type`        | Scheduling method   |
+- `table_name`: Specifies the table for which you want to view statistics 
collection tasks. It can be in the form of `db_name.table_name`. If not 
specified, it returns information for all statistics collection tasks.
+- `job_id`: The job ID of the statistics information task returned when 
executing `ANALYZE`. If not specified, it returns information for all 
statistics collection tasks.
 
-> In the system, the statistics job contains multiple subtasks, each of which 
collects a separate column of statistics.
+Output:
 
-Example:
-
-- View statistics job information with ID `20038`, using the following syntax:
+| Column Name           | Description    |
+| :-------------------- | :------------- |
+| `job_id`              | Job ID         |
+| `catalog_name`        | Catalog Name   |
+| `db_name`             | Database Name  |
+| `tbl_name`            | Table Name     |
+| `col_name`            | Column Name    |
+| `job_type`            | Job Type       |
+| `analysis_type`       | Analysis Type  |
+| `message`             | Task Message   |
+| `last_exec_time_in_ms` | Last Execution Time |
+| `state`               | Task State     |
+| `schedule_type`       | Schedule Type  |
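+
+For example, to view finished collection tasks for a table (table name illustrative, reused from the examples formerly in this document):
+
+```sql
+SHOW ANALYZE stats_test.example_tbl WHERE STATE = "FINISHED";
+```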
 
-```SQL
-mysql> SHOW ANALYZE 20038 
-+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
-| job_id | catalog_name | db_name              | tbl_name | col_name           
   | job_type | analysis_type | message | last_exec_time_in_ms | state    | 
schedule_type |
-+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
-| 20038  | internal     | default_cluster:test | t3       | 
[col4,col2,col3,col1] | MANUAL   | FUNDAMENTALS  |         | 2023-06-01 
17:22:15  | FINISHED | ONCE          |
-+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
 
-```
+You can use `SHOW ANALYZE TASK STATUS [job_id]` to check the completion status of statistics collection for each column within a job.
 
 ```
-mysql> show analyze task status  20038 ;
+mysql> show analyze task status 20038;
 +---------+----------+---------+----------------------+----------+
 | task_id | col_name | message | last_exec_time_in_ms | state    |
 +---------+----------+---------+----------------------+----------+
@@ -517,357 +122,158 @@ mysql> show analyze task status  20038 ;
 | 20041   | col3     |         | 2023-06-01 17:22:15  | FINISHED |
 | 20042   | col1     |         | 2023-06-01 17:22:15  | FINISHED |
 +---------+----------+---------+----------------------+----------+
-
-```
-
-- View all statistics job information, and return the first 3 pieces of 
information in descending order of the last completion time, using the 
following syntax:
-
-```SQL
-mysql> SHOW ANALYZE WHERE state = "FINISHED" ORDER BY last_exec_time_in_ms 
DESC LIMIT 3;
-+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
-| job_id | catalog_name | db_name                    | tbl_name    | col_name  
      | job_type | analysis_type | message | last_exec_time_in_ms | state    | 
schedule_type |
-+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
-| 68603  | internal     | default_cluster:stats_test | example_tbl | age       
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:27  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | sex       
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:26  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | 
last_visit_date | MANUAL   | COLUMN        |         | 2023-05-05 17:53:26  | 
FINISHED | ONCE          |
-+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
 ```
 
-#### Terminate the statistics job
+#### Terminating Analyze Tasks
 
-To `KILL ANALYZE` terminate a running statistics job.
+You can terminate running statistics collection tasks using `KILL ANALYZE`.
 
-The syntax is as follows:
+Syntax:
 
 ```SQL
 KILL ANALYZE job_id;
 ```
 
-Explanation:
-
-- Job_ID: Statistics job ID. The value returned when an asynchronous 
collection of statistics is performed `ANALYZE`, which can also be obtained by 
a `SHOW ANALYZE` statement.
+- `job_id`: The job ID of the statistics information task. It is returned when 
executing `ANALYZE`, or you can obtain it using the `SHOW ANALYZE` statement.
 
 Example:
 
-- Stop the statistics job whose ID is the 52357.
+- Terminate the statistics collection task with job ID 52357:
 
 ```SQL
 mysql> KILL ANALYZE 52357;
 ```
 
-## View statistics
-
-### Table statistics
+### Viewing Statistics Information
 
-> Temporarily unavailable.
+#### Table Statistics Information
 
-To `SHOW TABLE STATS` view information such as the total number of rows in the 
table and the health of the statistics.
+You can use `SHOW TABLE STATS` to view an overview of statistics collection 
for a table.
 
-The syntax is as follows:
+Syntax:
 
 ```SQL
-SHOW TABLE STATS table_name [ PARTITION (partition_name) ];
+SHOW TABLE STATS table_name;
 ```
 
-Explanation:
-
-- Table_name: The table to which the data is imported. It can be a 
`db_name.table_name` form.
-- Partition_name: The specified target partition. Must be `table_name` a 
partition that exists in. Only one partition can be specified.
+- `table_name`: The name of the table for which you want to view statistics 
collection information. It can be in the form of `db_name.table_name`.
 
-Currently `SHOW TABLE STATS`, 6 columns are output, as follows:
+Output:
 
-| Column Name       | Description                                         |
-| :---------------- | :-------------------------------------------------- |
-| row_count         | Number of rows                                      |
-| update_rows       | Number of rows updated                              |
-| data_size         | Data size. Unit: bytes                              |
-| healthy           | Health                                              |
-| update_time       | Update time                                         |
-| last_analyze_time | Time when statistics information was last collected |
+| Column Name    | Description                                                    |
+| :------------- | :------------------------------------------------------------- |
+| `row_count`    | Number of rows (may not be the exact count at execution time)  |
+| `method`       | Collection method (FULL/SAMPLE)                                |
+| `type`         | Type of statistics data                                        |
+| `updated_time` | Last update time                                               |
+| `columns`      | Columns for which statistics were collected                    |
+| `trigger`      | Trigger method for statistics collection (Auto/User)           |
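+
+A quick usage sketch (table name illustrative):
+
+```sql
+SHOW TABLE STATS stats_test.example_tbl;
+```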
 
-Example:
 
-- To view `example_tbl` statistics for a table, use the following syntax:
+#### Viewing Column Statistics Information
 
-```SQL
-mysql> SHOW TABLE STATS stats_test.example_tbl;
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-| row_count | update_rows | healthy | data_size | update_time         | 
last_analyze_time   |
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-| 8         | 0           | 100     | 6999      | 2023-04-08 15:40:47 | 
2023-04-08 17:43:28 |
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-```
+You can use `SHOW COLUMN [cached] STATS` to view information about the number 
of distinct values and NULLs in columns.
 
-- To view `example_tbl` statistics for a table `p_201701` partition, use the 
following syntax:
+Syntax:
 
 ```SQL
-mysql> SHOW TABLE STATS stats_test.example_tbl PARTITION (p_201701);
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-| row_count | update_rows | healthy | data_size | update_time         | 
last_analyze_time   |
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-| 4         | 0           | 100     | 2805      | 2023-04-08 11:48:02 | 
2023-04-08 17:43:27 |
-+-----------+-------------+---------+-----------+---------------------+---------------------+
+SHOW COLUMN [cached] STATS table_name [ (column_name [, ...]) ];
 ```
 
-### View Column Statistics
+- `cached`: Displays statistics information from the current FE memory cache.
+- `table_name`: The name of the table for which you want to view column 
statistics information. It can be in the form of `db_name.table_name`.
+- `column_name`: The specific column(s) you want to view statistics for. It 
must be a column that exists in `table_name`, and multiple column names can be 
separated by commas.
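+
+For example, to view statistics for selected columns (names illustrative, reusing the example table referenced earlier):
+
+```sql
+SHOW COLUMN STATS stats_test.example_tbl(city, age, sex);
+-- or read from the FE memory cache
+SHOW COLUMN CACHED STATS stats_test.example_tbl(city, age, sex);
+```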
 
-`SHOW COLUMN STATS` To view information such as the number of different values 
and `NULL` the number of columns.
+#### Modifying Statistics Information
 
-The syntax is as follows:
+Users can adjust statistics information using the `ALTER` statement.
 
 ```SQL
-SHOW COLUMN [cached] STATS table_name [ (column_name [, ...]) ] [ PARTITION 
(partition_name) ];
-```
-
-Explanation:
-
-- cached: Cached means to show statistics in current FE memory cache.
-- Table_name: The target table for collecting statistics. It can be a 
`db_name.table_name` form.
-- Column_name: Specified destination column. `table_name` Must be a column 
that exists in. Multiple column names are separated by commas.
-- Partition_name: The specified target partition `table_name` must exist in. 
Only one partition can be specified.
-
-Currently `SHOW COLUMN STATS`, 10 columns are output, as follows:
-
-| Column Name     | Explain                               |
-| :-------------- | :------------------------------------ |
-| `column_name`   | Column name                           |
-| `count`         | Total number of rows for the column   |
-| `ndv`           | Number of distinct values             |
-| `num_null`      | The number of null values             |
-| `data_size`     | Total degree of the column in bytes   |
-| `avg_size_byte` | Average degree of the column in bytes |
-| `min`           | Column Minimum                        |
-| `max`           | Column Max Value                      |
-
-Example:
-
-- To view `example_tbl` statistics for all columns of a table, use the 
following syntax:
-
-```SQL
-mysql> SHOW COLUMN STATS stats_test.example_tbl;
-+-----------------+-------+------+----------+-------------------+-------------------+-----------------------+-----------------------+
-| column_name     | count | ndv  | num_null | data_size         | 
avg_size_byte     | min                   | max                   |
-+-----------------+-------+------+----------+-------------------+-------------------+-----------------------+-----------------------+
-| date            | 6.0   | 3.0  | 0.0      | 28.0              | 4.0          
     | '2017-10-01'          | '2017-10-03'          |
-| cost            | 6.0   | 6.0  | 0.0      | 56.0              | 8.0          
     | 2                     | 200                   |
-| min_dwell_time  | 6.0   | 6.0  | 0.0      | 28.0              | 4.0          
     | 2                     | 22                    |
-| city            | 6.0   | 4.0  | 0.0      | 54.0              | 7.0          
     | 'Beijing'             | 'Shenzhen'            |
-| user_id         | 6.0   | 5.0  | 0.0      | 112.0             | 16.0         
     | 10000                 | 10004                 |
-| sex             | 6.0   | 2.0  | 0.0      | 7.0               | 1.0          
     | 0                     | 1                     |
-| max_dwell_time  | 6.0   | 6.0  | 0.0      | 28.0              | 4.0          
     | 3                     | 22                    |
-| last_visit_date | 6.0   | 6.0  | 0.0      | 112.0             | 16.0         
     | '2017-10-01 06:00:00' | '2017-10-03 10:20:22' |
-| age             | 6.0   | 4.0  | 0.0      | 14.0              | 2.0          
     | 20                    | 35                    |
-+-----------------+-------+------+----------+-------------------+-------------------+-----------------------+-----------------------+
-```
-
-- To view `example_tbl` statistics for a table `p_201701` partition, use the 
following syntax:
-
-```SQL
-mysql> SHOW COLUMN STATS stats_test.example_tbl PARTITION (p_201701);
-+-----------------+-------+------+----------+--------------------+-------------------+-----------------------+-----------------------+
-| column_name     | count | ndv  | num_null | data_size          | 
avg_size_byte     | min                   | max                   |
-+-----------------+-------+------+----------+--------------------+-------------------+-----------------------+-----------------------+
-| date            | 3.0   | 1.0  | 0.0      | 16.0               | 4.0         
      | '2017-10-01'          | '2017-10-01'          |
-| cost            | 3.0   | 3.0  | 0.0      | 32.0               | 8.0         
      | 2                     | 100                   |
-| min_dwell_time  | 3.0   | 3.0  | 0.0      | 16.0               | 4.0         
      | 2                     | 22                    |
-| city            | 3.0   | 2.0  | 0.0      | 29.0               | 7.0         
      | 'Beijing'             | 'Shenzhen'            |
-| user_id         | 3.0   | 3.0  | 0.0      | 64.0               | 16.0        
      | 10000                 | 10004                 |
-| sex             | 3.0   | 2.0  | 0.0      | 4.0                | 1.0         
      | 0                     | 1                     |
-| max_dwell_time  | 3.0   | 3.0  | 0.0      | 16.0               | 4.0         
      | 3                     | 22                    |
-| last_visit_date | 3.0   | 3.0  | 0.0      | 64.0               | 16.0        
      | '2017-10-01 06:00:00' | '2017-10-01 17:05:45' |
-| age             | 3.0   | 3.0  | 0.0      | 8.0                | 2.0         
      | 20                    | 35                    |
-+-----------------+-------+------+----------+--------------------+-------------------+-----------------------+-----------------------+
-```
-
-- To view `example_tbl` statistics for a table `city` `age` `sex` column, use 
the following syntax:
-
-```SQL
-mysql> SHOW COLUMN STATS stats_test.example_tbl(city, age, sex);
-+-------------+-------+------+----------+-------------------+-------------------+-----------+------------+
-| column_name | count | ndv  | num_null | data_size         | avg_size_byte    
 | min       | max        |
-+-------------+-------+------+----------+-------------------+-------------------+-----------+------------+
-| city        | 6.0   | 4.0  | 0.0      | 54.0              | 7.0              
 | 'Beijing' | 'Shenzhen' |
-| sex         | 6.0   | 2.0  | 0.0      | 7.0               | 1.0              
 | 0         | 1          |
-| age         | 6.0   | 4.0  | 0.0      | 14.0              | 2.0              
 | 20        | 35         |
-+-------------+-------+------+----------+-------------------+-------------------+-----------+------------+
+ALTER TABLE table_name MODIFY COLUMN column_name SET STATS ('stat_name' = 
'stat_value', ...) [ PARTITION (partition_name) ];
 ```
 
-- To view `example_tbl` statistics for a table `p_201701` partition `city` 
`age` `sex` column, use the following syntax:
-
-```SQL
-mysql> SHOW COLUMN STATS stats_test.example_tbl(city, age, sex) PARTITION 
(p_201701);
-+-------------+-------+------+----------+--------------------+-------------------+-----------+------------+
-| column_name | count | ndv  | num_null | data_size          | avg_size_byte   
  | min       | max        |
-+-------------+-------+------+----------+--------------------+-------------------+-----------+------------+
-| city        | 3.0   | 2.0  | 0.0      | 29.0               | 7.0             
  | 'Beijing' | 'Shenzhen' |
-| sex         | 3.0   | 2.0  | 0.0      | 4.0                | 1.0             
  | 0         | 1          |
-| age         | 3.0   | 3.0  | 0.0      | 8.0                | 2.0             
  | 20        | 35         |
-+-------------+-------+------+----------+--------------------+-------------------+-----------+------------+
-```
+- `table_name`: The name of the table for which you want to modify statistics 
information. It can be in the form of `db_name.table_name`.
+- `column_name`: The specific column for which you want to modify statistics 
information. It must be a column that exists in `table_name`, and you can 
modify statistics information for one column at a time.
+- `stat_name` and `stat_value`: The corresponding statistics information name 
and its value. Multiple statistics can be modified, separated by commas. You 
can modify statistics such as `row_count`, `ndv`, `num_nulls`, `min_value`, 
`max_value`, and `data_size`.
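+
+For instance, to overwrite several statistics of one column (table, column, and values are illustrative, mirroring the example formerly in this document):
+
+```sql
+ALTER TABLE stats_test.example_tbl MODIFY COLUMN age
+SET STATS ('row_count'='6001215', 'num_nulls'='2023', 'data_size'='600121522');
+```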
 
-### View column histogram information
+#### Deleting Statistics
 
-To `SHOW COLUMN HISTOGRAM` view the information for each bucket of the 
histogram.
+Users can delete statistics with the `DROP STATS` statement, specifying the table, partition, or column whose statistics should be removed. Both column statistics and column histogram information are deleted.
 
-The syntax is as follows:
+Syntax:
 
 ```SQL
-SHOW COLUMN HISTOGRAM table_name [ (column_name [, ...]) ];
+DROP [ EXPIRED ] STATS [ table_name [ (column_name [, ...]) ] ];
 ```
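+
+For example, to drop the statistics of two columns (names illustrative):
+
+```sql
+DROP STATS stats_test.example_tbl(city, age);
+```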
 
-Explanation:
-
-- Table_name: The table to which the data is imported. It can be a 
`db_name.table_name` form.
-- Column_name: Specified destination column. `table_name` Must be a column 
that exists in. Multiple column names are separated by commas.
-
-Currently `SHOW COLUMN HISTOGRAM`, 5 columns are output, and each bucket 
contains 5 attributes, as follows:
-
-| Column Name   | Explain                                                     |
-| :------------ | :---------------------------------------------------------- |
-| `column_name` | Column name                                                 |
-| `data_type`   | The data type of the column                                 |
-| `sample_rate` | Proportion is adopted. The default is 1 for full collection |
-| `num_buckets` | Number of buckets included                                  |
-| `buckets`     | Details of the bucket (Json format)                         |
-| `lower`       | The lower bound of the barrel                               |
-| `upper`       | The upper bound of the bucket                               |
-| `count`       | Number of elements contained in the bucket                  |
-| `pre_sum`     | Number of all elements in the previous bucket               |
-| `ndv`         | Number of distinct values in the bucket                     |
-
-Example:
+#### Deleting Analyze Jobs
 
-- To view `example_tbl` histogram information for all columns of a table, use 
the following syntax:
+This statement deletes automatic/periodic Analyze jobs based on the job ID.
 
-```SQL
-mysql> SHOW COLUMN HISTOGRAM stats_test.example_tbl;
-+-----------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------+
-| column_name     | data_type   | sample_rate | num_buckets | buckets          
                                                                                
             |
-+-----------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------+
-| date            | DATE        | 1.0         | 1           | 
[{"lower_expr":"2017-10-01","upper_expr":"2017-10-03","count":6.0,"pre_sum":0.0,"ndv":3.0}]
                   |
-| cost            | BIGINT      | 1.0         | 1           | 
[{"lower_expr":"2","upper_expr":"200","count":6.0,"pre_sum":0.0,"ndv":6.0}]     
                              |
-| min_dwell_time  | INT         | 1.0         | 1           | 
[{"lower_expr":"2","upper_expr":"22","count":6.0,"pre_sum":0.0,"ndv":6.0}]      
                              |
-| city            | VARCHAR(20) | 1.0         | 1           | 
[{"lower_expr":"Shanghai","upper_expr":"Shenzhen","count":6.0,"pre_sum":0.0,"ndv":4.0}]
                       |
-| user_id         | LARGEINT    | 1.0         | 1           | 
[{"lower_expr":"10000","upper_expr":"10004","count":6.0,"pre_sum":0.0,"ndv":5.0}]
                             |
-| sex             | TINYINT     | 1.0         | 1           | 
[{"lower_expr":"0","upper_expr":"1","count":6.0,"pre_sum":0.0,"ndv":2.0}]       
                              |
-| max_dwell_time  | INT         | 1.0         | 1           | 
[{"lower_expr":"3","upper_expr":"22","count":6.0,"pre_sum":0.0,"ndv":6.0}]      
                              |
-| last_visit_date | DATETIME    | 1.0         | 1           | 
[{"lower_expr":"2017-10-01 06:00:00","upper_expr":"2017-10-03 
10:20:22","count":6.0,"pre_sum":0.0,"ndv":6.0}] |
-| age             | SMALLINT    | 1.0         | 1           | 
[{"lower_expr":"20","upper_expr":"35","count":6.0,"pre_sum":0.0,"ndv":4.0}]     
                              |
-+-----------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------+
+```sql
+DROP ANALYZE JOB [JOB_ID]
 ```
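+
+For example (job ID illustrative, as returned by `SHOW ANALYZE` or `SHOW AUTO ANALYZE`):
+
+```sql
+DROP ANALYZE JOB 52539;
+```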
 
-- To view `example_tbl` histogram information for a table `city` `age` `sex` 
column, use the following syntax:
-
-```SQL
-mysql> SHOW COLUMN HISTOGRAM stats_test.example_tbl(city, age, sex);
-+-------------+-------------+-------------+-------------+----------------------------------------------------------------------------------------+
-| column_name | data_type   | sample_rate | num_buckets | buckets              
                                                                  |
-+-------------+-------------+-------------+-------------+----------------------------------------------------------------------------------------+
-| city        | VARCHAR(20) | 1.0         | 1           | 
[{"lower_expr":"Shanghai","upper_expr":"Shenzhen","count":6.0,"pre_sum":0.0,"ndv":4.0}]|
-| sex         | TINYINT     | 1.0         | 1           | 
[{"lower_expr":"0","upper_expr":"1","count":6.0,"pre_sum":0.0,"ndv":2.0}]       
       |
-| age         | SMALLINT    | 1.0         | 1           | 
[{"lower_expr":"20","upper_expr":"35","count":6.0,"pre_sum":0.0,"ndv":4.0}]     
       |
-+-------------+-------------+-------------+-------------+----------------------------------------------------------------------------------------+
-```
+### View Automatic Collection Task Execution Status
 
-Buckets description:
-
-> Buckets for each column are returned in JSON format. Buckets are arranged 
from small to large. Each Bucket contains the upper and lower bounds, the 
number of elements, the NDV of elements, and the number of elements of all 
previous buckets. Where the number of elements in a column (row _ count) = the 
last bucket element number (count) + the number of elements in all previous 
buckets (pre _ sum). The number of rows for the following columns is 17.
-
-```JSON
-[
-    {        "lower_expr": 2,
-        "upper_expr": 7,
-        "count": 6,
-        "pre_sum": 0,
-        "ndv": 6
-    },
-    {
-        "lower_expr": 10,
-        "upper_expr": 20,
-        "count": 11,
-        "pre_sum": 6,
-        "ndv": 11
-    }
-]
-```
+This command is used to check the completion status of automatic collection 
tasks after enabling automatic collection functionality.
 
-## Modify the statistics
-
-Users can modify the statistics information through statements `ALTER`, and 
modify the corresponding statistics information of the column according to the 
provided parameters.
-
-```SQL
-ALTER TABLE table_name MODIFY COLUMN column_name SET STATS ('stat_name' = 
'stat_value', ...) [ PARTITION (partition_name) ];
+```sql
+SHOW AUTO ANALYZE [table_name]
+    [ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ];
 ```
 
-Explanation:
+Automatic collection tasks do not support viewing the completion status and failure reasons for each column individually. By default, only the status of the last 20,000 completed automatic collection tasks is retained.
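+
+For example, to check recent automatic collection tasks on one table (table name illustrative):
+
+```sql
+SHOW AUTO ANALYZE stats_test.example_tbl WHERE STATE = "FINISHED";
+```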
 
-- Table_name: The table to which the statistics are dropped. It can be a 
`db_name.table_name` form.
-- Column_name: Specified target column. `table_name` Must be a column that 
exists in. Statistics can only be modified one column at a time.
-- Stat _ name and stat _ value: The corresponding stat name and the value of 
the stat info. Multiple stats are comma separated. Statistics that can be 
modified include `row_count`, `ndv`, `num_nulls` `min_value` `max_value`, and 
`data_size`.
-- Partition_name: specifies the target partition. Must be a partition existing 
in `table_name`. Multiple partitions are separated by commas.
+## Configuration Options
 
-Example:
+| FE configuration option | Comment | Default value |
+|---|---|---|
+| statistics_sql_parallel_exec_instance_num | Controls the number of concurrent instances/pipeline tasks for each statistics collection SQL on the BE side. | 1 |
+| statistics_sql_mem_limit_in_bytes | Controls the amount of BE memory that each statistics collection SQL can use. | 2L * 1024 * 1024 * 1024 (2GiB) |
+| statistics_simultaneously_running_task_num | After submitting asynchronous jobs using `ANALYZE TABLE[DATABASE]`, this parameter limits the number of columns that can be analyzed simultaneously. All asynchronous tasks are collectively constrained by this parameter. [...] |
+| analyze_task_timeout_in_minutes | Timeout for AnalyzeTask execution. | 12 hours |
+| stats_cache_size | The actual memory usage of the statistics cache depends heavily on the characteristics of the data, because the average size of maximum/minimum values and the number of buckets in histograms can vary greatly across datasets and scenarios; factors like the JVM version can also affect it. Below is the memory size occupied by a statistics cache with 100,000 items, where the average length of maximum/minimum values per item is 32 and the average length of column names is [...] |
+| enable_auto_sample | Enable automatic sampling for large tables. When enabled, statistics will be automatically collected through sampling for tables larger than the `huge_table_lower_bound_size_in_bytes` threshold. | false |
+| auto_analyze_job_record_count | Controls the persistence of records for automatically triggered statistics collection jobs. | 20000 |
+| huge_table_default_sample_rows | Defines the number of sample rows for large tables when automatic sampling is enabled. | 4194304 |
+| huge_table_lower_bound_size_in_bytes | Defines the lower size threshold for large tables. When `enable_auto_sample` is enabled, statistics will be automatically collected through sampling for tables larger than this value. | 5368709120 |
+| huge_table_auto_analyze_interval_in_millis | Controls the minimum time interval for automatic ANALYZE on large tables. Within this interval, tables larger than `huge_table_lower_bound_size_in_bytes` will be analyzed only once. | 43200000 |
+| table_stats_health_threshold | Takes a value between 0 and 100. When the volume of data updated since the last statistics collection reaches (100 - table_stats_health_threshold)% of the table, the statistics for the table are considered outdated. | 80 |
 
-- To modify `example_tbl` table `age` column `row_count` statistics, use the 
following syntax:
+| Session Variable | Description | Default Value |
+|---|---|---|
+| full_auto_analyze_start_time | Start time for automatic statistics collection | 00:00:00 |
+| full_auto_analyze_end_time | End time for automatic statistics collection | 02:00:00 |
+| enable_full_auto_analyze | Enables the automatic collection functionality | true |
 
-```SQL
-mysql> ALTER TABLE stats_test.example_tbl MODIFY COLUMN age SET STATS 
('row_count'='6001215');
-mysql> SHOW COLUMN STATS stats_test.example_tbl(age);
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-| column_name | count     | ndv  | num_null | data_size | avg_size_byte | min  
| max  |
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-| age         | 6001215.0 | 0.0  | 0.0      | 0.0       | 0.0           | N/A  
| N/A  |
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-```
+ATTENTION: The session variables listed above must be set globally using `SET GLOBAL`.
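+
+For example, a minimal sketch of enabling automatic collection and shifting its window (the window values here are illustrative, not recommendations):
+
+```sql
+-- Must be set with SET GLOBAL, per the note above
+SET GLOBAL enable_full_auto_analyze = true;
+SET GLOBAL full_auto_analyze_start_time = "01:00:00";
+SET GLOBAL full_auto_analyze_end_time = "05:00:00";
+```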
 
-- To modify the `row_count`, `num_nulls`, and `data_size` statistics of the `age` column of the `example_tbl` table, use the following syntax:
+## Usage Recommendations
 
-```SQL
-mysql> ALTER TABLE stats_test.example_tbl MODIFY COLUMN age SET STATS 
('row_count'='6001215', 'num_nulls'='2023', 'data_size'='600121522');
-mysql> SHOW COLUMN STATS stats_test.example_tbl(age);
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-| column_name | count     | ndv  | num_null | data_size | avg_size_byte | min  
| max  |
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-| age         | 6001215.0 | 0.0  | 2023.0   | 600121522 | 0.0           | N/A  
| N/A  |
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-```
+Based on our testing, for tables with a data size (i.e., actual storage space) below 128GiB, there is usually no need to change the default configuration, other than adjusting the execution window of the automatic collection feature to avoid resource contention during peak business hours.
 
-## Delete statistics
+Depending on the cluster configuration, automatic collection tasks typically 
consume around 20% of CPU resources. Therefore, users should adjust the 
execution time of the automatic collection feature to avoid resource contention 
during peak business hours, depending on their specific business needs.
 
-The user deletes statistics for the specified table, partition, or columns via the `DROP` statistics statement, according to the supplied parameters. Both column statistics and column histogram information are deleted.
+Since ANALYZE is a resource-intensive operation, it is best to avoid executing such operations during peak business hours so as not to disrupt the business; ANALYZE operations are also more likely to fail when the cluster load is high. Furthermore, it is advisable to avoid performing a full ANALYZE on an entire database or table. Typically, it is sufficient to ANALYZE the columns that are frequently used as predicate conditions, in JOIN conditions, as aggregation fields, or as ID fields. If a submitted SQL statement involves many such operations and the tables have no statistics or badly outdated statistics, we recommend the following (a sketch follows this list):
 
-Syntax:
+* Perform ANALYZE on the columns involved in a complex query before submitting it, as a poorly planned complex query can consume a significant amount of system resources and may fail due to resource exhaustion or timeout.
+* If you have configured periodic data import routines for Doris, execute ANALYZE after the data import completes, so that subsequent query planning can use the most up-to-date statistics. This can be automated using Doris's existing job scheduling framework.
+* When significant changes occur in a table's data, such as after creating a new table and completing its data import, run ANALYZE on the corresponding table.
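+
+As a sketch of the first two recommendations above (reusing the `example_tbl` sample table from this document; pick the columns that actually appear in your predicates and JOIN conditions):
+
+```sql
+-- After a periodic import into stats_test.example_tbl completes,
+-- refresh the statistics of the frequently filtered columns
+ANALYZE TABLE stats_test.example_tbl (city, age, sex) WITH SYNC;
+```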
 
-```SQL
-DROP [ EXPIRED ] STATS [ table_name [ (column_name [, ...]) ] ];
-```
-
-Explanation:
+## Common Issues
 
-- table_name: The table whose statistics are to be deleted. Can be in `db_name.table_name` form.
-- column_name: The specified target column. Must be a column that exists in `table_name`. Multiple column names are separated by commas.
-- expired: Statistics cleanup. A table cannot be specified. Invalid statistics and out-of-date statistics job information in the system will be deleted.
+### ANALYZE WITH SYNC Execution Failed: Failed to analyze following columns...
 
-Example:
+The SQL execution time is controlled by the `query_timeout` session variable, 
which has a default value of 300 seconds. Statements like `ANALYZE 
DATABASE/TABLE` often take longer, easily exceeding this time limit and being 
canceled. It is recommended to increase the value of `query_timeout` based on 
the data volume of the ANALYZE object.
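+
+For example, a minimal sketch of raising the limit before a long synchronous collection (the timeout value here is illustrative):
+
+```sql
+SET query_timeout = 3600;  -- in seconds; the default is 300
+ANALYZE TABLE stats_test.example_tbl WITH SYNC;
+```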
 
-- Clean up statistics, using the following syntax:
+### ANALYZE Submission Error: Stats table not available...
 
-```SQL
-mysql> DROP EXPIRED STATS;
-```
+When ANALYZE is executed, statistics data is written to the internal table 
`__internal_schema.column_statistics`. FE checks the tablet status of this 
table before executing ANALYZE. If there are unavailable tablets, the task is 
rejected. Please check the BE cluster status if this error occurs.
 
-- To delete statistics for the `example_tbl` table, use the following syntax:
+Users can use `SHOW BACKENDS\G` to verify the BE (Backend) status. If the BE 
status is normal, you can use the command `ADMIN SHOW REPLICA STATUS FROM 
__internal_schema.[tbl_in_this_db]` to check the tablet status within this 
database, ensuring that the tablet status is also normal.
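+
+A quick check sequence using the two commands above, with the stats table mentioned earlier filled in for the placeholder:
+
+```sql
+SHOW BACKENDS\G
+ADMIN SHOW REPLICA STATUS FROM __internal_schema.column_statistics;
+```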
 
-```SQL
-mysql> DROP STATS stats_test.example_tbl;
-```
-
-- To delete statistics for the `city`, `age`, and `sex` columns of the `example_tbl` table, use the following syntax:
-
-```SQL
-mysql> DROP STATS stats_test.example_tbl(city, age, sex);
-```
+### Failure of ANALYZE on Large Tables
 
-## ANALYZE configuration item
+Due to resource limitations, ANALYZE on some large tables may timeout or 
exceed BE memory limits. In such cases, it is recommended to use `ANALYZE ... 
WITH SAMPLE...`. 
 
-To be added.
diff --git a/docs/zh-CN/docs/query-acceleration/statistics.md b/docs/zh-CN/docs/query-acceleration/statistics.md
index 0362f989c1e..d9aac9b6780 100644
--- a/docs/zh-CN/docs/query-acceleration/statistics.md
+++ b/docs/zh-CN/docs/query-acceleration/statistics.md
@@ -26,57 +26,28 @@ under the License.
 
 # Statistics
 
-## Introduction to statistics information
 
-In SQL databases, the quality of the query optimizer has a major impact on system performance. The optimizer relies on statistics to estimate query cost; in equal-value query scenarios in particular, accurate cardinality estimation is very important, since it helps the optimizer choose the optimal query plan and thus improve query performance.
+Collecting statistics helps the optimizer understand data distribution characteristics. When performing CBO (cost-based optimization), the optimizer uses these statistics to compute the selectivity of predicates and estimate the cost of each execution plan, so as to choose a better plan and greatly improve query efficiency.
 
-When executing a query, an insufficiently optimized execution plan and an optimized one can differ enormously in execution time, by a factor of several. It is therefore very important for a SQL query optimizer to collect and analyze statistics: only with these statistics can the optimizer accurately evaluate the cost of different execution plans and choose the best one.
-
-The Doris query optimizer uses statistics to determine the most efficient execution plan for a query. The statistics maintained by Doris include table-level statistics and column-level statistics.
-
-Table statistics:
-
-| Information         | Description                                            |
-| :------------------ | :----------------------------------------------------- |
-| `row_count`         | Number of rows in the table                            |
-| `data_size`         | Table size (in bytes)                                  |
-| `update_rows`       | Number of rows updated since statistics were collected |
-| `healthy`           | Health of the table                                    |
-| `update_time`       | Time of the most recent update                         |
-| `last_analyze_time` | Time statistics were last collected                    |
-
-> Table health: indicates how healthy the table's statistics are. When `update_rows` is greater than or equal to `row_count`, the health is 0; when `update_rows` is less than `row_count`, the health is `100 * (1 - update_rows / row_count)`.
-
-Column statistics:
+Currently, the following information is collected for each column:
 
 | Information     | Description                |
 | :-------------- | :------------------------- |
-| `row_count`     | Total number of rows in the column |
-| `data_size`     | Total length of the column (in bytes) |
-| `avg_size_byte` | Average length of the column (in bytes) |
-| `ndv`           | Number of distinct values in the column |
-| `min`           | Minimum value of the column |
-| `max`           | Maximum value of the column |
-| `null_count`    | Number of nulls in the column |
-| `histogram`     | Histogram of the column    |
-
-Next, we briefly introduce the histogram and other data structures mentioned above, and describe in detail how statistics are collected and maintained.
-
-## Introduction to histograms
-
-A histogram is a tool for describing the distribution of data. It divides data into a number of intervals (buckets) by value and uses simple statistics to represent the characteristics of the data in each interval. Histograms are an important kind of statistics in databases and can describe the distribution of data in a column. The most typical application of histograms is helping the optimizer choose the optimal execution plan by estimating the selectivity of query predicates.
-
-In Doris, an equi-height histogram is built for each concrete column of a table. A histogram consists of a series of buckets, where the statistics of each bucket include its lower and upper bounds, the number of elements it contains, the number of elements in all preceding buckets, and the number of distinct values in the bucket. See the usage notes of the SQL function `histogram` (or `hist`) for details.
-
-> With equi-height bucketing, the sum of value frequencies in each bucket should be close to `1/N` of the total row count. However, strictly following the equi-height principle would cause some values to fall on bucket boundaries, so that the same value appears in two different buckets, which interferes with selectivity estimation. Therefore, Doris modifies the equi-height bucketing method in its implementation: if adding a value to a bucket would make that bucket's frequency exceed `1/N` of the total row count, the value is placed in that bucket or the next one, whichever is closer to `1/N`.
+| `row_count`     | Total number of rows       |
+| `data_size`     | Total data size            |
+| `avg_size_byte` | Average length of values   |
+| `ndv`           | Number of distinct values  |
+| `min`           | Minimum value              |
+| `max`           | Maximum value              |
+| `null_count`    | Number of nulls            |
 
 ## Collecting statistics
 
-### Manual collection
+### Using the ANALYZE statement
 
-Users trigger manual collection jobs via the `ANALYZE` statement; based on the provided parameters, statistics for the specified table or columns are collected.
+Doris allows users to trigger the collection and updating of statistics by submitting an ANALYZE statement.
 
-Syntax for collecting column statistics:
+Syntax:
 
 ```SQL
 ANALYZE < TABLE | DATABASE table_name | db_name > 
@@ -89,217 +60,24 @@ ANALYZE < TABLE | DATABASE table_name | db_name >
 Where:
 
 - table_name: The target table. Can be in `db_name.table_name` form.
+- partition_name: The target partitions (currently only for Hive external tables). Must be partitions that exist in `table_name`; separate multiple partition names with commas. Examples: a single-level partition is written PARTITIONS(`event_date=20230706`), a multi-level partition PARTITIONS(`nation=CN/city=Beijing`). PARTITIONS(*) specifies all partitions, and PARTITIONS WITH RECENT 100 specifies the 100 most recent partitions.
 - column_name: The target columns. Must be columns that exist in `table_name`; separate multiple column names with commas.
 - sync: Collect statistics synchronously and return when collection finishes. If not specified, the statement runs asynchronously and returns a task ID.
-- incremental: Collect statistics incrementally. Incremental collection of histogram statistics is not supported.
-- period: Collect statistics periodically, in seconds; when specified, the corresponding statistics are collected at that interval.
 - sample percent | rows: Collect statistics by sampling. Either a sampling percentage or a number of sampled rows can be specified.
 - sql: Execute SQL to collect statistics on the partition columns of external tables. By default, partition column information is collected from metadata, which is efficient but may be inaccurate with respect to row count and data size. Users can specify SQL-based collection to obtain accurate partition column information.
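+
+A few illustrative invocations of the options above (the Hive catalog, database, and table names are hypothetical, and the exact clause order for external-table partitions should be checked against your version):
+
+```sql
+-- Asynchronous collection for selected columns; returns a job ID
+ANALYZE TABLE stats_test.example_tbl(city, age, sex);
+
+-- Sampled collection by row count
+ANALYZE TABLE stats_test.example_tbl WITH SAMPLE ROWS 4194304;
+
+-- A single partition of a Hive external table, collected synchronously
+ANALYZE TABLE hive_catalog.db1.hive_tbl PARTITIONS(`event_date=20230706`) WITH SYNC;
+```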
 
-Incremental collection is suitable for tables partitioned on a monotonically non-decreasing column such as a time column, or for tables whose historical partition data is never updated.
-
-Note:
-
-- Histogram statistics do not support incremental collection.
-- When using incremental collection, the table's existing statistics must remain valid (i.e., the data of other historical partitions does not change); otherwise, the statistics will be inaccurate.
-
-Examples:
-
-- Incrementally collect statistics for the `example_tbl` table, using the following syntax:
-
-```SQL
--- Using WITH INCREMENTAL
-mysql> ANALYZE TABLE stats_test.example_tbl WITH INCREMENTAL;
-+--------+
-| job_id |
-+--------+
-| 51910  |
-+--------+
-
--- Setting the incremental property
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("incremental" = "true");
-+--------+
-| job_id |
-+--------+
-| 51910  |
-+--------+
-```
-
-- Incrementally collect statistics for the `city`, `age`, and `sex` columns of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl(city, age, sex) WITH INCREMENTAL;
-+--------+
-| job_id |
-+--------+
-| 51988  |
-+--------+
-```
-
-#### Sampled collection
-
-When a table is large, collecting statistics can be time-consuming; sampled collection can be used to speed it up. Specify a sampling percentage or a number of sampled rows according to the actual situation.
-
-Examples:
-
-- Collect statistics for the `example_tbl` table by sampling, using the following syntax:
-
-```SQL
--- Sample by row count using WITH SAMPLE ROWS
-mysql> ANALYZE TABLE stats_test.example_tbl WITH SAMPLE ROWS 5;
-+--------+
-| job_id |
-+--------+
-| 52120  |
-+--------+
-
--- Sample by percentage using WITH SAMPLE PERCENT
-mysql> ANALYZE TABLE stats_test.example_tbl WITH SAMPLE PERCENT 50;
-+--------+
-| job_id |
-+--------+
-| 52201  |
-+--------+
-
--- Set the number of sampled rows via the sample.rows property
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("sample.rows" = "5");
-+--------+
-| job_id |
-+--------+
-| 52279  |
-+--------+
-
--- Set the sampling percentage via the sample.percent property
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("sample.percent" = 
"50");
-+--------+
-| job_id |
-+--------+
-| 52282  |
-+--------+
-```
-
-- Collect histogram information for the `example_tbl` table by sampling, similarly to ordinary statistics, using the following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH SAMPLE ROWS 
5;
-+--------+
-| job_id |
-+--------+
-| 52357  |
-+--------+
-```
-
-#### Synchronous collection
-
-Normally, after an `ANALYZE` statement is executed, the system starts an asynchronous task to collect statistics and immediately returns the job ID. To return only after statistics collection has finished, use synchronous collection.
-
-Examples:
-
-- Synchronously collect statistics for the `example_tbl` table, using the following syntax:
-
-```SQL
--- Using WITH SYNC
-mysql> ANALYZE TABLE stats_test.example_tbl WITH SYNC;
-
--- Setting the sync property
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("sync" = "true");
-```
-
-- Synchronously collect histogram information for the `example_tbl` table, similarly to ordinary statistics, using the following syntax:
-
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH SYNC;
-```
 
 ### Automatic collection
 
-Automatic collection means that when the user specifies the `PERIOD` or `AUTO` keyword in an `ANALYZE` statement, or performs the corresponding configuration, the system subsequently generates jobs automatically to collect statistics.
-
-#### Periodic collection
-
-Periodic collection means re-collecting a table's statistics at a given time interval.
-
-Examples:
-
-- Periodically (once per day) collect statistics for the `example_tbl` table, using the following syntax:
-
-```SQL
--- Using WITH PERIOD
-mysql> ANALYZE TABLE stats_test.example_tbl WITH PERIOD 86400;
-+--------+
-| job_id |
-+--------+
-| 52409  |
-+--------+
-
--- Setting the period.seconds property
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("period.seconds" = 
"86400");
-+--------+
-| job_id |
-+--------+
-| 52535  |
-+--------+
-```
-
-- Periodically (once per day) collect histogram information for the `example_tbl` table, similarly to ordinary statistics, using the following syntax:
+Users can enable this feature by setting the FE configuration option `enable_full_auto_analyze = true`. Once enabled, statistics on eligible databases and tables are collected automatically within the specified time window. The window can be specified via the parameters `full_auto_analyze_start_time` (default 00:00:00) and `full_auto_analyze_end_time` (default 02:00:00).
 
-```SQL
-mysql> ANALYZE TABLE stats_test.example_tbl UPDATE HISTOGRAM WITH PERIOD 86400;
-+--------+
-| job_id |
-+--------+
-| 52684  |
-+--------+
-```
-
-#### Automatic collection
-
-Changes to a table may "invalidate" its statistics, which may cause the optimizer to choose a wrong execution plan.
-
-Reasons why table statistics become invalid include:
-
-- New column: the new column has no statistics
-- Column change: the existing statistics become unusable
-- New partition: the new partition has no statistics
-- Partition change: the existing statistics become invalid
-- Data change (inserting data | deleting data | updating data): the statistics become inaccurate
-
-The operations mainly involved include:
-
-- update: updating data
-- delete: deleting data
-- drop: dropping partitions
-- load: importing data, adding partitions
-- insert: inserting data, adding partitions
-- alter: column changes, partition changes, adding partitions
-
-When databases, tables, partitions, or columns are dropped, the invalid statistics are cleaned up automatically. Adjusting column order and changing column types have no effect.
-
-The system decides whether to re-collect statistics based on table health (as defined above). By setting a health threshold, the system re-collects a table's statistics when its health falls below a certain value. Simply put, for a table whose statistics have been collected, automatic collection may be triggered whenever a partition's data grows or shrinks, or partitions are added or removed; after re-collection, the table's statistics and health are updated. Currently, statistics are collected automatically only for tables where users have configured automatic collection; other tables are not collected automatically.
-
-Examples:
+This feature collects statistics only for databases and tables that have no statistics or whose statistics are outdated. Doris considers a table's statistics outdated when more than 20% of its data has been updated (configurable via the parameter `table_stats_health_threshold`, default 80).
 
-- Automatically collect statistics for the `example_tbl` table, using the following syntax:
+For tables with a large data size (5GiB by default), Doris automatically collects statistics by sampling, to minimize the load on the system and finish the collection job as soon as possible; this behavior can be tuned via the FE parameter `huge_table_lower_bound_size_in_bytes`. To perform full collection on all tables, set the FE parameter `enable_auto_sample` to false. For tables larger than `huge_table_lower_bound_size_in_bytes`, Doris also guarantees a collection interval of no less than 12 hours (controlled by the FE parameter `huge_table_auto_analyze_interval_in_millis`).
 
-```SQL
--- Using WITH AUTO
-mysql> ANALYZE TABLE stats_test.example_tbl WITH AUTO;
-+--------+
-| job_id |
-+--------+
-| 52539  |
-+--------+
-
--- Setting the automatic property
-mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("automatic" = "true");
-+--------+
-| job_id |
-+--------+
-| 52565  |
-+--------+
-```
+Automatic sampling samples 4194304 (2^22) rows by default, though the actual number of sampled rows may be larger due to the implementation. To sample more rows and obtain more accurate data distribution information, configure the FE parameter `huge_table_default_sample_rows`.
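+
+Assuming these FE configuration items can be changed at runtime via `ADMIN SET FRONTEND CONFIG` (otherwise set them in fe.conf), a sketch of tuning the sampling behavior:
+
+```sql
+-- Raise the sample size for large tables above the 4194304-row default
+ADMIN SET FRONTEND CONFIG ("huge_table_default_sample_rows" = "8388608");
+-- Treat only tables over 10GiB (instead of the 5GiB default) as huge
+ADMIN SET FRONTEND CONFIG ("huge_table_lower_bound_size_in_bytes" = "10737418240");
+```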
 
-### Managing jobs
+### Job management
 
 #### Viewing statistics jobs
 
@@ -308,16 +86,14 @@ mysql> ANALYZE TABLE stats_test.example_tbl PROPERTIES("automatic" = "true");
 The syntax is as follows:
 
 ```SQL
-SHOW ANALYZE [ table_name | job_id ]
+SHOW ANALYZE < table_name | job_id >
     [ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ];
 ```
 
-Where:
-
 - table_name: Table name; when specified, shows the statistics jobs of that table. Can be in `db_name.table_name` form. If not specified, information on all statistics jobs is returned.
 - job_id: Statistics job ID, the value returned when `ANALYZE` is executed to collect statistics asynchronously. If not specified, information on all statistics jobs is returned.
 
-Currently, `SHOW ANALYZE` outputs 11 columns, as follows:
+Output:
 
 | Column                 | Description  |
 | :--------------------- | :----------- |
@@ -333,21 +109,6 @@ SHOW ANALYZE [ table_name | job_id ]
 | `state`                | Job state    |
 | `schedule_type`        | Scheduling mode |
 
-> In the system, a statistics job contains multiple subtasks, each of which collects the statistics of a single column.
-
-Examples:
-
-- View information on the statistics job with ID `20038`, using the following syntax:
-
-```SQL
-mysql> SHOW ANALYZE 20038 
-+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
-| job_id | catalog_name | db_name              | tbl_name | col_name           
   | job_type | analysis_type | message | last_exec_time_in_ms | state    | 
schedule_type |
-+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
-| 20038  | internal     | default_cluster:test | t3       | 
[col4,col2,col3,col1] | MANUAL   | FUNDAMENTALS  |         | 2023-06-01 
17:22:15  | FINISHED | ONCE          |
-+--------+--------------+----------------------+----------+-----------------------+----------+---------------+---------+----------------------+----------+---------------+
-
-```
 
 Use `SHOW ANALYZE TASK STATUS [job_id]` to view the completion status of statistics collection for each individual column.
 
@@ -365,49 +126,6 @@ mysql> show analyze task status  20038 ;
 
 ```
 
-- View the statistics jobs of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> SHOW ANALYZE stats_test.example_tbl;
-+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
-| job_id | catalog_name | db_name                    | tbl_name    | col_name  
      | job_type | analysis_type | message | last_exec_time_in_ms | state    | 
schedule_type |
-+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
-| 68603  | internal     | default_cluster:stats_test | example_tbl |           
      | MANUAL   | INDEX         |         | 2023-05-05 17:53:27  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | 
last_visit_date | MANUAL   | COLUMN        |         | 2023-05-05 17:53:26  | 
FINISHED | ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | age       
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:27  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | city      
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:25  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | cost      
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:27  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | 
min_dwell_time  | MANUAL   | COLUMN        |         | 2023-05-05 17:53:24  | 
FINISHED | ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | date      
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:27  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | user_id   
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:25  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | 
max_dwell_time  | MANUAL   | COLUMN        |         | 2023-05-05 17:53:26  | 
FINISHED | ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | sex       
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:26  | FINISHED | 
ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl | user_id   
      | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:11  | FINISHED | 
ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl | sex       
      | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:09  | FINISHED | 
ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl | 
last_visit_date | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:10  | 
FINISHED | ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl | date      
      | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:10  | FINISHED | 
ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl | cost      
      | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:10  | FINISHED | 
ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl | age       
      | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:10  | FINISHED | 
ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl | 
min_dwell_time  | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:10  | 
FINISHED | ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl | 
max_dwell_time  | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:09  | 
FINISHED | ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl |           
      | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:11  | FINISHED | 
ONCE          |
-| 68678  | internal     | default_cluster:stats_test | example_tbl | city      
      | MANUAL   | HISTOGRAM     |         | 2023-05-05 18:00:11  | FINISHED | 
ONCE          |
-+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
-```
-
-- View all statistics jobs, ordered by last completion time in descending order and returning the first 3 entries, using the following syntax:
-
-```SQL
-mysql> SHOW ANALYZE WHERE state = "FINISHED" ORDER BY last_exec_time_in_ms 
DESC LIMIT 3;
-+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
-| job_id | catalog_name | db_name                    | tbl_name    | col_name  
      | job_type | analysis_type | message | last_exec_time_in_ms | state    | 
schedule_type |
-+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
-| 68603  | internal     | default_cluster:stats_test | example_tbl | age       
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:27  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | sex       
      | MANUAL   | COLUMN        |         | 2023-05-05 17:53:26  | FINISHED | 
ONCE          |
-| 68603  | internal     | default_cluster:stats_test | example_tbl | 
last_visit_date | MANUAL   | COLUMN        |         | 2023-05-05 17:53:26  | 
FINISHED | ONCE          |
-+--------+--------------+----------------------------+-------------+-----------------+----------+---------------+---------+----------------------+----------+---------------+
-```
-
 #### Terminating statistics jobs
 
 Use `KILL ANALYZE` to terminate a running statistics job.
@@ -430,68 +148,43 @@ KILL ANALYZE job_id;
 mysql> KILL ANALYZE 52357;
 ```
 
-## Viewing statistics
+#### Viewing statistics
 
-### Table statistics
+#### Table statistics
 
-> Not yet available.
 
-Use `SHOW TABLE STATS` to view information such as a table's total row count and statistics health.
+Use `SHOW TABLE STATS` to view an overview of statistics collection for a table.
 
 The syntax is as follows:
 
 ```SQL
-SHOW TABLE STATS table_name [ PARTITION (partition_name) ];
+SHOW TABLE STATS table_name;
 ```
 
 Where:
 
 - table_name: The target table. Can be in `db_name.table_name` form.
-- partition_name: The target partition. Must be a partition that exists in `table_name`; only one partition can be specified.
 
-Currently, `SHOW TABLE STATS` outputs 6 columns, as follows:
+Output:
 
 | Column              | Description            |
 | :------------------ | :--------------------- |
-| `row_count`         | Number of rows         |
-| `update_rows`       | Number of updated rows |
-| `data_size`         | Data size, in bytes    |
-| `healthy`           | Health                 |
-| `update_time`       | Update time            |
-| `last_analyze_time` | Time statistics were last collected |
-
-Examples:
+| `row_count`         | Number of rows (does not reflect the exact row count at the time the command runs) |
+| `method`            | Collection method (full/sampled) |
+| `type`              | Type of the statistics data |
+| `updated_time`      | Last update time       |
+| `columns`           | Columns whose statistics have been collected |
+| `trigger`           | How statistics collection was triggered (automatically by the system / by the user) |
 
-- View statistics for the `example_tbl` table, using the following syntax:
 
-```SQL
-mysql> SHOW TABLE STATS stats_test.example_tbl;
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-| row_count | update_rows | healthy | data_size | update_time         | 
last_analyze_time   |
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-| 8         | 0           | 100     | 6999      | 2023-04-08 15:40:47 | 
2023-04-08 17:43:28 |
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-```
-
-- View statistics for partition `p_201701` of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> SHOW TABLE STATS stats_test.example_tbl PARTITION (p_201701);
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-| row_count | update_rows | healthy | data_size | update_time         | 
last_analyze_time   |
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-| 4         | 0           | 100     | 2805      | 2023-04-08 11:48:02 | 
2023-04-08 17:43:27 |
-+-----------+-------------+---------+-----------+---------------------+---------------------+
-```
-
-### Viewing column statistics
+#### Viewing column statistics
 
 Use `SHOW COLUMN STATS` to view information such as the number of distinct values and the number of `NULL`s in columns.
 
 The syntax is as follows:
 
 ```SQL
-SHOW COLUMN [cached] STATS table_name [ (column_name [, ...]) ] [ PARTITION 
(partition_name) ];
+SHOW COLUMN [cached] STATS table_name [ (column_name [, ...]) ];
 ```
 
 Where:
@@ -499,177 +192,10 @@ SHOW COLUMN [cached] STATS table_name [ (column_name [, ...]) ] [ PARTITION (par
 - cached: Show the statistics currently cached in FE memory.
 - table_name: The target table for statistics collection. Can be in `db_name.table_name` form.
 - column_name: The target columns. Must be columns that exist in `table_name`; separate multiple column names with commas.
-- partition_name: The target partition. Must be a partition that exists in `table_name`; only one partition can be specified.
-
-Currently, `SHOW COLUMN STATS` outputs 10 columns, as follows:
-
-| Column          | Description                |
-| :-------------- | :------------------------- |
-| `column_name`   | Column name                |
-| `count`         | Total number of rows in the column |
-| `ndv`           | Number of distinct values  |
-| `num_null`      | Number of nulls            |
-| `data_size`     | Total length of the column (in bytes) |
-| `avg_size_byte` | Average length of the column (in bytes) |
-| `min`           | Minimum value              |
-| `max`           | Maximum value              |
-
-Examples:
-
-- View statistics for all columns of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> SHOW COLUMN STATS stats_test.example_tbl;
-+-----------------+-------+------+----------+-------------------+-------------------+-----------------------+-----------------------+
-| column_name     | count | ndv  | num_null | data_size         | 
avg_size_byte     | min                   | max                   |
-+-----------------+-------+------+----------+-------------------+-------------------+-----------------------+-----------------------+
-| date            | 6.0   | 3.0  | 0.0      | 28.0              | 4.0          
     | '2017-10-01'          | '2017-10-03'          |
-| cost            | 6.0   | 6.0  | 0.0      | 56.0              | 8.0          
     | 2                     | 200                   |
-| min_dwell_time  | 6.0   | 6.0  | 0.0      | 28.0              | 4.0          
     | 2                     | 22                    |
-| city            | 6.0   | 4.0  | 0.0      | 54.0              | 7.0          
     | 'Beijing'             | 'Shenzhen'            |
-| user_id         | 6.0   | 5.0  | 0.0      | 112.0             | 16.0         
     | 10000                 | 10004                 |
-| sex             | 6.0   | 2.0  | 0.0      | 7.0               | 1.0          
     | 0                     | 1                     |
-| max_dwell_time  | 6.0   | 6.0  | 0.0      | 28.0              | 4.0          
     | 3                     | 22                    |
-| last_visit_date | 6.0   | 6.0  | 0.0      | 112.0             | 16.0         
     | '2017-10-01 06:00:00' | '2017-10-03 10:20:22' |
-| age             | 6.0   | 4.0  | 0.0      | 14.0              | 2.0          
     | 20                    | 35                    |
-+-----------------+-------+------+----------+-------------------+-------------------+-----------------------+-----------------------+
-```
-
-- View statistics for partition `p_201701` of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> SHOW COLUMN STATS stats_test.example_tbl PARTITION (p_201701);
-+-----------------+-------+------+----------+--------------------+-------------------+-----------------------+-----------------------+
-| column_name     | count | ndv  | num_null | data_size          | 
avg_size_byte     | min                   | max                   |
-+-----------------+-------+------+----------+--------------------+-------------------+-----------------------+-----------------------+
-| date            | 3.0   | 1.0  | 0.0      | 16.0               | 4.0         
      | '2017-10-01'          | '2017-10-01'          |
-| cost            | 3.0   | 3.0  | 0.0      | 32.0               | 8.0         
      | 2                     | 100                   |
-| min_dwell_time  | 3.0   | 3.0  | 0.0      | 16.0               | 4.0         
      | 2                     | 22                    |
-| city            | 3.0   | 2.0  | 0.0      | 29.0               | 7.0         
      | 'Beijing'             | 'Shenzhen'            |
-| user_id         | 3.0   | 3.0  | 0.0      | 64.0               | 16.0        
      | 10000                 | 10004                 |
-| sex             | 3.0   | 2.0  | 0.0      | 4.0                | 1.0         
      | 0                     | 1                     |
-| max_dwell_time  | 3.0   | 3.0  | 0.0      | 16.0               | 4.0         
      | 3                     | 22                    |
-| last_visit_date | 3.0   | 3.0  | 0.0      | 64.0               | 16.0        
      | '2017-10-01 06:00:00' | '2017-10-01 17:05:45' |
-| age             | 3.0   | 3.0  | 0.0      | 8.0                | 2.0         
      | 20                    | 35                    |
-+-----------------+-------+------+----------+--------------------+-------------------+-----------------------+-----------------------+
-```
-
-- View statistics for the `city`, `age`, and `sex` columns of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> SHOW COLUMN STATS stats_test.example_tbl(city, age, sex);
-+-------------+-------+------+----------+-------------------+-------------------+-----------+------------+
-| column_name | count | ndv  | num_null | data_size         | avg_size_byte    
 | min       | max        |
-+-------------+-------+------+----------+-------------------+-------------------+-----------+------------+
-| city        | 6.0   | 4.0  | 0.0      | 54.0              | 7.0              
 | 'Beijing' | 'Shenzhen' |
-| sex         | 6.0   | 2.0  | 0.0      | 7.0               | 1.0              
 | 0         | 1          |
-| age         | 6.0   | 4.0  | 0.0      | 14.0              | 2.0              
 | 20        | 35         |
-+-------------+-------+------+----------+-------------------+-------------------+-----------+------------+
-```
-
-- View statistics for the `city`, `age`, and `sex` columns of partition `p_201701` of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> SHOW COLUMN STATS stats_test.example_tbl(city, age, sex) PARTITION 
(p_201701);
-+-------------+-------+------+----------+--------------------+-------------------+-----------+------------+
-| column_name | count | ndv  | num_null | data_size          | avg_size_byte   
  | min       | max        |
-+-------------+-------+------+----------+--------------------+-------------------+-----------+------------+
-| city        | 3.0   | 2.0  | 0.0      | 29.0               | 7.0             
  | 'Beijing' | 'Shenzhen' |
-| sex         | 3.0   | 2.0  | 0.0      | 4.0                | 1.0             
  | 0         | 1          |
-| age         | 3.0   | 3.0  | 0.0      | 8.0                | 2.0             
  | 20        | 35         |
-+-------------+-------+------+----------+--------------------+-------------------+-----------+------------+
-```
-
-### Viewing column histogram information
-
-Use `SHOW COLUMN HISTOGRAM` to view information on each bucket of a histogram.
-
-The syntax is as follows:
-
-```SQL
-SHOW COLUMN HISTOGRAM table_name [ (column_name [, ...]) ];
-```
-
-Where:
-
-- table_name: The target table. Can be in `db_name.table_name` form.
-- column_name: The target columns. Must be columns that exist in `table_name`; separate multiple column names with commas.
 
-Currently, `SHOW COLUMN HISTOGRAM` outputs 5 columns, and each bucket contains 5 attributes, as follows:
-
-| Column        | Description                  |
-| :------------ | :--------------------------- |
-| `column_name` | Column name                  |
-| `data_type`   | Data type of the column      |
-| `sample_rate` | Sampling ratio; 1 by default for full collection |
-| `num_buckets` | Number of buckets            |
-| `buckets`     | Bucket details (JSON format) |
-| `lower`       | Lower bound of the bucket    |
-| `upper`       | Upper bound of the bucket    |
-| `count`       | Number of elements in the bucket |
-| `pre_sum`     | Number of elements in all preceding buckets |
-| `ndv`         | Number of distinct values in the bucket |
+#### Modifying statistics
 
-Examples:
-
-- View histogram information for all columns of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> SHOW COLUMN HISTOGRAM stats_test.example_tbl;
-+-----------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------+
-| column_name     | data_type   | sample_rate | num_buckets | buckets          
                                                                                
             |
-+-----------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------+
-| date            | DATE        | 1.0         | 1           | 
[{"lower_expr":"2017-10-01","upper_expr":"2017-10-03","count":6.0,"pre_sum":0.0,"ndv":3.0}]
                   |
-| cost            | BIGINT      | 1.0         | 1           | 
[{"lower_expr":"2","upper_expr":"200","count":6.0,"pre_sum":0.0,"ndv":6.0}]     
                              |
-| min_dwell_time  | INT         | 1.0         | 1           | 
[{"lower_expr":"2","upper_expr":"22","count":6.0,"pre_sum":0.0,"ndv":6.0}]      
                              |
-| city            | VARCHAR(20) | 1.0         | 1           | 
[{"lower_expr":"Shanghai","upper_expr":"Shenzhen","count":6.0,"pre_sum":0.0,"ndv":4.0}]
                       |
-| user_id         | LARGEINT    | 1.0         | 1           | 
[{"lower_expr":"10000","upper_expr":"10004","count":6.0,"pre_sum":0.0,"ndv":5.0}]
                             |
-| sex             | TINYINT     | 1.0         | 1           | 
[{"lower_expr":"0","upper_expr":"1","count":6.0,"pre_sum":0.0,"ndv":2.0}]       
                              |
-| max_dwell_time  | INT         | 1.0         | 1           | 
[{"lower_expr":"3","upper_expr":"22","count":6.0,"pre_sum":0.0,"ndv":6.0}]      
                              |
-| last_visit_date | DATETIME    | 1.0         | 1           | 
[{"lower_expr":"2017-10-01 06:00:00","upper_expr":"2017-10-03 
10:20:22","count":6.0,"pre_sum":0.0,"ndv":6.0}] |
-| age             | SMALLINT    | 1.0         | 1           | 
[{"lower_expr":"20","upper_expr":"35","count":6.0,"pre_sum":0.0,"ndv":4.0}]     
                              |
-+-----------------+-------------+-------------+-------------+---------------------------------------------------------------------------------------------------------------+
-```
-
-- View histogram information for the `city`, `age`, and `sex` columns of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> SHOW COLUMN HISTOGRAM stats_test.example_tbl(city, age, sex);
-+-------------+-------------+-------------+-------------+----------------------------------------------------------------------------------------+
-| column_name | data_type   | sample_rate | num_buckets | buckets              
                                                                  |
-+-------------+-------------+-------------+-------------+----------------------------------------------------------------------------------------+
-| city        | VARCHAR(20) | 1.0         | 1           | 
[{"lower_expr":"Shanghai","upper_expr":"Shenzhen","count":6.0,"pre_sum":0.0,"ndv":4.0}]|
-| sex         | TINYINT     | 1.0         | 1           | 
[{"lower_expr":"0","upper_expr":"1","count":6.0,"pre_sum":0.0,"ndv":2.0}]       
       |
-| age         | SMALLINT    | 1.0         | 1           | 
[{"lower_expr":"20","upper_expr":"35","count":6.0,"pre_sum":0.0,"ndv":4.0}]     
       |
-+-------------+-------------+-------------+-------------+----------------------------------------------------------------------------------------+
-```
-
-Buckets description:
-
-> The buckets of each column are returned in JSON format. Buckets are ordered from smallest to largest; each bucket contains its lower and upper bounds, the number of elements it holds, the NDV of those elements, and the number of elements in all preceding buckets. The row count of a column (row_count) = the element count of the last bucket (count) + the element count of all preceding buckets (pre_sum). The column below has a row count of 17.
-
-```JSON
-[
-    {
-        "lower_expr": 2,
-        "upper_expr": 7,
-        "count": 6,
-        "pre_sum": 0,
-        "ndv": 6
-    },
-    {
-        "lower_expr": 10,
-        "upper_expr": 20,
-        "count": 11,
-        "pre_sum": 6,
-        "ndv": 11
-    }
-]
-```
-
-## Modifying statistics
-
-Users modify statistics via the `ALTER` statement, modifying the corresponding statistics of a column according to the provided parameters.
+Users can adjust statistics via the `ALTER` statement.
 
 ```SQL
 ALTER TABLE table_name MODIFY COLUMN column_name SET STATS ('stat_name' = 
'stat_value', ...) [ PARTITION (partition_name) ];
@@ -680,35 +206,8 @@ ALTER TABLE table_name MODIFY COLUMN column_name SET STATS ('stat_name' = 'stat_
 - table_name: The table whose statistics are to be modified. Can be in `db_name.table_name` form.
 - column_name: The target column. Must be a column that exists in `table_name`; the statistics of only one column can be modified at a time.
 - stat_name and stat_value: The corresponding statistic name and its value; separate multiple statistics with commas. The statistics that can be modified include `row_count`, `ndv`, `num_nulls`, `min_value`, `max_value`, and `data_size`.
-- partition_name: The target partitions. Must be partitions that exist in `table_name`; separate multiple partitions with commas.
-
-Examples:
-
-- Modify the `row_count` statistic of the `age` column of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> ALTER TABLE stats_test.example_tbl MODIFY COLUMN age SET STATS 
('row_count'='6001215');
-mysql> SHOW COLUMN STATS stats_test.example_tbl(age);
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-| column_name | count     | ndv  | num_null | data_size | avg_size_byte | min  
| max  |
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-| age         | 6001215.0 | 0.0  | 0.0      | 0.0       | 0.0           | N/A  
| N/A  |
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-```
 
-- Modify the `row_count`, `num_nulls`, and `data_size` statistics of the `age` column of the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> ALTER TABLE stats_test.example_tbl MODIFY COLUMN age SET STATS 
('row_count'='6001215', 'num_nulls'='2023', 'data_size'='600121522');
-mysql> SHOW COLUMN STATS stats_test.example_tbl(age);
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-| column_name | count     | ndv  | num_null | data_size | avg_size_byte | min  
| max  |
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-| age         | 6001215.0 | 0.0  | 2023.0   | 600121522 | 0.0           | N/A  
| N/A  |
-+-------------+-----------+------+----------+-----------+---------------+------+------+
-```
-
-## Deleting statistics
+#### Deleting statistics
 
 Users delete statistics via the `DROP` statement; based on the provided parameters, the statistics of the specified table, partitions, or columns are deleted. Both column statistics and column histogram information are deleted.
 
@@ -718,56 +217,61 @@ mysql> SHOW COLUMN STATS stats_test.example_tbl(age);
 DROP [ EXPIRED ] STATS [ table_name [ (column_name [, ...]) ] ];
 ```
 
-Where:
-
-- table_name: The table whose statistics are to be deleted. Can be in `db_name.table_name` form.
-- column_name: The target columns. Must be columns that exist in `table_name`; separate multiple column names with commas.
-- expired: Statistics cleanup. A table cannot be specified; invalid statistics and expired statistics job information in the system are deleted.
 
-Examples:
-
-- Clean up statistics, using the following syntax:
-
-```SQL
-mysql> DROP EXPIRED STATS;
-```
+#### Deleting an Analyze job
 
-- Delete statistics for the `example_tbl` table, using the following syntax:
-
-```SQL
-mysql> DROP STATS stats_test.example_tbl;
-```
-
-- Delete statistics for the `city`, `age`, and `sex` columns of the `example_tbl` table, using the following syntax:
+Deletes an automatic/periodic Analyze job by its job id:
 
-```SQL
-mysql> DROP STATS stats_test.example_tbl(city, age, sex);
+```sql
+DROP ANALYZE JOB [JOB_ID]
 ```
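+
+For example, assuming `52357` is the ID of a previously submitted job (the value here is illustrative):
+
+```sql
+DROP ANALYZE JOB 52357;
+```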
 
-## ANALYZE configuration items
+### Viewing the execution status of automatic collection jobs
 
-Deletes an automatic/periodic Analyze job by its job id
+After the automatic collection feature is enabled, this command shows the completion status of automatic collection jobs.
 
 ```sql
-DROP ANALYZE JOB [JOB_ID]
+SHOW AUTO ANALYZE [table_name]
+    [ WHERE [ STATE = [ "PENDING" | "RUNNING" | "FINISHED" | "FAILED" ] ] ];
 ```
 
-## Full auto analyze
+Automatic collection jobs do not support viewing the detailed completion status or failure reason of each individual column. By default, only the statuses of the last 20000 completed automatic collection jobs are retained.
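+
+For example, to list the automatic collection jobs that failed for one table (reusing the sample table name from this document):
+
+```sql
+SHOW AUTO ANALYZE stats_test.example_tbl WHERE STATE = "FAILED";
+```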
 
-Users can use the option `enable_full_auto_analyze` to decide whether to enable full auto analyze. If enabled, Doris automatically analyzes all databases except some internal ones (such as `information_db`) and ignores AUTO jobs. This option is true by default.
+## Configuration items
 
-## Other ANALYZE configuration items
-
-| conf                                                    | comment | default value                  |
+|fe conf option                                           | comment | default value                  |
 |---------------------------------------------------------|---------|--------------------------------|
 | statistics_sql_parallel_exec_instance_num               | Controls the number of concurrent instances / pipeline tasks on the BE side for each statistics collection SQL | 1                              |
 | statistics_sql_mem_limit_in_bytes                       | Controls the BE memory each statistics SQL may occupy | 2L * 1024 * 1024 * 1024 (2GiB) |
 | statistics_simultaneously_running_task_num              | After asynchronous jobs are submitted via `ANALYZE TABLE[DATABASE]`, the number of columns that can be analyzed simultaneously; all asynchronous tasks are jointly constrained by this parameter | 5                              |
 | analyze_task_timeout_in_minutes                         | Timeout for AnalyzeTask execution | 12 hours                       |
-| full_auto_analyze_start_time/full_auto_analyze_end_time | Time window in which full auto analyze runs; full auto analyze is not triggered outside this window | 00:00:00-02:00:00              |
-|stats_cache_size| The actual memory usage of the statistics cache depends heavily on the characteristics of the data, since the average size of maximum/minimum values and the number of histogram buckets vary greatly across datasets and scenarios; factors such as the JVM version also have an effect. Here is the memory occupied by a statistics cache holding 10_0000 items, where the average length of maximum/minimum values per item is 32, the average column name length is 16, and each column has a histogram with 128 buckets: the cache occupies 911.954833984MiB in total. Without histograms, it occupies 61.2777404785MiB in total. Analyzing columns with very large string values is strongly discouraged, as it may cause FE OOM. | 10_0000                        |
-|full_auto_analyze_simultaneously_running_task_num| Controls the number of concurrently running full auto analyze tasks | 5                              |
-|parallel_sync_analyze_task_num| The number of columns that can be analyzed simultaneously each time a synchronous job is submitted via `ANALYZE...WITH SYNC` | 2                              |
+|stats_cache_size| The actual memory usage of the statistics cache depends heavily on the characteristics of the data, since the average size of maximum/minimum values varies greatly across datasets and scenarios; factors such as the JVM version also have an effect. Here is the memory occupied by a statistics cache holding 100000 items, where the average length of maximum/minimum values per item is 32 and the average column name length is 16: the cache occupies 61.2777404785MiB in total. Analyzing columns with very large string values is strongly discouraged, as it may cause FE OOM. | 100000                        |
+|enable_auto_sample|Whether to enable automatic sampling for large tables; when enabled, statistics are collected automatically by sampling for tables larger than huge_table_lower_bound_size_in_bytes| false|
+|auto_analyze_job_record_count|Controls how many execution records of automatically triggered statistics collection jobs are persisted|20000|
+|huge_table_default_sample_rows|Defines the number of rows sampled from large tables when automatic sampling is enabled|4194304|
+|huge_table_lower_bound_size_in_bytes|Defines the lower size bound of large tables; when enable_auto_sample is enabled, statistics of tables larger than this value are collected automatically by sampling|5368709120|
+|huge_table_auto_analyze_interval_in_millis|Controls the minimum interval between automatic ANALYZE runs on large tables; within this interval, a table larger than huge_table_lower_bound_size_in_bytes is analyzed only once|43200000|
+|table_stats_health_threshold|Takes a value between 0 and 100; when the amount of updated data since the last statistics collection reaches (100 - table_stats_health_threshold)%, the table's statistics are considered outdated|80|
+
+|Session variable|Description|Default value|
+|---|---|---|
+|full_auto_analyze_start_time|Start time of automatic statistics collection|00:00:00|
+|full_auto_analyze_end_time|End time of automatic statistics collection|02:00:00|
+|enable_full_auto_analyze|Enables the automatic collection feature|true|
+
+Note: The session variables listed above must be set globally via `SET GLOBAL`.
+
+## Usage recommendations
+
+Based on our tests, on tables with a data size (i.e., the actual storage space occupied) below 128GiB, there is no need to change the default configuration, apart from the execution window of the automatic collection feature.
+
+Depending on the cluster configuration, automatic collection jobs typically occupy about 20% of CPU resources, so users should adjust the execution window of the automatic collection feature according to their business situation, to avoid resource contention during peak business hours.
+
+Since ANALYZE is a resource-intensive operation, it is best not to run such operations during peak business hours whenever possible, to avoid disrupting the business; ANALYZE operations are also more likely to fail when the cluster load is high. For the same reason, we recommend that users avoid a full ANALYZE of entire databases and tables. Usually, it is sufficient to ANALYZE the columns that frequently appear in predicate conditions, JOIN conditions, aggregation fields, and ID fields. If the SQL a user submits involves many such operations and the tables have no statistics or very stale statistics, we recommend:
+
+* ANALYZE the columns involved before submitting a complex query, because a badly planned complex query will occupy a great deal of system resources and can very easily fail due to resource exhaustion or timeout
+* If a periodic data import routine is configured for Doris, run ANALYZE after the import finishes, so that subsequent query planning can use the latest statistics. This can be automated with Doris's existing job scheduling framework
+* When a table's data changes significantly, for example after a new table is created and its data import completes, run ANALYZE on the corresponding table.
 
 ## Common issues
 
@@ -778,3 +282,9 @@ The SQL execution time is controlled by the `query_timeout` session variable, which defaults to 300
 ### ANALYZE submission error: Stats table not available...
 
 When ANALYZE is executed, statistics are written to the internal table `__internal_schema.column_statistics`. FE checks the tablet status of this table before executing ANALYZE; if any tablets are unavailable, the job is rejected. If this error occurs, please check the status of the BE cluster.
+
+Users can run `SHOW BACKENDS\G` to check whether the BE status is normal. If the BE status is normal, the command `ADMIN SHOW REPLICA STATUS FROM __internal_schema.[tbl_in_this_db]` can be used to check the tablet status in this database and make sure it is normal.
+
+### ANALYZE failures on large tables
+
+Because the resources that ANALYZE can use are rather strictly limited, ANALYZE operations on some large tables may time out or exceed the BE memory limit. In these cases, it is recommended to use `ANALYZE ... WITH SAMPLE...`.
diff --git a/fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java b/fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java
index 2df7b9c358d..185a582cde4 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/statistics/OlapAnalysisTask.java
@@ -37,6 +37,7 @@ import org.apache.commons.text.StringSubstitutor;
 import java.security.SecureRandom;
 import java.util.ArrayList;
 import java.util.Collection;
+import java.util.Collections;
 import java.util.Date;
 import java.util.HashMap;
 import java.util.List;
@@ -121,8 +122,9 @@ public class OlapAnalysisTask extends BaseAnalysisTask {
         List<Long> tabletIds = pair.first;
         double scaleFactor = (double) tbl.getRowCount() / (double) pair.second;
         // might happen if row count in fe metadata hasn't been updated yet
-        if (Double.isInfinite(scaleFactor)) {
+        if (Double.isInfinite(scaleFactor) || Double.isNaN(scaleFactor)) {
             scaleFactor = 1;
+            tabletIds = Collections.emptyList();
         }
         String tabletStr = tabletIds.stream()
                 .map(Object::toString)


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
