xinyiZzz opened a new pull request, #10170:
URL: https://github.com/apache/incubator-doris/pull/10170

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   ### Motivation
   TABLESAMPLE limits the number of rows read from a table in the FROM clause.
   
   It is useful for data exploration, quickly verifying that SQL is correct, and collecting table statistics.
   
   ### Grammar
   ```
   [TABLET tids] TABLESAMPLE n [ROWS | PERCENT] [REPEATABLE seek]
   ```
   
   TABLESAMPLE limits the number of rows read from the table in the FROM clause.
   It pseudo-randomly selects a number of tablets from the table according to the specified number of rows or percentage.
   Specifying `seek` in REPEATABLE returns the same sample again.
   Tablet IDs can also be specified manually with `TABLET tids`.
   Note that this can only be used for OLAP tables.
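A minimal sketch of how the `n [ROWS | PERCENT]` part could be turned into a target row count. The helper name `target_row_count` and the rule that PERCENT is applied to the table's row count from statistics (rounded up) are assumptions for illustration, not taken from the PR.

```python
import math


def target_row_count(n: int, unit: str, table_rows: int) -> int:
    """Hypothetical helper: map `TABLESAMPLE n ROWS|PERCENT` to a row target."""
    unit = unit.upper()
    if unit == "ROWS":
        # ROWS: the requested count is used directly.
        return n
    if unit == "PERCENT":
        # Assumption: PERCENT is taken of the table's row count from statistics,
        # rounded up so a small non-zero percentage still samples something.
        if not 0 <= n <= 100:
            raise ValueError("PERCENT must be between 0 and 100")
        return math.ceil(table_rows * n / 100)
    raise ValueError(f"unsupported unit: {unit}")


print(target_row_count(1000, "ROWS", 1_000_000))  # 1000
print(target_row_count(10, "PERCENT", 12_345))    # 1235
```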
   
   ### Example
   ```
   SELECT * FROM t1 TABLET(10001) TABLESAMPLE(1000 ROWS) REPEATABLE (2) LIMIT 1000;
   ```
   
   Pseudo-randomly samples about 1000 rows from t1, and additionally reads tablet 10001.
   Note that whole tablets are selected based on the table's statistics,
   so the total number of rows in the selected tablets may be greater than 1000.
   To cap the result at 1000 rows, add a LIMIT clause.
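The overshoot comes from reading whole tablets. A rough illustration, assuming every tablet holds the same number of rows; the helper `rows_read_and_returned` and the numbers are hypothetical:

```python
import math


def rows_read_and_returned(target_rows, avg_tablet_rows, limit=None):
    """Illustrate why a LIMIT is needed when sampling whole tablets."""
    # Whole tablets are read, so the tablet count is rounded up.
    tablets = math.ceil(target_rows / avg_tablet_rows)
    rows_read = tablets * avg_tablet_rows
    # Without LIMIT, every row of the chosen tablets is returned.
    rows_returned = rows_read if limit is None else min(rows_read, limit)
    return tablets, rows_read, rows_returned


print(rows_read_and_returned(1000, 600))              # (2, 1200, 1200)
print(rows_read_and_returned(1000, 600, limit=1000))  # (2, 1200, 1000)
```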
   
   ### Design
   1. Based on the number of partitions, determine how many rows to sample from each partition.
   2. Based on the average number of rows per tablet, determine how many tablets to select from each partition.
   3. If `seek` is not specified, pseudo-randomly select that many tablets from each partition.
      If `seek` is specified, select them sequentially, starting from the `seek`-th tablet of the partition.
   4. Add any manually specified tablet IDs to the selected tablets.
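A minimal sketch of these four steps, assuming rows are split evenly across partitions, the tablet count is rounded up from the partition's average tablet size, and the sequential walk from `seek` wraps around the partition's tablet list. `Partition` and `select_tablets` are hypothetical names, not Doris FE classes.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Partition:
    # Hypothetical model: tablet IDs in storage order and the partition's row count.
    tablet_ids: list
    row_count: int


def select_tablets(partitions, target_rows, seek=None, manual_tablet_ids=()):
    """Sketch of the tablet selection steps above; details are assumptions."""
    selected = []
    # Step 1 (assumption: even split): rows to sample from each partition.
    rows_per_partition = target_rows / max(len(partitions), 1)
    for part in partitions:
        n = len(part.tablet_ids)
        if n == 0:
            continue
        # Step 2: tablets needed, using the partition's average tablet size.
        avg_rows = max(part.row_count / n, 1)
        count = min(n, math.ceil(rows_per_partition / avg_rows))
        if seek is None:
            # Step 3a: no seek, so choose tablets pseudo-randomly.
            selected.extend(random.sample(part.tablet_ids, count))
        else:
            # Step 3b: take tablets in order from position `seek`, wrapping (assumption).
            selected.extend(part.tablet_ids[(seek + i) % n] for i in range(count))
    # Step 4: add manually specified tablets, skipping ones already chosen.
    for tid in manual_tablet_ids:
        if tid not in selected:
            selected.append(tid)
    return selected


parts = [Partition([1, 2, 3, 4], 400), Partition([5, 6, 7, 8], 800)]
print(select_tablets(parts, 400, seek=3, manual_tablet_ids=[10001]))  # [4, 1, 8, 10001]
```

Because the `seek` path is deterministic, running it twice with the same `seek` returns the same tablets, which is the property REPEATABLE relies on.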
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Have unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

