pengxiangyu opened a new issue, #10404:
URL: https://github.com/apache/doris/issues/10404

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Description
   
   Currently Doris stores data only on local disk, which makes reads and writes 
fast. But not all data in a database is read or written frequently; most data 
is accessed mainly while it is new. Once data is no longer hot, it still takes 
up disk space. You could delete it, but some of it may become useful again 
later.
   
   So cold data should be saved on cheaper storage, such as BOS/S3/HDFS.
   
   The cold data can then still be read when necessary, directly from remote 
storage.
   
   ### Overall
   
   1. Support remote storage: data will be moved to remote storage (BOS/S3) 
when it becomes cold.
   2. Dynamic partitions are created continuously, so they need to be set to 
cold continuously, based on their creation time.
   3. Meta needs to stay local so that it can be read quickly; data is then 
read according to the meta.
   4. When cold data needs to be read, it is fetched from remote storage.
   5. Remote storage needs to behave like local storage: cold data can be 
read, moved to trash and deleted, but can't be appended to.
   
   ### Detailed design
   BE will resolve the mapping between local disk and remote storage.
   Local disk will hold the meta, which is used to find which data is needed.
   Remote storage will hold the cold data, which will be read by BE.
   ```
                                        FE
                                         |
                                        BE
                             |                        |
                            META                     DATA
                         LOCAL DISK              REMOTE STORAGE
   ```
   1. Support remote storage
   Remote storage configuration will be set in the properties of Create/Alter 
Table:
   a. storage_medium is the storage for hot data.
   b. storage_cold_medium is the destination storage that cold data will be 
moved to.
   c. storage_cooldown_time is the time at which the data becomes cold.
   ```
   CREATE TABLE TblPxy
   (
       aa BIGINT
   )
   ENGINE=olap
   DISTRIBUTED BY HASH (aa) BUCKETS 32
   PROPERTIES(
       "storage_medium" = "SSD",
       "storage_cold_medium" = "S3",
       "storage_cooldown_time" = "2021-11-08 11:52:00"
   );
   ```
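
   As a rough illustration of how these three properties could interact, 
here is a minimal Python sketch. The helper `target_medium` is hypothetical 
and not part of Doris, and treating the exact cooldown instant as already cold 
is an assumption.

```python
from datetime import datetime

# Hypothetical helper: the keys mirror the proposed table properties.
def target_medium(props: dict, now: datetime) -> str:
    """Return the medium the data should live on at time `now`."""
    cooldown = datetime.strptime(props["storage_cooldown_time"], "%Y-%m-%d %H:%M:%S")
    # Assumption: data counts as cold from the cooldown instant onward.
    if now >= cooldown:
        return props["storage_cold_medium"]
    return props["storage_medium"]

props = {
    "storage_medium": "SSD",
    "storage_cold_medium": "S3",
    "storage_cooldown_time": "2021-11-08 11:52:00",
}
```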
   2. Dynamic partition cold data
   Dynamic partitions are created continuously, so the cold time must be 
derived from the partition time.
   a. dynamic_partition.hot_partition_num is the number of partitions that 
stay hot; older partitions will be set to cold.
   b. dynamic_partition.storage_medium is the storage holding hot data.
   c. dynamic_partition.storage_cold_medium is the destination storage for 
cold data.
   ```
   CREATE TABLE TblPxy (
       k1 DATE,
       aa BIGINT
   ) ENGINE=olap PARTITION BY RANGE (k1) ()
   DISTRIBUTED BY HASH (aa) BUCKETS 1
   PROPERTIES(
       "dynamic_partition.hot_partition_num" = "3",
       "dynamic_partition.storage_medium" = "HDD",
       "dynamic_partition.storage_cold_medium" = "S3",
       "dynamic_partition.time_unit" = "DAY",
       "dynamic_partition.start" = "-3",
       "dynamic_partition.end" = "3",
       "dynamic_partition.prefix" = "p",
       "dynamic_partition.buckets" = "32"
   );
   ```
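
   The hot/cold split implied by hot_partition_num can be sketched as 
follows. This is only an illustration: `split_hot_cold` is a hypothetical 
name, and ranking partitions purely by their date, newest first, is an 
assumption about how "older" would be decided.

```python
from datetime import date, timedelta

# Hypothetical helper, not a Doris API.
def split_hot_cold(partition_days, hot_partition_num):
    """Return (hot, cold) lists of partition dates, each newest first."""
    ordered = sorted(partition_days, reverse=True)
    # Assumption: the newest `hot_partition_num` partitions stay hot.
    return ordered[:hot_partition_num], ordered[hot_partition_num:]

# Five daily partitions, matching "dynamic_partition.time_unit" = "DAY".
days = [date(2021, 11, 1) + timedelta(days=i) for i in range(5)]
hot, cold = split_hot_cold(days, hot_partition_num=3)
```

   With hot_partition_num set to 3, the three newest partitions would remain 
on dynamic_partition.storage_medium and the two older ones would move to 
dynamic_partition.storage_cold_medium.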
   3. Read cold data, meta will be local
   When a SELECT touches cold data, BE first reads the meta from local disk 
and chooses which data is needed.
   Then the matched remote data is read and returned to BE.
   ```
   SELECT * FROM TblPxy;
   ```
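
   The read path described in this step (consult local meta first, then 
fetch only the matching remote data) might look roughly like this. 
`SegmentMeta`, its key-range fields, and the `fetch` callback are all 
hypothetical stand-ins; the real meta layout is not specified in this 
proposal.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SegmentMeta:
    """Hypothetical local meta entry describing one remote data file."""
    remote_key: str
    min_key: int
    max_key: int

def read_cold(metas: List[SegmentMeta], lo: int, hi: int,
              fetch: Callable[[str], bytes]) -> List[bytes]:
    """Pick segments overlapping [lo, hi] from local meta, then fetch them."""
    needed = [m for m in metas if m.max_key >= lo and m.min_key <= hi]
    # Only the matched segments are read from remote storage.
    return [fetch(m.remote_key) for m in needed]
```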
   4. Cold data trash
   When cold data needs to be dropped, it is moved to a trash path on remote 
storage, and that remote trash path is recorded in the local trash path.
   The cleaner checks the local trash path; when it is time to delete, the 
remote data is deleted first, and then the local data.
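
   The deletion order above (remote first, then local) can be sketched as 
below. The `local_trash` mapping, the expiry timestamps, and the 
`delete_remote` callback are assumptions made for illustration; keeping the 
local record when the remote delete fails is one possible way to make the 
cleanup retryable.

```python
# Hypothetical cleaner sketch, not Doris code.
def clean_trash(local_trash: dict, now: float, delete_remote) -> list:
    """local_trash maps local trash path -> (remote trash path, expire time).

    Returns the local trash paths that were fully cleaned up.
    """
    removed = []
    for local_path, (remote_path, expire_at) in list(local_trash.items()):
        if now < expire_at:
            continue  # not yet time to delete
        try:
            delete_remote(remote_path)  # remote data is deleted first
        except OSError:
            continue  # keep the local record so a later pass can retry
        del local_trash[local_path]  # then the local trash entry
        removed.append(local_path)
    return removed
```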
   
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

