pengxiangyu opened a new issue, #10404: URL: https://github.com/apache/doris/issues/10404
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Description

Currently, Doris stores data only on local disk, which makes reads and writes fast. But not all data in a database is read or written frequently; most data is accessed mainly while it is new. Once data is no longer hot, it still occupies disk space. You could delete it, but some of it may become useful again later. Cold data should therefore be saved on cheaper storage, such as BOS/S3/HDFS, and read from remote storage when necessary.

### Overall

1. Support remote storage: data will be moved to remote storage (BOS/S3) when it becomes cold.
2. Dynamic partitions are created continuously, so they need to be set to cold continuously, based on their creation time.
3. Metadata needs to stay local so it can be read quickly; data is then located and read through the metadata.
4. When cold data needs to be read, it is fetched from remote storage.
5. Remote storage needs to behave like local storage: cold data can be read, moved to trash, and deleted, but can't be appended to.

### Detail design

BE will resolve the relation between local disk and remote storage. The local disk holds the metadata, which is used to find which data is needed. Remote storage holds the cold data, which is read by BE.

```
                 FE
                 |
                 BE
        |                 |
META DATA LOCAL DISK   REMOTE STORAGE
```

1. Support remote storage

Remote storage configuration is set in the properties of Create/Alter Table:

a. storage_medium is the storage for hot data.
b. storage_cold_medium is the destination storage that cold data will be moved to.
c. storage_cooldown_time is the time at which the data becomes cold.
```
CREATE TABLE TblPxy
(
    aa BIGINT
)
ENGINE=olap
DISTRIBUTED BY HASH (aa) BUCKETS 32
PROPERTIES(
    "storage_medium" = "SSD",
    "storage_cold_medium" = "S3",
    "storage_cooldown_time" = "2021-11-08 11:52:00"
);
```

2. Dynamic partition cold data

Dynamic partitions are created continuously, so the cold time must be derived from the partition time.

a. dynamic_partition.hot_partition_num is the number of hot partitions to retain; older partitions will be set to cold.
b. dynamic_partition.storage_medium is the storage holding hot data.
c. dynamic_partition.storage_cold_medium is the destination storage for cold data.

```
CREATE TABLE TblPxy
(
    k1 DATE,
    aa BIGINT
)
ENGINE=olap
PARTITION BY RANGE (k1) ()
DISTRIBUTED BY HASH (aa) BUCKETS 1
PROPERTIES(
    "dynamic_partition.hot_partition_num" = "3",
    "dynamic_partition.storage_medium" = "HDD",
    "dynamic_partition.storage_cold_medium" = "S3",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-3",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
);
```

3. Read cold data, meta will be local

When a select is issued and the data is cold, BE first reads the metadata from local disk and chooses which data is needed. The matching remote data is then read and returned to BE.

```
SELECT * FROM TblPxy;
```

4. Cold data trash

When cold data needs to be dropped, it is moved to a trash path on remote storage, and that remote trash path is recorded in the local trash path. The cleaner checks the local trash path; when it is time to delete, the remote data is deleted first, and then the local data.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Use case

_No response_

### Related issues

_No response_
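The cleaner order described under "Cold data trash" (remote data first, then local) can be illustrated with a small Python sketch. This is only a model under stated assumptions, not Doris code: `TrashEntry`, `clean_trash`, the dict standing in for the local trash path, and the set standing in for remote storage are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrashEntry:
    """Hypothetical local trash record that points at a remote trash path."""
    remote_trash_path: str
    expire_at: float  # epoch seconds after which the entry may be purged


def clean_trash(local_trash, remote_store, now):
    """Purge expired entries: delete remote data first, then the local record.

    local_trash:  dict of local trash path -> TrashEntry (stand-in for the local trash dir)
    remote_store: set of remote paths (stand-in for BOS/S3/HDFS)
    Returns the local trash paths that were purged, in sorted order.
    """
    purged = []
    # sorted() makes a copy of the keys, so deleting while looping is safe.
    for local_path in sorted(local_trash):
        entry = local_trash[local_path]
        if now < entry.expire_at:
            continue  # not yet time to delete
        # Step 1: remove the remote data. discard() is a no-op if it is
        # already gone, so a retried cleanup does no harm.
        remote_store.discard(entry.remote_trash_path)
        # Step 2: only then drop the local record.
        del local_trash[local_path]
        purged.append(local_path)
    return purged
```

One plausible reason for this order (an inference, not something the proposal states) is that if the remote delete fails, the local record still exists, so the cleaner can find the remote path again and retry instead of leaving orphaned remote data.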
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org