eldenmoon opened a new issue, #33297: URL: https://github.com/apache/doris/issues/33297
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

2.1.1

### What's Wrong?

## Problem Background

### Issue 1: FDB Transaction Timeout with Variant in Cloud

Currently, Variant writes an `fdb kv` holding the `TabletSchema` of each `rowset` upon `commit_rowset`. This is because schema granularity is at the rowset level, and each rowset may have its own schema. However, when the number of rowsets exceeds 1000, `get_rowset` operations may hit FDB transaction timeouts: the kv values are large (over 100K), leading to read times over 10ms. To prevent the timeouts, the kv size needs to be reduced, with a target of under 2K.

### Issue 2: Schema Not Updated on Compaction

The compaction process does not update the schema. Compaction may merge rowsets with different schemas into a new rowset, but the output rowset's schema does not reflect the merged result, so it can be inconsistent or incorrect.

## Proposed Fix

Most of the space in `TabletSchema` is taken by the `column` and `index` fields, so dictionary compression is proposed for them. The columns and indexes are stored in a dictionary, and a rowset refers to them by dictionary key, which significantly reduces the storage required. This requires changes to `RowsetMetaCloudPB` and a new `SchemaCloudDictionary` message that manages the dictionary keys.

```
message RowsetMetaCloudPB {
    ...
    repeated int32 column_dict_key_list = 26;
    ...
}

message SchemaCloudDictionary {
    map<int32, ColumnPB> column_dict = 1;
    optional int64 current_column_index = 2;
    map<int32, TabletIndexPB> index_dict = 3;
    optional int64 current_index_index = 4;
}
```

### What You Expected?

Optimize the size of the schema kv for Variant.

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
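
The dictionary encoding described under "Proposed Fix" can be sketched in Python as follows. This is only an illustration of the idea, not the Doris implementation: `Column`, `SchemaDictionary`, `encode`, and `decode` are hypothetical names, `Column` is a simplified stand-in for `ColumnPB`, and the in-memory reverse map is an assumption made to keep lookups short. The sketch assumes each distinct column is stored once in a shared dictionary and that each rowset keeps only a list of integer keys, as `column_dict_key_list` suggests.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    # Hypothetical, simplified stand-in for ColumnPB (frozen so it is hashable).
    name: str
    type: str


@dataclass
class SchemaDictionary:
    # Loosely mirrors the proposed SchemaCloudDictionary: key -> column, plus a counter.
    column_dict: dict = field(default_factory=dict)
    current_column_index: int = 0
    # Assumption for this sketch: an in-memory column -> key map for fast lookup.
    _reverse: dict = field(default_factory=dict)

    def encode(self, columns):
        """Return one dictionary key per column, registering unseen columns."""
        keys = []
        for col in columns:
            key = self._reverse.get(col)
            if key is None:
                self.current_column_index += 1
                key = self.current_column_index
                self.column_dict[key] = col
                self._reverse[col] = key
            keys.append(key)
        return keys

    def decode(self, keys):
        """Rebuild the column list of a rowset from its dictionary keys."""
        return [self.column_dict[k] for k in keys]


# Two rowsets whose schemas overlap share dictionary entries.
dictionary = SchemaDictionary()
rowset_a = [Column("k", "INT"), Column("v.a", "STRING")]
rowset_b = [Column("k", "INT"), Column("v.a", "STRING"), Column("v.b", "DOUBLE")]
keys_a = dictionary.encode(rowset_a)
keys_b = dictionary.encode(rowset_b)
```

With this layout, the per-rowset kv holds a short list of integers instead of full column definitions, and the shared dictionary grows only when a rowset introduces a column that has not been seen before.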