eldenmoon opened a new issue, #33297:
URL: https://github.com/apache/doris/issues/33297

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   2.1.1
   
   ### What's Wrong?
   
   ## Problem Background
   
   ### Issue 1: FDB Transaction Timeout with Variant in Cloud
   Currently, Variant writes an `fdb kv` for each `rowset`'s `TabletSchema` on `commit_rowset`, because schemas are tracked at rowset granularity and each rowset may have its own schema. However, once the number of rowsets exceeds 1000, the kv size grows beyond 100K, reading it takes more than 10ms, and `get_rowset` operations can hit FDB transaction timeouts. To prevent these timeouts, the schema kv size needs to be reduced, with a target of under 2K.
   
   ### Issue 2: Schema Not Updated on Compaction
   Compaction does not update the schema. When it merges multiple rowsets, each of which may have its own schema, into a new rowset, the output rowset's schema does not accurately reflect the merged inputs. This leaves the output rowset with an inconsistent and incorrect schema.
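As a hedged illustration of what an updated output schema could look like, the Python sketch below merges the schemas of all input rowsets instead of copying just one of them. It assumes, purely for illustration, that columns are identified by name (for Variant subcolumns, a path such as `v.a`) and that inputs are ordered oldest to newest, so newer definitions win. `Column` and `merge_rowset_schemas` are hypothetical names, not Doris APIs.

```python
from typing import Dict, List, NamedTuple


class Column(NamedTuple):
    """Hypothetical, simplified column definition."""
    name: str
    col_type: str


def merge_rowset_schemas(schemas: List[List[Column]]) -> List[Column]:
    """Union the columns of all input rowset schemas.

    Columns are keyed by name (an assumption for this sketch). When the same
    name appears more than once, the definition from the later schema wins,
    while the column keeps the position where it was first seen.
    """
    merged: Dict[str, Column] = {}
    for schema in schemas:
        for col in schema:
            merged[col.name] = col
    return list(merged.values())


older = [Column("k", "INT"), Column("v.a", "BIGINT")]
newer = [Column("k", "INT"), Column("v.b", "STRING")]
output_schema = merge_rowset_schemas([older, newer])
print([c.name for c in output_schema])  # ['k', 'v.a', 'v.b']
```

Copying only the newest input's schema here would drop `v.a`, even though rows from the older rowset still carry it, which is the kind of mismatch this issue describes.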
   
   ## Proposed Fix
   Most of the space in `TabletSchema` is taken by the `column` and `index` fields, so a dictionary compression scheme is proposed: each column and index definition is stored once in a dictionary and referenced by an integer dictionary key, which significantly reduces the storage space required. This involves modifying `RowsetMetaCloudPB` and introducing a `SchemaCloudDictionary` message to manage the dictionary keys.
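The dictionary encoding described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Doris implementation: `Column` is a hypothetical stand-in for `ColumnPB`, `SchemaDictionary`, `encode` and `decode` are hypothetical names, only columns (not indexes) are modeled, and one dictionary is assumed to be shared by all rowsets of a tablet. Each rowset keeps only its list of keys (the role `column_dict_key_list` plays in `RowsetMetaCloudPB`), and a column already in the dictionary is reused rather than stored again.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for ColumnPB; frozen so it can be used as a dict key."""
    unique_id: int
    name: str
    col_type: str


@dataclass
class SchemaDictionary:
    """Hypothetical sketch modeled on SchemaCloudDictionary (columns only)."""
    column_dict: Dict[int, Column] = field(default_factory=dict)
    current_column_index: int = 0  # assumption: next key to hand out
    # Reverse lookup so identical column definitions map to the same key.
    _key_of: Dict[Column, int] = field(default_factory=dict, repr=False)

    def encode(self, columns: List[Column]) -> List[int]:
        """Return the dictionary keys for a rowset's columns, adding unseen ones."""
        keys = []
        for col in columns:
            key = self._key_of.get(col)
            if key is None:
                key = self.current_column_index
                self.current_column_index += 1
                self.column_dict[key] = col
                self._key_of[col] = key
            keys.append(key)
        return keys

    def decode(self, keys: List[int]) -> List[Column]:
        """Rebuild a rowset's column list from its stored keys."""
        return [self.column_dict[k] for k in keys]


# Two rowsets share "k" and "v"; only the second has the Variant subcolumn "v.a".
d = SchemaDictionary()
rowset1_keys = d.encode([Column(0, "k", "INT"), Column(1, "v", "VARIANT")])
rowset2_keys = d.encode([Column(0, "k", "INT"), Column(1, "v", "VARIANT"),
                         Column(2, "v.a", "BIGINT")])
print(rowset1_keys, rowset2_keys)  # [0, 1] [0, 1, 2]
```

Each column definition is stored once in `column_dict`, so the per-rowset metadata shrinks to a short list of integers; the trade-off is an extra dictionary lookup whenever a rowset's schema is read.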
   
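   ```protobuf
   // ColumnPB and TabletIndexPB are the existing column and index messages.
   message RowsetMetaCloudPB {
     // ... (other fields omitted)
     // Keys into SchemaCloudDictionary.column_dict for this rowset's columns.
     repeated int32 column_dict_key_list = 26;
     // ...
   }
   
   message SchemaCloudDictionary {
     map<int32, ColumnPB> column_dict = 1;
     // Key counter for column_dict.
     optional int64 current_column_index = 2;
     map<int32, TabletIndexPB> index_dict = 3;
     // Key counter for index_dict; field numbers must be unique within a message.
     optional int64 current_index_index = 4;
   }
   ```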
   
   
   ### What You Expected?
   
   Optimize the size of the schema kv for Variant.
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

