weizuo93 opened a new issue #4797:
URL: https://github.com/apache/incubator-doris/issues/4797


   MemTable flush can be triggered in three situations:
   A. The size of the MemTable reaches the flush threshold
`config::write_buffer_size`;
   B. The data load has finished and the DeltaWriter needs to be closed;
   C. Memory consumption exceeds the limit and the MemTable must be flushed to
reduce memory usage.
   
   When small batches of data (far smaller than `config::write_buffer_size`)
are loaded frequently, "situation B" above generates a large number of small
segment files, which lowers the efficiency of scan operations.
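
   To make the effect concrete, here is a minimal C++ sketch that counts the
segment files produced under the current behavior. It is a simplified model,
not Doris code: the function name `segments_written` is hypothetical, sizes are
plain byte counts, and it assumes each load flushes once per full buffer
(situation A) and once more on close for any remainder (situation B).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not Doris code: counts the segment files written when
// each load flushes on buffer-full (situation A) and again on close (situation B).
std::size_t segments_written(const std::vector<std::size_t>& load_sizes,
                             std::size_t write_buffer_size) {
    assert(write_buffer_size > 0);  // the threshold must be positive
    std::size_t segments = 0;
    for (std::size_t bytes : load_sizes) {
        // Situation A: one segment per full write buffer.
        segments += bytes / write_buffer_size;
        // Situation B: whatever is left in the MemTable is flushed on close.
        if (bytes % write_buffer_size != 0) {
            ++segments;
        }
    }
    return segments;
}
```

   In this model, every load smaller than the threshold yields its own
segment, so the segment count grows linearly with the number of small loads
rather than with the amount of data.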
   
   We can optimize the MemTable flush mechanism like this:
   (1) Maintain a `vector<MemTable>` for each tablet;
   (2) When closing the DeltaWriter and flushing the MemTable for a tablet, if
no flush has happened earlier for this tablet during the data load, do not
reset the MemTable; push it into `vector<MemTable>` instead;
   (3) When the next flush operation for this tablet arrives, determine which
situation triggered it:
         a.  "situation A": Merge all MemTables in `vector<MemTable>` into the
current MemTable, flush the merged MemTable, delete all rowsets corresponding
to the MemTables in `vector<MemTable>`, and clear `vector<MemTable>`;
         b.  "situation B": If the total size of the MemTables in
`vector<MemTable>` plus the current MemTable reaches the threshold
`config::write_buffer_size`, merge all MemTables in `vector<MemTable>` into the
current MemTable, flush the merged MemTable, delete all rowsets corresponding
to the MemTables in `vector<MemTable>`, and clear `vector<MemTable>`. If the
total size is less than the threshold `config::write_buffer_size`, push the
current MemTable into `vector<MemTable>` and flush only the current MemTable;
         c.  "situation C": Flush the current MemTable, reset all MemTables in
`vector<MemTable>`, and clear `vector<MemTable>`.
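
   The decision logic in step (3) could be sketched roughly as follows. This is
only an illustration under stated assumptions, not the actual Doris
implementation: `FlushTrigger`, `MemTableStub`, `FlushPlan` and
`TabletFlushState` are hypothetical names, a MemTable is reduced to its size in
bytes, and the condition in step (2) (retain only when the load had no earlier
flush) is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical; this models the proposal, not Doris code.
enum class FlushTrigger { kBufferFull, kLoadClosed, kMemLimit };  // situations A, B, C

struct MemTableStub { std::size_t bytes; };  // stand-in for a real MemTable

struct FlushPlan {
    std::size_t flushed_bytes = 0;      // size of the MemTable written by this flush
    std::size_t rowsets_to_delete = 0;  // small rowsets superseded by a merged flush
};

class TabletFlushState {
public:
    explicit TabletFlushState(std::size_t write_buffer_size)
        : threshold_(write_buffer_size) {
        assert(threshold_ > 0);
    }

    FlushPlan on_flush(FlushTrigger trigger, MemTableStub current) {
        std::size_t retained_bytes = 0;
        for (const MemTableStub& m : retained_) retained_bytes += m.bytes;

        // Merge on situation A, or on situation B once the total reaches the threshold.
        const bool merge =
            trigger == FlushTrigger::kBufferFull ||
            (trigger == FlushTrigger::kLoadClosed &&
             retained_bytes + current.bytes >= threshold_);

        FlushPlan plan;
        if (merge) {
            plan.flushed_bytes = retained_bytes + current.bytes;
            plan.rowsets_to_delete = retained_.size();  // one rowset per retained MemTable
            retained_.clear();
        } else if (trigger == FlushTrigger::kLoadClosed) {
            // Situation B below the threshold: flush only the current MemTable,
            // but keep it in memory so a later flush can merge it.
            plan.flushed_bytes = current.bytes;
            retained_.push_back(current);
        } else {
            // Situation C: flush the current MemTable and release the retained ones.
            plan.flushed_bytes = current.bytes;
            retained_.clear();
        }
        return plan;
    }

    std::size_t retained_count() const { return retained_.size(); }

private:
    std::size_t threshold_;
    std::vector<MemTableStub> retained_;  // the per-tablet vector<MemTable>
};
```

   For example, with a threshold of 100 bytes, three closing loads of 30, 40
and 50 bytes would flush 30 and 40 bytes separately and retain both MemTables;
the third load brings the total to 120 bytes, so a single merged MemTable is
flushed and the two earlier small rowsets are marked for deletion.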
   

