weizuo93 opened a new issue #4797: URL: https://github.com/apache/incubator-doris/issues/4797
MemTable flush can be triggered in three situations:

A. The size of the MemTable reaches the flush threshold `config::write_buffer_size`;
B. The data load has finished and the DeltaWriter needs to be closed;
C. Memory consumption exceeds the limit and the MemTable must be flushed to reduce memory usage.

When small batches of data (far smaller than `config::write_buffer_size`) are loaded frequently, "situation B" generates a large number of small segment files, which lowers the efficiency of scan operations.

We can optimize the MemTable flush mechanism like this:

(1) Maintain a `vector<MemTable>` for each tablet;

(2) When closing the DeltaWriter and flushing the MemTable for a tablet, if there has been no earlier flush for this tablet during the data load, do not reset the MemTable; push it into `vector<MemTable>` instead;

(3) When the next flush operation for this tablet arrives, check which situation triggered it:

a. "situation A": Merge all MemTables in `vector<MemTable>` into the current MemTable, flush the merged MemTable, delete all the rowsets corresponding to the MemTables in `vector<MemTable>`, and clear `vector<MemTable>`;

b. "situation B": If the total size of the MemTables in `vector<MemTable>` plus the current MemTable reaches the threshold `config::write_buffer_size`, merge all MemTables in `vector<MemTable>` into the current MemTable, flush the merged MemTable, delete all the rowsets corresponding to the MemTables in `vector<MemTable>`, and clear `vector<MemTable>`. If the total size is less than `config::write_buffer_size`, push the current MemTable into `vector<MemTable>` and flush only the current MemTable;

c. "situation C": Flush the current MemTable, reset all the MemTables in `vector<MemTable>`, and clear `vector<MemTable>`.
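
The decision in step (3) can be sketched in C++ roughly as follows. This is only an illustration of the proposal, not Doris code: `FlushReason`, `FlushAction`, `TabletFlushState`, `decide_flush`, and `apply_flush` are hypothetical names, each MemTable is modeled by its byte size alone, and the "no earlier flush in this load" condition from step (2) is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Why a flush was requested: situations A, B and C from the proposal.
enum class FlushReason { kBufferFull, kLoadFinished, kMemoryLimit };

// What to do with the current MemTable and the retained ones.
enum class FlushAction {
    kMergeRetainedAndFlush,  // merge retained into current, flush, clear the vector
    kFlushAndRetainCurrent,  // flush current only and keep it in the vector
    kFlushAndDropRetained,   // flush current, reset retained, clear the vector
};

// Hypothetical per-tablet state: stands in for vector<MemTable>,
// keeping only the byte size of each retained MemTable.
struct TabletFlushState {
    std::vector<std::size_t> retained_sizes;

    std::size_t retained_total() const {
        return std::accumulate(retained_sizes.begin(), retained_sizes.end(),
                               std::size_t{0});
    }
};

// Pick the action for step (3). write_buffer_size plays the role of
// config::write_buffer_size.
inline FlushAction decide_flush(FlushReason reason, std::size_t current_size,
                                const TabletFlushState& state,
                                std::size_t write_buffer_size) {
    assert(write_buffer_size > 0);
    switch (reason) {
    case FlushReason::kBufferFull:  // situation A
        return FlushAction::kMergeRetainedAndFlush;
    case FlushReason::kLoadFinished:  // situation B
        return (state.retained_total() + current_size >= write_buffer_size)
                       ? FlushAction::kMergeRetainedAndFlush
                       : FlushAction::kFlushAndRetainCurrent;
    case FlushReason::kMemoryLimit:  // situation C
        return FlushAction::kFlushAndDropRetained;
    }
    return FlushAction::kFlushAndDropRetained;  // unreachable for valid input
}

// Apply the action to the state and return the number of bytes that the
// resulting flush would write. Deleting the old rowsets is not modeled.
inline std::size_t apply_flush(FlushReason reason, std::size_t current_size,
                               TabletFlushState& state,
                               std::size_t write_buffer_size) {
    std::size_t flushed = current_size;
    switch (decide_flush(reason, current_size, state, write_buffer_size)) {
    case FlushAction::kMergeRetainedAndFlush:
        flushed += state.retained_total();
        state.retained_sizes.clear();
        break;
    case FlushAction::kFlushAndRetainCurrent:
        state.retained_sizes.push_back(current_size);
        break;
    case FlushAction::kFlushAndDropRetained:
        state.retained_sizes.clear();
        break;
    }
    return flushed;
}
```

For example, with a threshold of 100 bytes, a finished load of 30 bytes is flushed on its own and retained; a following finished load of 80 bytes brings the total to 110, so both are merged into a single flush and the retained list is cleared.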