sollhui opened a new pull request, #61494:
URL: https://github.com/apache/doris/pull/61494
## Problem
In `VerticalSegmentWriter`, after each column's data is written via
`_finalize_column_writer_and_update_meta()`, the column writer is kept alive
until `finalize_columns_index()` is called at the very end. This means all
column writers in the current batch hold their in-memory index structures
simultaneously:
- Ordinal index page offsets
- Zone map min/max data
- Bloom filter bit arrays
- Lucene RAM buffer (inverted index)
For workloads with many columns or with inverted index columns, this inflates
peak memory unnecessarily.
## Solution
Write all index types immediately inside
`_finalize_column_writer_and_update_meta()`, right after the data pages are
written, then call `reset()` on the column writer to release it.
Since every call site of `_finalize_column_writer_and_update_meta()` already
marks the column as fully written, the index data can be flushed and freed at
that point without any correctness risk.
`finalize_columns_index()` is updated to skip already-released writers
(null check), serving as a defensive fallback. The now-dead per-type helper
methods (`_write_ordinal_index`, `_write_zone_map`, etc.) and `clear()` are
removed.
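
The effect on peak memory can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Doris implementation: `ToyColumnWriter`, `ToySegmentWriter`, `flush_and_release`, and `simulate` are hypothetical names, and a plain byte buffer stands in for the ordinal index, zone map, bloom filter, and inverted index buffers. It compares releasing each writer as soon as its column is finalized against holding every writer until a final `finalize_columns_index()` pass.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a column writer; NOT the Doris ColumnWriter API.
struct ToyColumnWriter {
    std::vector<char> index_buffer;  // models all in-memory index structures
    explicit ToyColumnWriter(std::size_t index_bytes) : index_buffer(index_bytes) {}
    // Pretend to write the index pages; returns the number of bytes written.
    std::size_t write_indexes() const { return index_buffer.size(); }
};

struct ToySegmentWriter {
    std::vector<std::unique_ptr<ToyColumnWriter>> writers;
    std::size_t live_bytes = 0;     // index memory currently held
    std::size_t peak_bytes = 0;     // high-water mark of live_bytes
    std::size_t flushed_bytes = 0;  // total index bytes written out

    // Models writing one column's data; its index structures now sit in memory.
    void append_column(std::size_t index_bytes) {
        writers.push_back(std::make_unique<ToyColumnWriter>(index_bytes));
        live_bytes += index_bytes;
        if (live_bytes > peak_bytes) peak_bytes = live_bytes;
    }

    // Flush one column's indexes and free its writer (idempotent).
    void flush_and_release(std::size_t cid) {
        assert(cid < writers.size());
        auto& writer = writers[cid];
        if (!writer) return;  // already released: nothing left to do
        flushed_bytes += writer->write_indexes();
        live_bytes -= writer->index_buffer.size();
        writer.reset();
    }

    // Defensive fallback: handle any writer that was not released earlier.
    void finalize_columns_index() {
        for (std::size_t cid = 0; cid < writers.size(); ++cid) {
            flush_and_release(cid);
        }
    }
};

struct SimulationResult {
    std::size_t peak_bytes;
    std::size_t flushed_bytes;
};

// eager == true  : release each writer right after its column is finalized.
// eager == false : keep every writer alive until finalize_columns_index().
SimulationResult simulate(const std::vector<std::size_t>& index_sizes, bool eager) {
    ToySegmentWriter segment;
    for (std::size_t cid = 0; cid < index_sizes.size(); ++cid) {
        segment.append_column(index_sizes[cid]);
        if (eager) segment.flush_and_release(cid);
    }
    segment.finalize_columns_index();
    return {segment.peak_bytes, segment.flushed_bytes};
}
```

In this model, both strategies write the same total number of index bytes; only the high-water mark differs. With eager release the peak is bounded by the largest single column rather than by the sum over all columns in the batch.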
## Impact
- Peak memory is reduced in proportion to the number of columns written through
`_finalize_column_writer_and_update_meta()`; this covers the MoW partial update
path and the standard `append_block` batch path
- Particularly effective for tables with inverted indexes (Lucene RAM buffer
freed column-by-column instead of all at once)
- No behavioral change: segment file format is unaffected since each column's
byte offsets are recorded independently in the footer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]