sollhui opened a new pull request, #61494:
URL: https://github.com/apache/doris/pull/61494

   ## Problem
   
   In `VerticalSegmentWriter`, after each column's data is written via
   `_finalize_column_writer_and_update_meta()`, the column writer is kept alive
   until `finalize_columns_index()` is called at the very end. This means all
   column writers in the current batch hold their in-memory index structures
   simultaneously:
   
   - Ordinal index page offsets
   - Zone map min/max data
   - Bloom filter bit arrays
   - Lucene RAM buffer (inverted index)
   
   For workloads with many columns or with inverted index columns, this raises
   peak memory significantly for no benefit.
   
   ## Solution
   
   Write all index types immediately inside
   `_finalize_column_writer_and_update_meta()`, right after data pages are
   written, then call `reset()` on the column writer.
   Since every call site of `_finalize_column_writer_and_update_meta()` already
   marks the column as fully written, the index data can be flushed and freed at
   that point without any correctness risk.
   
   `finalize_columns_index()` is updated to skip already-released writers
   (null check), serving as a defensive fallback. The now-dead per-type helper
   methods (`_write_ordinal_index`, `_write_zone_map`, etc.) and `clear()` are
   removed.
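   
   The flush-then-release flow described above can be sketched roughly as
   follows. This is a simplified illustration, not the Doris code:
   `ToyColumnWriter`, `ToySegmentWriter`, `finalize_column()` and
   `live_writers()` are hypothetical names, and a `std::string` stands in for
   the segment file. Only `finalize_columns_index()` and the `reset()` call
   mirror names used in this description.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <memory>
   #include <string>
   #include <vector>
   
   // Hypothetical stand-in for a column writer (not the Doris ColumnWriter API).
   // index_buffer models the in-memory ordinal index, zone map, bloom filter, etc.
   struct ToyColumnWriter {
       std::vector<char> index_buffer;
   
       explicit ToyColumnWriter(std::size_t index_bytes) : index_buffer(index_bytes, 0) {}
   
       // Append this column's index bytes to the toy "segment file".
       void write_indexes(std::string& file) const {
           file.append(index_buffer.begin(), index_buffer.end());
       }
   };
   
   class ToySegmentWriter {
   public:
       void add_column(std::size_t index_bytes) {
           writers_.push_back(std::make_unique<ToyColumnWriter>(index_bytes));
       }
   
       // Models the new behavior: flush one column's indexes, then release the writer.
       void finalize_column(std::size_t cid) {
           assert(cid < writers_.size() && writers_[cid] != nullptr);
           writers_[cid]->write_indexes(file_);
           writers_[cid].reset();  // index memory for this column is freed here
       }
   
       // Defensive fallback: flush only writers that were not already released.
       // Returns how many writers it still had to flush.
       std::size_t finalize_columns_index() {
           std::size_t flushed = 0;
           for (auto& writer : writers_) {
               if (writer == nullptr) continue;  // already flushed and released
               writer->write_indexes(file_);
               writer.reset();
               ++flushed;
           }
           return flushed;
       }
   
       // Number of column writers that still hold index memory.
       std::size_t live_writers() const {
           std::size_t live = 0;
           for (const auto& writer : writers_) {
               if (writer != nullptr) ++live;
           }
           return live;
       }
   
       std::size_t file_size() const { return file_.size(); }
   
   private:
       std::vector<std::unique_ptr<ToyColumnWriter>> writers_;
       std::string file_;
   };
   ```
   
   In this sketch, a column finalized through `finalize_column()` is skipped by
   `finalize_columns_index()`, so its index bytes are written exactly once.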
   
   ## Impact
   
   - Peak memory is reduced in proportion to the number of columns written through
     `_finalize_column_writer_and_update_meta()`. This covers the merge-on-write
     (MoW) partial update path and the standard `append_block` batch path
   - Particularly effective for tables with inverted indexes (Lucene RAM buffer
     freed column-by-column instead of all at once)
   - No behavioral change: segment file format is unaffected since each column's
     byte offsets are recorded independently in the footer
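   
   To make the peak-memory claim concrete, here is a toy model, not a
   measurement. It assumes each column allocates its index memory while it is
   written and that nothing else contributes to the peak. The function name
   `peak_index_bytes` and the byte counts used with it are made up for
   illustration.
   
   ```cpp
   #include <algorithm>
   #include <cassert>  // only needed for the checks in the usage note
   #include <cstddef>
   #include <vector>
   
   // Toy model of one batch: column i holds index_bytes[i] once its data is written.
   // release_eagerly == false: every writer stays alive until the end of the batch.
   // release_eagerly == true:  each writer is freed right after its indexes are flushed.
   std::size_t peak_index_bytes(const std::vector<std::size_t>& index_bytes,
                                bool release_eagerly) {
       std::size_t live = 0;
       std::size_t peak = 0;
       for (std::size_t bytes : index_bytes) {
           live += bytes;                // column finishes writing, indexes are in memory
           peak = std::max(peak, live);  // track the high-water mark
           if (release_eagerly) {
               live -= bytes;            // flushed and released before the next column
           }
       }
       return peak;
   }
   ```
   
   With hypothetical sizes `{4, 16, 8}`, the deferred model peaks at the sum
   (`assert(peak_index_bytes({4, 16, 8}, false) == 28)`), while the eager model
   peaks at the largest single column
   (`assert(peak_index_bytes({4, 16, 8}, true) == 16)`).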
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

