github-actions[bot] commented on code in PR #63564:
URL: https://github.com/apache/doris/pull/63564#discussion_r3295777359
##########
be/src/core/column/column_string.h:
##########
@@ -298,7 +298,10 @@ class ColumnStr final : public COWHelper<IColumn, ColumnStr<T>> {
running_offset += lengths[i];
offsets_ptr[i] = static_cast<T>(running_offset);
}
- chars.resize(offsets[old_rows + num - 1]);
+ // OFFSET_ONLY columns carry valid offsets but no real string payload. Use non-zero
+ // placeholders so char-padding shrink logic cannot recompute these offsets as zero-length
+ // strings when this column is nested under a struct that also contains CHAR fields.
+ chars.resize_fill(offsets[old_rows + num - 1], 1);
Review Comment:
This makes every OFFSET_ONLY string read write one synthetic byte for each
logical byte in the column. The previous `resize()` only advanced the size of
`chars` after allocating, whereas `resize_fill(..., 1)` touches the whole
appended range. For a query such as `select length(big_string_col) ...` with
nested pruning enabled, the BE still needs only the offsets, yet it now performs
O(total string bytes) memory writes per block, which can dominate the scan for
large values. The CHAR/struct shrink issue, however, applies only to the later
`shrink_padding_chars()` path. Please keep the general OFFSET_ONLY path sparse
and fix the shrink path more narrowly, e.g. by preventing shrink from
recomputing offsets for offset-only string children, or by materializing
placeholders only when that specific shrink path is actually required.
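To illustrate the narrower fix suggested above, here is a minimal sketch of a guard inside the shrink path. It is a toy model, not Doris's `ColumnStr`: the type `ToyStringColumn`, the `offset_only` flag, and the helpers `append_lengths` and `length_at` are hypothetical names introduced only for this example, and the real `shrink_padding_chars()` may be structured quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy model of a string column (NOT Doris's ColumnStr).
struct ToyStringColumn {
    std::vector<char> chars;             // concatenated payload bytes
    std::vector<std::uint32_t> offsets;  // offsets[i] = end of row i in chars
    bool offset_only = false;            // assumed flag: payload was never read

    // Append rows for which only the lengths are known.
    // Note: std::vector::resize zero-fills, so this toy still writes the bytes;
    // a PODArray-style resize() would only advance the size without touching them.
    void append_lengths(const std::vector<std::uint32_t>& lengths) {
        std::uint32_t running = offsets.empty() ? 0 : offsets.back();
        for (std::uint32_t len : lengths) {
            running += len;
            offsets.push_back(running);
        }
        chars.resize(running);
    }

    std::uint32_t length_at(std::size_t i) const {
        assert(i < offsets.size());
        return offsets[i] - (i == 0 ? 0 : offsets[i - 1]);
    }

    // Trim each row at its first '\0' (CHAR padding) and rebuild the offsets.
    void shrink_padding_chars() {
        if (offset_only) {
            // Guard: the payload bytes are meaningless here, so recomputing
            // lengths from them would corrupt the (valid) offsets.
            return;
        }
        std::vector<char> kept;
        std::vector<std::uint32_t> new_offsets;
        std::uint32_t begin = 0;
        for (std::uint32_t end : offsets) {
            auto first = chars.begin() + static_cast<std::ptrdiff_t>(begin);
            auto last = chars.begin() + static_cast<std::ptrdiff_t>(end);
            auto stop = std::find(first, last, '\0');
            kept.insert(kept.end(), first, stop);
            new_offsets.push_back(static_cast<std::uint32_t>(kept.size()));
            begin = end;
        }
        chars = std::move(kept);
        offsets = std::move(new_offsets);
    }
};
```

With the guard in place, an offset-only column keeps its row lengths through `shrink_padding_chars()` without needing non-zero placeholder bytes. Without it, the zero-filled payload in this toy would make every row shrink to length 0, which mirrors the failure the patch comment describes.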
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]