mrhhsg opened a new pull request, #64060:
URL: https://github.com/apache/doris/pull/64060

   ### What problem does this PR solve?
   
   Issue Number: None
   
   Related PR: None
   
   Problem Summary: Arrow block conversion previously switched an oversized UTF8 column to a Large UTF8 builder once the column's byte size reached the Arrow UTF8 limit. However, the output schema still uses the original UTF8 field, and Doris does not support this Large UTF8 conversion path yet. This PR returns a clear InvalidArgument error instead. The message includes the column byte size, the configured limit, and a suggestion to reduce batch_size. It also adds a focused BE unit test that uses a dummy column whose byte_size reaches the limit, so the limit branch is covered without allocating a huge string column.
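A minimal C++ sketch of the idea described above: check the column's byte size before building the Arrow array, return InvalidArgument when it reaches the limit, and exercise that branch with a dummy column that only reports a size. All names here (`Status`, `ColumnLike`, `DummyColumn`, `check_utf8_column_byte_size`, `kDefaultUtf8ByteLimit`) are hypothetical stand-ins, not the code in this PR. The default limit is an assumption based on Arrow's utf8 type using 32-bit offsets; the PR describes the actual limit as configured.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the BE Status type; only what this sketch needs.
class Status {
public:
    static Status OK() { return Status(true, ""); }
    static Status InvalidArgument(std::string msg) { return Status(false, std::move(msg)); }
    bool ok() const { return ok_; }
    const std::string& message() const { return msg_; }

private:
    Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
    bool ok_;
    std::string msg_;
};

// Hypothetical minimal column interface: the check only needs the byte size.
class ColumnLike {
public:
    virtual ~ColumnLike() = default;
    virtual int64_t byte_size() const = 0;
};

// Assumed default limit: Arrow's utf8 type stores 32-bit offsets, so a single
// array's character data cannot exceed INT32_MAX bytes.
constexpr int64_t kDefaultUtf8ByteLimit = std::numeric_limits<int32_t>::max();

// Reject the column up front instead of switching to a Large UTF8 builder.
Status check_utf8_column_byte_size(const ColumnLike& column, int64_t limit) {
    assert(limit > 0 && "limit must be positive");
    const int64_t byte_size = column.byte_size();
    if (byte_size >= limit) {
        std::ostringstream oss;
        oss << "Arrow UTF8 column byte size " << byte_size
            << " reaches the supported limit " << limit
            << "; consider reducing batch_size";
        return Status::InvalidArgument(oss.str());
    }
    return Status::OK();
}

// Test double: reports an arbitrary byte size without allocating any string
// data, so the limit branch can be exercised cheaply in a unit test.
class DummyColumn : public ColumnLike {
public:
    explicit DummyColumn(int64_t reported_size) : reported_size_(reported_size) {}
    int64_t byte_size() const override { return reported_size_; }

private:
    int64_t reported_size_;
};
```

Routing the check through a virtual `byte_size()` is what makes the cheap test possible: a `DummyColumn(kDefaultUtf8ByteLimit)` triggers the error path while holding only a single integer.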
   
   ### Release note
   
   Return a clear error when Arrow UTF8 block conversion reaches the supported column byte-size limit.
   
   ### Check List (For Author)
   
   - Test: Unit Test / Manual test
       - Ran `./build-support/clang-format.sh`
       - Ran `./build-support/check-format.sh`
       - Ran `git diff --check origin/master..HEAD`
       - Ran `DORIS_HOME=/mnt/disk7/hushenggang/doris ninja -C be/ut_build_ASAN src/format/CMakeFiles/Format.dir/arrow/arrow_block_convertor.cpp.o`
       - Ran `DORIS_HOME=/mnt/disk7/hushenggang/doris ninja -C be/ut_build_ASAN test/CMakeFiles/doris_be_test.dir/core/data_type_serde/data_type_serde_arrow_test.cpp.o`
       - Ran `DORIS_HOME=/mnt/disk7/hushenggang/doris ninja -C be/ut_build_ASAN test/doris_be_test`
       - Ran `./run-be-ut.sh --run --filter=DataTypeSerDeArrowTest.RejectOversizedUtf8ColumnByteSize -j 8`
       - Attempted `CLANG_TIDY_BINARY=/mnt/disk6/common/ldb_toolchain_taipan/bin/clang-tidy ./build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN --base origin/master`, but clang-tidy could not analyze the changes because of pre-existing environment/header issues: an unmatched NOLINTEND in `be/src/core/types.h` and `stddef.h` missing from the system/toolchain headers. It also reported pre-existing complexity/function-size warnings in `create_test_block`.
   - Behavior changed: Yes. Oversized UTF8 Arrow block conversion now returns InvalidArgument instead of attempting the unsupported Large UTF8 conversion path.
   - Does this need documentation: No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.