eldenmoon opened a new pull request, #63718:
URL: https://github.com/apache/doris/pull/63718
### What problem does this PR solve?
Issue Number: close DORIS-24846
Related PR: #xxx
Problem Summary: `DataTypeVariantSerDe::write_column_to_arrow` always casts
the Arrow builder to `arrow::StringBuilder`. During Parquet OUTFILE export, the
Arrow block converter can switch utf8 columns to `large_utf8` when a batch is
large. In that case the VARIANT serializer receives an `arrow::LargeStringBuilder`,
and the bad cast crashes the BE.
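Arrow's offset widths explain why a large batch can end up as `large_utf8`. A `utf8` array stores int32 value offsets, so a single array can address only about 2 GiB of string data. `large_utf8` uses int64 offsets and lifts that limit. The sketch below is a hypothetical illustration of that decision, not the Doris block converter. The enum, the function names, and the use of `INT32_MAX` as the exact threshold are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical enum: which Arrow string type a converter would pick.
enum class ArrowStringType { kUtf8, kLargeUtf8 };

// Arrow `utf8` arrays store value offsets as int32, so the concatenated
// payload of one array is capped near INT32_MAX bytes; `large_utf8` uses
// int64 offsets. Assumption: the switch is driven purely by that limit.
ArrowStringType choose_string_type(std::uint64_t total_payload_bytes) {
    constexpr auto kMaxUtf8Bytes =
            static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
    return total_payload_bytes > kMaxUtf8Bytes ? ArrowStringType::kLargeUtf8
                                               : ArrowStringType::kUtf8;
}

// Sum the byte length of every value that would land in one Arrow array.
std::uint64_t payload_bytes(const std::vector<std::string>& values) {
    std::uint64_t total = 0;
    for (const auto& v : values) {
        total += v.size();
    }
    return total;
}
```

Whatever rule the real converter applies, the consequence for this PR is the same: a given VARIANT column can reach the serializer with either builder type, depending on batch size.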
This patch handles both `arrow::StringBuilder` and
`arrow::LargeStringBuilder` for VARIANT Arrow serialization and adds a BE UT
that reproduces the LargeStringBuilder path.
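The following is a minimal sketch of the fix pattern: check the builder's actual type before casting, and share one templated append loop between both offset widths. `ArrayBuilder`, `StringBuilder`, `LargeStringBuilder`, `BuilderKind` and `write_variant_rows` are simplified hypothetical stand-ins so the example compiles without Arrow. In Arrow C++, the equivalent check would typically read `builder->type()->id()` and compare it against `arrow::Type::STRING` / `arrow::Type::LARGE_STRING`. The actual patch may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-ins for arrow::ArrayBuilder, arrow::StringBuilder and
// arrow::LargeStringBuilder, reduced to what the dispatch needs.
enum class BuilderKind { kString, kLargeString, kOther };

struct ArrayBuilder {
    virtual ~ArrayBuilder() = default;
    virtual BuilderKind kind() const = 0;
};

// Offsets-plus-data layout; OffsetT is int32 for utf8 and int64 for large_utf8.
template <typename OffsetT, BuilderKind Kind>
struct StringBuilderT : ArrayBuilder {
    std::vector<OffsetT> offsets{0};
    std::string data;
    BuilderKind kind() const override { return Kind; }
    void append(std::string_view value) {
        data.append(value);
        offsets.push_back(static_cast<OffsetT>(data.size()));
    }
};

using StringBuilder = StringBuilderT<std::int32_t, BuilderKind::kString>;
using LargeStringBuilder = StringBuilderT<std::int64_t, BuilderKind::kLargeString>;

// One append loop, instantiated once per concrete builder type.
template <typename BuilderT>
void append_serialized(BuilderT& builder, const std::vector<std::string>& rows) {
    for (const auto& row : rows) {
        builder.append(row);
    }
}

// Dispatch on the builder's real type instead of assuming utf8.
// Returns false for an unsupported builder rather than casting blindly.
bool write_variant_rows(ArrayBuilder& builder, const std::vector<std::string>& rows) {
    switch (builder.kind()) {
    case BuilderKind::kString:
        append_serialized(static_cast<StringBuilder&>(builder), rows);
        return true;
    case BuilderKind::kLargeString:
        append_serialized(static_cast<LargeStringBuilder&>(builder), rows);
        return true;
    default:
        return false;
    }
}
```

In this sketch, returning an error for any other builder type turns a future mismatch into a reportable failure instead of a crash. The template keeps a single code path for both offset widths.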
### Release note
Fix BE crash when exporting VARIANT columns to Parquet OUTFILE with large
Arrow string batches.
### Check List (For Author)
- Test: Unit Test
    - `./run-be-ut.sh --run --filter='DataTypeSerDeTest.VariantWriteColumnToArrowSupportsLargeString'`
    - `./run-be-ut.sh --run --filter='DataTypeSerDeTest.*'`
    - `PATH=/mnt/disk1/claude-max/ldb_toolchain16/bin:$PATH build-support/check-format.sh`
- Behavior changed: Yes. VARIANT Arrow serialization now supports
`large_utf8` builders instead of aborting on a bad builder cast.
- Does this need documentation: No
### Notes
`build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN --base upstream/master`
was attempted, but it is blocked by pre-existing diagnostics in the checked
files. These include an unmatched `NOLINTEND` in `core/types.h` and existing
modernize/readability findings in `data_type_variant_serde.cpp` and
`data_type_serde_test.cpp`. A new signed/unsigned warning introduced while
developing this patch was fixed before the final test run.
--
This is an automated message from the Apache Git Service.