mrhhsg opened a new pull request, #63617:
URL: https://github.com/apache/doris/pull/63617

   ### What problem does this PR solve?
   
   Issue Number: None
   
   Problem Summary: Dictionary-encoded string pages use codewords from the data 
page to index the page dictionary during decoding. Corrupted codeword data can 
reference entries outside the dictionary and lead to out-of-bounds dictionary 
reads. This PR validates dictionary codewords against the dictionary size 
before materializing string columns, predicate columns, dictionary columns, and 
offset-only length reads. Invalid codewords now fail with a corruption status 
instead of indexing outside the dictionary.
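
   The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `DecodeStatus`, `validate_codewords`, and `decode_dict_page` are hypothetical names, and the dictionary is modeled as a plain `std::vector<std::string>` rather than the actual page dictionary structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a status type; not the Status class used in Doris.
struct DecodeStatus {
    bool ok;
    std::string message;
};

// Check every codeword against the dictionary size before any lookup happens.
// A negative codeword or one >= dict_size is reported as corruption.
inline DecodeStatus validate_codewords(const std::vector<int32_t>& codewords,
                                       std::size_t dict_size) {
    for (std::size_t i = 0; i < codewords.size(); ++i) {
        const int32_t code = codewords[i];
        if (code < 0 || static_cast<std::size_t>(code) >= dict_size) {
            return {false, "corruption: codeword " + std::to_string(code) +
                                   " at position " + std::to_string(i) +
                                   " is outside dictionary of size " +
                                   std::to_string(dict_size)};
        }
    }
    return {true, ""};
}

// Materialize string views only after the whole batch has been validated,
// so the output is never partially filled from a corrupted page.
inline DecodeStatus decode_dict_page(const std::vector<std::string>& dict,
                                     const std::vector<int32_t>& codewords,
                                     std::vector<std::string_view>* out) {
    DecodeStatus st = validate_codewords(codewords, dict.size());
    if (!st.ok) {
        return st;
    }
    out->clear();
    out->reserve(codewords.size());
    for (int32_t code : codewords) {
        out->push_back(std::string_view(dict[static_cast<std::size_t>(code)]));
    }
    assert(out->size() == codewords.size());
    return st;
}
```

   Validating the whole batch before the first lookup is one possible design; it keeps the failure path free of partially materialized output, at the cost of an extra pass over the codewords.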
   
   This PR also updates a stale use of the column array view unit test 
helper, which the current master build requires after rebasing.
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test:
       - Unit Test: ./run-be-ut.sh --run --filter=ColumnStringTest.insert_many_dict_data*:PredicateColumnTest.InsertManyDictData*:ColumnDictionaryTest.insert_many_dict_data*:BinaryDictPageTest.TestRejectInvalidDictCodeword* -j 16
       - Manual test: build-support/clang-format.sh
       - Manual test: build-support/check-format.sh
       - Manual test: git diff --check
   - Behavior changed: Yes. Corrupted dictionary-encoded string pages with 
invalid codewords are rejected as corruption instead of being decoded.
   - Does this need documentation: No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

