BaldDemian opened a new pull request, #3479:
URL: https://github.com/apache/fory/pull/3479

   <!--
   **Thanks for contributing to Apache Fory™.**
   
   **If this is your first time opening a PR on fory, you can refer to 
[CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).**
   
   Contribution Checklist
   
       - The **Apache Fory™** community has requirements on the naming of pr 
titles. You can also find instructions in 
[CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).
   
       - Apache Fory™ has a strong focus on performance. If the PR you submit 
will have an impact on performance, please benchmark it first and provide the 
benchmark result here.
   -->
   
   ## Why?
   `buffer.h` reads and writes multi-byte integers (16/32/64-bit) directly from 
and to a raw `uint8_t*` buffer. The original code cast the byte pointer to a **typed** 
pointer and dereferenced it: 
   
   - Read:
     `return reinterpret_cast<const T*>(data_ + offset)[0];`
   - Write:
     `*reinterpret_cast<T*>(data_ + offset) = value;`
   
   Dereferencing a pointer that is not aligned to `alignof(T)` is undefined 
behavior (UB) in C++. Because `data_ + offset` can land on any byte boundary, 
UBSan correctly reported these accesses as misaligned-address runtime errors.
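
To illustrate why an arbitrary byte offset is usually misaligned, here is a minimal sketch. `is_aligned_for` and `count_aligned_offsets` are hypothetical names used only for this illustration and are not part of `buffer.h`. The sketch also assumes that converting a pointer to `std::uintptr_t` yields a plain numeric address, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in buffer.h): true if `ptr` meets the alignment
// requirement of T. Assumes the pointer-to-integer conversion gives a flat
// numeric address.
template <typename T>
bool is_aligned_for(const uint8_t *ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignof(T) == 0;
}

// Walks the first alignof(T) byte offsets of a T-aligned byte array and
// counts how many of them are suitably aligned for T.
template <typename T>
int count_aligned_offsets() {
  alignas(T) uint8_t storage[2 * sizeof(T)] = {};
  int aligned = 0;
  for (std::size_t offset = 0; offset < alignof(T); ++offset) {
    if (is_aligned_for<T>(storage + offset)) {
      ++aligned;
    }
  }
  return aligned;
}
```

Under these assumptions only one offset in every `alignof(T)` consecutive bytes is suitably aligned, so a serialization buffer that packs values back to back will regularly produce misaligned addresses for 16/32/64-bit types.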
   
   
   ## What does this PR do?
   
   <!-- Describe the details of this PR. -->
   Two helper templates were added to `buffer.h`:
   
   ```c++
   template <typename T>
   FORY_ALWAYS_INLINE static T load_unaligned(const uint8_t *ptr) {
       T value;
       std::memcpy(&value, ptr, sizeof(T));
       return value;
   }
   
   template <typename T>
   FORY_ALWAYS_INLINE static void store_unaligned(uint8_t *ptr, T value) {
       std::memcpy(ptr, &value, sizeof(T));
   }
   ```
   
   All `reinterpret_cast` calls that may lead to UB in the file were replaced 
with calls to these helpers.
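
As a rough sketch of how such a replacement might look at a call site, the example below routes typed reads and writes through the two helpers. `TinyBuffer`, `get`, `put`, and `round_trips_at_odd_offset` are hypothetical stand-ins and do not reflect the actual accessors in `buffer.h`. `FORY_ALWAYS_INLINE` is replaced with plain `inline` so that the sketch compiles on its own.

```cpp
#include <cstdint>
#include <cstring>

// Same shape as the helpers in this PR, with plain `inline` standing in for
// FORY_ALWAYS_INLINE so the sketch is self-contained.
template <typename T>
inline T load_unaligned(const uint8_t *ptr) {
  T value;
  std::memcpy(&value, ptr, sizeof(T));
  return value;
}

template <typename T>
inline void store_unaligned(uint8_t *ptr, T value) {
  std::memcpy(ptr, &value, sizeof(T));
}

// Hypothetical stand-in for the real buffer class (assumption, not Fory API).
struct TinyBuffer {
  uint8_t data_[16] = {};

  template <typename T>
  T get(uint32_t offset) const {
    // Before: return reinterpret_cast<const T*>(data_ + offset)[0];
    return load_unaligned<T>(data_ + offset);
  }

  template <typename T>
  void put(uint32_t offset, T value) {
    // Before: *reinterpret_cast<T*>(data_ + offset) = value;
    store_unaligned<T>(data_ + offset, value);
  }
};

// Writes a 32-bit and a 64-bit value at odd offsets and reads them back.
inline bool round_trips_at_odd_offset() {
  TinyBuffer buf;
  buf.put<uint32_t>(1, 0xDEADBEEFu);
  buf.put<uint64_t>(5, 0x0123456789ABCDEFull);
  return buf.get<uint32_t>(1) == 0xDEADBEEFu &&
         buf.get<uint64_t>(5) == 0x0123456789ABCDEFull;
}
```

Because `memcpy` copies bytes without assuming any alignment, the same accessors work at every offset, and the byte order in the buffer stays whatever the host uses, just as with the original casts.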
   
   No UB was detected when running `bazel test --cache_test_results=no 
--config=x86_64 --config=ubsan $(bazel query //...)` after applying this patch. 
   Details can be found in 
[ubsan_report.txt](https://github.com/user-attachments/files/25990662/ubsan_report.txt).
   The only diagnostics left in the report are a few `unused-but-set-parameter` 
warnings, which come from the compiler rather than from UBSan.
   
   ## Related issues
   
   <!--
   Is there any related issue? If this PR closes them you can say fix/closes:
   
   - #xxxx0
   - #xxxx1
   - Fixes #xxxx2
   -->
   
   Fix https://github.com/apache/fory/issues/3459
   
   ## AI Contribution Checklist
   
   <!-- Full requirements and disclosure template:
   
https://github.com/apache/fory/blob/main/AI_POLICY.md#9-contributor-checklist-for-ai-assisted-prs
 -->
   
   - [ ] Substantial AI assistance was used in this PR: `yes` / `no`
   - [ ] If `yes`, I included a completed [AI Contribution 
Checklist](https://github.com/apache/fory/blob/main/AI_POLICY.md#9-contributor-checklist-for-ai-assisted-prs)
 in this PR description and the required `AI Usage Disclosure`.
   
   <!-- If substantial AI assistance = `yes`, paste the completed checklist and 
disclosure block here. -->
   
   ## Does this PR introduce any user-facing change?
   
   <!--
   If any user-facing interface changes, please [open an 
issue](https://github.com/apache/fory/issues/new/choose) describing the need to 
do so and update the document if necessary.
   
   Delete section if not applicable.
   -->
   
   - [ ] Does this PR introduce any public API change?
   - [ ] Does this PR introduce any binary protocol compatibility change?
   
   ## Benchmark
   
   <!--
   When the PR has an impact on performance (if you don't know whether the PR 
will have an impact on performance, you can submit the PR first, and if it will 
have impact on performance, the code reviewer will explain it), be sure to 
attach benchmark data here.
   
   Delete section if not applicable.
   -->
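   I have not benchmarked this change, but I believe replacing 
`reinterpret_cast` with `memcpy` will not add meaningful runtime overhead: 
with optimizations enabled, GCC and Clang typically compile a fixed-size 
`memcpy` like this into a single load or store instruction.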


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

