Reranko05 opened a new issue, #49800:
URL: https://github.com/apache/arrow/issues/49800

   ### Describe the enhancement requested
   
   The current `base64_decode` implementation processes its input one byte at a time using scalar operations.
   
   This came up while I was working on a recent change to improve validation performance (PR #49660). Replacing `find()` with a lookup table there made it clear that the decoding itself is still done sequentially.
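   To make the baseline concrete, here is a minimal sketch of a byte-at-a-time scalar decoder that validates each character with a 256-entry lookup table instead of a `find()` scan over the alphabet. It is not Arrow's actual `base64_decode`. The names `MakeDecodeTable`, `kDecodeTable`, and `DecodeScalar` are hypothetical, and the sketch assumes the standard alphabet with `=` padding and an input length that is a multiple of 4.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: maps every byte value to its 6-bit base64 value, or to
// 0xFF if the byte is not part of the standard alphabet.
constexpr std::array<std::uint8_t, 256> MakeDecodeTable() {
  std::array<std::uint8_t, 256> table{};
  for (auto& v : table) v = 0xFF;
  constexpr char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  for (int i = 0; i < 64; ++i) {
    table[static_cast<unsigned char>(kAlphabet[i])] =
        static_cast<std::uint8_t>(i);
  }
  return table;
}

constexpr std::array<std::uint8_t, 256> kDecodeTable = MakeDecodeTable();

// Byte-at-a-time decoder (hypothetical). Returns std::nullopt on malformed input.
std::optional<std::string> DecodeScalar(std::string_view in) {
  if (in.size() % 4 != 0) return std::nullopt;
  // Drop up to two trailing '=' padding characters.
  std::size_t len = in.size();
  int pad = 0;
  while (pad < 2 && len > 0 && in[len - 1] == '=') {
    --len;
    ++pad;
  }
  std::string out;
  out.reserve(len * 3 / 4);
  std::uint32_t acc = 0;  // bit accumulator carried across iterations
  int bits = 0;           // number of buffered bits not yet emitted
  for (std::size_t i = 0; i < len; ++i) {
    // One table load per character instead of a find() over the alphabet;
    // a stray '=' before the padding also maps to 0xFF and is rejected.
    std::uint8_t v = kDecodeTable[static_cast<unsigned char>(in[i])];
    if (v == 0xFF) return std::nullopt;
    acc = (acc << 6) | v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push_back(static_cast<char>((acc >> bits) & 0xFF));
    }
  }
  return out;
}
```

   The accumulator carries state from each character to the next. That serial dependency is what a vectorized path would try to avoid by working on whole 4-character groups.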
   
   Since base64 decoding follows a regular pattern (every 4 input characters decode to exactly 3 output bytes), it seems likely to benefit from SIMD/vectorized approaches (e.g., AVX2), especially for larger inputs.
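   The 4 → 3 regularity comes from the bit arithmetic: each character carries 6 bits, so 4 × 6 = 24 bits = 3 bytes. The sketch below uses a hypothetical `PackQuads` function and assumes the characters have already been mapped to their 6-bit values. It shows that each group is decoded independently of its neighbours, which is the property a vectorized kernel would exploit by handling many groups per register.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical: decodes n_quads groups of four 6-bit values (each < 64) into
// 3 * n_quads bytes. No state is carried between iterations, so the groups
// could in principle be processed in parallel.
void PackQuads(const std::uint8_t* sextets, std::size_t n_quads,
               std::uint8_t* out) {
  for (std::size_t q = 0; q < n_quads; ++q) {
    const std::uint8_t* s = sextets + 4 * q;
    // Concatenate 4 x 6 bits into one 24-bit value...
    std::uint32_t triple = (std::uint32_t{s[0]} << 18) |
                           (std::uint32_t{s[1]} << 12) |
                           (std::uint32_t{s[2]} << 6) | std::uint32_t{s[3]};
    // ...then split it into 3 x 8 bits, most significant byte first.
    out[3 * q + 0] = static_cast<std::uint8_t>(triple >> 16);
    out[3 * q + 1] = static_cast<std::uint8_t>(triple >> 8);
    out[3 * q + 2] = static_cast<std::uint8_t>(triple);
  }
}
```

   For example, "TWFu" maps to the values 19, 22, 5, 46, which pack into the bytes of "Man".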
   
   I wanted to check:
   - Is exploring a SIMD-based decoding path something that would be in scope for Arrow?
   - Have there been any prior attempts or discussions around this?
   - Would a CPU-dispatched approach (SIMD + scalar fallback) be acceptable here?
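   On the dispatch question, one common pattern is to select a kernel once at runtime and cache a function pointer. The sketch below assumes GCC or Clang on x86 (`__attribute__((target("avx2")))` and `__builtin_cpu_supports`). All function names and the `HYPOTHETICAL_X86_DISPATCH` macro are hypothetical. The AVX2 function is only a placeholder that forwards to the scalar path, and the sketch does not use Arrow's own CPU-detection or dispatch utilities.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

using DecodeFn = std::optional<std::string> (*)(std::string_view);

namespace {

// Maps one character of the standard alphabet to its 6-bit value, or -1.
int Sextet(char ch) {
  if (ch >= 'A' && ch <= 'Z') return ch - 'A';
  if (ch >= 'a' && ch <= 'z') return ch - 'a' + 26;
  if (ch >= '0' && ch <= '9') return ch - '0' + 52;
  if (ch == '+') return 62;
  if (ch == '/') return 63;
  return -1;
}

// Portable fallback. To keep the sketch short it only accepts complete,
// unpadded 4-character groups.
std::optional<std::string> DecodeQuadsScalar(std::string_view in) {
  if (in.size() % 4 != 0) return std::nullopt;
  std::string out;
  out.reserve(in.size() / 4 * 3);
  for (std::size_t i = 0; i < in.size(); i += 4) {
    unsigned triple = 0;
    for (std::size_t j = 0; j < 4; ++j) {
      int v = Sextet(in[i + j]);
      if (v < 0) return std::nullopt;
      triple = (triple << 6) | static_cast<unsigned>(v);
    }
    out.push_back(static_cast<char>((triple >> 16) & 0xFF));
    out.push_back(static_cast<char>((triple >> 8) & 0xFF));
    out.push_back(static_cast<char>(triple & 0xFF));
  }
  return out;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HYPOTHETICAL_X86_DISPATCH 1
// Placeholder: compiled with AVX2 enabled, but it only forwards to the
// scalar code. A real kernel would use AVX2 intrinsics here.
__attribute__((target("avx2")))
std::optional<std::string> DecodeQuadsAvx2(std::string_view in) {
  return DecodeQuadsScalar(in);
}
#endif

// Chooses the best available kernel for the running CPU.
DecodeFn SelectDecoder() {
#ifdef HYPOTHETICAL_X86_DISPATCH
  if (__builtin_cpu_supports("avx2")) return DecodeQuadsAvx2;
#endif
  return DecodeQuadsScalar;
}

}  // namespace

// Entry point: the CPU check runs once, on first use.
std::optional<std::string> Base64Decode(std::string_view in) {
  static const DecodeFn kDecode = SelectDecoder();
  return kDecode(in);
}
```

   On other compilers or architectures, the preprocessor guard leaves only the scalar path, so the fallback is always available.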
   
   I haven’t explored SIMD in this area yet, but I’m happy to prototype something or run benchmark comparisons if this aligns with the project’s direction.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.