AetheWu opened a new issue, #61124:
URL: https://github.com/apache/doris/issues/61124

   ### Search before asking
   
   - [x] I have searched the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Version
   
   doris: 4.0.3
   adbc client: github.com/apache/arrow-go/v18
   
   
   ### What's Wrong?
   
   ## Description:
   When fetching large JSON string columns over Arrow Flight SQL, the returned 
RecordBatch has an inconsistent buffer layout: the offsets reference byte 
positions up to 1251873, but the data buffer is reported as 0 bytes.
   
   ## Evidence:
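   Observed when accessing the last row (index 1282) of a batch:
   - Total Length: 1283
   - Offsets Buffer Length: 1284
   - Last 10 Offsets: [... 1251873, 0] (note the non-monotonic drop to 0; the 
full dump under "Anything Else?" shows zeros alternating with increasing 
offsets)
   - Data Buffer Total Size: 0 bytes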
   
   This suggests that, while the Doris BE was serializing the RecordBatch, the 
data buffer was either truncated or released prematurely, leaving the offsets 
out of sync with it.
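
For anyone trying to confirm the same condition on the client, the invariants that the evidence above violates can be checked before any value is read. The following is a minimal sketch using only the Go standard library; `checkOffsets` is a hypothetical helper, and it assumes 32-bit offsets passed in as a plain slice (with arrow-go these would presumably come from the string array's offsets accessor, and the size from its data buffer).

```go
package main

import "fmt"

// checkOffsets verifies the invariants a variable-length string array with
// 32-bit offsets is expected to satisfy:
//   - there are exactly length+1 offsets,
//   - offsets never decrease,
//   - the last offset does not exceed the data buffer size.
//
// The function name and signature are illustrative, not part of any library.
func checkOffsets(length int, offsets []int32, dataLen int) error {
	if length < 0 {
		return fmt.Errorf("negative array length %d", length)
	}
	if len(offsets) != length+1 {
		return fmt.Errorf("expected %d offsets, got %d", length+1, len(offsets))
	}
	if offsets[0] < 0 {
		return fmt.Errorf("first offset is negative: %d", offsets[0])
	}
	for i := 1; i < len(offsets); i++ {
		if offsets[i] < offsets[i-1] {
			return fmt.Errorf("offsets decrease at index %d: %d -> %d",
				i, offsets[i-1], offsets[i])
		}
	}
	if last := offsets[len(offsets)-1]; int(last) > dataLen {
		return fmt.Errorf("last offset %d exceeds data buffer size %d", last, dataLen)
	}
	return nil
}

func main() {
	// Last three offsets from the dump in this report, treated as a 2-row array
	// with a 0-byte data buffer.
	fmt.Println(checkOffsets(2, []int32{0, 1251873, 0}, 0))

	// A consistent array: two values of 3 and 2 bytes in a 5-byte buffer.
	fmt.Println(checkOffsets(2, []int32{0, 3, 5}, 5))
}
```

Running such a check on every batch before reading values would turn the panic into a reportable error that identifies the first inconsistent offset.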
   
   ## Impact:
   Client-side libraries such as apache/arrow-go panic with "slice bounds out 
of range" when accessing the string value, because they slice a 0-length data 
buffer using offsets that lie outside it.
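
Until the root cause is fixed, a client can at least avoid crashing by converting the panic into an error. The sketch below is a workaround idea, not a fix; `valueOrError` is a hypothetical wrapper, and the `get` closure stands in for a call such as `array.String.Value(i)`. The corrupted array is simulated with an empty byte slice and the start and end offsets reported for row 1282.

```go
package main

import "fmt"

// valueOrError calls get(i) and converts a runtime panic (for example
// "slice bounds out of range") into an ordinary error.
// The name and signature are illustrative only.
func valueOrError(get func(int) string, i int) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("row %d: %v", i, r)
		}
	}()
	return get(i), nil
}

func main() {
	// Stand-in for the corrupted array: an empty data buffer sliced with the
	// StartOffset and EndOffset reported for the target row.
	data := []byte{}
	bounds := map[int][2]int{1282: {1251873, 0}}
	get := func(i int) string {
		b := bounds[i]
		return string(data[b[0]:b[1]]) // panics: start is greater than end
	}

	if _, err := valueOrError(get, 1282); err != nil {
		fmt.Println("skipping corrupted value:", err)
	}
}
```

Recovering this way only hides the symptom; the affected rows are still unreadable, so the batch should be treated as invalid.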
   
   ### What You Expected?
   
   Every RecordBatch should carry a data buffer that is consistent with its 
offsets. Please investigate the ArrowFlightStream serialization logic in the 
BE, particularly how StringArray offsets are calculated and how the DataBuffer 
lifecycle is managed during stream fragmentation.
   
   ### How to Reproduce?
   
   1. Execute a query via Arrow Flight SQL that returns a large number of rows 
(e.g., more than 2000).
   2. Make sure the target column contains JSON string data (about 1 KB per 
row on average).
   3. Use reader.Next() to iterate through RecordBatches.
   4. Access the last few rows of a specific RecordBatch using 
array.String.Value(i).
   
   ### Anything Else?
   
   --- [DEBUG] Arrow Array Memory State Dump ---
   Array Type: *array.String
   Total Length: 1283
   Null Count: 0
   Offsets Buffer Length: 1284
   Last 10 Offsets: [1244061 0 1246014 0 1247967 0 1249920 0 1251873 0]
   Data Buffer Total Size (Bytes): 0
   Target Row: 1282 | StartOffset: 1251873, EndOffset: 0
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

