kris37 commented on code in PR #31071:
URL: https://github.com/apache/doris/pull/31071#discussion_r1495461000

##########
extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisStreamLoadObserver.java:
##########

@@ -90,7 +96,7 @@ public void streamLoad(WriterTuple data) throws Exception {
                 .toString();
         LOG.info("Start to join batch data: rows[{}] bytes[{}] label[{}].", data.getRows().size(), data.getBytes(), data.getLabel());
         loadUrl = urlDecode(loadUrl);
-        Map<String, Object> loadResult = put(loadUrl, data.getLabel(), addRows(data.getRows(), data.getBytes().intValue()));

Review Comment:
> So was the solution to store the data.bytes() as a long without converting it to int? Could the BigInteger class work?

The root cause of this bug is that the addRows function returns byte[], which caps each batch at 2 GB of writable data. When the actual value of data.getBytes() exceeds Integer.MAX_VALUE, the cast from long to int overflows, so the parameter can end up < 0, and allocating the ByteBuffer with that size then leads to a NullPointerException (NPE).

The fundamental fix is therefore to change the return type of addRows from byte[] to List<byte[]>. The only remaining limit is the number of rows per batch, which must stay below Integer.MAX_VALUE / 2, and the 2 GB-per-batch ceiling goes away.
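For illustration, here is a minimal, self-contained sketch of both the overflow and the proposed List<byte[]> shape. The method names (addRowsJoined, addRowsChunked), the newline delimiter, and the joining logic are assumptions for this example, not the actual doriswriter code; note that in this standalone form, ByteBuffer.allocate with a negative capacity throws IllegalArgumentException, while the real code path reportedly surfaces as an NPE.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddRowsSketch {
    private static final byte[] DELIMITER = "\n".getBytes(StandardCharsets.UTF_8);

    // Buggy shape (hypothetical): join every row into one byte[].
    // A single array is indexed by int, so the whole batch is capped at 2 GB,
    // and a long byte count narrowed to int can go negative and fail allocation.
    static byte[] addRowsJoined(List<byte[]> rows, int totalBytes) {
        ByteBuffer buf = ByteBuffer.allocate(totalBytes); // throws if totalBytes < 0
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) buf.put(DELIMITER);
            buf.put(rows.get(i));
        }
        return buf.array();
    }

    // Proposed shape: keep rows (and delimiters) as separate chunks in a
    // List<byte[]>, so no single 2 GB array is ever allocated. For n rows
    // this produces 2n - 1 entries, hence the Integer.MAX_VALUE / 2 bound
    // on rows per batch mentioned in the review comment.
    static List<byte[]> addRowsChunked(List<byte[]> rows) {
        List<byte[]> out = new ArrayList<>(rows.size() * 2);
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) out.add(DELIMITER);
            out.add(rows.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        long batchBytes = (long) Integer.MAX_VALUE + 1; // a batch just over 2 GB
        int narrowed = (int) batchBytes;                // overflows to -2147483648
        System.out.println("narrowed byte count = " + narrowed);
        // addRowsJoined(rows, narrowed) would fail here on the negative capacity.

        List<byte[]> rows = Arrays.asList(
                "row1".getBytes(StandardCharsets.UTF_8),
                "row2".getBytes(StandardCharsets.UTF_8));
        System.out.println("chunks = " + addRowsChunked(rows).size()); // prints 3
    }
}
```

The 2n - 1 entries produced for n rows (row, delimiter, row, ...) are why the List-based variant bounds rows per batch at roughly Integer.MAX_VALUE / 2, while the total byte count per batch is no longer tied to a single array's int-indexed capacity.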
########## extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisStreamLoadObserver.java: ########## @@ -90,7 +96,7 @@ public void streamLoad(WriterTuple data) throws Exception { .toString(); LOG.info("Start to join batch data: rows[{}] bytes[{}] label[{}].", data.getRows().size(), data.getBytes(), data.getLabel()); loadUrl = urlDecode(loadUrl); - Map<String, Object> loadResult = put(loadUrl, data.getLabel(), addRows(data.getRows(), data.getBytes().intValue())); Review Comment: > So was the solution to store the data.bytes() as a long without converting it to int? Could BigInteger class work? The root cause of this bug is that the addRows function returns a type of byte[], which limits each batch's maximum writable data to 2GB. When the actual value of data.getBytes() exceeds Integer.MAX_VALUE, casting from long to int might cause an overflow, resulting in the parameter's value possibly being < 0. This can lead to a NullPointerException (NPE) when trying to allocate memory for the ByteBuffer. Therefore, the fundamental solution to this bug is to abandon the return type of byte[] from addRows in favor of List<byte[]>. This change will only limit the number of rows each batch can write to less than Integer.MAX_VALUE/2, thereby avoiding the issue of each batch being unable to write more than 2GB of data.  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org