kris37 commented on code in PR #31071:
URL: https://github.com/apache/doris/pull/31071#discussion_r1495461000

##########
extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisStreamLoadObserver.java:
##########

@@ -90,7 +96,7 @@ public void streamLoad(WriterTuple data) throws Exception {
                 .toString();
         LOG.info("Start to join batch data: rows[{}] bytes[{}] label[{}].", data.getRows().size(), data.getBytes(), data.getLabel());
         loadUrl = urlDecode(loadUrl);
-        Map<String, Object> loadResult = put(loadUrl, data.getLabel(), addRows(data.getRows(), data.getBytes().intValue()));

Review Comment:
> So was the solution to store the data.bytes() as a long without converting it to int? Could the BigInteger class work?

The root cause of this bug is that the addRows function returns byte[], which caps each batch at 2 GB of writable data. When the actual value of data.getBytes() exceeds Integer.MAX_VALUE, the cast from long to int overflows, so the parameter can end up < 0, and allocating the ByteBuffer with that size then leads to a NullPointerException (NPE).

The fundamental fix is therefore to change the return type of addRows from byte[] to List<byte[]>. The only remaining limit is the number of rows per batch, which must stay below Integer.MAX_VALUE / 2, and the 2 GB-per-batch ceiling goes away.
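For illustration, here is a minimal, self-contained sketch of both the overflow and the proposed List<byte[]> shape. The method names (addRowsJoined, addRowsChunked), the newline delimiter, and the joining logic are assumptions for this example, not the actual doriswriter code; note that in this standalone form, ByteBuffer.allocate with a negative capacity throws IllegalArgumentException, while the real code path reportedly surfaces as an NPE.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddRowsSketch {
    private static final byte[] DELIMITER = "\n".getBytes(StandardCharsets.UTF_8);

    // Buggy shape (hypothetical): join every row into one byte[].
    // A single array is indexed by int, so the whole batch is capped at 2 GB,
    // and a long byte count narrowed to int can go negative and fail allocation.
    static byte[] addRowsJoined(List<byte[]> rows, int totalBytes) {
        ByteBuffer buf = ByteBuffer.allocate(totalBytes); // throws if totalBytes < 0
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) buf.put(DELIMITER);
            buf.put(rows.get(i));
        }
        return buf.array();
    }

    // Proposed shape: keep rows (and delimiters) as separate chunks in a
    // List<byte[]>, so no single 2 GB array is ever allocated. For n rows
    // this produces 2n - 1 entries, hence the Integer.MAX_VALUE / 2 bound
    // on rows per batch mentioned in the review comment.
    static List<byte[]> addRowsChunked(List<byte[]> rows) {
        List<byte[]> out = new ArrayList<>(rows.size() * 2);
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) out.add(DELIMITER);
            out.add(rows.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        long batchBytes = (long) Integer.MAX_VALUE + 1; // a batch just over 2 GB
        int narrowed = (int) batchBytes;                // overflows to -2147483648
        System.out.println("narrowed byte count = " + narrowed);
        // addRowsJoined(rows, narrowed) would fail here on the negative capacity.

        List<byte[]> rows = Arrays.asList(
                "row1".getBytes(StandardCharsets.UTF_8),
                "row2".getBytes(StandardCharsets.UTF_8));
        System.out.println("chunks = " + addRowsChunked(rows).size()); // prints 3
    }
}
```

The 2n - 1 entries produced for n rows (row, delimiter, row, ...) are why the List-based variant bounds rows per batch at roughly Integer.MAX_VALUE / 2, while the total byte count per batch is no longer tied to a single array's int-indexed capacity.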
########## extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisStreamLoadObserver.java: ########## @@ -90,7 +96,7 @@ public void streamLoad(WriterTuple data) throws Exception { .toString(); LOG.info("Start to join batch data: rows[{}] bytes[{}] label[{}].", data.getRows().size(), data.getBytes(), data.getLabel()); loadUrl = urlDecode(loadUrl); - Map<String, Object> loadResult = put(loadUrl, data.getLabel(), addRows(data.getRows(), data.getBytes().intValue())); Review Comment: > So was the solution to store the data.bytes() as a long without converting it to int? Could BigInteger class work? The root cause of this bug is that the addRows function returns a type of byte[], which limits each batch's maximum writable data to 2GB. When the actual value of data.getBytes() exceeds Integer.MAX_VALUE, casting from long to int might cause an overflow, resulting in the parameter's value possibly being < 0. This can lead to a NullPointerException (NPE) when trying to allocate memory for the ByteBuffer. Therefore, the fundamental solution to this bug is to abandon the return type of byte[] from addRows in favor of List<byte[]>. This change will only limit the number of rows each batch can write to less than Integer.MAX_VALUE/2, thereby avoiding the issue of each batch being unable to write more than 2GB of data.  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org