github-actions[bot] commented on code in PR #64612:
URL: https://github.com/apache/doris/pull/64612#discussion_r3449572552


##########
fe/be-java-extensions/max-compute-connector/src/main/java/org/apache/doris/maxcompute/MaxComputeJniWriter.java:
##########
@@ -272,79 +269,31 @@ private void rotateCurrentBatchWriter() throws IOException {
         openBatchWriter(requestBlockId());
     }
 
-    private void writeRowsWithRowChecks(VectorTable inputTable, int numRows, int numCols) throws IOException {
-        int rowStart = 0;
-        while (rowStart < numRows) {
-            int rowEnd = rowStart;
-            long batchEstimatedBytes = 0L;
-            boolean rotateAfterWrite = false;
-            while (rowEnd < numRows) {
-                long rowEstimatedBytes = estimateSingleRowPayloadBytes(inputTable, numCols, rowEnd);
-                boolean exceedsHardLimit = currentBlockWrittenBytes + batchEstimatedBytes
-                        + rowEstimatedBytes > maxBlockBytes;
-                if (exceedsHardLimit) {
-                    if (rowEnd == rowStart) {
-                        if (currentBlockWrittenBytes > 0) {
-                            rotateCurrentBatchWriter();
-                            continue;
-                        }
-                        batchEstimatedBytes += rowEstimatedBytes;
-                        rowEnd++;
-                        rotateAfterWrite = true;
-                    }
-                    break;
-                }
-                batchEstimatedBytes += rowEstimatedBytes;
-                rowEnd++;
-                if (currentBlockWrittenBytes + batchEstimatedBytes >= maxBlockBytes) {
-                    rotateAfterWrite = true;
-                    break;
-                }
-            }
-
-            if (rowEnd == rowStart) {
-                long rowEstimatedBytes = estimateSingleRowPayloadBytes(inputTable, numCols, rowStart);
-                batchEstimatedBytes = rowEstimatedBytes;
-                rowEnd = rowStart + 1;
-                rotateAfterWrite = true;
-            }
-
-            try (VectorSchemaRoot root = buildRowRangeRoot(inputTable, numCols, rowStart, rowEnd)) {
-                batchWriter.write(root);
-            }
-            batchWriter.flush();
-            int rowsWrittenNow = rowEnd - rowStart;
-            writtenRows += rowsWrittenNow;
-            currentBlockWrittenBytes += batchEstimatedBytes;
-            writtenBytes += batchEstimatedBytes;
-            rowStart = rowEnd;
-
-            if (rotateAfterWrite && rowStart < numRows) {
-                rotateCurrentBatchWriter();
-            }
+    private void writeBatch(VectorTable inputTable, int numRows, int numCols) throws IOException {
+        // Roll to a fresh block before writing once the current one hits the size target.
+        if (batchWriter != null && currentBlockWrittenBytes >= maxBlockBytes) {
+            rotateCurrentBatchWriter();
         }
-    }
 
-    private static class CountingDiscardOutputStream extends OutputStream {
-        @Override
-        public void write(int b) {
-            // Discard bytes while allowing WriteChannel to track payload size.

Review Comment:
   This no longer enforces `mc.write_max_block_bytes` for the block being 
written. `writeBatch` only rotates when the current block is already over the 
limit, then estimates `batchBytes`, writes the whole `VectorSchemaRoot`, and 
adds the bytes afterward. With a 64MB threshold, a current block at 60MB and a 
20MB incoming JNI block will be committed as an 80MB MaxCompute block; rotation 
happens only before the next batch. The removed code split the input row range 
and rotated before appending rows that would overflow the current block, and 
the BE side does not do any byte splitting before calling this Java writer. 
Please check `currentBlockWrittenBytes + batchBytes` before writing and split 
oversized incoming batches into row ranges, preserving the previous single-row 
oversize fallback.
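
A minimal sketch of the pre-write check suggested above, separated from the Arrow and MaxCompute calls so the splitting logic can be read on its own. Everything named here (`BlockSplitSketch`, `Chunk`, `plan`, the `rowBytes` array of per-row size estimates) is hypothetical and not part of the PR. It also rotates before the overflowing rows rather than reproducing the removed code's rotate-after-write flag, so treat it as an illustration of the idea, not a drop-in replacement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the PR: plans row ranges so that no block
// exceeds maxBlockBytes, except when a single row alone is larger than the limit.
class BlockSplitSketch {

    /** Rows [start, end) to write, and whether to rotate to a fresh block first. */
    static final class Chunk {
        final int start;
        final int end;
        final boolean rotateBefore;

        Chunk(int start, int end, boolean rotateBefore) {
            this.start = start;
            this.end = end;
            this.rotateBefore = rotateBefore;
        }
    }

    static List<Chunk> plan(long[] rowBytes, long currentBlockBytes, long maxBlockBytes) {
        List<Chunk> chunks = new ArrayList<>();
        long blockBytes = currentBlockBytes;
        int rowStart = 0;
        while (rowStart < rowBytes.length) {
            boolean rotateBefore = false;
            // The next row would overflow a non-empty block: start a fresh block first.
            if (blockBytes > 0 && blockBytes + rowBytes[rowStart] > maxBlockBytes) {
                rotateBefore = true;
                blockBytes = 0;
            }
            // Take as many rows as fit in the remaining space of the current block.
            int rowEnd = rowStart;
            long chunkBytes = 0;
            while (rowEnd < rowBytes.length
                    && blockBytes + chunkBytes + rowBytes[rowEnd] <= maxBlockBytes) {
                chunkBytes += rowBytes[rowEnd];
                rowEnd++;
            }
            // Single-row oversize fallback: the block is empty here and the row
            // alone exceeds the limit, so it is written by itself.
            if (rowEnd == rowStart) {
                chunkBytes = rowBytes[rowStart];
                rowEnd = rowStart + 1;
            }
            chunks.add(new Chunk(rowStart, rowEnd, rotateBefore));
            blockBytes += chunkBytes;
            rowStart = rowEnd;
        }
        return chunks;
    }
}
```

With the numbers from the example above (64MB limit, 60MB already in the block, a 20MB batch of equally sized 1MB rows), this plan writes the first 4 rows into the current block and the remaining 16 into a fresh one, instead of committing one 80MB block.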
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

