leekeiabstraction commented on code in PR #443:
URL: https://github.com/apache/fluss-rust/pull/443#discussion_r2986660097
##########
crates/fluss/src/record/arrow.rs:
##########
@@ -367,8 +410,50 @@ impl MemoryLogRecordsArrowBuilder {
// todo: consider write other change type
}
+ /// Check if the builder is full based on estimated serialized size.
+ ///
+ /// Uses a threshold-based optimization to skip expensive size checks:
+ /// only computes the actual estimated size when the record count reaches
+ /// the predicted threshold. Matching Java's `ArrowWriter.isFull()`.
pub fn is_full(&self) -> bool {
- self.arrow_record_batch_builder.records_count() >= DEFAULT_MAX_RECORD
+ // Delegate to inner builder first (e.g. PrebuiltRecordBatchBuilder
+ // is always full after one batch, regardless of size).
+ if self.arrow_record_batch_builder.is_full() {
+ return true;
+ }
+ let records_count = self.arrow_record_batch_builder.records_count();
+ let threshold = self.estimated_max_records_count.get();
Review Comment:
nit, as this is a heuristic: there's a race condition if multiple threads run
through this section at the same time. The threshold written by the thread
that finishes first can be overwritten by a later one (a lost update). I think
the same is true on the Java side, but if it is not too complex to fix, maybe
we can address it here.
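If `estimated_max_records_count` is a `std::cell::Cell` (the `.get()` call hints at this, but the diff does not show the field type), the builder is `!Sync`, so the compiler already rules out concurrent `&self` calls. If the state is meant to be shared across threads, one common fix for the read-compute-write race is a compare-and-swap. The sketch below assumes an `AtomicUsize`; `ThresholdEstimator`, `write_limit_bytes`, `threshold()` and the threshold-recompute formula are hypothetical and are not the PR's or Java's actual code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, simplified stand-in for the builder's size-check state.
/// Only `estimated_max_records_count` comes from the diff; storing it in an
/// `AtomicUsize` and the other fields are assumptions for this sketch.
pub struct ThresholdEstimator {
    estimated_max_records_count: AtomicUsize,
    write_limit_bytes: usize,
}

impl ThresholdEstimator {
    pub fn new(initial_threshold: usize, write_limit_bytes: usize) -> Self {
        Self {
            estimated_max_records_count: AtomicUsize::new(initial_threshold),
            write_limit_bytes,
        }
    }

    pub fn threshold(&self) -> usize {
        self.estimated_max_records_count.load(Ordering::Relaxed)
    }

    /// `records_count` and `estimated_size_bytes` are passed in to keep the
    /// sketch self-contained; the real builder would compute them itself.
    pub fn is_full(&self, records_count: usize, estimated_size_bytes: usize) -> bool {
        let threshold = self.threshold();
        // Cheap path: below the predicted threshold, skip the size check.
        if records_count < threshold {
            return false;
        }
        if estimated_size_bytes >= self.write_limit_bytes {
            return true;
        }
        // Not full yet: predict how many records fit, based on the average
        // record size seen so far (hypothetical formula).
        let avg_record_bytes = (estimated_size_bytes / records_count.max(1)).max(1);
        let new_threshold = (self.write_limit_bytes / avg_record_bytes).max(records_count + 1);
        // Publish only if the threshold is still the value this thread read.
        // On failure another thread already refreshed it, so keep its value
        // rather than overwriting it with an estimate based on a stale read.
        let _ = self.estimated_max_records_count.compare_exchange(
            threshold,
            new_threshold,
            Ordering::Relaxed,
            Ordering::Relaxed,
        );
        false
    }
}
```

A failed `compare_exchange` just means another thread got there first, and its threshold is kept. Since the threshold only gates when the real size check runs, a plain `store` that occasionally loses an update is also defensible, which fits this being raised as a nit.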
--
This is an automated message from the Apache Git Service.