Re: [PR] Core: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]

via GitHub Thu, 07 Dec 2023 08:31:38 -0800


cccs-jc commented on PR #8980:
URL: https://github.com/apache/iceberg/pull/8980#issuecomment-1845659813


   > @cccs-jc I mean let's have the changes for 3.5, with their tests, only in 3.5; we can then backport the change with its tests to lower Spark versions like 3.4 and 3.3. The 3.4 test failures are expected, right, since this PR has no SparkMicroBatchStream changes for 3.4.
   > 
   > Also, I would request reverting the change in core for Microbatch.java if we don't have coverage for it, as I am unsure when that would fail (maybe some legacy handling).
   > 
   > Apologies for being late in getting back to this.
   
   Keeping the `+ existingFilesCount();` in SparkMicrobatch.java makes no sense to me.
   
   What is the purpose of adding that to `currentFileIndex`?
   
   The way I understand it, `currentFileIndex` is a position within the added files, so we want to count only the added files (`addedFilesCount()`). These are the files that you want a streaming job to consume.
   
   Can you explain the purpose of using `existingFilesCount` here?
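
   To make the question concrete, here is a minimal sketch of the two ways the index could advance when whole manifests are skipped. It is not the Iceberg code: `FileIndexSketch`, `ManifestCounts`, and both method names are hypothetical, and the counts are made-up numbers. It assumes, as described above, that `currentFileIndex` is meant to be a position within the added files only.

```java
import java.util.Arrays;
import java.util.List;

public class FileIndexSketch {
    // Hypothetical stand-in for a manifest's per-status file counts;
    // this is NOT the Iceberg ManifestFile API.
    static final class ManifestCounts {
        final int addedFilesCount;
        final int existingFilesCount;

        ManifestCounts(int addedFilesCount, int existingFilesCount) {
            this.addedFilesCount = addedFilesCount;
            this.existingFilesCount = existingFilesCount;
        }
    }

    // Advance the index past skipped manifests, counting only added files.
    static int indexCountingAddedOnly(List<ManifestCounts> skipped) {
        int currentFileIndex = 0;
        for (ManifestCounts m : skipped) {
            currentFileIndex += m.addedFilesCount;
        }
        return currentFileIndex;
    }

    // Same walk, but also counting existing files, as the `+ existingFilesCount()` term would.
    static int indexCountingAddedAndExisting(List<ManifestCounts> skipped) {
        int currentFileIndex = 0;
        for (ManifestCounts m : skipped) {
            currentFileIndex += m.addedFilesCount + m.existingFilesCount;
        }
        return currentFileIndex;
    }

    public static void main(String[] args) {
        // Two skipped manifests: (2 added, 5 existing) and (3 added, 0 existing).
        List<ManifestCounts> skipped =
            Arrays.asList(new ManifestCounts(2, 5), new ManifestCounts(3, 0));

        System.out.println(indexCountingAddedOnly(skipped));        // prints 5
        System.out.println(indexCountingAddedAndExisting(skipped)); // prints 10
    }
}
```

   With these example counts only 5 files were added, so an index of 10 would point past files that a streaming job never sees as new. Whether the real code has a legitimate reason to include existing files is exactly what I am asking.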
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


