bk-mz commented on PR #12856: URL: https://github.com/apache/iceberg/pull/12856#issuecomment-2818352166
Example logs from planFiles with these changes:

```txt
2025-04-21 12:43:04.411 INFO [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Initializing SparkMicroBatchStream with params: branch=null, caseSensitive=false, splitSize=134217728, splitLookback=10, splitOpenFileCost=4194304, fromTimestamp=1745239382655, maxFilesPerMicroBatch=2147483647, maxRecordsPerMicroBatch=50000000
2025-04-21 12:43:04.412 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - InitialOffsetStore created with location s3://com.twilio.mdp.iceberg.tables.mdr.finalized.testcell/checkpoints7/sources/0/offsets/0
2025-04-21 12:43:04.428 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Found existing offset file at s3://com.twilio.mdp.iceberg.tables.mdr.finalized.testcell/checkpoints7/sources/0/offsets/0, reading
2025-04-21 12:43:04.444 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Read offset Streaming Offset[-1: position (-1) scan_all_files (false)] from s3://com.twilio.mdp.iceberg.tables.mdr.finalized.testcell/checkpoints7/sources/0/offsets/0
2025-04-21 12:43:04.444 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Initial offset set to Streaming Offset[-1: position (-1) scan_all_files (false)]
2025-04-21 12:43:04.444 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Skip delete snapshots=true, skip overwrite snapshots=true
2025-04-21 12:43:04.806 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Deserialized offset from JSON {"version":1,"snapshot_id":1823134505898519413,"position":8,"scan_all_files":false}: Streaming Offset[1823134505898519413: position (8) scan_all_files (false)]
2025-04-21 12:43:04.806 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Deserialized offset from JSON {"version":1,"snapshot_id":3821473473156059401,"position":24,"scan_all_files":false}: Streaming Offset[3821473473156059401: position (24) scan_all_files (false)]
2025-04-21 12:43:04.867 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.aws.glue.GlueCatalog [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Using optimistic locking for Glue Data Catalog tables.
2025-04-21 12:43:05.152 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot [id=1823134505898519413, dateTime=2025-04-19T20:53:29.457Z, ageHours=39, startFileIndex=8, endFileIndex=8] generated 0 file scan tasks
2025-04-21 12:43:05.265 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot [id=267320482196197876, dateTime=2025-04-19T20:54:28.367Z, ageHours=39, startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
2025-04-21 12:43:05.266 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Skipping processing for snapshot id=7905888728843236371 operation=replace
2025-04-21 12:43:05.352 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot [id=7421363600347898867, dateTime=2025-04-19T20:55:33.095Z, ageHours=39, startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
2025-04-21 12:43:05.471 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot [id=1672029328327466770, dateTime=2025-04-19T20:56:31.462Z, ageHours=39, startFileIndex=0, endFileIndex=9] generated 9 file scan tasks
2025-04-21 12:43:05.561 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot [id=7190315158754434941, dateTime=2025-04-19T20:57:26.865Z, ageHours=39, startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
...
2025-04-21 12:43:11.720 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot [id=3821473473156059401, dateTime=2025-04-19T22:01:46.828Z, ageHours=38, startFileIndex=0, endFileIndex=24] generated 24 file scan tasks
2025-04-21 12:43:11.721 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - planFiles returned 764 file scan tasks. total_files=764, total_size_in_bytes=178383916431. Time taken to eval stats 0 ms
2025-04-21 12:43:11.734 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Split into 1484 combined scan tasks
2025-04-21 12:43:11.736 DEBUG [ip-0-0-0-0.ec2.internal] - org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Created 1484 SparkInputPartitions
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.