Re: [PR] Add detailed debug and warn logging to SparkMicroBatchStream [iceberg]

via GitHub Mon, 21 Apr 2025 10:06:43 -0700


bk-mz commented on PR #12856:
URL: https://github.com/apache/iceberg/pull/12856#issuecomment-2819031939


   I'll take a look at the above.
   
   Just in case, here are the logs from `latestOffset`:
   
   ```text
   2025-04-21 14:04:36.643 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - latestOffset(startOffset=Streaming 
Offset[3821473473156059401: position (24) scan_all_files (false)], 
limit=MaxRows: 50000000) called
   2025-04-21 14:04:36.857 INFO  [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.BaseMetastoreTableOperations [stream execution thread for 
[id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Refreshing table metadata from new 
version: 
s3://my-table/table/metadata/15032-bd9347a5-4e80-4073-803d-6f0e4f8e14f0.metadata.json
   2025-04-21 14:04:37.061 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Effective startingOffset=Streaming 
Offset[3821473473156059401: position (24) scan_all_files (false)]
   2025-04-21 14:04:37.062 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Start position in snapshot=24, 
scanAllFiles=false
   2025-04-21 14:04:37.106 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Snapshot 3821473473156059401 has 1 
manifest files after skipping
   2025-04-21 14:04:37.106 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Reading manifest 
s3://my-table/table/metadata/6c61afec-a595-4f22-898c-8b3eb71a670c-m0.avro at 
index 0
   2025-04-21 14:04:37.153 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Next valid snapshot is 
2193338683398121123
   2025-04-21 14:04:37.153 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Moving to next snapshot 
2193338683398121123, scanAllFiles reset to false
   2025-04-21 14:04:37.193 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Snapshot 2193338683398121123 has 1 
manifest files after skipping
   2025-04-21 14:04:37.193 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Reading manifest 
s3://my-table/table/metadata/3ab6f7f5-569d-409d-8c17-a4bbd397e522-m0.avro at 
index 0
   ...
   2025-04-21 14:04:38.006 DEBUG [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Limits reached: filesAdded=175 
(max=2147483647), recordsCount=49840537 (max=50000000), stopping
   2025-04-21 14:04:38.006 INFO  [ip-172-31-8-223.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
a33ebf30-ab06-4354-8cdc-b0f2afa16849]] - Computed next streaming offset 
[offset=Streaming Offset[1601835420991821798: position (10) scan_all_files 
(false)]] after filling the batch: [files=175, records=49840537, 
bytes=41418549762, snapshots=11]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


