stevenzwu commented on code in PR #6765:
URL: https://github.com/apache/iceberg/pull/6765#discussion_r1100716901
##########
docs/flink-getting-started.md:
##########
@@ -747,6 +747,44 @@ FlinkSink.builderFor(
     .append();
 ```
 
+### Monitoring metrics
+
+The following Flink metrics are provided by the Flink Iceberg sink.
+
+Parallel writer metrics are added under the `IcebergStreamWriter` sub group.
+They should have the following key-value tags:
+* table: full table name (like iceberg.my_db.my_table)
+* subtask_index: writer subtask index starting from 0
+
+| Metric name | Metric type | Description |
+| --- | --- | --- |
+| lastFlushDurationMs | Gauge | The duration (in milliseconds) that writer subtasks take to flush and upload the files during a checkpoint. |
+| flushedDataFiles | Counter | Number of data files flushed and uploaded. |
+| flushedDeleteFiles | Counter | Number of delete files flushed and uploaded. |
+| flushedReferencedDataFiles | Counter | Number of data files referenced by the flushed delete files. |
+| dataFilesSizeHistogram | Histogram | Histogram distribution of data file sizes (in bytes). |
+| deleteFilesSizeHistogram | Histogram | Histogram distribution of delete file sizes (in bytes). |
+
+Committer metrics are added under the `IcebergFilesCommitter` sub group.
+They should have the following key-value tags:
+* table: full table name (like iceberg.my_db.my_table)
+
+| Metric name | Metric type | Description |
+| --- | --- | --- |
+| lastCheckpointDurationMs | Gauge | The duration (in milliseconds) that the committer operator takes to checkpoint its state. |
+| lastCommitDurationMs | Gauge | The duration (in milliseconds) that the Iceberg table commit takes. |
+| committedDataFilesCount | Counter | Number of data files committed. |
+| committedDataFilesRecordCount | Counter | Number of records contained in the committed data files. |
+| committedDataFilesByteCount | Counter | Number of bytes contained in the committed data files. |
+| committedDeleteFilesCount | Counter | Number of delete files committed. |
+| committedDeleteFilesRecordCount | Counter | Number of records contained in the committed delete files. |
+| committedDeleteFilesByteCount | Counter | Number of bytes contained in the committed delete files. |
+| elapsedSecondsSinceLastSuccessfulCommit | Gauge | Elapsed time (in seconds) since the last successful Iceberg commit. |
+
+`elapsedSecondsSinceLastSuccessfulCommit` is an ideal alerting metric for these scenarios.

Review Comment:
It refers to the scenarios described below. I assume you also mean that the descriptions below aren't clear. I updated the description; please check if it helps.
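The alerting scenarios themselves are not part of the quoted hunk, so the following is only a hypothetical sketch of how a value like `elapsedSecondsSinceLastSuccessfulCommit` could be tracked and fed into an alert rule. The class name `CommitStalenessCheck`, the injectable clock, and the "more than N checkpoint intervals without a commit" threshold are all assumptions made for illustration. The sketch uses no Flink or Iceberg classes and is not the sink's actual implementation.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not the Iceberg implementation): tracks the time of the
 * last successful Iceberg commit and derives the elapsed-seconds value that a
 * gauge such as elapsedSecondsSinceLastSuccessfulCommit would report.
 */
public class CommitStalenessCheck {
    private final LongSupplier clockMillis;
    // AtomicLong because a metrics reporter may read the value from another thread.
    private final AtomicLong lastSuccessfulCommitMillis;

    public CommitStalenessCheck(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
        // Assumption: start counting when the committer starts, so a job that
        // never manages to commit still becomes "stale".
        this.lastSuccessfulCommitMillis = new AtomicLong(clockMillis.getAsLong());
    }

    /** Call after each successful Iceberg table commit. */
    public void recordSuccessfulCommit() {
        lastSuccessfulCommitMillis.set(clockMillis.getAsLong());
    }

    /** Whole seconds elapsed since the last successful commit (integer division). */
    public long elapsedSecondsSinceLastSuccessfulCommit() {
        return (clockMillis.getAsLong() - lastSuccessfulCommitMillis.get()) / 1000;
    }

    /** Hypothetical alert rule: no commit within maxMissedCheckpoints checkpoint intervals. */
    public boolean shouldAlert(Duration checkpointInterval, int maxMissedCheckpoints) {
        long thresholdSeconds = checkpointInterval.getSeconds() * maxMissedCheckpoints;
        return elapsedSecondsSinceLastSuccessfulCommit() > thresholdSeconds;
    }

    public static void main(String[] args) {
        AtomicLong fakeNow = new AtomicLong(0L);
        CommitStalenessCheck check = new CommitStalenessCheck(fakeNow::get);
        Duration interval = Duration.ofMinutes(1);

        fakeNow.set(90_000L); // 90 s without a commit
        System.out.println(check.elapsedSecondsSinceLastSuccessfulCommit()); // 90
        System.out.println(check.shouldAlert(interval, 3)); // false (threshold is 180 s)

        fakeNow.set(200_000L); // 200 s without a commit
        System.out.println(check.shouldAlert(interval, 3)); // true

        check.recordSuccessfulCommit();
        System.out.println(check.elapsedSecondsSinceLastSuccessfulCommit()); // 0
    }
}
```

Injecting the clock as a `LongSupplier` keeps the check deterministic in tests; in a real job the elapsed value would more likely be exposed as a Flink gauge and the threshold evaluated by the metrics backend's alerting rules.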