gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2317080098
Hey @Fokko, thanks for your response and for the explanation!

I might be missing some information here, but I checked the snapshot summaries in the metadata.json files and compared them with ScanMetrics, and they don't seem to share the same metrics. For instance, totalPlanningDuration, skippedDataManifest, etc. are part of ScanMetrics but not of the snapshot summary. So if I'm not mistaken, ScanMetrics can be enhanced with additional metrics without growing the metadata.json any further (I agree that it already grows far too big because it contains the summaries of historical snapshots).

As for the debugging scenario, what I have in mind is that debugging often has to be done for a remote user or customer: you get a report about some issue along with some query profiles and logs, and that is all you have to work out a root cause. Sometimes you don't have the opportunity to run additional queries on the user's end, or it takes extra time to ask them to run something for you and send back the results.

So instead, what I have in mind is to make a wide collection of metrics available in Iceberg after running a query, calling planFiles, etc. These metrics could be embedded into any query engine's query profile. Once a user faces an issue, the query profile would give us enough information to judge the root cause, with no round trips to ask for additional queries such as checking metadata tables. That is inevitable in some cases, sure, but it could be avoided in others where collecting more metrics would help.

I hope this makes sense :)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org