gaborkaszab commented on PR #5837:
URL: https://github.com/apache/iceberg/pull/5837#issuecomment-2317080098

   Hey @Fokko,
   Thanks for your response and thanks for the explanation!
   
   I might be missing some information here, but I checked the snapshot 
summaries in the metadata.json files and compared them with ScanMetrics, and 
they don't seem to share the same metrics. For instance, totalPlanningDuration 
and skippedDataManifest are part of ScanMetrics but not of the snapshot 
summary. So, if I'm not mistaken, there is a way to enhance ScanMetrics with 
additional metrics without growing the metadata.json size any further (I agree 
that it already grows way too big because it contains the historical 
snapshots' summaries).
   
   The debugging scenario I have in mind is this: debugging is often needed 
for a remote user/customer. You get a report about some issue, along with some 
query profiles and logs, and that is all you can use to come up with a root 
cause. Sometimes you don't have the opportunity to run additional queries on 
the user's end, or it takes extra time to ask them to run something for you 
and get back with the results.
   Instead, what I have in mind is a wide collection of metrics available in 
Iceberg after running a query, calling planFiles, etc. These could be embedded 
into any query engine's query profile. Once a user faces an issue, checking the 
query profile would then give us enough information to judge the root cause, 
with no round trips needed to ask for additional queries such as checking some 
metadata tables. Such round trips are inevitable in some cases, sure, but could 
be avoided in others where collecting more metrics would help.
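   
   To illustrate the idea, here is a rough sketch of how an engine could copy 
scan metrics into its query profile. Everything in it is an assumption made up 
for illustration: ScanMetricsSnapshot and QueryProfile are hypothetical 
stand-ins, not Iceberg or engine classes, and only the two metric names come 
from the examples mentioned above.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-ins, NOT Iceberg classes: they only mimic the idea of
// scan metrics being handed to an engine-side callback after planning.
public class ScanProfileSketch {

  /** Hypothetical, simplified view of the metrics a scan could report. */
  public static final class ScanMetricsSnapshot {
    final Duration totalPlanningDuration;
    final long skippedDataManifest;

    public ScanMetricsSnapshot(Duration totalPlanningDuration, long skippedDataManifest) {
      this.totalPlanningDuration = totalPlanningDuration;
      this.skippedDataManifest = skippedDataManifest;
    }
  }

  /** Hypothetical query-profile section owned by the query engine. */
  public static final class QueryProfile implements Consumer<ScanMetricsSnapshot> {
    // LinkedHashMap keeps the metrics in insertion order when rendering.
    private final Map<String, String> entries = new LinkedHashMap<>();

    @Override
    public void accept(ScanMetricsSnapshot metrics) {
      entries.put("totalPlanningDuration", metrics.totalPlanningDuration.toMillis() + " ms");
      entries.put("skippedDataManifest", Long.toString(metrics.skippedDataManifest));
    }

    /** Renders the collected metrics as plain text for the query profile. */
    public String render() {
      StringBuilder out = new StringBuilder("Iceberg scan metrics:\n");
      entries.forEach((name, value) ->
          out.append("  ").append(name).append(": ").append(value).append('\n'));
      return out.toString();
    }
  }

  public static void main(String[] args) {
    QueryProfile profile = new QueryProfile();
    // In a real engine these values would come from the scan; here they are made up.
    profile.accept(new ScanMetricsSnapshot(Duration.ofMillis(42), 3));
    System.out.print(profile.render());
  }
}
```

   The more metrics the scan reports, the more of the investigation could be 
done from the profile text alone.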
   
   I hope this makes sense :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

