mcvsubbu commented on PR #9058: URL: https://github.com/apache/pinot/pull/9058#issuecomment-1197352989
> > @mcvsubbu
> >
> > 1. @npawar @Jackie-Jiang ? I might have just very rough, possibly inaccurate numbers.
> > 2. I feel that a control-plane-level API within Pinot, giving an overall view into the current and past state of minion tasks, is important to us, with the task generator being a key part of the entire minion task flow. While metrics can help to some extent, capturing details like failure stack traces might be difficult. This API avoids having to tally metrics and debug logs from a separate log processing system.
> > 3. I suppose it could. But integrating the log processing framework into the Pinot APIs themselves might be a bit of a challenge. Having a system table for this kind of use case might be the right way forward, such that Pinot itself can store and serve debug data and status metrics for each of the components / flows. Essentially, move from in-memory storage of logs and metrics to the system table.
> >
> > @npawar @Jackie-Jiang to add more
>
> For 1: typically, for users of RealtimeToOfflineTask, tasks get generated hourly. For SegmentGenerationAndPushTask, it can be far more frequent, depending on the number of times files are generated in the source dir. MergeRollupTasks are less frequent, but still run several times a day. There might be others, but these are what we see most commonly set up by users in OSS. In a typical setup, all of these would be configured. It becomes quite confusing for users to have to find the exact exception in the logs, especially because some logs are in the controller (scheduler related) and some in the minion (task execution related). This API will help us make the feedback loop quicker, especially when we add this into the new Minion tab on the Pinot Admin UI.
>
> Regarding info already in Helix, these scheduler-related exceptions are not present in the Helix-generated metadata.

I still think log processing should be the answer to this (and other similar PRs that may come up in the future).
We should not be adding a new API for every error condition we may encounter in the system (and log something). @siddharthteotia , @npawar , @Jackie-Jiang , @snleee, @kishoreg, @mayankshriv what do you think? If the PMCs don't have any objection to this then I can live with this, but I am willing to bet that more such PRs will come up because logs are difficult to read.