Hi all,

Could you kindly help review the following PR?

[LIVY-866] Optimizing Yarn GetApplications Query to prevent additional load on Yarn and Livy (Pull Request #327, apache/incubator-livy)
<https://github.com/apache/incubator-livy/pull/327>
*Brief details of the change:*

Currently, Livy queries Yarn applications by applicationType SPARK only. This puts a heavy load on Yarn clusters that hold thousands or more Spark applications across all states (running, finished, failed, queued, etc.), because every query returns all of them.

A better approach is to query applications by tags in addition to application type, since Livy only needs to track applications that carry certain application tags. However, YarnClient does not expose any API to query applications by tags.

This change therefore extends YarnClientImpl and implements a getApplications method that takes a GetApplicationsRequest as a parameter. Instead of querying all SPARK applications, Livy queries only the SPARK applications with the required tags, which avoids the extra load on the Yarn and Livy servers.

Appreciate your help.

Thanks,
Akshat
