Hi All,

Could you please help review the following PR?

[LIVY-866] Optimizing Yarn GetApplications Query to prevent additional load
on Yarn and Livy
<https://github.com/apache/incubator-livy/pull/327>


*Brief Details of the change:*

Currently, Livy queries Yarn applications by application type (SPARK) only.
This puts a heavy load on Yarn clusters that hold thousands or more Spark
applications across all states (running, finished, failed, queued, etc.).
A better approach is to query applications by tags in addition to
application type, since Livy only needs to track applications that carry
certain application tags. However, YarnClient does not expose any API to
query applications by tags.

This change extends YarnClientImpl and implements a getApplications method
that takes a GetApplicationsRequest as its parameter. Instead of querying
all SPARK applications, Livy queries only the SPARK applications that have
the required tags, which avoids extra load on the Yarn and Livy servers.
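
To illustrate the difference between the two queries, here is a minimal
sketch in plain Java. It is a toy model, not the code from the PR: the App
class, the query method, and the tag names (such as "livy-session-1") are
all hypothetical stand-ins and do not use the Hadoop or Livy APIs. It
assumes a tag filter matches when an application carries at least one of
the requested tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical stand-ins, not Hadoop or Livy types.
class TagFilteredQuery {

    // Stand-in for the few fields of an application report used here.
    static final class App {
        final String id;
        final String type;
        final Set<String> tags;

        App(String id, String type, String... tags) {
            this.id = id;
            this.type = type;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    // Returns the apps whose type is in `types` and that carry at least one
    // tag from `tags`. An empty set means "do not filter on this field".
    static List<App> query(List<App> all, Set<String> types, Set<String> tags) {
        List<App> out = new ArrayList<>();
        for (App app : all) {
            boolean typeOk = types.isEmpty() || types.contains(app.type);
            boolean tagOk = tags.isEmpty() || !Collections.disjoint(app.tags, tags);
            if (typeOk && tagOk) {
                out.add(app);
            }
        }
        return out;
    }

    // A small made-up cluster: three Spark apps, only one tagged for Livy.
    static List<App> sampleApps() {
        return Arrays.asList(
            new App("app_1", "SPARK", "livy-session-1"),
            new App("app_2", "SPARK"),
            new App("app_3", "SPARK", "other-tool"),
            new App("app_4", "MAPREDUCE"));
    }

    public static void main(String[] args) {
        List<App> apps = sampleApps();
        Set<String> spark = Collections.singleton("SPARK");

        // Query by type only: every Spark app comes back.
        System.out.println(query(apps, spark, Collections.<String>emptySet()).size());

        // Query by type and tag: only the app Livy needs to track.
        System.out.println(query(apps, spark, Collections.singleton("livy-session-1")).size());
    }
}
```

In this model the type-only query returns three applications and the
type-plus-tag query returns one. On a real cluster the gap would grow with
the number of Spark applications that Livy did not submit.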

Appreciate your help.
Thanks,
Akshat
