Abstract

Livy is a web service that exposes a REST interface for managing long running Apache Spark contexts in your cluster. With Livy, new applications can be built on top of Apache Spark that require fine grained interaction with many Spark contexts [1].
While this project has been well regarded and widely used as the de facto standard API to Spark environments, it has been incubating for over five years without graduating to a TLP, and it has become difficult or impossible to contribute fixes and improvements because the current community seems to have moved on. A discussion about retiring this podling has surfaced increasing interest in joining and reviving the community [2]. The intent of this proposal is to avoid retiring a well-regarded, actively used, and rather mature project by reviving the PPMC and community with new people who have a vested interest in the project and the health of its community.

Proposal

We propose to revive the PPMC with a set of contributors and maintainers as mentors, PPMC members, and committers. The retirement DISCUSS thread [2] has shown growing interest in providing new committers and in bringing improvements and fixes from organizations' internally maintained forks back to a revived community.

General Approach to Revival:

- Add new Mentors
  - Larry McCay, lmc...@apache.org, Cloudera
  - Sunil Govindan, sun...@apache.org, Cloudera
  - Imran Rashid, iras...@apache.org, Cloudera
- Add new Committers/PPMC
  - Larry McCay, lmc...@apache.org, Cloudera
  - Vinod Kumar Vavilapalli, vino...@cloudera.com, Cloudera
  - Gyorgy Gal, ggal, gal.gyo...@gmail.com, Cloudera
  - Wing Yew Poon, wyp...@cloudera.com, Cloudera
  - Xilang Yan, xilang....@gmail.com, Shopee
  - Jianzhen Wu, myjianz...@gmail.com, Shopee
  - Nagella Jagadeewara Rao, jnage...@visa.com, Visa
  - Pralab Kumar, pralk...@visa.com, Visa
  - Prasad Shrikant, shrikant....@gmail.com, Visa
  - Brahma Reddy Battula, bra...@apache.org, Visa
- Invite existing PPMC members to opt in or otherwise go emeritus
  - Jean-Baptiste Onofré, jbono...@apache.org, Talend (opted in via the retirement DISCUSS thread [2])
- Invite existing Committers/PPMC members to opt in or otherwise go emeritus
- Establish a Roadmap via a follow-up DISCUSS thread
  - Known Improvements from Forks, which will need proposals and discussion:
    - Adding HA for Livy
    - Updating security capabilities (e.g. Kerberos for JDBC, fixing bugs in encryption)
    - Expanding the support for Kubernetes
    - Responding to CVEs in dependencies (e.g. log4j, Thrift)
    - Livy REST cluster (open question: is this the same as HA for Livy above?)
    - Support for multiple Spark versions
    - A metrics system for Livy (implemented)
    - Support for custom batch/interactive session lifecycle event handlers; the default handler logs events with log4j, which is very helpful for troubleshooting
    - Improved logging that tracks which session ID each log message came from, also very helpful for troubleshooting
    - Support for custom Spark config optimization rules, which can be used to optimize the config for users' jobs
    - A set of command-line tools that can replace Spark's spark-submit, pyspark, and spark-sql but actually submit applications through Livy
    - Planned for the next few months: a JDBC state store, and allowing multiple Livy Thrift sessions to share one backend Spark application
  - These items, and others that are brought to the community, may need consolidation or multiple configurable options and will need to be part of the discussion
  - One-pager Livy Improvement Proposals (LIP) may make sense to drive these discussions and convergence
- Feature Branch Strategy for large changes
  - Large features are hard to review, so we will need to define a strategy
- Determine the Improvements to be delivered across the first 3 Releases, with Target Release Dates
- Ensure CVE and Dependency management hygiene is in place

The above approach will usher the community back to active status, with a Roadmap of 3 or more release plans and security hygiene in place.

Development Practices

The Livy project follows a review-before-commit philosophy. Every commit automatically runs through the unit tests and generates coverage reports presented as a pull request comment. Our experience with this process leads us to believe that it helps ease new contributors into the project. They get feedback quickly on common mistakes, lowering the burden on reviewers. Those same reviewers get to lead by example, showing the new contributors that we value feedback within our community even when changes are done by more experienced folks.

The paragraph above is taken from the original Apache Livy Proposal [1], and it should continue to hold. As mentioned, Livy is a mature project, and as such RTC (review-then-commit) is the most appropriate model for maintaining quality and awareness.

1. Original Apache Livy Proposal: https://cwiki.apache.org/confluence/display/incubator/LivyProposal
2. Retirement DISCUSS thread: https://lists.apache.org/thread/gcstsrhbp91c5mm55htqn1l3djv8m7o0