Hi Marco,

Thank you for sharing the insights. This discussion is intended to set goals
that future design improvements to the CI can take into consideration. Thus,
while I fully recognize that implementation could be difficult, I'd still like
to confirm with the community whether the outlined access control
recommendation is at the right level.

To summarize your concerns:
- opening up access control should be conditioned on having good version
control and a rollback mechanism to ease the operational burden of breakage,
which is more likely with a larger user base.
- upgrades to the system would be better managed as planned, collective
efforts instead of ad hoc tasks performed by uncoordinated individuals.

You also mentioned that "changes to the system should only be done by the
administrators". The intention of this thread is exactly to define who would
qualify as administrators. Currently, that qualification is opaque and is
decided only within a group at Amazon.

On the other hand, the current arrangement can cause, and already has caused,
friction. When this project's daily activity of validating and merging code is
affected by the system's instability, community members have no choice but to
wait for the current system administrators to resolve the issues. Other
affected community members have no way to help even if they wish to.

Given the existing Apache project governance model, I'd recommend that the goal
for CI access control be set so that any committer or PMC member who wishes to
be involved has the right to help.

-sz

On 2019/09/17 12:49:20, Marco de Abreu <[email protected]> wrote: 
> Ah, with regard to #1 and #2: Currently, we don't have any plugins that
> control the actions of a single user and allow us to monitor and rate-limit
> them. Just giving trigger permission (which is also tied with
> abort-permission if I recall correctly), would allow a malicious user to
> start a huge number of jobs and thus either create immense costs or bring
> down the system. Also, we'd have to check how we can restrict the trigger
> permission to specific jobs.
> 
> -Marco
> 
> On Tue, Sep 17, 2019 at 2:47 PM Marco de Abreu <[email protected]>
> wrote:
> 
> > Hi Sheng,
> >
> > While I'm in general all in favour of widening the access to distribute the
> > tasks, the situation around the CI system in particular is a bit more
> > difficult.
> >
> > As far as I know, the creation of the CI system is neither automated,
> > versioned nor backed up or safeguarded. This means that if somebody makes a
> > change that breaks something, we're left with a broken system we can't
> > recover from. Thus, I have preferred in the past to restrict the access as
> > much as possible (at least to Prod) to avoid these situations from
> > happening. While #1 and #2 are already possible today (we have two roles
> > for committers and regular users that allow this already), #3 and #4 come
> > with a significant risk for the stability of the system.
> >
> > As soon as a job is added or changed, a lot of things happen in Jenkins -
> > one of these tasks is the SCM scan which tries to determine the branches
> > the job should run on. For somebody who is inexperienced, the first pitfall
> > is that suddenly hundreds of jobs are being spawned which will certainly
> > overload Jenkins and render it unusable. There are a lot of tricks and I
> > could elaborate on them, but basically the bottom line is that the
> > configuration interface of Jenkins is far from fail-proof and poses a
> > significant risk if accessed by somebody who doesn't know exactly what
> > they're doing. That is, we would need to design some kind of training, and
> > even that would not safeguard us from these fatal events.
> >
> > There's the whole security aspect around user-facing artifact generation
> > of CI/CD and the possibility of the artifacts being tampered with, but I
> > don't think I have to elaborate on that.
> >
> > With regard to #4 especially, I'd say that somebody just upgrading the
> > system or changing plugins carries an even bigger risk. Plugins are
> > notoriously unsafe, and system updates have also been shown not to go
> > smoothly. I'd argue that changes to the system should only
> > be done by the administrators of it since they have a bigger overview over
> > all the things that are currently going on while also having the full
> > access (backups before making changes, SSH access, log access, metric
> > access, etc.) to debug errors. In the end, we shouldn't forget that this is
> > a production system. Usually, nobody would be able to touch it at all, but
> > we're not in a perfect world, so I'd say we should restrict access to a
> > bare minimum in the form of admins.
> >
> > So while I certainly understand and encourage distributing access, I
> > don't feel comfortable widening the access to such a critical production
> > system. When it is down, GitHub development is fully halted, which is
> > really problematic since we don't have rollback mechanisms.
> >
> > Best regards,
> > marco
> >
> > On Sun, Sep 15, 2019 at 6:40 AM Sheng Zha <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I'd like to initiate discussion on how access control should be managed
> >> for the CI system. The hope is that we can present the conclusion of this
> >> discussion as the recommendation and request to the donors of the CI system
> >> from Amazon.
> >>
> >> The specific aspects I'd like to discuss are the abilities to:
> >> 1. trigger PR validation and nightly jobs.
> >> 2. trigger continuous delivery jobs, such as for binary releases in pip,
> >> maven, and dockerhub.
> >> 3. add jobs to the CI system.
> >> 4. maintain and manage the CI system, such as system upgrades and Jenkins
> >> plugin installation.
> >>
> >> Given that we already have GitHub SSO enabled on the Jenkins CI, I
> >> suggest the following authentication levels for these items:
> >> 1. all authenticated GitHub users.
> >> 2-4. all MXNet committers
> >>
> >> What do you think? If you have more aspects that you wish to discuss,
> >> feel free to propose.
> >>
> >> -sz
> >>
> >
> 
