On Fri, Jun 13, 2014 at 4:41 AM, Steven Howes <[email protected]> wrote:
> On 13 Jun 2014, at 08:12, Matthew Jordan <[email protected]> wrote:
>
>> Apologies if this e-mail gets a bit rambling; by the time I send this it
>> will be past 2 AM here in the US, and we've been scrambling for the past
>> nine hours or so to fix the regression caused by r415972 without
>> reintroducing the vulnerability it fixed.
>>
>> Clearly, there are things we should have done better to catch this
>> before the security releases went out yesterday. The regression was
>> serious enough that plenty of tests in the Test Suite caught the error -
>> in fact, developing a test on a local dev machine was how we discovered
>> that the regression had occurred.
>
> I've not been directly involved with the whole commit/testing procedure,
> so excuse me if I'm misreading anything.
>
> If it fails the tests, how was it released? I understand the whole
> reduced transparency/communications thing; it's an unfortunate necessity
> of dealing with security issues. I can't see how that excludes the
> testing carried out by the Test Suite, though?
>
> Kind regards,

Disregarding local test suite runs, a few things happened here:

(1) Four security patches were made at roughly the same time.
Unfortunately, the patch with the issue was the last one to be committed -
and by the time that occurred, there were a large number of jobs scheduled
in front of it.

(2) Jobs in Bamboo execute in the following order:
  (a) Basic build (a simple compile test) on the first available build agent
  (b) Full build (multiple compile options, e.g., parallel builds) on all
      the different flavors of build agent
  (c) Unit test run
  (d) Channel driver tests in the Test Suite
  (e) ARI tests in the Test Suite

A full run of the test suite takes place nightly. This issue would have
been caught by step (d) - but each of the previous steps takes a while to
complete (Asterisk doesn't compile quickly), and a test suite run takes a
long time even with the reduced sets of tests in steps (d) and (e).
Each merge into a branch kicks this process off - and there were at least
seven iterations of it in front of the offending patch. Which leads to
point #3:

(3) The merge process on the offending patch was slowed down by merge
conflicts between branches. Merging the patch into all branches wasn't
complete until nearly 3 PM, which left us very little time to get the
releases out. Generally, we strive hard to get security releases out the
door as early in the day as possible, so that system administrators have
time that day to upgrade their systems if they are affected.

All of that aside, there are a few things (again, beyond running the test
suite locally) that could be done to improve the situation:

(a) Add a 'smoke test' to the Test Suite that gets run in either the Basic
Build or Full Build step. It would do some very simple things: originate a
call over AMI with a Local channel, use a SIP channel to connect to
another instance of Asterisk, pass media/DTMF, bounce back to the test
using AGI, and maybe a few other things. Such a test could hit a lot of
our usual 'hot spots' and - if run early enough in the cycle - would flag
problems to developers more quickly than the current process does.

(b) Throw some more hardware at the problem. Right now we have a single
32-bit/64-bit CentOS 6 machine; we could easily double that up, which
would get results back faster.

--
Matthew Jordan
Digium, Inc. | Engineering Manager
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org
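P.S. To make the smoke-test idea in (a) concrete, here is a rough sketch
of just its first step - building the AMI 'Originate' action that would
dial a Local channel. This is only an illustration, not Test Suite code;
the 'echo' extension and 'default' context are hypothetical names I made
up for the example:

```python
# Hypothetical sketch of the smoke test's first step: building an AMI
# 'Originate' action for a Local channel. AMI frames an action as
# CRLF-separated 'Key: Value' header lines terminated by a blank line.

def build_ami_action(action, **headers):
    """Serialize an AMI action; the result ends with an empty line,
    which marks the end of the action on the wire."""
    lines = ["Action: %s" % action]
    lines += ["%s: %s" % (key, value) for key, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"

# Originate a call on a Local channel into a (hypothetical) dialplan
# extension that answers and echoes media back to the test.
originate = build_ami_action(
    "Originate",
    Channel="Local/echo@default",
    Context="default",
    Exten="echo",
    Priority="1",
    Async="true",
)

print(originate)
```

In the real test this string would be written to Asterisk's AMI TCP port
(5038 by default) after a successful Login action; that part is omitted
here since it needs a running Asterisk instance to talk to.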
--
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
