On Fri, 09 Nov 2007 09:40:25 -0800 Vincent Fox <[EMAIL PROTECTED]> wrote:
> If there's something that 3 admins could do to alleviate load we did it.
>
> The bigger problem I am seeing is that Cyrus doesn't in our
> usage seem to ramp load smoothly or even predictably. It goes
> fine up to a certain point, and then you hit a brick wall without
> very much in the way of warning. You add that small chunk of
> extra users or load and suddenly everything goes to hell. Keeping
> the user-count per instance appropriate was the only thing we did
> over multiple days of desperate "try this" that did the job.
>
> A generous engineering cushion of capacity seems more critical than usual.

In my experience the "brick wall" you describe is what happens when the
disks reach a level of random IO that they cannot keep up with.

In cases such as yours, the only reasonable thing is to announce some kind
of "extended maintenance" to users so they don't bog you down with whining,
and then go methodically through the system, testing and eliminating
possible causes one by one until you zero in on the root cause.

If this is some Solaris "feature" as you suspect, then I think a DTrace
expert is the man you're looking for. I'm still on Linux and was thinking a
lot about trying out Solaris 10, but stories like yours will make me think
again about that ...

--
Jure Pečar
http://jure.pecar.org/

----
Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html