Hi,
  I was also poking at the same systems with Claudio, and I can say for sure 
that we did see servers 'recover' on their own without a restart (or at 
least return to 'normal' CPU load).  I'd also like to clarify that when this 
happened we were accidentally running 24 threads per memcached server, with 
3 servers per machine, on a '24 core' system (12 hyperthreaded cores).  We've 
since backed off to 8 threads and 3 servers and haven't seen the issue 
again, but we're worried it might come back.  I don't know enough about 
memcached's internals, but I was speculating that starving a thread of CPU 
time could cause some blocking until the right thread got some CPU?  Also, 
do you have a feel for threads vs. servers?  I certainly ran across a few 
postings suggesting that running ~4 threads and multiple instances per 
server was a more efficient configuration than lots of threads.
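
  To put rough numbers on the oversubscription, here is a minimal Python 
sketch of the arithmetic.  It assumes the 3 servers are memcached instances 
on one machine and that the thread counts are memcached worker threads; the 
helper name threads_per_core is made up for illustration and is not part of 
memcached.

```python
# Hypothetical helper for illustration only; not a memcached API.
def threads_per_core(threads_per_instance, instances, cores):
    """Return total worker threads divided by the number of cores."""
    return (threads_per_instance * instances) / cores


LOGICAL_CORES = 24    # what the OS reports on the '24 core' box
PHYSICAL_CORES = 12   # 12 hyperthreaded cores
INSTANCES = 3         # assumption: 3 memcached instances on one machine

# Accidental configuration: 24 threads per instance.
accidental_logical = threads_per_core(24, INSTANCES, LOGICAL_CORES)    # 72 / 24
accidental_physical = threads_per_core(24, INSTANCES, PHYSICAL_CORES)  # 72 / 12

# Current configuration: 8 threads per instance.
current_logical = threads_per_core(8, INSTANCES, LOGICAL_CORES)        # 24 / 24
current_physical = threads_per_core(8, INSTANCES, PHYSICAL_CORES)      # 24 / 12

print("accidental:", accidental_logical, "per logical core,",
      accidental_physical, "per physical core")
print("current:   ", current_logical, "per logical core,",
      current_physical, "per physical core")
```

  So the accidental setup had three worker threads competing for each 
logical core, while the current one has one per logical core.  That says 
nothing about whether starvation actually caused the hang; it only shows how 
far oversubscribed we were.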
  Thanks,  Keith
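
P.S. For reference, Claudio's per-minute figures from the quoted thread 
convert to per-second rates as in the small sketch below.  The variable 
names are illustrative, and the per-machine total assumes all 3 instances 
see roughly the same load, which the thread does not confirm.

```python
# Per-instance figures quoted in the thread (per minute).
SETS_PER_MINUTE = 1_120_000
GETS_PER_MINUTE = 1_800_000
SECONDS_PER_MINUTE = 60
INSTANCES = 3  # assumption: every instance carries a similar load

# Integer division truncates, matching the rounded-down figures in the thread.
sets_per_second = SETS_PER_MINUTE // SECONDS_PER_MINUTE
gets_per_second = GETS_PER_MINUTE // SECONDS_PER_MINUTE

# Combined operations per second for one instance and for the whole machine.
per_instance_ops_per_second = (
    (SETS_PER_MINUTE + GETS_PER_MINUTE) // SECONDS_PER_MINUTE
)
per_machine_ops_per_second = (
    INSTANCES * (SETS_PER_MINUTE + GETS_PER_MINUTE) // SECONDS_PER_MINUTE
)

print("sets/s per instance:", sets_per_second)
print("gets/s per instance:", gets_per_second)
print("ops/s per instance: ", per_instance_ops_per_second)
print("ops/s per machine:  ", per_machine_ops_per_second)
```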




On Thursday, August 7, 2014 9:49:12 PM UTC-4, Dormando wrote:
>
> No command can take up much time. If all other commands hang up, it's 
> either a long-running stats command like I listed before, or a hang bug 
> (though I don't know why it would recover on its own). We've fixed a lot 
> of those since .13, so I'd still advocate upgrading at least some 
> instances to see if they become immune to it. 
>
> On Thu, 7 Aug 2014, Claudio Santana wrote: 
>
> > 
> > I think this issue has something to do with our access pattern (although 
> > we run a very limited set of commands and not very high traffic either). 
> > 
> > We always start having issues on the same instance (I guess because of 
> > the system accessing a specific key). When we notice the issue we bounce 
> > the instance within 15-20 minutes; I don't know if you think that is not 
> > enough time for it to recover. 
> > 
> > Sometimes the issue "moves" to other instances on other servers (our 
> > client doesn't rebalance, so the system is trying to access completely 
> > different keys). On the other servers, sometimes the issue goes away on 
> > its own or the spike is not at 100%. 
> > 
> > On Aug 7, 2014 6:36 PM, "dormando" <[email protected]> wrote: 
> >       Those three stats commands aren't problematic. The others I listed 
> >       are. Sadly there aren't stats counters for them, I think... Are you 
> >       sure it's not completely crashing after the CPU spike? It actually 
> >       recovers on its own? 
> > 
> >       On Thu, 7 Aug 2014, Claudio Santana wrote: 
> > 
> >       > 
> >       > Every minute I run stats, stats items and stats slabs. 
> >       > 
> >       > The only commands executed are remove, incr, add, get, set and cas. 
> >       > 
> >       > I'm now running 6 threads per instance with 3 instances per server 
> >       > and haven't had the issue again, though I'm not saying this change 
> >       > fixed it. 
> >       > 
> >       > I'll definitely update. 
> >       > 
> >       > On Aug 7, 2014 6:13 PM, "dormando" <[email protected]> wrote: 
> >       >       Please upgrade. If you have problems with the latest version 
> >       >       we can look into it more. 
> >       > 
> >       >       You can also look at command counters for odd commands being 
> >       >       given: make sure nobody's running flushes, or "stats sizes", 
> >       >       or "stats cachedump", since those can cause CPU spikes and 
> >       >       hangs. 
> >       > 
> >       >       With 1.4.20 you can use "stats conns" to see what the 
> >       >       connections are doing during the CPU spike. 
> >       > 
> >       >       On Thu, 7 Aug 2014, Claudio Santana wrote: 
> >       > 
> >       >       > Forgot to say I'm running version 1.4.13, libevent 2.0.16-stable. 
> >       >       > 
> >       >       > 
> >       >       > 
> >       >       > On Thu, Aug 7, 2014 at 6:08 PM, Claudio Santana <[email protected]> wrote: 
> >       >       >       Sorry for the late response. 
> >       >       > 
> >       >       > My CPU utilization normally is min 2.5% to 6.5% max. 
> >       >       > 
> >       >       > So it's interesting you ask this. The reason I submitted 
> >       >       > the first question is that I've experienced some random 
> >       >       > CPU utilization spikes. From about 6% CPU utilization, all 
> >       >       > of a sudden it spikes to 100%, and I can see the offending 
> >       >       > process is one of the Memcached instances. Sadly this CPU 
> >       >       > spike is accompanied by all requests timing out, causing 
> >       >       > the whole system to become unusable. 
> >       >       > 
> >       >       > I collect minute-by-minute stats of all these memcached 
> >       >       > instances, and according to my stats this issue happens 
> >       >       > within 2 minutes. I can see there's no increase in the 
> >       >       > number of commands being issued right before the CPU spike, 
> >       >       > nor in the number of bytes in/out. 
> >       >       > 
> >       >       > Does anybody have any ideas of what could be going on? 
> >       >       > 
> >       >       > I have all Memcached stats collected by minute in Graphite. 
> >       >       > I can provide other stats that could help explain this 
> >       >       > issue if necessary. 
> >       >       > 
> >       >       > 
> >       >       > On Mon, Aug 4, 2014 at 9:36 PM, dormando <[email protected]> wrote: 
> >       >       >       You could run one instance with one thread and serve 
> >       >       >       all of that just fine. Have you actually looked at 
> >       >       >       graphs of the CPU usage of the host? memcached should 
> >       >       >       be practically idle with load that low. 
> >       >       > 
> >       >       >       One with -t 6 or -t 8 would do it just fine. 
> >       >       > 
> >       >       >       On Mon, 4 Aug 2014, Claudio Santana wrote: 
> >       >       > 
> >       >       >       > Dormando, thanks for the quick response. Sorry 
> >       >       >       > for the confusion. I don't have exact per-second 
> >       >       >       > metrics, but per minute it's 1.12 million sets and 
> >       >       >       > 1.8 million gets, which translates to 18,666 sets 
> >       >       >       > per second and 30,000 gets per second. 
> >       >       >       > 
> >       >       >       > These stats are per Memcached instance which I 
> currently run 3 on each server. 
> >       >       >       > 
> >       >       >       > Claudio. 
> >       >       >       > 
> >       >       >       > 
> >       >       >       > On Mon, Aug 4, 2014 at 6:22 PM, dormando <[email protected]> wrote: 
> >       >       >       >       On Mon, 4 Aug 2014, Claudio Santana wrote: 
> >       >       >       > 
> >       >       >       >       > I have this Memcached cluster where 3 
> >       >       >       >       > instances of Memcached run in a single server. 
> >       >       >       >       > These servers have 24 cores, and each instance 
> >       >       >       >       > is configured to have 8 threads. Each individual 
> >       >       >       >       > instance serves about 5000G gets/sets a day and 
> >       >       >       >       > has about 3k current connections. 
> >       >       >       > 
> >       >       >       > I don't know what "5000G gets/sets a day" translates 
> >       >       >       > to in per-second (nor what the G-unit even is?), can 
> >       >       >       > you define this? 
> >       >       >       > 
> >       >       >       > > What would be better? Consolidate these 3 
> >       >       >       > > instances into a single instance per server with 
> >       >       >       > > 24 threads? I've read in a few articles that 
> >       >       >       > > Memcached's performance starts suffering with more 
> >       >       >       > > than 4-6 threads per instance. Is this generally 
> >       >       >       > > true? 
> >       >       >       > > 
> >       >       >       > > How about keeping the 3 instances per server and 
> >       >       >       > > decreasing the number of threads to, say, 4 or 6? 
> >       >       >       > > Or creating 4 instances on the same servers instead 
> >       >       >       > > of 3 and decreasing the number of threads per 
> >       >       >       > > instance to 6, so there is one thread per core? 
> >       >       >       > > 
> >       >       >       > > Is there a guide you could recommend for 
> >       >       >       > > configuring the right number of threads, and 
> >       >       >       > > strategies to get the most out of a Memcached 
> >       >       >       > > server/instance? 
> >       >       >       > > 
> >       >       >       > > Thanks, 
> >       >       >       > > Claudio 
> >       >       >       > > 
> >       >       >       > > -- 
> >       >       >       > > 
> >       >       >       > > --- 
> >       >       >       > > You received this message because you are 
> >       >       >       > > subscribed to the Google Groups "memcached" group. 
> >       >       >       > > To unsubscribe from this group and stop receiving 
> >       >       >       > > emails from it, send an email to 
> >       >       >       > > [email protected]. 
> >       >       >       > > For more options, visit 
> >       >       >       > > https://groups.google.com/d/optout. 
> >       >       >       > > 
> >       >       >       > > 

