I have opened a pull request with a preliminary implementation for a 
settings command: https://github.com/memcached/memcached/pull/255

I took a few liberties, so let me know if anything is out of line.
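As context for reviewers, the core idea is a plain text command that updates a whitelisted setting in place. The sketch below only illustrates that idea; the `settings` command name, its syntax, and the error strings are hypothetical, not taken from the PR:

```python
# Minimal sketch of a runtime "settings" command handler.
# The command name, syntax, and error strings here are hypothetical
# illustrations, not the actual interface from the PR.

# Current settings, plus the subset that is safe to change at runtime.
SETTINGS = {"reqs_per_event": 20, "verbosity": 0}
RUNTIME_TUNABLE = {"reqs_per_event", "verbosity"}

def handle_settings_command(line):
    """Parse 'settings <name> <value>' and apply it if allowed."""
    parts = line.split()
    if len(parts) != 3 or parts[0] != "settings":
        return "CLIENT_ERROR bad command line format"
    name, value = parts[1], parts[2]
    if name not in RUNTIME_TUNABLE:
        return "CLIENT_ERROR unknown or non-tunable setting"
    try:
        SETTINGS[name] = int(value)
    except ValueError:
        return "CLIENT_ERROR value must be an integer"
    return "OK"
```

A client could then issue something like `settings reqs_per_event 300` over an existing connection instead of restarting the process.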

On Wednesday, January 25, 2017 at 1:52:24 PM UTC-8, Dormando wrote:
>
> Yeah, gimme a few weeks maybe. Those syscalls account for almost all of
> the CPU usage; reducing them is the difference between 1.2m keys/sec and
> 35m keys/sec on 20 cores in my own tests.
>
> I did this: 
> https://github.com/memcached/memcached/pull/243 
> .. which would help batch perf. 
> and this: 
> https://github.com/memcached/memcached/pull/241 
> .. which should make binprot perf better at nearly undetectable cost to 
> ascii. 
>
> so, working my way to it. 
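The response-batching idea mentioned above (one gathered write instead of one syscall per response packet) can be sketched with writev. This is a minimal illustration assuming a POSIX platform; memcached's actual response buffering is more involved:

```python
import os

def send_batched(fd, responses):
    """Send many small response buffers in a single writev() syscall
    instead of one write() call per response packet."""
    return os.writev(fd, responses)

# Demo over a pipe: ten tiny "responses", one syscall total.
r, w = os.pipe()
responses = [b"VALUE k%d 0 1\r\nx\r\n" % i for i in range(10)]
written = send_batched(w, responses)
os.close(w)
received = os.read(r, 4096)
os.close(r)
```

For larger payloads the return value must be checked in a loop, since writev may write fewer bytes than requested.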
>
> On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote: 
>
> > Yes, our production traffic all uses binary protocol, even behind the
> > on-server proxy that we use. In fact, if you have a way to reduce
> > syscalls by batching responses, that would solve another huge pain we
> > have that's of our own doing.
> > 
> > 
> > Scott Mansfield 
> > Product > Consumer Science Eng > EVCache > Sr. Software Eng 
> > { 
> >   M: 352-514-9452 
> >   E: [email protected] <javascript:> 
> >   K: {M: mobile, E: email, K: key} 
> > } 
> > 
> > On Wed, Jan 25, 2017 at 11:33 AM, dormando <[email protected]> wrote: 
> >       Okay, so it's the big rollup that gets delayed. Makes sense. 
> > 
> >       You're using binary protocol for everything? That's a major focus
> >       of my performance annoyance right now, since every response packet
> >       is sent individually. I should have that switched to an option at
> >       least pretty soon, which should also help with the time it takes
> >       to service them. 
> > 
> >       I'll test both ascii and binprot + the reqs_per_event option to
> >       measure how bad this actually is. 
> > 
> >       On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote: 
> > 
> >       > The client is the EVCache client jar:
> >       > https://github.com/netflix/evcache 
> >       > When a user calls the batch get function on the client, it
> >       > spreads those batch gets out over many servers because it hashes
> >       > keys to different servers. Imagine many of these batch gets
> >       > happening at the same time, though, and each server's queue will
> >       > get a bunch of gets from a bunch of different user-facing batch
> >       > gets. It all gets intermixed. 
> >       > These client-side read queues are rather large (10000) and might
> >       > end up sending a batch of a few hundred keys at a time. These
> >       > large batch gets are sent off to the servers as "one"
> >       > getq|getq|getq|getq|getq|getq|getq|getq|getq|getq|noop package
> >       > and read back in that order. We read the responses fairly
> >       > efficiently internally, but the batch get call the user made is
> >       > waiting on the data from all of these separate servers to come
> >       > back before it can respond to the user synchronously. 
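For reference, the getq|...|noop package described above maps directly onto memcached's 24-byte binary protocol request header. A minimal sketch of the client-side framing (request side only, no I/O):

```python
import struct

# memcached binary protocol, 24-byte request header:
# magic, opcode, key length, extras length, data type, vbucket,
# total body length, opaque, CAS.
HEADER = struct.Struct("!BBHBBHIIQ")
MAGIC_REQUEST = 0x80
OP_GETQ = 0x09   # quiet get: misses produce no response at all
OP_NOOP = 0x0A   # forces any pending quiet responses to flush

def build_getq_pipeline(keys):
    """Frame a batch of keys as getq|getq|...|getq|noop in one buffer."""
    out = bytearray()
    for opaque, key in enumerate(keys):
        k = key.encode()
        out += HEADER.pack(MAGIC_REQUEST, OP_GETQ, len(k), 0, 0, 0,
                           len(k), opaque, 0)
        out += k
    # Trailing noop marks the end of the batch for the client.
    out += HEADER.pack(MAGIC_REQUEST, OP_NOOP, 0, 0, 0, 0, 0, 0, 0)
    return bytes(out)
```

Because getq is quiet, misses generate no response; the trailing noop makes the server flush whatever hits are pending, which is what lets the client treat the whole batch as one unit.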
> >       > 
> >       > Now on the memcached side, there are many servers all seeing
> >       > this same pattern of many large batch gets. Memcached will stop
> >       > responding to a connection after 20 requests on the same event
> >       > and go serve other connections. When that happens, any
> >       > user-facing batch call waiting on a getq command still to be
> >       > serviced on that connection can be delayed. It doesn't normally
> >       > cause timeouts, but it does at a low rate. 
> >       > 
> >       > Our timeouts for this app in particular are 5 seconds for a
> >       > single user-facing batch get call. This client app is fine with
> >       > higher latency for higher throughput. 
> >       > 
> >       > At this point we have reqs_per_event set to a rather high 300
> >       > and it seems to have solved our problem. I don't think it's
> >       > causing any more consternation (for now), but having a dynamic
> >       > setting would have lowered the operational complexity of the
> >       > tuning. 
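The interaction between batch size and reqs_per_event can be seen in a toy round-robin model of the event loop. This illustrates the arithmetic only, not memcached's actual scheduling:

```python
def rounds_to_drain(pending, reqs_per_event):
    """Toy event loop: each event serves at most reqs_per_event requests
    from one connection, then moves on. Returns how many passes over all
    connections it takes to drain every queue."""
    queues = list(pending)
    rounds = 0
    while any(q > 0 for q in queues):
        rounds += 1
        queues = [max(0, q - reqs_per_event) for q in queues]
    return rounds
```

With the old default of 20, a 300-request backlog on one connection needs 15 trips through the event loop before the last getq is answered; with the limit at 300 it drains in a single pass.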
> >       > 
> >       > 
> >       > On Wed, Jan 25, 2017 at 11:04 AM, dormando
> >       > <[email protected]> wrote: 
> >       >       I guess when I say dynamic I mostly mean runtime-settable.
> >       >       Dynamic is a little harder, so I tend to do those as a
> >       >       second pass. 
> >       > 
> >       >       You're saying your client had head-of-line blocking for
> >       >       unrelated requests? I'm not 100% sure I follow. 
> >       > 
> >       >       A big multiget comes in and gets processed slightly
> >       >       slower than normal because other clients are making
> >       >       requests. Do the requests *behind* the multiget time out,
> >       >       or the multiget itself? 
> >       > 
> >       >       How long is your timeout? :P 
> >       > 
> >       >       I'll take a look at it as well and see about raising the
> >       >       limit in `-o modern` after some performance tests. The
> >       >       default is from 2006. 
> >       > 
> >       >       thanks! 
> >       > 
> >       >       On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached
> >       >       wrote: 
> >       > 
> >       >       > The reqs_per_event setting was causing a client doing
> >       >       > large batch gets (of a few hundred keys) to see some
> >       >       > timeouts. Since memcached will delay responding fully
> >       >       > until other connections are serviced, and our client
> >       >       > waits until the batch is done, we see some client-side
> >       >       > timeouts for the users of our client library. Our
> >       >       > solution has been to raise the setting during startup,
> >       >       > but just as a thought experiment I was asking whether we
> >       >       > could have done it dynamically to avoid losing data. At
> >       >       > the moment there's quite a lot of machinery involved in
> >       >       > changing the setting (deploy, copy data over with our
> >       >       > cache warmer, flip traffic, tear down old boxes), and I
> >       >       > would rather have left everything as-is and adjusted the
> >       >       > setting on the fly until our client's problem was
> >       >       > resolved. 
> >       >       > I'm interested in patching this specific setting to be
> >       >       > settable at runtime, but making it fully dynamic in
> >       >       > nature is not something I'd want to tackle. There's a
> >       >       > natural tradeoff between latency for other connections
> >       >       > and throughput for the one currently being serviced, and
> >       >       > I'm not sure it's a good idea to change that
> >       >       > dynamically. It might cause unexpected behavior if one
> >       >       > bad client sends huge requests. 
> >       >       > 
> >       >       > 
> >       >       > On Tue, Jan 24, 2017 at 11:53 AM, dormando
> >       >       > <[email protected]> wrote: 
> >       >       >       Hey, 
> >       >       > 
> >       >       >       Would you mind explaining a bit how you determined
> >       >       >       the setting was causing an issue, and what the
> >       >       >       impact was? The default there is very old and might
> >       >       >       be worth a revisit (or some kind of auto-tuning)
> >       >       >       as well. 
> >       >       > 
> >       >       >       I've been trending as much as possible toward
> >       >       >       online configuration, including the actual memory
> >       >       >       limit. You can turn the LRU crawler on and off,
> >       >       >       automoving on and off, manually move slab pages,
> >       >       >       etc. I'm hoping to make the LRU algorithm itself
> >       >       >       modifiable at runtime. 
> >       >       > 
> >       >       >       So yeah, I'd take a patch :) 
> >       >       > 
> >       >       >       On Mon, 23 Jan 2017, 'Scott Mansfield' via
> >       >       >       memcached wrote: 
> >       >       > 
> >       >       >       > There was a single setting my team was looking
> >       >       >       > at today and wished we could change dynamically:
> >       >       >       > the reqs_per_event setting. Right now, in order
> >       >       >       > to change it we need to shut down the process and
> >       >       >       > start it again with a different -R parameter. I
> >       >       >       > don't see a way to change many of the settings,
> >       >       >       > though there are some that are ad-hoc changeable
> >       >       >       > through some stats commands. I was going to see
> >       >       >       > if I could patch memcached to change the
> >       >       >       > reqs_per_event setting at runtime, but before
> >       >       >       > doing so I wanted to check whether that's
> >       >       >       > something you'd be amenable to. I also didn't
> >       >       >       > want to do something specific to that setting if
> >       >       >       > it would be better to add it as a general
> >       >       >       > feature. 
> >       >       >       > I see some pros and cons: 
> >       >       >       > 
> >       >       >       > One easy pro is that you can change things at
> >       >       >       > runtime to recover performance without losing all
> >       >       >       > of your data. If client request patterns change,
> >       >       >       > the process can react. 
> >       >       >       > 
> >       >       >       > A con is that the startup parameters won't
> >       >       >       > necessarily match what the process is doing, so
> >       >       >       > they are no longer a useful way to determine
> >       >       >       > memcached's settings; instead you would need to
> >       >       >       > connect and issue a stats settings command to
> >       >       >       > read them. It also introduces change in places
> >       >       >       > that may never have seen it before, e.g. the
> >       >       >       > reqs_per_event setting is simply read at the
> >       >       >       > beginning of the drive_machine loop. It might
> >       >       >       > need some kind of synchronization around it now.
> >       >       >       > I don't think it necessarily needs it on x86_64,
> >       >       >       > but it might on other platforms I'm less
> >       >       >       > familiar with. 
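The "read once at the top of the loop" pattern described above can be sketched as follows. This is a Python illustration of the snapshot idea only; in the actual C code path this would be an atomic (or at least carefully ordered) load rather than a plain read, which is exactly the portability question raised here:

```python
import threading

settings = {"reqs_per_event": 20}
settings_lock = threading.Lock()  # serializes writers; readers snapshot

def set_reqs_per_event(n):
    """Writer side: update the setting under a lock."""
    with settings_lock:
        settings["reqs_per_event"] = n

def drive_machine_iteration(queue_len):
    """One pass of a toy drive_machine: snapshot the limit once at the
    top, then honor that snapshot for the whole iteration even if the
    setting changes concurrently. A new value takes effect on the next
    iteration."""
    limit = settings["reqs_per_event"]   # single read per iteration
    served = min(queue_len, limit)
    return served, queue_len - served
```

The snapshot keeps any single iteration internally consistent; the worst case for a racing writer is that one pass runs with the old limit.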
> >       >       >       > 
> >       >       >       > -- 
> >       >       >       > --- 
> >       >       >       > You received this message because you are
> >       >       >       > subscribed to the Google Groups "memcached"
> >       >       >       > group. To unsubscribe from this group and stop
> >       >       >       > receiving emails from it, send an email to
> >       >       >       > [email protected]. 
> >       >       >       > For more options, visit
> >       >       >       > https://groups.google.com/d/optout. 
