I have opened a pull request with a preliminary implementation for a settings command: https://github.com/memcached/memcached/pull/255
I took a few liberties, so let me know if anything is out of line.

On Wednesday, January 25, 2017 at 1:52:24 PM UTC-8, Dormando wrote:
> Yeah, gimme a few weeks maybe. Reducing those syscalls is like almost all
> of the CPU usage. Difference between 1.2m keys/sec and 35m keys/sec on
> 20 cores in my own tests.
>
> I did this:
> https://github.com/memcached/memcached/pull/243
> .. which would help batch perf.
> and this:
> https://github.com/memcached/memcached/pull/241
> .. which should make binprot perf better at nearly undetectable cost to
> ascii.
>
> so, working my way to it.
>
> On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
>
> > Yes, our production traffic all uses binary protocol, even behind the
> > on-server proxy that we use. In fact, if you have a way to reduce
> > syscalls by batching responses, that would solve another huge pain we
> > have that's of our own doing.
> >
> > On Wed, Jan 25, 2017 at 11:33 AM, dormando <[email protected]> wrote:
> > Okay, so it's the big rollup that gets delayed. Makes sense.
> >
> > You're using binary protocol for everything? That's a major focus of
> > my performance annoyance right now, since every response packet is
> > sent individually. I should have that switched to an option at least
> > pretty soon, which should also help with the time it takes to service
> > them.
> >
> > I'll test both ascii and binprot + the reqs_per_event option to see
> > how bad this is measurably.
> >
> > On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
> >
> > > The client is the EVCache client jar: https://github.com/netflix/evcache
> > > When a user calls the batch get function on the client, it spreads
> > > that batch get out over many servers because it hashes keys to
> > > different servers. Imagine many of these batch gets happening at the
> > > same time, though, and each server's queue will get a bunch of gets
> > > from a bunch of different user-facing batch gets. It all gets
> > > intermixed.
> > > These client-side read queues are rather large (10000) and might end
> > > up sending a batch of a few hundred keys at a time. These large
> > > batch gets are sent off to the servers as "one"
> > > getq|getq|getq|getq|getq|getq|getq|getq|getq|getq|noop package and
> > > read back in that order. We are reading the responses fairly
> > > efficiently internally, but the batch get call that the user made is
> > > waiting on the data from all of these separate servers to come back
> > > in order to properly respond to the user in a synchronous manner.
> > > Now on the memcached side, there are many servers all seeing this
> > > same pattern of many large batch gets. Memcached will stop
> > > responding to a connection after 20 requests on the same event and
> > > go serve other connections. When that happens, any user-facing batch
> > > call that is waiting on a getq command still to be serviced on that
> > > connection can be delayed. It doesn't normally end up causing
> > > timeouts, but it does at a low rate.
> > > Our timeout for this app in particular is 5 seconds for a single
> > > user-facing batch get call. This client app is fine with higher
> > > latency for higher throughput.
> > > At this point we have reqs_per_event set to a rather high 300 and it
> > > seems to have solved our problem. I don't think it's causing any
> > > more consternation (for now), but having a dynamic setting would
> > > have lowered the operational complexity of the tuning.
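Since the getq|...|noop shape keeps coming up in this thread, here is a rough
sketch of how such a batch is framed in the binary protocol. This is
illustrative C only, not our actual client (the EVCache client linked above is
Java), and buffer sizing and error handling are omitted. The trailing noop is
what tells the client the batch is complete, since a quiet get produces no
response at all on a miss.

    /* Illustrative only: framing "getq|getq|...|noop" in the binary
     * protocol.  GETQ (0x09) is the quiet get: the server only answers
     * hits, and the NOOP (0x0a) response marks the end of the batch.
     * Header layout follows protocol_binary.h (24-byte request header). */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>

    #define OP_GETQ 0x09
    #define OP_NOOP 0x0a

    /* Write one 24-byte request header plus the key (if any) into buf;
     * returns the number of bytes written. */
    static size_t frame_req(uint8_t *buf, uint8_t opcode,
                            const char *key, uint16_t keylen)
    {
        memset(buf, 0, 24);
        buf[0] = 0x80;                          /* magic: request        */
        buf[1] = opcode;
        uint16_t klen = htons(keylen);
        memcpy(buf + 2, &klen, sizeof klen);    /* key length            */
        uint32_t blen = htonl(keylen);
        memcpy(buf + 8, &blen, sizeof blen);    /* total body length     */
        if (keylen)
            memcpy(buf + 24, key, keylen);      /* body is just the key  */
        return 24 + keylen;
    }

    /* Build the whole batch into one buffer so it goes out in one write:
     * getq k0 | getq k1 | ... | noop. */
    static size_t frame_batch(uint8_t *buf, const char **keys, int nkeys)
    {
        size_t off = 0;
        for (int i = 0; i < nkeys; i++)
            off += frame_req(buf + off, OP_GETQ,
                             keys[i], (uint16_t)strlen(keys[i]));
        off += frame_req(buf + off, OP_NOOP, NULL, 0);
        return off;
    }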
> > > On Wed, Jan 25, 2017 at 11:04 AM, dormando <[email protected]> wrote:
> > > I guess when I say dynamic I mostly mean runtime-settable. Dynamic
> > > is a little harder, so I tend to do those as a second pass.
> > >
> > > You're saying your client had head-of-line blocking for unrelated
> > > requests? I'm not 100% sure I follow.
> > >
> > > Big multiget comes in, multiget gets processed slightly slower than
> > > normal due to other clients making requests, so requests *behind*
> > > the multiget time out, or the multiget itself?
> > >
> > > How long is your timeout? :P
> > >
> > > I'll take a look at it as well and see about raising the limit in
> > > `-o modern` after some performance tests. The default is from 2006.
> > >
> > > thanks!
> > >
> > > On Wed, 25 Jan 2017, 'Scott Mansfield' via memcached wrote:
> > >
> > > > The reqs_per_event setting was causing a client that was doing
> > > > large batch gets (of a few hundred keys) to see some timeouts.
> > > > Since memcached will delay responding fully until other
> > > > connections are serviced, and our client will wait until the batch
> > > > is done, we see some client-side timeouts for the users of our
> > > > client library. Our solution has been to raise the setting at
> > > > startup, but just as a thought experiment I was asking if we could
> > > > have done it dynamically to avoid losing data. At the moment
> > > > there's quite a lot of machinery to change the setting (deploy,
> > > > copy data over with our cache warmer, flip traffic, tear down old
> > > > boxes) and I would have rather left everything as-is and adjusted
> > > > the setting on the fly until our client's problem was resolved.
> > > > I'm interested in patching this specific setting to be settable,
> > > > but having it fully dynamic in nature is not something I'd want to
> > > > tackle. There's a natural tradeoff of latency for other
> > > > connections against throughput for the one currently being
> > > > serviced, and I'm not sure it's a good idea to change that
> > > > automatically. It might cause unexpected behavior if one bad
> > > > client sends huge requests.
> > > >
> > > > On Tue, Jan 24, 2017 at 11:53 AM, dormando <[email protected]> wrote:
> > > > Hey,
> > > >
> > > > Would you mind explaining a bit how you determined the setting was
> > > > causing an issue, and what the impact was? The default there is
> > > > very old and might be worth a revisit (or some kind of
> > > > auto-tuning) as well.
> > > >
> > > > I've been trending as much as possible toward online
> > > > configuration, including the actual memory limit. You can turn the
> > > > lru crawler on and off, automoving on and off, manually move slab
> > > > pages, etc. I'm hoping to make the LRU algorithm itself modifiable
> > > > at runtime.
> > > >
> > > > So yeah, I'd take a patch :)
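As a side note for anyone following along, the runtime knobs dormando lists
above are plain text-protocol commands; a session looks roughly like this
(output trimmed, and the exact stat names and command set vary by version, so
treat doc/protocol.txt as authoritative):

    stats settings
    STAT maxconns 1024
    STAT reqs_per_event 20
    [...]
    END
    slabs automove 1
    OK
    slabs reassign 2 5
    OK
    lru_crawler enable
    OK

reqs_per_event already shows up in stats settings; it just can't be changed
after startup, which is the gap the pull request above is aimed at.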
> > > >
> > > > On Mon, 23 Jan 2017, 'Scott Mansfield' via memcached wrote:
> > > >
> > > > > There was a single setting my team was looking at today that we
> > > > > wish we could have changed dynamically: the reqs_per_event
> > > > > setting. Right now, in order to change it we need to shut down
> > > > > the process and start it again with a different -R parameter. I
> > > > > don't see a way to change many of the settings, though there are
> > > > > some that are ad-hoc changeable through some stats commands. I
> > > > > was going to see if I could patch memcached to be able to change
> > > > > the reqs_per_event setting at runtime, but before doing so I
> > > > > wanted to check whether that's something you'd be amenable to. I
> > > > > also didn't want to do something specifically for that setting
> > > > > if it was going to be better to add it as a general feature.
> > > > > I see some pros and cons:
> > > > >
> > > > > One easy pro is that you can change things at runtime to tune
> > > > > performance without losing all of your data. If client request
> > > > > patterns change, the process can react.
> > > > >
> > > > > A con is that the startup parameters won't necessarily match
> > > > > what the process is doing, so they are no longer a useful way to
> > > > > determine the settings of memcached. Instead you would need to
> > > > > connect and issue a stats settings command to read them. It also
> > > > > introduces change in places that may have previously never seen
> > > > > it; e.g., the reqs_per_event setting is simply read at the
> > > > > beginning of the drive_machine loop. It might need some kind of
> > > > > synchronization around it now instead. I don't think it
> > > > > necessarily needs it on x86_64, but it might on other platforms
> > > > > with which I am not familiar.
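On the synchronization question in that last paragraph, what I had in mind is
roughly the sketch below, assuming C11 atomics. The names are illustrative
(memcached really keeps the value in the global settings struct and snapshots
it once per event in drive_machine()), and the PR itself is the authoritative
version of the change:

    #include <stdatomic.h>

    static _Atomic int reqs_per_event = 20;     /* the -R default */

    /* Worker side: take a snapshot once per libevent callback, then count
     * down as requests are parsed.  A slightly stale value is harmless. */
    static void service_connection_event(void)
    {
        int nreqs = atomic_load_explicit(&reqs_per_event,
                                         memory_order_relaxed);
        while (nreqs-- > 0) {
            /* Parse and answer one request; if the read buffer is empty,
             * return to the event loop early.  When nreqs runs out, the
             * connection yields so other connections get serviced. */
        }
    }

    /* Settings-command side: a plain atomic store is all the writer
     * needs, since readers never hold the value across events. */
    void set_reqs_per_event(int n)
    {
        atomic_store_explicit(&reqs_per_event, n, memory_order_relaxed);
    }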
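One more note on dormando's point near the top of the thread about reducing
syscalls by batching responses: I assume the win comes from something shaped
like the sketch below, where response buffers are staged per connection and
flushed with one writev() per event instead of one write() per packet. This
is purely an illustration of the idea, not a description of what the linked
pull requests actually implement.

    #include <sys/uio.h>
    #include <unistd.h>

    #define MAX_PENDING 64

    struct resp_queue {
        struct iovec iov[MAX_PENDING];
        int          used;
    };

    /* Stage a response instead of writing it immediately. */
    static int queue_response(struct resp_queue *q, void *buf, size_t len)
    {
        if (q->used == MAX_PENDING)
            return -1;                 /* caller should flush first */
        q->iov[q->used].iov_base = buf;
        q->iov[q->used].iov_len  = len;
        q->used++;
        return 0;
    }

    /* One syscall covers every queued response.  A real implementation
     * also has to handle short writes and EAGAIN. */
    static ssize_t flush_responses(int fd, struct resp_queue *q)
    {
        ssize_t n = writev(fd, q->iov, q->used);
        q->used = 0;
        return n;
    }

Even this rough shape turns N write() calls per batch into one, which is
where the CPU savings dormando measured presumably come from.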
