On Jun 21, 2008, at 11:35, Daniel wrote:

- Overcomplicated. See dustin's rails examples for easy abstractions on
getting to the 90% mark.

You are absolutely correct...  But wouldn't it be worth making a
memcached/database combo that's overcomplicated if we could get a
performance boost for every app that uses it just like it uses a
database?

It just feels like the wrong level. If you're just caching database results, you're not going to be getting the best usage out of your app. I mean, you're not likely to be doing all that much better than what any other query cache does today.

If you're caching the objects you build from the results of a query, you can do a *lot* better. You can start doing things like simply not joining tables, not doing N+1 queries, etc...
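To make the difference concrete, here is a minimal sketch of caching a whole built object graph under one key, so one cache round trip replaces the join / N+1 query pattern. The `FakeCache` class and the post/comment shape are made up for illustration; a real app would use a memcached client with the same get/set style.

```ruby
require 'json'

# Stands in for a memcached client's get/set interface.
class FakeCache
  def initialize; @store = {}; end
  def get(key); @store[key]; end
  def set(key, value); @store[key] = value; end
end

CACHE = FakeCache.new

# Imagine this costs 1 + N queries: one for the post, one per comment.
def load_post_graph_from_db(post_id)
  { 'id' => post_id, 'title' => "Post #{post_id}",
    'comments' => [{ 'body' => 'first!' }, { 'body' => 'nice post' }] }
end

# One cache lookup returns the entire graph, already assembled.
def fetch_post_graph(post_id)
  key = "post_graph:#{post_id}"
  if (cached = CACHE.get(key))
    JSON.parse(cached)
  else
    graph = load_post_graph_from_db(post_id)
    CACHE.set(key, JSON.generate(graph))
    graph
  end
end
```

The point is that the cached value is the finished object graph, not the raw rows, so the second request never touches the database at all.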

I had an app that, under normal circumstances, needed five or so DB queries (optimal) to gather all of the information required to build the objects necessary to process a request; with caching, it used exactly one cache lookup to do the same. It would complete a transaction with exactly one cache write and one DB transaction spanning some large number of queries (all performed asynchronously to the client request -- in this particular app, there was no value in making the client wait for positive confirmation on DB writes).

No matter how big or optimal your DB is, a single round trip for a graph of data from one of many memcached instances will be faster than multiple queries on a more centralized ACID database.

Even with a full fledged database cluster doing the work, the cluster
ends up running more slowly because it has to handle all of the extra
database needs, replication, journaling, failover, version control,
blocking, etc etc.

I thought you were advocating MVCC in memcached earlier? I don't think you can reach the levels of guarantee you're asking for over DB transactions without something like MVCC and 2pc. If you start adding these types of things in, you're just making another database, and it'll be as slow as any of them.

FYI, Oracle's product, TimesTen, seems to have some performance metrics
that I believe apply...  On page 44 they report one financial
trading system seeing an order of magnitude performance increase:

Well, yes, databases that don't have to write to filesystems are faster. Caches that don't try to be databases are faster than those.

There are plenty of open source in-memory databases available today (e.g. sqlite or mysql).


http://www.oracle.com/technology/products/timesten/pdf/oow2007/oow07_s291347_timesten_caching_use_cases.pdf

I just thought of another way of describing it. The CDDs are like
Reader Databases, while the core database handles all writing.

I think many of us build our applications this way, but deliberately. If it were to happen automatically, it'd be at a cost of performance.

Yes, that is a better memcached-specific application design. What I'm
suggesting will be significantly slower than an app designed
specifically for memcached. That's why I believe it seems worthwhile to
include the current memcached functionality in the CDD.

I think the best way to approach this is by building an architecture that fits it well. Turns out, that's pretty hard, and then nobody wants to use it.

I get what you're saying. You want to make something everyone can use, but doesn't do all that much. It kind of sounds like mysql's query cache. I don't know how it'd be better than that. I've not heard anything particularly wonderful about it.

I've written some activerecord extensions that can do automatic caching and invalidation of objects by relationship. This lives within the ORM because it can act on real live objects and has a deep understanding of when things change and knows what to do about it. Although it's in its infancy stage, it basically works. However, getting it to do all of the stuff it should/could would require a *lot* of work and could very well make things slower. Sometimes it's just better to say what you mean.
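For flavor, here is a hypothetical sketch of what invalidation-by-relationship might look like: when a child record is saved, the cached copy of its parent is dropped. A real version would hook ActiveRecord's after_save/after_destroy callbacks; the class and key names here are made up.

```ruby
STORE = {}  # stands in for memcached

class Comment
  attr_reader :post_id, :body

  def initialize(post_id, body)
    @post_id, @body = post_id, body
  end

  # Stand-in for an after_save callback on the child model.
  def save
    # ... write the row to the DB (elided) ...
    # The parent's cached representation is now stale, so drop it.
    STORE.delete("post:#{@post_id}")
  end
end
```

The hard part, as noted above, is everything this sketch doesn't cover: derived values, fragments of pages, and objects embedded in other objects.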

I don't get it...  Here in "memcached land" we're dealing with
situations where NOT warming up the cache before going live can
make sites blow up, yet people are saying/thinking that a generic
memcached/database combination isn't worth the trouble.

The way I've taken care of this kind of situation in the past is to rate control going live. e.g. if I'm down, that's generally bad. If I'm up, that's great, but I get a huge traffic spike at the worst time. So I just have a probabilistic request acceptance filter that goes from accepting 0% of requests to 100% of requests over n minutes. Adding ceilings and different slopes on various request types can help, too.

So you're not down, but you're also not completely up for a period of time. Doing this makes the warming pretty much take care of itself.
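The filter described above can be sketched in a few lines. The linear slope and the function names are assumptions; a real deployment would also add the per-request-type ceilings mentioned.

```ruby
# Accept 0% of requests at start-up, ramping linearly to 100%
# over RAMP_SECONDS, so the cache warms under gradually rising load.
RAMP_SECONDS = 300.0

def accept_request?(started_at, now = Time.now, rng = Random.new)
  fraction = (now - started_at) / RAMP_SECONDS
  fraction = 1.0 if fraction > 1.0
  rng.rand < fraction  # rand is in [0, 1), so 0.0 rejects everything
end
```

Rejected requests would get a cheap "try again" response, which is far better than the whole site falling over from a cold cache.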

Memcached is great as it is, but it sometimes returns old data.

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

The
database knows when this data changes. Why can't we develop a system
where the database alerts/updates the cache when data
changes?

My ORM knows when data changes, and already has a *really* easy way to perform cache invalidations. However, I still can't always get everything. There are lots of things that are cached as derived values -- pages, parts of pages, objects that contain other objects, etc... The application has far, far better insight into all of the places a given piece of information is used, but it'll be a long time before it can automatically track all of them.

We're all familiar with those gnarly queries that need to be run
repeatedly. I believe it's possible to create a system where the query
is kept up to date by the Caching Database Daemons (CDDs), adding only a
very small overhead to the core database if there are enough CDDs to
keep the underlying data in cache memory.

We do that in our apps pretty easily today, except it scales horizontally by not requiring some centralized service to do it. One of the front-ends changes something, and updates one of the memcached servers.
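A sketch of that pattern: after committing a change, the front-end pushes the fresh value to whichever memcached server owns the key. The modulo-on-CRC32 server selection below is the classic simple scheme (real clients often use consistent hashing), and `FakeServer` stands in for a memcached connection.

```ruby
require 'zlib'

# Stands in for a connection to one memcached instance.
class FakeServer
  def initialize; @store = {}; end
  def set(key, value); @store[key] = value; end
  def get(key); @store[key]; end
end

SERVERS = Array.new(3) { FakeServer.new }

# Hash the key to pick the owning server -- no central coordinator.
def server_for(key)
  SERVERS[Zlib.crc32(key) % SERVERS.length]
end

def write_through(key, value)
  # ... commit the change to the DB first (elided) ...
  server_for(key).set(key, value)  # readers never see the stale copy
end
```

Since every front-end hashes the same key to the same server, the update scales horizontally with no centralized service in the loop.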

When the application wants the results, its CDD simply sends a request to all of the other CDDs and sorts/limits/dedupes the results. What a
cool performance boost!


That assumes it can read the data I've cached (or, in this case, it's caching for its own benefit). That application seems really specialized. With as little code as I generally write to get caching working in apps, I don't see how something like that would be less work.

--
Dustin Sallings
