On Jun 21, 2008, at 11:35, Daniel wrote:

- Overcomplicated. See dustin's rails examples for easy abstractions on
getting to the 90% mark.

You are absolutely correct...  But wouldn't it be worth making a
memcached/database combo that's overcomplicated if we could get a
performance boost for every app that uses it just like it uses a
database?

It just feels like the wrong level. If you're just caching database results, you're not going to be getting the best usage out of your app. I mean, you're not likely to be doing all that much better than what any other query cache does today.

If you're caching the objects you build from the results of a query, you can do a *lot* better. You can start doing things like simply not joining tables, not doing N+1 queries, etc...
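To make the difference concrete, here is a minimal sketch of caching a whole built object graph under one key, so one cache round trip replaces the join / N+1 query pattern. The `FakeCache` class and the post/comment shape are made up for illustration; a real app would use a memcached client with the same get/set style.

```ruby
require 'json'

# Stands in for a memcached client's get/set interface.
class FakeCache
  def initialize; @store = {}; end
  def get(key); @store[key]; end
  def set(key, value); @store[key] = value; end
end

CACHE = FakeCache.new

# Imagine this costs 1 + N queries: one for the post, one per comment.
def load_post_graph_from_db(post_id)
  { 'id' => post_id, 'title' => "Post #{post_id}",
    'comments' => [{ 'body' => 'first!' }, { 'body' => 'nice post' }] }
end

# One cache lookup returns the entire graph, already assembled.
def fetch_post_graph(post_id)
  key = "post_graph:#{post_id}"
  if (cached = CACHE.get(key))
    JSON.parse(cached)
  else
    graph = load_post_graph_from_db(post_id)
    CACHE.set(key, JSON.generate(graph))
    graph
  end
end
```

The point is that the cached value is the finished object graph, not the raw rows, so the second request never touches the database at all.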

I had an app that, under normal circumstances, needed five or so DB queries (optimal) to gather all of the information required to build the objects necessary to process a request; with caching, it used exactly one cache lookup to do the same. It would complete a transaction with exactly one cache write and one DB transaction spanning some large number of queries (all performed asynchronously to the client request -- in this particular app, there was no value in making the client wait for positive confirmation on DB writes).

No matter how big or optimal your DB is, a single round trip for a graph of data from one of many memcached instances will be faster than multiple queries on a more centralized ACID database.

Even with a full fledged database cluster doing the work, the cluster
ends up running more slowly because it has to handle all of the extra
database needs, replication, journaling, failover, version control,
blocking, etc etc.

I thought you were advocating MVCC in memcached earlier? I don't think you can reach the levels of guarantee you're asking for over DB transactions without something like MVCC and 2pc. If you start adding these types of things in, you're just making another database, and it'll be as slow as any of them.

FYI, Oracle's product, TimesTen, seems to have some performance metrics
that I believe apply...  On page 44 they report one financial
trading system seeing an order of magnitude performance increase:

Well, yes, databases that don't have to write to filesystems are faster. Caches that don't try to be databases are faster than those.

There are plenty of open source in-memory databases available today (e.g. sqlite or mysql).


http://www.oracle.com/technology/products/timesten/pdf/oow2007/oow07_s291347_timesten_caching_use_cases.pdf

I just thought of another way of describing it. The CDDs are like
Reader Databases, while the core database handles all writing.

I think many of us build our applications this way, but deliberately. If it were to happen automatically, it'd be at a cost of performance.

Yes, that is a better memcached-specific application design. What I'm
suggesting will be significantly slower than an app designed
specifically for memcached. That's why I believe it seems worthwhile to
include the current memcached functionality in the CDD.

I think the best way to approach this is by building an architecture that fits it well. Turns out, that's pretty hard, and then nobody wants to use it.

I get what you're saying. You want to make something everyone can use, but doesn't do all that much. It kind of sounds like mysql's query cache. I don't know how it'd be better than that. I've not heard anything particularly wonderful about it.

I've written some activerecord extensions that can do automatic caching and invalidation of objects by relationship. This lives within the ORM because it can act on real live objects and has a deep understanding of when things change and knows what to do about it. Although it's in its infancy stage, it basically works. However, getting it to do all of the stuff it should/could would require a *lot* of work and could very well make things slower. Sometimes it's just better to say what you mean.
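For flavor, here is a hypothetical sketch of what invalidation-by-relationship might look like: when a child record is saved, the cached copy of its parent is dropped. A real version would hook ActiveRecord's after_save/after_destroy callbacks; the class and key names here are made up.

```ruby
STORE = {}  # stands in for memcached

class Comment
  attr_reader :post_id, :body

  def initialize(post_id, body)
    @post_id, @body = post_id, body
  end

  # Stand-in for an after_save callback on the child model.
  def save
    # ... write the row to the DB (elided) ...
    # The parent's cached representation is now stale, so drop it.
    STORE.delete("post:#{@post_id}")
  end
end
```

The hard part, as noted above, is everything this sketch doesn't cover: derived values, fragments of pages, and objects embedded in other objects.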

I don't get it...  Here in "memcached land" we're dealing with
situations where NOT warming up the cache before going live can
make sites blow up, yet people are saying/thinking that a generic
memcached/database combination isn't worth the trouble.

The way I've taken care of this kind of situation in the past is to rate control going live. e.g. if I'm down, that's generally bad. If I'm up, that's great, but I get a huge traffic spike at the worst time. So I just have a probabilistic request acceptance filter that goes from accepting 0% of requests to 100% of requests over n minutes. Adding ceilings and different slopes on various request types can help, too.

So you're not down, but you're also not completely up for a period of time. Doing this makes the warming pretty much take care of itself.
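The filter described above can be sketched in a few lines. The linear slope and the function names are assumptions; a real deployment would also add the per-request-type ceilings mentioned.

```ruby
# Accept 0% of requests at start-up, ramping linearly to 100%
# over RAMP_SECONDS, so the cache warms under gradually rising load.
RAMP_SECONDS = 300.0

def accept_request?(started_at, now = Time.now, rng = Random.new)
  fraction = (now - started_at) / RAMP_SECONDS
  fraction = 1.0 if fraction > 1.0
  rng.rand < fraction  # rand is in [0, 1), so 0.0 rejects everything
end
```

Rejected requests would get a cheap "try again" response, which is far better than the whole site falling over from a cold cache.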

Memcached is great as it is, but it sometimes returns old data.

"There are only two hard things in Computer Science: cache invalidation and naming things." -- Phil Karlton

The
database knows when this data changes. Why can't we develop a system
where the database alerts/updates the cache when data
changes?

My ORM knows when data changes, and already has a *really* easy way to perform cache invalidations. However, I still can't always get everything. There are lots of things that are cached as derived values -- pages, parts of pages, objects that contain other objects, etc... The application has far, far better insight into all of the places a given piece of information is used, but it'll be a long time before it can automatically track all of them.

We're all familiar with those gnarly queries that need to be run
repeatedly. I believe it's possible to create a system where the query
is kept up to date by the Caching Database Daemons (CDDs), adding only a
very small overhead to the core database if there are enough CDDs to
keep the underlying data in cache memory.

We do that in our apps pretty easily today, except it scales horizontally by not requiring some centralized service to do it. One of the front-ends changes something, and updates one of the memcached servers.
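A sketch of that pattern: after committing a change, the front-end pushes the fresh value to whichever memcached server owns the key. The modulo-on-CRC32 server selection below is the classic simple scheme (real clients often use consistent hashing), and `FakeServer` stands in for a memcached connection.

```ruby
require 'zlib'

# Stands in for a connection to one memcached instance.
class FakeServer
  def initialize; @store = {}; end
  def set(key, value); @store[key] = value; end
  def get(key); @store[key]; end
end

SERVERS = Array.new(3) { FakeServer.new }

# Hash the key to pick the owning server -- no central coordinator.
def server_for(key)
  SERVERS[Zlib.crc32(key) % SERVERS.length]
end

def write_through(key, value)
  # ... commit the change to the DB first (elided) ...
  server_for(key).set(key, value)  # readers never see the stale copy
end
```

Since every front-end hashes the same key to the same server, the update scales horizontally with no centralized service in the loop.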

When the application wants the results, its CDD simply sends a request to all of the other CDDs and sorts/limits/dedupes the results. What a
cool performance boost!


That assumes it can read the data I've cached (or, in this case, it's caching for its own benefit). That application seems really specialized. With as little code as I generally write to get caching working in apps, I don't see how something like that would be less work.

--
Dustin Sallings
