On 17 Feb, 13:24, Aymeric Augustin <aymeric.augus...@polytechnique.org> wrote:
> **tl;dr** I believe that persistent database connections are a good idea.
> Please prove me wrong :)
>
> --------------------
>
> Since I didn't know why the idea of adding a connection pooler to Django was
> rejected, I did some research before replying to the cx_Oracle SessionPooling
> thread.
>
> The best explanation I've found is from Russell:
>
> > To clarify -- we've historically been opposed to adding connection
> > pooling to Django for the same reason that we don't include a web
> > server in Django -- the capability already exists in third party
> > tools, and they're in a position to do a much better job at it than us
> > because it's their sole focus. Django doesn't have to be the whole
> > stack.
>
> All the discussions boil down to this argument, and the only ticket on the
> topic is short on details: https://code.djangoproject.com/ticket/11798
>
> --------------------
>
> The connection pools for Django I've looked at replace "open a connection"
> with "take a connection from the pool" and "close a connection" with "return
> the connection to the pool". This isn't "real" connection pooling: each
> worker holds a connection for the entire duration of each request, regardless
> of whether it has an open transaction or not.
>
> This requires as many connections as workers, and thus is essentially
> equivalent to persistent database connections, except that connections can be
> rotated among workers.
>
> Persistent connections would eliminate the overhead of creating a connection
> (IIRC ~50ms/req), which is the most annoying symptom, without incurring the
> complexity of a "real" pooler.
>
> They would be a win for small and medium websites that don't manage their
> database transactions manually and where the complexity of maintaining an
> external connection pooler isn't justified.
> Besides, when Django's transaction middleware is enabled, each request is
> wrapped in a single transaction, which reserves a connection. In this case, a
> connection pooler won't perform better than persistent connections.
>
> Obviously, large websites should use an external pooler to multiplex their
> hundreds of connections from workers into tens of connections to their
> database and manage their transactions manually. I don't believe persistent
> connections to the pooler would hurt in this scenario, but if they do, this
> could be optional.
>
> --------------------
>
> AFAICT there are three things to take care of before reusing a connection:
>
> 1) restore a pristine transaction state: transaction.rollback() should do;
>
> 2) reset all connection settings: the foundation was laid in #19274;
>
> 3) check if the connection is still alive, and re-open it otherwise:
>    - for psycopg2: "SELECT 1";
>    - for MySQL and Oracle: connection.ping().
>
> Some have argued that persistent connections tie the lifetime of database
> connections to the lifetime of workers, but it's easy to store the creation
> timestamp and re-open the connection if it exceeds a given max-age.
>
> So -- did I miss something?
I am not yet convinced that poolers implemented inside Django core are necessary. A major reason for doing #19274 was to allow somewhat easy creation of 3rd party connection poolers.

I don't see transactional connection pooling as something that forces including connection pools in Django. A transactional pooler implementation should be possible outside Django: on every execute, check which connection to use. When inside a transaction, use the connection tied to the transaction; otherwise take a free connection from the pool and use that. The big problem is that Django's transaction handling doesn't actually know when the connection is inside a transaction. Fix this, and doing transactional poolers external to Django will be possible. (Tying the connection to transaction managed blocks could work, but then what to do for queries outside any transaction managed block?)

Instead of implementing poolers inside Django, would it be better to aim for DBWrapper subclassing (as done in #19274)? The subclassing approach has some nice properties; for example, one could implement a "rewrite to prepared statements" feature (basically, some queries would get automatically converted to use prepared statements at execution time). This setup should result in nice speedups for some use cases.

Another implementation idea is to have the DB settings contain a 'POOLER' entry. By default this entry is empty, but when defined it points to a class that has a (very limited) pooler API: lend_connection(), release_connection() and close_all_connections() (the last one is needed for test teardown). And then the connections themselves could have .reset(), .ping() and so on. This is simple and should also be extensible.

It seems SQLAlchemy has a mature pooling implementation. So, yet another approach is to see if SQLAlchemy's pooling implementation could be reused (maybe in conjunction with the above 'POOLER' idea).

I also do believe that persistent database connections are a good idea.
I don't yet believe the implementation must be in Django core...

 - Anssi