Re: Database pooling vs. persistent connections

Anssi Kääriäinen Tue, 19 Feb 2013 02:04:33 -0800

On 19 Feb, 02:31, Carl Meyer <c...@oddbird.net> wrote:
> On 02/18/2013 02:27 PM, Aymeric Augustin wrote:
>
> > Problem #1: Is it worth re-executing the connection setup at the beginning
> > of every request?
>
> > The connection setup varies widely between backends:
> > - SQLite: none
> > - PostgreSQL: https://github.com/django/django/blob/master/django/db/backends/postg...
> > - MySQL: https://github.com/django/django/blob/master/django/db/backends/mysql...
> > - Oracle: https://github.com/django/django/blob/master/django/db/backends/oracl...
>
> > The current version of the patch repeats it every time. In theory, this
> > isn't necessary. Doing it only once would be simpler.
>
> > It could be backwards incompatible, for instance, if a developer changes the
> > connection's time zone. But we can document that setting CONN_MAX_AGE = 0
> > restores the previous behavior in such cases.
>
> It seems silly to re-run queries per-request that were only ever
> intended to be per-connection setup. So I favor the latter. I think this
> change will require prominent discussion in the
> potentially-backwards-incompatible section of the release notes regardless.
>
> (This could form an argument for keeping the built-in default 0, and
> just setting it to a higher number in the project template. But I don't
> like polluting the default project template, and I think a majority of
> existing upgrading projects will benefit from this without problems, so
> I don't actually want to do that.)


Maybe we need another setting for what to do at request start. It does
seem somewhat likely that users run SET SEARCH_PATH in middleware, for
example to support multitenant setups, and expect that setting to go
away when the connection is closed after the request. Any other SET is a
likely candidate for problems in PostgreSQL, and I am sure other DBs
have their equivalent of session state, too. (In this case, running
RESET ALL and then running the connection setup again is the right thing
to do in PostgreSQL.)
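
A minimal sketch of that reset-then-setup sequence, to make the order of
operations concrete. The RecordingConnection class and both function names
are hypothetical stand-ins, not Django or psycopg2 API; only the SQL
statements themselves (SET search_path, RESET ALL, SET TIME ZONE) are real
PostgreSQL. The fake connection just records what would be sent.

```python
class RecordingConnection:
    """Hypothetical stand-in for a database connection; records SQL only."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def init_connection_state(conn):
    # Assumed per-connection setup; a real backend may run more than this.
    conn.execute("SET TIME ZONE 'UTC'")


def reset_for_new_request(conn):
    # Drop whatever session state the previous request left behind,
    # then restore the state the connection setup is expected to provide.
    conn.execute("RESET ALL")
    init_connection_state(conn)


conn = RecordingConnection()
init_connection_state(conn)                  # connection opened
conn.execute("SET search_path TO tenant_a")  # middleware in request 1
reset_for_new_request(conn)                  # start of request 2
```

After the reset, the search path set by the first request no longer applies,
while the setup that was meant to be per-connection is back in place.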

It would be nice to support this use case, but for now it is enough to
document this change clearly in the release notes and point out that if
you have such requirements, you should set max_age to 0. More features
can always be added later on.
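
For illustration, opting out could look like the fragment below. It assumes
the setting ends up as a per-database key named CONN_MAX_AGE, as in the
quoted proposal; the engine path and database name are placeholders.

```python
# settings.py fragment (sketch). Only CONN_MAX_AGE matters here.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # path varies by Django version
        "NAME": "example_db",                       # hypothetical database name
        "CONN_MAX_AGE": 0,  # 0 = close the connection at the end of each request
    }
}
```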

> > Problem #2: How can Django handle situations where the connection to the
> > database is lost?
>
> > Currently, with MySQL, Django pings the database server every time it
> > creates a cursor, and reconnects if that fails. This isn't a good practice
> > and this behavior should be removed: https://code.djangoproject.com/ticket/15119
>
> > Other backends don't have an equivalent behavior. If a connection was
> > opened, Django assumes it works. Worse, if the connection breaks, Django
> > fails to close it, and keeps the broken connection instead of opening a
> > new one: https://code.djangoproject.com/ticket/15802
>
> > Thus, persistent connections can't make things worse :) but it'd be nice to
> > improve Django in this area, consistently across all backends.
>
> > I have considered four possibilities:
>
> > (1) Do nothing. At worst, the connection will be closed after max_age and
> >     then reopened. The worker will return 500s in the meantime. This is the
> >     current implementation.
>
> > (2) "Ping" the connection at the beginning of each request, and if it
> >     doesn't work, re-open it. As explained above, this isn't a good practice.
> >     Note that if Django repeats the connection setup before each request, it
> >     can take this opportunity to check that the connection works and
> >     reconnect otherwise. But I'm not convinced I should keep this behavior.
>
> > (3) Catch database exceptions in all appropriate places, and if the
> >     exception says that the connection is broken, reconnect. In theory this
> >     is the best solution, but it's complicated to implement. I haven't found
> >     a conclusive way to identify error conditions that warrant a
> >     reconnection.
>
> > (4) Close all database connections when a request returns a 500. It's a bad
> >     idea because it ties the HTTP and database layers. It could also hide
> >     problems.
>
> I'd be inclined to go for (1), with the intent of moving gradually
> towards (3) as specific detectable error conditions that happen in real
> life and do warrant closing the connection and opening a new one are
> brought to our attention. Unfortunately handling those cases is likely
> to require parsing error messages, as pretty much anything related to
> DBAPI drivers and error conditions does :/
>
> I tried to dig for the origins of the current MySQL behavior to see if
> that would illuminate such a case, but that code goes way back into the
> mists of ancient history (specifically, merger of the magic-removal
> branch), beyond which the gaze of "git annotate" cannot penetrate.
>
> Option (4) is very bad IMO, and (2) is not much better.

I hope this discussion is about what to do at request finish/start
time.

I am very strongly opposed to anything where Django suddenly changes
connections underneath you. At request finish/start this is OK (you
should expect new connections then anyway), but otherwise, if you get a
broken connection, it isn't Django's business to swap the connection
underneath you. There is a reasonable expectation that while you are
using a single connections[alias], in a script for example, the
underlying connection stays the same the whole time. Otherwise SET
somevar in PostgreSQL could break, for example.
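
To illustrate why a silent swap is dangerous, here is a small runnable
analogy. It uses a SQLite TEMP table as a stand-in for PostgreSQL session
state such as SET somevar (an assumption made only so the example runs with
the standard library): both belong to one connection and vanish when a
different connection takes its place.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

first = sqlite3.connect(path)
first.execute("CREATE TABLE shared (id INTEGER)")      # stored in the database file
first.execute("CREATE TEMP TABLE tenant (name TEXT)")  # session state on this connection
first.execute("INSERT INTO tenant VALUES ('acme')")
first.commit()

# Simulate a swap: same database, different underlying connection.
second = sqlite3.connect(path)
shared_rows = second.execute("SELECT count(*) FROM shared").fetchone()[0]
try:
    second.execute("SELECT name FROM tenant")
    session_state_survived = True
except sqlite3.OperationalError:  # the TEMP table does not exist here
    session_state_survived = False

first.close()
second.close()
```

The shared table is still visible through the second connection, but the
per-connection state is gone, and the calling code gets no warning unless it
happens to touch that state.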

Now, one could argue that SET somevar should not be used with
Django's connections. But this is very, very limiting for some
real-world use cases (multitenant and SET SEARCH_PATH for one). There is
no way to actually enforce such a limitation, so the limitation would be
documentation only. In addition, the result is that very rarely you get
a weird (potentially data-corrupting) problem because your connection
was swapped at the wrong moment. That is nearly impossible to debug
(especially if the swap is not logged either).

If connection swapping is still wanted, then there must at least
be a way to tell Django NOT to swap connections unless told to do
so.

I think a good approach would be to mark the connection as potentially
broken on errors in queries, and then in request_finished check for
this potentially-broken flag. If the flag is set, then and only then run
ping() / SELECT 1. So, this is a slight modification of no. 3 where
one can mark the connection potentially broken liberally, but the
connection is swapped only when the ping fails, and only in
request_finished. For most requests there should be no overhead, as
errors in queries are rare.
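
A rough sketch of that flag-then-ping idea, using sqlite3 so it runs
standalone. TrackedConnection, maybe_broken and on_request_finished are
hypothetical names for illustration, not Django's DatabaseWrapper API; in
Django the last method would presumably be hooked to the request_finished
signal.

```python
import sqlite3


class TrackedConnection:
    """Hypothetical wrapper around a raw DB-API connection."""

    def __init__(self, connect):
        self._connect = connect    # factory that opens a raw connection
        self.raw = connect()
        self.maybe_broken = False  # set liberally on any query error

    def execute(self, sql, params=()):
        try:
            return self.raw.execute(sql, params).fetchall()
        except sqlite3.Error:
            # Never swap here; only remember that the connection is suspect.
            self.maybe_broken = True
            raise

    def on_request_finished(self):
        """Ping only if flagged; reconnect only if the ping fails.

        Returns True when the underlying connection was replaced.
        """
        if not self.maybe_broken:
            return False           # common case: no extra query at all
        self.maybe_broken = False
        try:
            self.raw.execute("SELECT 1")
            return False           # ordinary query error; keep the connection
        except sqlite3.Error:
            self.raw = self._connect()  # swap only between requests
            return True


conn = TrackedConnection(lambda: sqlite3.connect(":memory:"))
```

A failing query such as a SELECT against a missing table sets the flag but
leaves conn.raw untouched; only a failed ping in on_request_finished()
replaces it, so code running inside a request or script never sees the
connection change underneath it.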

BTW, about the remark in Aymeric's post above that persistent connections
can't make things worse: I don't believe this. Persistent connections
will keep the broken connection from request to request, whereas
currently, at least on PostgreSQL, a broken connection is correctly
closed at request finish.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
