Re: Configurable safety options for high performance Django systems

2014-11-25 Thread Florian Apolloner
On Monday, November 24, 2014 10:05:47 PM UTC+1, Rick van Hattem wrote: > > My goal was simply to move the Django project forward but it seems the > problems I've encountered in the field are too uncommon for most other > developers to care or understand. > Oh, I can assure you that we care and

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Rick van Hattem
Thanks for the help but writing the custom database backend won't be a problem, I've written one before :) My goal was simply to move the Django project forward but it seems the problems I've encountered in the field are too uncommon for most other developers to care or understand. Thank you all

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Michael Manfre
On Mon, Nov 24, 2014 at 3:32 PM, Christophe Pettus wrote: > In your particular case, where you have the relatively unusual situation > that: > > 1. You have this problem, and, > 2. You can't fix the code to solve this problem. > > ... you probably have the right answer in having a local patch for

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Christophe Pettus
On Nov 24, 2014, at 11:16 AM, Rick van Hattem wrote: > It seems you are misunderstanding what I am trying to do here. The 10,000 (or > whatever, that should be configurable) is a number large enough not to bother > anyone but small enough not to trigger the OOM system. There are really only f

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Rick van Hattem
It seems you are misunderstanding what I am trying to do here. The 10,000 (or whatever, that should be configurable) is a number large enough not to bother anyone but small enough not to trigger the OOM system. In that case it works perfectly. Do note that the proposed solution is not a hypotheti

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Christophe Pettus
On Nov 24, 2014, at 3:36 AM, Rick van Hattem wrote: > If you fetch N+1 items you know if there are over N items in your list. Let's stop there. Unfortunately, because of the way libpq works, just sending the query and checking the result set size won't solve your problem, except for an even s

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Rick van Hattem
Hi Christophe, As I previously explained, there's no need for a round trip or different transaction levels. If you fetch N+1 items you know if there are over N items in your list. So with that in mind you can simply do a query with a limit of N+1 to know if your query returns over N items. No ne
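The "fetch N+1" idea from this message can be sketched outside Django. This is a minimal illustration only, assuming SQLite from the Python standard library in place of PostgreSQL; `fetch_guarded`, `TooManyRows` and the `item` table are hypothetical names and not part of Django or of the patch discussed in the thread. Because the limit is applied in the SQL itself, the database never sends more than N+1 rows to the client.

```python
import sqlite3


class TooManyRows(Exception):
    """Raised when a query returns more rows than the configured limit."""


def fetch_guarded(conn, sql, params=(), max_rows=10_000):
    """Run `sql` but refuse to return more than `max_rows` rows.

    The query is wrapped so that at most max_rows + 1 rows are fetched.
    The extra row is only a sentinel: if it shows up, the result set is
    larger than the limit and an error is raised instead of returning data.
    """
    cursor = conn.execute(
        f"SELECT * FROM ({sql}) LIMIT ?",
        (*params, max_rows + 1),
    )
    rows = cursor.fetchall()
    if len(rows) > max_rows:
        raise TooManyRows(f"query returned more than {max_rows} rows")
    return rows


# Hypothetical demo data: a table with 50 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(i,) for i in range(1, 51)])
```

With this data, `fetch_guarded(conn, "SELECT id FROM item", max_rows=50)` returns all 50 rows, while `max_rows=49` raises `TooManyRows`. Whether the same guard is cheap enough inside an ORM is exactly what the replies in this thread dispute.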

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Christophe Pettus
On Nov 24, 2014, at 1:08 AM, Rick van Hattem wrote: > Indeed, except it's not an "except: pass" but an "except: raise" which I'm > proposing. Which makes a world of difference. Well, as previously noted, this option would introduce another round-trip into every database if it's actually going

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Rick van Hattem
On 23 November 2014 at 22:57, Christophe Pettus wrote: > > On Nov 23, 2014, at 1:53 PM, Rick van Hattem wrote: > > > Very true, that's a fair point. That's why I'm opting for a configurable > option. Patching this within Django has saved me in quite a few cases but > it can have drawbacks. > > A

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Rick van Hattem
If that is an option then it's definitely a better location to set limits to prevent the server from going down. It does nothing to help with debugging though, which is the primary reason for patching the ORM. And in addition to that, quite a few customers won't let you change the hosting se

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Mattias Linnap
Since your use case seems to be "avoid blocking the server when one pageview uses unexpectedly many resources", and you don't want to fix or optimise the actual application code, why not set these limits at a level where it makes sense? For example: * http://uwsgi-docs.readthedocs.org/en/latest

Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Christophe Pettus
On Nov 23, 2014, at 1:53 PM, Rick van Hattem wrote: > Very true, that's a fair point. That's why I'm opting for a configurable > option. Patching this within Django has saved me in quite a few cases but it > can have drawbacks. As a DB guy, I have to say that if an application is sending a qu

Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Rick van Hattem
On 23 Nov 2014 22:13, "Christophe Pettus" wrote: > > > On Nov 23, 2014, at 1:07 PM, Rick van Hattem wrote: > > > > Not really, cause psycopg already fetched everything. > > > > Not if Django limits it by default :) > > Unfortunately, that's not how it works. There are three things that take up m

Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Christophe Pettus
On Nov 23, 2014, at 1:07 PM, Rick van Hattem wrote: > > Not really, cause psycopg already fetched everything. > > Not if Django limits it by default :) Unfortunately, that's not how it works. There are three things that take up memory as the result of a query result: 1. The Django objects.

Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Rick van Hattem
Hi Florian, On 23 Nov 2014 16:22, "Florian Apolloner" wrote: > > Hi Rick, > > > On Sunday, November 23, 2014 1:11:13 PM UTC+1, Rick van Hattem wrote: >> >> If/when an unsliced queryset were to reach a certain limit (say, 10,000, but configurable) the system would raise an error. > > > Django can'

Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Marco Paolini
2014-11-23 13:27 GMT+01:00 Shai Berger : > Hi Rick, > > On Sunday 23 November 2014 14:11:13 Rick van Hattem wrote: > > > > So please, can anyone give a good argument as to why any sane person > would > > have a problem with a huge default limit which will kill the performance > of > > your site an

Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Florian Apolloner
Hi Rick, On Sunday, November 23, 2014 1:11:13 PM UTC+1, Rick van Hattem wrote: > If/when an *unsliced* queryset were to reach a certain limit (say, > 10,000, but configurable) the system would raise an error. > Django can't know if that would be the case without issuing an extra query -- and e

Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Shai Berger
Hi Rick, On Sunday 23 November 2014 14:11:13 Rick van Hattem wrote: > > So please, can anyone give a good argument as to why any sane person would > have a problem with a huge default limit which will kill the performance of > your site anyhow but isn't enough to kill the entire system? > ...bec

Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Rick van Hattem
Hi Mattias, Can you comment on the example I've given where this should not cause any problems and should help pretty much everyone (but just in case, the setting could be *optional*)? If/when an *unsliced* queryset were to reach a certain limit (say, 10,000, but configurable) the system would

Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Mattias Linnap
I thought I'd chime in since I run a Django app with fairly large tables (1M, 30M, 120M rows) on a fairly poor VPS (4GB ram, 2 cores, database and all services on one machine). I just don't see a situation where "safety limits on ORM queries" would improve things. Changing behaviour based on th

Re: Configurable safety options for high performance Django systems

2014-11-20 Thread Rick van Hattem
On Thursday, November 20, 2014 8:31:06 AM UTC+1, Christian Schmitt wrote: > > Nope. A large OFFSET of N will read through N rows, regardless of index >> coverage. See >> http://www.postgresql.org/docs/9.1/static/queries-limit.html >> > > That's simply not true. > If you define an ORDER BY with a well

Re: Configurable safety options for high performance Django systems

2014-11-20 Thread Marco Paolini
2014-11-20 8:30 GMT+01:00 Schmitt, Christian: > Nope. A large OFFSET of N will read through N rows, regardless of index >> coverage. See >> http://www.postgresql.org/docs/9.1/static/queries-limit.html >> > > That's simply not true. > If you define an ORDER BY on a well-indexed query, the database w

Re: Configurable safety options for high performance Django systems

2014-11-19 Thread Schmitt, Christian
> > Nope. A large OFFSET of N will read through N rows, regardless of index > coverage. See http://www.postgresql.org/docs/9.1/static/queries-limit.html > That's simply not true. If you define an ORDER BY on a well-indexed query, the database will only do a bitmap scan. This wiki isn't well explaine

Re: Configurable safety options for high performance Django systems

2014-11-19 Thread Marco Paolini
2014-11-19 14:49 GMT+01:00 Schmitt, Christian: > A sequential scan will only be made if you query non-indexed values. > So if you add a simple ORDER BY you will get an index scan, which is very > fast. > Nope. A large OFFSET of N will read through N rows, regardless of index coverage. See http://www

Re: Configurable safety options for high performance Django systems

2014-11-19 Thread Schmitt, Christian
A sequential scan will only be made if you query non-indexed values. So if you add a simple ORDER BY you will get an index scan, which is very fast. The problem lies more with the database than with the ORM. As already said. If you need to deal with that many queries you need to log your SQL statement

Re: Configurable safety options for high performance Django systems

2014-11-19 Thread Marco Paolini
Also, the offset + limit pagination strategy of the Django paginator is sub-optimal, as its cost grows linearly with the offset N: doing SELECT * FROM auth_user LIMIT 100 OFFSET 100 causes a 10-long table scan 2014-11-19 13:56 GMT+01:00 Marco Paolini : > > > 2014-11-19 13:50 GMT+01:00 Rick van Hattem : > >> Definit
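For comparison with the offset + limit strategy criticised here, the sketch below shows keyset pagination ("seek" on the last seen primary key). This technique is not proposed in the thread as quoted; it is swapped in as a commonly used alternative. The example assumes SQLite from the Python standard library, and the function names and the simplified `auth_user` schema are hypothetical.

```python
import sqlite3

# Hypothetical, simplified stand-in for the auth_user table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO auth_user (id, username) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 1001)],
)


def page_by_offset(conn, page_size, offset):
    """Offset pagination: the skipped rows still have to be walked past."""
    return conn.execute(
        "SELECT id, username FROM auth_user ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


def page_by_keyset(conn, page_size, last_id=0):
    """Keyset pagination: seek directly to the rows after the last seen id."""
    return conn.execute(
        "SELECT id, username FROM auth_user WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()
```

Both functions return the same page when `last_id` is the id of the last row of the previous page, for example `page_by_offset(conn, 100, 100)` and `page_by_keyset(conn, 100, last_id=100)`. The keyset variant needs a stable, indexed sort key and cannot jump to an arbitrary page number, which is the trade-off against the paginator behaviour described above.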

Re: Configurable safety options for high performance Django systems

2014-11-19 Thread Marco Paolini
2014-11-19 13:50 GMT+01:00 Rick van Hattem : > Definitely agree on this, silently altering a query's limit is probably > not the way to go. Raising an exception in case of no limit and lots of > results could be useful. > > For the sake of keeping the discussion useful: > - Let's say you have a ta

Re: Configurable safety options for high performance Django systems

2014-11-19 Thread Rick van Hattem
Definitely agree on this, silently altering a query's limit is probably not the way to go. Raising an exception in case of no limit and lots of results could be useful. For the sake of keeping the discussion useful: - Let's say you have a table with 50,000 items, not an insanely large amount im

Re: Configurable safety options for high performance Django systems

2014-11-19 Thread Rick van Hattem
Hi Carl, Truthfully some part of my reason for forking was that I was running an older version of Django which didn't have custom user models. In that case it's a bit more difficult to override the manager and I've seen quite a few external projects (accidentally) do something like "User.objects

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Carl Meyer
Hi Rick, On 11/18/2014 11:59 AM, Rick van Hattem wrote: [snip] > In all but the most basic Django projects I've seen problems like these. > Sane defaults won't hurt anyone and solves issues for people with larger > systems. And running forks of Django seems counter productive as well. As a side n

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Michael Manfre
On Tue, Nov 18, 2014 at 7:42 PM, Josh Smeaton wrote: > To me, "sane default" means django should not silently alter the query to > provide a LIMIT when it is not asked for. > > I have also run into situations where doing a .count() or iterating a full > table has broken the application, or put to

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Josh Smeaton
To me, "sane default" means django should not silently alter the query to provide a LIMIT when it is not asked for. I have also run into situations where doing a .count() or iterating a full table has broken the application, or put too much pressure on the database. Specifically with django bin

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Rick van Hattem
The 100M is an example of a really big problem, but the problem also exists on a much smaller scale. Even doing "if some_queryset" with 10,000 items can get pretty slow, just because it doesn't kill the server in all cases doesn't make it a good thing to ignore. In all but the most basic Django p
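The cost of a truth test such as "if some_queryset" comes from loading every row just to learn whether at least one exists. In Django, truth-testing a queryset evaluates it, whereas `QuerySet.exists()` issues a cheaper query. The sketch below illustrates the same contrast with plain SQL, assuming SQLite from the Python standard library; the function names and the `item` table are hypothetical.

```python
import sqlite3

# Hypothetical demo table with 10,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO item (id, payload) VALUES (?, ?)",
    [(i, "x" * 100) for i in range(1, 10_001)],
)


def has_rows_by_loading(conn):
    """Fetch every row into memory, then test whether the list is non-empty."""
    return bool(conn.execute("SELECT * FROM item").fetchall())


def has_rows_by_probe(conn):
    """Ask the database for at most one row and test whether one came back."""
    return conn.execute("SELECT 1 FROM item LIMIT 1").fetchone() is not None
```

Both functions give the same answer, but the probe transfers at most one small row regardless of table size.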

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Schmitt, Christian
Sorry, but I don't think this is suitable. If somebody has 100M rows per table, then he should probably think about sharding/replication anyway. So the ORM would still suffer anyway. Currently my company has a few tables with a high count as well but since we never used the django-admin and managed t

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Rick van Hattem
That certainly solves one part of the problem. After that I would still opt for an optional configurable default for slicing. Personally I prefer to raise an error when unsliced querysets are used since it's almost always harmful or at least dangerous behaviour. On 18 November 2014 19:18, Claude P

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Claude Paroz
On Tuesday, November 18, 2014 1:58:00 PM UTC+1, Rick van Hattem wrote: > > Hi guys, > > As it is right now Django has the tendency to kill either your browser (if > you're lucky) or the entire application server when confronted with a large > database. For example, the admin always does counts fo

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Aymeric Augustin
I've had to implement workarounds for such problems in many Django projects I created. For example, I wrote code to determine which models have too many instances to dump into an HTML dropdown and to automatically add those models to raw_id_fields in related models. The main difficulty I'm f

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Rick van Hattem
The framework is indeed generic which is why it should also take scenarios into account where there is a lot more data. There is obviously no one-size-fits-all solution which is why I propose configurable options. The needed patches within the Django core are small and have very limited impact

Re: Configurable safety options for high performance Django systems

2014-11-18 Thread Michael Manfre
Django is a generic framework. Those who use it to implement functionality are the ones who kill either the browser or their entire application. Applications must be written to meet functional and performance requirements of the project. When dealing with large tables, more effort is certainly need

Configurable safety options for high performance Django systems

2014-11-18 Thread Rick van Hattem
Hi guys, As it is right now Django has the tendency to kill either your browser (if you're lucky) or the entire application server when confronted with a large database. For example, the admin always does counts for pagination and a count over a table with many rows (say, in the order of 100M)