Re: Configurable safety options for high performance Django systems

2014-11-23 Thread Mattias Linnap
I thought I'd chime in since I run a Django app with fairly large tables 
(1M, 30M, 120M rows) on a fairly poor VPS (4GB ram, 2 cores, database and 
all services on one machine).

I just don't see a situation where "safety limits on ORM queries" would 
improve things. Changing behaviour based on the size of the queryset seems 
like a really bad idea. Now instead of gradual slowdown as the dataset 
grows, you start getting surprise exceptions, invalid data in results, or 
missing functionality once you hit a magic threshold.

There is no good way to set that threshold automatically, since people have 
very different expectations of what's "slow". For example, loading the 
second-to-last list page in the Django admin for my 1M row table takes a 
perfectly reasonable 2-3 seconds. That's too slow for the front page of a 
popular site, but even 30 seconds would be fine for an admin page that's 
visited maybe once a month.

I'm not saying that performance improvements or optimisation suggestions 
are a bad idea, but if they are added, they should behave consistently. If 
using "if queryset:" is a bug, then it should show a warning and suggest 
using "if queryset.exists():" for both 100 and 100M rows. If there's an 
option to turn off full pagination in admin, then it should be a ModelAdmin 
option, not dependent on data size - the developer can enable it for big 
tables manually.

Behaviour changes based on the queryset size make it more difficult, not 
easier for developers - right now you can delay optimisation work until it 
really becomes too painful for users, whereas having strict limits would 
require doing that work as soon as a limit is reached.

Finally, I agree that it would be nice if PostgreSQL server-side cursors 
were better integrated with the ORM. It's possible to use them for raw 
queries already:

from django.db import connection, transaction

connection.ensure_connection()
with transaction.atomic():
    cursor = connection.connection.cursor(name='hello')  # named cursor = server-side
    cursor.execute('BIG QUERYSET SQL')
    for row in cursor:  # fetches rows in cursor.itersize chunks
        pass

I use this in a few scripts that iterate over 10GB of table data. But a way 
to map the rows to Django objects would be nice.
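As a rough illustration of that missing piece (this is not Django API — 
`rows_to_objects`, `Entry`, and the field list are invented for the example), 
mapping raw rows back onto objects is mostly a matter of zipping column 
values with field names:

```python
def rows_to_objects(cls, field_names, rows):
    # Build one object per row by pairing column values with field names.
    return [cls(**dict(zip(field_names, row))) for row in rows]

class Entry:
    # Stand-in for a Django model; a real version could derive field_names
    # from Model._meta.concrete_fields instead of hard-coding them.
    def __init__(self, id=None, title=None):
        self.id = id
        self.title = title

objects = rows_to_objects(Entry, ['id', 'title'], [(1, 'first'), (2, 'second')])
```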


On Thursday, 20 November 2014 13:46:01 UTC+2, Rick van Hattem wrote:
>
> On Thursday, November 20, 2014 8:31:06 AM UTC+1, Christian Schmitt wrote:
>>
>>> Nope. A large OFFSET of N will read through N rows, regardless of index 
>>> coverage. See 
>>> http://www.postgresql.org/docs/9.1/static/queries-limit.html
>>>
>>
>> That's simply not true.
>> If you define an ORDER BY on a well-indexed query, the database will 
>> only do a bitmap scan.
>> That wiki page doesn't explain it well. Take this:
>>
>> > 
>> https://wiki.postgresql.org/images/3/35/Pagination_Done_the_PostgreSQL_Way.pdf
>>
>
> Sounds awesome in theory, here's a real world example:
>
> http://explain.depesz.com/s/WpQU
> # explain analyze select * from entity_entity order by id limit 10 offset 
> 100;
>   
>   QUERY PLAN
>
> ---
>  Limit  (cost=44832.84..44833.28 rows=10 width=423) (actual 
> time=8280.248..8280.382 rows=10 loops=1)
>->  Index Scan using entity_entity_pkey_two on entity_entity 
>  (cost=0.44..823410.92 rows=18366416 width=423) (actual 
> time=0.232..6861.507 rows=110 loops=1)
>  Total runtime: 8280.442 ms
> (3 rows)
>
> http://explain.depesz.com/s/qexc
> # explain analyze select * from entity_entity where id > 100 order by 
> id limit 10;
> 
> QUERY PLAN
>
> ---
>  Limit  (cost=0.44..0.91 rows=10 width=423) (actual time=0.044..0.102 
> rows=10 loops=1)
>->  Index Scan using entity_entity_pkey_two on entity_entity 
>  (cost=0.44..823830.94 rows=17405140 width=423) (actual time=0.040..0.071 
> rows=10 loops=1)
>  Index Cond: (id > 100)
>  Total runtime: 0.152 ms
> (4 rows)
>
>
> And that's not even all the way in the table. That's just the beginning.
>
>
>>> read the relevant thread I posted before. There are huge memory 
>>> allocation issues even if you just iterate over the first two items of a 
>>> query that potentially returns a million rows: all the rows will be 
>>> fetched and stored in memory by psycopg and the Django ORM 
>>>
>>
>> That's only happening because Django will build something like SELECT 
>> "bla", "bla", "bla" FROM table LIMIT 10 OFFSET 100; 
>> There is no ORDER BY clause, which is of course really slow, even with 
>> only 1000 entries inside a table.
>>
>
> Once again, great in theory. Doesn't always mean much in real world 
> scenarios. Compare the followi

Re: Configurable safety options for high performance Django systems

2014-11-24 Thread Mattias Linnap
Since your use case seems to be "avoid blocking the server when one 
pageview uses unexpectedly many resources", and you don't want to fix or 
optimise the actual application code, why not set these limits at a level 
where it makes sense?

For example:
* http://uwsgi-docs.readthedocs.org/en/latest/Options.html#limit-as sets a 
memory allocation limit on the uWSGI worker processes, and terminates & 
restarts the worker if it uses more memory. This will protect against not 
just big querysets, but also big plain python lists, loading huge image or 
xml files, memory leaks in C extensions and any other memory-consuming 
issues.
* 
http://uwsgi-docs.readthedocs.org/en/latest/FAQ.html#what-is-harakiri-mode 
sets a timeout on request handling, and terminates & restarts the worker if 
it takes more than X seconds to serve the request. This protects against 
not just big querysets, but also any other slow code that ties up the 
worker, for example forgetting to set a timeout on calls to external APIs, 
or a plain old infinite loop.
* 
http://stackoverflow.com/questions/5421776/how-can-i-limit-database-query-time-during-web-requests
 
You can set a PostgreSQL query timeout in seconds for the queries run by 
the web app user. This makes far more sense than a limit on the result 
size, as complex JOINs can be very resource-intensive but only return a few 
rows in the result.
There should be similar configuration options for other app servers and 
databases as well.
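A minimal sketch of what those three settings might look like in practice 
(the values and the `webapp_user` role name are illustrative assumptions, 
not recommendations):

```ini
; uwsgi.ini -- per-worker resource limits
[uwsgi]
limit-as = 512      ; kill & restart a worker that allocates more than 512 MB
harakiri = 30       ; kill & restart a worker that takes >30s on one request
```

```sql
-- Cap query runtime for the web app's database role only.
ALTER ROLE webapp_user SET statement_timeout = '5s';
```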

Django's ORM is just the wrong level in the software stack for these 
limits, since there are hundreds of other ways to kill the performance of a 
server, and the number of results in a queryset is a poor indicator of 
performance issues.

On Sunday, 23 November 2014 23:53:26 UTC+2, Rick van Hattem wrote:
>
>
> On 23 Nov 2014 22:13, "Christophe Pettus" > 
> wrote:
> >
> >
> > On Nov 23, 2014, at 1:07 PM, Rick van Hattem > 
> wrote:
> >
> > > > Not really, cause psycopg already fetched everything.
> > >
> > > Not if Django limits it by default :)
> >
> > Unfortunately, that's not how it works.  There are three things that 
> take up memory as the result of a query result:
> >
> > 1. The Django objects.  These are created in units of 100 at a time.
> > 2. The psycopg2 Python objects from the result.  These are already 
> limited to a certain number (I believe to 100) at a time.
> > 3. The results from libpq.  These are not limited, and there is no way 
> of limiting them without creating a named cursor, which is a significant 
> change to how Django interacts with the database.
> >
> > In short, without substantial, application-breaking changes, you can't 
> limit the amount of memory a query returns unless you add a LIMIT clause to 
> it.  However, adding a LIMIT clause can often cause performance issues all 
> by itself:
> >
> > http://thebuild.com/blog/2014/11/18/when-limit-attacks/
> >
> > There's no clean fix that wouldn't have significant effects on 
> unsuspecting applications.
>
> Very true, that's a fair point. That's why I'm opting for a configurable 
> option. Patching this within Django has saved me in quite a few cases but 
> it can have drawbacks.
>
> > --
> > -- Christophe Pettus
> >x...@thebuild.com 
> >
>  

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/e17fb633-b3d6-422c-9f6d-dc7fddae1aab%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Modify get_or_create() to use field lookups parameters for values

2014-07-26 Thread Mattias Linnap
One way to make it work in presence of field lookups would be to demand 
that the full values of mandatory fields must be present in the defaults 
dictionary.

For example:
Model.objects.get_or_create(name__iexact='hello', defaults={'name': 'Hello', 
'slug': 'foo'}) or 
Model.objects.get_or_create(pub_date__month=12, defaults={'pub_date': 
timezone.now().date()})
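A hypothetical sketch of that rule (the helper names are invented here, and 
the lookup list is abbreviated): split off any recognised lookup suffix, and 
require that every field referenced only through a lookup also appears with a 
concrete value in the defaults dictionary:

```python
LOOKUPS = {'iexact', 'icontains', 'month', 'gt', 'lt'}  # abbreviated list

def split_lookup(kwarg):
    # 'name__iexact' -> ('name', 'iexact'); plain 'name' -> ('name', None)
    field, sep, tail = kwarg.rpartition('__')
    if sep and tail in LOOKUPS:
        return field, tail
    return kwarg, None

def check_create_kwargs(filter_kwargs, defaults):
    # Fields referenced only via a lookup must get their concrete value
    # from `defaults`, otherwise creating the object would be ambiguous.
    missing = []
    for kwarg in filter_kwargs:
        field, lookup = split_lookup(kwarg)
        if lookup is not None and field not in defaults:
            missing.append(field)
    return missing
```

For example, `check_create_kwargs({'name__iexact': 'hello'}, {'slug': 'foo'})` 
would report `name` as missing, while supplying `{'name': 'Hello'}` would pass.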

On Tuesday, 10 June 2014 08:26:07 UTC+3, gavi...@gmail.com wrote:
>
> I don't think this is possible to do generally. What would count__gt=1 or 
> pub_date__month=12 do?
>
> On Friday, June 6, 2014 3:50:08 PM UTC-6, Patrick Bregman wrote:
>>
>> Hi all,
>>
>> First of, I'm new to this list so please tell me if there's something 
>> that can be done better. It's the best way to learn for me ;)
>>
>> Recently I've been doing some reworking of an old (think Django 1.1 or 
>> 1.2) webapp of mine into the latest and greatest of Django. In the process 
>> I modified some models, and thought that *get_or_create* would be 
>> perfect to replace boring try / except cases. Except that it didn't really 
>> work as planned. I tried to do a *get_or_create(name__iexact=value, 
>> defaults={'slug': slugify(value)})*. I expected this to be smart enough 
>> to know that it should fill the field *name* based on the *name__iexact* 
>> parameter. Apparently it isn't :) In this case you'd need to add a 
>> definition for *name* to the *defaults* dict to get the value into the 
>> newly created model. I'm not sure, but I personally think this shouldn't be 
>> that hard to fix. It's basically checking (and removing) known field 
>> lookups or simply everything after a double underscore, and using that as a 
>> field name. Or at least, that's my view on it.
>>
>> The big question is:
>> 1. Is this behavior that exotic that I'm the first one to notice, or did 
>> I do a wrong search on Google?
>> 2. Would this indeed be as easy (or comparably easy) as I think? 
>> 3. Is this behavior we actually want in Django?
>>
>> If the answer to 2 and 3 are yes, I'll look into giving it a try to 
>> making a patch for this myself.
>>
>> Regards,
>> Patrick Bregman
>>
>



Re: What are the best reasons for when and why people should use Django?

2014-08-09 Thread Mattias Linnap
I'm not sure it's possible to make a case for Django vs. Rails on anything 
but personal preference.
But it might be easier to argue why *Python* is a good language to learn, 
and therefore Django is the obvious web framework to use.

For a very wide range of use cases, Python tends to be the second-best tool 
to use (and personally, I think often the first):
* Heavy numerical modelling? Matlab might be the best, but Python + numpy 
is also great.
* Analyzing data and plotting pretty graphs? R might be the best, but 
Python + scipy + matplotlib is also great, especially if you need to 
pre-process the data.
* Financial analysis? Excel is mandatory, but you can replace the horror of 
VBA with Python plugins.
* Command-line utilities? Could use bash, perl or something else, but 
Python works best for anything longer than 10 lines.
* Cross-platform desktop apps? Maybe Java or C#, but Python + Qt is pretty 
good.
* Web apps? Python + Django is one of the best.

And so on. Especially for non-computer scientists, learning Python instead 
of a specialised language or tool opens up the most possibilities. 


On Saturday, 9 August 2014 11:30:54 UTC+3, Eric Frost wrote:
>
> Django Developers, 
>
> I'm giving a talk at a general tech conference next week and it's 
> mostly ready, but would welcome response for when and why developers 
> should look to Django (vs. other frameworks like Ruby on Rails) when 
> starting new projects. 
>
> Would love to hear any thoughts and arguments! 
>
> Thanks! 
> Eric 
> m: 312-399-1586 
>



Re: 1.8 release planning

2014-10-23 Thread Mattias Linnap
My impression from the 1.7 release schedule was that many release blockers 
were found at the RC stage, and RCs with known bugs were released, for three 
combined reasons:
* The release had a number of big new features, including app loading and 
migrations.
* In alpha and beta stages, the testers tend to be developers working on 
the new release itself. Once RC is released, the release is seen as mostly 
bug free, and many more Django users start using and testing the new 
release in production on new greenfield projects and non-mission critical 
sites where having the latest and greatest features is better than 
stumbling on a few bugs. This production testing helps to discover many new 
bugs, some of them release blockers.
* Due to the high number of release blockers being discovered and fixed, a 
new version identifier was released to simplify reporting new bugs in Trac 
against the version that does or does not have these bugs.

I know that the Django project tries to stay conservative and not recommend 
betas and RCs for production use, since people might get upset if their 
mission critical sites break. But I think the release and testing process 
would benefit from the core developers giving their actual "best guess" 
towards the stability of a beta and RC version, and let users judge the 
acceptable risk of deploying it in production. There are plenty of hobby 
projects, internal sites, and new not-yet-released sites out there where a 
few days of downtime or hacking together workarounds isn't a big problem, 
but for example data loss would be bad. With a description that is less 
conservative than "never use it in production", more people might join in 
on testing and reporting bugs before an RC.

On Friday, 17 October 2014 22:48:18 UTC+3, Tim Graham wrote:
>
> I'd like to kickoff the discussion on the timetable and process for the 
> 1.8 release. I am also volunteering to be the release manager.
>
> First, a retrospective on the 1.7 release with planned release dates and 
> (actual):
>
> Jan. 20: alpha (Jan. 22)
> March 6: beta (March 20)
> May 1: RC (June 26)
> May 15: final (Sept. 2)
>
> One observation I have is that each stage of the release does not really 
> do a good job at accurately reflecting our belief about the quality of the 
> code. For example, we have an "alpha" in order to have a major feature 
> freeze, but we still allow a significant amount of minor features (3 months 
> worth in the last release) such that the alpha and beta are hardly 
> comparable. Likewise, we had little confidence that the "RC" would actually 
> be released without further changes, but rather we needed to do the release 
> in order to get to the stage where we would only backport release blocking 
> bugs. Therefore, I am going to propose returning to a process that is 
> closer to what's documented in the Release cycle docs [1]. The idea is to 
> front-load all feature work to pre-alpha so that we can become more 
> conservative with backports sooner.
>
> Here is my proposed schedule:
>
> Jan. 12: alpha
>   - Feature freeze including minor features (minor features were allowed 
> until beta in the past)
>   - fork stable/1.8.x from master (in the past we forked after beta, but 
> now that we'd no longer accept minor features after alpha, we'd need to 
> fork sooner).
>   - I picked this date since it is after the end of the year when I 
> imagine many people are on holiday and therefore able to contribute more to 
> open source.
>   - Non-release blocking bug fixes may be backported at the committer's 
> discretion after this date.
>
> Feb. 16: beta
>   - Only release blocking bugs are allowed to be backported after this 
> date.
>   - Aggressively advertise it for testing
>
> March 16: release candidate
>   - Hopefully a true release candidate. If there is still a consistent 
> stream of release blockers coming in at this date; we'd release beta 2 to 
> encourage further testing and push the release candidate date out ~1 month.
>
> March 30: final
>   - Release a final as long as the release blocker stream is sufficiently 
> low. If not, give an update about the status and make a plan as to how to 
> proceed from there.
>
> On a related note, I believe we should give some guidance on our thinking 
> regard LTS. Currently our docs say, "Django 1.4, supported until at least 
> March 2015." If we adopt 1.8 as the next LTS, I propose to support 1.4 
> until 6 months after 1.8 is released, which would be at least September 
> 2015. Like 1.4, we'd advertise LTS support for 1.8 for at least 3 years 
> after it's released with a decision on the next LTS to be made as we 
> approach that date.
>
> Feedback on the proposed schedule and handling of the LTS cycle would be 
> appreciated!
>
> If you have any major features you plan to shepherd for this cycle, please 
> ensure they are listed on the roadmap: 
> https://code.djangoproject.com/wiki/Version1.8Roadmap
>
> [1] 
> https://docs.djangoproject.com/en/dev/in

PostgreSQL Partial Indexes package

2017-10-07 Thread Mattias Linnap
Hi django-developers,

I have written a package that implements PostgreSQL and SQLite partial 
indexes on top of the new class-based 
indexes: https://github.com/mattiaslinnap/django-partial-index
The most common use case is partial unique constraints, but I have a few 
projects where non-unique partial indexes have turned out useful as well.
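For readers unfamiliar with the feature: a partial index only covers rows 
matching a WHERE clause. The classic use case, sketched here in plain SQL 
with made-up table and column names, is a uniqueness constraint that ignores 
soft-deleted rows:

```sql
-- Each user may have at most one *active* subscription;
-- cancelled rows are excluded from the uniqueness check.
CREATE UNIQUE INDEX one_active_subscription
    ON subscription (user_id)
    WHERE cancelled = false;
```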

I have a few questions on how to continue with this:

1. Right now the "where condition" expression is provided as a string, and 
has to be different for PostgreSQL and SQLite in some common cases (for 
example boolean literals). Is there a good abstraction for SQL expressions 
somewhere in Django internals that I could use instead, something similar 
to Q-expressions perhaps? In particular, to add validate_unique() support 
to ModelForms, I would need to be able to extract all fields that are 
mentioned in the where condition.
2. I've seen mentions of "check constraints" support being in development 
(https://github.com/django/django/pull/7615). Will that include partial 
unique constraints, or is it just for per-column checks?
3. If separate, then it would be nice to one day get partial indexes merged 
into the contrib.postgres package. Do you have any suggestions on what 
needs to happen before that - more test coverage, more contributors, more 
users, or similar?

Best,

Mattias



Re: Feedback on improving GeoDjango docs

2016-04-04 Thread Mattias Linnap
Great to hear, GeoDjango documentation has always seemed half-finished to 
me, and only useful to people who are already familiar with GIS terminology.

Based on my impressions from various forum posts over the years, beginners 
who are looking at GeoDjango:
* Have never heard about OGC geometries, PostGIS, WKT, WKB, SRIDs, ESRI 
Shapefiles, GEOS, GDAL and Proj libraries, etc.
* Have some understanding of Google Maps and GPS longitude & latitude 
coordinates.

Their main questions seem to be:
* How to have models of "places" which have a location field, and how to 
find N nearest places, all places in the currently visible map region, or 
distances between places.
* Is it worth learning GeoDjango for this, or should they just add two 
FloatFields to the model.

For example:
https://www.reddit.com/r/django/comments/4csqm0/im_wondering_if_geodjango_is_overkill_in_my_case/
https://www.reddit.com/r/learnpython/comments/4240va/making_a_webapp_in_django_how_to_do_locationbased/
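To make the "two FloatFields" alternative concrete: without GeoDjango, 
nearest-place lookups usually end up as a hand-rolled haversine distance in 
Python or SQL. A minimal sketch (pure Python, no Django; the place data is 
made up):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two WGS84 points, in kilometres.
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

places = [('A', 59.437, 24.754), ('B', 59.339, 18.067), ('C', 51.507, -0.128)]
# N nearest places to a query point, sorted by distance:
nearest = sorted(places, key=lambda p: haversine_km(59.43, 24.75, p[1], p[2]))
```

This works for small tables, but every query scans all rows in Python; a 
spatial index (what GeoDjango + PostGIS gives you) avoids that, which is a 
reasonable way to frame the "is it worth it" question in the docs.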

For the tutorial, the content might cover:
0: Installation of required dependencies, with recommended versions 
(otherwise they get scared of the version compatibility matrices 
at https://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS). 
Explaining the purpose of each dependency can be a separate optional page.
1: A "places" type model, with N nearest + distance or region lookups. It 
should default to WGS84 coordinates, leaving the discussion of other 
coordinate systems for later.
1.5: Best practices for displaying the places on a map in the browser with 
some Javascript library (OpenLayers, Leaflet or Google Maps).
2. In a more advanced tutorial, perhaps adding some more complex geometries 
like polygons for regions or linestrings for GPS tracks, and operators like 
contains or intersections.

For the theme around the tutorials, I'm sure restaurants, apartments for 
rent, ice cream kiosks, or elephants in natural parks work equally well. :)

Mattias

On Sunday, 3 April 2016 12:55:18 UTC+3, Sylvain Fankhauser wrote:
>
> Hello,
>
> I'm currently working on ticket #22274 
> , which is an effort to 
> improve the GeoDjango documentation to make it more accessible to 
> newcomers. This represents quite some work and I'm still in the early 
> stages but I'd love to have some feedback on the general structure I'm 
> planning to create. You can find the current status of the potential new 
> docs here: http://sephii.github.io/django-22274/ref/contrib/gis/index.html 
> (current version of the docs can be found here 
> https://docs.djangoproject.com/en/1.9/ref/contrib/gis/).
>
> The tutorial part is not yet representative of what I'm planning to do so 
> no need to check that yet. That said, I found it quite difficult to come up 
> with a tutorial topic that would allow me to get through most of the 
> features from GeoDjango. So if you have an idea for a usecase I could use 
> for the tutorial, this would be more than welcome.
>
> Some other things I have on my todo list are:
>
> * Clarify installation instructions, split the platform-specific 
> instructions into separate documents
> * Check if we can try to merge together some parts in the "Models" section 
> (ie. GeoDjango Database API, Geographic Database Functions and GeoQuerySet 
> API Reference), which all serve the same purpose
>
> If you see anything else, feel free to comment.
>
> Cheers,
> Sylvain
>
