I think this is an excellent suggestion.

It seems generally accepted in this thread that although there are cases
where this would hurt performance, it would on average solve more problems
than it creates. The debate seems to be more whether or not it's "right"
for the ORM to behave in this magical way. With that in mind...

It won't affect experienced users. They'll read the release notes, see that
this change has been implemented, and either go and delete a bunch of
prefetch_related() calls, grumble a bit and turn auto-prefetch off globally
or just file it away as another fact they know about the Django ORM.

For beginners, this can defer the pain of learning about N+1 problems,
potentially forever depending on the scale of the project. Ultimately
Django's job isn't to teach hard lessons about SQL gotchas, it's to make it
easy to make a nice website. This proposal would reduce both average load
time of Django pages and debugging time, at the cost of the average Django
developer being a little more ignorant about what queries the ORM
generates. I think that's a good trade and in line with the goals of the
project.

Django's ORM isn't SQLAlchemy - it's built on high-level concepts, designed
to be relatively beginner-friendly, and already pretty magical.
select_related() and prefetch_related() are somewhat awkward plugs for a
leaky abstraction, and IMO this is just a better plug.

Alex


On Wed, 16 Aug 2017 at 12:12 Cristiano Coelho <cristianocc...@gmail.com>
wrote:

> I would rather have warnings as well, adding more magical behavior is bad
> and might even degrade performance on some cases, automatically selecting a
> bunch of data that "might" be used is bad, and specially considering how
> slow python is, accidentally loading/building 1k+ objects when maybe only
> one of them is used would be as bad as doing 1k+ queries.
>
> If the systems you are building are that large and complicated you can't
> have people with 0 SQL knowledge doing stuff neither! So many things to
> tweak, indexes, data denormalization, proper joins here and there, unique
> constraints, locks and race conditions, someone attempting to code
> something that's not a blog or hello world really needs to know a bit about
> all of that.
>
>
>
> El martes, 15 de agosto de 2017, 6:44:19 (UTC-3), Gordon Wrigley escribió:
>>
>> I'd like to discuss automatic prefetching in querysets. Specifically
>> automatically doing prefetch_related where needed without the user having
>> to request it.
>>
>> For context consider these three snippets using the Question & Choice
>> models from the tutorial
>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>> when
>> there are 100 questions each with 5 choices for a total of 500 choices.
>>
>> Default
>> for choice in Choice.objects.all():
>>     print(choice.question.question_text, ':', choice.choice_text)
>> 501 db queries, fetches 500 choice rows and 500 question rows from the DB
>>
>> Prefetch_related
>> for choice in Choice.objects.prefetch_related('question'):
>>     print(choice.question.question_text, ':', choice.choice_text)
>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>
>> Select_related
>> for choice in Choice.objects.select_related('question'):
>>     print(choice.question.question_text, ':', choice.choice_text)
>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>
>> I've included select_related for completeness, I'm not going to propose
>> changing anything about it's use. There are places where it is the best
>> choice and in those places it will still be up to the user to request it. I
>> will note that anywhere select_related is optimal prefetch_related is still
>> better than the default and leave it at that.
>>
>> The 'Default' example above is a classic example of the N+1 query
>> problem, a problem that is widespread in Django apps.
>> This pattern of queries is what new users produce because they don't know
>> enough about the database and / or ORM to do otherwise.
>> Experieced users will also often produce this because it's not always
>> obvious what fields will and won't be used and subsequently what should be
>> prefetched.
>> Additionally that list will change over time. A small change to a
>> template to display an extra field can result in a denial of service on
>> your DB due to a missing prefetch.
>> Identifying missing prefetches is fiddly, time consuming and error prone.
>> Tools like django-perf-rec <https://github.com/YPlan/django-perf-rec>
>> (which I was involved in creating) and nplusone
>> <https://github.com/jmcarp/nplusone> exist in part to flag missing
>> prefetches introduced by changed code.
>> Finally libraries like Django Rest Framework and the Admin will also
>> produce queries like this because it's very difficult for them to know what
>> needs prefetching without being explicitly told by an experienced user.
>>
>> As hinted at the top I'd like to propose changing Django so the default
>> code behaves like the prefetch_related code.
>> Longer term I think this should be the default behaviour but obviously it
>> needs to be proved first so for now I'd suggest a new queryset function
>> that enables this behaviour.
>>
>> I have a proof of concept of this mechanism that I've used successfully
>> in production. I'm not posting it yet because I'd like to focus on desired
>> behavior rather than implementation details. But in summary, what it does
>> is when accessing a missing field on a model, rather than fetching it just
>> for that instance, it runs a prefetch_related query to fetch it for all
>> peer instances that were fetched in the same queryset. So in the example
>> above it prefetches all Questions in one query.
>>
>> This might seem like a risky thing to do but I'd argue that it really
>> isn't.
>> The only time this isn't superior to the default case is when you are
>> post filtering the queryset results in Python.
>> Even in that case it's only inferior if you started with a large number
>> of results, filtered basically all of them and the code is structured so
>> that the filtered ones aren't garbage collected.
>> To cover this rare case the automatic prefetching can easily be disabled
>> on a per queryset or per object basis. Leaving us with a rare downside that
>> can easily be manually resolved in exchange for a significant general
>> improvement.
>>
>> In practice this thing is almost magical to work with. Unless you already
>> have extensive and tightly maintained prefetches everywhere you get an
>> immediate boost to virtually everything that touches the database, often
>> knocking orders of magnitude off page load times.
>>
>> If an agreement can be reached on pursuing this then I'm happy to put in
>> the work to productize the proof of concept.
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/8d272787-7b8a-430b-a7d4-453981f894d6%40googlegroups.com
> <https://groups.google.com/d/msgid/django-developers/8d272787-7b8a-430b-a7d4-453981f894d6%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CA%2BKBOKyBGLsBRU9eNr9YAZgu85a_5mnvgP1T%3DDFG-vSyviEk2w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
    • R... Luke Plant
      • ... Sean Brant
      • ... Anthony King
        • ... Josh Smeaton
          • ... Curtis Maloney
          • ... Luke Plant
            • ... Marc Tamlyn
              • ... Tom Forbes
              • ... Gordon Wrigley
  • Re: Au... Cristiano Coelho
    • R... Alexander Hill
      • ... Josh Smeaton
        • ... Adam Johnson
          • ... 'Tom Evans' via Django developers (Contributions to Django itself)
            • ... Shai Berger
              • ... Josh Smeaton
              • ... Gordon Wrigley
        • ... Anssi Kääriäinen
  • Re: Au... Aymeric Augustin

Reply via email to