I believe we should be optimising for the **common** use case, not
expecting everyone to be experts with the ORM.
> If I can come up with a single example where it would significantly
decrease performance (either memory usage or speed) compared to the
default (and I'm sure I can), then I would be strongly opposed to it
ever being default behaviour.
The status quo is already one where thousands of users and sites are
doing the non-optimal thing because we're choosing to be conservative
and have users opt-in to the optimal behaviour. A massive complaint
against Django is how easy it is for users to build in 1+N behaviour.
Django is supposed to abstract the database away (so much so that we
avoid SQL related terms in our queryset methods), yet one of the more
fundamental concepts such as joins we expect users to know about and
optimise for.
I'd be more in favour of throwing an error on
non-select-related-foreign-key-access than what we're currently doing
which is a query for each access.
The only options users currently have of monitoring poor behaviour is:
1. Add logging to django.db.models
2. Add django-debug-toolbar
3. Investigate page slow downs
Here's a bunch of ways that previously tuned queries can "go bad":
1. A models `__str__` method is updated to include a related field
2. A template uses a previously unused related field
3. A report uses a previously unused related field
4. A ModelAdmin adds a previously unused related field
I think a better question to ask is:
- How many people have had their day/site ruined because we think
auto-prefetching is too magical?
- How many people would have their day/site ruined because we think
auto-prefetching is the better default?
If we were introducing a new ORM, I think the above answer would be
obvious given what we know of Django use today.
What I'd propose:
1. (optional) A global setting to disable autoprefetching
2. An opt out per queryset
3. (optional) An opt out per Meta?
4. Logging any autoprefetches - perhaps as a warning?
More experienced Django users that do not want this behaviour are going
to know about a global setting and can opt in to the old behaviour
rather easily. Newer users that do not know about
select/prefetch_related or these settings will fall into the new
behaviour by default.
It's unreasonable to expect every user of django learn the ins and outs
of all queryset methods. I'm probably considered a django orm expert,
and I still sometimes write queries that are non-optimal or *become*
non-optimal after changes in unrelated areas. At an absolute minimum we
should be screaming and shouting when this happens. But we can also fix
the issue while complaining, and help guide users into correct behaviour.
On Wednesday, 16 August 2017 08:41:31 UTC+10, Anthony King wrote:
Automatically prefetching is something I feel should be avoided.
A common gripe I have with ORMs is they hide what's actually
happening with the database, resulting in
beginners-going-on-intermediates building libraries/systems that
don't scale well.
We have several views in a dashboard, where a relation may be
accessed once or twice while iterating over a large python filtered
queryset.
Prefetching this relation based on the original queryset has the
potential to add around 5 seconds to the response time (probably
more, that table has doubled in size since I last measured it).
I feel it would be better to optimise for your usecase, as apposed
to try to prevent uncalled-for behaviour.
On Aug 15, 2017 23:15, "Luke Plant" <l.pla...@cantab.net
<javascript:>> wrote:
I agree with Marc here that the proposed optimizations are
'magical'. I think when it comes to optimizations like these you
simply cannot know in advance whether doing extra queries is
going to a be an optimization or a pessimization. If I can come
up with a single example where it would significantly decrease
performance (either memory usage or speed) compared to the
default (and I'm sure I can), then I would be strongly opposed
to it ever being default behaviour.
Concerning implementing it as an additional QuerySet method
like `auto_prefetch()` - I'm not sure what I think, I feel like
it could get icky (i.e. increase our technical debt), due to the
way it couples things together. I can't imagine ever wanting to
use it, though, I would always prefer the manual option.
Luke
On 15/08/17 21:02, Marc Tamlyn wrote:
Hi Gordon,
Thanks for the suggestion.
I'm not a fan of adding a layer that tries to be this clever.
How would possible prefetches be identified? What happens when
an initial loop in a view requires one prefetch, but a
subsequent loop in a template requires some other prefetch?
What about nested loops resulting in nested prefetches? Code
like this is almost guaranteed to break unexpectedly in
multiple ways. Personally, I would argue that correctly
setting up and maintaining appropriate prefetches and selects
is a necessary part of working with an ORM.
Do you know of any other ORMs which attempt similar magical
optimisations? How do they go about identifying the cases
where it is necessary?
On 15 August 2017 at 10:44, Gordon Wrigley
<gordon....@gmail.com <javascript:>> wrote:
I'd like to discuss automatic prefetching in querysets.
Specifically automatically doing prefetch_related where
needed without the user having to request it.
For context consider these three snippets using the
Question & Choice models from the tutorial
<https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> when
there are 100 questions each with 5 choices for a total of
500 choices.
Default
|
forchoice inChoice.objects.all():
print(choice.question.question_text,':',choice.choice_text)
|
501 db queries, fetches 500 choice rows and 500 question
rows from the DB
Prefetch_related
|
forchoice inChoice.objects.prefetch_related('question'):
print(choice.question.question_text,':',choice.choice_text)
|
2 db queries, fetches 500 choice rows and 100 question
rows from the DB
Select_related
|
forchoice inChoice.objects.select_related('question'):
print(choice.question.question_text,':',choice.choice_text)
|
1 db query, fetches 500 choice rows and 500 question rows
from the DB
I've included select_related for completeness, I'm not
going to propose changing anything about it's use. There
are places where it is the best choice and in those places
it will still be up to the user to request it. I will note
that anywhere select_related is optimal prefetch_related
is still better than the default and leave it at that.
The 'Default' example above is a classic example of the
N+1 query problem, a problem that is widespread in Django
apps.
This pattern of queries is what new users produce because
they don't know enough about the database and / or ORM to
do otherwise.
Experieced users will also often produce this because it's
not always obvious what fields will and won't be used and
subsequently what should be prefetched.
Additionally that list will change over time. A small
change to a template to display an extra field can result
in a denial of service on your DB due to a missing prefetch.
Identifying missing prefetches is fiddly, time consuming
and error prone. Tools like django-perf-rec
<https://github.com/YPlan/django-perf-rec> (which I was
involved in creating) and nplusone
<https://github.com/jmcarp/nplusone> exist in part to flag
missing prefetches introduced by changed code.
Finally libraries like Django Rest Framework and the Admin
will also produce queries like this because it's very
difficult for them to know what needs prefetching without
being explicitly told by an experienced user.
As hinted at the top I'd like to propose changing Django
so the default code behaves like the prefetch_related code.
Longer term I think this should be the default behaviour
but obviously it needs to be proved first so for now I'd
suggest a new queryset function that enables this behaviour.
I have a proof of concept of this mechanism that I've used
successfully in production. I'm not posting it yet because
I'd like to focus on desired behavior rather than
implementation details. But in summary, what it does is
when accessing a missing field on a model, rather than
fetching it just for that instance, it runs a
prefetch_related query to fetch it for all peer instances
that were fetched in the same queryset. So in the example
above it prefetches all Questions in one query.
This might seem like a risky thing to do but I'd argue
that it really isn't.
The only time this isn't superior to the default case is
when you are post filtering the queryset results in Python.
Even in that case it's only inferior if you started with a
large number of results, filtered basically all of them
and the code is structured so that the filtered ones
aren't garbage collected.
To cover this rare case the automatic prefetching can
easily be disabled on a per queryset or per object basis.
Leaving us with a rare downside that can easily be
manually resolved in exchange for a significant general
improvement.
In practice this thing is almost magical to work with.
Unless you already have extensive and tightly maintained
prefetches everywhere you get an immediate boost to
virtually everything that touches the database, often
knocking orders of magnitude off page load times.
If an agreement can be reached on pursuing this then I'm
happy to put in the work to productize the proof of concept.
--
You received this message because you are subscribed to
the Google Groups "Django developers (Contributions to
Django itself)" group.
To unsubscribe from this group and stop receiving emails
from it, send an email to
django-develop...@googlegroups.com <javascript:>.
To post to this group, send email to
django-d...@googlegroups.com <javascript:>.
Visit this group at
https://groups.google.com/group/django-developers
<https://groups.google.com/group/django-developers>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com
<https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout
<https://groups.google.com/d/optout>.
--
You received this message because you are subscribed to the
Google Groups "Django developers (Contributions to Django
itself)" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to django-develop...@googlegroups.com
<javascript:>.
To post to this group, send email to
django-d...@googlegroups.com <javascript:>.
Visit this group at
https://groups.google.com/group/django-developers
<https://groups.google.com/group/django-developers>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/CAMwjO1Gaha-K7KkefJkiS3LRdXvaPPwBeuKmhQv6bJFx3dty3w%40mail.gmail.com
<https://groups.google.com/d/msgid/django-developers/CAMwjO1Gaha-K7KkefJkiS3LRdXvaPPwBeuKmhQv6bJFx3dty3w%40mail.gmail.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout
<https://groups.google.com/d/optout>.
--
You received this message because you are subscribed to the
Google Groups "Django developers (Contributions to Django
itself)" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to django-develop...@googlegroups.com
<javascript:>.
To post to this group, send email to
django-d...@googlegroups.com <javascript:>.
Visit this group at
https://groups.google.com/group/django-developers
<https://groups.google.com/group/django-developers>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/a5780df6-ce60-05ae-88e3-997e6bc88f5c%40cantab.net
<https://groups.google.com/d/msgid/django-developers/a5780df6-ce60-05ae-88e3-997e6bc88f5c%40cantab.net?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout
<https://groups.google.com/d/optout>.
--
You received this message because you are subscribed to the Google
Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to django-developers+unsubscr...@googlegroups.com
<mailto:django-developers+unsubscr...@googlegroups.com>.
To post to this group, send email to django-developers@googlegroups.com
<mailto:django-developers@googlegroups.com>.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-developers/2f0b5932-1a38-4eaf-84aa-13960a303141%40googlegroups.com
<https://groups.google.com/d/msgid/django-developers/2f0b5932-1a38-4eaf-84aa-13960a303141%40googlegroups.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.