Re: Automatic prefetching in querysets

2017-08-17 Thread Malcolm Box
Hi,

I think there's a potential to make this opt-in, and improve the out-of-box 
experience.

Summarising the discussion, it seems that the rough consensus is that if we 
were building the ORM from scratch, then this would be entirely sensible 
behaviour (with necessary per-QS ways to disable) - it would remove a 
common performance problem (N+1 queries), would improve areas where adding 
prefetch_related to queries is awkward, and in rare cases where it 
decreased performance there would be documented ways to fix.

So the main disagreement is about how to get there from here, and there's 
concern about three types of users:

1 - Existing, non-expert users whose sites work now (because otherwise they 
would have already fixed the problem) but who have lurking N+1 issues which 
will break them later
2 - Existing, non-expert users whose sites work now, but would have 
performance issues if this was enabled by an upgrade
3 - Existing, expert users who have already found and fixed the issues and 
who could therefore get no benefit but might suffer a performance 
degradation.

I'll assert that the size of these populations above is listed in roughly 
size order, with #1 being the biggest. This is a hunch based on most sites 
not having huge tables where N+1 becomes a problem - at least not until 
they've been running for a few years and accumulated lots of data...

There is another population that hasn't been considered - users starting 
django projects (ie people running django-admin startproject). Over time, 
this is by far the largest population. 

So would a sensible approach be:

- Feature is globally opt-in 
- startproject opts in for new projects
- Release notes mention the new flag loudly, and encourage people to try 
switching it on
- We add the debug tracing to help people find places where this setting 
would help - and encourage them to enable it globally before trying 
individual queryset.prefetch_related

Then over time, all new projects will have the new behaviour. Old projects 
will gradually upgrade - everyone in category 1 will hit the "make it work" 
switch the first time they see the warning / see a problem. Experts can 
choose how they migrate - as Adam points out, even experts can miss things.

Finally after a suitable warning period, this can become an opt-out feature 
and we arrive in the sunny world of an ORM that works better for all users.

Cheers,

Malcolm

On Wednesday, 16 August 2017 21:17:49 UTC+1, Aymeric Augustin wrote:
>
> On 15 Aug 2017, at 11:44, Gordon Wrigley  > wrote:
>
> I'd like to discuss automatic prefetching in querysets. Specifically 
> automatically doing prefetch_related where needed without the user having 
> to request it.
>
>
>
> Hello,
>
> I'm rather sympathetic to this proposal. Figuring out N + 1 problems in 
> the admin or elsewhere gets old.
>
>
> In addition to everything that was said already, I'd like to point out 
> that Django already has a very similar "magic auto prefetching" behavior in 
> some cases :-)
>
> I'm referring to the admin which calls select_related() on non-nullable 
> foreign keys in the changelist view. The "non-nullable" condition makes 
> that behavior hard to predict — I'd go as far as to call it non 
> deterministic. For details, see slide 54 of 
> https://myks.org/data/20161103-Django_Under_the_Hood-Debugging_Performance.pdf
>  and 
> the audio commentary at https://youtu.be/5fheDDj3oHY?t=2024.
>
>
> The feature proposed here is most useful if it's opt-out because it 
> targets people who aren't aware that the problem even exists — at best they 
> notice that Django is slow and that reminds them vaguely of a rant that 
> explains why ORMs are the worst thing since object oriented programming.
>
> It should kick in only when no select_related or prefetch_related is in 
> effect, to avoid interfering with pre-existing optimizations. It's still 
> easy to construct an example where it would degrade performance but I don't 
> think such situations will be common in practice. Still, there should be a 
> per-queryset opt-out for these cases.
>
> We may want to introduce it with a deprecation path, that is, make it 
> opt-in at first and log a deprecation warning where the behavior would 
> kick-in, so developers who want to disable it can add the per-queryset 
> opt-out.
>
>
> At this point, my main concerns are:
>
> 1) The difficulty of identifying where the queryset originates, given that 
> querysets are lazy. Passing objects around is common; sometimes it can be 
> hard to figure out where an object comes from. It isn't visible in the 
> stack trace. In my opinion this is the strongest argument against the 
> feature.
>
> 2) The lack of this feature for reverse one-to-one relations; it's only 
> implemented for foreign keys. It's hard to tell them apart in Python code. 
> The subtle differences, like return None vs. raise ObjectDoesNotExist when 
> there's no related object, degrade the developer experience.
>
> 3) The strong opin

Re: Automatic prefetching in querysets

2017-08-17 Thread Andrew Godwin
Just some quick thoughts, as most of the points I would bring up have been
discussed:

- This should definitely be opt-in for the first release, it's too big a
change to make opt-out straight away.

- You should be able to turn it on/off per model, probably in Meta, not in
a setting, and obviously be able to override it on the queryset level

- I am concerned for the nasty performance edge cases this will cause with
single-object access patterns (like Anssi's example), but I think the
overall gain would likely be worth it.

In general, very large sites won't be able to use this at all as JOINs or
cross-table queries in one piece of code aren't allowed, but they probably
already have plenty of expertise in this area. We also need to give
consideration to how it will interact with multiple database support, and
third-party solutions to things like sharding.

Andrew

On Thu, Aug 17, 2017 at 2:02 AM, Malcolm Box  wrote:

> Hi,
>
> I think there's a potential to make this opt-in, and improve the
> out-of-box experience.
>
> Summarising the discussion, it seems that the rough consensus is that if
> we were building the ORM from scratch, then this would be entirely sensible
> behaviour (with necessary per-QS ways to disable) - it would remove a
> common performance problem (N+1 queries), would improve areas where adding
> prefetch_related to queries is awkward, and in rare cases where it
> decreased performance there would be documented ways to fix.
>
> So the main disagreement is about how to get there from here, and there's
> concern about three types of users:
>
> 1 - Existing, non-expert users whose sites work now (because otherwise
> they would have already fixed the problem) but who have lurking N+1 issues
> which will break them later
> 2 - Existing, non-expert users whose sites work now, but would have
> performance issues if this was enabled by an upgrade
> 3 - Existing, expert users who have already found and fixed the issues and
> who could therefore get no benefit but might suffer a performance
> degradation.
>
> I'll assert that the size of these populations above is listed in roughly
> size order, with #1 being the biggest. This is a hunch based on most sites
> not having huge tables where N+1 becomes a problem - at least not until
> they've been running for a few years and accumulated lots of data...
>
> There is another population that hasn't been considered - users starting
> django projects (ie people running django-admin startproject). Over time,
> this is by far the largest population.
>
> So would a sensible approach be:
>
> - Feature is globally opt-in
> - startproject opts in for new projects
> - Release notes mention the new flag loudly, and encourage people to try
> switching it on
> - We add the debug tracing to help people find places where this setting
> would help - and encourage them to enable it globally before trying
> individual queryset.prefetch_related
>
> Then over time, all new projects will have the new behaviour. Old projects
> will gradually upgrade - everyone in category 1 will hit the "make it work"
> switch the first time they see the warning / see a problem. Experts can
> choose how they migrate - as Adam points out, even experts can miss things.
>
> Finally after a suitable warning period, this can become an opt-out
> feature and we arrive in the sunny world of an ORM that works better for
> all users.
>
> Cheers,
>
> Malcolm
>
> On Wednesday, 16 August 2017 21:17:49 UTC+1, Aymeric Augustin wrote:
>>
>> On 15 Aug 2017, at 11:44, Gordon Wrigley  wrote:
>>
>> I'd like to discuss automatic prefetching in querysets. Specifically
>> automatically doing prefetch_related where needed without the user having
>> to request it.
>>
>>
>>
>> Hello,
>>
>> I'm rather sympathetic to this proposal. Figuring out N + 1 problems in
>> the admin or elsewhere gets old.
>>
>>
>> In addition to everything that was said already, I'd like to point out
>> that Django already has a very similar "magic auto prefetching" behavior in
>> some cases :-)
>>
>> I'm referring to the admin which calls select_related() on non-nullable
>> foreign keys in the changelist view. The "non-nullable" condition makes
>> that behavior hard to predict — I'd go as far as to call it non
>> deterministic. For details, see slide 54 of https://myks.org/data/20161
>> 103-Django_Under_the_Hood-Debugging_Performance.pdf and the audio
>> commentary at https://youtu.be/5fheDDj3oHY?t=2024.
>>
>>
>> The feature proposed here is most useful if it's opt-out because it
>> targets people who aren't aware that the problem even exists — at best they
>> notice that Django is slow and that reminds them vaguely of a rant that
>> explains why ORMs are the worst thing since object oriented programming.
>>
>> It should kick in only when no select_related or prefetch_related is in
>> effect, to avoid interfering with pre-existing optimizations. It's still
>> easy to construct an example where it would degrade performance but I don't
>> thin

Re: Automatic prefetching in querysets

2017-08-17 Thread Collin Anderson
"turn it on/off per model" - I wonder if we just have a custom
manager/mixin for the per model case. objects =
AutoPrefetchRelatedManager(). The manager can return a queryset with
prefetch_related(auto=True) or whatever already applied.

On Thu, Aug 17, 2017 at 1:33 PM, Andrew Godwin  wrote:

> Just some quick thoughts, as most of the points I would bring up have been
> discussed:
>
> - This should definitely be opt-in for the first release, it's too big a
> change to make opt-out straight away.
>
> - You should be able to turn it on/off per model, probably in Meta, not in
> a setting, and obviously be able to override it on the queryset level
>
> - I am concerned for the nasty performance edge cases this will cause with
> single-object access patterns (like Anssi's example), but I think the
> overall gain would likely be worth it.
>
> In general, very large sites won't be able to use this at all as JOINs or
> cross-table queries in one piece of code aren't allowed, but they probably
> already have plenty of expertise in this area. We also need to give
> consideration to how it will interact with multiple database support, and
> third-party solutions to things like sharding.
>
> Andrew
>
> On Thu, Aug 17, 2017 at 2:02 AM, Malcolm Box  wrote:
>
>> Hi,
>>
>> I think there's a potential to make this opt-in, and improve the
>> out-of-box experience.
>>
>> Summarising the discussion, it seems that the rough consensus is that if
>> we were building the ORM from scratch, then this would be entirely sensible
>> behaviour (with necessary per-QS ways to disable) - it would remove a
>> common performance problem (N+1 queries), would improve areas where adding
>> prefetch_related to queries is awkward, and in rare cases where it
>> decreased performance there would be documented ways to fix.
>>
>> So the main disagreement is about how to get there from here, and there's
>> concern about three types of users:
>>
>> 1 - Existing, non-expert users whose sites work now (because otherwise
>> they would have already fixed the problem) but who have lurking N+1 issues
>> which will break them later
>> 2 - Existing, non-expert users whose sites work now, but would have
>> performance issues if this was enabled by an upgrade
>> 3 - Existing, expert users who have already found and fixed the issues
>> and who could therefore get no benefit but might suffer a performance
>> degradation.
>>
>> I'll assert that the size of these populations above is listed in roughly
>> size order, with #1 being the biggest. This is a hunch based on most sites
>> not having huge tables where N+1 becomes a problem - at least not until
>> they've been running for a few years and accumulated lots of data...
>>
>> There is another population that hasn't been considered - users starting
>> django projects (ie people running django-admin startproject). Over time,
>> this is by far the largest population.
>>
>> So would a sensible approach be:
>>
>> - Feature is globally opt-in
>> - startproject opts in for new projects
>> - Release notes mention the new flag loudly, and encourage people to try
>> switching it on
>> - We add the debug tracing to help people find places where this setting
>> would help - and encourage them to enable it globally before trying
>> individual queryset.prefetch_related
>>
>> Then over time, all new projects will have the new behaviour. Old
>> projects will gradually upgrade - everyone in category 1 will hit the "make
>> it work" switch the first time they see the warning / see a problem.
>> Experts can choose how they migrate - as Adam points out, even experts can
>> miss things.
>>
>> Finally after a suitable warning period, this can become an opt-out
>> feature and we arrive in the sunny world of an ORM that works better for
>> all users.
>>
>> Cheers,
>>
>> Malcolm
>>
>> On Wednesday, 16 August 2017 21:17:49 UTC+1, Aymeric Augustin wrote:
>>>
>>> On 15 Aug 2017, at 11:44, Gordon Wrigley  wrote:
>>>
>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>> automatically doing prefetch_related where needed without the user having
>>> to request it.
>>>
>>>
>>>
>>> Hello,
>>>
>>> I'm rather sympathetic to this proposal. Figuring out N + 1 problems in
>>> the admin or elsewhere gets old.
>>>
>>>
>>> In addition to everything that was said already, I'd like to point out
>>> that Django already has a very similar "magic auto prefetching" behavior in
>>> some cases :-)
>>>
>>> I'm referring to the admin which calls select_related() on non-nullable
>>> foreign keys in the changelist view. The "non-nullable" condition makes
>>> that behavior hard to predict — I'd go as far as to call it non
>>> deterministic. For details, see slide 54 of https://myks.org/data/20161
>>> 103-Django_Under_the_Hood-Debugging_Performance.pdf and the audio
>>> commentary at https://youtu.be/5fheDDj3oHY?t=2024.
>>>
>>>
>>> The feature proposed here is most useful if it's opt-out because it
>>> targets people who aren't aware that the prob

Re: Automatic prefetching in querysets

2017-08-17 Thread Aymeric Augustin
Hello,

> On 17 Aug 2017, at 14:48, Luke Plant  wrote:
> 
> On 16/08/17 23:17, Aymeric Augustin wrote:
>> It should kick in only when no select_related or prefetch_related is in 
>> effect, to avoid interfering with pre-existing optimizations. It's still 
>> easy to construct an example where it would degrade performance but I don't 
>> think such situations will be common in practice.
> 
> I think Ansii's example is a non-contrived and potentially very common one 
> where we will make things worse if this is default behaviour, especially for 
> the non-expert programmer this behaviour is supposed to help:
> 
> 
> if qs:
> qs[0].do_something()
> 
> (where do_something() access relations, or multiple relations). We will end 
> up with *multiple* large unnecessary queries instead of just one. The 
> difference is that the first one is easy to explain, the remainder are much 
> harder, and will contribute even more to the "Django is slow/ORMs are bad" 
> feeling.


I wasn't convinced by this example because it's already sub-optimal currently. 
`if qs` will fetch the whole queryset just to check if it has at least one 
element.

Assuming there's 1000 elements in qs, if you don't care about a 1000x 
inefficiency, I don't expect you to care much about two or three 1000x 
inefficiencies...

-- 
Aymeric.


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/E99B1838-FBE9-4E0F-91EC-8A5DD78B8106%40polytechnique.org.
For more options, visit https://groups.google.com/d/optout.


Re: Default to BigAutoField

2017-08-17 Thread Kenneth Reitz
I have opened a pull request:

https://github.com/django/django/pull/8924

Andrew and I came up with a good solution for migrations, together at 
DjangoCon. 

On Wednesday, June 14, 2017 at 7:36:36 AM UTC-7, Melvyn Sopacua wrote:
>
> On Friday 09 June 2017 15:59:50 Kenneth Reitz wrote:
>
> > However, it should also be noted that those same larger applications
>
> > are the ones that are likely to run into this problem eventually, so
>
> > perhaps forcing the migration is the best path moving forward.
>
>  
>
>  
>
> Existing models are the problem. Then again the database knows the truth. 
> So with a little inspection during apps.get_models we might be able to do 
> the right thing and even allow migrating in steps.
>
>  
>
> Apps is also the place to mark an app as migrated.
>
>  
>
> In fact - couldn't an AppConfig grow a method "get_autoid_type()" and 
> inject the right one?
>
>  
>
> You asked fr thoughts, so there's my 2c stream.
>
> -- 
>
> Melvyn Sopacua
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/e3effc41-10e1-42e2-9037-84c98217cd91%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Default to BigAutoField

2017-08-17 Thread Andrew Godwin
To elaborate on the solution we eventually came up with - we default models
to use a new BigAutoField that migrations will pick up on and generate
migrations to alter columns to, but for safety reasons for those that don't
read release notes, made the migration autodetector ask you if you want to
make these migrations with a slowness warning.

It also tells you how to preserve the old behaviour and avoid new
migrations if you wish (manually set id = SmallAutoField)

I like this approach as it means no new settings or Meta options or
anything, has a reasonable upgrade path, and won't let people unwittingly
wander into giant changes. The downside is that it does add slightly more
friction to the upgrade process.

Andrew

On Thu, Aug 17, 2017 at 2:36 PM, Kenneth Reitz  wrote:

> I have opened a pull request:
>
> https://github.com/django/django/pull/8924
>
> Andrew and I came up with a good solution for migrations, together at
> DjangoCon.
>
> On Wednesday, June 14, 2017 at 7:36:36 AM UTC-7, Melvyn Sopacua wrote:
>>
>> On Friday 09 June 2017 15:59:50 Kenneth Reitz wrote:
>>
>> > However, it should also be noted that those same larger applications
>>
>> > are the ones that are likely to run into this problem eventually, so
>>
>> > perhaps forcing the migration is the best path moving forward.
>>
>>
>>
>>
>>
>> Existing models are the problem. Then again the database knows the truth.
>> So with a little inspection during apps.get_models we might be able to do
>> the right thing and even allow migrating in steps.
>>
>>
>>
>> Apps is also the place to mark an app as migrated.
>>
>>
>>
>> In fact - couldn't an AppConfig grow a method "get_autoid_type()" and
>> inject the right one?
>>
>>
>>
>> You asked fr thoughts, so there's my 2c stream.
>>
>> --
>>
>> Melvyn Sopacua
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/e3effc41-10e1-42e2-9037-
> 84c98217cd91%40googlegroups.com
> 
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAFwN1uo4Y_pWSf3zAe_4R0GGkDqBv1YGus8Q%2BWPPZ%3DZ6FPwdYQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Automatic prefetching in querysets

2017-08-17 Thread Anssi Kääriäinen
On Thursday, August 17, 2017 at 11:30:45 PM UTC+3, Aymeric Augustin wrote:
>
> Hello, 
>
> > On 17 Aug 2017, at 14:48, Luke Plant > 
> wrote: 
> > 
> > On 16/08/17 23:17, Aymeric Augustin wrote: 
> >> It should kick in only when no select_related or prefetch_related is in 
> effect, to avoid interfering with pre-existing optimizations. It's still 
> easy to construct an example where it would degrade performance but I don't 
> think such situations will be common in practice. 
> > 
> > I think Ansii's example is a non-contrived and potentially very common 
> one where we will make things worse if this is default behaviour, 
> especially for the non-expert programmer this behaviour is supposed to 
> help: 
> > 
> > 
> > if qs: 
> > qs[0].do_something() 
> > 
> > (where do_something() access relations, or multiple relations). We will 
> end up with *multiple* large unnecessary queries instead of just one. The 
> difference is that the first one is easy to explain, the remainder are much 
> harder, and will contribute even more to the "Django is slow/ORMs are bad" 
> feeling. 
>
>
> I wasn't convinced by this example because it's already sub-optimal 
> currently. `if qs` will fetch the whole queryset just to check if it has at 
> least one element. 
>
> Assuming there's 1000 elements in qs, if you don't care about a 1000x 
> inefficiency, I don't expect you to care much about two or three 1000x 
> inefficiencies...
>

I agree that this example isn't particularly worrying. It's something an 
experienced developer wouldn't do. On the other hand, we are aiming at 
making things simpler for non-experienced developers.

To me the worrying part here is that we really don't have any data or 
experience about if the cure will be worse than the disease. Likely not, 
but what do we gain by taking risks here?

Maybe we should just add the queryset method. This is the smallest atomic 
task that can be done. Even if there's only the queryset method available, 
it's possible to enable prefetches per model by using a Manager.

 - Anssi


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/c9d9dca0-f71d-48dd-9985-17ef8b8ec95d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.