Re: Django Async DEP

2019-06-08 Thread Pascal Chambon
Hello,

There is something a little scary for me in changing the whole core of
Django to async, when this really helps only, imho, a tiny fraction of
users: websocket/long-polling services, and reddit-like sites with
thousands+ hits per second. For most webpages and webservices, async
artillery sounds like overkill.

Are CPython threads inefficient? As far as I know they are just kernel
threads constrained by the GIL, so they shouldn't wake up while they are
blocked on I/O syscalls/mutexes (or do they?), and context switches remain
acceptable compared to the slowness of Python itself.

We used to provide provisioning and automatic authentication for 20 million
users, with partner webservices tar-pitting us for sometimes 1 minute. The
nightmare scenario. But with 2 machines, 1 process per core, and 800 threads
per process, it did the job, enough for us to answer millions of hits a day.
Without even relying on other no-recoding optimizations like PyPy or gevent.
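
To make the setup above concrete, here is a minimal sketch of a bounded
thread pool absorbing slow blocking calls. It is not our production code:
the names (call_partner, handle_requests), the pool size of 8 and the
50 ms delay are hypothetical stand-ins for the 800 threads per process
and the slow partner webservices.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8        # hypothetical; the setup described above used 800
PARTNER_DELAY = 0.05   # hypothetical stand-in for a slow partner webservice

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0     # highest number of simultaneous partner calls seen


def call_partner(request_id):
    """Simulate a blocking call to a slow partner service."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(PARTNER_DELAY)  # a sleeping thread releases the GIL
    with _lock:
        _in_flight -= 1
    return request_id


def handle_requests(n):
    """Serve n requests; at most MAX_WORKERS are in flight at any time."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(call_partner, range(n)))


results = handle_requests(40)
```

The pool size acts as a hard cap: however many requests queue up, no more
than MAX_WORKERS calls reach the backend at once.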

Async would certainly have been a relevant technology if we had known in
advance that our partners would be so slow, but avoiding the extra
complexity burden of this style (where a single buggy dependency can block
all requests in a process, where all modules have to be recoded for it) was
also a huge benefit. And the limited thread pool also protected our DB from
unbearable loads.
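
A minimal sketch of the "single buggy dependency" problem, using only the
standard library. The names heartbeat and buggy_dependency are
hypothetical: one coroutine behaves well, the other makes a blocking call
and stalls everything else on the same event loop.

```python
import asyncio
import time


async def heartbeat(ticks, interval=0.01):
    """A well-behaved handler: records a timestamp, then yields to the loop."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def buggy_dependency():
    """A dependency that was never ported to async: it blocks the whole loop."""
    time.sleep(0.2)  # blocking call, the event loop cannot run anything else


async def main():
    ticks = []
    await asyncio.gather(heartbeat(ticks), buggy_dependency())
    # Largest pause observed between two consecutive heartbeats.
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


largest_gap = asyncio.run(main())
```

The heartbeat asks to be woken every 10 ms, but one of its gaps is about
as long as the blocking call. With threads, only the thread making that
call would have waited.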

It's very nice if a proper async ecosystem emerges in Python, but I fear
lots of people are currently jumping into it without a need for such
performance, and at the expense of much more important matters like
robustness, correctness, compatibility... like it happened for Docker and
microservices, turning simple intranets into fragile bloatware, when they
just needed a single Django codebase deployed in a single container.

A few days ago I audited a widely used Django module in which the current
user was stored in a global variable (!!!). People might eventually fix that
ticket, use thread-locals, and then switch to a future django-async without
realizing that the security issue has come back due to the way async works.
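
A minimal sketch of how that issue can come back, using only the standard
library (handle_request and the user names are hypothetical). Two
coroutines running on the same thread share one threading.local, so a
"current user" stored there leaks between requests, whereas a
contextvars.ContextVar stays separate for each task.

```python
import asyncio
import contextvars
import threading

_local = threading.local()
_user_var = contextvars.ContextVar("current_user", default=None)


async def handle_request(user, seen):
    """Store the current user both ways, yield once, then read it back."""
    _local.user = user           # shared by every coroutine on this thread
    _user_var.set(user)          # private to this task's context
    await asyncio.sleep(0)       # yield: the other request runs here
    seen[user] = (_local.user, _user_var.get())


async def main():
    seen = {}
    await asyncio.gather(
        handle_request("alice", seen),
        handle_request("bob", seen),
    )
    return seen


seen = asyncio.run(main())
# seen["alice"] is ("bob", "alice"): the thread-local now holds bob's
# identity inside alice's request, while the ContextVar is still correct.
```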

Still, I hope I'm wrong and that the performance gains will prove worth the
software fragmentation and complexity brought by asyncio, but I still don't
understand them for 99% of users... Especially as long as turnkey
solutions like greenlets exist for power users.

Regards,
Pascal





On Fri, Jun 7, 2019 at 19:41, Andrew Godwin wrote:

>
>
> On Fri, Jun 7, 2019 at 9:19 AM John Obelenus wrote:
>
>> I wonder about the end-result payoff of this approach. In general,
>> Django/Python code is not going to be I/O bound, which is where
>> asynchronous approaches are going to get the bang for your buck. Even when
>> it comes to DB access—the DB is a lot faster than the python and django
>> code running against the result set. And too much context-switching (as you
>> noted) has painful ramifications for performance.
>>
>
> On the contrary, I have found that as you scale up, a large amount of your
> time becomes I/O (either HTTP calls to other components/hosted services or
> database calls). Our APM at work shows me that it's around 80% of request
> time.
>
> Obviously we don't design Django just for large use cases, which is why
> it's not going to be the default, but with the massive growth of hosted
> services, I suspect this trend will continue to trickle down to smaller
> deploys too. And ultimately, for smaller deploys performance is rarely a
> concern anyway.
>
>
>>
>> I can absolutely see why creating a layer that handles asgi, websockets,
>> and http requests asynchronously is going to pay off. Big time. But I'm
>> less certain that the ORM access will benefit from an asyncio approach. Do
>> we have anything that approaches a hard number that would tell us re-doing
>> the ORM layer in asyncio would get us X% performance benefit?
>>
>> I'm basing my thoughts off this well-reasoned look at performance:
>> https://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
>>
>>
> I do not personally have hard numbers that I am allowed to share,
> unfortunately, but I would encourage you to look at results on benchmarks
> that include database access - like this one (
> https://twitter.com/_tomchristie/status/1005001902092967936) using Python
> asyncio/ASGI - and see that it does make a difference. Obviously it doesn't
> matter for all deploys, but I believe it matters for the majority of site
> architectures as they scale up.
>
> Andrew
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/django-developers/5CVsR9FSqmg/unsubscribe
> .
> To unsubscribe from this group and all its topics, send an email to
> django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django

Re: Django Async DEP

2019-06-08 Thread Andrew Godwin
On Sat, Jun 8, 2019 at 3:14 AM Pascal Chambon wrote:

> Hello,
>
> There is something a little scary for me in changing the whole core of
> Django to async, when this really helps only, imho, a tiny fraction of
> users: websocket/long-polling services, and reddit-like sites with
> thousands+ hits per second. For most webpages and webservices, async
> artillery sounds like overkill.
>
> Are CPython threads inefficient? As far as I know they are just kernel
> threads constrained by the GIL, so they shouldn't wake up while they are
> blocked on I/O syscalls/mutexes (or do they?), and context switches remain
> acceptable compared to the slowness of Python itself.
>

It's fine when you're only at 5 to 10 threads, which, notably, is what most
WSGI servers run at. When you get to the hundreds, though, you start losing
a large proportion of your execution time (tens of percent, in some cases).


>
> We used to provide provisioning and automatic authentication for 20
> million users, with partner webservices tar-pitting us for sometimes 1
> minute. The nightmare scenario. But with 2 machines, 1 process per core,
> and 800 threads per process, it did the job, enough for us to answer
> millions of hits a day. Without even relying on other no-recoding
> optimizations like PyPy or gevent.
>
> Async would certainly have been a relevant technology if we had known in
> advance that our partners would be so slow, but avoiding the extra
> complexity burden of this style (where a single buggy dependency can block
> all requests in a process, where all modules have to be recoded for it) was
> also a huge benefit. And the limited thread pool also protected our DB from
> unbearable loads.
>

Please remember that even after this change, Django will still expect you
to write synchronously by default, and will not impose any of that extra
complexity on you. We will only swap out the "native" implementation of
things if the performance matches (within ~10%) or exceeds that of the
synchronous version when there are a couple of threads going; it's expected
this will largely be the case due to the direct benefits of idling less.

But the plan is not to make it more complex by default (you only have to
interact with the async side if you want to), nor slower.
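
As a rough illustration of that split (not Django's actual mechanism,
which this thread does not spell out), here is a sketch using only the
standard library. The names sync_view and async_entry_point are
hypothetical. The synchronous function stays untouched, and an async
entry point runs it in worker threads through loop.run_in_executor.

```python
import asyncio
import threading
import time


def sync_view(name):
    """Ordinary blocking code, written with no knowledge of asyncio."""
    time.sleep(0.05)  # stand-in for a blocking database or HTTP call
    return f"hello {name} from {threading.current_thread().name}"


async def async_entry_point():
    """Async outer layer that hands the sync code off to worker threads."""
    loop = asyncio.get_running_loop()
    # Passing None selects the event loop's default ThreadPoolExecutor.
    futures = [
        loop.run_in_executor(None, sync_view, name) for name in ("a", "b", "c")
    ]
    return await asyncio.gather(*futures)


responses = asyncio.run(async_entry_point())
```

The author of sync_view never has to touch async or await; only the outer
layer knows an event loop exists.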


>
> It's very nice if a proper async ecosystem emerges in Python, but I fear
> lots of people are currently jumping into it without a need for such
> performance, and at the expense of much more important matters like
> robustness, correctness, compatibility... like it happened for Docker and
> microservices, turning simple intranets into fragile bloatware, when they
> just needed a single Django codebase deployed in a single container.
>
> A few days ago I audited a widely used Django module in which the current
> user was stored in a global variable (!!!). People might eventually fix
> that ticket, use thread-locals, and then switch to a future django-async
> without realizing that the security issue has come back due to the way
> async works.
>
> Still, I hope I'm wrong and that the performance gains will prove worth
> the software fragmentation and complexity brought by asyncio, but I still
> don't understand them for 99% of users... Especially as long as turnkey
> solutions like greenlets exist for power users.
>
>
I agree with you that there's a chance this is all useless and doesn't bear
fruit, in which case I will be the first person to pull the plug and say
that Python async isn't ready. However, I've been working with it for the
last four years, including on several very large deployments, and there are
some direct benefits that I believe we can get without making things a lot
more complex, even inside Django.

Andrew
