Re: Streaming HttpResponse revisted. Any core devs, please take a look if you can :)

Anssi Kääriäinen Thu, 23 Aug 2012 05:52:04 -0700

On 23 elo, 12:52, Tai Lee <real.hu...@mrmachine.net> wrote:
> After discussion with akaarai and mYk on IRC, I have two updated proposals.
>
> OPTION 6:
>
> 1. Consume `HttpResponse.content` on first access, so that it can be read
> multiple times and eliminate the current buggy behaviour (e.g.
> `response.content != response.content`) when a generator is passed in as
> content.
>
> 2. Create a new `HttpStreamingResponse` class, which has no `.content`
> attribute but instead has `.stream` attribute, and can be iterated only
> once. If it is iterated a second time, an exception will be raised to
> trigger a loud failure.


<SNIP>

Consuming the generator on first access and caching the result will
allow streaming to work if nobody accesses the .content. If a
middleware accesses the content, processes it somehow (gzip), and then
assigns back to content we will still not consume any more memory than
currently. The content property will first generate a byte string,
then return it. If we cache it in the response, the only problem is
that the cached value might be garbage collected just a little later,
but it seems we still will use the same amount of maximum memory.

In short: I think this is a good idea.

My current feeling about the options is that this is getting
complex... So, as-simple-as-can-be solution:
  - Add a flag to the response which tells if the response is
streaming or not. You can set this in response's.__init__. We might be
able to set this to True by default if the content is generator based
(can we autodetect this reliably?).
  - Cache the .content on first access if the content is generator
based. No matter if the response is flagged as streaming or not.
  - Update middlewares that need to skip streaming responses to check
for the flag.

This solves the immediate issue. This should be 100% backwards
compatible: If somebody accesses content when they should not, things
will just work. The content will get cached, but this should not
matter - we are already generating the content, and thus waste the
memory. We could add a warning when streaming response's content is
accessed. We might even want to remove the .content totally for
streams after deprecation.

The only backwards compatibility problem I can see is if we update
some middlewares to skip stream based responses, and also set
is_streaming to True in HttpResponse.__init__ if a generator is passed
in. Now, HttpResponse(['1', '2']) will not be processed by the altered
middlewares. Do we care?

If content is accessed, then streaming will not be effective, but this
is not a regression. If nobody touches the content, then streaming
will work.

As for removing the .content altogether: it seems this
requires .clone() API for response so that middleware can safely copy
response subclasses.

We might need to solve the skip/include middlewares conditionally
issue separately. For example, if I am streaming a large zipped file,
I will not want to apply my GZipMiddleware even if it handles stream
based responses correctly.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Re: Streaming HttpResponse revisted. Any core devs, please take a look if you can :)

Reply via email to