Re: Streaming HttpResponse revisted. Any core devs, please take a look if you can :)

Anssi Kääriäinen Mon, 20 Aug 2012 12:19:11 -0700


On 20 elo, 18:39, Tai Lee <real.hu...@mrmachine.net> wrote:
> I'd like to re-visit the discussion surrounding #7581 [1], a ticket about
> streaming responses that is getting quite long in the tooth now, which
> Jacob finally "accepted" 11 months ago (after a long time as DDN) and said
> that it is clear we have to do *something*, but *what* remains to be seen.
>
> I'd like to try provide a little refresher and summarise the options that
> have been suggested, and ask any core devs to please weigh in with their
> preference so that I can work up a new patch that will be more likely to
> gain acceptance.
>
> THE PROBLEM:
>
> 1. There are bugs and surprising behaviour that arise when creating an
> HttpResponse with a generator as its content, as a result of "quantum
> state" of `HttpResponse.content` (measuring it changes it).
>
>
>
> >>> from django.http import HttpResponse
> >>> r = HttpResponse(iter(['a', 'b']))
> >>> r.content
> 'ab'
> >>> r.content
> ''
> >>> r2 = HttpResponse(iter(['a', 'b']))
> >>> print r2.content
> ab
> >>> print r2.content
> >>> r3 = HttpResponse(iter(['a', 'b']))
> >>> r3.content == r3.content
>
> False
>
> 2. Some middleware prematurely consume generator content by accessing
> `HttpResponse.content`, which can use a lot of memory and cause browser
> timeouts when attempting to stream large amounts of data or
> slow-to-generate data.
>
> There have been several tickets [2] [3] [4] and django-developers
> discussions [5] [6] [7] [8] about these issues.
>
> SOME USE CASES FOR STREAMING RESPONSES:
>
> A. Generating and exporting CSV data directly from the database.
>
> B. Restricting file access to authenticated users for files that may be
> hosted on external servers.
>
> C. Drip-feeding chunks of content to prevent timeout when requesting a page
> that takes a long time to generate.
>
> OPTION 1:
>
> Remove support for "streaming" responses. If an iterator is passed in as
> content to `HttpResponse`, consume it in `HttpResponse.__init__()` to
> eliminate buggy behaviour. Middleware won't have to worry about what type
> of content a response has.
>
> Now that Jacob has accepted #7581 and said that it is clear we need to do
> *something*, I hope we can rule out this option.
>
> OPTION 2:
>
> Make `HttpResponse.__init__()` consume any iterator content, and add an
> `HttpResponseStreaming` class or an `HttpResponse(streaming=False)`
> argument. Allow middleware to check via `hasattr()` or `isinstance()`
> whether or not the response has generator content, and conditionally skip
> code that is incompatible with streaming responses.
>
> Some middleware will have to be updated for compatibility with streaming
> responses, and any 3rd party middleware that prematurely consumes generator
> content will continue to work, only without the bugs (and potentially with
> increased memory usage and browser timeouts).
>
> OPTION 3:
>
> Build a capabilities API for `HttpResponse` objects, and have middleware
> inspect responses to determine "can I read content?", "can I replace
> content?", "can I change etag?", etc. This will likely become a bigger and
> more complicated design decision as we work out what capabilities we want
> to support. Some have argued that it should be sufficient to know if we
> have generator content or not, for all the cases that people have reported
> so far.
>
> OPTION 4:
>
> Provide a way for developers to specify on an `HttpResponse` object or
> subclass that specific middleware should be skipped for that response. This
> would be problematic because 3rd party views won't know what other
> middleware is installed in a project in order to name them for exclusion.
>
> OPTION 5:
>
> Add Yet Another Setting that would allow developers to define
> `CONDITIONAL_MIDDLEWARE_CLASSES` at a project level. At the project level,
> developers would know which middleware classes they are using, and when
> they should be executed or skipped. This would give very fine grained
> control at a project level to match middleware conditionally with
> `HttpResponse` subclasses, without requiring any changes to existing or 3rd
> party middleware. This could look something like this:
>
> MIDDLEWARE_CLASSES = (
>     'django.middleware.common',
> )
>
> CONDITIONAL_MIDDLEWARE_CLASSES = {
>     'exclude': {
>         'django.http.HttpResponseStreaming': ['django.middleware.common',
> 'otherapp.othermiddleware', ...],
>     },
>     'include': {
>         'myapp.MyHttpResponse': ['myapp.mymiddleware', ...],
>     },
>
> }


My initial feeling is that anything accessing "large content" in
Python is broken and we should not try to make sure these things
continue to work. The problem is how we define if the content is too
large to access. Assuming any generator passed to the __init__ is
large maybe too drastic.

We should either remove the content attribute altogether from
generator based responses, or at least raise an error if the content
is accessed a second time. The current behaviour is definitely broken,
and we should not silently return corrupt data, instead we should
complain loudly.

To me option 5 feels like we are trying to work around the symptoms
instead of solving the underlying problem.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Re: Streaming HttpResponse revisted. Any core devs, please take a look if you can :)

Reply via email to