[RFC] HttpResponse: streaming, freeze API (v2)

Forest Bond Wed, 15 Sep 2010 11:04:06 -0700

Hi,

I've gone through each of the use cases that I'm aware of and evaluated how well
the previous proposal handled each.


While doing this I came to the conclusion that the "read_once" attribute was
poorly named and therefore semantically ambiguous, since all iterator content
should only be read once.

I also realized that content freezing wasn't as important as the other proposed
features.  Most use cases that might use it could get the same result by
freezing the right headers.

My reworked proposal follows, along with some real-world use cases using the new
API.


Proposed API
============


Content Iterators & Streaming Responses
---------------------------------------

HttpResponse will be changed such that:

* Content iterators will never be read more than once.
* A boolean attribute named "streaming" will be introduced.  This indicates
  how iterator content should be handled.
* An attribute named "content_iterator" will be introduced, and if the response
  content is provided as an iterator, the iterator will be available here.

For responses with streaming set to True, we make a guarantee that the content
is delivered to the client in chunks as they are produced by the iterator.  Any
middleware that modifies the response content must do so by replacing the
content iterator with another iterator (usually by wrapping the content iterator
with a generator function).
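A rough sketch of the wrapping pattern, in plain Python with no Django
dependency.  The names uppercase_chunks and produce are hypothetical and stand
in for whatever transformation a middleware applies and whatever iterator a
view supplies.

```python
def uppercase_chunks(content_iterator):
    """Hypothetical middleware transform applied chunk by chunk."""
    for chunk in content_iterator:
        # Each chunk is transformed and yielded immediately, so nothing
        # is buffered and streaming is preserved.
        yield chunk.upper()


def produce():
    """Hypothetical view-side content iterator."""
    yield b"hello, "
    yield b"world"


wrapped = uppercase_chunks(produce())
first_chunk = next(wrapped)  # available before the source is exhausted
remaining = list(wrapped)
```

Because the wrapper is itself a generator, several middleware can stack their
wrappers without any of them reading the whole body.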

Streaming does not preclude caching, but the caching middleware will need to be
taught how to cache streaming responses.  It will probably have to capture the
content as it is emitted from the iterator and cache a new HttpResponse object
with the same headers as the original response but with the content stored as a
string and with streaming = False.
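One way the capture step might look, as a sketch only.  The capture_chunks
helper and its store callback are assumptions made for illustration; the actual
cache middleware would need to decide how and where the captured body is saved.

```python
def capture_chunks(content_iterator, store):
    """Yield chunks unchanged while recording them for a cache.

    `store` is a hypothetical callback that receives the complete body
    once the iterator has been exhausted.
    """
    seen = []
    for chunk in content_iterator:
        seen.append(chunk)
        yield chunk
    # Only reached if the whole response was consumed.
    store(b"".join(seen))


cache = {}
stream = capture_chunks(iter([b"abc", b"def"]),
                        lambda body: cache.update(body=body))
delivered = list(stream)
```

Note that this keeps the entire body in memory until the iterator finishes,
which may be unacceptable for very large responses.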

Response handling in more detail:

With streaming set to False (the default):

* If response.content is accessed by middleware, the content iterator is
  evaluated and stored as a string on the response.  Subsequent accesses to
  response.content return the stored string.  This response is not streamed.
* If response.content is not accessed by middleware, the response will be
  streamed by the HTTP handler (i.e. not converted to a string but sent in
  chunks to the client).  This behavior is required to be backwards compatible
  with current Django behavior.

With streaming set to True:

* Middleware must check response.streaming and use response.content_iterator
  instead of accessing response.content directly.  Accessing response.content
  causes an exception (or deprecation warning if we want a transition period).
* Middleware that wants to alter the response content must do so by replacing
  response.content_iterator with a new iterator (usually by wrapping
  response.content_iterator with a generator function).

These changes are enough by themselves to acceptably fix most problems with
content iterators.
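The rules above can be sketched with a toy class.  This is not Django code:
SketchResponse is a hypothetical stand-in, and the choice of RuntimeError is an
assumption, since the proposal does not name an exception type.

```python
class SketchResponse:
    """Toy stand-in for the proposed HttpResponse behavior."""

    def __init__(self, content):
        self.streaming = False
        self._body = None
        self.content_iterator = None
        if isinstance(content, bytes):
            self._body = content
        else:
            self.content_iterator = iter(content)

    @property
    def content(self):
        if self.streaming:
            # Assumed exception type; the proposal leaves this open.
            raise RuntimeError("streaming response: use content_iterator")
        if self._body is None:
            # Evaluate the iterator exactly once and keep the result.
            self._body = b"".join(self.content_iterator)
            self.content_iterator = None
        return self._body


plain = SketchResponse(iter([b"a", b"b"]))
first = plain.content
second = plain.content  # served from the stored string

streamed = SketchResponse(iter([b"a", b"b"]))
streamed.streaming = True
try:
    streamed.content
    raised = False
except RuntimeError:
    raised = True
```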


Examples
~~~~~~~~

A view that simply returns a response with iterator content may be streamed, but
streaming is not strictly required (e.g. if the content is accessed by
middleware)::

  return HttpResponse(iterator)

A view can specifically request streaming behavior like this::

  response = HttpResponse(iterator)
  response.streaming = True
  return response

Middleware must be changed to support streaming responses like this::

  if response.streaming:
      response.content_iterator = process_iterator(response.content_iterator)
  else:
      response.content = process_string(response.content)

Middleware that does not check response.streaming will cause an exception::

  # Raises an exception with streaming responses.
  response['Content-Length'] = len(response.content)

  # Raises an exception with streaming responses.
  response.content = process_content(response.content)


Header Freezing
---------------

As mentioned above, introducing explicit streaming behavior fixes most problems
with current handling of iterator content.  Header freezing provides more
fine-grained control for situations that require it.

HttpResponse will gain two additional methods to support header freezing:

HttpResponse.freeze_header(self, header)
  Causes a header to be frozen.  Subsequent attempts to set or delete this
  header will cause an exception (although we may choose to emit a deprecation
  warning at first).
HttpResponse.header_is_frozen(self, header)
  Returns True if the header is frozen, False otherwise.
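A minimal sketch of how these two methods might behave, using a toy header
container rather than the real HttpResponse.  SketchHeaders and
FrozenHeaderError are hypothetical names, and treating header names
case-insensitively is an assumption of this sketch.

```python
class FrozenHeaderError(Exception):
    """Hypothetical exception type; the proposal does not name one."""


class SketchHeaders:
    """Toy header container with freeze support."""

    def __init__(self):
        self._headers = {}
        self._frozen = set()

    def freeze_header(self, header):
        self._frozen.add(header.lower())

    def header_is_frozen(self, header):
        return header.lower() in self._frozen

    def __getitem__(self, header):
        return self._headers[header.lower()]

    def __setitem__(self, header, value):
        if self.header_is_frozen(header):
            raise FrozenHeaderError(header)
        self._headers[header.lower()] = value

    def __delitem__(self, header):
        if self.header_is_frozen(header):
            raise FrozenHeaderError(header)
        self._headers.pop(header.lower(), None)


response = SketchHeaders()
response["ETag"] = '"abc"'
response.freeze_header("ETag")
try:
    response["etag"] = '"xyz"'
    blocked = False
except FrozenHeaderError:
    blocked = True
```

A header can be frozen before it is ever set, which is what allows freezing to
keep a header from being added at all.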

Header freezing is useful in two ways:

* Views can have precise control over specific headers, overriding middleware.
* Because the semantics of HTTP headers are well-defined, they are a reasonable
  proxy for controlling response handling.

If I know what my ETag should be I can prevent middleware from recalculating
it::

  response = HttpResponse(content)
  response['ETag'] = etag
  response.freeze_header('ETag')

Compression can be disabled by freezing the Content-Encoding header while it is
unset, which prevents middleware from adding it::

  response = HttpResponse(content)
  # Prevent compression.
  response.freeze_header('Content-Encoding')

Conditional GET can be disabled by freezing the ETag and Last-Modified headers
while they are unset, which prevents them from being sent::

  response = HttpResponse(content)
  response.freeze_header('ETag')
  response.freeze_header('Last-Modified')

Caching can be disabled by setting and freezing the Cache-Control header::

  response = HttpResponse(content)
  response['Cache-Control'] = 'no-cache'
  response.freeze_header('Cache-Control')
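On the middleware side, a cooperative check might look like the sketch below.
It assumes middleware would consult header_is_frozen before acting rather than
rely on the exception.  StubResponse and maybe_compress are hypothetical names,
and the stub exposes only what the sketch needs.

```python
import zlib


class StubResponse:
    """Minimal stand-in exposing only what the sketch needs."""

    def __init__(self, content, frozen=()):
        self.content = content
        self.headers = {}
        self._frozen = set(frozen)

    def header_is_frozen(self, header):
        return header in self._frozen


def maybe_compress(response):
    """Hypothetical middleware step: skip when Content-Encoding is frozen."""
    if response.header_is_frozen("Content-Encoding"):
        return response
    # zlib.compress emits the zlib format, which HTTP calls "deflate".
    response.content = zlib.compress(response.content)
    response.headers["Content-Encoding"] = "deflate"
    return response


body = b"x" * 100
untouched = maybe_compress(StubResponse(body, frozen=["Content-Encoding"]))
squeezed = maybe_compress(StubResponse(body))
```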


Content Freezing
----------------

Just as header freezing prevents headers from being changed by middleware,
content freezing prevents the response content from being modified.

To implement this, HttpResponse would need two new methods:

HttpResponse.freeze_content(self)
  Causes the content to be frozen.  Any attempt to change the response content
  after this is called will cause an exception (although we may choose to emit a
  deprecation warning at first).
HttpResponse.content_is_frozen(self)
  Returns True if the response content has been frozen, False otherwise.
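A toy sketch of these two methods.  FreezableContent is a hypothetical
stand-in for HttpResponse, and RuntimeError is an assumed exception type.

```python
class FreezableContent:
    """Toy stand-in showing the proposed content-freezing methods."""

    def __init__(self, content):
        self._content = content
        self._content_frozen = False

    def freeze_content(self):
        self._content_frozen = True

    def content_is_frozen(self):
        return self._content_frozen

    @property
    def content(self):
        return self._content

    @content.setter
    def content(self, value):
        if self._content_frozen:
            # Assumed exception type; the proposal leaves this open.
            raise RuntimeError("response content is frozen")
        self._content = value


response = FreezableContent(b"original")
response.content = b"edited"  # allowed before freezing
response.freeze_content()
try:
    response.content = b"edited again"
    blocked = False
except RuntimeError:
    blocked = True
```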

Content freezing is the least compelling part of this proposal.  Generally, you
can prevent the response content from being changed in certain ways by freezing
related headers (e.g. freezing Content-Encoding to prevent compression).

However, there are some reasons we might want to provide content freezing in
addition to header freezing.

* Freezing the content prevents changes to content for which there would be no
  header changes, or the header changes would be difficult to predict.  I'm
  struggling to come up with a good example of this, but perhaps a middleware
  that implements stream rechunking would be one.
* It is arguably clearer and more readable to explicitly freeze content if that
  is your intention rather than freezing related headers such as Content-Length,
  Content-Type, Content-Encoding, etc.
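To make the rechunking example concrete, here is a sketch of what such a
middleware wrapper might do.  The rechunk function is hypothetical.  The bytes
sent are identical, so no header would change, yet the timing and size of the
chunks do.

```python
def rechunk(content_iterator, size):
    """Regroup chunks into pieces of `size` bytes (last may be shorter)."""
    buffer = b""
    for chunk in content_iterator:
        buffer += chunk
        while len(buffer) >= size:
            yield buffer[:size]
            buffer = buffer[size:]
    if buffer:
        yield buffer


original = [b"ab", b"cde", b"f", b"gh"]
regrouped = list(rechunk(iter(original), 4))
```

Nothing is emitted until the buffer holds a full piece, which is how rechunking
can delay output.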


Use Cases
=========

These are the use cases that I am aware of with sample view code using the new
APIs.


Non-Streaming Response With Iterator Content
--------------------------------------------

Parameters:

* Content is an iterator, but it is okay to convert it to a string.
* Normal middleware processing should occur (compression, ETags, caching, etc.)

This was brought up in a previous discussion.  I don't see a good reason for the
view to pass in an iterator in this case except for convenience.  It must be
supported for backwards compatibility.

Middleware is free to access response.content.  The first time this happens, the
iterator content will be captured as a string.

View code::

  return HttpResponse(iterator)


Streaming Response With Iterator Content
----------------------------------------

Parameters:

* Content is an iterator, and chunks must be sent to the client as they are
  emitted.
* Normal middleware processing should occur (compression, ETags, etc.)

View code::

  response = HttpResponse(iterator)
  response.streaming = True
  return response

If middleware wants to do anything to the content, it must do so by wrapping
response.content_iterator with a generator function.  Accessing response.content
directly raises an exception because it would break streaming.
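As an illustration of middleware wrapping the iterator, compression could be
done incrementally with the standard zlib module.  This is a sketch under
assumptions, not the GZip middleware's actual code, and gzip_chunks is a
hypothetical name.

```python
import zlib


def gzip_chunks(content_iterator, level=6):
    """Compress an iterator of byte chunks into a gzip stream, lazily."""
    # wbits = 16 + MAX_WBITS selects the gzip container format.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for chunk in content_iterator:
        data = compressor.compress(chunk)
        if data:  # the compressor may hold input back until it has enough
            yield data
    yield compressor.flush()


source = [b"spam " * 1000, b"eggs " * 1000]
compressed = b"".join(gzip_chunks(iter(source)))
restored = zlib.decompress(compressed, 16 + zlib.MAX_WBITS)
```

The compressor buffers internally, so output chunks do not line up one-to-one
with input chunks.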


Streaming A Large File
----------------------

Parameters:

* Content is an iterator that must not be converted into a string because it may
  be quite large.
* Server-side caching is undesirable due to the size of the content.

Satisfying the second requirement without disabling external caching (client
side and proxy caches) is difficult because all caching is controlled by HTTP
headers.  I don't think this is a new problem, though.

There are two variations on this use case related to whether or not the response
should be compressed.


Without Compression
~~~~~~~~~~~~~~~~~~~

My particular use case is streaming large, compressed files from Rackspace Cloud
Files.

* Last-Modified and ETag are provided by the cloud storage service and should
  not be recalculated.
* Content is already compressed, so additional compression would be a waste of
  CPU time.

Note that conditional GET would be fine in this case, but we disable caching to
keep the content out of the server-side cache, which means clients will not
cache the response either.  A way to prevent server-side caching without
disabling client-side caching would be welcome.

We'll send Last-Modified and ETag even though caching is disabled.  They do no
harm and if we can address the caching issue above, conditional GET will work
with this view.

View code::

  # Assume content_type, content_length, etag, and last_modified come from the
  # storage service.  http_date is django.utils.http.http_date.
  response = HttpResponse(content=iterator, content_type=content_type)
  response.streaming = True
  response['Content-Length'] = content_length
  response['ETag'] = etag
  response['Last-Modified'] = http_date(last_modified)

  # Make sure our ETag and Last-Modified headers stay the same.
  response.freeze_header('ETag')
  response.freeze_header('Last-Modified')

  # Prevent compression.
  response.freeze_header('Content-Encoding')

  # Prevent caching.
  response['Cache-Control'] = 'no-cache'
  response.freeze_header('Cache-Control')

  return response


With Compression
~~~~~~~~~~~~~~~~

I know that some people want to stream large files and have them compressed
on-the-fly, so in that case:

* If we know Last-Modified and ETag values, they can be provided.
* Content-Length is unknown.
* Content should be compressed.

The same caveats as in the previous example apply to caching, Last-Modified, and
ETag.

View code::

  # Assume content_type, etag, and last_modified come from the storage
  # service.  http_date is django.utils.http.http_date.
  response = HttpResponse(content=iterator, content_type=content_type)
  response.streaming = True
  response['ETag'] = etag
  response['Last-Modified'] = http_date(last_modified)

  # Make sure our ETag and Last-Modified headers stay the same.
  response.freeze_header('ETag')
  response.freeze_header('Last-Modified')

  # Prevent caching.
  response['Cache-Control'] = 'no-cache'
  response.freeze_header('Cache-Control')

  return response


Streaming Long Responses Without Timing Out
-------------------------------------------

Parameters:

* Content is an iterator that must not be converted into a string because that
  would cause the HTTP connection to time out.
* Content-Length is unknown.
* Compression is undesirable since rechunking could result in longer delays
  between chunks, possibly leading to a timeout.

This is the only use case I've seen that can really benefit from content
freezing, just in case some middleware would wrap the content iterator in such a
way as to reduce the frequency with which content chunks are sent.

View code::

  response = HttpResponse(content=iterator)
  response.streaming = True
  response.freeze_content()
  return response

But in practice, the most likely source of content modification for a streaming
response is middleware implementing compression, so freezing the
Content-Encoding header would probably work just as well.

View code::

  response = HttpResponse(content=iterator)
  response.streaming = True

  # Prevent compression.
  response.freeze_header('Content-Encoding')

  return response

In either case, the response will be cached (although note that the cache
middleware will need to be modified to support caching streaming responses as
discussed above).


Are there use cases I've missed?

Thanks,
Forest
-- 
Forest Bond
http://www.alittletooquiet.net
http://www.pytagsfs.org
