Hi, I've gone through each of the use cases that I'm aware of and evaluated how well the previous proposal handled each.
While doing this I came to the conclusion that the "read_once" attribute
was poorly named, and therefore semantically ambiguous, since all
iterator content should only be read once. I also realized that content
freezing isn't as important as the other proposed features: most use
cases that might use it could get the same result by freezing the right
headers.

My reworked proposal follows, along with some real-world use cases
using the new API.


Proposed API
============

Content Iterators & Streaming Responses
---------------------------------------

HttpResponse will be changed such that:

* Content iterators will never be read more than once.

* A boolean attribute named "streaming" will be introduced. This
  indicates how iterator content should be handled.

* An attribute named "content_iterator" will be introduced. If the
  response content is provided as an iterator, the iterator will be
  available here.

For responses with streaming set to True, we guarantee that the content
is delivered to the client in chunks as they are produced by the
iterator. Any middleware that modifies the response content must do so
by replacing the content iterator with another iterator (usually by
wrapping the content iterator with a generator function).

Streaming does not preclude caching, but the caching middleware will
need to be taught how to cache streaming responses. It will probably
have to capture the content as it is emitted from the iterator and
cache a new HttpResponse object with the same headers as the original
response, but with the content stored as a string and with
streaming = False.

Response handling in more detail:

With streaming set to False (the default):

* If response.content is accessed by middleware, the content iterator
  is evaluated and stored as a string on the response. Subsequent
  accesses to response.content return the stored string. This response
  is not streamed.

* If response.content is not accessed by middleware, the response will
  be streamed by the HTTP handler (i.e. not converted to a string but
  sent in chunks to the client). This behavior is required for
  backwards compatibility with current Django behavior.

With streaming set to True:

* Middleware must check response.streaming and use
  response.content_iterator instead of accessing response.content
  directly. Accessing response.content causes an exception (or a
  deprecation warning, if we want a transition period).

* Middleware that wants to alter the response content must do so by
  replacing response.content_iterator with a new iterator (usually by
  wrapping response.content_iterator with a generator function).

These changes are enough by themselves to acceptably fix most problems
with content iterators.

Examples
~~~~~~~~

A view that simply returns a response with iterator content may be
streamed, but streaming is not guaranteed (e.g. the response will not
be streamed if the content is accessed by middleware)::

    return HttpResponse(iterator)

A view can specifically request streaming behavior like this::

    response = HttpResponse(iterator)
    response.streaming = True
    return response

Middleware must be changed to support streaming responses like this::

    if response.streaming:
        response.content_iterator = process_iterator(response.content_iterator)
    else:
        response.content = process_string(response.content)

Middleware that does not check response.streaming will cause an
exception::

    # Raises an exception with streaming responses.
    response['Content-Length'] = len(response.content)

    # Raises an exception with streaming responses.
    response.content = process_content(response.content)

Header Freezing
---------------

As mentioned above, introducing explicit streaming behavior fixes most
problems with the current handling of iterator content. Header freezing
provides more fine-grained control for situations that require it.
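For reference, the content-access rules described above might be
enforced along these lines. This is a hypothetical sketch, not Django
code: StubResponse is a stand-in for the proposed HttpResponse changes,
and the choice of AttributeError is arbitrary since the proposal does
not specify an exception type::

    class StubResponse:
        """Hypothetical stand-in for the proposed HttpResponse behavior."""

        def __init__(self, content):
            self.streaming = False
            self._content = None
            self.content_iterator = None
            if isinstance(content, (str, bytes)):
                self._content = content
            else:
                # Content provided as an iterator.
                self.content_iterator = iter(content)

        @property
        def content(self):
            if self.streaming:
                # Streaming responses must be handled via
                # content_iterator; reading .content would silently
                # consume the iterator and break streaming.
                raise AttributeError(
                    "streaming response; use content_iterator")
            if self._content is None and self.content_iterator is not None:
                # streaming == False: evaluate the iterator once and
                # store the result as a string (backwards-compatible
                # behavior). Subsequent accesses return the stored copy.
                self._content = "".join(self.content_iterator)
                self.content_iterator = None
            return self._content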
HttpResponse will gain two additional methods to support header
freezing:

HttpResponse.freeze_header(self, header)
    Causes a header to be frozen. Subsequent attempts to set or delete
    this header will cause an exception (although we may choose to emit
    a deprecation warning at first).

HttpResponse.header_is_frozen(self, header)
    Returns True if the header is frozen, False otherwise.

Header freezing is useful in two ways:

* Views can have precise control over specific headers, overriding
  middleware.

* Because the semantics of HTTP headers are well-defined, they are a
  reasonable proxy for controlling response handling.

If I know what my ETag should be, I can prevent middleware from
recalculating it::

    response = HttpResponse(content)
    response['ETag'] = etag
    response.freeze_header('ETag')

Compression can be disabled by preventing the Content-Encoding header
from being set::

    response = HttpResponse(content)

    # Prevent compression.
    response.freeze_header('Content-Encoding')

Conditional GET can be disabled by preventing the ETag and
Last-Modified headers from being set::

    response = HttpResponse(content)
    response.freeze_header('ETag')
    response.freeze_header('Last-Modified')

Caching can be disabled by setting and freezing the Cache-Control
header::

    response = HttpResponse(content)
    response['Cache-Control'] = 'no-cache'
    response.freeze_header('Cache-Control')

Content Freezing
----------------

Just as header freezing prevents headers from being changed by
middleware, content freezing prevents the response content from being
modified. To implement this, HttpResponse would need two new methods:

HttpResponse.freeze_content(self)
    Causes the content to be frozen. Any attempt to change the response
    content after this is called will cause an exception (although we
    may choose to emit a deprecation warning at first).

HttpResponse.content_is_frozen(self)
    Returns True if the response content has been frozen, False
    otherwise.

Content freezing is the least compelling part of this proposal.
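A minimal sketch of how header freezing might be enforced. This is
hypothetical (the real change would live inside HttpResponse's header
handling, and HeaderFrozenError is an invented name); header names are
compared case-insensitively, matching HTTP semantics::

    class HeaderFrozenError(Exception):
        """Raised on attempts to modify a frozen header (hypothetical)."""

    class Headers:
        """Stand-in for HttpResponse's header storage."""

        def __init__(self):
            self._headers = {}
            self._frozen = set()

        def freeze_header(self, header):
            # Freezing works whether or not the header has a value yet;
            # freezing an unset header prevents it from ever being set.
            self._frozen.add(header.lower())

        def header_is_frozen(self, header):
            return header.lower() in self._frozen

        def __setitem__(self, header, value):
            if self.header_is_frozen(header):
                raise HeaderFrozenError(header)
            self._headers[header.lower()] = value

        def __getitem__(self, header):
            return self._headers[header.lower()]

        def __delitem__(self, header):
            if self.header_is_frozen(header):
                raise HeaderFrozenError(header)
            del self._headers[header.lower()]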
Generally, you can prevent the response content from being changed in
certain ways by freezing related headers (e.g. freezing
Content-Encoding to prevent compression). However, there are some
reasons we might want to provide content freezing in addition to header
freezing:

* Freezing the content prevents changes to content for which there
  would be no header changes, or for which the header changes would be
  difficult to predict. I'm struggling to come up with a good example
  of this, but perhaps a middleware that implements stream rechunking
  would be one.

* It is arguably clearer and more readable to explicitly freeze content
  if that is your intention, rather than freezing related headers such
  as Content-Length, Content-Type, Content-Encoding, etc.

Use Cases
=========

These are the use cases that I am aware of, with sample view code using
the new APIs.

Non-Streaming Response With Iterator Content
--------------------------------------------

Parameters:

* Content is an iterator, but it is okay to convert it to a string.

* Normal middleware processing should occur (compression, ETags,
  caching, etc.).

This was brought up in a previous discussion. I don't see a good reason
for the view to pass in an iterator in this case except for
convenience, but it must be supported for backwards compatibility.

Middleware is free to access response.content. The first time this
happens, the iterator content will be captured as a string.

View code::

    return HttpResponse(iterator)

Streaming Response With Iterator Content
----------------------------------------

Parameters:

* Content is an iterator, and chunks must be sent to the client as they
  are emitted.

* Normal middleware processing should occur (compression, ETags, etc.).

View code::

    response = HttpResponse(iterator)
    response.streaming = True
    return response

If middleware wants to do anything to the content, it must do so by
wrapping response.content_iterator with a generator function.
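For example, the capture-while-streaming wrapper that the caching
middleware would need (discussed under "Content Iterators & Streaming
Responses") can be written as a plain generator function. This is a
sketch under the assumptions that content chunks are bytes and that
capture_chunks is a hypothetical name; chunks still reach the client
one at a time, and the complete body is handed to a callback only once
the iterator is exhausted::

    def capture_chunks(iterator, on_complete):
        """Tee-style wrapper: stream chunks through while capturing them."""
        chunks = []
        for chunk in iterator:
            chunks.append(chunk)
            yield chunk  # the client receives each chunk as it is produced
        # Iterator exhausted: the full body is now available, e.g. for
        # building the non-streaming HttpResponse the cache would store.
        on_complete(b"".join(chunks))

Middleware would then replace the iterator in place, without ever
consuming it itself::

    response.content_iterator = capture_chunks(
        response.content_iterator, cache_callback)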
Accessing response.content directly raises an exception because it
would break streaming.

Streaming A Large File
----------------------

Parameters:

* Content is an iterator that must not be converted into a string
  because it may be quite large.

* Server-side caching is undesirable due to the size of the content.

Satisfying the second requirement without disabling external caching
(client-side and proxy caches) is difficult because all caching is
controlled by HTTP headers. I don't think this is a new problem,
though.

There are two variations on this use case, depending on whether or not
the response should be compressed.

Without Compression
~~~~~~~~~~~~~~~~~~~

My particular use case is streaming large, compressed files from
Rackspace Cloud Files.

* Last-Modified and ETag are provided by the cloud storage service and
  should not be recalculated.

* Content is already compressed, so additional compression would be a
  waste of CPU time.

Note that conditional GET would be fine in this case, but because we
disable caching to keep the content out of the server-side cache,
clients will not cache the response either. If we had a way to prevent
server-side caching without disabling client-side caching, that would
be good.

We'll send Last-Modified and ETag even though caching is disabled. They
do no harm, and if we can address the caching issue above, conditional
GET will work with this view.

View code::

    # Assume content_type, content_length, etag, and last_modified come
    # from the storage service.
    response = HttpResponse(content = iterator, content_type = content_type)
    response.streaming = True
    response['Content-Length'] = content_length
    response['ETag'] = etag
    response['Last-Modified'] = http_date(last_modified)

    # Make sure our ETag and Last-Modified headers stay the same.
    response.freeze_header('ETag')
    response.freeze_header('Last-Modified')

    # Prevent compression.
    response.freeze_header('Content-Encoding')

    # Prevent caching.
    response['Cache-Control'] = 'no-cache'
    response.freeze_header('Cache-Control')

    return response

With Compression
~~~~~~~~~~~~~~~~

I know that some people want to stream large files and have them
compressed on the fly, so in that case:

* If we know Last-Modified and ETag values, they can be provided.

* Content-Length is unknown.

* Content should be compressed.

The same caveats as in the previous example apply to caching,
Last-Modified, and ETag.

View code::

    # Assume content_type, etag, and last_modified come from the
    # storage service.
    response = HttpResponse(content = iterator, content_type = content_type)
    response.streaming = True
    response['ETag'] = etag
    response['Last-Modified'] = http_date(last_modified)

    # Make sure our ETag and Last-Modified headers stay the same.
    response.freeze_header('ETag')
    response.freeze_header('Last-Modified')

    # Prevent caching.
    response['Cache-Control'] = 'no-cache'
    response.freeze_header('Cache-Control')

    return response

Streaming Long Responses Without Timing Out
-------------------------------------------

Parameters:

* Content is an iterator that must not be converted into a string
  because doing so would cause the HTTP connection to time out.

* Content-Length is unknown.

* Compression is undesirable, since rechunking could result in longer
  delays between chunks, possibly leading to a timeout.

This is the only use case I've seen that can really benefit from
content freezing, just in case some middleware would wrap the content
iterator in such a way as to reduce the frequency with which content
chunks are sent.

View code::

    response = HttpResponse(content = iterator)
    response.streaming = True
    response.freeze_content()
    return response

In practice, though, the most likely source of content modification for
a streaming response is middleware implementing compression, so this
would probably work just as well:

View code::

    response = HttpResponse(content = iterator)
    response.streaming = True

    # Prevent compression.
    response.freeze_header('Content-Encoding')
    return response

In either case, the response will be cached (although note that the
cache middleware will need to be modified to support caching streaming
responses, as discussed above).

Are there use cases I've missed?

Thanks,
Forest

-- 
Forest Bond
http://www.alittletooquiet.net
http://www.pytagsfs.org