#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
     Reporter:  David Smith          |                    Owner:  Natalia
                                     |  Bidart
         Type:  Bug                  |                   Status:  assigned
    Component:  HTTP handling        |                  Version:  dev
     Severity:  Release blocker      |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Natalia Bidart):

 I started looking into this AGAIN to see how the branch using bytes would
 look like. In my head, "passing bytes" to `Message.get_params` would just
 be:

 {{{#!diff
 diff --git a/django/utils/http.py b/django/utils/http.py
 index 504f28c678..54dc85aead 100644
 --- a/django/utils/http.py
 +++ b/django/utils/http.py
 @@ -327,7 +327,7 @@ def parse_header_parameters(line,
 max_length=MAX_HEADER_LENGTH):
          raise ValueError("Unable to parse header parameters (value too
 long).")

      m = Message()
 -    m["content-type"] = line
 +    m["content-type"] = line.encode("utf-8")
      params = m.get_params()

      pdict = {}
 }}}

 But this change generates all sort of test failures: some are "expected"
 because we'd need to `decode` the results into what's actually expected as
 a `str` result, but other show jusy plain wrong results. For example, for
 a `rfc2231` like this:

 {{{
 Content-Type: application/x-stuff; title*=us-ascii'en-
 us'This%20is%20%2A%2A%2Afun%2A%2A%2A
 }}}

 Passing str to `Message.get_params` results in the correct:

 {{{
 [('Content-Type: application/x-stuff', ''),  ('title', ('us-ascii', 'en-
 us', 'This is ***fun***'))]
 }}}

 But passing bytes yields:

 {{{
 [('b"content-type: application/x-stuff; title*',  'us-ascii\'en-
 us\'This%20is%20%2A%2A%2Afun%2A%2A%2A"')]
 }}}

 I further debugged and ended up in a rabbit hole in the Python code, to
 realize that `get_params` is not, at least to my exhausted eyes, expecting
 bytes in any way! I posted an update in the Python Discourse, but I would
 truly appreciate your opinion on this since at this point my head is very
 confused.

 I dug into `Message.get_params()`, and from the call chain it seems clear
 that everything is built around `str` headers. For example,
 `_parseparam()` (the call chain is `get_params` -> `_get_params_preserve`
 -> `_parseparam`) starts with these two lines
 (https://github.com/python/cpython/blob/main/Lib/email/message.py#L73):

 {{{#!python
 def _parseparam(s):
     # RDM This might be a Header, so for now stringify it.
     s = ';' + str(s)
 }}}

 It then string-splits, string-strips, and string-compares all the way
 down. If I assign bytes to the header, the parsing doesn’t work as
 expected: you basically just get the `repr()` of the bytes object. So
 while I understand the idea of avoiding decoding overhead, I don't see a
 way to feed bytes into `Message` and get equivalent results. The API
 really assumes `str` throughout.

 === Side note===

 There is ALSO the question of whether the header name should or should not
 be passed to `parse_header_parameters`. On a second look in our code, I
 think that `line` is expected to have the header name in it. Tests are
 super confusing in this regard, see how `test_basic` does not pass it but
 the rest of test cases do!
 https://github.com/django/django/blob/main/tests/utils_tests/test_http.py
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:21>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/django-updates/0107019910145582-f6c424f5-b796-4ba3-867e-301d5e7bef10-000000%40eu-central-1.amazonses.com.

Reply via email to