#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
     Reporter:  David Smith          |                    Owner:  Natalia
                                     |  Bidart
         Type:  Bug                  |                   Status:  assigned
    Component:  HTTP handling        |                  Version:  dev
     Severity:  Release blocker      |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Jacob Walls):

 Going with David's short-circuit suggestion in comment:2 seems wise to me.
 I don't think it's worth reverting and waiting to hear from upstream,
 since if I were CPython I don't think I'd take any action:
 - the performance impact although measurable is extremely minor
 - it's inside a
 
[https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.get_param
 "legacy method"] for the "legacy" `Message` class
 - the burden of optimizing for the "right thing" in David's words is
 probably on us (Django)

 Re: "legacy", porting to `EmailMessage` performs 10x worse. For posterity
 I had something like this, but with test failures:
 {{{#!diff
 diff --git a/django/utils/http.py b/django/utils/http.py
 index 504f28c678..8a05780160 100644
 --- a/django/utils/http.py
 +++ b/django/utils/http.py
 @@ -3,7 +3,7 @@ import re
  import unicodedata
  from binascii import Error as BinasciiError
  from datetime import UTC, datetime
 -from email.message import Message
 +from email.message import EmailMessage
  from email.utils import collapse_rfc2231_value, formatdate
  from urllib.parse import quote
  from urllib.parse import urlencode as original_urlencode
 @@ -326,13 +326,13 @@ def parse_header_parameters(line,
 max_length=MAX_HEADER_LENGTH):
      if max_length is not None and line and len(line) > max_length:
          raise ValueError("Unable to parse header parameters (value too
 long).")

 -    m = Message()
 +    m = EmailMessage()
      m["content-type"] = line
 -    params = m.get_params()
 +    params = m["content-type"].params

      pdict = {}
 -    key = params.pop(0)[0].lower()
 -    for name, value in params:
 +    key = m["content-type"].split(";")[0]
 +    for name, value in params.items():
          if not name:
              continue
          if isinstance(value, tuple):
 }}}

 ----
 I checked DRF, and DRF calls Django's `parse_header_parameters` in
 [https://github.com/encode/django-rest-
 
framework/blob/14728485018098a477e302afebc6d61547a0789f/rest_framework/negotiation.py#L51
 a nested loop], about 17 times for the view I was testing. The values it
 tested were:

 {{{
 application/json
 text/html; q=1.0
 */*
 application/json
 text/html; q=1.0
 text/html
 text/html; q=1.0
 text/html; q=1.0
 text/html
 application/json
 application/json
 application/json
 application/json
 text/html; q=1.0
 text/html; q=1.0
 application/json
 multipart/form-data
 }}}

 Simplifying the bench (to remove the red herring bytes stuff and add in
 David's suggestion from comment:2):
 ||=          Python 3.13           =||=  `cgi`  =||=  `requests`  =||=
 `werkzeug`  =||= `get_params(str)` =||= `get_params#comment:2` =||
 || `       application/json       ` ||   0.358   || 0.145 🚀 -2.5x ||
 0.806 🐢 2.3x  ||    1.758 🐢 4.9x    ||    0.089 🚀 -4.0x     ||
 || `       text/html; q=1.0       ` ||   0.887   || 0.375 🚀 -2.4x ||
 1.210 🐢 1.4x  ||    2.954 🐢 3.3x    ||     3.497 🐢 3.9x     ||


 So, with comment:2, all but a handful of these calls will be faster. I
 think we could call this a wash at that point.
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:22>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/django-updates/010701991130fd56-64ce8afc-891b-42a0-8539-ef5deddb10f0-000000%40eu-central-1.amazonses.com.

Reply via email to