#36520: Performance Regression in parse_header_params
-------------------------------------+-------------------------------------
Reporter: David Smith | Owner: Natalia
| Bidart
Type: Bug | Status: assigned
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Jacob Walls):
Going with David's short-circuit suggestion in comment:2 seems wise to me.
I don't think it's worth reverting and waiting to hear from upstream,
since if I were CPython I don't think I'd take any action:
- the performance impact although measurable is extremely minor
- it's inside a
[https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.get_param
"legacy method"] for the "legacy" `Message` class
- the burden of optimizing for the "right thing" in David's words is
probably on us (Django)
Re: "legacy", porting to `EmailMessage` performs 10x worse. For posterity
I had something like this, but with test failures:
{{{#!diff
diff --git a/django/utils/http.py b/django/utils/http.py
index 504f28c678..8a05780160 100644
--- a/django/utils/http.py
+++ b/django/utils/http.py
@@ -3,7 +3,7 @@ import re
import unicodedata
from binascii import Error as BinasciiError
from datetime import UTC, datetime
-from email.message import Message
+from email.message import EmailMessage
from email.utils import collapse_rfc2231_value, formatdate
from urllib.parse import quote
from urllib.parse import urlencode as original_urlencode
@@ -326,13 +326,13 @@ def parse_header_parameters(line,
max_length=MAX_HEADER_LENGTH):
if max_length is not None and line and len(line) > max_length:
raise ValueError("Unable to parse header parameters (value too
long).")
- m = Message()
+ m = EmailMessage()
m["content-type"] = line
- params = m.get_params()
+ params = m["content-type"].params
pdict = {}
- key = params.pop(0)[0].lower()
- for name, value in params:
+ key = m["content-type"].split(";")[0]
+ for name, value in params.items():
if not name:
continue
if isinstance(value, tuple):
}}}
----
I checked DRF, and DRF calls Django's `parse_header_parameters` in
[https://github.com/encode/django-rest-
framework/blob/14728485018098a477e302afebc6d61547a0789f/rest_framework/negotiation.py#L51
a nested loop], about 17 times for the view I was testing. The values it
tested were:
{{{
application/json
text/html; q=1.0
*/*
application/json
text/html; q=1.0
text/html
text/html; q=1.0
text/html; q=1.0
text/html
application/json
application/json
application/json
application/json
text/html; q=1.0
text/html; q=1.0
application/json
multipart/form-data
}}}
Simplifying the bench (to remove the red herring bytes stuff and add in
David's suggestion from comment:2):
||= Python 3.13 =||= `cgi` =||= `requests` =||=
`werkzeug` =||= `get_params(str)` =||= `get_params#comment:2` =||
|| ` application/json ` || 0.358 || 0.145 🚀 -2.5x ||
0.806 🐢 2.3x || 1.758 🐢 4.9x || 0.089 🚀 -4.0x ||
|| ` text/html; q=1.0 ` || 0.887 || 0.375 🚀 -2.4x ||
1.210 🐢 1.4x || 2.954 🐢 3.3x || 3.497 🐢 3.9x ||
So, with comment:2, all but a handful of these calls will be faster. I
think we could call this a wash at that point.
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:22>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion visit
https://groups.google.com/d/msgid/django-updates/010701991130fd56-64ce8afc-891b-42a0-8539-ef5deddb10f0-000000%40eu-central-1.amazonses.com.