#36520: Performance Regression in parse_header_params
---------------------------------+------------------------------------
Reporter: David Smith | Owner: (none)
Type: Bug | Status: new
Component: HTTP handling | Version: dev
Severity: Release blocker | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+------------------------------------
Comment (by Natalia Bidart):
Replying to [comment:11 Jake Howard]:
> Encoding a string into bytes is incredibly fast. Perhaps doing the
> short-circuit, converting the data to bytes and then parsing it with
> `email.message` is fast enough? It's still going to be a performance
> regression, but hopefully not by quite as much.

Below are benchmark results from my testing code, where `line =
line.encode("utf-8")` is called inside the benchmarked function, just
before invoking `Message.get_params()`. This approach could work: for very
large headers (which are often the source of security reports) we see a
potential 2x performance improvement (about 1.9x faster than `cgi` on
Python 3.13 and 3.14). However, the average case (a `charset` and
potentially a `boundary`) still suffers roughly a 3x slowdown (3.3x to
3.7x slower than `cgi`), and I don't think we could easily avoid that
penalty.
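A minimal sketch of the two `Message.get_params()` variants being compared. The actual testing code isn't included in this comment, so this is an assumption: the bytes variant here builds its `Message` with `email.message_from_bytes()`. The names `params_from_str`, `params_from_bytes` and `parse_params` are hypothetical; `parse_params` only mimics the `(main value, params dict)` shape returned by Django's `parse_header_parameters()`.

```python
import email
from email.message import Message


def params_from_str(line: str) -> list:
    """Assign the str header value to a Message, then parse its parameters."""
    msg = Message()
    msg["Content-Type"] = line
    return msg.get_params()


def params_from_bytes(line: str) -> list:
    """Hypothetical bytes variant: encode first, then let the email parser build the Message."""
    data = line.encode("utf-8")
    # The blank line (\r\n\r\n) ends the header block; there is no body.
    msg = email.message_from_bytes(b"Content-Type: " + data + b"\r\n\r\n")
    return msg.get_params()


def parse_params(line: str, parser=params_from_bytes) -> tuple:
    """Hypothetical wrapper shaped like (main value, {param: value})."""
    params = parser(line)
    main_value = params[0][0].lower()
    # Skip empty names, e.g. the one produced by a trailing ';'.
    return main_value, {name: value for name, value in params[1:] if name}
```

Both variants return the same list, for example `[('text/html', ''), ('charset', 'UTF-8'), ('boundary', 'something')]` for the second header in the tables below, so only their cost differs.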
||= Python 3.11 =||= cgi =||= get_params(str) =||= get_params(bytes) =||= ratio =||= get_params(bytes) is =||
|| `text/plain` || 0.337 || 1.635 || 1.672 || 0.20 || 5.0x slower ||
|| `text/html; charset=UTF-8; boundary=something` || 1.362 || 4.348 || 4.514 || 0.30 || 3.3x slower ||
|| `application/x-stuff; ...` || 1.955 || 8.675 || 2.494 || 1.27 || 1.3x slower ||

||= Python 3.12 =||= cgi =||= get_params(str) =||= get_params(bytes) =||= ratio =||= get_params(bytes) is =||
|| `text/plain` || 0.356 || 1.657 || 1.725 || 0.21 || 4.8x slower ||
|| `text/html; charset=UTF-8; boundary=something` || 1.407 || 4.582 || 4.697 || 0.30 || 3.3x slower ||
|| `application/x-stuff; ...` || 2.017 || 9.609 || 2.645 || 1.31 || 1.3x slower ||

||= Python 3.13 =||= cgi =||= get_params(str) =||= get_params(bytes) =||= ratio =||= get_params(bytes) is =||
|| `text/plain` || 0.325 || 1.613 || 1.717 || 0.19 || 5.3x slower ||
|| `text/html; charset=UTF-8; boundary=something` || 1.167 || 3.862 || 3.943 || 0.30 || 3.4x slower ||
|| `application/x-stuff; ...` || 4.263 || 9.445 || 2.252 || 1.89 || 1.9x faster ||

||= Python 3.14 =||= cgi =||= get_params(str) =||= get_params(bytes) =||= ratio =||= get_params(bytes) is =||
|| `text/plain` || 0.258 || 1.601 || 1.725 || 0.16 || 6.7x slower ||
|| `text/html; charset=UTF-8; boundary=something` || 1.037 || 3.773 || 3.870 || 0.27 || 3.7x slower ||
|| `application/x-stuff; ...` || 3.978 || 8.789 || 2.132 || 1.87 || 1.9x faster ||
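A rough sketch of how timings like these could be gathered with `timeit`; it is not the testing code used above. Assumptions: the `cgi` column is approximated by a copy of the legacy `cgi.parse_header()` (the `cgi` module was removed in Python 3.13), the long `application/x-stuff` header is a made-up stand-in for the elided one, the bytes variant uses `email.message_from_bytes()`, and `ratio` is read as `cgi / get_params(bytes)` (this reading fits most rows) with the last column expressing it as "Nx slower/faster". Absolute numbers will not match the tables.

```python
import email
import timeit
from email.message import Message


def _parseparam(s):
    # Copy of the legacy cgi._parseparam() (Python <= 3.12): split on ';' outside quotes.
    while s[:1] == ";":
        s = s[1:]
        end = s.find(";")
        while end > 0 and (s.count('"', 0, end) - s.count('\\"', 0, end)) % 2:
            end = s.find(";", end + 1)
        if end < 0:
            end = len(s)
        f = s[:end]
        yield f.strip()
        s = s[end:]


def cgi_parse_header(line):
    # Copy of the legacy cgi.parse_header(): returns (main value, params dict).
    parts = _parseparam(";" + line)
    key = next(parts)
    pdict = {}
    for p in parts:
        i = p.find("=")
        if i >= 0:
            name = p[:i].strip().lower()
            value = p[i + 1:].strip()
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
                value = value.replace("\\\\", "\\").replace('\\"', '"')
            pdict[name] = value
    return key, pdict


def get_params_str(line):
    msg = Message()
    msg["Content-Type"] = line
    return msg.get_params()


def get_params_bytes(line):
    # Hypothetical bytes variant: encode inside the timed function, then parse.
    data = line.encode("utf-8")
    return email.message_from_bytes(b"Content-Type: " + data + b"\r\n\r\n").get_params()


def verdict(cgi_time, bytes_time):
    """Assumed derivation of the last two columns: ratio = cgi / bytes."""
    ratio = cgi_time / bytes_time
    if ratio < 1:
        return round(ratio, 2), f"{1 / ratio:.1f}x slower"
    return round(ratio, 2), f"{ratio:.1f}x faster"


HEADERS = [
    "text/plain",
    "text/html; charset=UTF-8; boundary=something",
    # Hypothetical stand-in for the elided long header.
    "application/x-stuff; " + "; ".join(f"p{i}=value{i}" for i in range(20)),
]


def run(number=10_000):
    for line in HEADERS:
        # Time cgi, get_params(str) and get_params(bytes), in that order.
        times = [
            timeit.timeit(lambda: fn(line), number=number)
            for fn in (cgi_parse_header, get_params_str, get_params_bytes)
        ]
        ratio, label = verdict(times[0], times[2])
        print(f"{line[:45]!r}", *(f"{t:.3f}" for t in times), ratio, label)


if __name__ == "__main__":
    run()
```

For instance, `verdict(4.263, 2.252)` returns `(1.89, '1.9x faster')`, matching the Python 3.13 `application/x-stuff` row.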
--
Ticket URL: <https://code.djangoproject.com/ticket/36520#comment:12>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.