#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+---------------------------------------
Reporter: Thomas Hooper | Owner: David Smith
Type: Bug | Status: assigned
Component: Utilities | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+---------------------------------------
Changes (by Carlton Gibson):
* cc: Matthias Kestenholz (added)
Comment:
Adding some detail after the last post, since you're looking at it, David.
There was a discussion (with various folks from html5lib, and Mozilla, and
...) about whether html5lib could be put on a better footing.
I'm not sure how that panned out in the medium term. (I didn't check what
the rhythm looks like now.)
There was also talk about whether bleach (or an alternative) could
build off `html5ever`, which is the HTML parser from the Mozilla Servo
project.
* https://github.com/servo/html5ever
* https://github.com/SimonSapin/html5ever-python (PyO3 bindings.)
That would be pretty cool, but it was clearly a lot of work, and then 2020
happened, so...
The other candidate in this space is Matthias' html-sanitizer, which is
built on `lxml`: https://github.com/matthiask/html-sanitizer
That's just to lay down the notes I had gathered. I'm not sure of the way
forward, but hopefully it's helpful.
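For illustration only, here is a minimal sketch of what parser-driven
truncation could look like. It uses the standard library's `html.parser`
purely as a stand-in: that module is not a full HTML5 parser (it does not
apply implicit tag closing, for example), so this is not a proposed fix.
`TruncatingParser` and `truncate_html` are hypothetical names, not Django
APIs, and comments, doctypes, and script/style handling are ignored.

```python
from html import escape
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not tracked as open.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TruncatingParser(HTMLParser):
    """Hypothetical helper: copy markup through until `limit` text
    characters have been emitted, then stop emitting."""

    def __init__(self, limit):
        # convert_charrefs=True delivers text with entities already decoded,
        # so the limit counts visible characters rather than "&amp;" etc.
        super().__init__(convert_charrefs=True)
        self.remaining = limit
        self.output = []
        self.open_tags = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        # Re-emit the start tag exactly as it appeared in the source.
        self.output.append(self.get_starttag_text())
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close anything still open inside this element, then the element.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.output.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) > self.remaining:
            data = data[: self.remaining]
            self.done = True
        self.remaining -= len(data)
        # Text was decoded by the parser, so escape it again on the way out.
        self.output.append(escape(data, quote=False))
        if self.done:
            self.output.append("…")


def truncate_html(html, limit):
    """Return `html` cut to at most `limit` text characters, with any
    still-open tags closed in the correct order."""
    parser = TruncatingParser(limit)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.output) + closing
```

With this sketch, `truncate_html("<p>Hello <b>big world</b> again</p>", 9)`
gives `<p>Hello <b>big…</b></p>`. A real fix would need the same
bookkeeping on top of whichever spec-compliant parser is chosen.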
Very open to ideas though! Thanks for picking it up.
--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:19>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.