Re: [Django] #30686: Improve utils.text.Truncator &co to use a full HTML parser.

Django Tue, 03 Jan 2023 08:47:55 -0800

#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+---------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  David Smith
         Type:  Bug            |                   Status:  assigned
    Component:  Utilities      |                  Version:  dev
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+---------------------------------------

Comment (by Matthias Kestenholz):

Hi all

lxml is quite a heavy dependency. It works **very** well, but you'll wait
a long time for it to compile if no wheels are available for your platform
(see https://pypi.org/project/lxml/#files). I think Python packaging is
almost a non-issue these days except when it comes to transitive
dependencies, and I wouldn't want to be in charge of specifying and
updating the supported range of lxml versions. That being said, I have
encountered almost no breaking changes in lxml since
[https://github.com/feincms/feincms/commit/0ec8e834dd2e0927bb23d46ee9102716c7735add
~2009]. I use lxml in almost all projects and can heartily recommend it
to anyone.

I'm sure that the regex-based solution has some problems. I'm sorry to
admit I haven't read the full thread, but I cannot imagine a situation
where using `|strip_tags` without `|safe` would lead to a security issue,
and why would you want to combine the two? There's no point in marking a
string as safe after stripping all tags. So it's only about the fact that
the output sometimes isn't nice, which might be fixed by converting as
many entities as possible to their Unicode equivalents and only truncating
afterwards?
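
To make the "decode entities first, truncate afterwards" idea concrete,
here is a minimal sketch using only the Python standard library. It is not
Django's implementation: `TAG_RE` is a naive stand-in for the regex-based
`strip_tags`, and `strip_then_truncate` is a hypothetical helper name.

```python
import html
import re

# Naive tag pattern; an assumption standing in for Django's strip_tags.
TAG_RE = re.compile(r"<[^>]*>")


def strip_then_truncate(value: str, length: int, ellipsis: str = "…") -> str:
    """Strip tags, decode entities, then truncate by characters.

    Tags are removed *before* decoding so that an escaped "&lt;b&gt;"
    survives as literal text instead of being stripped as a tag.
    """
    text = TAG_RE.sub("", value)
    text = html.unescape(text)  # "&amp;" -> "&", "&eacute;" -> "é", ...
    if len(text) <= length:
        return text
    # Truncation now cuts between characters, never inside an entity.
    return text[: max(length - len(ellipsis), 0)] + ellipsis


print(strip_then_truncate("<p>Fish &amp; chips &ndash; caf&eacute;</p>", 12))
```

The result is plain text and may contain literal `<` or `&`, so it still
has to go through the usual autoescaping on output, i.e. it must not be
marked `|safe`.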

Last but not least: I have never benchmarked it, but I suspect that
running bleach or html-sanitizer during rendering is wasteful in terms of
CPU cycles. I only ever use the sanitizer when saving, never when
rendering. `|strip_tags` is obviously applied when rendering and performs
well enough in many situations.

So, to me `strip_tags` is a clear case of
[https://hachyderm.io/@matthiask/109545192841761218 a simple
implementation with "worse is better" characteristics].

I truly hope this is helpful and not just a cold shower (sorry for using
"just" here)

Thanks,
Matthias

--
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:21>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

