#30686: Improve utils.text.Truncator &co to use a full HTML parser.
-------------------------------+---------------------------------------
     Reporter:  Thomas Hooper  |                    Owner:  David Smith
         Type:  Bug            |                   Status:  assigned
    Component:  Utilities      |                  Version:  dev
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+---------------------------------------

Comment (by Matthias Kestenholz):

 Hi all

 lxml is quite a heavy dependency. It works **very** well, but you'll wait
 a long time for compilation if you do not have wheels (see
 https://pypi.org/project/lxml/#files). I think Python packaging is almost a
 non-issue these days except when it comes to transitive dependencies, and
 I wouldn't want to be in charge of specifying and updating the supported
 range of lxml versions. That being said, I encountered almost no breaking
 changes in lxml since
 [https://github.com/feincms/feincms/commit/0ec8e834dd2e0927bb23d46ee9102716c7735add ~2009].
 I use lxml in almost all projects and I can heartily recommend it
 to anyone.

 I'm sure that the regex-based solution has some problems. I'm sorry to
 admit I haven't read the full thread, but I just cannot imagine a situation
 where using `|strip_tags` without `|safe` would lead to a security issue,
 and why would you want to combine these? There's no point in marking a
 string as safe after stripping all tags. So it's only about the fact that
 the output sometimes isn't nice, something which may be fixed by converting
 as many entities to their unicode equivalents as possible and only
 truncating afterwards?
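
 A minimal sketch of that idea, assuming a naive regex stands in for the
 tag-stripping step. The helper name `strip_and_truncate` and its
 `max_chars` parameter are hypothetical and not part of Django; only
 `html.unescape` and `html.escape` from the standard library are real APIs
 here.

```python
import html
import re


def strip_and_truncate(value: str, max_chars: int) -> str:
    """Hypothetical helper: strip tags, decode entities, then truncate."""
    # Naive regex-based tag removal (an assumption, not Django's code).
    text = re.sub(r"<[^>]*>", "", value)
    # Decode entities first so truncation counts real characters and
    # can never cut an entity such as "&amp;" in half.
    text = html.unescape(text)
    if len(text) <= max_chars:
        return text
    # Reserve one character for the ellipsis.
    return text[: max_chars - 1].rstrip() + "\u2026"


# The result is plain text, not markup, so it still has to be escaped
# when rendered (which is what autoescaping does without `|safe`).
print(html.escape(strip_and_truncate("<p>Fish &amp; chips for everyone</p>", 12)))
```

 Because entities are decoded before truncating, the result may contain
 literal `<` or `&` characters, which is one more reason to leave it to
 autoescaping instead of marking it as safe.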

 Last but not least: I have never benchmarked it, but I suspect that
 running bleach or html-sanitizer during rendering may be wasteful in
 terms of CPU cycles. I only ever use the sanitizer when saving, never
 when rendering. `|strip_tags` is obviously applied when rendering and
 performs well enough in many situations.
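
 The save-time versus render-time distinction can be sketched as follows.
 Everything here is hypothetical: `sanitize` is a stand-in that only
 escapes markup (a real project would call bleach or html-sanitizer
 instead), and `Comment` is a plain dataclass, not a Django model.

```python
import html
from dataclasses import dataclass, field

# Counts how often the (assumed expensive) sanitizer runs.
sanitize_calls = 0


def sanitize(value: str) -> str:
    """Stand-in sanitizer: escapes markup instead of cleaning it."""
    global sanitize_calls
    sanitize_calls += 1
    return html.escape(value)


@dataclass
class Comment:
    raw: str
    body_html: str = field(default="", init=False)

    def save(self) -> None:
        # Sanitize once, when the content is stored.
        self.body_html = sanitize(self.raw)

    def render(self) -> str:
        # Rendering only reads the stored result; no sanitizer call.
        return f"<div>{self.body_html}</div>"


comment = Comment("<script>x</script>")
comment.save()
for _ in range(3):
    comment.render()
print(sanitize_calls)
```

 However many times the comment is rendered, the sanitizer runs once per
 save.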

 So, to me `strip_tags` is a clear case of
 [https://hachyderm.io/@matthiask/109545192841761218 a simple
 implementation with "worse is better" characteristics].

 I truly hope this is helpful and not just a cold shower (sorry for using
 "just" here)

 Thanks,
 Matthias

-- 
Ticket URL: <https://code.djangoproject.com/ticket/30686#comment:21>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.