I agree with Mariusz on the ticket/PR that my answer alone isn't enough
impetus to make this change. Hopefully someone more involved in i18n can
weigh in.

Although it changes the order of operations, I think this still works to
achieve the same behaviour. This snippet can be run at the end of a page to
wrap window.URLify.

(function () {
    const originalURLify = window.URLify;

    function newURLify(s, num_chars, allowUnicode) {
        const hasUnicodeChars = /[^\u0000-\u007f]/.test(s);
        // Remove English words only if the string contains only ASCII
        // (English) characters.
        if (!hasUnicodeChars) {
            const removeList = [
                "a", "an", "as", "at", "before", "but", "by", "for", "from",
                "is", "in", "into", "like", "of", "off", "on", "onto", "per",
                "since", "than", "the", "this", "that", "to", "up", "via",
                "with"
            ];
            const r = new RegExp('\\b(' + removeList.join('|') + ')\\b', 'gi');
            s = s.replace(r, '');
        }
        // Strip the words before delegating, so the original URLify still
        // collapses the leftover whitespace and truncates to num_chars.
        return originalURLify(s, num_chars, allowUnicode);
    }

    window.URLify = newURLify;
})();
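
To sanity-check the stop word regex outside the admin, something like the
following can be run in Node or a browser console. This is only a rough
sketch: stripStopWords and simpleSlug are names I made up for illustration,
and simpleSlug is a simplified stand-in for the admin's URLify (it skips
downcoding and num_chars truncation), not the real implementation.

```javascript
// Stop word list as quoted in this thread.
const STOP_WORDS = [
    "a", "an", "as", "at", "before", "but", "by", "for", "from",
    "is", "in", "into", "like", "of", "off", "on", "onto", "per",
    "since", "than", "the", "this", "that", "to", "up", "via",
    "with"
];

// Hypothetical helper: remove stop words from ASCII-only strings.
function stripStopWords(s) {
    // Leave the string alone if it contains any non-ASCII character.
    if (/[^\u0000-\u007f]/.test(s)) {
        return s;
    }
    const r = new RegExp('\\b(' + STOP_WORDS.join('|') + ')\\b', 'gi');
    return s.replace(r, '');
}

// Hypothetical, simplified stand-in for URLify (assumption, not Django's code).
function simpleSlug(s) {
    return s
        .replace(/[^-\w\s]/g, '')  // drop punctuation
        .trim()                    // trim leading/trailing whitespace
        .replace(/[-\s]+/g, '-')   // runs of spaces/hyphens become one hyphen
        .toLowerCase();
}

const title = "To be or not to be, that is the question";
console.log(simpleSlug(stripStopWords(title)));  // be-or-not-be-question
console.log(simpleSlug(title));  // to-be-or-not-to-be-that-is-the-question
```

Removing the words before slugifying matters here: the gaps they leave are
plain whitespace, which the slug step then collapses into single hyphens.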

On Thu, 23 Apr 2020 at 21:21, Andy Chosak <cho...@gmail.com> wrote:

> Thanks, Adam, for your reply. I've opened a ticket at
> https://code.djangoproject.com/ticket/31511, which includes a link to a
> PR that makes this change.
>
> Any advice on documenting how to wrap window.URLify?
>
> Thanks,
> Andy
>
> On Thursday, April 9, 2020 at 1:41:30 PM UTC-4, Adam Johnson wrote:
>>
>> I for one am quite surprised to learn the admin has this behaviour.
>>
>> I'm extra surprised it assumes it's in English if only ASCII letters are
>> used. This is quite a naïve assumption 😂 (See what I did in that sentence?)
>>
>>> Was removal of these words introduced for SEO reasons?
>>
>> Seems likely.
>>
>> Personally, for the reasons you've presented I think it would make sense
>> to remove this behaviour. We can probably document how to wrap
>> window.URLify to preserve the old behaviour.
>>
>> On Wed, 8 Apr 2020 at 20:38, Andy Chosak <cho...@gmail.com> wrote:
>>
>>> Automatic slug generation in ModelAdmin via prepopulated_fields
>>> <https://docs.djangoproject.com/en/3.0/ref/contrib/admin/#django.contrib.admin.ModelAdmin.prepopulated_fields>
>>> uses a urlify.js
>>> <https://github.com/django/django/blob/b2bd08bb7a912a1504f5fb5018f5317e6b5423cd/django/contrib/admin/static/admin/js/urlify.js>
>>> file which, among other behaviors, removes certain stop words
>>> <https://github.com/django/django/blob/b2bd08bb7a912a1504f5fb5018f5317e6b5423cd/django/contrib/admin/static/admin/js/urlify.js#L168-L176>
>>> from the slug. For example, a string like "To be or not to be, that is the
>>> question" will generate a slug "be-or-not-be-question", not
>>> "to-be-or-not-to-be-that-is-the-question" as one might expect. I’d like to
>>> solicit feedback on the idea of removing this logic so that slugs can
>>> contain these words.
>>>
>>> For reference, the current list is: a, an, as, at, before, but, by, for,
>>> from, is, in, into, like, of, off, on, onto, per, since, than, the, this,
>>> that, to, up, via, with.
>>>
>>> Django ticket #30538 <https://code.djangoproject.com/ticket/30538>
>>> mentions this behavior as part of a more general comparison between
>>> urlify.js and Python slugify
>>> <https://github.com/django/django/blob/b2bd08bb7a912a1504f5fb5018f5317e6b5423cd/django/utils/text.py#L394>.
>>> It was closed as wontfix due to reasons of backwards compatibility. Per
>>> the triaging guidelines
>>> <https://docs.djangoproject.com/en/3.0/internals/contributing/triaging-tickets/#closing-tickets>,
>>> I’m making this post to solicit feedback on the more specific question of
>>> addressing stopword removal in the JS code only -- not to try to address
>>> any other differences in behavior between these two methods. There’s been
>>> quite a bit of discussion on generating slugs for non-English languages
>>> (for example #2282 <https://code.djangoproject.com/ticket/2282>), and
>>> this post is not intended to reopen that discussion.
>>>
>>> The current list of stopwords being removed seems to have been the same
>>> since at least 2005
>>> <https://github.com/django/django/blob/dd5320d1d56ca7603747dd68871e72eee99d9e67/media/js/urlify.js>
>>> (the earliest code I can find including this logic). Some of these words
>>> feel a little unexpected, for example “before” and “since”. After 15 years
>>> it seems reasonable to revisit the list and consider whether it still makes
>>> sense.
>>>
>>> Was removal of these words introduced for SEO reasons? If so, is this
>>> still a recommended default behavior? In 2020, search engines like Google
>>> seem smart enough to interpret them properly. Here's
>>> <https://cseo.com/blog/google-stop-words/> an arbitrary page that
>>> discusses this and includes a much longer list of what might be considered
>>> stopwords. As another datapoint, the popular WordPress Yoast SEO plugin
>>> used to remove stopwords, but stopped doing so
>>> <https://yoast.com/yoast-seo-7-0/> a few years back.
>>>
>>> Potentially outdated SEO concerns aside, does this behavior still align
>>> well with the needs and desires of Django users? Is this something this
>>> community would be open to revisiting? Thanks for your consideration.
>>>
>>> (One minor point on language support: allowing these words would help to
>>> resolve at least some of the unequal treatment given to English over other
>>> languages, for example #12905
>>> <https://code.djangoproject.com/ticket/12905>. See also wagtail#4899
>>> <https://github.com/wagtail/wagtail/issues/4899>, from which much of
>>> this post has been copied, for an example of how this logic impacts a
>>> Django-based CMS.)
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Django developers (Contributions to Django itself)" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to django-d...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/django-developers/fb6c9596-951d-4102-91b5-b5fd9c8c6340%40googlegroups.com
>>> <https://groups.google.com/d/msgid/django-developers/fb6c9596-951d-4102-91b5-b5fd9c8c6340%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>>
>> --
>> Adam
>>
>


-- 
Adam
