On 28 January 2016 at 20:23, Albert-Jan Roskam <sjeik_ap...@hotmail.com> wrote:
>
> Out of curiosity, I wrote the throw-away script below to find a character 
> that is classified (--> LC_CTYPE) as digit in one locale, but not in another.
> I ran it with 5000 locale combinations in Python 2 but did not find any 
> (somebody shut down my computer!). I just modified the code so it also
> runs in Python 3. Is this the correct way to find such locale-dependent regex 
> matches?

Eryk already gave you a better explanation of the locale stuff than I
could, but I have a separate comment about the algorithmic performance
of your code (since you mentioned that it took a long time).

You're looping over all pairs of locales:

...
> for n, (locale1, locale2) in enumerate(itertools.combinations(locales, 2), >
...
>     for i in xrange(sys.maxunicode + 1):   # 1114111
>         s = unichr(i)  #.encode("utf8")
>         try:
>             locale.setlocale(locale.LC_CTYPE, locale1)
>             m1 = bool(regex.match(s))
>             locale.setlocale(locale.LC_CTYPE, locale2)
>             m2 = bool(regex.match(s))
>             if m1 ^ m2:  # m1 != m2

Suppose there are N locales and M is sys.maxunicode. The number of
pairs of locales is N*(N-1)/2, which grows like N**2. For each pair you
loop over M characters and match the regex twice (m1 and m2), so the
innermost loop body runs M*N*(N-1)/2 times and the regex is matched
M*N*(N-1) times, which is roughly M*N**2.
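To put numbers on this, here is a small sketch that counts the regex matches for both loop orders. The function names are made up for illustration, and it assumes the matching in the innermost loop is the only cost worth counting (the swapped count is a worst case, when no locale disagrees early).

```python
import sys
from math import comb


def matches_pairwise(n_locales: int, n_chars: int) -> int:
    """Regex matches in the original loop: two (m1 and m2) per character per pair."""
    return 2 * comb(n_locales, 2) * n_chars


def matches_swapped(n_locales: int, n_chars: int) -> int:
    """Worst-case matches with characters outside and locales inside."""
    return n_locales * n_chars


M = sys.maxunicode + 1  # number of code points checked
for n in (2, 10, 100):
    # The ratio between the two counts is exactly n - 1
    print(n, matches_pairwise(n, M), matches_swapped(n, M))
```

Since 2 * comb(N, 2) == N*(N-1), the pairwise loop does N-1 times as much matching; with 100 locales that is 99 times as much work.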

Assume that f(locale, c) is the function that computes, e.g., m1 or m2
in your code above. We can swap the loops around so that the outer loop
is over Unicode characters. Then the inner loop can be over the
locales, and for each character we only loop over the N locales once
rather than over all N*(N-1)/2 pairs of locales. It looks like this:

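    import sys

    unicode_characters = (chr(i) for i in range(sys.maxunicode + 1))
    for c in unicode_characters:
        matched = f(locales[0], c)  # Check the first locale
        # Compare every other locale with it: N calls of f per character
        assert all(f(loc, c) == matched for loc in locales[1:])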

This way you call f(locale, c) at most M*N times, which, if N is not
small, should be much faster than the roughly M*N**2 calls made by the
pairwise version.
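A runnable version of the swapped loop, as a minimal sketch: `find_disagreements` and `toy_f` are hypothetical names, and the per-locale test is passed in as a parameter instead of calling `locale.setlocale` directly, so it can be tried without any particular locales installed. Rather than asserting, it collects the characters on which the locales disagree.

```python
from typing import Callable, Iterable, List, Sequence


def find_disagreements(
    f: Callable[[str, str], bool],
    locales: Sequence[str],
    chars: Iterable[str],
) -> List[str]:
    """Return the characters c for which f(locale, c) differs between locales.

    Characters are the outer loop and locales the inner loop, so f is
    called at most len(locales) times per character.
    """
    found = []
    for c in chars:
        matched = f(locales[0], c)  # result for the first locale
        # any() stops at the first locale that disagrees with the first one
        if any(f(loc, c) != matched for loc in locales[1:]):
            found.append(c)
    return found


def toy_f(loc: str, c: str) -> bool:
    # Hypothetical stand-in for "set the locale, then match the regex":
    # locale "xx" accepts any Unicode digit, locale "yy" only ASCII digits.
    return c.isdigit() if loc == "xx" else c in "0123456789"


print(find_disagreements(toy_f, ["xx", "yy"], ["1", "a", "\u0663"]))
```

For the original question you would pass an f that calls locale.setlocale(locale.LC_CTYPE, loc) and then matches the regex, as in the quoted code, with chars covering range(sys.maxunicode + 1).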

--
Oscar
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
