Hi everybody,

Carl Meyer and I did some experimenting with encoding behavior on Python 3. Carl had been hacking on getting virtualenv to run on Python 3, and it turned out that his version of virtualenv did not work on Python 3 on my server either. In fact none of the virtualenv installations worked there, although they all seemed to work for some people.

Looking closer, the problem is that virtualenv assumes that 'open(filename).read()' works. On my particular system, however, the default encoding for files in Python 3 was ASCII. That encoding was picked up because of three things: a) Python 3 takes its default encoding for opening files from the system locale, b) the SSH server accepts the client's encoding for everything (including filenames), and c) the default OS X installation for many people does not initialize locales properly, which forces the server to fall back to 'POSIX', which applications (including Python) then interpret as ASCII.
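
To check which encoding a bare open() would pick on a given machine, a small sketch like the following may help. It uses only the standard library. The helper names default_text_encoding and read_text_utf8 are made up for illustration, and passing the encoding explicitly is just one possible workaround, not necessarily what virtualenv should end up doing.

```python
import locale
import sys


def default_text_encoding():
    """Return the locale-derived encoding that text-mode open() falls back to."""
    # Passing False queries the current locale settings without modifying them.
    return locale.getpreferredencoding(False)


def read_text_utf8(filename):
    """Hypothetical helper: read a file as UTF-8 regardless of the locale."""
    with open(filename, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    print("open() default encoding:", default_text_encoding())
    print("filesystem encoding:    ", sys.getfilesystemencoding())
```

Running this both locally and over SSH on the affected server should show whether the two sessions disagree.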

Now this showcases a couple of problems on different levels:

-   Developers assume that the default encoding is UTF-8 because
    that is the encoding on their local machine.  Falling back to
    the platform-dependent encoding is documented, but it does not
    make a lot of sense.  The limiting platform is probably Windows,
    where the Notepad editor has historically had problems with UTF-8.

    As a compromise I recommend UTF-8 for POSIX and UTF-8-sig for
    Windows, as the Windows editor is happier with that encoding.
    Since the latter reads every file written with the former, that
    should not cause many problems in practice.

-   Seeing that SSH happily overrides the filesystem encoding, I would
    like to forward this issue to some of the Linux maintainers.  Having
    the SSH client override your filesystem encoding sounds like a
    terrible decision.  Apparently Python guesses the filesystem
    encoding from LC_CTYPE, which however is overridden by connecting
    SSH clients.  Seeing how Ubuntu and a bunch of other distributions
    use GNOME, which has UTF-8 as a somewhat established default for
    filesystems, I would argue that Python should just assume UTF-8
    as the default encoding in a Linux environment.

-   Inform Apple that some Snow Leopard machines by default set
    LC_CTYPE (and all the other locale variables) to something that
    is not even a valid locale.  I am not yet sure why it does not
    happen on all machines, but it happens on more than one at PyCon
    alone.  On top of that, I know the issue because it broke the
    Python "Babel" package for a while, which is why I added a
    workaround for that particular problem.

    Either way I will file a bug report with Apple about what the SSH
    client is doing in mixed-locale environments.
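
The UTF-8 / UTF-8-sig compromise from the first point can be illustrated with a short sketch. The function names and the sample text are made up for illustration; the only thing relied on is that Python's "utf-8-sig" codec writes a BOM and, when reading, strips one if it is present.

```python
import codecs
import os
import tempfile


def write_text(path, text, encoding):
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def read_text(path):
    # "utf-8-sig" drops a leading BOM if there is one and otherwise
    # decodes exactly like "utf-8".
    with open(path, encoding="utf-8-sig") as f:
        return f.read()


def roundtrip_demo():
    """Map each writer encoding to (file starts with BOM, text read back intact)."""
    text = "na\u00efve"
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for enc in ("utf-8", "utf-8-sig"):
            path = os.path.join(tmp, enc + ".txt")
            write_text(path, text, enc)
            with open(path, "rb") as f:
                has_bom = f.read().startswith(codecs.BOM_UTF8)
            results[enc] = (has_bom, read_text(path) == text)
    return results
```

Both files read back identically through the "utf-8-sig" reader; only the one written with "utf-8-sig" starts with a BOM.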

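For the locale points, a rough diagnostic along these lines could show which variable a session actually ends up using and whether the C library accepts it. This is a sketch under assumptions: the function names are invented here, and the lookup order (LC_ALL, then LC_CTYPE, then LANG) is the usual POSIX precedence for the character type category.

```python
import locale
import os


def effective_ctype_setting(environ=os.environ):
    """Return (variable name, value) of the setting that decides LC_CTYPE."""
    # POSIX precedence: LC_ALL wins over LC_CTYPE, which wins over LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:
            return name, value
    return None, None


def environment_locale_is_valid():
    """Check whether the C library accepts the locale named by the environment."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # "" means: use the environment
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous state


if __name__ == "__main__":
    print("decided by:", effective_ctype_setting())
    print("valid locale:", environment_locale_is_valid())
```

Comparing the output of a local shell with that of an SSH session into the same machine should make a client-side override visible.
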

Are we missing anything?  Any suggestions?


Regards,
Armin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev