
Re: [Python-Dev] Cut/Copy/Paste items in IDLE right click context menu

2013-02-16 Thread Neil Hodgson

Nick Coghlan:

> - no need for extensive cross-OS testing prior to commit, that's a key
> part of the role of the buildbots

   Are the buildbots able to test UI features like menu selections?

   Neil

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] cffi in stdlib

2013-02-27 Thread Neil Hodgson

Armin Rigo:

> So the general answer to your question is: we google MessageBox and
> copy that line from the microsoft site, and manually remove the
> unnecessary WINAPI and _In_opt declarations:

   Wouldn't it be better to understand the SAL annotations like _In_opt so that 
spurious NULLs (for example) produce a good exception from cffi instead of 
failing inside the system call?

   Neil


Re: [Python-Dev] cffi in stdlib

2013-02-28 Thread Neil Hodgson

Armin Rigo:

> Maybe.  Feel like adding an issue to
> https://bitbucket.org/cffi/cffi/issues, with references?

   OK, issue #62 added.

>  This looks
> like a Windows-specific extension, which means that I don't
> automatically know about it.

   While SAL is Windows-specific, gcc supports some similar attributes 
including nonnull and sentinel.

   Neil


Re: [Python-Dev] IDLE in the stdlib

2013-03-20 Thread Neil Hodgson

Terry Reedy:

> Broken (and quirky): it has an absurdly limited output buffer (under a 
> thousand lines)

   The limit is actually  lines.

> Quirky: Windows uses cntl-C to copy selected text to the clipboard and (where 
> appropriate) cntl-V to insert clipboard text at the cursor pretty much 
> everywhere.

   CP uses Ctrl+C to interrupt programs similar to Unix. Therefore it moves 
copy to a different key in a similar way to Unix consoles like GNOME Terminal 
and MATE Terminal which use Shift+Ctrl+C for copy despite Ctrl+C being the 
standard for other applications.

   Neil


Re: [Python-Dev] Ext4 data loss

2009-03-10 Thread Neil Hodgson

   The technique advocated by Theodore Ts'o (save to temporary then
rename) discards metadata. What would be useful is a simple, generic
way in Python to copy all the appropriate metadata (ownership, ACLs,
...) to another file so the temporary-and-rename technique could be
used.

   On Windows, there is a hack in the file system that tries to track
the use of temporary-and-rename and reapply ACLs and on OS X there is
a function FSPathReplaceObject but I don't know how to do this
correctly on Linux.
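
   A minimal sketch of the temporary-and-rename technique in Python,
assuming POSIX-like rename semantics and Python 3.3 or later for
os.replace. The helper name replace_contents is hypothetical.
shutil.copymode only carries over the permission bits, so ownership and
ACLs are still lost, which is the gap described above.

```python
import os
import shutil
import tempfile


def replace_contents(path, data):
    """Write data to a temporary file, then rename it over path.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays on one file system.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        if os.path.exists(path):
            # Permission bits only; owner, group and ACLs are not copied.
            shutil.copymode(path, tmp)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```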

   Neil

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Neil Hodgson

Antoine Pitrou:

> How about shutil.copystat()?

   shutil.copystat does not copy over the owner, group or ACLs.

   Modeling a copymetadata method on copystat would provide an easy to
understand API and should be implementable on Windows and POSIX.
Reading the OS X documentation shows a set of low-level POSIX
functions for ACLs. Since there are multiple pieces of metadata and an
application may not want to copy all pieces, there could be multiple
methods (copygroup ...) or one method with options:
shutil.copymetadata(src, dst, group=True, resource_fork=False)
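
   As a rough sketch of what the one-method form might look like on
POSIX: the function below is hypothetical and is not part of shutil. It
layers os.chown on top of shutil.copystat. The owner option is an
assumption added for illustration, and ACLs and resource forks are not
handled at all.

```python
import os
import shutil


def copymetadata(src, dst, group=True, owner=False):
    """Copy permission bits and times, then optionally owner and group.

    Hypothetical function in the style proposed above; not a real
    shutil API. ACLs and resource forks are not handled.
    """
    shutil.copystat(src, dst)  # mode bits, timestamps, flags
    if not hasattr(os, "chown"):
        return  # no POSIX ownership model on this platform
    st = os.stat(src)
    uid = st.st_uid if owner else -1  # -1 leaves the value unchanged
    gid = st.st_gid if group else -1
    if uid != -1 or gid != -1:
        os.chown(dst, uid, gid)
```

   Changing the owner normally needs privileges, so owner defaults to
False here, while a group change can succeed for an unprivileged user
who is a member of the target group.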

   Neil

Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Neil Hodgson

Antoine Pitrou:

> It depends on what you call "ACLs". It does copy the chmod permission bits.

   Access Control Lists are fine-grained permissions. Perhaps you
want to allow Sam to read a file and Ted to both read and write
it. These permissions should not need to be reset every time you
modify the file.

> As for owner and group, I think there is a very good reason that it doesn't 
> copy
> them: under Linux, only root can change these properties.

   Since I am a member of both "staff" and "everyone", I can set group
on one of my files from "staff" to "everyone" or back again:

$ chown :everyone x.pl
$ ls -la x.pl
-rwxrwxrwx  1 nyamatongwe  everyone  269 Mar 11  2008 x.pl
$ chown :staff x.pl
$ ls -la x.pl
-rwxrwxrwx  1 nyamatongwe  staff  269 Mar 11  2008 x.pl

   Neil

Re: [Python-Dev] Evaluated cmake as an autoconf replacement

2009-03-29 Thread Neil Hodgson

Jeffrey Yasskin:

>  1. It can autogenerate the Visual Studio project files instead of
> needing them to be maintained separately

   I have looked at a couple of build tools (scons was probably one)
that generate Visual Studio project files in the past and they
produced fairly poor project files, which would compile the code but
wouldn't be as capable as project files created by hand. It's been a
while so I can't remember the details. The current Python project
files are hierarchical, building several DLLs and an EXE and I think
this was outside the scope of the tools I looked at.

   Neil

Re: [Python-Dev] Evaluated cmake as an autoconf replacement

2009-04-09 Thread Neil Hodgson

   cmake does not produce relative paths in its generated make and
project files. There is an option CMAKE_USE_RELATIVE_PATHS which
appears to do this but the documentation says:

"""This option does not work for more complicated projects, and
relative paths are used when possible. In general, it is not possible
to move CMake generated makefiles to a different location regardless
of the value of this variable."""

   This means that generated Visual Studio project files will not work
for other people unless a particular absolute build location is
specified for everyone which will not suit most. Each person that
wants to build Python will have to run cmake before starting Visual
Studio thus increasing the prerequisites.

   Neil

Re: [Python-Dev] Support for Python/Windows

2009-07-21 Thread Neil Hodgson

Curt Hagenlocher:

> Ah, you're right -- the PGO bits probably need VS Pro. The 64-bit
> compilers should be in the Windows SDK, but it wouldn't surprise me if
> they were not included in Express.

   64-bit isn't in Express and merging the 64 bit compiler from the
SDK into Express may be possible but certainly isn't easy. I just use
the command line compiler to check 64 bit issues.

   Neil

Re: [Python-Dev] command line attachable debugger

2009-07-24 Thread Neil Hodgson

Glyph Lefkowitz:

> Sounds like this is moving into hypothetical territory better-suited to
> python-ideas.  (Although I'm sure that if you wanted to contribute polished,
> tested code for a standard remote debugger interface, few people would
> complain.)

   There is a remote debugger protocol called DBGP for different
languages (including Python) and debuggers (such as Komodo)
http://xdebug.org/docs-dbgp.php

   Neil

Re: [Python-Dev] mingw32 and gc-header weirdness

2009-07-25 Thread Neil Hodgson

Martin v. Löwis:

> I propose to add another (regular) double into the union.

   Adding a regular double as a second dummy gives the same sizes and
alignments with Mingw or MSVC as the original definition with MSVC:

typedef union _gc_head {
    struct {
        union _gc_head *gc_next;
        union _gc_head *gc_prev;
        Py_ssize_t gc_refs;
    } gc;
    long double dummy;  /* force worst-case alignment */
    double dummy2;  /* in case long double doesn't trigger worst-case */
} PyGC_Head;

   In regard to alignment penalties, a simple copy loop for doubles
runs about 20% slower when misaligned on my AMD processor. Other
x86 processors can be much worse, by as much as 2 to 3.25 times
according to
http://msdn.microsoft.com/en-us/library/aa290049%28VS.71%29.aspx
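
   The size and alignment of such a union can be inspected from Python
with ctypes. This is only a sketch: the class names are made up for
illustration, and ctypes reports the layout for the compiler that built
the running interpreter, so it shows one platform's values rather than
the Mingw versus MSVC comparison itself.

```python
import ctypes


class GCFields(ctypes.Structure):
    # Mirrors the inner struct: two pointers and a Py_ssize_t.
    _fields_ = [
        ("gc_next", ctypes.c_void_p),
        ("gc_prev", ctypes.c_void_p),
        ("gc_refs", ctypes.c_ssize_t),
    ]


class HeadLongDouble(ctypes.Union):
    # Original definition: only a long double forces the alignment.
    _fields_ = [("gc", GCFields), ("dummy", ctypes.c_longdouble)]


class HeadBoth(ctypes.Union):
    # Proposed definition: a plain double added as a second dummy.
    _fields_ = [
        ("gc", GCFields),
        ("dummy", ctypes.c_longdouble),
        ("dummy2", ctypes.c_double),
    ]


for name, t in [("long double only", HeadLongDouble), ("with double", HeadBoth)]:
    print(name, "size:", ctypes.sizeof(t), "alignment:", ctypes.alignment(t))
```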

   Neil

Re: [Python-Dev] mingw32 and gc-header weirdness

2009-07-25 Thread Neil Hodgson

Martin v. Löwis:

> Yes: alignof(PyGC_HEAD) would be specified as being the maximum
> alignment on a platform; sizeof(PyGC_HEAD) would be frozen.

   Maximum alignment currently on x86 is 16 bytes for SSE vector
types. Next year AVX will add 32 byte types and while they are
supposed to work OK with 16 byte alignment, performance will be better
with 32 byte alignment.

   It is possible that some use could be found for vector instructions
in core Python but it is more likely that they will only be used in
specialized extensions that can take care of alignment issues for
their own cases.

http://en.wikipedia.org/wiki/Advanced_Vector_Extensions
http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/61891/

   Neil

Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-04 Thread Neil Hodgson

Mark Hammond:

> Thanks Nick; I didn't want to be the only one saying that.  There is a fine
> line between asserting reasonable requirements for Windows users and being
> obstructionist and unhelpful, and I'm trying to stay on the former side :)

   I haven't commented on this issue before because I can't really be
helpful. I just don't understand why hg is being considered before
its Windows support is roughly equivalent to svn and cvs.

   There has been some similar experience with the main repository for
the Cocoa port of Scintilla which is in bzr on launchpad. Several
times in that repository, files were checked in with wrong line ends
making every line appear changed when looking through history. There
are several causes for this including user error but bzr (and hg)
should default to more helpful behaviour on text files.

   Neil

Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Neil Hodgson

Martin v. Löwis:

> Is it really that you don't *understand*? It's fairly easy: there was
> a PEP ...

   The PEP process is straightforward. However, a PEP may produce an
outcome that proves after more experience to be wrong. ISTM a
prerequisite to choosing a DVCS is that it should support the full
range of development platforms and thus the PEP was accepted
prematurely. At some point the PEP should be reexamined and, if
necessary, rescinded. What I don't understand is why the plan is still
to move to hg despite, after several months, there not being a known
good way to include Windows eol support.

   Neil

Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Neil Hodgson

Martin v. Löwis:

> Or don't you understand why that single unresolved item didn't manage
> to revert the decision? Well, there are many unresolved items in
> the Mercurial conversion, some much more stressful than the eol issue
> (e.g. the branching discussion).

   Then these issues should have been included in the initial PEP for
choosing a DVCS since the issues could have driven the choice. PEP 374
implies that win32text effectively solves the Windows eol issue which
no longer appears to be correct.

   Neil

Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-05 Thread Neil Hodgson

Glenn Linderman:

> and perhaps other things (and
> are there new Unicode control characters that could be used for line
> endings?),

   Unicode includes Line Separator U+2028 and Paragraph Separator
U+2029 but they are rarely supported and very rarely used. They are a
pain to work with since they are 3 byte sequences in UTF-8. Visual
Studio does support them.

   Python does not currently support these line separators such as in
this example which only reads 2 lines rather than 3:

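# Write three logical lines; the second break is U+2029.
with open("x.txt", "wb") as f:
    f.write("a\nb\u2029c\n".encode('utf-8'))
# Read back as UTF-8 text; only \n is treated as a line end.
with open("x.txt", "r", encoding="utf-8") as f:
    n = 1
    for l in f.readlines():
        print(n, repr(l))
        n += 1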
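
   For comparison, a small sketch of how current Python 3 treats these
characters. It uses io.StringIO in place of a real file, which is an
assumption made to keep the example self-contained. File-style line
iteration still breaks only on \n, while str.splitlines also breaks on
U+2028 and U+2029.

```python
import io

text = "a\nb\u2029c\n"

# U+2028 and U+2029 each take three bytes in UTF-8.
ls_bytes = "\u2028".encode("utf-8")
ps_bytes = "\u2029".encode("utf-8")
print(len(ls_bytes), len(ps_bytes))

# File-style iteration only breaks on \n here: two lines.
file_lines = io.StringIO(text).readlines()
print(len(file_lines), file_lines)

# str.splitlines() also treats U+2028/U+2029 as boundaries: three lines.
split_lines = text.splitlines()
print(len(split_lines), split_lines)
```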

   Neil

Re: [Python-Dev] PEP 385: the eol-type issue

2009-08-06 Thread Neil Hodgson

M.-A. Lemburg:

> ... and because of this, the feature is already available if
> you use codecs.open() instead of the built-in open():

   So should I not add an issue for the basic open because codecs.open
should be used for this case?

   Neil

Re: [Python-Dev] Mercurial migration: help needed

2009-09-05 Thread Neil Hodgson

Dirkjan Ochtman:

> I know a lot of projects use Mercurial on Windows as well, I'm not
> aware of any big problems with it.

   If you have a Windows-only project with CRLF files using Mercurial
then there is no line end problem as Mercurial preserves the CRLFs for
you. Line end problems occur on mixed projects where both Windows and
Unix tools are used.

   Neil

Re: [Python-Dev] Mercurial migration: help needed

2009-09-05 Thread Neil Hodgson

Paul Moore:

> 1. Given that the "problematic" tools (notepad and Visual Studio) are
> Windows tools, we seem to be back to the idea that this extension is
> only needed by Windows developers. As I understood the consensus to be
> that the extension should be for all users, I suspect I've missed
> something.

   Some of the problems come from users on Unix checking in files with
CRLF line ends that they have received using some other mechanism such
as sharing a disk between Windows and Linux. I was going to point to a
bad revision in a bzr housed project I work on but launchpad isn't
working currently. What happened was that an OS X user committed a set
of changes but with all the files having a different line ending to
the repository. The result is that it is no longer easy to track
changes before that revision. It also makes a check out larger.

   It would help in such cases for the commit command on Unix to
either automatically change any CRLF line ends to LF for text files
(but not files with an explicitly specified line end) or to display a
warning.
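
   A sketch of the kind of check a commit hook could run on the
contents of a text file. All function names here are hypothetical, and
nothing below is a Mercurial or bzr API. Deciding which files count as
text, or have an explicitly specified line end, is left out.

```python
def count_crlf(data):
    """Return the number of CRLF line ends in a bytes object."""
    return data.count(b"\r\n")


def normalize_to_lf(data):
    """Convert CRLF line ends to LF, leaving lone CR bytes untouched."""
    return data.replace(b"\r\n", b"\n")


def check_before_commit(data, fix=False):
    """Hypothetical pre-commit check: convert CRLF or return a warning.

    Returns the (possibly converted) data and a warning string or None.
    """
    n = count_crlf(data)
    if n == 0:
        return data, None
    if fix:
        return normalize_to_lf(data), None
    return data, "%d CRLF line end(s) found" % n
```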

   Neil

Re: [Python-Dev] please consider changing --enable-unicode default to ucs4

2009-10-07 Thread Neil Hodgson

Ronald Oussoren:

> Both Carbon and the modern APIs use UTF-16.

   If Unicode size standardization is seen as sufficiently beneficial
then UTF-16 would be more widely applicable than UTF-32. Unix mostly
uses 8-bit APIs which are either explicitly UTF-8 (such as GTK+) or
can accept UTF-8 when the locale is set to UTF-8. They don't accept
UTF-32. It is possible that Unix could move towards UTF-32 but that
hasn't been the case up to now and with both OS X and Windows being
UTF-16, it is more likely that UTF-16 APIs will become more popular on
Unix.
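
   To make the size differences concrete, this sketch prints how many
bytes a few sample characters occupy in each encoding form. The helper
name encoded_sizes and the choice of sample characters are assumptions
for illustration.

```python
def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return tuple(len(ch.encode(enc)) for enc in encodings)


# ASCII, Latin-1 range, another BMP character, and a non-BMP character.
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    print("U+%04X" % ord(ch), encoded_sizes(ch))
```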

   Neil

Re: [Python-Dev] PyPI comments and ratings, really?

2009-11-12 Thread Neil Hodgson

   When SourceForge started having comments and ratings, I was a
little upset at having poor negative comments there (like "not
work!"). But after it has been running for a while it appears useful.
I suppose it helps that Scintilla has 88% thumbs up from 134
respondents. Because there is voting on comments, the more useful
comments have bubbled onto the front page.

   As the system is used more, you'll see a wider range of comments on
projects and you'll be able to tell more from them. It should be seen
as a completely separate thing to the existing fora and trackers that
each project has. While you want people to become involved in your
project, many are just having a quick look and don't want to sign up
for mailing lists or to interact with project members. They may just
want to quickly comment about whether it was suitable or not.

   Neil

Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-31 Thread Neil Hodgson

Tim Delaney:

> I like this solution combined with having a single cache directory and a few
> other things I've added below.
> ...
> 2. /tmp is often on non-volatile memory. If it is (e.g. my Windows system
> temp dir is on a RAMdisk) then it seems wise to respect the obvious desire
> to throw away temporary files on shutdown.

   This may create security vulnerabilities. I could, for example,
insert a manipulated .pyc that logs passwords when other users run it.

   I can also see advantages to allowing out of tree compiled cache
directories. For example, you could have a locked down .py tree with
.pycs going into per-user trees. This prevents another user from
spoofing a .pyc I use as well as allowing users to install arbitrary
versions of Python without getting an admin to compile the .py tree
with the new compiler.

   Neil

Re: [Python-Dev] Reworking the GIL

2010-02-02 Thread Neil Hodgson

Eric Hopper:

> I don't suppose it will ever be ported back to Python 2.x?  It doesn't
> look like the whole GIL concept has changed much between Python 2.x and
> 3.x so I expect back-porting it would be pretty easy.

   There was a patch but it has been rejected.
http://bugs.python.org/issue7753

   Neil

Re: [Python-Dev] Proposal for virtualenv functionality in Python

2010-02-21 Thread Neil Hodgson

Larry Hastings:

> But IIUC telling the compiler how to
> do that is only vaguely standardized--Microsoft's CL.EXE doesn't seem to
> support any environment variable containing an include /path/.

   The INCLUDE environment variable is a list of ';' separated paths
http://msdn.microsoft.com/en-us/library/36k2cdd4%28VS.100%29.aspx
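
   A short sketch of reading that variable from Python. The helper
name include_dirs is hypothetical, and the paths in the usage note are
made up.

```python
import os


def include_dirs(value=None):
    """Split an INCLUDE-style string into a list of directories.

    Hypothetical helper; reads the INCLUDE environment variable when
    no value is passed. Empty entries are dropped.
    """
    if value is None:
        value = os.environ.get("INCLUDE", "")
    return [p for p in value.split(";") if p]
```

   For example, include_dirs(r"C:\sdk\include;C:\venv\include") gives a
list of the two directories.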

   Neil

Re: [Python-Dev] Python and Windows 2000

2010-03-01 Thread Neil Hodgson

Martin v. Löwis:

> I don't recall whether we have already decided about continued support
> for Windows 2000.
>
> If not, I'd like to propose that we phase out that support: the Windows
> 2.7 installer should display a warning; 3.2 will stop supporting Windows
> 2000.

   Is there any reason for this? I can understand dropping Windows 9x
due to the lack of Unicode support but is there anything missing from
Windows 2000 that makes supporting it difficult?

   Neil

Re: [Python-Dev] Python and Windows 2000

2010-03-02 Thread Neil Hodgson

Martin v. Löwis:

> See http://bugs.python.org/issue6926
>
> The SDK currently hides symbolic constants from us that people are
> asking for.

   Setting the version to 0x501 (XP) doesn't actively try to stop
running on version 0x500 (2K), it just reveals the symbols and APIs
from 0x501. Including a call to an 0x501-specific API will then fail
at load.

IPPROTO_IPV6 (the cause of issue 6926) isn't a new symbol that
started working in Windows XP - it was present in older SDKs without a
version guard so was visible when compiling for any version of
Windows.

> In addition, we could simplify the code in dl_nt.c around
> GetCurrentActCtx and friends, by linking to these functions directly.

   It would be simpler but it's not as if this code needs any changes
at this point.

   I don't really have a strong need for Windows 2000 although I keep
an instance for checking compatibility of my code and I do still get
queries from people using old versions of Windows, including 9x. There
is the question of whether to force failure on Windows 2000 or just
remove it from the list of known-working platforms while still
allowing it to run.

   Neil

Re: [Python-Dev] C++

2010-03-12 Thread Neil Hodgson

Antoine Pitrou:

> Is this concern still valid? We are in the 2010s now.
> I'm not saying I want us to put some C++ in the core interpreter, but
> the portability argument sounds a little old...

   There are still viable platforms which only support subsets of C++.
IIRC, Android does not support exceptions in C++.

   Neil

Re: [Python-Dev] Support byte string API of Windows in Python3?

2010-04-19 Thread Neil Hodgson

Victor Stinner:

> It's a choice, I didn't want to patch Windows because I know that Windows use
> unicode internally. I consider that developers using Python3 should use
> unicode on Windows, and byte or unicode+surrogates on other OS.

   The Win32 byte string APIs convert their inputs to Unicode and then
run Unicode code. You don't get additional capabilities by calling the
byte string APIs and should avoid them completely.

   Including an easy way to invoke them on Windows will just lead to
failures. People may think that Unix code that uses the byte string
APIs for better platform fidelity can just run this code on Windows
and get equivalent benefits. They won't and instead will see an
inverted form of the problems they are trying to avoid on Unix.

   If there is ever a reason to use a byte string API on Windows (and
I can't think of any) then ctypes can be used to explicitly call the
API desired.
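
   The loss can be imitated with Python's own codecs. This sketch does
not call any Win32 function. cp1252 is only an assumed example of an
8-bit code page, and the file name is made up.

```python
name = "l\u00e9tez\u0151.txt"  # a name with one character outside cp1252

# Imitate squeezing the name through an 8-bit code page.
as_bytes = name.encode("cp1252", errors="replace")
round_trip = as_bytes.decode("cp1252")

print(as_bytes)
print(round_trip == name)  # the character outside the code page is lost
```

   Any name containing characters outside the active code page cannot
be expressed through the byte string form, while the Unicode form can
carry it unchanged.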

   Neil

Re: [Python-Dev] email package status in 3.X

2010-06-18 Thread Neil Hodgson

Michael Foord:

> Python 3.0 was *declared* to be an experimental release, and by most
> standards 3.1 (in terms of the core language and functionality) was a solid
> release.

   That looks to me like an after-the-event rationalization. The
release note for Python 3.0 (and the "What's new") gives no indication
that it is experimental but does say """
We are confident that Python 3.0 is of the same high quality as our
previous releases ...
you can safely choose either version (or both) to use in your projects. """
http://mail.python.org/pipermail/python-dev/2008-December/083824.html

   Neil

Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread Neil Hodgson

Steven D'Aprano:

> Do any other languages have any equivalent to this ebtyes type?

   The String type in Ruby 1.9 is a byte string with an encoding attribute.

   Most online Ruby documentation is for 1.8 but the API can be examined here:
http://ruby-doc.org/ruby-1.9/index.html
   Here's something more explanatory:
http://blog.grayproductions.net/articles/ruby_19s_string

   My view is that this actually makes things much more complex by
making encoding combination an n*n problem (where n is the number of
encodings) rather than an n-sized problem when you have a single core
string type.
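
   One way to read that count, as a sketch: if strings in any two
encodings can meet, a converter is needed for every ordered pair, while
a single core string type needs only a decoder and an encoder per
encoding. The function names are hypothetical and the exact constants
are an interpretation, not something stated above.

```python
def direct_converters(n):
    """Converters needed when every encoding is combined with every other."""
    return n * (n - 1)


def pivot_converters(n):
    """Converters needed with one core string type: decode plus encode."""
    return 2 * n


for n in (3, 10, 100):
    print(n, direct_converters(n), pivot_converters(n))
```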

   Neil

Re: [Python-Dev] Licensing // PSF // Motion of non-confidence

2010-07-05 Thread Neil Hodgson

anatoly techtonik:

> The file consists of several licenses for multiple versions of Python.
> It is an unusual mix that negatively affects understanding.

   A simpler license would be better.

   There have been moves in the past to simplify the license of Python
but this would require agreement from the current rights owners
including CWI and CNRI. IIRC not all of the rights owners are willing
to agree to a change.

   Neil

Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library

2010-07-12 Thread Neil Hodgson

Kurt B. Kaiser:

> I'm mystified about the comments that the GUI is ugly.  It is minimal.
> On XP, it looks exactly like an XP window with a simple menubar.  Those
> who haven't looked at it for awhile may not be aware of the recent
> advances made by Tk in native look and feel.  What is ugly?

   While Tk has improved at emulating native appearance, there are
still many differences.

   On the main editing screen of IDLE, the most noticeable issue is
that there is no horizontal scroll bar even though the text will move
left when you move the caret beyond the rightmost visible character.
The scrollbar and status bar have an appearance that looks to be from
Windows 2000, not Windows XP and there is no resizing gripper on the
right side of the status bar. The tear off menus are ugly as well as
being non-standard on all three major platforms.

   Use the "Configure IDLE..." command and an "idle" dialog appears that also
looks to be from Windows 2000. I know Tk can do better than this as
Git Gui (the Tk (8.5.8) program I use most often) at least shows XP
themed buttons, scrollbars and other controls. However, the "idle"
dialog (as well as Git Gui) shows the largest remaining problem for Tk
user interfaces: keyboard navigation. When the "idle" dialog opens,
try doing anything with the keyboard. Chances are nothing will happen.
If you press Tab 16 times (yes, 16!) a focus rectangle will finally
show on the "Bold" check box. Another Tab takes you to the
"Indentation Width" slider. After that you don't see the focus until
it wraps around to "Bold" again. The Enter key doesn't trigger OK and
the Escape key doesn't let you escape.

   The Find and Replace dialogs are better as focus works as do Enter
and Escape but none of the buttons have mnemonics.

   This may all sound like picking nits but details and consistency
are important in user interfaces and this is just looking at the most
easily discovered problems.

   Neil

Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library

2010-07-12 Thread Neil Hodgson

Kurt B. Kaiser:

>> The tear off menus are ugly as well as being non-standard on all three
>> major platforms.
>
> Well, would you discard them? They can (occasionally) be useful.

   Yes, I would replace the menus with ones missing the tear line.
Most of the GUI toolkits experimented with tear-offs (Mac in late 80s,
GTK+ up until 2002) and dropped them or hid them in a rarely visited
API. The idea initially appeared reasonable ("I can have the Run and
Check commands available with a single click") but was found to be too
confusing in use.

   IDLE, because it uses a separate top-level window for each file and
shell, suffers more than most applications. A menu is torn off from one
window and always applies to that window but shows no visual affinity
with that window: its window is not even activated when a menu command
acts on it.

   Neil

Re: [Python-Dev] Removing IDLE from the standard library

2010-07-14 Thread Neil Hodgson

Stephen J. Turnbull:

> But it's very important to be able to *move* tabs across windows or
> panes.  ...
> In many apps, however, you would have to select the foo.c tab, close
> it, bring up a new window, and open foo.c using the long path
> (presumably with a file browser interface, but often enough the
> default directory is wherever you started the editor, not most
> recently used file).

   The common GUI technique is to drag a tab from one window into
another window. Drag onto the desktop for a new top-level window.
This is supported by, among others, Firefox; Chrome; gedit;
and GNOME Terminal.

   Neil

Re: [Python-Dev] mingw support?

2010-08-08 Thread Neil Hodgson

Terry Reedy:

> I suspect that the persons who first ported Python to MSDOS simply used what
> they were used to using, perhaps in their paid job. And I am sure that is
> still true of at least some of the people doing Windows support today.

   Some Windows developers actually prefer Visual Studio, including me.

   MinGW has become less attractive in recent years because of the
difficulty of downloading and installing a current version, and of
finding out how to do so. Some projects have moved on to the TDM
packaging of MinGW.

http://tdm-gcc.tdragon.net/

   Neil

Re: [Python-Dev] Fwd: i18n

2010-08-25 Thread Neil Hodgson

Terry Reedy:

>  File "C:\Python26\lib\socket.py", line 406, in readline
>    data = self._sock.recv(self._rbufsize)
> socket.error: [Errno 10054] A lÚtez§ kapcsolatot a tßvoli ßllomßs
> kÚnyszerÝtette n bezßrta

   That is pretty good mojibake. One of the problems of providing
localized error messages is that the messages may be messed up at
different stages. The original text was
A létező kapcsolatot a távoli állomás kényszerítetten bezárta.
   It was printed in iso8859_2 (ISO standard for Eastern European)
then those bytes were pasted in as if they were cp852 (MS-DOS Eastern
European).

# Encode as ISO 8859-2, then misread those bytes as cp852 to reproduce the mojibake
text = "A létező kapcsolatot a távoli állomás kényszerítetten bezárta."
print(str(text.encode('iso8859_2'), 'cp852'))
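
   Going the other way, here is a rough sketch of how such text might be
repaired, assuming the two codecs have been guessed correctly and that
no bytes were lost along the way (the quoted traceback above has lost a
few, so it would not round-trip exactly). The function name unscramble
and its parameter names are only illustrative.

```python
def unscramble(garbled, shown_as="cp852", written_as="iso8859_2"):
    """Undo mojibake: re-encode with the wrong codec, decode with the right one."""
    return garbled.encode(shown_as).decode(written_as)


text = "A létező kapcsolatot a távoli állomás kényszerítetten bezárta."

# Reproduce the damage: ISO 8859-2 bytes misread as cp852.
garbled = text.encode("iso8859_2").decode("cp852")

# Reverse it by applying the same two codecs in the opposite order.
recovered = unscramble(garbled)
```

   This only works because both code pages assign a character to every
byte value used here; with other codec pairs the reverse step may fail.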

   Neil

Re: [Python-Dev] PEP 384 status

2010-08-31 Thread Neil Hodgson

M.-A. Lemburg:

> Is it possible to have multiple versions of the lib C loaded
> on Windows ?

   Yes. It is possible not only to mix C runtimes from different
vendors but different variants from a single vendor.

   Historically, each vendor has shipped their own C runtime
libraries. This was also the case with CP/M and OS/2.

   Many applications can be extended with DLLs and if it were not
possible to load DLLs which use different runtimes then that would
limit which compilers can be used to extend particular applications.
If Microsoft were to stop DLLs compiled with Borland or Intel from
working inside Internet Explorer or Excel then there would be
considerable controversy and likely anti-trust actions.

   Neil

Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)

2010-09-21 Thread Neil Hodgson

Ian Bicking:

> I think the use case everyone has in mind here is where
> you get a URL from one of these sources, and you want to handle it.  I have
> a hard time imagining the sequence of events that would lead to mojibake.
> Naive parsing of a document in bytes couldn't do it, because if you have a
> non-ASCII-compatible document your ASCII-based parsing will also fail (e.g.,
> looking for b'href="(.*?)"').

   It depends on what the particular ASCII-based parsing is doing. For
example, the set of trail bytes in Shift-JIS includes the same bytes
as some of the punctuation characters in ASCII as well as all the
letters. A search or split on '@' or '|' may find the trail byte in a
two-byte character rather than a true occurrence of that character so
the operation 'succeeds' but produces an incorrect result.
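
   A small sketch of that failure, assuming Python 3 and its shift_jis
codec. The sample string is made up for illustration: the katakana
letter "ァ" encodes as the two bytes 0x83 0x40, and 0x40 is also ASCII
'@'.

```python
# "ァ" (U+30A1) followed by an ordinary ASCII '@' separator.
text = "ァ@example.com"
data = text.encode("shift_jis")

# Naive byte-level split: the trail byte 0x40 of "ァ" is mistaken for '@',
# so the split 'succeeds' but yields three pieces, one of them half a character.
naive_parts = data.split(b"@")

# Decoding first and splitting the text gives the intended two pieces.
decoded_parts = data.decode("shift_jis").split("@")
```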

   Over time, the set of trail bytes used has expanded: in GB18030
digits are possible, although many of the most important characters for
parsing such as ''' "#%&.?/''' are still safe as they cannot occur as
trail bytes in the common double-byte character sets.

   Neil

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-02 Thread Neil Hodgson

Stephen J. Turnbull:

> Here's why: '''print "%d" %
> some_integer''' doesn't now, and never will (unless Kristan gets his
> Python 2.8), produce Arabic or Han numerals.  Not in any
> language I know of, not in Microsoft Excel, and definitely not in
> Python 2.

   While I don't have Excel to test with, OpenOffice.org Calc will
display in Arabic or Han numerals using the NatNum format codes.
http://www.scintilla.org/ArabicNumbers.png

> Ditto Arabic, I
> would imagine; ISO 8859/6 (aka Latin/Arabic) does not contain the
> Arabic digits that have been presented here earlier AFAICT.  Note that
> there's plenty of space for them in that code table (eg, 0xB0-0xB9 is
> empty).  Apparently nobody *ever* thought it was useful to have them!

   DOS code page 864 does use 0xB0-0xB9 for ٠ .. ٩.
http://www.ascii.ca/cp864.htm
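
   This can be checked from Python 3, assuming its bundled cp864 codec
follows the same table as the page above. The sketch decodes the ten
bytes and, since this thread is about digit handling, also passes the
result to int().

```python
import unicodedata

# The ten byte values 0xB0..0xB9 from DOS code page 864.
raw = bytes(range(0xB0, 0xBA))
digits = raw.decode("cp864")

# Look up the Unicode names and let int() interpret the decoded digits.
names = [unicodedata.name(ch) for ch in digits]
value = int(digits)
```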

   Neil

Re: [Python-Dev] first draft of bug guidelines for www.python.org/dev/

2006-07-20 Thread Neil Hodgson

Brett Cannon:

> But SourceForge does not support anonymous reporting.

   SourceForge does support anonymous reporting. A large proportion of
the fault reports I receive for Scintilla are anonymous as indicated
by "nobody" in the "Submitted By" column.
https://sourceforge.net/tracker/?group_id=2439&atid=102439

   Neil

Re: [Python-Dev] Python 2.4, VS 2005 & Profile Guided Optmization

2006-07-23 Thread Neil Hodgson

Trent Nelson:

> I ended up playing around with Profile Guided Optimization, running
> ``python.exe pystones.py'' to collect call-graph data after
> python.exe/Python24.dll had been instrumented, then recompiling with the
> optimizations fed back in.

   It would be a good idea to build a larger body of Python code to run
the profiling pass on, so that it doesn't just optimize the sort of code
in pystone, which is not very representative. The test suite could be
used since it has good coverage, but it would hit exceptional cases too
heavily. Other compilers (Intel?) support profile-directed optimization
so would also benefit from such a body of code.

   Neil

Re: [Python-Dev] More tracker demos online

2006-07-25 Thread Neil Hodgson

Martin v. Löwis:

> Currently, we have two running tracker demos online:

   After playing with them for 30 minutes, Jira seems to have too busy
an interface and finicky behaviour: it sometimes doesn't like the back
button (similar to SF), and clicking on diffs downloads them rather
than displaying them. It's disappointing that Jira and Launchpad use
different bug IDs as continuity should be maintained with the SF bug
IDs which will be referred to in other areas such as commit messages.
They do include the SF bug ID (as a field in Jira and a nickname in
Launchpad) but this makes it harder to navigate between related bugs.
I mostly looked at "os.startfile() still doesn't work with Unicode
filenames" and I would have tagged the patch on SF with a "looks OK to
me" if SF was working.

   The text in Launchpad was a bit sparsely formatted for me, so I
would like to see whether individual users can choose a different
style. The others are OK although Roundup is clearer.

   Neil

Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-21 Thread Neil Hodgson

Travis Oliphant:

> 3) information about discontiguous memory segments
>
>
> Number 3 is where I could use feedback --- especially from PIL users and
> developers.   Strides are a common way to think about a possibly
> discontiguous chunk of memory (which appear in NumPy when you select a
> sub-region from a larger array). The strides vector tells you how many
> bytes to skip in each dimension to get to the next memory location for
> that dimension.

   I think one of the motivations for discontiguous segments was for
split buffers which are commonly used in text editors. A split buffer
has a gap in the middle where insertions and deletions can often occur
without moving much memory. When an insertion or deletion is required
elsewhere then the gap is first moved to that position. I have long
intended to implement a good split buffer extension for Python but the
best I have currently is an extension written using Boost.Python which
doesn't implement the buffer interface. Here is a description of split
buffers:

http://www.cs.cmu.edu/~wjh/papers/byte.html
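
   To make the idea concrete, here is a minimal pure-Python sketch of a
split (gap) buffer. It is an illustration only, not the Boost.Python
extension mentioned above; the class name GapBuffer and all of its
method names are assumptions chosen for this example.

```python
class GapBuffer:
    """Minimal gap buffer of characters (illustrative sketch only)."""

    def __init__(self, text="", gap_size=16):
        # Storage is: text before the gap, the gap itself, text after the gap.
        self._buf = list(text) + [None] * gap_size
        self._gap_start = len(text)
        self._gap_end = len(self._buf)

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def _move_gap(self, pos):
        # Copy characters across the gap, one at a time, until it starts at pos.
        while self._gap_start > pos:
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def insert(self, pos, text):
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        self._move_gap(pos)
        if self._gap_end - self._gap_start < len(text):
            # Gap too small: widen it in place.
            extra = max(len(text), len(self._buf))
            self._buf[self._gap_end:self._gap_end] = [None] * extra
            self._gap_end += extra
        for ch in text:
            self._buf[self._gap_start] = ch
            self._gap_start += 1

    def delete(self, pos, count):
        # Deleting just widens the gap over the removed characters.
        self._move_gap(pos)
        self._gap_end = min(self._gap_end + count, len(self._buf))

    def segments(self):
        # The two contiguous pieces on either side of the gap.
        before = "".join(self._buf[:self._gap_start])
        after = "".join(self._buf[self._gap_end:])
        return before, after

    def text(self):
        return "".join(self.segments())
```

   Repeated edits near one position only touch the gap; memory is moved
only when the edit position changes. The segments() method shows why
such a buffer naturally appears as two discontiguous pieces.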

   Neil

Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-21 Thread Neil Hodgson

Greg Ewing:

> So an array-of-pointers interface wouldn't be a direct
> substitute for the existing multi-segment buffer
> interface.

   Providing an array of (pointer,length) wouldn't be too much extra
work for a split vector implementation.

Guido van Rossum:

> But there's always a call to remove the gap (or move it to the end).

   Yes, although it's something you try to avoid.

   I'm not saying that this is an important use-case since no one
seems to have produced a split vector implementation that provides the
buffer protocol. Numeric-style array handling is much more common so
deserves priority.

   Neil

Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-23 Thread Neil Hodgson

   I have developed a split vector type that implements the buffer protocol at
http://scintilla.sourceforge.net/splitvector-1.0.zip

   It acts as a mutable string implementing most of the sequence
protocol as well as the buffer protocol. splitvector.SplitVector('c')
creates a vector containing 8 bit characters and
splitvector.SplitVector('u') is for Unicode.

   A writable attribute bufferAppearence can be set to 0 (default) to
respond to buffer protocol calls by moving the gap to the end and
returning the address of all of the data. Setting bufferAppearence to
1 responds as a two segment buffer. I haven't found any code that
understands responding with two segments. sre and file.write handle
SplitVector fine when it responds as a single segment:

import re, splitvector
x = splitvector.SplitVector("c")
x[:] = "The life of brian"
r = re.compile("l[a-z]*", re.M)
print x
y = r.search(x)
print y.group(0)
x.bufferAppearence = 1
y = r.search(x)
print y.group(0)

   produces

The life of brian
life
Traceback (most recent call last):
  File "qt.py", line 9, in <module>
    y = r.search(x)
TypeError: expected string or buffer

   It is likely that adding multi-segment ability to sre would
complicate it and slow it down. OTOH multi-segment buffers may be
well-suited to scatter/gather I/O calls like writev.
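
   As a sketch of the gather-write case, assuming a Unix system and a
Python 3 release that provides os.writev: the two byte strings below
stand in for the two segments of a split vector, and the helper names
are made up for this example.

```python
import os


def write_segments(fd, segments):
    """Write several separate byte segments with one gather-write call."""
    return os.writev(fd, segments)


def demo():
    # A pipe stands in for a file or socket so the result can be read back.
    read_end, write_end = os.pipe()
    try:
        written = write_segments(write_end, [b"The life ", b"of brian"])
    finally:
        os.close(write_end)
    try:
        return written, os.read(read_end, 64)
    finally:
        os.close(read_end)
```

   The point is that the segments never have to be joined into one
contiguous block before being written.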

   Neil

Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-09 Thread Neil Hodgson

Travis Oliphant:

> PEP: 3118
> ...

   I'd like to see the PEP include discussion of what to do when an
incompatible request is received while locked. Should there be a
standard "Can't do that: my buffer has been got" exception?
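
   For comparison, and as an observation about current Python 3 rather
than anything in the draft PEP: a bytearray refuses to be resized while
a memoryview holds its buffer, and reports this with BufferError. The
function name below is only illustrative.

```python
def resize_while_exported():
    """Try to resize a bytearray while its buffer is exported; return the error text."""
    data = bytearray(b"The life of brian")
    view = memoryview(data)      # the buffer is now exported ("got")
    try:
        data.extend(b"!")        # incompatible request: would need to reallocate
    except BufferError as exc:
        return str(exc)
    finally:
        view.release()           # give the buffer back
    return None


message = resize_while_exported()
```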

   Neil

Re: [Python-Dev] Python and the Unicode Character Database

2010-12-03 Thread Neil Hodgson

Stephen J. Turnbull:

> Will it accept Arabic on input?  (Han might be too much to ask for
> since Unicode considers Han digits to be "impure".)

   I couldn't find a direct way to input Arabic digits into OO Calc,
the normal use of Alt+number didn't work in Calc although it did in
WordPad where Alt+1632 is ٠ and so on.

   OO Calc does have settings in the Complex Text Layout section for
choosing different numerals but I don't understand the interaction of
choices here.

   Neil

Re: [Python-Dev] Import and unicode: part two

2011-01-20 Thread Neil Hodgson

Toshio Kuratomi:

> My examples that you're replying to involve two "properly
> configured" OS's.  The Linux workstations are configured with a UTF-8
> locale.  The Windows OS's use wide character unicode.  The problem occurs in
> that the code that one of the parties develops (either the students or the
> professors) is developed on one of those OS's and then used on the other OS.

   This implies a symmetric issue, but I cannot see how there can be
a problem with non-ASCII module names on Windows as the file system
allows all Unicode characters so can represent any module name.

   OS X is also based on Unicode file names. While it is possible to
mount file systems on Windows or OS X that do not support Unicode file
names, this is a very unusual situation that will cause problems in
other ways.

   Common Linux distributions like Ubuntu and Fedora now default to
UTF-8 locales. The situations in which users may encounter
installations that do not support Unicode file names have reduced
greatly.

   Neil

Re: [Python-Dev] Import and unicode: part two

2011-01-26 Thread Neil Hodgson

Toshio Kuratomi:

> When they update their OS to a version that has
> utf-8 python module names, they will find that they have to make a choice.
> They can either change their locale settings to a utf-8 encoding and have
> the system installed modules work or they can leave their encoding on their
> non-utf-8 encoding and have the modules that they've created on-site work.

   When switching to a UTF-8 locale, they can also change the file
names of their modules to be encoded in UTF-8. It would be fairly easy
to write a script that identifies non-ASCII file names in a directory
and offers to transcode their names from their current encoding to
UTF-8.
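
   A rough sketch of such a script, assuming Python 3 on a system where
file names are plain byte strings (as on Linux). The function name and
its parameters are made up for illustration, and treating any name that
already decodes as UTF-8 as "done" is a heuristic, not a guarantee.

```python
import os


def transcode_names(directory, source_encoding, dry_run=True):
    """List (and optionally perform) renames of non-ASCII names to UTF-8."""
    renames = []
    dir_bytes = os.fsencode(directory)
    for name in sorted(os.listdir(dir_bytes)):   # bytes in, bytes out
        if all(byte < 0x80 for byte in name):
            continue                             # pure ASCII, nothing to do
        try:
            name.decode("utf-8")
            continue                             # heuristic: already UTF-8
        except UnicodeDecodeError:
            pass
        new_name = name.decode(source_encoding).encode("utf-8")
        renames.append((name, new_name))
        if not dry_run:
            os.rename(os.path.join(dir_bytes, name),
                      os.path.join(dir_bytes, new_name))
    return renames
```

   With dry_run left at True the function only reports what it would do,
which corresponds to the "offers to transcode" step; the caller can show
the list and ask before running it again with dry_run=False.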

   Neil

Re: [Python-Dev] Mercurial conversion repositories

2011-02-25 Thread Neil Hodgson

   With hg 1.7.5 on Windows 7 I performed a non-core checkout:

hg clone http://hg.python.org/cpython

   The eol extension is enabled in global settings. I looked at things
a bit, opening some files and using the TortoiseHg Repository
Explorer, but made no actual changes. Running hg diff produces a large
amount of output, with almost all the *.decTest files and most of the
Windows build files (*.mk, *.sln, *.vcproj, *.bat) showing as changed
but with identical text.

   I've had problems like this with Hg before
(http://mercurial.selenic.com/bts/issue2287). The situation can be
fixed by hg update to another version and then back to default.

   Neil

Re: [Python-Dev] Mercurial conversion repositories

2011-02-25 Thread Neil Hodgson

Antoine Pitrou:

> It should now be fixed in current SVN, meaning the final conversion
> should be perfectly usable with the eol extension enabled.

   Good.

> Do you find other issues under Windows? Have you tried pushing changes?

   Since I'm not a core developer I used an http pull and can't push:

C:\u\cpython>hg push
pushing to http://hg.python.org/cpython
searching for changes
remote: ssl required

   Neil

Re: [Python-Dev] pymigr: Ask for hgeol-checking hook.

2011-02-26 Thread Neil Hodgson

   Line end problems do occur in real projects. A scintilla-cocoa
project was branched off Scintilla to support the Cocoa GUI framework
on OS X. Here is one of the revisions in that project:
http://bazaar.launchpad.net/~mike-lischke/scintilla-cocoa/trunk/revision/5#include/ScintillaWidget.h

   If the ScintillaWidget.h changes aren't visible (after a brief
wait) then click on the arrow next to it. There are only 3 real
changed lines in this file (which are changing comments from C++ to C)
but the whole file appears to have been changed.

   This is far from the worst I have seen with some revisions showing
almost every line in a project changed.

   There are several effects from this:
1) The blame command loses usefulness as all lines in the file appear
to be from this revision.
2) Downloads become bigger, and take longer.
3) Fixing the issues takes time, effort and junks the history further.
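
   The first effect can be sketched with Python's difflib, assuming a
made-up four-line file in which only two comment lines really change
while the line ends switch from LF to CRLF. The helper names are
illustrative.

```python
import difflib


def changed_lines(old_lines, new_lines):
    """Count lines of the old version that a line-based diff reports as changed."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return sum(i2 - i1
               for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
               if tag != "equal")


def strip_eol(lines):
    """Drop line terminators so that only the visible text is compared."""
    return [line.rstrip("\r\n") for line in lines]


old = ["// one\n", "int x;\n", "// two\n", "int y;\n"]
new = ["/* one */\r\n", "int x;\r\n", "/* two */\r\n", "int y;\r\n"]

raw_count = changed_lines(old, new)                          # whole file
real_count = changed_lines(strip_eol(old), strip_eol(new))   # real edits only
```

   A blame or annotate command works from the first kind of comparison,
which is why every line ends up attributed to the revision that changed
the line ends.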

   Neil

Re: [Python-Dev] devguide (hg_transition): Advertise hg import over patch.

2011-02-27 Thread Neil Hodgson

Scott Dial:

> I don't believe TortoiseHG has such a feature (or I can't find it),
> although if you have TortoiseSVN, you can still use that as a patch tool.

   The Import... command is in the Synchronize menu of Hg Repository Explorer.

   There is no GUI equivalent to --no-commit but you can exit the
commit message editor without saving which causes the commit to be
abandoned with the patch still having been applied.

   Neil

Re: [Python-Dev] devguide (hg_transition): Advertise hg import over patch.

2011-02-27 Thread Neil Hodgson

Adrian Buehlmann:

> FWIW, we are very close to releasing TortoiseHg 2.0 (due March 1st),
> which ported the current Gtk based TortoiseHg to Qt (although, it was
> more like a rewrite :-).

   I hope this is going to be fast. One of the reasons I chose Hg over
Bzr for another project was that the Bzr GUI tools which are written
using Qt are much slower, particularly when starting. A cold start of
Bazaar Explorer takes around 7 seconds on a new fast machine compared
with under a second to launch Hg Repository Explorer. Warm starts and
internal actions are better but the Hg GUI tools are still much
smoother than Bzr's.

   This slowness is quite common for Qt applications and I think is
because of the large set of DLLs that are loaded. Qt Creator is better
at around 4 seconds for a cold launch but, naturally, it doesn't
matter for an environment which you use for an extended period like Qt
Creator. It does matter for a VCS tool that you may invoke hundreds of
times in a day.

   Neil

Re: [Python-Dev] CPython hg transition complete

2011-03-05 Thread Neil Hodgson

Georg Brandl:

> I'm very happy to announce that the core Python repository switch
> to Mercurial is complete and the new repository at
> http://hg.python.org/cpython/ is now officially open for cloning,

   I just performed a clone without problems. It seems wrong to me that the
*.vcproj and *.vsprops files in PCBuild use Unix line ends. These
extensions are marked BIN in .hgeol. This machine does not have VS
2008 installed so I can't really check if that is OK.

   Just in case it is not all files, here are two with this issue
cpython\PCbuild\kill_python.vcproj
cpython\PCbuild\debug.vsprops

   Neil

Re: [Python-Dev] CPython hg transition complete

2011-03-05 Thread Neil Hodgson

Antoine Pitrou:

> It mimicks their settings in the SVN repository, so it should be ok.

   It doesn't match how they are checked out by svn since they have
the property svn:eol-style set to 'native'. Therefore these files are
checked out by svn with Windows \r\n line ends.

   Neil

Re: [Python-Dev] CPython hg transition complete

2011-03-05 Thread Neil Hodgson

   To minimize differences from previous behaviour, it is probably
best to mimic svn more closely by changing .hgeol to either have all
the project files as native or allow fall through to the default ** =
native.

   Another possibility is to set Visual Studio project files to CRLF
but this is less compatible with how svn has been used. The advantage
to explicit CRLF is that if you clone onto a Unix system and then
share that disk with Windows or create an archive that is expanded on
Windows (in binary mode) then you have the expected line ends.
Similarly for sharing from Windows to Unix where the main problem is
that bash can be upset by CRLF line ends since it assumes that the CR
is part of the line and if the line ends with a file name (like "cat
.profile\r") will treat the CR as part of the file name.

   Neil

Re: [Python-Dev] hgeol

2011-03-05 Thread Neil Hodgson

Martin v. Löwis:

> So how can I fix this properly: so that all files have CRLF, but
> are still attributed to whoever last modified them, rather than
> having them attributed to me?

   I don't think this is possible from the current state. It may be
possible to change the conversion process to 'rewrite history' to
produce clean annotations. On other projects, I've just changed the
files and accepted a degraded history.

   Neil

Re: [Python-Dev] Bugs in thread_nt.h

2011-03-10 Thread Neil Hodgson

Martin v. Löwis:

> I guess all this advice doesn't really apply to this case, though.
> The Microsoft API declares the parameter as a volatile*, indicating
> that they consider it "proper" usage of the API to declare the storage
> volatile.

   The 'volatile' here is a modifier on the parameter and does not
require a corresponding agreement in the variable declaration. It
indicates that all access through the pointer inside the function will
be with volatile semantics. As long as all functions that operate on
the variable do so treating access as volatile then everything is
fine. You should only need to declare the variable as volatile if
there is other code that accesses it directly.

   If agreement was required then the compiler would print a warning.

   It is similar to declaring a function to take a const parameter:
there is no need for the variable to also be const.

   Neil

Re: [Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII

2011-05-09 Thread Neil Hodgson

Victor Stinner:

> C and C++ identifiers are restricted to ASCII. I don't know for Fortran
> or Java.

   Some C and C++ implementations currently allow non-ASCII
identifiers and the forthcoming C1X and C++0x language standards
include non-ASCII identifiers. The allowed characters are specified in
Annexes of the respective standards.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf - Annex D
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3225.pdf - Annex E

   Neil

Re: [Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII

2011-05-09 Thread Neil Hodgson

Victor Stinner:

> I read these documents but they don't explain which encoding is used in
> libraries and programs. Does it mean that Windows and Linux may use
> different encodings?

   Yes, Windows will use UTF-16 as it does for almost everything. From
a user's point of view, these should both just be seen as Unicode.

   Neil

Re: [Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII

2011-05-09 Thread Neil Hodgson

Michael Urman:

> I'm not convinced this is correct for this case. GetProcAddress takes
> an "ANSI" string, meaning while it could theoretically use UTF-8, in
> practice I doubt it uses anything outside of ASCII safely. So while
> the name of the library would be encoded in UTF-16, the name of the
> function loaded from the library would not be.

   Yes, you are right:
http://scintilla.org/NarrowName.png

   Neil

Re: [Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII

2011-05-09 Thread Neil Hodgson

Michael Urman:

> That screenshot seems to show UTF-8 is being used. This may just be
> the literal bytes in the .c file, but could it be something more
> dependable?

   The file is in UTF-8 so the compiler may just be copying the bytes.
There is a setlocale pragma but that seems to be just for string
literals.

   Neil

Re: [Python-Dev] The socket HOWTO

2011-06-04 Thread Neil Hodgson

Antoine Pitrou:

> So what you're saying is that the text is mostly useless (or at least
> quite dispensable), but you think it's fine that people waste their
> time trying to read it?

   I found it useful when starting to write socket code. Later on I
learnt more but, as an introduction, this document was great. It is
written in an approachable manner and doesn't spend time on details
unimportant to initial understanding.

   Neil

Re: [Python-Dev] PEP 365 (Adding the pkg_resources module)

2008-03-21 Thread Neil Hodgson

zooko:

>  Um, isn't this tool called "unzip"?  I have done this -- accessed the
>  source code -- many times, and unzip suffices.

   The type of issue I ran into with eggs is when you get an exception
with a trace that includes an egg, you can't use the normal means to
look at the code. Instead you have to understand that it's an egg,
unzip the code, manually translate the path, open the file and go to
the line number. Similarly, you can't easily grep the code in its egg
state. If there was a global flag where I could say 'install eggs as
directories of source' then I'd be much happier. Just reread the
EasyInstall documentation and '--always-unzip' is portrayed as a
'don't do this' option.
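
   The manual steps can at least be scripted. This is a minimal sketch,
assuming a zipped egg and a traceback path of the form
/some/dir/name.egg/package/module.py; the function name is made up and
nothing here is part of setuptools.

```python
import zipfile


def egg_source_line(traceback_path, lineno):
    """Return one source line for a traceback path that points inside a zipped egg."""
    # Split the path into the egg file and the member inside it.
    split_at = traceback_path.index(".egg/") + len(".egg")
    egg_path = traceback_path[:split_at]
    member = traceback_path[split_at + 1:]
    with zipfile.ZipFile(egg_path) as egg:
        source = egg.read(member).decode("utf-8")
    # Traceback line numbers are 1-based.
    return source.splitlines()[lineno - 1]
```

   It does not help with grep, which is the other half of the problem.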

   As it is I just avoid eggs. They may make sense for installing
applications but for development they get in the way.

   Neil

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Neil Hodgson

Glenn Linderman:

> That said, regexp, or some sort of cursor on a string, might be a workable
> solution.  Will it have adequate performance?  Perhaps, at least for some
> applications.  Will it be as conceptually simple as indexing an array of
> graphemes?  No.  Will it ever reach the efficiency of indexing an array of
> graphemes? No.  Does that matter? Depends on the application.

   Using an iterator for cluster access is a common technique
currently. For example, with the Pango text layout and drawing
library, you may create a PangoLayoutIter over a text layout object
(which contains a UTF-8 string along with formatting information) and
iterate by clusters by calling pango_layout_iter_next_cluster. Direct
access to clusters by index is not as useful in this domain as access
by pixel positions - for example to examine the portion of a layout
visible in a window.

   
http://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-get-iter

   In this API, 'index' is used to refer to a byte index into UTF-8,
not a character or cluster index.

   Rather than discuss functionality in the abstract, we need some use
cases involving different levels of character and cluster access to
see whether providing indexed access is worthwhile. I'll start with an
example: some text drawing engines draw decomposed characters ("o"
followed by " ̈" -> "ö") differently compared to their composite
equivalents ("ö") and this may be perceived as better or worse. I'd
like to offer an option to replace some decomposed characters with
their composite equivalent before drawing but since other characters
may look worse, I don't want to do a full normalization. The API style
that appears most useful for this example is an iterator over the
input string that yields composed and decomposed character strings
(that is, it will yield both "ö" and "ö"), each character string is
then converted if in a substitution dictionary and written to an
output string. This is similar to an iterator over grapheme clusters
although, since it is only aimed at composing sequences, the iterator
could be simpler than a full grapheme cluster iterator.
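
   A minimal sketch of that iterator style in Python, assuming that
"combining" can be approximated by a non-zero canonical combining class
from unicodedata. That is a simplification and not a full grapheme
cluster algorithm; the function names and the substitution dictionary
are illustrative.

```python
import unicodedata


def combining_sequences(text):
    """Yield each base character together with the combining marks that follow it."""
    sequence = ""
    for ch in text:
        # A character with combining class 0 starts a new sequence.
        if sequence and unicodedata.combining(ch) == 0:
            yield sequence
            sequence = ""
        sequence += ch
    if sequence:
        yield sequence


def compose_selected(text, substitutions):
    """Replace only the sequences listed in substitutions; leave the rest alone."""
    return "".join(substitutions.get(seq, seq)
                   for seq in combining_sequences(text))


# Compose o + COMBINING DIAERESIS, but deliberately leave e + COMBINING ACUTE alone.
substitutions = {"o\u0308": "\u00f6"}
result = compose_selected("o\u0308 and e\u0301", substitutions)
```

   Already composed characters pass through unchanged because they are
yielded as single-character strings that are not in the dictionary.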

   One of the benefits of iterator access to text is that many
different iterators can be built without burdening the implementation
object with extra memory costs as would be likely with techniques that
build indexes into the representation.

   Neil

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Neil Hodgson

Guido van Rossum:

> On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson  wrote:
>> [...] some text drawing engines draw decomposed characters ("o"
>> followed by " ̈" -> "ö") differently compared to their composite
>> equivalents ("ö") and this may be perceived as better or worse. I'd
>> like to offer an option to replace some decomposed characters with
>> their composite equivalent before drawing but since other characters
>> may look worse, I don't want to do a full normalization.
>
> Isn't this an issue properly solved by various normal forms?

   No, since normalization of all cases may actually lead to worse
visuals in some situations. A potential reason for drawing decomposed
characters differently is that more room may be allocated for the
generic condition where a character may be combined with a wide
variety of accents compared with combining it with a specific accent.

   Here is an example on Windows drawing composite and decomposed
forms to show the types of difference often encountered.
http://scintilla.org/Composite.png
   Now, this particular example displays both forms quite reasonably
so would not justify special processing, but I have seen cases on other
platforms and earlier versions of Windows where the umlaut in the
decomposed form is displaced to the right, even to the extent of
disappearing under the next character. In the example, the decomposed
'o' is shorter and lighter and the umlauts are round instead of
square.

   Neil

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-09-01 Thread Neil Hodgson

Glenn Linderman:

> How many different iterators into the same text would be concurrently needed
> by an application?  And why? Seems like if it is dealing with text at the
> level of grapheme clusters, it needs that type of iterator.  Of course, if
> it does I/O it needs codec access, but that is by nature sequential from the
> starting point to the end point.

   I would expect that there would mostly be a single iterator into a
string but can imagine scenarios in which multiple iterators may be
concurrently active and that these could be of different types. For
example, say we wanted to search for each code point in a text that
fails some test (such as being a member of a set of unwanted vowel
diacritics) and then display that failure in context with its
surrounding text of up to 30 graphemes either side.
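
   A minimal sketch of that scenario in current Python follows. It
walks the text by clusters and by code points at the same time. The
clusters function is only a crude stand-in for real grapheme cluster
segmentation (base character plus following combining marks), and the
names clusters and failures_in_context are hypothetical.

```python
import unicodedata


def clusters(text):
    """Crude approximation of grapheme clusters: base plus combining marks."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def failures_in_context(text, unwanted, width=30):
    """Yield (code point, context) for each code point found in unwanted.

    The context is the failing cluster plus up to width clusters either side.
    """
    cl = clusters(text)
    for i, cluster in enumerate(cl):
        for cp in cluster:  # inner iteration is by code point
            if cp in unwanted:
                start = max(0, i - width)
                yield cp, "".join(cl[start:i + width + 1])


found = list(failures_in_context("abco\u0308def", {"\u0308"}, width=2))
```

   With a width of 2 the single failure is reported with the context
"bc", the cluster containing the diaeresis, and "de".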

   Neil

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-09-01 Thread Neil Hodgson

Stephen J. Turnbull:

> ...  Eg, this is why the common GUIs for Unix (X.org, GTK+, and
> Qt) either provide or require UTF-8 coding for their text.

   Qt uses UTF-16 for its basic QString type. While QString is mostly
treated as a black box which you can create from input buffers in any
encoding, the only encoding allowed for a contents-by-reference
QString (QString::fromRawData) is UTF-16.
http://doc.qt.nokia.com/latest/qstring.html#fromRawData

   Neil

Re: [Python-Dev] Windows 8 support

2011-09-14 Thread Neil Hodgson

Austin Fernandes:

> Which versions of python will be compatible with windows8. I am using
> currently 2.7.2 version.

   Current releases of both Python 2.7 and Python 3.2 appear to run
fine on the Windows 8 Developer Preview. You should download and
install the preview to ensure that your own code is compatible.

   Neil

Re: [Python-Dev] Python as a Metro-style App

2012-01-07 Thread Neil Hodgson

Antoine Pitrou:

> When you say MoveFile is absent, is MoveFileEx supported instead?

   WinRT strongly prefers asynchronous methods for all lengthy
operations. The most likely call to use for moving files is
StorageFile.MoveAsync.
http://msdn.microsoft.com/en-us/library/windows/apps/br227219.aspx

> Depending on the extent of removed/disabled functionality, it might not
> be very interesting to have a Metro port at all.

   Asynchronous APIs will become much more important on all platforms
in the future to ensure responsive user interfaces. Python should not
be left behind.

   Neil

Re: [Python-Dev] Python as a Metro-style App

2012-01-07 Thread Neil Hodgson

Antoine Pitrou:

> How does it translate to C?

   The simplest technique would be to use C++ code to bridge from C to
the API. If you really wanted to you could explicitly call the
function pointer in the COM vtable but doing COM in C is more effort
than calling through C++.

> I'm not sure why "responsive user interfaces" would be more important
> today than 10 years ago, but at least I hope Microsoft has found
> something more usable than overlapped I/O.

   They are more important now due to the use of phones and tablets
together with distant file systems.

   Neil

Re: [Python-Dev] VS 11 Express is Metro only.

2012-05-30 Thread Neil Hodgson

Curt:

>> But will it be able to target Windows XP?

   It will likely be possible in a reasonable manner at some point. From 
http://blogs.msdn.com/b/visualstudio/archive/2012/05/18/a-look-ahead-at-the-visual-studio-11-product-lineup-and-platform-support.aspx
 :

"""C++ developers can also use the multi-targeting capability included in 
Visual Studio 11 to continue using the compilers and libraries included in 
Visual Studio 2010 to target Windows XP and Windows Server 2003. 
Multi-targeting for C++ applications currently requires a side-by-side 
installation of Visual Studio 2010. Separately, we are evaluating options for 
C++ that would enable developers to directly target XP without requiring a 
side-by-side installation of Visual Studio 2010 and intend to deliver this 
update post-RTM. """

Martin v. Löwis wrote:

> The only place where platform support matters is the CRT, and this is
> what I still want to test. E.g. it might be that the C RT works on XP,
> and the C++ RT might use newer API.

   The C++ runtime is more dependent on post-XP features than the C runtime
but even the C runtime currently needs some thunks:
http://tedwvc.wordpress.com/

   Neil


Re: [Python-Dev] Is msvcr71.dll re-redistributable?

2005-02-02 Thread Neil Hodgson

Anders J. Munch:

> 1. John X. Programmer buys the product, agrees to the EULA and puts
>    the DLL up for download, with the explicit and stated intent of
>    distributing it to anyone who needs it.

   Disallowed in 3.1(a):
# you agree: ... to distribute the Redistributables only ... in 
# conjunction with and as a part of a software application 
# product developed by you that adds significant and primary 
# functionality to the Redistributables

> Unless the EULA contains specific language to forbid such multi-stage
> open-ended redistribution, I'd say you can just re-redistribute away.

   Lawyers think like lawyers much better than developers do.

   Neil

Re: [Python-Dev] comprehension abbreviation (was: Adding any() and all())

2005-03-13 Thread Neil Hodgson

Guido van Rossum:

> - Before anybody asks, I really do think the reason this is requested
> at all is really just to save typing; there isn't the "avoid double
> evaluation" argument that helped acceptance for assignment operators
> (+= etc.), and I find readability is actually improved with 'for'.

   For me, the main motivation is to drop an unnecessarily repeated
identifier. If you repeat something there is a chance that one of the
occurrences will be wrong which is one reason behind the Don't Repeat
Yourself principle. The reader can more readily see that this is a
filter expression rather than a transforming expression.

   Neil

Re: [Python-Dev] Visual studio 2005 express now free

2006-04-24 Thread Neil Hodgson

Martin v. Löwis:

> Apparently, the status of this changed right now: it seems that
> the 2003 compiler is not available anymore; the page now says
> that it was replaced with the 2005 compiler.
>
> Should we reconsider?

   I expect Microsoft means that Visual Studio Express will be
available free forever, not that you will always be able to download
Visual Studio 2005 Express. They normally only provide a particular
product version for a limited time after it has been superseded.

   Neil

Re: [Python-Dev] unicode imports

2006-06-16 Thread Neil Hodgson

Kristján V. Jónsson:

> Although python has had full unicode support for filenames for a long time
> on selected platforms (e.g. Windows), there is one glaring deficiency:  It
> cannot import from paths containing unicode.  I've tried creating folders
> with chinese characters and adding them to path, to no avail.
> The standard install path in chinese distributions can be with a non-ANSI
> path, and installing an embedded python application there will break it.

   It should be unusual for a Chinese installation to use an install
path that can not be represented in MBCS. Try encoding the install
directory into MBCS before adding it to sys.path.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-06-27 Thread Neil Hodgson

Andrew Durdin:

> While we're discussing outstanding issues: In a related discussion of
> the path module on c.l.py, Thomas Heller pointed out that the path
> module doesn't correctly handle unicode paths:
> ...

   Here is a patch that avoids failure when paths can not be
represented in a single 8 bit encoding. It adds a _cwd variable in the
initialisation and then calls this rather than os.getcwd. I sent the
patch to Jason as well.

_base = str
_cwd = os.getcwd
try:
    if os.path.supports_unicode_filenames:
        _base = unicode
        _cwd = os.getcwdu
except AttributeError:
    pass

#...

    def getcwd():
        """ Return the current working directory as a path object. """
        return path(_cwd())

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-06-30 Thread Neil Hodgson

Guido van Rossum:

> Whoa! Do we really need a completely different mechanism for doing the
> same stuff we can already do? 

   One benefit I see for the path module is that it makes it easier to
write code that behaves correctly with unicode paths on Windows.
Currently, to implement code that may see unicode paths, you must
first understand that unicode paths may be an issue, then write
conditional code that uses either a string or unicode string to hold
paths whenever a new path is created.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-01 Thread Neil Hodgson

Thomas Heller:

> OTOH, Python is lacking a lot when you have to handle unicode strings on
> sys.path, in command line arguments, environment variables and maybe
> other places.  

   A new patch #1231336 "Add unicode for sys.argv, os.environ,
os.system" is now in SourceForge. New parallel features sys.argvu and
os.environu are provided and os.system accepts unicode arguments
similar to PEP 277. A screenshot showing why the existing features are
inadequate and the new features are an enhancement is at
http://www.scintilla.org/pyunicode.png
   One problem is that when using "python -c cmd args", sys.argvu
includes the "cmd" but sys.argv does not. They both contain the "-c".
   os.system was changed to make it easier to add some test cases but
then that looked like too much trouble. There are far too many
variants on exec*, spawn* and popen* to write a quick patch for these.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-03 Thread Neil Hodgson

Guido van Rossum:

> Then maybe the code that handles Unicode paths in arguments should be
> fixed rather than adding a module that encapsulates a work-around...

   It isn't clear whether you are saying this should be fixed by the
user or in the library. For a quick example, say someone wrote some
code for counting lines in a directory:

import os
root = "docs"
lines = 0
for p in os.listdir(root):
    lines += len(file(os.path.join(root,p)).readlines())
print lines, "document lines"

   Quite common code. Running it now with one file "abc" in the
directory yields correct behaviour:

>pythonw -u "xlines.py"
1 document lines

   Now copy the file "Здравствуйте" into the directory and run it again:

>pythonw -u "xlines.py"
Traceback (most recent call last):
  File "xlines.py", line 5, in ?
    lines += len(file(os.path.join(root,p)).readlines())
IOError: [Errno 2] No such file or directory: 'docs\\'

   Changing line 2 to [root = u"docs"] will make the code work. If
this is the correct fix then all file handling code should be written
using unicode names.

   Contrast this to using path:

import path
root = "docs"
lines = 0
for p in path.path(root).files():
    lines += len(file(p).readlines())
print lines, "document lines"

   The obvious code works with only "abc" in the directory and also
when "Здравствуйте" is added.

   Now, if you are saying it is a library failure, then there are
multiple ways to fix it.

   1) os.listdir should always return unicode. The problem with this
is that people will see breakage of existing scripts because of
promotion issues. Much existing code assumes a fixed locale, often
8859-1, and combining unicode strings with byte strings containing
accented characters will raise UnicodeDecodeError.

   2) os.listdir should not return "???" garbage, instead
promoting to unicode whenever it sees garbage. This may also lead to
UnicodeDecodeError as in (1).

   3) This is an exceptional situation but the exception should be
more explicit and raised earlier when os.listdir first encounters name
garbage.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-04 Thread Neil Hodgson

Thomas Heller:

> Not only that, all the other flags like -O and -E are also in sys.argvu
> but not in sys.argv.

   OK, new patch fixes these and the "-c" issue.

> Those are nearly obsoleted by the subprocess module (although I do not
> know how that handles unicode).

   It breaks. The argspec is zzOOiiOzO:CreateProcess.

>>> z = subprocess.Popen(u"cmd /c echo \u0417")
Traceback (most recent call last):
  File "", line 1, in ?
  File "c:\zed\python\dist\src\lib\subprocess.py", line 600, in __init__
    errread, errwrite)
  File "c:\zed\python\dist\src\lib\subprocess.py", line 791, in _execute_child
    startupinfo)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0417' in
position 12: ordinal not in range(128)

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-06 Thread Neil Hodgson

Guido van Rossum:

> Ah, sigh. I didn't know that os.listdir() behaves differently when the
> argument is Unicode. Does os.listdir(".") really behave differently
> than os.listdir(u".")? 

   Yes:
>>> os.listdir(".")
['abc', '']
>>> os.listdir(u".")
[u'abc', 
u'\u0417\u0434\u0440\u0430\u0432\u0441\u0442\u0432\u0443\u0439\u0442\u0435']

> Bah! I don't think that's a very good design
> (although I see where it comes from). 

   Partly my fault. At the time I was more concerned with making
functionality possible rather than convenient.

> Promoting only those entries
> that need it seems the right solution -- user code that can't deal
> with the Unicode entries shouldn't be used around directories
> containing unicode -- if it needs to work around unicode it should be
> fixed to support that!

   OK, I'll work on a patch for that but I'd like to see the opinions
of the usual unicode guys as this will produce more opportunities for
UnicodeDecodeError. The modification will probably work in the
opposite way, asking for all the names in unicode and then attempting
to convert to the default code page with failures retaining the
unicode name.
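
   The approach of asking for unicode and then demoting can be sketched
as follows. This is written for current Python, where byte strings are
bytes and unicode strings are str, and it takes a list of names rather
than calling the operating system so that it stays self-contained. The
function name demote_names and the choice of cp1252 as the code page
are assumptions for illustration only, not the contents of any patch.

```python
def demote_names(names, encoding="ascii"):
    """Return byte strings for names that fit encoding, else keep the text name."""
    result = []
    for name in names:
        try:
            # Strict mode: any character outside the encoding raises.
            result.append(name.encode(encoding, "strict"))
        except UnicodeEncodeError:
            # Conversion failed, so retain the unicode name.
            result.append(name)
    return result


names = [
    "abc",
    "\u0417\u0434\u0440\u0430\u0432\u0441\u0442\u0432\u0443\u0439\u0442\u0435",
]
mixed = demote_names(names, "cp1252")
```

   With a Western European code page the first name comes back as a
byte string and the Cyrillic name stays as unicode, giving the mixed
list discussed above.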

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-08 Thread Neil Hodgson

Thomas Heller:

> OTOH, I once had a bug report from a py2exe user who complained that the
> program didn't start when installed in a path with japanese characters
> on it.  I tried this out, the bug existed (and still exists), but I was
> astonished how many programs behaved the same: On a PC with english
> language settings, you cannot start WinZip or Acrobat Reader (to give
> just some examples) on a .zip or .pdf file contained in such a
> directory.

   Much of the time these sorts of bugs don't make themselves too hard
to live with because most non-ASCII names that any user encounters
are still in the user's locale and so get mapped by Windows. It can be
a lot of work supporting wide file names. I have just added wide file
name support to my editor, SciTE, for the second time and am about to
rip it out again as it complicates too much code for too few
beneficiaries. (I want one executable for both Windows NT+ and 9x, so
wide file name support has to be a runtime choice, leading to maybe 50
new branches in the code).

   If returning a mixture of unicode and narrow strings from
os.listdir is the right thing to do then maybe it is better for sys.argv
and os.environ to also be mixtures. In patch #1231336 I added parallel
attributes, sys.argvu and os.environu to hold unicode versions of this
information. The alternative, placing unicode items in the existing
attributes minimises API size.

   One question here is whether unicode items should be added only
when the element is outside the user's locale (the CP_ACP code page)
or whenever the item is outside ASCII. The former is more similar to
existing behaviour but the latter is safer as it makes it harder to
implicitly treat the data as being in an incorrect encoding.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-08 Thread Neil Hodgson

Thomas Heller:

> But adding u'\u5b66\u6821\u30c7\u30fc' to sys.path won't allow to import
> this file as module.  Internally Python\import.c converts everything to
> strings.  I started to refactor import.c to work with PyStringObjects
> instead of char buffers as a first step - PyUnicodeObjects could have
> been added later, but I gave up because there seems absolute zero
> interest in it.

   Well, most people when confronted with this will rename the
directory to something simple like "ulib" and continue.

> I can't judge on this - but it's easy to experiment with it, even in
> current Python releases since sys.argvu, os.environu can also be
> provided by extension modules.

   It is the effect of this on the non-unicode-savvy that is
important: if os.environu goes into prereleases of 2.5 then the only
people that will use it are likely to be those who already try to keep
their code unicode compliant. There is only likely to be (negative)
feedback if existing features are made unicode-only or use unicode for
non-ASCII.

> But thanks that you care about this stuff - I'm a little bit worried
> because all the other folks seem to think everything's ok (?).

   Unicode is becoming more of an issue: many Linux distributions now
install by default with a UTF8 locale and other tools are starting to
use this: GCC 4 now delivers error messages using Unicode quote
characters like 'these' rather than `these'. There are 131 threads
found by Google Groups for (UnicodeEncodeError OR UnicodeDecodeError)
and 21 of these were in this June. A large proportion of the threads
are in language-specific groups so are not as visible to core
developers.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-09 Thread Neil Hodgson

M.-A. Lemburg:

> I don't really buy this "trick": what if you happen to have
> a home directory with Unicode characters in it ?

   Most people choose account names and thus home directory names that
are compatible with their preferred locale settings: German users are
unlikely to choose an account name that uses Japanese characters.
Unicode is only necessary for file names that are outside your default
locale. An administration utility may need to visit multiple users'
home directories and so is more likely to encounter files with names
that can not be represented in its default locale.

   I think it would be better if sys.path could include unicode
entries but expect the code will rarely be exercised.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-11 Thread Neil Hodgson

Guido van Rossum:

> In some sense the safest approach from this POV would be to return
> Unicode as soon as it can't be encoded using the global default
> encoding. IOW normally this would return Unicode for all names
> containing non-ASCII characters.

   On unicode versions of Windows, for attributes like os.listdir,
os.getcwd, sys.argv, and os.environ, which can usefully return unicode
strings, there are 4 options I see:

1) Always return unicode. This is the option I'd be happiest to use,
myself, but expect this choice would change the behaviour of existing
code too much and so produce much unhappiness.

2) Return unicode when the text can not be represented in ASCII. This
will cause a change of behaviour for existing code which deals with
non-ASCII data.

3) Return unicode when the text can not be represented in the default
code page. While this change can lead to breakage because of combining
byte string and unicode strings, it is reasonably safe from the point
of view of data integrity as current code is returning garbage strings
that look like '?'.

4) Provide two versions of the attribute, one with the current name
returning byte strings and a second with a "u" suffix returning
unicode. This is the least intrusive, requiring explicit changes to
code to receive unicode data. For patch #1231336 I chose this approach
producing sys.argvu and os.environu.

For os.listdir the current behaviour of returning unicode when its
argument is unicode can be retained but that is not extensible to, for
example, sys.argv.

   Since this issue may affect many attributes a common approach
should be chosen.

   For experimenting with os.listdir, there is a patch for
posixmodule.c at http://www.scintilla.org/difft.txt which implements
(2). To specify the US-ASCII code page, the number 20127 is used as
there is no definition for this in the system headers. To change to
(3) comment out the line with 20127 and uncomment the line with
CP_ACP. Unicode arguments produce unicode results.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-11 Thread Neil Hodgson

M.-A. Lemburg:

> It's naive to assume that all people in Germany using the German
> locale have German names ;-) 

   That is not an assumption I would make. The assumption I would make
is that if it is important to you to have your account name in a
particular character set then you will normally set your locale to
enable easy use of that account.

> I'm not sure why you bring up an administration tool: isn't
> the discussion about being able to load Python modules from
> directories with Unicode path components ?

   The discussion has moved between various aspects of unicode support
in Python. There are many areas of the Python library which are not
compatible with unicode and having an idea of the incidence of
particular situations helps define where effort is most effectively
spent. My experience has been that because of the way Windows handles
character set conversions, problems are less common on individuals'
machines than they are on servers.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-11 Thread Neil Hodgson

M.-A. Lemburg:

> > 2) Return unicode when the text can not be represented in ASCII. This
> > will cause a change of behaviour for existing code which deals with
> > non-ASCII data.
> 
> +1 on this one (s/ASCII/Python's default encoding).

   I assume you mean the result of sys.getdefaultencoding() here.
Unless much of the Python library is modified to use the default
encoding, this will break. The problem is that different implicit
encodings are being used for reading data and for accessing files.
When calling a function, such as open, with a byte string, Python
passes that byte string through to Windows which interprets it as
being encoded in CP_ACP. When this differs from
sys.getdefaultencoding() there will be a mismatch.

   Say I have been working on a machine set up for Australian English
(or other Western European locale) but am working with Russian data so
have set Python's default encoding to cp1251. With this simple script,
g.py:

import sys
print file(sys.argv[1]).read()

   I process a file called '€.txt' with contents "European Euro" to produce

C:\zed>python_d g.py €.txt
European Euro

   With the proposed modification, sys.argv[1] u'\u20ac.txt' is
converted through cp1251 to '\x88.txt' as the Euro is located at 0x88
in CP1251. The operating system is then asked to open '\x88.txt' which
it interprets through CP_ACP to be u'\u02c6.txt' ('ˆ.txt') which then
fails. If you are very unlucky there will be a file called 'ˆ.txt' so
the call will succeed and produce bad data.

   Simulating with str(sys.argvu[1]):

C:\zed>python_d g.py €.txt
Traceback (most recent call last):
  File "g.py", line 2, in ?
    print file(str(sys.argvu[1])).read()
IOError: [Errno 2] No such file or directory: '\x88.txt'
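
   The mismatch can be reproduced without touching the file system by
doing the two conversions by hand. This sketch uses current Python and
assumes that CP_ACP on the Western European machine is cp1252. The
variable names are only for illustration.

```python
name = "\u20ac.txt"                      # the real file name, as unicode

# Step 1: sys.argv is built by encoding into Python's default encoding.
as_bytes = name.encode("cp1251")         # Euro is 0x88 in cp1251

# Step 2: the operating system interprets the byte string through CP_ACP.
seen_by_os = as_bytes.decode("cp1252")   # 0x88 is U+02C6 in cp1252

round_trips = (seen_by_os == name)
```

   The name that reaches the file system starts with U+02C6 rather than
U+20AC, so the open fails or, worse, finds a different file.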

> -1: code pages are evil and the reason why Unicode was invented
> in the first place. This would be a step back in history.

   Features used to specify files (sys.argv, os.environ, ...) should
match functions used to open and perform other operations with files
as they do currently. This means their encodings should match.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-12 Thread Neil Hodgson

   Hi Marc-Andre,

> >With the proposed modification, sys.argv[1] u'\u20ac.txt' is
> > converted through cp1251
> 
> Actually, it is not: if you pass in a Unicode argument to
> one of the file I/O functions and the OS supports Unicode
> directly or at least provides the notion of a file system
> encoding, then the file I/O should use the Unicode APIs
> of the OS or convert the Unicode argument to the file system
> encoding. AFAIK, this is how posixmodule.c already works
> (more or less).

   Yes it is. The initial stage is reading the command line arguments.
The proposed modification is to change behaviour when constructing
sys.argv, os.environ or when calling os.listdir to "Return unicode
when the text can not be represented in Python's default encoding". I
take this to mean that when the value can be represented in Python's
default encoding then it is returned as a byte string in the default
encoding.

   Therefore, for the example, the code that sets up sys.argv has to
encode the unicode command line argument into cp1251.

> On input, file I/O APIs should accept both strings using
> the default encoding and Unicode. How these inputs are then
> converted to suit the OS is up to the OS abstraction layer, e.g.
> posixmodule.c.

   This looks to me to be insufficiently compatible with current
behaviour which accepts byte strings outside the default encoding.
Existing code may call open("€.txt"). This is perfectly legitimate
current Python (with a coding declaration) as "€.txt" is a byte string
and file systems will accept byte string names. Since the standard
default encoding is ASCII, should such code raise UnicodeDecodeError?

> Changing this is easy, though: instead of using the "et"
> getargs format specifier, you'd have to use "es". The latter
> recodes strings based on the default encoding assumption to
> whatever other encoding you specify.

   Don't you want to convert these into unicode rather than another
byte string encoding? It looks to me as though the "es" format always
produces byte strings and the only byte string format that can be
passed to the operating system is the file system encoding which may
not contain all the characters in the default encoding.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-16 Thread Neil Hodgson

Martin v. Löwis:

> - But then, the wide API gives all results as Unicode. If you want to
>   promote only those entries that need it, it really means that you
>   only want to "demote" those that don't need it. But how can you tell
>   whether an entry needs it? There is no API to find out.

   I wrote a patch for os.listdir at
http://www.scintilla.org/difft.txt that uses WideCharToMultiByte to
check if a wide name can be represented in a particular code page and
only uses that representation if it fits. This is good for Windows
code pages including ASCII and "mbcs" but since Python's
sys.getdefaultencoding() can be something that has no code page
equivalent, it would have to try converting using strict mode and
interpret failure as leaving the name as unicode.
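
   A minimal sketch of that strict-mode fallback, not the patch itself. It
is written for a current Python where text is unicode and byte strings are
bytes; the function name demote_if_representable and the choice of cp1252
are assumptions made for illustration.

```python
def demote_if_representable(name, encoding):
    """Return bytes when `name` fits `encoding` exactly, else the text unchanged."""
    try:
        # errors="strict" raises instead of substituting a replacement character.
        return name.encode(encoding, "strict")
    except UnicodeEncodeError:
        # Failure is interpreted as "leave the name as unicode".
        return name


# Hypothetical directory entries: ASCII, a euro sign, and a Cyrillic letter.
names = ["readme.txt", "\u20ac.txt", "\u0436.txt"]
results = [demote_if_representable(n, "cp1252") for n in names]
```

   With cp1252 the first two names come back as byte strings and the
Cyrillic name stays unicode, since cp1252 has no code for it.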

>   You could declare that anything with characters >128 needs it,
>   but that would be an incompatible change: If a character >128 in
>   the system code page is in a file name, listdir currently returns
>   it in the system code page. It then would return a Unicode string.

   I now quite like returning unicode for anything non-ASCII on
Windows as there is no ambiguity in what the result means and there
will be no need to change all the system calls to translate from the
default encoding. It is a change to the API which can lead to code
breaking but it should break with an exception. Assuming that byte
string arguments are using Python's default encoding looks more
dangerous with a behavioural change but no notification.

   Neil

Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

2005-07-17 Thread Neil Hodgson

Martin v. Löwis:

> This appears to be based on the usedDefault return value of
> WideCharToMultiByte. I believe this is insufficient:
> WideCharToMultiByte might convert Unicode characters to
> codepage characters in a lossy way, without using the default
> character. For example, it converts U+0308 (combining diaeresis)
> to U+00A8 (diaeresis) (or something like that, I forgot the
> exact details). So if you have, say, "p-umlaut" (i.e. U+0070
> U+0308), it converts it to U+0070 U+00A8 (in the local code page).
> Trying to use this as a filename later fails.

   There is WC_NO_BEST_FIT_CHARS to defeat that. It says that it will
use the default character if the translation can't be round-tripped.
Available on Windows 2000 and XP but not NT4. We could compare the
original against the round-tripped as described at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_2bj9.asp
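
   The comparison step can be sketched in Python. This only illustrates
the round-trip check: Python's own codecs do not perform the Windows
best-fit mapping, so the lossy conversion here is simulated with the
"replace" error handler. The function name round_trips is hypothetical.

```python
def round_trips(name, encoding):
    """True when encoding `name` and decoding it again gives back the same text."""
    # "replace" makes the conversion lossy instead of raising:
    # characters with no mapping become "?".
    encoded = name.encode(encoding, "replace")
    return encoded.decode(encoding) == name


# U+0070 U+0308 is the "p-umlaut" example; cp1252 has no combining diaeresis.
plain_ok = round_trips("caf\u00e9", "cp1252")
p_umlaut_ok = round_trips("p\u0308", "cp1252")
```

   A name that fails the check would be left as unicode rather than
demoted to the code page.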

   Neil

Re: [Python-Dev] Replacement for print in Python 3.0

2005-09-02 Thread Neil Hodgson

Gareth McCaughan:

> 3. It's convenient for debugging, interactive use, simple scripts,
>    and various other things.

   Interactive use is its own mode and works differently to the base
language. To print the value of something, just type an expression.
Python will evaluate and print the value of the expression. Much
easier than adding 'print '. Extended interactive modes like ipython
include other conveniences that don't belong in the python language.

   The problem with print is it becomes a barrier to extending a
script into something more ambitious. This then leads to ugly
'features' like '>>' and trailing commas. By all means provide a
simple syntax for i/o with the standard streams but ensure it is
something that is a firm basis for extension.

   Neil

Re: [Python-Dev] Replacement for print in Python 3.0

2005-09-04 Thread Neil Hodgson

Gareth McCaughan:

> >Interactive use is its own mode and works differently to the base
> > language. To print the value of something, just type an expression.
> 
> Doesn't do the same thing.

   In interactive mode, you are normally interested in the values of
things, not their formatting so it does the right thing. If you need
particular formatting or interpretation, you can always achieve this.

> Do you have any suggestion that's as practically usable
> as "print"?

   The print function proposal is already as usable as the print
statement. When I write a print statement, I'd like to be able to
redirect that to a log or GUI easily. If print is a function then its
interface can be reimplemented but users can't add new statements to
Python.
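
   As a sketch of what reimplementing the interface could look like,
assuming a print-like function taking sep and end keyword arguments. The
names make_printer and emit are hypothetical and the in-memory stream
stands in for a log or GUI widget.

```python
import io


def make_printer(stream):
    """Return a print-like function that writes to `stream`."""
    def emit(*args, sep=" ", end="\n"):
        # Stringify each argument, join with the separator, add the terminator.
        stream.write(sep.join(str(a) for a in args) + end)
    return emit


log = io.StringIO()          # stand-in for a log file or GUI text control
emit = make_printer(log)
emit("a=", 1)
emit("x", "y", sep="-")
```

   Because it is an ordinary function, a module could bind the name it
uses for output to emit and redirect everything without touching the
call sites.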

   Creation of strings containing values could be simplified as that
would be applicable in many cases. I actually like being able to
append to strings in Java with the second operand being stringified.
Perhaps a stringify and catenate operator could be included in Python.
Like this:
MessageBox("a=" ° a ° "pos=" ° x°","°y)
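
   No such operator exists in Python, so as a rough stand-in the same
effect can be sketched with an ordinary function. The name cat is
hypothetical.

```python
def cat(*parts):
    """Stringify every part and join them with no separator."""
    return "".join(str(p) for p in parts)


a, x, y = 5, 10, 20
message = cat("a=", a, " pos=", x, ",", y)
```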
   
   Neil

Re: [Python-Dev] international python

2005-09-09 Thread Neil Hodgson

Antoine Pitrou:

> As for seamless unicode support, there are also problems sometimes with
> filenames and filepaths: see e.g.
> https://sourceforge.net/tracker/?func=detail&aid=1283895&group_id=5470&atid=105470

   This bug report is using byte string arguments causing byte string
processing rather than unicode calls with unicode processing. Windows
code that may encounter file paths outside the default locale should
stick to unicode for paths. Try converting os.curdir to unicode before
calling other functions:

os.path.abspath(unicode(os.curdir))

   Neil

Re: [Python-Dev] international python

2005-09-09 Thread Neil Hodgson

Antoine Pitrou:

> I don't have a Windows machine at hand right now to test it, but, even
> if this solution works, it breaks the principle of least astonishment:

   Astonishment is subjective and so a poor tool to measure by. At one
stage Ruby tried to follow the more common formulation "principle of
least surprise" (POLS) but this produced arguments of the following
form:

   I am surprised by X.
   Therefore, X contradicts POLS.
   Therefore, X must be fixed.

   POLS was then abandoned.

> os.path.abspath() should do the Right Thing regardless of what the
> current locale is.

   This was discussed recently and the consensus position was for
functions that can not return a value in the default encoding to
instead return a unicode value. Correct implementation of this would
require not only changing the behaviour of functions returning strings
but also those receiving strings (which should treat byte strings as
being in the default encoding). This would require a large amount of
work, and is unlikely to be performed in the near future.

   Neil

Re: [Python-Dev] Pythonic concurrency

2005-10-10 Thread Neil Hodgson

Bruce Eckel:

> I would say that the troublesome meme is that "threads are easy." I
> posted an earlier, rather longish message about this. The gist of
> which was: "when someone says that threads are easy, I have no idea
> what they mean by it."

   I think you are overcomplicating the issue by looking at too many
levels at once. The memory model is something that implementers of
threading support need to understand. Users of that threading support
just need to know that concurrent access to variables is dangerous and
that they should use locks to access shared variables or use other
forms of packaged inter-thread communication.
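
   A minimal sketch of the user-level rule, using Python's threading
module. The Counter class and run_workers function are hypothetical names
for illustration.

```python
import threading


class Counter:
    """A shared value that is only ever changed while holding its lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:          # lock released automatically on exit
            self.value += 1


def run_workers(counter, workers=4, per_worker=10000):
    """Increment `counter` from several threads and return the final value."""
    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = run_workers(Counter())
```

   The caller needs no knowledge of the memory model, only that every
access to the shared value goes through the lock.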

   Double Checked Locking is an optimization (removal of a lock) of an
attempt to better modularize code (by automating the helper object
creation). I'd either just leave the lock in or if benchmarking
revealed an unacceptable performance problem, allocate the helper
object before the resource is accessible to more than one thread. For
statics, expose an Init method that gets called when the application
is in the initial one user thread state.
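
   Both alternatives can be sketched as follows, in Python rather than
Java. The class names and the factory argument are assumptions made for
the example.

```python
import threading


class LockedHelper:
    """Lazy creation that simply keeps the lock on every access."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._helper = None

    def get(self):
        # No unlocked "first check": the lock is always taken.
        with self._lock:
            if self._helper is None:
                self._helper = self._factory()
            return self._helper


class EagerHelper:
    """Helper allocated up front, before other threads can see the object."""

    def __init__(self, factory):
        self._helper = factory()

    def get(self):
        return self._helper


created = []


def make_helper():
    # Records each creation so the number of helpers built can be checked.
    created.append(object())
    return created[-1]


lazy = LockedHelper(make_helper)
first = lazy.get()
second = lazy.get()
eager = EagerHelper(make_helper)
```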

> But I just finished a 150-page chapter on Concurrency in Java which
> took many months to write, based on a large chapter on Concurrency in
> C++ which probably took longer to write. I keep in reasonably good
> touch with some of the threading experts. I can't get any of them to
> say that it's easy, even though they really do understand the issues
> and think about it all the time. *Because* of that, they say that it's
> hard.

   Implementing threading is hard. Using threading is not that hard.
It's a source of complexity but so are many aspects of development. I
get scared by reentrance in UI code.

   Neil

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Neil Hodgson

Guido van Rossum:

> Folks, please focus on what Python 3000 should do.
>
> I'm thinking about making all character strings Unicode (possibly with
> different internal representations a la NSString in Apple's Objective
> C) and introduce a separate mutable bytes array data type. But I could
> use some validation or feedback on this idea from actual
> practitioners.

   I'd like to more tightly define Unicode strings for Python 3000.
Currently, Unicode strings may be implemented with either 2 byte
(UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to
contain any Unicode character and should be indexable yielding
characters rather than half characters. Therefore Python strings
should appear to be UTF-32. There could still be multiple
implementations (using UTF-16 or UTF-8) to preserve space but all
implementations should appear to be the same apart from speed and
memory use.
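
   The difference between indexing characters and indexing half
characters can be seen in a small sketch. It assumes a current Python
(3.3 or later, which postdates this thread), where strings index by code
point regardless of internal storage.

```python
clef = "\U0001D11E"          # MUSICAL SYMBOL G CLEF, outside the base plane
text = "a" + clef + "b"

code_points = len(text)                            # whole characters
utf16_units = len(text.encode("utf-16-le")) // 2   # 2-byte elements
middle = text[1]                                   # a full character, not a surrogate
```

   The string holds three characters but needs four 2-byte elements in
UTF-16, which is where half characters would show up if the elements
were exposed directly.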

   Neil

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Neil Hodgson

Martin v. Löwis:

> That's very tricky. If you have multiple implementations, you make
> usage at the C API difficult. If you make it either UTF-8 or UTF-32,
> you make PythonWin difficult. If you make it UTF-16, you make indexing
> difficult.

   For Windows, the code will get a little uglier, needing to perform
an allocation/encoding and deallocation more often than at present but
I don't think there will be a speed degradation as Windows is
currently performing a conversion from 8 bit to UTF-16 inside many
system calls. To minimize the cost of allocation, Python could copy
Windows in keeping a small number of commonly sized preallocated
buffers handy.

   For indexing UTF-16, a flag could be set to show if the string is
all in the base plane and if not, an index could be constructed when
and if needed. It'd be good to get some feel for what proportion of
string operations performed require indexing. Many, such as
startswith, split, and concatenation don't require indexing. The
proportion of operations that use indexing to scan strings would also
be interesting as adding a (currentIndex, currentOffset) cursor to
string objects would be another approach.
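
   A rough sketch of the flag plus lazily built index, in pure Python
rather than C. The class Utf16Text and its attribute names are
hypothetical, and the input is assumed to be well-formed text.

```python
import struct


class Utf16Text:
    """Text stored as UTF-16 code units but indexed by character."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        self.units = struct.unpack("<%dH" % (len(data) // 2), data)
        # Flag: True when no surrogates are present (all in the base plane).
        self.all_bmp = not any(0xD800 <= u <= 0xDFFF for u in self.units)
        self._offsets = None      # built only when and if needed

    def _index(self):
        if self._offsets is None:
            offsets, i = [], 0
            while i < len(self.units):
                offsets.append(i)
                # A high surrogate starts a two-unit character.
                i += 2 if 0xD800 <= self.units[i] <= 0xDBFF else 1
            self._offsets = offsets
        return self._offsets

    def __len__(self):
        return len(self.units) if self.all_bmp else len(self._index())

    def __getitem__(self, index):
        if self.all_bmp:
            return chr(self.units[index])      # fast path, no index needed
        start = self._index()[index]
        unit = self.units[start]
        if 0xD800 <= unit <= 0xDBFF:
            low = self.units[start + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)


mixed = Utf16Text("a\U0001D11Eb")
plain = Utf16Text("abc")
```

   Strings that stay in the base plane never pay for the index, and
operations that do not index at all never trigger its construction.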

   Neil
