Re: [Python-Dev] Cut/Copy/Paste items in IDLE right click context menu
Nick Coghlan: > - no need for extensive cross-OS testing prior to commit, that's a key > part of the role of the buildbots Are the buildbots able to test UI features like menu selections? Neil ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] cffi in stdlib
Armin Rigo: > So the general answer to your question is: we google MessageBox and > copy that line from the microsoft site, and manually remove the > unnecessary WINAPI and _In_opt declarations: Wouldn't it be better to understand the SAL annotations like _In_opt so that spurious NULLs (for example) produce a good exception from cffi instead of failing inside the system call? Neil
Re: [Python-Dev] cffi in stdlib
Armin Rigo: > Maybe. Feel like adding an issue to > https://bitbucket.org/cffi/cffi/issues, with references? OK, issue #62 added. > This looks > like a Windows-specific extension, which means that I don't > automatically know about it. While SAL is Windows-specific, gcc supports some similar attributes including nonnull and sentinel. Neil
Re: [Python-Dev] IDLE in the stdlib
Terry Reedy: > Broken (and quirky): it has an absurdly limited output buffer (under a > thousand lines) The limit is actually lines. > Quirky: Windows uses cntl-C to copy selected text to the clipboard and (where > appropriate) cntl-V to insert clipboard text at the cursor pretty much > everywhere. CP uses Ctrl+C to interrupt programs similar to Unix. Therefore it moves copy to a different key in a similar way to Unix consoles like GNOME Terminal and MATE Terminal which use Shift+Ctrl+C for copy despite Ctrl+C being the standard for other applications. Neil
Re: [Python-Dev] Ext4 data loss
The technique advocated by Theodore Ts'o (save to temporary then rename) discards metadata. What would be useful is a simple, generic way in Python to copy all the appropriate metadata (ownership, ACLs, ...) to another file so the temporary-and-rename technique could be used. On Windows, there is a hack in the file system that tries to track the use of temporary-and-rename and reapply ACLs and on OS X there is a function FSPathReplaceObject but I don't know how to do this correctly on Linux. Neil
Re: [Python-Dev] Ext4 data loss
Antoine Pitrou: > How about shutil.copystat()? shutil.copystat does not copy over the owner, group or ACLs. Modeling a copymetadata method on copystat would provide an easy to understand API and should be implementable on Windows and POSIX. Reading the OS X documentation shows a set of low-level POSIX functions for ACLs. Since there are multiple pieces of metadata and an application may not want to copy all pieces, there could be multiple methods (copygroup ...) or one method with options: shutil.copymetadata(src, dst, group=True, resource_fork=False) Neil
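As a rough illustration of the copymetadata idea above, here is a minimal Python sketch. The name copymetadata and its owner/group keyword arguments are hypothetical (no such function exists in shutil). It layers os.chown on top of shutil.copystat, does not attempt ACLs or resource forks, and quietly skips ownership changes the caller is not permitted to make.

```python
import os
import shutil
import stat
import tempfile


def copymetadata(src, dst, owner=True, group=True):
    """Hypothetical helper: copy mode bits, times and, where allowed, owner/group."""
    # shutil.copystat copies permission bits, timestamps and flags,
    # but not the owner or the group.
    shutil.copystat(src, dst)
    if not hasattr(os, "chown"):  # os.chown is not available on Windows
        return
    st = os.stat(src)
    # Passing -1 leaves that id unchanged, so owner and group are tried separately.
    for uid, gid, wanted in ((st.st_uid, -1, owner), (-1, st.st_gid, group)):
        if not wanted:
            continue
        try:
            os.chown(dst, uid, gid)
        except PermissionError:
            pass  # e.g. a non-root user trying to give a file away


# Temporary-and-rename using the helper.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "data.txt")
    with open(target, "w") as f:
        f.write("old\n")
    os.chmod(target, 0o640)
    mode_before = stat.S_IMODE(os.stat(target).st_mode)

    tmp = target + ".tmp"
    with open(tmp, "w") as f:
        f.write("new\n")
    copymetadata(target, tmp)
    os.replace(tmp, target)  # rename the temporary over the original

    mode_after = stat.S_IMODE(os.stat(target).st_mode)
    with open(target) as f:
        content_after = f.read()
```

Trying owner and group separately means a failure to change the owner (the usual case for a non-root user) does not prevent the group from being copied.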
Re: [Python-Dev] Ext4 data loss
Antoine Pitrou: > It depends on what you call "ACLs". It does copy the chmod permission bits. Access Control Lists are fine grained permissions. Perhaps you want to allow Sam to read a file and for Ted to both read and write it. These permissions should not need to be reset every time you modify the file. > As for owner and group, I think there is a very good reason that it doesn't > copy > them: under Linux, only root can change these properties. Since I am a member of both "staff" and "everyone", I can set group on one of my files from "staff" to "everyone" or back again:

    $ chown :everyone x.pl
    $ ls -la x.pl
    -rwxrwxrwx 1 nyamatongwe everyone 269 Mar 11 2008 x.pl
    $ chown :staff x.pl
    $ ls -la x.pl
    -rwxrwxrwx 1 nyamatongwe staff 269 Mar 11 2008 x.pl

Neil
Re: [Python-Dev] Evaluated cmake as an autoconf replacement
Jeffrey Yasskin: > 1. It can autogenerate the Visual Studio project files instead of > needing them to be maintained separately I have looked at a couple of build tools (scons was probably one) that generate Visual Studio project files in the past and they produced fairly poor project files, which would compile the code but wouldn't be as capable as project files created by hand. It's been a while so I can't remember the details. The current Python project files are hierarchical, building several DLLs and an EXE, and I think this was outside the scope of the tools I looked at. Neil
Re: [Python-Dev] Evaluated cmake as an autoconf replacement
cmake does not produce relative paths in its generated make and project files. There is an option CMAKE_USE_RELATIVE_PATHS which appears to do this but the documentation says: """This option does not work for more complicated projects, and relative paths are used when possible. In general, it is not possible to move CMake generated makefiles to a different location regardless of the value of this variable.""" This means that generated Visual Studio project files will not work for other people unless a particular absolute build location is specified for everyone which will not suit most. Each person that wants to build Python will have to run cmake before starting Visual Studio thus increasing the prerequisites. Neil
Re: [Python-Dev] Support for Python/Windows
Curt Hagenlocher: > Ah, you're right -- the PGO bits probably need VS Pro. The 64-bit > compilers should be in the Windows SDK, but it wouldn't surprise me if > they were not included in Express. 64-bit isn't in Express and merging the 64 bit compiler from the SDK into Express may be possible but certainly isn't easy. I just use the command line compiler to check 64 bit issues. Neil
Re: [Python-Dev] command line attachable debugger
Glyph Lefkowitz: > Sounds like this is moving into hypothetical territory better-suited to > python-ideas. (Although I'm sure that if you wanted to contribute polished, > tested code for a standard remote debugger interface, few people would > complain.) There is a remote debugger protocol called DBGP for different languages (including Python) and debuggers (such as Komodo) http://xdebug.org/docs-dbgp.php Neil
Re: [Python-Dev] mingw32 and gc-header weirdness
Martin v. Löwis: > I propose to add another (regular) double into the union. Adding a regular double as a second dummy gives the same sizes and alignments with Mingw or MSVC as the original definition with MSVC:

    typedef union _gc_head {
        struct {
            union _gc_head *gc_next;
            union _gc_head *gc_prev;
            Py_ssize_t gc_refs;
        } gc;
        long double dummy;  /* force worst-case alignment */
        double dummy2;      /* in case long double doesn't trigger worst-case */
    } PyGC_Head;

In regard to alignment penalties, a simple copy loop for doubles runs about 20% slower when misaligned on my AMD processor. Other x86 processors can be much worse, as much as 2 to 3.25 times slower according to http://msdn.microsoft.com/en-us/library/aa290049%28VS.71%29.aspx Neil
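The size and alignment that a given ABI assigns to this union can be inspected from Python with ctypes. This is only a sketch: the class names are made up for illustration, the two pointers are modelled as c_void_p, and the numbers reflect the platform the running interpreter was built for rather than a side-by-side Mingw/MSVC comparison.

```python
import ctypes


class GCFields(ctypes.Structure):
    # Stand-in for the anonymous struct inside the union; the pointers
    # are modelled as void pointers, which have the same size and alignment.
    _fields_ = [
        ("gc_next", ctypes.c_void_p),
        ("gc_prev", ctypes.c_void_p),
        ("gc_refs", ctypes.c_ssize_t),
    ]


class GCHead(ctypes.Union):
    _fields_ = [
        ("gc", GCFields),
        ("dummy", ctypes.c_longdouble),  # force worst-case alignment
        ("dummy2", ctypes.c_double),     # in case long double doesn't
    ]


for ctype in (ctypes.c_double, ctypes.c_longdouble, GCFields, GCHead):
    print(ctype.__name__, ctypes.sizeof(ctype), ctypes.alignment(ctype))
```

Whatever long double turns out to be on a platform, the union's alignment can never fall below that of a regular double, which is the property the extra member is meant to guarantee.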
Re: [Python-Dev] mingw32 and gc-header weirdness
Martin v. Löwis: > Yes: alignof(PyGC_HEAD) would be specified as being the maximum > alignment on a platform; sizeof(PyGC_HEAD) would be frozen. Maximum alignment currently on x86 is 16 bytes for SSE vector types. Next year AVX will add 32 byte types and while they are supposed to work OK with 16 byte alignment, performance will be better with 32 byte alignment. It is possible that some use could be found for vector instructions in core Python but it is more likely that they will only be used in specialized extensions that can take care of alignment issues for their own cases. http://en.wikipedia.org/wiki/Advanced_Vector_Extensions http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/61891/ Neil
Re: [Python-Dev] PEP 385: the eol-type issue
Mark Hammond: > Thanks Nick; I didn't want to be the only one saying that. There is a fine > line between asserting reasonable requirements for Windows users and being > obstructionist and unhelpful, and I'm trying to stay on the former side :) I haven't commented on this issue before because I can't really be helpful. I just don't understand why hg is being considered before its Windows support is roughly equivalent to svn and cvs. There has been some similar experience with the main repository for the Cocoa port of Scintilla which is in bzr on launchpad. Several times in that repository, files were checked in with wrong line ends, making every line appear changed when looking through history. There are several causes for this including user error but bzr (and hg) should default to more helpful behaviour on text files. Neil
Re: [Python-Dev] PEP 385: the eol-type issue
Martin v. Löwis: > Is it really that you don't *understand*? It's fairly easy: there was > a PEP ... The PEP process is straightforward. However, a PEP may produce an outcome that proves after more experience to be wrong. ISTM a prerequisite to choosing a DVCS is that it should support the full range of development platforms and thus the PEP was accepted prematurely. At some point the PEP should be reexamined and, if necessary, rescinded. What I don't understand is why the plan is still to move to hg despite, after several months, there not being a known good way to include Windows eol support. Neil
Re: [Python-Dev] PEP 385: the eol-type issue
Martin v. Löwis: > Or don't you understand why that single unresolved item didn't manage > to revert the decision? Well, there are many unresolved items in > the Mercurial conversion, some much more stressful than the eol issue > (e.g. the branching discussion). Then these issues should have been included in the initial PEP for choosing a DVCS since the issues could have driven the choice. PEP 374 implies that win32text effectively solves the Windows eol issue which no longer appears to be correct. Neil
Re: [Python-Dev] PEP 385: the eol-type issue
Glenn Linderman: > and perhaps other things (and > are there new Unicode control characters that could be used for line > endings?), Unicode includes Line Separator U+2028 and Paragraph Separator U+2029 but they are rarely supported and very rarely used. They are a pain to work with since they are 3 byte sequences in UTF-8. Visual Studio does support them. Python does not currently support these line separators, as in this example which only reads 2 lines rather than 3:

    with open("x.txt", "wb") as f:
        f.write("a\nb\u2029c\n".encode("utf-8"))
    # Read back as UTF-8 text; U+2029 is not treated as a line end.
    with open("x.txt", "r", encoding="utf-8") as f:
        for n, line in enumerate(f.readlines(), 1):
            print(n, repr(line))

Neil
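The behaviour is not uniform inside Python itself: str.splitlines() does treat U+2028 and U+2029 as line boundaries, while text-mode reading only recognises \n, \r and \r\n. A small sketch contrasting the two, using io.StringIO as a stand-in for a text file so that nothing is written to disk:

```python
import io

text = "a\nb\u2029c\n"

# File-style line iteration: only the newline character ends a line here.
file_lines = io.StringIO(text).readlines()

# str.splitlines() also splits on the Unicode paragraph separator.
split_lines = text.splitlines()

print(len(file_lines), file_lines)
print(len(split_lines), split_lines)
```

Code that mixes the two approaches can therefore disagree about how many lines a document has.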
Re: [Python-Dev] PEP 385: the eol-type issue
M.-A. Lemburg: > ... and because of this, the feature is already available if > you use codecs.open() instead of the built-in open(): So should I not add an issue for the basic open because codecs.open should be used for this case? Neil
Re: [Python-Dev] Mercurial migration: help needed
Dirkjan Ochtman: > I know a lot of projects use Mercurial on Windows as well, I'm not > aware of any big problems with it. If you have a Windows-only project with CRLF files using Mercurial then there is no line end problem as Mercurial preserves the CRLFs for you. Line end problems occur on mixed projects where both Windows and Unix tools are used. Neil
Re: [Python-Dev] Mercurial migration: help needed
Paul Moore: > 1. Given that the "problematic" tools (notepad and Visual Studio) are > Windows tools, we seem to be back to the idea that this extension is > only needed by Windows developers. As I understood the consensus to be > that the extension should be for all users, I suspect I've missed > something. Some of the problems come from users on Unix checking in files with CRLF line ends that they have received using some other mechanism such as sharing a disk between Windows and Linux. I was going to point to a bad revision in a bzr housed project I work on but launchpad isn't working currently. What happened was that an OS X user committed a set of changes but with all the files having a different line ending to the repository. The result is that it is no longer easy to track changes before that revision. It also makes a check out larger. It would help in such cases for the commit command on Unix to either automatically change any CRLF line ends to LF for text files (but not files with an explicitly specified line end) or to display a warning. Neil
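The commit-time check suggested above could look roughly like the following Python sketch. The function names are hypothetical and this is not how Mercurial or bzr implement anything. It treats any file containing a NUL byte as binary, which is a common heuristic rather than a rule, and it ignores the case of files with an explicitly specified line end.

```python
def looks_binary(data):
    """Heuristic only: text files rarely contain NUL bytes."""
    return b"\0" in data


def count_line_ends(data):
    """Return (number of CRLF line ends, number of bare LF line ends)."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf
    return crlf, lf


def check_for_commit(data, fix=False):
    """Return (data, warning); either normalise CRLF to LF or warn about it."""
    if looks_binary(data):
        return data, None
    crlf, lf = count_line_ends(data)
    if crlf == 0:
        return data, None
    if fix:
        return data.replace(b"\r\n", b"\n"), None
    return data, "%d CRLF line ends found (%d LF)" % (crlf, lf)


fixed, _ = check_for_commit(b"one\r\ntwo\r\n", fix=True)
_, warning = check_for_commit(b"one\r\ntwo\nthree\r\n")
unchanged, no_warning = check_for_commit(b"\x00\x01\r\n")
print(fixed, warning, unchanged, no_warning)
```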
Re: [Python-Dev] please consider changing --enable-unicode default to ucs4
Ronald Oussoren: > Both Carbon and the modern APIs use UTF-16. If Unicode size standardization is seen as sufficiently beneficial then UTF-16 would be more widely applicable than UTF-32. Unix mostly uses 8-bit APIs which are either explicitly UTF-8 (such as GTK+) or can accept UTF-8 when the locale is set to UTF-8. They don't accept UTF-32. It is possible that Unix could move towards UTF-32 but that hasn't been the case up to now and with both OS X and Windows being UTF-16, it is more likely that UTF-16 APIs will become more popular on Unix. Neil
Re: [Python-Dev] PyPI comments and ratings, *really*?
When SourceForge started having comments and ratings, I was a little upset at having poor negative comments there (like "not work!"). But after it has been running for a while it appears useful. I suppose it helps that Scintilla has 88% thumbs up from 134 respondents. Because there is voting on comments, the more useful comments have bubbled onto the front page. As the system is used more, you'll see a wider range of comments on projects and you'll be able to tell more from them. It should be seen as a completely separate thing to the existing fora and trackers that each project has. While you want people to become involved in your project, many are just having a quick look and don't want to sign up for mailing lists or to interact with project members. They may just want to quickly comment about whether it was suitable or not. Neil
Re: [Python-Dev] PEP 3147: PYC Repository Directories
Tim Delaney: > I like this solution combined with having a single cache directory and a few > other things I've added below. > ... > 2. /tmp is often on non-volatile memory. If it is (e.g. my Windows system > temp dir is on a RAMdisk) then it seems wise to respect the obvious desire > to throw away temporary files on shutdown. This may create security vulnerabilities. I could, for example, insert a manipulated .pyc that logs passwords when other users run it. I can also see advantages to allowing out of tree compiled cache directories. For example, you could have a locked down .py tree with .pycs going into per-user trees. This prevents another user from spoofing a .pyc I use as well as allowing users to install arbitrary versions of Python without getting an admin to compile the .py tree with the new compiler. Neil
Re: [Python-Dev] Reworking the GIL
Eric Hopper: > I don't suppose it will ever be ported back to Python 2.x? It doesn't > look like the whole GIL concept has changed much between Python 2.x and > 3.x so I expect back-porting it would be pretty easy. There was a patch but it has been rejected. http://bugs.python.org/issue7753 Neil
Re: [Python-Dev] Proposal for virtualenv functionality in Python
Larry Hastings: > But IIUC telling the compiler how to > do that is only vaguely standardized--Microsoft's CL.EXE doesn't seem to > support any environment variable containing an include /path/. The INCLUDE environment variable is a list of ';' separated paths http://msdn.microsoft.com/en-us/library/36k2cdd4%28VS.100%29.aspx Neil
Re: [Python-Dev] Python and Windows 2000
Martin v. Löwis: > I don't recall whether we have already decided about continued support > for Windows 2000. > > If not, I'd like to propose that we phase out that support: the Windows > 2.7 installer should display a warning; 3.2 will stop supporting Windows > 2000. Is there any reason for this? I can understand dropping Windows 9x due to the lack of Unicode support but is there anything missing from Windows 2000 that makes supporting it difficult? Neil
Re: [Python-Dev] Python and Windows 2000
Martin v. Löwis: > See http://bugs.python.org/issue6926 > > The SDK currently hides symbolic constants from us that people are > asking for. Setting the version to 0x501 (XP) doesn't actively try to stop running on version 0x500 (2K), it just reveals the symbols and APIs from 0x501. Including a call to an 0x501-specific API will then fail at load. IPPROTO_IPV6 (the cause of issue 6926) isn't a new symbol that started working in Windows XP - it was present in older SDKs without a version guard so was visible when compiling for any version of Windows. > In addition, we could simplify the code in dl_nt.c around > GetCurrentActCtx and friends, by linking to these functions directly. It would be simpler but it's not as if this code needs any changes at this point. I don't really have a strong need for Windows 2000 although I keep an instance for checking compatibility of my code and I do still get queries from people using old versions of Windows, including 9x. There is the question of whether to force failure on Windows 2000 or just remove it from the list of known-working platforms while still allowing it to run. Neil
Re: [Python-Dev] C++
Antoine Pitrou: > Is this concern still valid? We are in the 2010s now. > I'm not saying I want us to put some C++ in the core interpreter, but > the portability argument sounds a little old... There are still viable platforms which only support subsets of C++. IIRC, Android does not support exceptions in C++. Neil
Re: [Python-Dev] Support byte string API of Windows in Python3?
Victor Stinner: > It's a choice, I didn't want to patch Windows because I know that Windows use > unicode internally. I consider that developers using Python3 should use > unicode on Windows, and byte or unicode+surrogates on other OS. The Win32 byte string APIs convert their inputs to Unicode and then run Unicode code. You don't get additional capabilities by calling the byte string APIs and should avoid them completely. Including an easy way to invoke them on Windows will just lead to failures. People may think that Unix code that uses the byte string APIs for better platform fidelity can just run this code on Windows and get equivalent benefits. They won't and instead will see an inverted form of the problems they are trying to avoid on Unix. If there is ever a reason to use a byte string API on Windows (and I can't think of any) then ctypes can be used to explicitly call the API desired. Neil
Re: [Python-Dev] email package status in 3.X
Michael Foord: > Python 3.0 was *declared* to be an experimental release, and by most > standards 3.1 (in terms of the core language and functionality) was a solid > release. That looks to me like an after-the-event rationalization. The release note for Python 3.0 (and the "What's new") gives no indication that it is experimental but does say """ We are confident that Python 3.0 is of the same high quality as our previous releases ... you can safely choose either version (or both) to use in your projects. """ http://mail.python.org/pipermail/python-dev/2008-December/083824.html Neil
Re: [Python-Dev] email package status in 3.X
Steven D'Aprano: > Do any other languages have any equivalent to this ebytes type? The String type in Ruby 1.9 is a byte string with an encoding attribute. Most online Ruby documentation is for 1.8 but the API can be examined here: http://ruby-doc.org/ruby-1.9/index.html Here's something more explanatory: http://blog.grayproductions.net/articles/ruby_19s_string My view is that this actually makes things much more complex by making encoding combination an n*n problem (where n is the number of encodings) rather than an n-sized problem when you have a single core string type. Neil
Re: [Python-Dev] Licensing // PSF // Motion of non-confidence
anatoly techtonik: > The file consists of several licenses for multiple versions of Python. > It is an unusual mix that negatively affects understanding. A simpler license would be better. There have been moves in the past to simplify the license of Python but this would require agreement from the current rights owners including CWI and CNRI. IIRC not all of the rights owners are willing to agree to a change. Neil
Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library
Kurt B. Kaiser: > I'm mystified about the comments that the GUI is ugly. It is minimal. > On XP, it looks exactly like an XP window with a simple menubar. Those > who haven't looked at it for awhile may not be aware of the recent > advances made by Tk in native look and feel. What is ugly? While Tk has improved at emulating native appearance, there are still many differences. On the main editing screen of IDLE, the most noticeable issue is that there is no horizontal scroll bar even though the text will move left when you move the caret beyond the rightmost visible character. The scrollbar and status bar have an appearance that looks to be from Windows 2000, not Windows XP, and there is no resizing gripper on the right side of the status bar. The tear off menus are ugly as well as being non-standard on all three major platforms. Use "Configure IDLE..." and an "idle" dialog appears that also looks to be from Windows 2000. I know Tk can do better than this as Git Gui (the Tk (8.5.8) program I use most often) at least shows XP themed buttons, scrollbars and other controls. However, the "idle" dialog (as well as Git Gui) shows the largest remaining problem for Tk user interfaces: keyboard navigation. When the "idle" dialog opens, try doing anything with the keyboard. Chances are nothing will happen. If you press Tab 16 times (yes, 16!) a focus rectangle will finally show on the "Bold" check box. Another Tab takes you to the "Indentation Width" slider. After that you don't see the focus until it wraps around to "Bold" again. The Enter key doesn't trigger OK and the Escape key doesn't let you escape. The Find and Replace dialogs are better as focus works, as do Enter and Escape, but none of the buttons have mnemonics. This may all sound like picking nits but details and consistency are important in user interfaces and this is just looking at the most easily discovered problems. Neil
Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library
Kurt B. Kaiser: >> The tear off menus are ugly as well as being non-standard on all three >> major platforms. > > Well, would you discard them? They can (occasionally) be useful. Yes, I would replace the menus with ones missing the tear line. Most of the GUI toolkits experimented with tear-offs (Mac in late 80s, GTK+ up until 2002) and dropped them or hid them in a rarely visited API. The idea initially appeared reasonable ("I can have the Run and Check commands available with a single click") but was found to be too confusing in use. IDLE, because it uses a separate top-level window for each file and shell suffers more than most applications. A menu is torn off from one window and always applies to that window but shows no visual affinity with that window: its window is not even activated when a menu command acts on it. Neil
Re: [Python-Dev] Removing IDLE from the standard library
Stephen J. Turnbull: > But it's very important to be able to *move* tabs across windows or > panes. ... > In many apps, however, you would have to select the foo.c tab, close > it, bring up a new window, and open foo.c using the long path > (presumably with a file browser interface, but often enough the > default directory is wherever you started the editor, not most > recently used file). The common GUI technique is to drag a tab from one window into another window. Drag onto the desktop for a new top level window. This is supported by, among others, Firefox; Chrome; gedit; and GNOME Terminal. Neil
Re: [Python-Dev] mingw support?
Terry Reedy: > I suspect that the persons who first ported Python to MSDOS simply used what > they were used to using, perhaps in their paid job. And I am sure that is > still true of at least some of the people doing Windows support today. Some Windows developers actually prefer Visual Studio, including me. MingW has become less attractive in recent years because of the difficulty of downloading and installing a current version and of finding out how to do so. Some projects have moved on to the TDM packaging of MingW. http://tdm-gcc.tdragon.net/ Neil
Re: [Python-Dev] Fwd: i18n
Terry Reedy: > File "C:\Python26\lib\socket.py", line 406, in readline > data = self._sock.recv(self._rbufsize) > socket.error: [Errno 10054] A lÚtez§ kapcsolatot a tßvoli ßllomßs > kÚnyszerÝtette n bezßrta That is pretty good mojibake. One of the problems of providing localized error messages is that the messages may be messed up at different stages. The original text was A létező kapcsolatot a távoli állomás kényszerítetten bezárta. It was printed in iso8859_2 (ISO standard for Eastern European) then those bytes were pasted in as if they were cp852 (MS-DOS Eastern European).

    text = "A létező kapcsolatot a távoli állomás kényszerítetten bezárta."
    print(str(text.encode('iso8859_2'), 'cp852'))

Neil
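Because both code pages involved are single-byte encodings, this particular damage can be undone by running the two steps in reverse. A short sketch, assuming the same two codecs named in the message (iso8859_2 for the original bytes, cp852 for the misreading):

```python
text = "A létező kapcsolatot a távoli állomás kényszerítetten bezárta."

# Forward: bytes written as ISO 8859-2 but interpreted as cp852.
mojibake = text.encode("iso8859_2").decode("cp852")

# Reverse: recover the original bytes, then decode them correctly.
repaired = mojibake.encode("cp852").decode("iso8859_2")

print(mojibake)
print(repaired)
```

This only works while the garbled text still contains every character intact; once a character has been dropped or replaced along the way the original bytes are gone.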
Re: [Python-Dev] PEP 384 status
M.-A. Lemburg: > Is it possible to have multiple versions of the lib C loaded > on Windows ? Yes. It is possible not only to mix C runtimes from different vendors but different variants from a single vendor. Historically, each vendor has shipped their own C runtime libraries. This was also the case with CP/M and OS/2. Many applications can be extended with DLLs and if it were not possible to load DLLs which use different runtimes then that would limit which compilers can be used to extend particular applications. If Microsoft were to stop DLLs compiled with Borland or Intel from working inside Internet Explorer or Excel then there would be considerable controversy and likely anti-trust actions. Neil
Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)
Ian Bicking:
> I think the use case everyone has in mind here is where
> you get a URL from one of these sources, and you want to handle it. I have
> a hard time imagining the sequence of events that would lead to mojibake.
> Naive parsing of a document in bytes couldn't do it, because if you have a
> non-ASCII-compatible document your ASCII-based parsing will also fail (e.g.,
> looking for b'href="(.*?)"').

It depends on what the particular ASCII-based parsing is doing. For example, the set of trail bytes in Shift-JIS includes the same bytes as some of the punctuation characters in ASCII as well as all the letters. A search or split on '@' or '|' may find the trail byte in a two-byte character rather than a true occurrence of that character so the operation 'succeeds' but produces an incorrect result.

Over time, the set of trail bytes used has expanded - in GB18030 digits are possible although many of the most important characters for parsing such as ''' "#%&.?/''' are still safe as they may not be trail bytes in the common double-byte character sets.

Neil
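The trail-byte problem can be demonstrated in a few lines. This is a sketch, assuming Python 3 and its standard shift_jis codec; the sample string is made up for illustration:

```python
# Sketch: a byte-level split on b"|" hits a Shift-JIS trail byte.
text = "ポスト|ボックス"               # contains exactly one real "|" separator
data = text.encode("shift_jis")

# The second byte of "ポ" in Shift-JIS is 0x7C, the same value as ASCII "|".
print("ポ".encode("shift_jis"))

byte_parts = data.split(b"|")                      # naive ASCII-based parsing
text_parts = data.decode("shift_jis").split("|")   # decode first, then split

# The byte-level split "succeeds" but reports one field too many.
print(len(byte_parts), len(text_parts))
```

Decoding before splitting avoids the false match, at the cost of needing to know the encoding up front.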
Re: [Python-Dev] Python and the Unicode Character Database
Stephen J. Turnbull:
> Here's why: '''print "%d" %
> some_integer''' doesn't now, and never will (unless Kristan gets his
> Python 2.8), produce Arabic or Han numerals. Not in any
> language I know of, not in Microsoft Excel, and definitely not in
> Python 2.

While I don't have Excel to test with, OpenOffice.org Calc will display in Arabic or Han numerals using the NatNum format codes. http://www.scintilla.org/ArabicNumbers.png

> Ditto Arabic, I
> would imagine; ISO 8859/6 (aka Latin/Arabic) does not contain the
> Arabic digits that have been presented here earlier AFAICT. Note that
> there's plenty of space for them in that code table (eg, 0xB0-0xB9 is
> empty). Apparently nobody *ever* thought it was useful to have them!

DOS code page 864 does use 0xB0-0xB9 for ٠ .. ٩. http://www.ascii.ca/cp864.htm

Neil
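The code page 864 claim can be checked from Python itself. A small sketch, assuming Python 3 and its bundled cp864 codec:

```python
# Sketch: DOS code page 864 places the Arabic-Indic digits at 0xB0-0xB9.
digits = bytes(range(0xB0, 0xBA)).decode("cp864")
print(digits)
print([hex(ord(ch)) for ch in digits])

# Python 3's int() accepts any Unicode decimal digits, Arabic-Indic included.
value = int(digits[1] + digits[2] + digits[3])
print(value)
```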
Re: [Python-Dev] first draft of bug guidelines for www.python.org/dev/
Brett Cannon:
> But SourceForge does not support anonymous reporting.

SourceForge does support anonymous reporting. A large proportion of the fault reports I receive for Scintilla are anonymous as indicated by "nobody" in the "Submitted By" column. https://sourceforge.net/tracker/?group_id=2439&atid=102439

Neil
Re: [Python-Dev] Python 2.4, VS 2005 & Profile Guided Optmization
Trent Nelson:
> I ended up playing around with Profile Guided Optimization, running
> ``python.exe pystones.py'' to collect call-graph data after
> python.exe/Python24.dll had been instrumented, then recompiling with the
> optimizations fed back in.

It'd be an idea to build a larger body of Python code to run the profiling pass on so that it doesn't just optimize for the sort of code in pystone, which is not very representative. We could run the test suite, as it would have good coverage, but it would hit exceptional cases too heavily. Other compilers (Intel?) support profile directed optimization so would also benefit from such a body of code.

Neil
Re: [Python-Dev] More tracker demos online
Martin v. Löwis:
> Currently, we have two running tracker demos online:

After playing with them for 30 minutes, Jira seems to have too busy an interface and finicky behaviour: it sometimes dislikes the back button (similar to SF) and clicking on diffs wants to download them rather than view them.

It's disappointing that Jira and Launchpad use different bug IDs, as continuity should be maintained with the SF bug IDs which will be referred to in other areas such as commit messages. They do include the SF bug ID (as a field in Jira and a nickname in Launchpad) but this makes it harder to navigate between related bugs.

I mostly looked at "os.startfile() still doesn't work with Unicode filenames" and I would have tagged the patch on SF with a "looks OK to me" if SF was working.

The text in Launchpad was a bit sparsely formatted for me so I would like to see if individual users can choose a different style. The others are OK although Roundup is clearer.

Neil
Re: [Python-Dev] Extended Buffer Interface/Protocol
Travis Oliphant:
> 3) information about discontiguous memory segments
>
>
> Number 3 is where I could use feedback --- especially from PIL users and
> developers. Strides are a common way to think about a possibly
> discontiguous chunk of memory (which appear in NumPy when you select a
> sub-region from a larger array). The strides vector tells you how many
> bytes to skip in each dimension to get to the next memory location for
> that dimension.

I think one of the motivations for discontiguous segments was for split buffers which are commonly used in text editors. A split buffer has a gap in the middle where insertions and deletions can often occur without moving much memory. When an insertion or deletion is required elsewhere then the gap is first moved to that position. I have long intended to implement a good split buffer extension for Python but the best I have currently is an extension written using Boost.Python which doesn't implement the buffer interface.

Here is a description of split buffers: http://www.cs.cmu.edu/~wjh/papers/byte.html

Neil
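The split buffer idea described above can be sketched in pure Python. This is only an illustration under stated assumptions: the class name GapBuffer and its methods are hypothetical, it stores characters in a list rather than raw memory, and it does no bounds checking on positions.

```python
class GapBuffer:
    """Minimal split (gap) buffer over a list of characters."""

    def __init__(self, text="", gap_size=16):
        self._data = list(text) + [None] * gap_size
        self._gap_start = len(text)        # first index of the gap
        self._gap_end = len(self._data)    # one past the last gap index

    def _move_gap(self, pos):
        """Slide the gap so that it starts at logical position pos."""
        while self._gap_start > pos:       # move gap left, one element at a time
            self._gap_start -= 1
            self._gap_end -= 1
            self._data[self._gap_end] = self._data[self._gap_start]
        while self._gap_start < pos:       # move gap right
            self._data[self._gap_start] = self._data[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def insert(self, pos, text):
        self._move_gap(pos)
        for ch in text:
            if self._gap_start == self._gap_end:    # gap exhausted: grow it
                grow = max(16, len(self._data))
                self._data[self._gap_start:self._gap_start] = [None] * grow
                self._gap_end += grow
            self._data[self._gap_start] = ch
            self._gap_start += 1

    def delete(self, pos, count):
        self._move_gap(pos)
        self._gap_end = min(self._gap_end + count, len(self._data))

    def segments(self):
        """The two contiguous runs of text on either side of the gap."""
        return self._data[:self._gap_start], self._data[self._gap_end:]

    def __str__(self):
        before, after = self.segments()
        return "".join(before) + "".join(after)


buf = GapBuffer("The life of brian")
buf.insert(4, "long ")     # gap moves to position 4, then fills
buf.delete(0, 4)           # gap moves to the start and swallows "The "
print(buf)
```

The two lists returned by segments() correspond to the two memory segments that a multi-segment buffer interface would have to expose.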
Re: [Python-Dev] Extended Buffer Interface/Protocol
Greg Ewing:
> So an array-of-pointers interface wouldn't be a direct
> substitute for the existing multi-segment buffer
> interface.

Providing an array of (pointer,length) wouldn't be too much extra work for a split vector implementation.

Guido van Rossum:
> But there's always a call to remove the gap (or move it to the end).

Yes, although it's something you try to avoid. I'm not saying that this is an important use-case since no one seems to have produced a split vector implementation that provides the buffer protocol. Numeric-style array handling is much more common so deserves priority.

Neil
Re: [Python-Dev] Extended Buffer Interface/Protocol
I have developed a split vector type that implements the buffer protocol at http://scintilla.sourceforge.net/splitvector-1.0.zip

It acts as a mutable string implementing most of the sequence protocol as well as the buffer protocol. splitvector.SplitVector('c') creates a vector containing 8 bit characters and splitvector.SplitVector('u') is for Unicode.

A writable attribute bufferAppearence can be set to 0 (default) to respond to buffer protocol calls by moving the gap to the end and returning the address of all of the data. Setting bufferAppearence to 1 responds as a two segment buffer. I haven't found any code that understands responding with two segments. sre and file.write handle SplitVector fine when it responds as a single segment:

    import re, splitvector
    x = splitvector.SplitVector("c")
    x[:] = "The life of brian"
    r = re.compile("l[a-z]*", re.M)
    print x
    y = r.search(x)
    print y.group(0)
    x.bufferAppearence = 1
    y = r.search(x)
    print y.group(0)

produces

    The life of brian
    life
    Traceback (most recent call last):
      File "qt.py", line 9, in
        y = r.search(x)
    TypeError: expected string or buffer

It is likely that adding multi-segment ability to sre would complicate it and slow it down. OTOH multi-segment buffers may be well-suited to scatter/gather I/O calls like writev.

Neil
Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)
Travis Oliphant:
> PEP: 3118
> ...

I'd like to see the PEP include discussion of what to do when an incompatible request is received while locked. Should there be a standard "Can't do that: my buffer has been got" exception?

Neil
Re: [Python-Dev] Python and the Unicode Character Database
Stephen J. Turnbull:
> Will it accept Arabic on input? (Han might be too much to ask for
> since Unicode considers Han digits to be "impure".)

I couldn't find a direct way to input Arabic digits into OO Calc: the normal use of Alt+number didn't work in Calc although it did in WordPad, where Alt+1632 is ٠ and so on. OO Calc does have settings in the Complex Text Layout section for choosing different numerals but I don't understand the interaction of choices here.

Neil
Re: [Python-Dev] Import and unicode: part two
Toshio Kuratomi:
> My examples that you're replying to involve two "properly
> configured" OS's. The Linux workstations are configured with a UTF-8
> locale. The Windows OS's use wide character unicode. The problem occurs in
> that the code that one of the parties develops (either the students or the
> professors) is developed on one of those OS's and then used on the other OS.

This implies a symmetric issue, but I cannot see how there can be a problem with non-ASCII module names on Windows as the file system allows all Unicode characters so can represent any module name. OS X is also based on Unicode file names. While it is possible to mount file systems on Windows or OS X that do not support Unicode file names, this is a very unusual situation that will cause problems in other ways.

Common Linux distributions like Ubuntu and Fedora now default to UTF-8 locales. The situations in which users may encounter installations that do not support Unicode file names have reduced greatly.

Neil
Re: [Python-Dev] Import and unicode: part two
Toshio Kuratomi:
> When they update their OS to a version that has
> utf-8 python module names, they will find that they have to make a choice.
> They can either change their locale settings to a utf-8 encoding and have
> the system installed modules work or they can leave their encoding on their
> non-utf-8 encoding and have the modules that they've created on-site work.

When switching to a UTF-8 locale, they can also change the file names of their modules to be encoded in UTF-8. It would be fairly easy to write a script that identifies non-ASCII file names in a directory and offers to transcode their names from their current encoding to UTF-8.

Neil
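A script along those lines might look like the following sketch. It assumes Python 3 on a POSIX system where file names are stored as raw bytes; the function names plan_renames and transcode_directory are hypothetical, and the "already valid UTF-8" check is a heuristic that can be fooled by unusual legacy names.

```python
import os

def plan_renames(names, old_encoding):
    """Return (old, new) byte-string pairs for names that need transcoding."""
    plan = []
    for name in names:
        if all(byte < 0x80 for byte in name):
            continue                      # pure ASCII: nothing to do
        try:
            name.decode("utf-8")
            continue                      # already valid UTF-8: leave alone
        except UnicodeDecodeError:
            pass
        plan.append((name, name.decode(old_encoding).encode("utf-8")))
    return plan

def transcode_directory(directory, old_encoding, dry_run=True):
    """List the planned renames and apply them unless dry_run is set."""
    directory = os.fsencode(directory)    # bytes path: names come back as bytes
    plan = plan_renames(os.listdir(directory), old_encoding)
    for old, new in plan:
        print(repr(old), "->", repr(new))
        if not dry_run:
            os.rename(os.path.join(directory, old),
                      os.path.join(directory, new))
    return plan
```

Calling transcode_directory(".", "iso8859_2") would only print the planned renames; passing dry_run=False applies them.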
Re: [Python-Dev] Mercurial conversion repositories
With hg 1.7.5 on Windows 7 I performed a non-core checkout:

    hg clone http://hg.python.org/cpython

The eol extension is enabled in global settings. I looked at things a bit, opening some files and using the Tortoise Hg Repository Explorer, but made no actual changes. Running hg diff produces a large amount of output with almost all the *.decTest and most of the Windows build files (*.mk, *.sln, *.vcproj, *.bat) showing as changed but with identical text.

I've had problems like this with Hg before (http://mercurial.selenic.com/bts/issue2287). The situation can be fixed by hg update to another version and then back to default.

Neil
Re: [Python-Dev] Mercurial conversion repositories
Antoine Pitrou:
> It should now be fixed in current SVN, meaning the final conversion
> should be perfectly usable with the eol extension enabled.

Good.

> Do you find other issues under Windows? Have you tried pushing changes?

Since I'm not a member of core developers I used a http pull and can't push:

    C:\u\cpython>hg push
    pushing to http://hg.python.org/cpython
    searching for changes
    remote: ssl required

Neil
Re: [Python-Dev] pymigr: Ask for hgeol-checking hook.
Line end problems do occur in real projects. A scintilla-cocoa project was branched off Scintilla to support the Cocoa GUI framework on OS X. Here is one of the revisions in that project: http://bazaar.launchpad.net/~mike-lischke/scintilla-cocoa/trunk/revision/5#include/ScintillaWidget.h

If the ScintillaWidget.h changes aren't visible (after a brief wait) then click on the arrow next to it. There are only 3 real changed lines in this file (which are changing comments from C++ to C) but the whole file appears to have been changed. This is far from the worst I have seen with some revisions showing almost every line in a project changed.

There are several effects from this:
1) The blame command loses usefulness as all lines in the file appear to be from this revision.
2) Downloads become bigger, and take longer.
3) Fixing the issues takes time, effort and junks the history further.

Neil
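A check for this kind of change can be quite small. The sketch below, assuming Python 3, counts line-end styles in a file's bytes so that a file with mixed or unexpected line ends can be flagged before commit. The helper names are hypothetical and this is not the hook discussed in the thread.

```python
def count_line_endings(data):
    """Count CRLF, bare LF and bare CR line ends in a bytes object."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf      # LF not preceded by CR
    cr = data.count(b"\r") - crlf      # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}

def has_mixed_line_endings(data):
    """True if more than one line-end style is present."""
    return sum(1 for n in count_line_endings(data).values() if n) > 1
```

Comparing the counts for the old and new revisions of a file would show when an edit of three lines has silently converted every line end.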
Re: [Python-Dev] devguide (hg_transition): Advertise hg import over patch.
Scott Dial:
> I don't believe TortoiseHG has such a feature (or I can't find it),
> although if you have TortoiseSVN, you can still use that as a patch tool.

The Import... command is in the Synchronize menu of Hg Repository Explorer. There is no GUI equivalent to --no-commit but you can exit the commit message editor without saving which causes the commit to be abandoned with the patch still having been applied.

Neil
Re: [Python-Dev] devguide (hg_transition): Advertise hg import over patch.
Adrian Buehlmann:
> FWIW, we are very close to releasing TortoiseHg 2.0 (due March 1st),
> which ported the current Gtk based TortoiseHg to Qt (although, it was
> more like a rewrite :-).

I hope this is going to be fast. One of the reasons I chose Hg over Bzr for another project was that the Bzr GUI tools which are written using Qt are much slower, particularly when starting. A cold start of Bazaar Explorer takes around 7 seconds on a new fast machine compared with under a second to launch Hg Repository Explorer. Warm starts and internal actions are better but the Hg GUI tools are still much smoother than Bzr's.

This slowness is quite common for Qt applications and I think is because of the large set of DLLs that are loaded. Qt Creator is better at around 4 seconds for a cold launch but, naturally, it doesn't matter for an environment which you use for an extended period like Qt Creator. It does matter for a VCS tool that you may invoke hundreds of times in a day.

Neil
Re: [Python-Dev] CPython hg transition complete
Georg Brandl:
> I'm very happy to announce that the core Python repository switch
> to Mercurial is complete and the new repository at
> http://hg.python.org/cpython/ is now officially open for cloning,

OK, I just performed a clone successfully. It seems wrong to me that the *.vcproj and *.vsprops files in PCbuild use Unix line ends. These extensions are marked BIN in .hgeol. This machine does not have VS 2008 installed so I can't really check if that is OK. Just in case it is not all files, here are two with this issue:

    cpython\PCbuild\kill_python.vcproj
    cpython\PCbuild\debug.vsprops

Neil
Re: [Python-Dev] CPython hg transition complete
Antoine Pitrou:
> It mimicks their settings in the SVN repository, so it should be ok.

It doesn't match how they are checked out by svn since they have the property svn:eol-style set to 'native'. Therefore these files are checked out by svn with Windows \r\n line ends.

Neil
Re: [Python-Dev] CPython hg transition complete
To minimize differences from previous behaviour, it is probably best to mimic svn more closely by changing .hgeol to either have all the project files as native or allow fall through to the default ** = native.

Another possibility is to set Visual Studio project files to CRLF but this is less compatible with how svn has been used. The advantage of explicit CRLF is that if you clone onto a Unix system and then share that disk with Windows, or create an archive that is expanded on Windows (in binary mode), then you have the expected line ends. The same applies when sharing from Windows to Unix, where the main problem is that bash can be upset by CRLF line ends: it assumes that the CR is part of the line, so if the line ends with a file name (like "cat .profile\r") it will treat the CR as part of the file name.

Neil
Re: [Python-Dev] hgeol
Martin v. Löwis:
> So how can I fix this properly: so that all files have CRLF, but
> are still attributed to whoever last modified them, rather than
> having them attributed to me?

I don't think this is possible from the current state. It may be possible to change the conversion process to 'rewrite history' to produce clean annotations. On other projects, I've just changed the files and accepted a degraded history.

Neil
Re: [Python-Dev] Bugs in thread_nt.h
Martin v. Löwis:
> I guess all this advice doesn't really apply to this case, though.
> The Microsoft API declares the parameter as a volatile*, indicating
> that they consider it "proper" usage of the API to declare the storage
> volatile.

The 'volatile' here is a modifier on the parameter and does not require a corresponding agreement in the variable declaration. It indicates that all access through the pointer inside the function will be with volatile semantics. As long as all functions that operate on the variable do so treating access as volatile then everything is fine. You should only need to declare the variable as volatile if there is other code that accesses it directly.

If agreement were required then the compiler would print a warning. It is similar to declaring a function to take a const parameter: there is no need for the variable to also be const.

Neil
Re: [Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII
Victor Stinner:
> C and C++ identifiers are restricted to ASCII. I don't know for Fortran
> or Java.

Some C and C++ implementations currently allow non-ASCII identifiers and the forthcoming C1X and C++0x language standards include non-ASCII identifiers. The allowed characters are specified in Annexes of the respective standards.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf - Annex D
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3225.pdf - Annex E

Neil
Re: [Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII
Victor Stinner:
> I read these documents but they don't explain which encoding is used in
> libraries and programs. Does it mean that Windows and Linux may use
> different encodings?

Yes, Windows will use UTF-16 as it does for almost everything. From a user's point of view, these should both just be seen as Unicode.

Neil
Re: [Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII
Michael Urman:
> I'm not convinced this is correct for this case. GetProcAddress takes
> an "ANSI" string, meaning while it could theoretically use UTF-8, in
> practice I doubt it uses anything outside of ASCII safely. So while
> the name of the library would be encoded in UTF-16, the name of the
> function loaded from the library would not be.

Yes you are right: http://scintilla.org/NarrowName.png

Neil
Re: [Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII
Michael Urman:
> That screenshot seems to show UTF-8 is being used. This may just be
> the literal bytes in the .c file, but could it be something more
> dependable?

The file is in UTF-8 so the compiler may just be copying the bytes. There is a setlocale pragma but that seems to be just for string literals.

Neil
Re: [Python-Dev] The socket HOWTO
Antoine Pitrou:
> So what you're saying is that the text is mostly useless (or at least
> quite dispensable), but you think it's fine that people waste their
> time trying to read it?

I found it useful when starting to write socket code. Later on I learnt more but, as an introduction, this document was great. It is written in an approachable manner and doesn't spend time on details unimportant to initial understanding.

Neil
Re: [Python-Dev] PEP 365 (Adding the pkg_resources module)
zooko:
> Um, isn't this tool called "unzip"? I have done this -- accessed the
> source code -- many times, and unzip suffices.

The type of issue I ran into with eggs is when you get an exception with a trace that includes an egg, you can't use the normal means to look at the code. Instead you have to understand that it's an egg, unzip the code, manually translate the path, open the file and go to the line number. Similarly, you can't easily grep the code in its egg state.

If there was a global flag where I could say 'install eggs as directories of source' then I'd be much happier. I just reread the EasyInstall documentation and '--always-unzip' is portrayed as a 'don't do this' option. As it is I just avoid eggs. They may make sense for installing applications but for development they get in the way.

Neil
Re: [Python-Dev] PEP 393 Summer of Code Project
Glenn Linderman:
> That said, regexp, or some sort of cursor on a string, might be a workable
> solution. Will it have adequate performance? Perhaps, at least for some
> applications. Will it be as conceptually simple as indexing an array of
> graphemes? No. Will it ever reach the efficiency of indexing an array of
> graphemes? No. Does that matter? Depends on the application.

Using an iterator for cluster access is a common technique currently. For example, with the Pango text layout and drawing library, you may create a PangoLayoutIter over a text layout object (which contains a UTF-8 string along with formatting information) and iterate by clusters by calling pango_layout_iter_next_cluster. Direct access to clusters by index is not as useful in this domain as access by pixel positions - for example to examine the portion of a layout visible in a window.

http://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-get-iter

In this API, 'index' is used to refer to a byte index into UTF-8, not a character or cluster index.

Rather than discuss functionality in the abstract, we need some use cases involving different levels of character and cluster access to see whether providing indexed access is worthwhile. I'll start with an example: some text drawing engines draw decomposed characters ("o" followed by " ̈" -> "ö") differently compared to their composite equivalents ("ö") and this may be perceived as better or worse. I'd like to offer an option to replace some decomposed characters with their composite equivalent before drawing but since other characters may look worse, I don't want to do a full normalization.

The API style that appears most useful for this example is an iterator over the input string that yields composed and decomposed character strings (that is, it will yield both the decomposed "ö" and the composed "ö"); each character string is then converted if it is in a substitution dictionary and written to an output string. This is similar to an iterator over grapheme clusters although, since it is only aimed at composing sequences, the iterator could be simpler than a full grapheme cluster iterator.

One of the benefits of iterator access to text is that many different iterators can be built without burdening the implementation object with extra memory costs as would be likely with techniques that build indexes into the representation.

Neil
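The iterator-plus-substitution-dictionary style described above can be sketched with the standard unicodedata module. This is an illustration only: the function names are hypothetical, and treating every character with a non-zero canonical combining class as part of the preceding sequence is a simplification compared with full grapheme cluster segmentation.

```python
import unicodedata

def combining_sequences(text):
    """Yield each base character together with its trailing combining marks."""
    sequence = ""
    for ch in text:
        # A character with combining class 0 starts a new sequence.
        if sequence and unicodedata.combining(ch) == 0:
            yield sequence
            sequence = ""
        sequence += ch
    if sequence:
        yield sequence

def compose_selected(text, substitutions):
    """Replace only the sequences listed in substitutions, leaving the rest."""
    return "".join(substitutions.get(seq, seq)
                   for seq in combining_sequences(text))

substitutions = {"o\u0308": "\u00f6"}     # decomposed o + diaeresis -> composed
sample = "co\u0308ope\u0301ration"        # decomposed o-diaeresis and e-acute
result = compose_selected(sample, substitutions)
print(ascii(result))
```

Only the sequence named in the dictionary is composed; the decomposed e-acute passes through unchanged, which is the difference from running a full NFC normalization.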
Re: [Python-Dev] PEP 393 Summer of Code Project
Guido van Rossum:
> On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote:
>> [...] some text drawing engines draw decomposed characters ("o"
>> followed by " ̈" -> "ö") differently compared to their composite
>> equivalents ("ö") and this may be perceived as better or worse. I'd
>> like to offer an option to replace some decomposed characters with
>> their composite equivalent before drawing but since other characters
>> may look worse, I don't want to do a full normalization.
>
> Isn't this an issue properly solved by various normal forms?

No, since normalization of all cases may actually lead to worse visuals in some situations. A potential reason for drawing decomposed characters differently is that more room may be allocated for the generic condition where a character may be combined with a wide variety of accents compared with combining it with a specific accent.

Here is an example on Windows drawing composite and decomposed forms to show the types of difference often encountered. http://scintilla.org/Composite.png

Now, this particular example displays both forms quite reasonably so would not justify special processing but I have seen on other platforms and earlier versions of Windows where the umlaut in the decomposed form is displaced to the right even to the extent of disappearing under the next character. In the example, the decomposed 'o' is shorter and lighter and the umlauts are round instead of square.

Neil
Re: [Python-Dev] PEP 393 Summer of Code Project
Glenn Linderman:
> How many different iterators into the same text would be concurrently needed
> by an application? And why? Seems like if it is dealing with text at the
> level of grapheme clusters, it needs that type of iterator. Of course, if
> it does I/O it needs codec access, but that is by nature sequential from the
> starting point to the end point.

I would expect that there would mostly be a single iterator into a string but can imagine scenarios in which multiple iterators may be concurrently active and that these could be of different types. For example, say we wanted to search for each code point in a text that fails some test (such as being a member of a set of unwanted vowel diacritics) and then display that failure in context with its surrounding text of up to 30 graphemes either side.

Neil
Re: [Python-Dev] PEP 393 Summer of Code Project
Stephen J. Turnbull:
> ... Eg, this is why the common GUIs for Unix (X.org, GTK+, and
> Qt) either provide or require UTF-8 coding for their text.

Qt uses UTF-16 for its basic QString type. While QString is mostly treated as a black box which you can create from input buffers in any encoding, the only encoding allowed for a contents-by-reference QString (QString::fromRawData) is UTF-16. http://doc.qt.nokia.com/latest/qstring.html#fromRawData

Neil
Re: [Python-Dev] Windows 8 support
Austin Fernandes:
> Which versions of python will be compatible with windows8. I am using
> currently 2.7.2 version.

Current releases of both Python 2.7 and Python 3.2 appear to run fine on the Windows 8 Developer Preview. You should download and install the preview to ensure that your own code is compatible.

Neil
Re: [Python-Dev] Python as a Metro-style App
Antoine Pitrou:
> When you say MoveFile is absent, is MoveFileEx supported instead?

WinRT strongly prefers asynchronous methods for all lengthy operations. The most likely call to use for moving files is StorageFile.MoveAsync. http://msdn.microsoft.com/en-us/library/windows/apps/br227219.aspx

> Depending on the extent of removed/disabled functionality, it might not
> be very interesting to have a Metro port at all.

Asynchronous APIs will become much more important on all platforms in the future to ensure responsive user interfaces. Python should not be left behind.

Neil
Re: [Python-Dev] Python as a Metro-style App
Antoine Pitrou: > How does it translate to C? The simplest technique would be to use C++ code to bridge from C to the API. If you really wanted to you could explicitly call the function pointer in the COM vtable but doing COM in C is more effort than calling through C++. > I'm not sure why "responsive user interfaces" would be more important > today than 10 years ago, but at least I hope Microsoft has found > something more usable than overlapped I/O. They are more important now due to the use of phones and tablets together with distant file systems. Neil
Re: [Python-Dev] VS 11 Express is Metro only.
Curt: >> But will it be able to target Windows XP? It will likely be possible in a reasonable manner at some point. From http://blogs.msdn.com/b/visualstudio/archive/2012/05/18/a-look-ahead-at-the-visual-studio-11-product-lineup-and-platform-support.aspx : """C++ developers can also use the multi-targeting capability included in Visual Studio 11 to continue using the compilers and libraries included in Visual Studio 2010 to target Windows XP and Windows Server 2003. Multi-targeting for C++ applications currently requires a side-by-side installation of Visual Studio 2010. Separately, we are evaluating options for C++ that would enable developers to directly target XP without requiring a side-by-side installation of Visual Studio 2010 and intend to deliver this update post-RTM. """ Martin v. Löwis wrote: > The only place where platform support matters is the CRT, and this is > what I still want to test. E.g. it might be that the C RT works on XP, > and the C++ RT might use newer API. C++ runtime is more dependent on post-XP features than C runtime but even the C runtime currently needs some thunks: http://tedwvc.wordpress.com/ Neil
Re: [Python-Dev] Is msvcr71.dll re-redistributable?
Anders J. Munch: > 1. John X. Programmer buys the product, agrees to the EULA and puts >the DLL up for download, with the explicit and stated intent of >distributing it to anyone who needs it. Disallowed in 3.1(a): # you agree: ... to distribute the Redistributables only ... in # conjunction with and as a part of a software application # product developed by you that adds significant and primary # functionality to the Redistributables > Unless the EULA contains specific language to forbid such multi-stage > open-ended redistribution, I'd say you can just re-redistribute away. Lawyers think like lawyers much better than developers do. Neil
Re: [Python-Dev] comprehension abbreviation (was: Adding any() and all())
Guido van Rossum: > - Before anybody asks, I really do think the reason this is requested > at all is really just to save typing; there isn't the "avoid double > evaluation" argument that helped acceptance for assignment operators > (+= etc.), and I find readability is actually improved with 'for'. For me, the main motivation is to drop an unnecessarily repeated identifier. If you repeat something there is a chance that one of the occurrences will be wrong, which is one reason behind the Don't Repeat Yourself principle. The reader can also more readily see that this is a filter expression rather than a transforming expression. Neil
Re: [Python-Dev] Visual studio 2005 express now free
Martin v. Löwis: > Apparently, the status of this changed right now: it seems that > the 2003 compiler is not available anymore; the page now says > that it was replaced with the 2005 compiler. > > Should we reconsider? I expect Microsoft means that Visual Studio Express will be available free forever, not that you will always be able to download Visual Studio 2005 Express. They normally only provide a particular product version for a limited time after it has been superseded. Neil
Re: [Python-Dev] unicode imports
Kristján V. Jónsson: > Although python has had full unicode support for filenames for a long time > on selected platforms (e.g. Windows), there is one glaring deficiency: It > cannot import from paths containing unicode. I've tried creating folders > with chinese characters and adding them to path, to no avail. > The standard install path in chinese distributions can be with a non-ANSI > path, and installing an embedded python application there will break it. It should be unusual for a Chinese installation to use an install path that can not be represented in MBCS. Try encoding the install directory into MBCS before adding it to sys.path. Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Andrew Durdin: > While we're discussing outstanding issues: In a related discussion of > the path module on c.l.py, Thomas Heller pointed out that the path > module doesn't correctly handle unicode paths: > ... Here is a patch that avoids failure when paths can not be represented in a single 8 bit encoding. It adds a _cwd variable in the initialisation and then calls this rather than os.getcwd. I sent the patch to Jason as well.

    _base = str
    _cwd = os.getcwd
    try:
        if os.path.supports_unicode_filenames:
            _base = unicode
            _cwd = os.getcwdu
    except AttributeError:
        pass

    #...

    def getcwd():
        """ Return the current working directory as a path object. """
        return path(_cwd())

Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Guido van Rossum: > Whoa! Do we really need a completely different mechanism for doing the > same stuff we can already do? One benefit I see for the path module is that it makes it easier to write code that behaves correctly with unicode paths on Windows. Currently, to implement code that may see unicode paths, you must first understand that unicode paths may be an issue, then write conditional code that uses either a string or unicode string to hold paths whenever a new path is created. Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Thomas Heller: > OTOH, Python is lacking a lot when you have to handle unicode strings on > sys.path, in command line arguments, environment variables and maybe > other places. A new patch #1231336 "Add unicode for sys.argv, os.environ, os.system" is now in SourceForge. New parallel features sys.argvu and os.environu are provided and os.system accepts unicode arguments similar to PEP 277. A screenshot showing why the existing features are inadequate and the new features an enhancement are at http://www.scintilla.org/pyunicode.png One problem is that when using "python -c cmd args", sys.argvu includes the "cmd" but sys.argv does not. They both contain the "-c". os.system was changed to make it easier to add some test cases but then that looked like too much trouble. There are far too many variants on exec*, spawn* and popen* to write a quick patch for these. Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Guido van Rossum: > Then maybe the code that handles Unicode paths in arguments should be > fixed rather than adding a module that encapsulates a work-around... It isn't clear whether you are saying this should be fixed by the user or in the library. For a quick example, say someone wrote some code for counting lines in a directory:

    import os
    root = "docs"
    lines = 0
    for p in os.listdir(root):
        lines += len(file(os.path.join(root,p)).readlines())
    print lines, "document lines"

Quite common code. Running it now with one file "abc" in the directory yields correct behaviour:

    >pythonw -u "xlines.py"
    1 document lines

Now copy the file "Здравствуйте" into the directory and run it again:

    >pythonw -u "xlines.py"
    Traceback (most recent call last):
      File "xlines.py", line 5, in ?
        lines += len(file(os.path.join(root,p)).readlines())
    IOError: [Errno 2] No such file or directory: 'docs\\'

Changing line 2 to [root = u"docs"] will make the code work. If this is the correct fix then all file handling code should be written using unicode names. Contrast this to using path:

    import path
    root = "docs"
    lines = 0
    for p in path.path(root).files():
        lines += len(file(p).readlines())
    print lines, "document lines"

The obvious code works with only "abc" in the directory and also when "Здравствуйте" is added. Now, if you are saying it is a library failure, then there are multiple ways to fix it.

1) os.listdir should always return unicode. The problem with this is that people will see breakage of existing scripts because of promotion issues. Much existing code assumes a fixed locale, often 8859-1 and combining unicode and accented characters will raise UnicodeDecodeError.

2) os.listdir should not return "???" garbage, instead promoting to unicode whenever it sees garbage. This may also lead to UnicodeDecodeError as in (1).

3) This is an exceptional situation but the exception should be more explicit and raised earlier when os.listdir first encounters name garbage.
Neil
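The dependence on the argument type that the xlines.py example runs into can still be observed in Python 3, where os.listdir returns str names for a str argument and bytes names for a bytes argument. The following is only a sketch for modern Python 3, not code from the thread; the helper name count_lines and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile


def count_lines(root):
    """Count the lines of every regular file directly under root."""
    total = 0
    for name in os.listdir(root):  # str argument, so str names come back
        full = os.path.join(root, name)
        if os.path.isfile(full):
            with open(full, "rb") as handle:
                total += sum(1 for _ in handle)
    return total


with tempfile.TemporaryDirectory() as root:
    # One small file, standing in for "abc" in the example above.
    with open(os.path.join(root, "abc"), "w") as handle:
        handle.write("one line\n")

    str_names = os.listdir(root)                 # names as str
    bytes_names = os.listdir(os.fsencode(root))  # names as bytes
    line_total = count_lines(root)
```

Because the str form is the default in Python 3, the "obvious" version of the script no longer needs a separate unicode-aware variant.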
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Thomas Heller: > Not only that, all the other flags like -O and -E are also in sys.argvu > but not in sys.argv. OK, new patch fixes these and the "-c" issue. > Those are nearly obsoleted by the subprocess module (although I do not > know how that handles unicode. It breaks. The argspec is zzOOiiOzO:CreateProcess.

    >>> z = subprocess.Popen(u"cmd /c echo \u0417")
    Traceback (most recent call last):
      File "", line 1, in ?
      File "c:\zed\python\dist\src\lib\subprocess.py", line 600, in __init__
        errread, errwrite)
      File "c:\zed\python\dist\src\lib\subprocess.py", line 791, in _execute_child
        startupinfo)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0417' in position 12: ordinal not in range(128)

Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Guido van Rossum: > Ah, sigh. I didn't know that os.listdir() behaves differently when the > argument is Unicode. Does os.listdir(".") really behave differently > than os.listdir(u".")? Yes:

    >>> os.listdir(".")
    ['abc', '']
    >>> os.listdir(u".")
    [u'abc', u'\u0417\u0434\u0440\u0430\u0432\u0441\u0442\u0432\u0443\u0439\u0442\u0435']

> Bah! I don't think that's a very good design > (although I see where it comes from). Partly my fault. At the time I was more concerned with making functionality possible rather than convenient. > Promoting only those entries > that need it seems the right solution -- user code that can't deal > with the Unicode entries shouldn't be used around directories > containing unicode -- if it needs to work around unicode it should be > fixed to support that! OK, I'll work on a patch for that but I'd like to see the opinions of the usual unicode guys as this will produce more opportunities for UnicodeDecodeError. The modification will probably work in the opposite way, asking for all the names in unicode and then attempting to convert to the default code page with failures retaining the unicode name. Neil
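The "opposite way" described above (fetch every name as unicode, then convert to the code page where possible and keep the unicode name otherwise) could be sketched roughly as follows. This is an illustration in Python 3 rather than the C patch itself; the function name demote_names and the use of a Python codec name in place of a Windows code page are assumptions.

```python
def demote_names(names, encoding="ascii"):
    """Return each name as bytes if it fits `encoding`, else keep it as str.

    Mirrors the proposal: start from unicode names and only demote
    the ones that convert without loss.
    """
    result = []
    for name in names:
        try:
            result.append(name.encode(encoding))  # strict: fails on loss
        except UnicodeEncodeError:
            result.append(name)                   # retain the unicode name
    return result


# "abc" fits ASCII; the Cyrillic name does not and stays as str.
mixed = demote_names(["abc", "\u0417\u0434\u0440\u0430\u0432"], "ascii")
```

The result is a mixed list, which is exactly what creates the extra opportunities for errors when callers combine the entries with other byte strings.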
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Thomas Heller: > OTOH, I once had a bug report from a py2exe user who complained that the > program didn't start when installed in a path with japanese characters > on it. I tried this out, the bug existed (and still exists), but I was > astonished how many programs behaved the same: On a PC with english > language settings, you cannot start WinZip or Acrobat Reader (to give > just some examples) on a .zip or .pdf file contained in such a > directory. Much of the time these sorts of bugs don't make themselves too hard to live with because most non-ASCII names that any user encounters are still in the user's locale and so get mapped by Windows. It can be a lot of work supporting wide file names. I have just added wide file name support to my editor, SciTE, for the second time and am about to rip it out again as it complicates too much code for too few beneficiaries. (I want one executable for both Windows NT+ and 9x, so wide file names have to be a runtime choice leading to maybe 50 new branches in the code). If returning a mixture of unicode and narrow strings from os.listdir is the right thing to do then maybe it is better for sys.argv and os.environ to also be mixtures. In patch #1231336 I added parallel attributes, sys.argvu and os.environu, to hold unicode versions of this information. The alternative, placing unicode items in the existing attributes, minimises API size. One question here is whether unicode items should be added only when the element is outside the user's locale (the CP_ACP code page) or whenever the item is outside ASCII. The former is more similar to existing behaviour but the latter is safer as it makes it harder to implicitly treat the data as being in an incorrect encoding. Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Thomas Heller: > But adding u'\u5b66\u6821\u30c7\u30fc' to sys.path won't allow to import > this file as module. Internally Python\import.c converts everything to > strings. I started to refactor import.c to work with PyStringObjects > instead of char buffers as a first step - PyUnicodeObjects could have > been added later, but I gave up because there seems absolute zero > interest in it. Well, most people when confronted with this will rename the directory to something simple like "ulib" and continue. > I can't judge on this - but it's easy to experiment with it, even in > current Python releases since sys.argvu, os.environu can also be > provided by extension modules. It is the effect of this on the non-unicode-savvy that is important: if os.environu goes into prereleases of 2.5 then the only people that will use it are likely to be those who already try to keep their code unicode compliant. There is only likely to be (negative) feedback if existing features are made unicode-only or use unicode for non-ASCII. > But thanks that you care about this stuff - I'm a little bit worried > because all the other folks seem to think everything's ok (?). Unicode is becoming more of an issue: many Linux distributions now install by default with a UTF-8 locale and other tools are starting to use this: GCC 4 now delivers error messages using Unicode quote characters like ‘these’ rather than `these'. There are 131 threads found by Google Groups for (UnicodeEncodeError OR UnicodeDecodeError) and 21 of these were in this June. A large proportion of the threads are in language-specific groups so are not as visible to core developers. Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
M.-A. Lemburg: > I don't really buy this "trick": what if you happen to have > a home directory with Unicode characters in it ? Most people choose account names and thus home directory names that are compatible with their preferred locale settings: German users are unlikely to choose an account name that uses Japanese characters. Unicode is only necessary for file names that are outside your default locale. An administration utility may need to visit multiple users' home directories and so is more likely to encounter files with names that can not be represented in its default locale. I think it would be better if sys.path could include unicode entries but expect the code will rarely be exercised. Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Guido van Rossum: > In some sense the safest approach from this POV would be to return > Unicode as soon as it can't be encoded using the global default > encoding. IOW normally this would return Unicode for all names > containing non-ASCII characters. On unicode versions of Windows, for attributes like os.listdir, os.getcwd, sys.argv, and os.environ, which can usefully return unicode strings, there are 4 options I see:

1) Always return unicode. This is the option I'd be happiest to use, myself, but expect this choice would change the behaviour of existing code too much and so produce much unhappiness.

2) Return unicode when the text can not be represented in ASCII. This will cause a change of behaviour for existing code which deals with non-ASCII data.

3) Return unicode when the text can not be represented in the default code page. While this change can lead to breakage because of combining byte string and unicode strings, it is reasonably safe from the point of view of data integrity as current code is returning garbage strings that look like '?'.

4) Provide two versions of the attribute, one with the current name returning byte strings and a second with a "u" suffix returning unicode. This is the least intrusive, requiring explicit changes to code to receive unicode data. For patch #1231336 I chose this approach producing sys.argvu and os.environu.

For os.listdir the current behaviour of returning unicode when its argument is unicode can be retained but that is not extensible to, for example, sys.argv. Since this issue may affect many attributes a common approach should be chosen. For experimenting with os.listdir, there is a patch for posixmodule.c at http://www.scintilla.org/difft.txt which implements (2). To specify the US-ASCII code page, the number 20127 is used as there is no definition for this in the system headers. To change to (3) comment out the line with 20127 and uncomment the line with CP_ACP. Unicode arguments produce unicode results.
Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
M.-A. Lemburg: > It's naive to assume that all people in Germany using the German > locale have German names ;-) That is not an assumption I would make. The assumption I would make is that if it is important to you to have your account name in a particular character set then you will normally set your locale to enable easy use of that account. > I'm not sure why you bring up an administration tool: isn't > the discussion about being able to load Python modules from > directories with Unicode path components ? The discussion has moved between various aspects of unicode support in Python. There are many areas of the Python library which are not compatible with unicode and having an idea of the incidence of particular situations helps define where effort is most effectively spent. My experience has been that because of the way Windows handles character set conversions, problems are less common on individuals' machines than they are on servers. Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
M.-A. Lemburg: > > 2) Return unicode when the text can not be represented in ASCII. This > > will cause a change of behaviour for existing code which deals with > > non-ASCII data. > > +1 on this one (s/ASCII/Python's default encoding). I assume you mean the result of sys.getdefaultencoding() here. Unless much of the Python library is modified to use the default encoding, this will break. The problem is that different implicit encodings are being used for reading data and for accessing files. When calling a function, such as open, with a byte string, Python passes that byte string through to Windows which interprets it as being encoded in CP_ACP. When this differs from sys.getdefaultencoding() there will be a mismatch. Say I have been working on a machine set up for Australian English (or other Western European locale) but am working with Russian data so have set Python's default encoding to cp1251. With this simple script, g.py:

    import sys
    print file(sys.argv[1]).read()

I process a file called '€.txt' with contents "European Euro" to produce

    C:\zed>python_d g.py €.txt
    European Euro

With the proposed modification, sys.argv[1] u'\u20ac.txt' is converted through cp1251 to '\x88.txt' as the Euro is located at 0x88 in CP1251. The operating system is then asked to open '\x88.txt' which it interprets through CP_ACP to be u'\u02c6.txt' ('ˆ.txt') which then fails. If you are very unlucky there will be a file called 'ˆ.txt' so the call will succeed and produce bad data. Simulating with str(sys.argvu[1]):

    C:\zed>python_d g.py €.txt
    Traceback (most recent call last):
      File "g.py", line 2, in ?
        print file(str(sys.argvu[1])).read()
    IOError: [Errno 2] No such file or directory: '\x88.txt'

> -1: code pages are evil and the reason why Unicode was invented > in the first place. This would be a step back in history. Features used to specify files (sys.argv, os.environ, ...) should match functions used to open and perform other operations with files as they do currently.
This means their encodings should match. Neil
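The mismatch in the '€.txt' example can be reproduced with Python's own codecs. This is a small Python 3 sketch, assuming cp1252 stands in for CP_ACP on the Western European machine described; the variable names are made up for illustration.

```python
# The name as the user typed it on the command line.
name = "\u20ac.txt"  # '€.txt'

# Encoding through the (hypothetical) Python default encoding, cp1251,
# puts the Euro sign at byte 0x88.
as_default = name.encode("cp1251")

# The operating system then reads those bytes as CP_ACP, assumed here to
# be cp1252, where 0x88 is U+02C6 (MODIFIER LETTER CIRCUMFLEX ACCENT).
seen_by_os = as_default.decode("cp1252")

# The two names differ, so the open call looks for the wrong file.
mismatch = seen_by_os != name
```

Both steps succeed without raising, which is why the failure only shows up later as a missing file or, worse, as the wrong file being opened.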
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Hi Marc-Andre, > >With the proposed modification, sys.argv[1] u'\u20ac.txt' is > > converted through cp1251 > > Actually, it is not: if you pass in a Unicode argument to > one of the file I/O functions and the OS supports Unicode > directly or at least provides the notion of a file system > encoding, then the file I/O should use the Unicode APIs > of the OS or convert the Unicode argument to the file system > encoding. AFAIK, this is how posixmodule.c already works > (more or less). Yes it is. The initial stage is reading the command line arguments. The proposed modification is to change behaviour when constructing sys.argv, os.environ or when calling os.listdir to "Return unicode when the text can not be represented in Python's default encoding". I take this to mean that when the value can be represented in Python's default encoding then it is returned as a byte string in the default encoding. Therefore, for the example, the code that sets up sys.argv has to encode the unicode command line argument into cp1251. > On input, file I/O APIs should accept both strings using > the default encoding and Unicode. How these inputs are then > converted to suit the OS is up to the OS abstraction layer, e.g. > posixmodule.c. This looks to me to be insufficiently compatible with current behaviour which accepts byte strings outside the default encoding. Existing code may call open("€.txt"). This is perfectly legitimate current Python (with a coding declaration) as "€.txt" is a byte string and file systems will accept byte string names. Since the standard default encoding is ASCII, should such code raise UnicodeDecodeError? > Changing this is easy, though: instead of using the "et" > getargs format specifier, you'd have to use "es". The latter > recodes strings based on the default encoding assumption to > whatever other encoding you specify. Don't you want to convert these into unicode rather than another byte string encoding?
It looks to me as though the "es" format always produces byte strings and the only byte string format that can be passed to the operating system is the file system encoding which may not contain all the characters in the default encoding. Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Martin v. Löwis: > - But then, the wide API gives all results as Unicode. If you want to > promote only those entries that need it, it really means that you > only want to "demote" those that don't need it. But how can you tell > whether an entry needs it? There is no API to find out. I wrote a patch for os.listdir at http://www.scintilla.org/difft.txt that uses WideCharToMultiByte to check if a wide name can be represented in a particular code page and only uses that representation if it fits. This is good for Windows code pages including ASCII and "mbcs" but since Python's sys.getdefaultencoding() can be something that has no code page equivalent, it would have to try converting using strict mode and interpret failure as leaving the name as unicode. > You could declare that anything with characters >128 needs it, > but that would be an incompatible change: If a character >128 in > the system code page is in a file name, listdir currently returns > it in the system code page. It then would return a Unicode string. I now quite like returning unicode for anything non-ASCII on Windows as there is no ambiguity in what the result means and there will be no need to change all the system calls to translate from the default encoding. It is a change to the API which can lead to code breaking but it should break with an exception. Assuming that byte string arguments are using Python's default encoding looks more dangerous with a behavioural change but no notification. Neil
Re: [Python-Dev] Adding the 'path' module (was Re: Some RFE for review)
Martin v. Löwis: > This appears to be based on the usedDefault return value of > WideCharToMultiByte. I believe this is insufficient: > WideCharToMultiByte might convert Unicode characters to > codepage characters in a lossy way, without using the default > character. For example, it converts U+0308 (combining diaeresis) > to U+00A8 (diaeresis) (or something like that, I forgot the > exact details). So if you have, say, "p-umlaut" (i.e. U+0070 > U+0308), it converts it to U+0070 U+00A8 (in the local code page). > Trying to use this as a filename later fails. There is WC_NO_BEST_FIT_CHARS to defeat that. It says that it will use the default character if the translation can't be round-tripped. Available on Windows 2000 and XP but not NT4. We could compare the original against the round-tripped as described at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_2bj9.asp Neil
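The idea of comparing the original name against its round-tripped form can be illustrated without the Windows API. In this Python 3 sketch the lossy conversion is simulated with the "replace" error handler, which substitutes "?" for characters the code page lacks; this is an assumption standing in for WideCharToMultiByte, and Python's table-based codecs such as cp1252 do not perform best-fit mapping. The function name fits_code_page is hypothetical.

```python
def fits_code_page(name, encoding):
    """True if `name` survives a trip through `encoding` unchanged.

    The conversion is deliberately lossy ("replace") so that the only
    check performed is the comparison against the original.
    """
    encoded = name.encode(encoding, errors="replace")
    return encoded.decode(encoding, errors="replace") == name


plain_ok = fits_code_page("abc", "cp1252")
euro_ok = fits_code_page("\u20ac.txt", "cp1252")    # Euro is in cp1252
umlaut_ok = fits_code_page("p\u0308", "cp1252")     # U+0308 is not
```

A name is only demoted to the code page when the comparison succeeds, so a substituted character can never silently change which file is named.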
Re: [Python-Dev] Replacement for print in Python 3.0
Gareth McCaughan: > 3. It's convenient for debugging, interactive use, simple scripts, >and various other things. Interactive use is its own mode and works differently to the base language. To print the value of something, just type an expression. Python will evaluate and print the value of the expression. Much easier than adding 'print '. Extended interactive modes like ipython include other conveniences that don't belong in the python language. The problem with print is it becomes a barrier to extending a script into something more ambitious. This then leads to ugly 'features' like '>>' and trailing commas. By all means provide a simple syntax for i/o with the standard streams but ensure it is something that is a firm basis for extension. Neil
Re: [Python-Dev] Replacement for print in Python 3.0
Gareth McCaughan: > >Interactive use is its own mode and works differently to the base > > language. To print the value of something, just type an expression. > > Doesn't do the same thing. In interactive mode, you are normally interested in the values of things, not their formatting so it does the right thing. If you need particular formatting or interpretation, you can always achieve this. > Do you have any suggestion that's as practically usable > as "print"? The print function proposal is already as usable as the print statement. When I write a print statement, I'd like to be able to redirect that to a log or GUI easily. If print is a function then its interface can be reimplemented but users can't add new statements to Python. Creation of strings containing values could be simplified as that would be applicable in many cases. I actually like being able to append to strings in Java with the second operand being stringified. Perhaps a stringify and catenate operator could be included in Python. Like this: MessageBox("a=" ° a ° "pos=" ° x°","°y) Neil
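With print as a function, as it became in Python 3, both wishes above can be approximated in a few lines. This is only a sketch: the helper cat is a hypothetical stand-in for the proposed stringify-and-catenate operator, and an io.StringIO object stands in for a log or GUI sink.

```python
import io


def cat(*parts):
    """Stringify every argument and join the results, with no separator."""
    return "".join(str(part) for part in parts)


log = io.StringIO()  # stands in for a log file or GUI text widget
a, x, y = 3, 10, 20

# Redirecting output is just a keyword argument on the print function.
print(cat("a=", a, " pos=", x, ",", y), file=log)

message = log.getvalue()
```

Because print is an ordinary name, a module can also rebind it to a function with the same interface that writes somewhere else entirely.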
Re: [Python-Dev] international python
Antoine Pitrou: > As for seamless unicode support, there are also problems sometimes with > filenames and filepaths: see e.g. > https://sourceforge.net/tracker/?func=detail&aid=1283895&group_id=5470&atid=105470 This bug report is using byte string arguments causing byte string processing rather than unicode calls with unicode processing. Windows code that may encounter file paths outside the default locale should stick to unicode for paths. Try converting os.curdir to unicode before calling other functions: os.path.abspath(unicode(os.curdir)) Neil
Re: [Python-Dev] international python
Antoine Pitrou: > I don't have a Windows machine at hand right now to test it, but, even > if this solution works, it breaks the principle of least astonishment: Astonishment is subjective and so is a poor yardstick. At one stage Ruby tried to follow the more common formulation "principle of least surprise" (POLS) but this produced arguments of the following form: I am surprised by X. Therefore, X contradicts POLS. Therefore, X must be fixed. POLS was then abandoned. > os.path.abspath() should do the Right Thing regardless of what the > current locale is. This was discussed recently and the consensus position was for functions that cannot return a value in the default encoding to instead return a unicode value. Correct implementation of this would require changing the behaviour not only of functions returning strings but also of those receiving strings (which should treat byte strings as being in the default encoding). This would require a large amount of work, and is unlikely to be performed in the near future. Neil
Re: [Python-Dev] Pythonic concurrency
Bruce Eckel: > I would say that the troublesome meme is that "threads are easy." I > posted an earlier, rather longish message about this. The gist of > which was: "when someone says that threads are easy, I have no idea > what they mean by it." I think you are overcomplicating the issue by looking at too many levels at once. The memory model is something that implementers of threading support need to understand. Users of that threading support just need to know that concurrent access to variables is dangerous and that they should use locks to access shared variables or use other forms of packaged inter-thread communication. Double Checked Locking is an optimization (removal of a lock) of an attempt to better modularize code (by automating the helper object creation). I'd either just leave the lock in or, if benchmarking revealed an unacceptable performance problem, allocate the helper object before the resource is accessible to more than one thread. For statics, expose an Init method that gets called while the application is still in its initial state with a single user thread. > But I just finished a 150-page chapter on Concurrency in Java which > took many months to write, based on a large chapter on Concurrency in > C++ which probably took longer to write. I keep in reasonably good > touch with some of the threading experts. I can't get any of them to > say that it's easy, even though they really do understand the issues > and think about it all the time. *Because* of that, they say that it's > hard. Implementing threading is hard. Using threading is not that hard. It's a source of complexity but so are many aspects of development. I get scared by reentrance in UI code. Neil
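The "just leave the lock in" option might look like the following sketch. The `Resource` class and its `helper` method are hypothetical names chosen for illustration, and a plain `object()` stands in for whatever expensive helper would really be created.

```python
import queue
import threading


class Resource:
    """Creates its helper lazily, always under the lock (no double checking)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._helper = None

    def helper(self):
        # Every access takes the lock, so no thread can see a half-built helper.
        with self._lock:
            if self._helper is None:
                self._helper = object()  # stand-in for an expensive helper
            return self._helper


resource = Resource()
results = queue.Queue()  # packaged inter-thread communication for the results


def worker():
    results.put(resource.helper())


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

helpers = [results.get() for _ in range(8)]
print(all(h is helpers[0] for h in helpers))
```

The other option from the message, allocating the helper in `__init__` before any second thread can reach the object, removes the need for the check entirely.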
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Guido van Rossum: > Folks, please focus on what Python 3000 should do. > > I'm thinking about making all character strings Unicode (possibly with > different internal representations a la NSString in Apple's Objective > C) and introduce a separate mutable bytes array data type. But I could > use some validation or feedback on this idea from actual > practitioners. I'd like to more tightly define Unicode strings for Python 3000. Currently, Unicode strings may be implemented with either 2 byte (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to contain any Unicode character, and indexing should yield characters rather than half characters. Therefore Python strings should appear to be UTF-32. There could still be multiple implementations (using UTF-16 or UTF-8) to save space but all implementations should appear to be the same apart from speed and memory use. Neil
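The difference between characters and half characters can be shown with a character outside the base plane. This sketch assumes CPython 3.3 or later, where strings behave as sequences of whole code points; it describes that later behaviour, not anything that existed when the message was written.

```python
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the base plane
text = "a" + clef + "b"

code_points = len(text)                           # characters as seen by indexing
utf16_units = len(text.encode("utf-16-le")) // 2  # 2-byte elements needed
utf8_bytes = len(text.encode("utf-8"))            # bytes needed in UTF-8

print(code_points, utf16_units, utf8_bytes)
print(text[1] == clef)  # indexing yields the whole character
```

With 2-byte elements exposed directly, `text[1]` would instead be the first half of a surrogate pair.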
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Martin v. Löwis: > That's very tricky. If you have multiple implementations, you make > usage at the C API difficult. If you make it either UTF-8 or UTF-32, > you make PythonWin difficult. If you make it UTF-16, you make indexing > difficult. For Windows, the code will get a little uglier, needing to perform an allocation/encoding and deallocation more often than at present, but I don't think there will be a speed degradation as Windows is currently performing a conversion from 8 bit to UTF-16 inside many system calls. To minimize the cost of allocation, Python could copy Windows in keeping a small number of commonly sized preallocated buffers handy. For indexing UTF-16, a flag could be set to show if the string is all in the base plane and, if not, an index could be constructed when and if needed. It'd be good to get some feel for what proportion of string operations performed require indexing. Many, such as startswith, split, and concatenation, don't require indexing. The proportion of operations that use indexing to scan strings would also be interesting, as adding a (currentIndex, currentOffset) cursor to string objects would be another approach. Neil
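One way to picture the flag plus on-demand index is the sketch below. The `Utf16Text` class, its attribute names and the choice of a sorted list of surrogate-pair positions as the index are all assumptions made for illustration; negative indexes and slices are left out to keep it short.

```python
import struct
from bisect import bisect_left


class Utf16Text:
    """Hypothetical string stored as UTF-16 code units but indexed by character."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        self.units = struct.unpack("<%dH" % (len(data) // 2), data)
        # Flag: with no high surrogates, unit index equals character index.
        self.all_base_plane = not any(0xD800 <= u <= 0xDBFF for u in self.units)
        self._pair_chars = None  # index, built only when first needed

    def _pairs(self):
        """Sorted character indexes of the characters stored as surrogate pairs."""
        if self._pair_chars is None:
            pairs, char_index, unit_index = [], 0, 0
            while unit_index < len(self.units):
                if 0xD800 <= self.units[unit_index] <= 0xDBFF:
                    pairs.append(char_index)
                    unit_index += 2
                else:
                    unit_index += 1
                char_index += 1
            self._pair_chars = pairs
        return self._pair_chars

    def __getitem__(self, i):
        if self.all_base_plane:
            return chr(self.units[i])  # fast path, no index required
        # Each surrogate pair before character i shifts its offset by one unit.
        offset = i + bisect_left(self._pairs(), i)
        unit = self.units[offset]
        if 0xD800 <= unit <= 0xDBFF:
            low = self.units[offset + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)


plain = Utf16Text("abc")
mixed = Utf16Text("a\U0001D11Eb\U0001F600c")
print(plain.all_base_plane, mixed.all_base_plane)
print([mixed[i] == ch for i, ch in enumerate("a\U0001D11Eb\U0001F600c")])
```

Strings that stay in the base plane never pay for the index, and operations such as startswith or concatenation would not touch it at all.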