[Python-Dev] uuid module - byte order issue
The UUID module uses network byte order, regardless of the platform byte order. On little-endian platforms like Windows the ".bytes" property of UUID objects is not compatible with the memory layout of UUIDs:

>>> import uuid
>>> import pywintypes
>>> s = '{00112233-4455-6677-8899-aabbccddeeff}'
>>> uuid.UUID(s).bytes.encode('hex')
'00112233445566778899aabbccddeeff'
>>> str(buffer(pywintypes.IID(s))).encode('hex')
'33221100554477668899aabbccddeeff'
>>>

Ka-Ping Yee writes* that the Windows UUID generation calls are not RFC 4122 compliant and have an illegal version field. If the correct byte order is used the UUIDs generated by Windows XP are valid version 4 UUIDs:

>>> parts = struct.unpack('[...]
>>> parts[2] >> 12  # version
4
>>> ord(parts[3][0]) & 0xC0  # variant
128

The first three fields (32 bit time-low, 16 bit time-mid and time-high-and-version) are stored in the platform byte order while the remainder is stored as a vector of 8 bytes.

The bytes property and bytes argument to the constructor should use the platform byte order. It would be nice to have explicit little endian and big endian versions available on platforms of either endianness for compatibility in communication and disk formats.

There is another issue with version 1 uuid generation:

>>> len(set(uuid.uuid1() for i in range(1000)))
992

The problem is that the random clock_seq field is only 14 bits long. If enough UUIDs are generated within the same system clock tick there will be collisions.

Suggested solution: use the high resolution of the time field (100ns) to generate a monotonically increasing timestamp that advances at least by 1 for each call, when time.time() returns the same value on subsequent calls.

Oren

[*] http://mail.python.org/pipermail/python-dev/2006-June/065869.html

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
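The suggested fix for version 1 collisions can be sketched roughly as follows. This is only an illustration in present-day Python 3, not the uuid module's actual code: the helper name next_timestamp_100ns and the module-level state are hypothetical, and the sketch omits the offset from the 1582 Gregorian epoch that real UUID timestamps use.

```python
import threading
import time

# Hypothetical module-level state: the last timestamp handed out.
_lock = threading.Lock()
_last_timestamp = 0


def next_timestamp_100ns():
    """Return a count of 100 ns intervals that strictly increases per call.

    If the system clock has not advanced since the previous call (or has
    gone backwards), the previous value plus one is returned instead.
    """
    global _last_timestamp
    now = time.time_ns() // 100  # nanoseconds -> 100 ns ticks
    with _lock:
        if now <= _last_timestamp:
            now = _last_timestamp + 1
        _last_timestamp = now
        return now


timestamps = [next_timestamp_100ns() for _ in range(1000)]
```

With a timestamp like this feeding the time fields, two calls within the same clock tick no longer depend on the 14-bit clock_seq alone to stay distinct.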
Re: [Python-Dev] uuid module - byte order issue
On 04/08/06, Ka-Ping Yee <[EMAIL PROTECTED]> wrote:
> On Thu, 3 Aug 2006, Oren Tirosh wrote:
> > The UUID module uses network byte order, regardless of the platform
> > byte order. On little-endian platforms like Windows the ".bytes"
> > property of UUID objects is not compatible with the memory layout
>
> RFC 4122 says:
>
>     In the absence of explicit application or presentation protocol
>     specification to the contrary, a UUID is encoded as a 128-bit
>     object, as follows:
>
>     The fields are encoded as 16 octets, with the sizes and order of
>     the fields defined above, and with each field encoded with the
>     Most Significant Byte first (known as network byte order).

RFC 4122 defines a canonical byte order for UUIDs but also makes explicit reference to the fact that UUIDs are stored locally in native byte order. The final step in the RFC 4122 UUID generation algorithm is:

> o Convert the resulting UUID to local byte order.

So this is not another case of the Microsoft-implements-RFC-incorrectly syndrome. After all, they are one of the co-authors of the document.

Compatibility with Windows "GUIDs" may be one of the most important use cases for the UUID module. It's important to resolve this or users will have unpleasant surprises.

There are three alternatives:

1. Default is big endian byte order. Little endian is explicit.
2. Default is native byte order. Little endian and big endian are explicit.
3. No default. Little endian and big endian are both explicit.

All three are relevant for both the constructor and retrieving the byte representation.

Oren
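For illustration, alternative 1 is roughly what the standard uuid module offers today: bytes is big endian and bytes_le is the explicit little-endian form, both as attributes and as constructor arguments. The sketch below is written for Python 3 (the sessions in this thread are Python 2) and reuses the example UUID from the original report; the variable names are arbitrary.

```python
import struct
import uuid

s = "{00112233-4455-6677-8899-aabbccddeeff}"
u = uuid.UUID(s)

big = u.bytes        # all fields most significant byte first (RFC 4122)
little = u.bytes_le  # first three fields little-endian, last 8 bytes unchanged

print(big.hex())     # 00112233445566778899aabbccddeeff
print(little.hex())  # 33221100554477668899aabbccddeeff

# Both layouts round-trip through the constructor to the same UUID.
assert uuid.UUID(bytes=big) == u
assert uuid.UUID(bytes_le=little) == u

# Unpack the little-endian layout into the fields described in the thread:
# 32-bit time-low, 16-bit time-mid, 16-bit time-high-and-version, 8 raw bytes.
time_low, time_mid, time_hi_version, rest = struct.unpack("<LHH8s", little)
version = time_hi_version >> 12
variant_bits = rest[0] & 0xC0
```

The little-endian output matches what the pywintypes session in the first message printed for the same UUID.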
Re: [Python-Dev] Proposed alternative to __next__ and __exit__
I suggest using a variation on the consumer interface, as described by Fredrik Lundh at http://effbot.org/zone/consumer.htm :

.next() -- stays .next()
.__next__(arg) -- becomes .feed(arg)
.__exit__(StopIteration, ...) -- becomes .close()
.__exit__(..,..,..) -- becomes .feed(exc_info=(..,..,..))

Extensions to effbot's original consumer interface:

1. The .feed() method may return a value
2. Some way to raise an exception other than StopIteration inside the generator/consumer function. The use of a keyword argument to .feed is just an example. I'm looking for other suggestions on this one.

No new builtins. No backward-compatibility methods and wrappers.

Yes, it would have been nicer if .next() had been called __next__() in the first place. But at this stage I feel that the cost of "fixing" it far outweighs any perceived benefit.

so much for "uncontroversial" parts! :-)

Oren

On 5/6/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> [Steven Bethard]
> > So, just to make sure, if we had another PEP that contained from PEP 340[1]:
> > * Specification: the __next__() Method
> > * Specification: the next() Built-in Function
> > * Specification: a Change to the 'for' Loop
> > * Specification: the Extended 'continue' Statement
> > * the yield-expression part of Specification: Generator Exit Handling
> > would that cover all the pieces you're concerned about?
> >
> > I'd be willing to break these off into a separate PEP if people think
> > it's a good idea. I've seen very few complaints about any of these
> > pieces of the proposal. If possible, I'd like to see these things
> > approved now, so that the discussion could focus more directly on the
> > block-statement issues.
>
> I don't think it's necessary to separate this out into a separate PEP;
> that just seems busy-work. I agree these parts are orthogonal and
> uncontroversial; a counter-PEP can suffice by stating that it's not
> countering those items nor repeating them.
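To make the feed/close naming proposed in this message concrete, here is a rough sketch written against the generator methods that exist in Python today (send, throw, close). Those methods are not part of the proposal itself, and the Consumer class and running_total generator are hypothetical names used only for illustration.

```python
class Consumer:
    """Hypothetical adapter exposing feed()/close() on top of a generator."""

    def __init__(self, gen):
        self._gen = gen
        next(self._gen)  # run the generator up to its first yield

    def feed(self, arg=None, exc_info=None):
        """Pass a value (or raise an exception) inside the generator.

        Returns whatever the generator yields next, so feed() may
        return a value, as the message suggests.
        """
        if exc_info is not None:
            return self._gen.throw(exc_info[1])  # exc_info[1] is the instance
        return self._gen.send(arg)

    def close(self):
        self._gen.close()


def running_total():
    """Example consumer: yields the running sum, resets on ValueError."""
    total = 0
    while True:
        try:
            value = yield total
        except ValueError:
            total = 0
        else:
            total += value


consumer = Consumer(running_total())
first = consumer.feed(5)
second = consumer.feed(7)
after_reset = consumer.feed(exc_info=(ValueError, ValueError("reset"), None))
consumer.close()
```

The keyword argument for exceptions follows the message's own example; a separate method would work just as well.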
Re: [Python-Dev] PEP for RFE 46738 (first draft)
Please don't invent new serialization formats. I think we have enough of those already.

The RFE suggests that "the protocol is specified in the documentation, precisely enough to write interoperating implementations in other languages". If interoperability with other languages is really the issue, use an existing format like JSON.

If you want an efficient binary format you can use a subset of the pickle protocol supporting only basic types. I tried this once. I ripped out all the fancy parts from pickle.py and left only binary pickling (protocol version 2) of basic types. It took less than an hour and I was left with something only marginally more complex than your new proposed protocol.

Oren
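One way to approximate a basic-types-only subset of pickle today, without editing pickle.py as described above, is to subclass pickle.Unpickler and refuse every global lookup. This is a different technique from the one in the message, shown only as a sketch; the names BasicTypesUnpickler and loads_basic are hypothetical.

```python
import io
import pickle


class BasicTypesUnpickler(pickle.Unpickler):
    """Unpickler that rejects anything needing a class or function lookup."""

    def find_class(self, module, name):
        # Basic types (None, bool, int, float, str, bytes, list, tuple, dict)
        # have their own opcodes in protocol 2 and never reach this method.
        raise pickle.UnpicklingError(
            "global %s.%s is not allowed" % (module, name)
        )


def loads_basic(data):
    """Load a pickle that is expected to contain only basic types."""
    return BasicTypesUnpickler(io.BytesIO(data)).load()


data = {"name": "spam", "values": [1, 2.5, None, True]}
payload = pickle.dumps(data, protocol=2)
restored = loads_basic(payload)
```

Anything that pickles as a reference to a class or function, such as a built-in function or a user-defined instance, fails to load with UnpicklingError.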
Re: [Python-Dev] Adding Python-Native Threads
On 6/26/05, Adam Olsen <[EMAIL PROTECTED]> wrote:
...
> To resolve these problems I propose adding lightweight cooperative
> threads to Python.

Speaking of lightweight cooperative threads - has anyone recently tried to build Python with the pth option? It doesn't quite work out of the box. How much maintenance would be required to make it work again?

Oren
Re: [Python-Dev] Some RFE for review
On 6/27/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Reinhold Birkenfeld wrote:
> > 1152248:
> > In order to read "records" separated by something other than newline, file
> > objects
> > should either support an additional parameter (the separator) to
> > (x)readlines(),
> > or gain an additional method which does this.
> > Review: The former is a no-go, I think, because what is read won't be lines.
> > The latter is further complicating the file interface, so I would follow the
> > principle that not every 3-line function should be builtin.
>
> As Douglas Alan's sample implementation (and his second attempt [1])
> show, getting this right (and reasonably efficient) is actually a
> non-trivial exercise. Leveraging the existing xreadlines
> infrastructure is an idea worth considering.

Do you mean the existing xreadlines infrastructure that no longer exists since 2.4 ? :-)

An infrastructure that could be leveraged is the readahead buffer used by the file object's line iterator.

Oren
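For reference, a pure-Python version of the record reader under discussion might look like the sketch below. It is not Douglas Alan's implementation and it does not use the file object's readahead buffer; the function name read_records and the chunk_size parameter are assumptions made for this example. The detail that is easy to get wrong is a separator that straddles two chunks, which is handled here by carrying the unfinished tail over to the next read.

```python
import io


def read_records(fileobj, separator, chunk_size=8192):
    """Yield records from a text file object, split on `separator`.

    The tail of each chunk is kept in `buffer` until the next read,
    so a separator split across two chunks is still recognized.
    A trailing separator does not produce an empty final record.
    """
    buffer = ""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        parts = buffer.split(separator)
        buffer = parts.pop()  # possibly incomplete last record
        yield from parts
    if buffer:
        yield buffer


# A tiny chunk size forces the two-character separator across chunk borders.
records = list(read_records(io.StringIO("a;;b;;c"), ";;", chunk_size=2))
```

This stays simple at the cost of efficiency: a very long record is re-scanned on every chunk, which is part of why a built-in version was being considered.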
Re: [Python-Dev] partition() (was: Remove str.find in 3.0?)
On 30/08/05, JustFillBug <[EMAIL PROTECTED]> wrote:
> On 2005-08-30, Anthony Baxter <[EMAIL PROTECTED]> wrote:
> > On Tuesday 30 August 2005 11:26, Raymond Hettinger wrote:
> >> > My major issue is with the names - partition() doesn't sound right to
> >> > me.
> >>
> >> FWIW, I am VERY happy with the name partition().
> >
> > I'm +1 on the functionality, and +1 on the name partition(). The only other
> > name that comes to mind is 'separate()', but
> > a) I always spell it 'seperate' (and I don't need another lamdba )
> > b) It's too similar in name to 'split()'
> >
>
> trisplit()

split3() ?

I'm +1 on the name "partition" but I think this is shorter, communicates the similarity to split and the fact that it always returns exactly three parts.

Oren
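For readers who have not followed the thread, the "exactly three parts" behavior looks like this with str.partition as it exists in current Python (the method kept the name partition in the end); the sample strings are arbitrary.

```python
# Separator found: text before, the separator itself, text after.
found = "key=value".partition("=")

# Separator missing: the whole string, then two empty strings.
missing = "key".partition("=")

# Only the first occurrence is used, unlike split() without a maxsplit.
first_only = "a=b=c".partition("=")

print(found)       # ('key', '=', 'value')
print(missing)     # ('key', '', '')
print(first_only)  # ('a', '=', 'b=c')
```

Because the result always has three items, it can be unpacked directly without checking its length first.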
[Python-Dev] Python 3 design principles
Most of the changes in PEP 3000 are tightening up of "There should be one obvious way to do it.":

* Remove multiple forms of raising exceptions, leaving just "raise instance"
* Remove exec as statement, leaving the compatible tuple/call form.
* Remove <>, ``, leaving !=, repr

etc.

Other changes are to disallow things already considered poor style like:

* No assignment to True/False/None
* No input()
* No access to list comprehension variable

And there is also completely new stuff like static type checking.

While a lot of existing code will break on 3.0 it is still generally possible to write code that will run on both 2.x and 3.0: use only the "proper" forms above, do not assume the result of zip or range is a list, use absolute imports (and avoid static types, of course). I already write all my new code this way.

Is this "common subset" a happy coincidence or a design principle?

Not all proposed changes remove redundancy or add completely new things. Some of them just change the way certain things must be done. For example:

* Moving compile, id, intern to sys
* Replacing print with write/writeln

And possibly the biggest change:

* Reorganize the standard library to not be as shallow

I'm between +0 and -1 on these. I don't find them enough of an improvement to break this "common subset" behavior. It's not quite the same as strict backward compatibility and I find it worthwhile to try to keep it.

Writing programs that run on both 2.x and 3 may require ugly version-dependent tricks like:

try:
    compile
except NameError:
    from sys import compile

or perhaps

try:
    import urllib
except ImportError:
    from www import urllib

Should the "common subset" be a design principle of Python 3? Do compile and id really have to be moved from __builtins__ to sys? Could the rearrangement of the standard library be a bit less aggressive and try to leave commonly used modules in place?

Oren
Re: [Python-Dev] Python 3 design principles
On 9/1/05, Robert Kern <[EMAIL PROTECTED]> wrote:
> Oren Tirosh wrote:
>
> > While a lot of existing code will break on 3.0 it is still generally
> > possible to write code that will run on both 2.x and 3.0: use only the
> > "proper" forms above, do not assume the result of zip or range is a
> > list, use absolute imports (and avoid static types, of course). I
> > already write all my new code this way.
> >
> > Is this "common subset" a happy coincidence or a design principle?
>
> I think it's because those are the most obvious things right now. The
> really radical stuff won't come up until active development on Python
> 3000 actually starts. And it will, so any "common subset" will probably
> not be very large.

Static typing is radical stuff and doesn't hurt the common subset since it's optional. Making unicode the default is pretty radical and can be done without breaking the common subset (with the help of little tweaks like allowing str() to return unicode now like int() can return longs).

Iterators and new-style classes were pretty radical changes that were managed elegantly and meet an even stronger requirement than the common subset - they were achieved with full backward compatibility.

Python 3 will most probably make big changes in the internal implementation and the C API. Perhaps it will even be generated from PyPy.

I don't think keeping the common subset will really stand in the way of making big improvements. The proposed 3.x changes that break it seem more like nitpicking to me than significant improvements.

Python is terrific. I find nothing I really want to change. Remove old cruft and add some brand new stuff, yes. But nothing to change.

Oren
[Python-Dev] Python 3 executable name (was: Re: PEP 3000 and iterators)
On 9/11/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
...
> But just installing python3.0 as python and expecting
> nothing will break is not a goal -- it would be too constraining.

It should be expected that many users will keep both 2.x and 3 side by side for quite a long time. Instead of having distributions choosing their own naming schemes (like the python/python2 redhat fiasco) perhaps the Python 3 executable should have a different name as part of the standard distribution?

I suggest "py" / "py.exe"

Oren
Re: [Python-Dev] Python 3 executable name
On 9/12/05, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Oren Tirosh wrote:
>
> > perhaps the Python 3 executable should have a different name as part
> > of the standard distribution? I suggest "py" / "py.exe"
>
> Or "python3"? EIBTI :-)

Generally, each distribution makes its own decision about when to make the default "python" the new version. Any damage is usually limited to third party extension modules because python versions are source compatible.

But this time it isn't. So do you keep the name "python3" forever? Do you keep unqualified "python" as 2.x forever? I expect many installations to keep 2.x around for many years. How do you keep different distributions from making their own incompatible decisions about naming conventions? Using version numbers in the executable name is just asking for this to happen.

I suggest an explicitly and permanently different name for the interpreter executable of this new and incompatible branch of the language. I want Python 3 scripts starting with #! to have an average shelf life longer than 18 months.

Oren
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
On 10/28/05, Neil Hodgson <[EMAIL PROTECTED]> wrote:
> I used to work on software written by Japanese and English speakers
> at Fujitsu with most developers being Japanese. The rules were that
> comments could be in Japanese but identifiers were only allowed to
> contain ASCII characters. Most variable names were poorly chosen with
> s, p, q, fla (boolean=flag) and flafla being popular. When I asked
> some Japanese coders why they didn't use Japanese words expressed in
> ASCII (Romaji), their response was that it was a really weird idea.
>
> This is anecdotal but it appears to me that transliterations are
> not commonly used apart from learning languages and some minimal help
> for foreigners such as including transliterated names on railway
> station name boards.

Israeli programmers generally use English identifiers but transliterations are common for local business terminology: types of financial instruments, tax and insurance terminology, employee benefit plans etc. Yes, it looks weird, but it would be rather pointless to try to translate them. Even native English speakers would find it difficult to recognize the translations because they are used to using them as loan words. Only transliteration (or possibly the use of non-ASCII identifiers) would make sense in this situation and I do not think it is unique to Israel.

BTW, I heard about a Cobol shop that had an explicit policy of using only transliterated identifiers. This resulted in a much smaller chance of hitting one of Cobol's numerous reserved words. Thankfully, this is not an issue in Python...

Oren
Re: [Python-Dev] PEP 351, the freeze protocol
On 10/31/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
> > It allows everything in Python to be both mutable and hashable,
>
> I don't understand, since it's already the case. Any user-defined object
> is at the same time mutable and hashable.

By default, user-defined objects are equal iff they are the same object, regardless of their content. This makes mutability a non-issue.

If you want to allow different objects to be equal you need to implement a consistent equality operator (commutative, etc), a consistent hash function and ensure that any attributes affecting equality or hash value are immutable. If you fail to meet any of these requirements and put such objects in dictionaries or sets it will result in undefined behavior that may change between Python versions and implementations.

Oren
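As a minimal illustration of those three requirements, here is a small value class in present-day Python. The Point class is a made-up example, and immutability is only by convention here (read-only properties over private attributes), not enforced by the language.

```python
class Point:
    """Hypothetical value object: equality and hash both use (x, y)."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    # Read-only properties: the attributes that feed __eq__ and __hash__
    # are never reassigned after construction.
    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Must agree with __eq__: equal points produce equal hashes.
        return hash((self._x, self._y))


a = Point(1, 2)
b = Point(1, 2)
lookup = {a: "first"}
found = lookup[b]        # works because a == b and hash(a) == hash(b)
unique = len({a, b})     # the set keeps a single element
```

If x or y could change while a Point sits in a dictionary or set, its stored hash would no longer match its contents, which is the kind of breakage described above.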