[Python-Dev] uuid module - byte order issue

2006-08-03 Thread Oren Tirosh
The UUID module uses network byte order, regardless of the platform
byte order. On little-endian platforms like Windows the ".bytes"
property of UUID objects is not compatible with the memory layout of
UUIDs:

>>> import uuid
>>> import pywintypes
>>> s = '{00112233-4455-6677-8899-aabbccddeeff}'
>>> uuid.UUID(s).bytes.encode('hex')
'00112233445566778899aabbccddeeff'
>>> str(buffer(pywintypes.IID(s))).encode('hex')
'33221100554477668899aabbccddeeff'
>>>

Ka-Ping Yee writes* that the Windows UUID generation calls are not RFC
4122 compliant and have an illegal version field. If the correct byte
order is used the UUIDs generated by Windows XP are valid version 4
UUIDs:

>>> parts = struct.unpack('<LHH8s', ...)
>>> parts[2] >> 12  # version
4
>>> ord(parts[3][0]) & 0xC0  # variant
128

The first three fields (32 bit time-low, 16 bit time-mid and
time-high-and-version) are stored in the platform byte order while the
remainder is stored as a vector of 8 bytes.
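
The layout described above can be checked with the standard library in current Python 3. This is only a sketch and not the code from the original session: it relies on the bytes_le attribute of uuid.UUID objects for the little-endian layout, and uuid.uuid4() is an assumed stand-in for a GUID generated by Windows.

```python
import struct
import uuid

s = '{00112233-4455-6677-8899-aabbccddeeff}'
u = uuid.UUID(s)

big = u.bytes        # every field in network (big-endian) byte order
little = u.bytes_le  # first three fields little-endian, last 8 bytes unchanged

print(big.hex())     # 00112233445566778899aabbccddeeff
print(little.hex())  # 33221100554477668899aabbccddeeff

# Unpack the little-endian layout: 32-bit time-low, 16-bit time-mid,
# 16-bit time-high-and-version, then a vector of 8 raw bytes.
g = uuid.uuid4()  # assumption: stands in for a platform-generated GUID
parts = struct.unpack('<LHH8s', g.bytes_le)
version = parts[2] >> 12      # 4 for a random (version 4) UUID
variant = parts[3][0] & 0xC0  # 0x80 (128) for the RFC 4122 variant
print(version, variant)
```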

The bytes property and bytes argument to the constructor should use
the platform byte order. It would be nice to have explicit little
endian and big endian versions available on platforms of either
endianness for compatibility in communication and disk formats.


There is another issue with version 1 uuid generation:
>>> len(set(uuid.uuid1() for i in range(1000)))
992

The problem is that the random clock_seq field is only 14 bits long.
If enough UUIDs are generated within the same system clock tick there
will be collisions. Suggested solution: use the high resolution of the
time field (100 ns) to generate a monotonically increasing timestamp
that advances by at least 1 for each call, even when time.time()
returns the same value on subsequent calls.
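
A minimal sketch of the suggested fix, written for current Python 3 rather than the 2006 module. The names next_timestamp and uuid1_monotonic, and the node and clock_seq values in the example, are hypothetical; the epoch offset is the number of 100 ns intervals between 1582-10-15 and 1970-01-01.

```python
import time
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

_last_timestamp = 0


def next_timestamp():
    """Return a 60-bit timestamp that advances by at least 1 per call."""
    global _last_timestamp
    now = time.time_ns() // 100 + UUID_EPOCH_OFFSET
    if now <= _last_timestamp:
        # The clock has not advanced since the previous call: bump one tick.
        now = _last_timestamp + 1
    _last_timestamp = now
    return now


def uuid1_monotonic(node, clock_seq):
    """Build a version 1 UUID from a strictly increasing timestamp."""
    ts = next_timestamp()
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi = (ts >> 48) & 0x0FFF
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi = (clock_seq >> 8) & 0x3F
    # version=1 makes uuid.UUID set the version and variant bits itself.
    return uuid.UUID(
        fields=(time_low, time_mid, time_hi,
                clock_seq_hi, clock_seq_low, node),
        version=1,
    )


# Example values only: a made-up 48-bit node and a fixed clock sequence.
ids = [uuid1_monotonic(node=0x0123456789AB, clock_seq=0x1234)
       for _ in range(1000)]
unique_count = len(set(ids))
print(unique_count)  # 1000, because no two timestamps are equal
```

The module-level counter is not thread-safe; a real implementation would need a lock around next_timestamp().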

  Oren

[*] http://mail.python.org/pipermail/python-dev/2006-June/065869.html
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] uuid module - byte order issue

2006-08-15 Thread Oren Tirosh
On 04/08/06, Ka-Ping Yee <[EMAIL PROTECTED]> wrote:
> On Thu, 3 Aug 2006, Oren Tirosh wrote:
> > The UUID module uses network byte order, regardless of the platform
> > byte order. On little-endian platforms like Windows the ".bytes"
> > property of UUID objects is not compatible with the memory layout
>
> RFC 4122 says:
>
> In the absence of explicit application or presentation protocol
> specification to the contrary, a UUID is encoded as a 128-bit
> object, as follows:
>
> The fields are encoded as 16 octets, with the sizes and order of
> the fields defined above, and with each field encoded with the
> Most Significant Byte first (known as network byte order).

RFC 4122 defines a canonical byte order for UUIDs but also makes
explicit reference to the fact that UUIDs are stored locally in native
byte order. The final step in the RFC 4122 UUID generation algorithm
is:

>   o  Convert the resulting UUID to local byte order.

So this is not another case of the
Microsoft-implements-RFC-incorrectly syndrome. After all, they are one
of the co-authors of the document.

Compatibility with Windows "GUIDs" may be one of the most important
use cases for the UUID module. It's important to resolve this or users
will have unpleasant surprises. I did.

Alternatives:

1. Default is big endian byte order. Little endian is explicit.
2. Default is native byte order. Little endian and big endian are explicit.
3. No default. Little endian and big endian are both explicit.

All three are relevant for both the constructor and retrieving the
byte representation.
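
As an illustration of alternative 1 (big endian by default, other orders explicit), here is a sketch built on the bytes and bytes_le attributes of current uuid.UUID objects. The helper names uuid_to_bytes and uuid_from_bytes and the 'native' keyword are hypothetical and not part of the uuid module.

```python
import sys
import uuid


def _resolve(byteorder):
    """Map 'native' to the platform order and validate the result."""
    if byteorder == 'native':
        byteorder = sys.byteorder
    if byteorder not in ('big', 'little'):
        raise ValueError("byteorder must be 'big', 'little' or 'native'")
    return byteorder


def uuid_to_bytes(u, byteorder='big'):
    """Return the 16-byte representation of u in the requested order."""
    if _resolve(byteorder) == 'big':
        return u.bytes
    return u.bytes_le


def uuid_from_bytes(data, byteorder='big'):
    """Build a UUID from 16 bytes stored in the requested order."""
    if _resolve(byteorder) == 'big':
        return uuid.UUID(bytes=data)
    return uuid.UUID(bytes_le=data)


u = uuid.UUID('{00112233-4455-6677-8899-aabbccddeeff}')
wire = uuid_to_bytes(u)              # canonical RFC 4122 order
memory = uuid_to_bytes(u, 'little')  # layout of a GUID in memory on Windows
round_trip = uuid_from_bytes(memory, 'little')
```

Switching the default argument to 'native', or removing the default altogether, gives alternatives 2 and 3.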

  Oren


Re: [Python-Dev] Proposed alternative to __next__ and __exit__

2005-05-07 Thread Oren Tirosh
I suggest using a variation on the consumer interface, as described by
Fredrik Lundh at http://effbot.org/zone/consumer.htm :

.next()                       -- stays .next()
.__next__(arg)                -- becomes .feed(arg)
.__exit__(StopIteration, ...) -- becomes .close()
.__exit__(..,..,..)           -- becomes .feed(exc_info=(..,..,..))

Extensions to effbot's original consumer interface:
1. The .feed() method may return a value 
2. Some way to raise an exception other than StopIteration inside the
generator/consumer function.  The use of a keyword argument to .feed
is just an example. I'm looking for other suggestions on this one.
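
To make the proposed interface concrete, here is a sketch that wraps a present-day Python 3 generator. The Consumer class and the running_total example are hypothetical; the send(), throw() and close() generator methods used underneath did not exist when this was written, so this only illustrates the shape of the API.

```python
class Consumer:
    """Hypothetical wrapper exposing a generator through feed()/close()."""

    def __init__(self, gen):
        self._gen = gen
        next(self._gen)  # run to the first yield so values can be fed in

    def feed(self, arg=None, exc_info=None):
        """Pass a value in, or raise an exception inside the generator."""
        if exc_info is not None:
            # exc_info is a (type, value, traceback) tuple; raise the value.
            return self._gen.throw(exc_info[1])
        return self._gen.send(arg)

    def close(self):
        self._gen.close()


def running_total():
    """Yield the sum of everything fed so far; ValueError resets it."""
    total = 0
    while True:
        try:
            value = yield total
        except ValueError:
            total = 0
        else:
            total += value


c = Consumer(running_total())
first = c.feed(5)    # 5
second = c.feed(7)   # 12
after_reset = c.feed(exc_info=(ValueError, ValueError('reset'), None))  # 0
c.close()
```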

No new builtins. No backward-compatibility methods and wrappers.

Yes, it would have been nicer if .next() had been called __next__() in
the first place. But at this stage I feel that the cost of "fixing" it
far outweighs any perceived benefit.

So much for "uncontroversial" parts!  :-)

  Oren


On 5/6/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> [Steven Bethard]
> > So, just to make sure, if we had another PEP that contained from PEP 340[1]:
> >  * Specification: the __next__() Method
> >  * Specification: the next() Built-in Function
> >  * Specification: a Change to the 'for' Loop
> >  * Specification: the Extended 'continue' Statement
> >  * the yield-expression part of Specification: Generator Exit Handling
> > would that cover all the pieces you're concerned about?
> >
> > I'd be willing to break these off into a separate PEP if people think
> > it's a good idea.  I've seen very few complaints about any of these
> > pieces of the proposal.  If possible, I'd like to see these things
> > approved now, so that the discussion could focus more directly on the
> > block-statement issues.
> 
> I don't think it's necessary to separate this out into a separate PEP;
> that just seems busy-work. I agree these parts are orthogonal and
> uncontroversial; a counter-PEP can suffice by stating that it's not
> countering those items nor repeating them.
> 
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] PEP for RFE 46738 (first draft)

2005-06-18 Thread Oren Tirosh
Please don't invent new serialization formats. I think we have enough
of those already.

The RFE suggests that "the protocol is specified in the documentation,
precisely enough to write interoperating implementations in other
languages". If interoperability with other languages is really the
issue, use an existing format like JSON.

If you want an efficient binary format you can use a subset of the
pickle protocol supporting only basic types. I tried this once. I
ripped out all the fancy parts from pickle.py and left only binary
pickling (protocol version 2) of basic types. It took less than an hour
and I was left with something only marginally more complex than your
new proposed protocol.
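
For comparison, a similar restriction can be approximated in current Python 3 without editing pickle.py. This is a different technique from the one described above: it subclasses pickle.Unpickler and rejects every global lookup, so only types with their own protocol 2 opcodes can be loaded. The names BasicTypesUnpickler, dumps_basic and loads_basic are hypothetical.

```python
import io
import pickle


class BasicTypesUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any class or function."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global %s.%s is not allowed" % (module, name))


def dumps_basic(obj):
    """Serialize with binary pickle protocol 2."""
    return pickle.dumps(obj, protocol=2)


def loads_basic(data):
    """Load a pickle, accepting only types that need no global lookup."""
    return BasicTypesUnpickler(io.BytesIO(data)).load()


record = {'name': 'spam', 'count': 3, 'ratio': 0.5,
          'tags': ['a', 'b'], 'pair': (1, 2), 'ok': True, 'missing': None}
restored = loads_basic(dumps_basic(record))

try:
    loads_basic(pickle.dumps(len, protocol=2))  # a function needs a global
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

The restriction applies only when loading; dumps_basic() will still serialize arbitrary objects.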

  Oren


Re: [Python-Dev] Adding Python-Native Threads

2005-06-26 Thread Oren Tirosh
On 6/26/05, Adam Olsen <[EMAIL PROTECTED]> wrote:
...
> To resolve these problems I propose adding lightweight cooperative
> threads to Python.  

Speaking of lightweight cooperative threads - has anyone recently
tried to build Python with the pth option? It doesn't quite work out
of the box. How much maintenance would be required to make it work
again?

Oren


Re: [Python-Dev] Some RFE for review

2005-06-27 Thread Oren Tirosh
On 6/27/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Reinhold Birkenfeld wrote:
> > 1152248:
> > In order to read "records" separated by something other than newline,
> > file objects should either support an additional parameter (the
> > separator) to (x)readlines(), or gain an additional method which does
> > this.
> > Review: The former is a no-go, I think, because what is read won't be lines.
> > The latter is further complicating the file interface, so I would follow the
> > principle that not every 3-line function should be builtin.
> 
> As Douglas Alan's sample implementation (and his second attempt [1])
> show, getting this right (and reasonably efficient) is actually a
> non-trivial exercise. Leveraging the existing xreadlines
> infrastructure is an idea worth considering.

Do you mean the existing xreadlines infrastructure that no longer
exists since 2.4?  :-)

An infrastructure that could be leveraged is the readahead buffer used
by the file object's line iterator.
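
As a pure-Python illustration of reading records through a buffer, here is a sketch of a generator that reads fixed-size chunks and splits them on an arbitrary separator. It does not touch the file object's internal readahead buffer; the function name read_records and its parameters are hypothetical.

```python
import io


def read_records(f, sep, chunk_size=8192):
    """Yield records from text file f, split on the separator string sep."""
    buf = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        parts = buf.split(sep)
        # The last piece may be incomplete (or end in half a separator),
        # so keep it in the buffer until more data arrives.
        buf = parts.pop()
        for record in parts:
            yield record
    if buf:
        yield buf  # final record without a trailing separator


records = list(read_records(io.StringIO('a;b;;c'), ';', chunk_size=2))
# ['a', 'b', '', 'c']
spanning = list(read_records(io.StringIO('x\r\ny'), '\r\n', chunk_size=2))
# ['x', 'y'], even though the separator straddles two chunks
```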

  Oren


Re: [Python-Dev] partition() (was: Remove str.find in 3.0?)

2005-08-30 Thread Oren Tirosh
On 30/08/05, JustFillBug <[EMAIL PROTECTED]> wrote:
> On 2005-08-30, Anthony Baxter <[EMAIL PROTECTED]> wrote:
> > On Tuesday 30 August 2005 11:26, Raymond Hettinger wrote:
> >> > My major issue is with the names - partition() doesn't sound right to
> >> > me.
> >>
> >> FWIW, I am VERY happy with the name partition().
> >
> > I'm +1 on the functionality, and +1 on the name partition(). The only other
> > name that comes to mind is 'separate()', but
> > a) I always spell it 'seperate' (and I don't need another lamdba )
> > b) It's too similar in name to 'split()'
> >
>
> trisplit()

split3() ?

I'm +1 on the name "partition" but I think this is shorter,
communicates the similarity to split and the fact that it always
returns exactly three parts.
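
For reference, this is how the method behaves under the name it has in current Python, str.partition(); the sample strings are made up.

```python
# partition() always returns exactly three parts: head, separator, tail.
head, sep, tail = 'key=value'.partition('=')
print((head, sep, tail))  # ('key', '=', 'value')

# When the separator is missing, the last two parts are empty strings.
missing = 'novalue'.partition('=')
print(missing)  # ('novalue', '', '')

# split() returns a list whose length depends on the input.
pieces = 'a=b=c'.split('=')
print(pieces)  # ['a', 'b', 'c']
```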

  Oren


[Python-Dev] Python 3 design principles

2005-08-31 Thread Oren Tirosh
Most of the changes in PEP 3000 tighten up "There should be one
obvious way to do it":
* Remove multiple forms of raising exceptions, leaving just "raise instance" 
* Remove exec as statement, leaving the compatible tuple/call form.
* Remove <>, ``, leaving !=, repr
etc.

Other changes are to disallow things already considered poor style like:
* No assignment to True/False/None 
* No input() 
* No access to list comprehension variable 

And there is also completely new stuff like static type checking.

While a lot of existing code will break on 3.0 it is still generally
possible to write code that will run on both 2.x and 3.0: use only the
"proper" forms above, do not assume the result of zip or range is a
list, use absolute imports (and avoid static types, of course). I
already write all my new code this way.
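
A small sketch of what not assuming a list looks like in practice; the sample data is made up. The same lines behave identically whether zip() and range() return lists or lazy iterables.

```python
names = ['a', 'b', 'c']
ages = [1, 2, 3]

# Plain iteration works whether zip() returns a list or an iterator.
lookup = {}
for name, age in zip(names, ages):
    lookup[name] = age

# Wrap the call in list() wherever a real list is needed,
# for example for indexing, len() or iterating more than once.
pairs = list(zip(names, ages))
indices = list(range(3))
first_pair = pairs[0]
```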

Is this "common subset" a happy coincidence or a design principle? 

Not all proposed changes remove redundancy or add completely new
things. Some of them just change the way certain things must be done.
For example:
*  Moving compile, id, intern to sys
*  Replacing print with write/writeln
And possibly the biggest change:
*  Reorganize the standard library to not be as shallow

I'm between +0 and -1 on these. I don't find them enough of an
improvement to break this "common subset" behavior. It's not quite the
same as strict backward compatibility and I find it worthwhile to try
to keep it.

Writing programs that run on both 2.x and 3 may require ugly
version-dependent tricks like:

    try:
        compile
    except NameError:
        from sys import compile

or perhaps

    try:
        import urllib
    except ImportError:
        from www import urllib

Should the "common subset" be a design principle of Python 3? Do
compile and id really have to be moved from __builtins__ to sys? Could
the rearrangement of the standard library be a bit less aggressive and
try to leave commonly used modules in place?

  Oren


Re: [Python-Dev] Python 3 design principles

2005-08-31 Thread Oren Tirosh
On 9/1/05, Robert Kern <[EMAIL PROTECTED]> wrote:
> Oren Tirosh wrote:
> 
> > While a lot of existing code will break on 3.0 it is still generally
> > possible to write code that will run on both 2.x and 3.0: use only the
> > "proper" forms above, do not assume the result of zip or range is a
> > list, use absolute imports (and avoid static types, of course). I
> > already write all my new code this way.
> >
> > Is this "common subset" a happy coincidence or a design principle?
> 
> I think it's because those are the most obvious things right now. The
> really radical stuff won't come up until active development on Python
> 3000 actually starts. And it will, so any "common subset" will probably
> not be very large.

Static typing is radical stuff and doesn't hurt the common subset
since it's optional. Making unicode the default is pretty radical and
can be done without breaking the common subset (with the help of
little tweaks like allowing str() to return unicode now like int() can
return longs). Iterators and new-style classes were pretty radical
changes that were managed elegantly and meet an even stronger
requirement than the common subset - they were achieved with full
backward compatibility.

Python 3 will most probably make big changes in the internal
implementation and the C API. Perhaps it will even be generated from
PyPy.

I don't think keeping the common subset will really stand in the way
of making big improvements. The proposed 3.x changes that break it
seem more like nitpicking to me than significant improvements.

Python is terrific. I find nothing I really want to change. Remove old
cruft and add some brand new stuff, yes. But nothing to change.

  Oren


[Python-Dev] Python 3 executable name (was: Re: PEP 3000 and iterators)

2005-09-11 Thread Oren Tirosh
On 9/11/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
...
> But just installing python3.0 as python and expecting
> nothing will break is not a goal -- it would be too constraining.

It should be expected that many users will keep both 2.x and 3 side by
side for quite a long time. Instead of having distributions choosing
their own naming schemes (like the python/python2 redhat fiasco)
perhaps the Python 3 executable should have a different name as part
of the standard distribution? I suggest "py" / "py.exe"

  Oren


Re: [Python-Dev] Python 3 executable name

2005-09-11 Thread Oren Tirosh
On 9/12/05, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Oren Tirosh wrote:
> 
> > perhaps the Python 3 executable should have a different name as part
> > of the standard distribution? I suggest "py" / "py.exe"
> 
> Or "python3"? EIBTI :-)

Generally, each distribution makes its own decision about when to make
the default "python" the new version. Any damage is usually limited to
third party extension modules because python versions are source
compatible. But this time it isn't. So do you keep the name "python3"
forever? Do you keep unqualified "python" as 2.x forever? I expect
many installations to keep 2.x around for many years. How do you keep
different distributions from making their own incompatible decisions
about naming conventions? Using version numbers in the executable name
is just asking for this to happen.

I suggest an explicitly and permanently different name for the
interpreter executable of this new and incompatible branch of the
language. I want Python 3 scripts starting with #! to have an average
shelf life longer than 18 months.

  Oren


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-28 Thread Oren Tirosh
On 10/28/05, Neil Hodgson <[EMAIL PROTECTED]> wrote:
> I used to work on software written by Japanese and English speakers
> at Fujitsu with most developers being Japanese. The rules were that
> comments could be in Japanese but identifiers were only allowed to
> contain ASCII characters. Most variable names were poorly chosen with
> s, p, q, fla (boolean=flag) and flafla being popular. When I asked
> some Japanese coders why they didn't use Japanese words expressed in
> ASCII (Romaji), their response was that it was a really weird idea.
>
> This is anecdotal but it appears to me that transliterations are
> not commonly used apart from learning languages and some minimal help
> for foreigners such as including transliterated names on railway
> station name boards.

Israeli programmers generally use English identifiers but
transliterations are common for local business terminology: types of
financial instruments, tax and insurance terminology, employee benefit
plans etc. Yes, it looks weird, but it would be rather pointless to
try to translate them. Even native English speakers would find it
difficult to recognize the translations because they are used to using
them as loan words. Only transliteration (or possibly the use of
non-ASCII identifiers) would make sense in this situation and I do not
think it is unique to Israel.

BTW, I heard about a Cobol shop that had an explicit policy of using
only transliterated identifiers. This resulted in a much smaller
chance of hitting one of Cobol's numerous reserved words. Thankfully,
this is not an issue in Python...

  Oren


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-31 Thread Oren Tirosh
On 10/31/05, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
> > It allows everything in Python to be both mutable and hashable,
>
> I don't understand, since it's already the case. Any user-defined object
> is at the same time mutable and hashable.

By default, user-defined objects are equal iff they are the same
object, regardless of their content. This makes mutability a
non-issue.

If you want to allow different objects to be equal you need to implement
a consistent equality operator (commutative, etc.), a consistent hash
function and ensure that any attributes affecting equality or hash
value are immutable. If you fail to meet any of these requirements and
put such objects in dictionaries or sets it will result in undefined
behavior that may change between Python versions and implementations.
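
A sketch of both cases in current Python 3; the classes Plain and Point are hypothetical. Plain keeps the default identity-based equality, while Point defines value-based equality and a matching hash over attributes that are exposed read-only.

```python
class Plain:
    """Default behaviour: equal only to itself, hashed by identity."""

    def __init__(self, value):
        self.value = value  # may change freely without affecting the hash


class Point:
    """Value-based equality over attributes treated as immutable."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x  # no setter: read-only through the public name

    @property
    def y(self):
        return self._y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Must be computed from exactly the attributes that __eq__ compares.
        return hash((self._x, self._y))


same_plain = Plain(1) == Plain(1)            # False: different objects
same_point = Point(1, 2) == Point(1, 2)      # True: same content
distinct = len({Point(1, 2), Point(1, 2)})   # 1: the set sees one value
```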

  Oren