[Python-Dev] ANN: byteplay - a bytecode assembler/disassembler

2006-08-14 Thread Noam Raphael
Hello,

I wanted to tell you that I wrote a Python bytecode
assembler/disassembler, and would be happy if people tried it and told
me what they think.

I'm sending this message to this list because this module deals with
pretty low-level Python, so I thought it might interest the audience
here. If I was wrong - please forgive me.

The idea is to define an object which is equivalent to Python's code
object, but which is easy to work with. To explain what I mean, I'll
show a quick example. We can define this stupid function:

>>> def f(a, b):
... print (a, b)

We can convert it to an equivalent object, and see how it stores the byte code:

>>> from byteplay import *
>>> c = Code.from_code(f.func_code)
>>> from pprint import pprint; pprint(c.code)
[(SetLineno, 2),
 (LOAD_FAST, 'a'),
 (LOAD_FAST, 'b'),
 (BUILD_TUPLE, 2),
 (PRINT_ITEM, None),
 (PRINT_NEWLINE, None),
 (LOAD_CONST, None),
 (RETURN_VALUE, None)]

We can change the bytecode easily, and see what happens:

>>> c.code[3:3] = [(ROT_TWO, None)]
>>> f.func_code = c.to_code()
>>> f(3, 5)
(5, 3)

The idea is basically the same as that of Michael Hudson's
bytecodehacks, but this one works with Python 2.4 and 2.5. I also
think that it's simpler to use. I borrowed some code from Phillip J.
Eby's peak.util.assembler - the main difference between his package
and mine is that mine lets you play with existing bytecode, not only
create new code objects.

I learned a lot about Python's bytecode from writing this, and I think
that others may learn from it as well - I think it's much easier to
understand how code objects work by understanding equivalent objects
which were meant to be as simple as possible, instead of as fast as
possible.

I think it got pretty good testing - I patched __import__ so that
after a module is imported, all function objects (found by the gc
module) were disassembled and assembled again. I then managed to get
the complete test suite to pass!

You can download the module from
http://byteplay.googlecode.com/svn/trunk/byteplay.py . I wrote
documentation (which goes to some length to explain how bytecode
works). It's on the wiki:
http://wiki.python.org/moin/ByteplayDoc .

I even thought that it might get into the standard library, because it
seemed to me to be pretty good documentation of bytecode details,
and because it has a pretty straightforward interface. It also makes
meddling with bytecode much less dangerous - for example, you can see
Raymond's original recipe for binding constants at compile time (I
reposted it at http://python.pastebin.com/768312) and how using
byteplay makes it simple (posted at http://python.pastebin.com/768318
so you can view the diff). But, of course - it's up to you. I will be
entirely satisfied if people simply find it useful.

Have a good day,
Noam
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Can LOAD_GLOBAL be optimized to a simple array lookup?

2006-08-23 Thread Noam Raphael
2006/8/24, Brett Cannon <[EMAIL PROTECTED]>:
>
>
>
> On 8/23/06, K.S.Sreeram <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I noticed in Python/ceval.c that LOAD_GLOBAL uses a dictionary lookup,
> > and was wondering if that can be optimized to a simple array lookup.
>
>
> No, not as the language  stands now.
>
> > If i'm right there are 3 kinds of name lookups: locals, outer
> > scopes(closures), and globals. (not counting attribute lookup). Locals
> > are identified by, either the presence of assignments, or their presence
> > in the arg list. So all name lookups can be classified into the 3 types
> > at compile/load time.
> >
> > Since we know, at load time, which names are global.. Can't we simply
> > build a global name table and replace LOAD_GLOBALs with a lookup at the
> > corresponding index into the global name table?
>
>
> But we don't know statically what the globals will be.  You can import a
> module and put something in its global namespace externally.  That is done
> after load time or compile time.
>
I think that it can be implemented for the language as it stands now.
I don't know whether it will be a good thing or not.

In principle, you can add a feature to the dict implementation that
will allow it to notify you when the value of a specific key changes.
If you have that, you can change the LOAD_GLOBAL implementation to:
1. look up the global.
2. ask for notification when the global dict changes in a way that
will change the meaning of the global.
3. change the LOAD_GLOBAL opcode to something like LOAD_CONST, and
have the dict's notification update the LOAD_CONST opcode with the
new object.

In that way, LOAD_GLOBAL will cause a dict lookup only once. Changing
the value of globals will require more work, though.
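The three steps can be modelled in pure Python (everything here is
invented for illustration - a sketch of the mechanism, not of how
ceval.c would actually implement it):

```python
class WatchedDict(dict):
    """A dict that can notify callbacks when a specific key is reassigned."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._watchers = {}

    def watch(self, key, callback):
        self._watchers.setdefault(key, []).append(callback)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        for callback in self._watchers.get(key, ()):
            callback(value)


class CachedGlobal:
    """Stands in for a LOAD_GLOBAL that was rewritten into a LOAD_CONST."""

    def __init__(self, globals_dict, name):
        self._cached = globals_dict[name]       # step 1: one real dict lookup
        globals_dict.watch(name, self._update)  # step 2: ask for notification

    def _update(self, value):
        self._cached = value                    # step 3: patch the "constant"

    def load(self):
        return self._cached                     # no dict lookup on later loads
```

As the sketch shows, reading the global becomes an attribute fetch, while
writing it carries the extra cost of running the notification callbacks.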

Again, I'm not saying that it's desired, just that it's possible.

Have a good day,
Noam


[Python-Dev] An import hook which does nothing

2004-12-02 Thread Noam Raphael
Hello,
I'm currently writing an import hook which will cache information on the 
local disk, instead of retrieving it every time from the slow NFS where 
I work.
To make sure that I understand how import hooks work, and as a starting 
point, I wrote a dummy import hook, which is supposed to behave like 
Python's built-in import mechanism.
I post it here, for two reasons:
1. So that developers who understand the import mechanism better than I 
do may review it and find things which I didn't do exactly right.
2. Because I think other people might find it helpful when writing new 
import hooks, as a starting point as well as for better understanding -- 
there's nothing like a working example to help you figure out what 
does what, and where. (Although perhaps a few more comments, in addition 
to those which I wrote, might help...)

(Note: I wrote the "DEBUG" parts in order to make sure that my importer 
works, because if it fails things might be done by the built-in importer 
and I won't notice. These parts can be removed, of course.)

Do you think that it might be useful? Maybe something like that can go 
into the "examples" section of the imp module?

Thanks,
Noam Raphael
import imp
import sys

DEBUG = True

if DEBUG:
myimports = []


class InPackageFinder(object):
"""Find a module/package in a package."""

def find_module(self, fullname, path=None):
if path is None:
# Not in a package - don't handle it here.
return None

try:
f, fn, desc = imp.find_module(fullname[fullname.rfind('.')+1:],
  path)
except ImportError:
# Well, I don't find it, maybe others will.
return None

return Loader(f, fn, desc)


class TopLevelFinder(object):
"""Find a top level module/package."""

def __init__(self, path):
self._path = path

def find_module(self, fullname):
try:
f, fn, desc = imp.find_module(fullname, [self._path])
except ImportError:
# It is not in this path. Maybe in another one.
return None

return Loader(f, fn, desc)


class Loader(object):
"""Load a module/package."""

def __init__(self, f, fn, desc):
self._f, self._fn, self._desc = f, fn, desc

def load_module(self, fullname):
if DEBUG:
myimports.append(fullname)

try:
return imp.load_module(fullname, self._f, self._fn, self._desc)
finally:
if self._f is not None:
# For packages we have None instead of a file object.
self._f.close()


def install():
    # Handle imports inside packages via the meta path, and top-level
    # imports via a path hook. Clear the importer cache so the new hook
    # is consulted even for path entries that were already seen.
    sys.meta_path.append(InPackageFinder())
    sys.path_hooks.append(TopLevelFinder)
    sys.path_importer_cache.clear()

if DEBUG:
myimports.extend(sys.modules.iterkeys())

if DEBUG:
def checkok():
return not [x for x in sys.modules if
sys.modules[x] is not None
and hasattr(sys.modules[x], "__file__")
and x not in myimports]


Re: [Python-Dev] Getting rid of unbound methods: patch available

2005-01-20 Thread Noam Raphael
Hello,

I would like to add here another small thing which I encountered this
week, and seems to follow the same logic as does Guido's proposal.

It's about staticmethods. I was writing a class, and its
pretty-printing method got a function for converting a value to a
string as an argument. I wanted to supply a default function. I
thought that it should be in the namespace of the class, since its
main use lies there. So I made it a staticmethod.

But - alas! After I declared the function a staticmethod, I couldn't
make it a default argument for the method, since there's nothing to do
with staticmethod instances.

The minor solution for this is to make staticmethod objects callable.
This would solve my problem. But I suggest a further step: I suggest
that if this is done, it would be nice if classname.staticmethodname
returned the staticmethod instance, instead of the function itself.
I know that this seems to contradict Guido's proposal, since he
suggests returning the function instead of a strange object, and I
suggest returning a strange object instead of a function. But there is
no real contradiction; both follow the idea that class attributes
should be, when possible, the same objects that were created when
defining the class. This is more consistent with the behaviour of modules
(module attributes are the objects that were created when the code was
run), and is more consistent with the general convention, that running
A = B
causes
A == B
to be true. Currently, Class.func = staticmethod(func), and Class.func
= func, don't behave by this rule. If the suggestions are accepted,
both will.

I just think it's simpler and cleaner that way. Just making
staticmethods callable would solve my practical problem too.

Noam Raphael


Re: [Python-Dev] Getting rid of unbound methods: patch available

2005-01-20 Thread Noam Raphael
> > and is more consistent with the general convention, that running
> > A = B
> > causes
> > A == B
> > to be true. Currently, Class.func = staticmethod(func), and Class.func
> > = func, don't behave by this rule. If the suggestions are accepted,
> > both will.
> 
> Well, given that attribute assignment can be overloaded, you can't
> depend on that requirement all the time.
> 
Yes, I know. For example, I don't know how you could make this work for
classmethods. (Although I have the idea that if nested scopes included
classes, and there were a way to assign names to a different scope,
then there would be no need for them. But I have no idea how this
could be done, so never mind.)

I just think of it as a very common convention, and I don't find the
exceptions "aesthetically pleasing". But of course, I accept practical
reasons for not making it that way.

> I recommend that you work around it by setting the default to None and
> substituting the real default in the function.

That's a good idea, I will probably use it. (I thought of a different
way: don't use decorators, and wrap the function in a staticmethod
after defining the function that uses it. But this is really ugly.)
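The workaround can be sketched like this (the class and method names
are invented for illustration):

```python
class Table:
    """Illustrative class showing the None-default workaround."""

    @staticmethod
    def default_tostr(value):
        return repr(value)

    # Writing ``tostr=default_tostr`` in the def line would capture the
    # (then uncallable) staticmethod wrapper from the class body, so the
    # real default is substituted inside the function, as Guido suggested.
    def pprint(self, value, tostr=None):
        if tostr is None:
            tostr = self.default_tostr  # bound lookup yields the plain function
        return tostr(value)
```

(In much later Python versions staticmethod objects did become callable,
which removes the original problem entirely.)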

Thanks for your reply,
Noam


[Python-Dev] A proposal: built in support for abstract methods

2005-02-02 Thread Noam Raphael
Hello,

I would like to suggest a new method decorator: abstractmethod. I'm
definitely not the only one who has thought about it, but I discussed
this on c.l.py, and came to think that it's a nice idea. And even
Pythonic!

This has nothing to do with type checking and adaptation - or, to be
more precise, it may be combined with them, but it will live happily
without them. I don't understand these issues a great deal.

What was my situation? I had to write a few classes, all with the
same interface but with a different implementation, that were meant to
work inside some infrastructure. The specific class that would be used
would be selected by what exactly the user wanted. Some methods of
these classes were exactly the same in all of the classes, so
naturally, I wrote a base class with an implementation of these
methods.

But then came the question: and what about the other methods? I wanted
to document that they should exist in all the classes of that family,
and that they should do XYZ; otherwise, they won't fit the
infrastructure. So I wrote something like:

def get_changed(self):
"""This method should return the changed keys since last call."""
raise NotImplementedError

But I wasn't happy about it. I thought that

@abstractmethod
def get_changed(self):
"""This methods should ..."""

would have been nicer. Why?

1. "Beautiful is better than ugly." - Who was talking here about
errors? I just wanted to say what the method should do!

2. "Explicit is better than implicit." - This is really the issue. I
*meant* to declare that a method should be implemented in subclasses,
and what it should do, but I *was* actually defining a method which
raises NotImplementedError when called with no arguments. I am used to
understanding NotImplementedError as "We should really implement this
some day, when we have the time", not as "In order to be a proud
subclass of BaseClass, you should implement this method".

3. "There should be one-- and preferably only one --obvious way to do
it." - I could have written this in a few other ways:

def get_changed(self):
"""This method should return the changed keys since last call.

PURE VIRTUAL.
"""

def get_changed(self):
    """This method should return the changed keys since last call."""
    raise NotImplementedError, ("get_changed is an abstract method. "
                                "Subclasses of BaseClass should implement it.")

What's good about the last example is that when the exception occurs,
it will be easier to find the problem. What's bad about it is that
it's completely redundant, and very long to write.

Ok. Now another thing: I want classes that contain abstract methods to
be uninstantiable. One (and the main) reason is that instantiating that
class of mine doesn't make sense. It doesn't know how to do anything
useful, and doesn't represent any consistent object that you can have
instances of. The other reason is that it will help the programmer
quickly find methods he forgot to implement in his subclasses. You
may say that it suits "Errors should never pass silently."

The basic reason why I think this is fitting is that abstract classes
are something which is natural when creating class hierarchies;
usually, when I write a method, all subclasses must inherit it, or
implement another version with a compatible behaviour. Sometimes there
is no standard behaviour, so all subclasses must choose the second
option.

This concept is already in use in Python's standard library today!
"basestring" was created as the base class of "str" and "unicode".
What I'm proposing is just to make this possible also in code written
in Python.

George Sakkis has posted a very nice Python implementation of this:
http://groups-beta.google.com/group/comp.lang.python/msg/597e9ffa7b1f709b
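A minimal sketch of the idea, in current Python 3 syntax (the names are
illustrative, and this is much less complete than the recipe linked
above; modern Python ships essentially this mechanism as
abc.abstractmethod):

```python
def abstractmethod(func):
    """Mark a function as abstract (sketch; the marker name is invented)."""
    func.__isabstract__ = True
    return func


class AbstractCheckMeta(type):
    """Metaclass that refuses to instantiate classes with abstract methods."""

    def __call__(cls, *args, **kwargs):
        missing = [name for name in dir(cls)
                   if getattr(getattr(cls, name, None),
                              '__isabstract__', False)]
        if missing:
            raise TypeError("Can't instantiate %s: abstract methods: %s"
                            % (cls.__name__, ', '.join(missing)))
        return super().__call__(*args, **kwargs)


class BaseClass(metaclass=AbstractCheckMeta):
    @abstractmethod
    def get_changed(self):
        """This method should return the changed keys since last call."""


class Concrete(BaseClass):
    def get_changed(self):
        return []
```

Instantiating BaseClass raises TypeError at the point of instantiation,
while Concrete, which overrides the abstract method, works normally.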

To summarize, I think that abstract methods are simply not regular
functions, since by definition they don't specify actions, and so they
deserve an object of their own. And if it helps with testing the
subclasses - then why not?


What do you say?

Noam


Re: [Python-Dev] PEP 340 -- loose ends

2005-05-04 Thread Noam Raphael
On 5/4/05, Reinhold Birkenfeld <[EMAIL PROTECTED]> wrote:
> 
> There is one problem with using no keyword: You cannot use arbitrary 
> expressions
> in the new statement. Consider:
> 
> resource = opening("file.txt")
> block resource:
> (...)
> 
> resource = opening("file.txt")
> resource:
> (...)
> 
> The latter would have to be forbidden.

Can you explain why it would have to be forbidden please?

Thanks,
Noam


[Python-Dev] Split MIME headers into multiple lines near a space

2005-05-28 Thread Noam Raphael
Hello,

I recently used Python to automatically send messages to my gmail
account. I was surprised to find out that some of the words in the
subjects of messages were split by a space character which came from
nowhere.

It turns out that the international (Hebrew) subject was split into
multiple lines by the email package, sometimes in the middle of words.
Gmail treats these line breaks as spaces, so words get cut in two.
I've checked, and there are email clients which ignore the line
breaks, so the subject looks ok in them.

I added four lines to the _binsplit function of email.Header, so that
if there is a space character in the string, it will be split there.
This fixes the problem, and subjects look fine again. These four lines
(plus a comment which I wrote) are:

# Try to find a place in splittable[:i] which is near a space,
# and split there, so that clients which interpret the line break
# as a separator won't insert a space in the middle of a word.
if splittable[i:i+1] != ' ':
    spacepos = splittable.rfind(' ', 0, i)
    if spacepos != -1:
        i = spacepos + 1

These lines should be added before the last three lines of _binsplit.
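Extracted as a standalone function (the function name is invented; the
real patch operates on local variables inside _binsplit), the idea is:

```python
def split_near_space(s, i):
    """Move the split point i left to just after the nearest space,
    so that a line break never lands in the middle of a word."""
    if s[i:i+1] != ' ':
        spacepos = s.rfind(' ', 0, i)
        if spacepos != -1:
            i = spacepos + 1
    return s[:i], s[i:]
```

For example, splitting "hello world foo" at position 8 (inside "world")
moves the break back to just after "hello ".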

Do you think it's ok? Could this be added to email.Header?

(Should I send this as a patch? It's just that the patch list was full
of IDLE patches, and this change is really small, so I thought that it
would be easier to post it here. Please tell me if I was wrong.)

Thank you,
Noam Raphael


Re: [Python-Dev] Split MIME headers into multiple lines near a space

2005-06-05 Thread Noam Raphael
On 5/30/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Noam's suggestion seems reasonable to me, but I'm not
> sure what the performance implications are.

I think that they are not critical. The number of lines can grow by at
most a factor of two, because shorter words would not get a line of
their own. The added rfind call seems insignificant to me, since it is
preceded by about log2(n) string encodings, to test whether an encoded
prefix fits the required line length.

Have a good day,
Noam


[Python-Dev] Saving the hash value of tuples

2006-04-01 Thread Noam Raphael
Hello,

I've found out that the hash value of tuples isn't saved after it's
calculated. With strings it's different: the hash value of a string is
calculated only on the first call to hash(string), and saved in the
structure for future use. Saving the value makes dict lookup of tuples
an operation with an amortized cost of O(1).

Saving the hash value means that if an item of the tuple changes its
hash value, the hash value of the tuple won't be changed. I think it's
ok, since:
  1. Hash value of things shouldn't change.
  2. Dicts assume that too.

I tried the change, and it turned out that I had to change cPickle a
tiny bit: it uses a 2-tuple, which is allocated when the module
initializes, to look up tuples in a dict. I changed it to properly use
PyTuple_New and Py_DECREF, and now the complete test suite passes. I
ran test_cpickle before the change and after it, and it took the same
time (0.89 seconds on my computer).
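The proposal can be modelled in pure Python with a tuple subclass (a
sketch only - the real change would instead add a field to the C-level
tuple structure, which is where the space cost comes from):

```python
class CachedHashTuple(tuple):
    """Compute the hash once and reuse it on later calls, as CPython
    already does for str objects.  Storing the value in an instance
    attribute here stands in for a C struct field."""

    def __hash__(self):
        try:
            return self._hash
        except AttributeError:
            # First call: compute via tuple's own hash and remember it.
            self._hash = super().__hash__()
            return self._hash
```

After the first hash() call, subsequent calls return the cached value
without walking the tuple's items again.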

What do you think? I see three possibilities:
  1. Nothing should be done, everything is as it should be.
  2. The cPickle module should be changed to not abuse the tuple, but
there's no reason to add an extra word to the tuple structure and
break binary backwards compatibility.
  3. Both should be changed.

I will be happy to send a patch, if someone shows interest.

Have a good day,
Noam


Re: [Python-Dev] Saving the hash value of tuples

2006-04-01 Thread Noam Raphael
Ok, I uploaded it.
Patch no. 1462796:
https://sourceforge.net/tracker/index.php?func=detail&aid=1462796&group_id=5470&atid=305470


On 4/1/06, Aahz <[EMAIL PROTECTED]> wrote:
> On Sat, Apr 01, 2006, Noam Raphael wrote:
> >
> > I've found out that the hash value of tuples isn't saved after it's
> > calculated. With strings it's different: the hash value of a string is
> > calculated only on the first call to hash(string), and saved in the
> > structure for future use. Saving the value makes dict lookup of tuples
> > an operation with an amortized cost of O(1).
> >  [...]
> > I will be happy to send a patch, if someone shows interest.
>
> Regardless of whether anyone shows interest, please submit a patch!  Then
> post the URL back here.  That way if someone gets interested in the
> future, your code is still available.
> --
> Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/
>
> "Look, it's your affair if you want to play with five people, but don't
> go calling it doubles."  --John Cleese anticipates Usenet
>


Re: [Python-Dev] Saving the hash value of tuples

2006-04-02 Thread Noam Raphael
On 4/2/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > I tried the change, and it turned out that I had to change cPickle a
> > tiny bit: it uses a 2-tuple which is allocated when the module
> > initializes to lookup tuples in a dict. I changed it to properly use
> > PyTuple_New and Py_DECREF, and now the complete test suite passes. I
> > run test_cpickle before the change and after it, and it took the same
> > time (0.89 seconds on my computer).
>
> Not just cPickle. I believe enumerate() also reuses a tuple.

Maybe it does, but I believe that it doesn't calculate its hash value
- otherwise, the test suite would probably have failed.
>
> > What do you think? I see three possibilities:
> >   1. Nothing should be done, everything is as it should be.
> >   2. The cPickle module should be changed to not abuse the tuple, but
> > there's no reason to add an extra word to the tuple structure and
> > break binary backwards compatibility.
> >   3. Both should be changed.
>
> I'm -1 on the change. Tuples are pretty fundamental in Python and
> hashing them is relatively rare. I think the extra required space for
> all tuples isn't worth the potential savings for some cases.

That's fine with me. But what about option 2? Perhaps cPickle (and
maybe enumerate) should properly discard their tuples, so that if
someone in the future decides that saving the hash value is a good
idea, he won't encounter strange bugs? At least in cPickle I didn't
notice any loss of speed because of the change, and it's quite
sensible, since there's a tuple-reuse mechanism anyway.

Noam


[Python-Dev] Alternative path suggestion

2006-05-02 Thread Noam Raphael
Hello,

I saw the discussion about including the path type in the standard
library. As it turned out, I recently wrote a program which does quite
a lot of path manipulation. This caused me to think that the proposed
path module:

 * Makes path manipulation significantly easier
 * Can be improved.

So I tried to write my version of it. My basic problem with the
current proposed path module is that it's a bit... messy. It contains
a lot of methods, collected from various modules, and for me it looks
too crowded - there are too many methods and too many details for me
to easily learn.

So I tried to organize it all. I think that the result may make file
and path manipulation really easier.

Here are my ideas. It's a copy of what I posted a few minutes ago in
the wiki - you can view it at
http://wiki.python.org/moin/AlternativePathClass (it looks better
there).

You can find the implementation at
http://wiki.python.org/moin/AlternativePathModule?action=raw
(By the way, is there some "code wiki" available? It can simply be a
public svn repository. I think it will be useful for those things.)

All these are ideas - I would like to hear what you think about them.

= Major Changes =

== a tuple instead of a string ==

The biggest conceptual change is that my path object is a subclass of
''tuple'', not a subclass of str. For example,
{{{
>>> tuple(path('a/b/c'))
('a', 'b', 'c')
>>> tuple(path('/a/b/c'))
(path.ROOT, 'a', 'b', 'c')
}}}

This means that path objects aren't the string representation of a
path; they are a ''logical'' representation of a path. Remember why a
filesystem path is called a path - because it's a way to get from one
place on the filesystem to another. Paths can be relative, which means
that they don't define where to start the walk from, or non-relative,
which means that they do. In the tuple representation, relative paths
are simply tuples of strings, and non-relative paths are tuples of
strings with a first "root" element.

The advantage of using a logical representation is that you can forget
about the textual representation, which can be really complex. You
don't have to call normpath when you're unsure about how a path looks,
you don't have to search for seps and altseps, and... you don't need
to remember a lot of names of functions or methods. To show that, take
a look at those methods from the original path class and their
equivalent in my path class:

{{{
p.normpath()  -> Isn't needed - done by the constructor
p.basename()  -> p[-1]
p.splitpath() -> (p[:-1], p[-1])
p.splitunc()  -> (p[0], p[1:]) (if isinstance(p[0], path.UNCRoot))
p.splitall()  -> Isn't needed
p.parent  -> p[:-1]
p.name-> p[-1]
p.drive   -> p[0] (if isinstance(p[0], path.Drive))
p.uncshare-> p[0] (if isinstance(p[0], path.UNCRoot))

and of course:
p.join(q) [or anything like it] -> p + q
}}}
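To make the tuple idea concrete, here is a minimal POSIX-only sketch
(all names are illustrative; the full implementation linked above also
handles drives, UNC roots, and the other root types mentioned below):

```python
ROOT = object()  # sentinel standing in for path.ROOT from the proposal


class path(tuple):
    """Minimal sketch: a path is a tuple of components, with an optional
    root sentinel as the first element for non-relative paths."""

    def __new__(cls, arg=()):
        if isinstance(arg, str):
            # Dropping empty components normalizes '//' and trailing '/'.
            parts = [p for p in arg.split('/') if p]
            if arg.startswith('/'):
                parts.insert(0, ROOT)
            return super().__new__(cls, parts)
        return super().__new__(cls, arg)

    def __add__(self, other):
        # Joining paths is plain tuple concatenation.
        return path(tuple(self) + tuple(path(other)))

    def __str__(self):
        if self and self[0] is ROOT:
            return '/' + '/'.join(self[1:])
        return '/'.join(self)
```

With this, the table above falls out naturally: p[-1] is the basename,
p[:-1] the parent, and p + q the join.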

The only drawback I can see in using a logical representation is that
giving a path object to functions which expect a path string won't
work. The immediate solution is to simply use str(p) instead of p. The
long-term solution is to make all related functions accept a path
object.

Having a logical representation of a path calls for a bit of term
clearing-up. What's an absolute path? On POSIX, it's very simple: a
path starting with a '/'. But what about Windows? Is "\temp\file" an
absolute path? I claim that it isn't really. The reason is that if you
change the current working directory, its meaning changes: It's now
not "c:\temp\file", but "a:\temp\file". The same goes for
"c:temp\file". So I decided on these two definitions:

 * A ''relative path'' is a path without a root element, so it can be
concatenated to other paths.
 * An ''absolute path'' is a path whose meaning doesn't change when
the current working directory changes.

This means that paths starting with a drive letter alone
(!UnrootedDrive instance, in my module) and paths starting with a
backslash alone (the CURROOT object, in my module) are not relative
and not absolute.

I really think that it's a better way to handle paths. If you want an
example, compare the current implementation of relpathto and my
implementation.

== Easier attributes for stat objects ==

The current path objects includes:
 * isdir, isfile, islink, and -
 * atime, mtime, ctime, size.
The first line does file mode checking, and the second simply gives
attributes from the stat object.

I suggest that these should be added to the stat_result object. isdir,
isfile and islink are true if a specific bit in st_mode is set, and
atime, mtime, ctime and size are simply other names for st_atime,
st_mtime, st_ctime and st_size.

It means that instead of using the atime, mtime etc. methods, you will
write {{{ p.stat().atime }}}, {{{ p.stat().size }}}, etc.

This is good, because:
 * If you want to make only one system call, it's very easy to save
the stat object and use it.
 * If you have to deal with symbolic links, you can simply use {{{
p.lstat().mtime }}}. Yes, symbolic link

Re: [Python-Dev] Alternative path suggestion

2006-05-06 Thread Noam Raphael
Hello all,

I just wanted to say thanks for your encouraging comments and
participation, and to say that I'm sorry that I haven't replied yet -
unfortunately, I don't have an Internet connection where I stay most
evenings. I now read all your replies, but I want to reply seriously
and the time is getting late, so I hope I'll be able to reply
tomorrow.

Anyway, don't make my poor participation stop you from developing the
idea. (I know it doesn't, I just hope you forgive me.)

Have a good day,
Noam


Re: [Python-Dev] Alternative path suggestion

2006-05-07 Thread Noam Raphael
Hello all again!

Thanks to Mike's suggestion, I now opened a new wiki page,
AlternativePathDiscussion, in
http://wiki.python.org/moin/AlternativePathDiscussion

The idea is to organize the discussion by dividing it into multiple
sections, and seeing what is agreed and what should be further
discussed. I now wrote there what I think, and quoted a few opinions
from other posts. The posts by others are only a minority - what's
written there currently is mostly what I think. I'm sorry for the
inconvenience, but please go there and post your opinions (you can
copy and paste from your emails, of course).

I apologize first for not replying to each post, and second for only
writing my opinions in the wiki. I simply write pretty slowly in
English, and am already developing a growing sleep-hours deficit. I
hope you forgive me.

Have a good day,
Noam


[Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-09 Thread Noam Raphael
Hello,

Recently I discovered that a small change to the Python grammar that
could help me a lot.

It's simply this: currently, the expression "x[]" is a syntax error. I
suggest that it become valid syntax, equivalent to "x[()]",
just as "x[a, b]" is equivalent to "x[(a, b)]" right now.

I discussed this in python-list, and Fredrik Lundh suggested that I
quickly write a pre-PEP if I want this to go into 2.5. Since I want
this, I wrote a pre-PEP.

It's available in the wiki, at
http://wiki.python.org/moin/EmptySubscriptListPEP and I also copied it
to this message.

I know that now is really close to 2.5b1, but I thought that perhaps
there was still a chance of this suggestion getting in, since:
 * It's a simple change, and there's almost nothing to be decided
except whether to accept it or not.
 * It has a simple implementation (it was fairly easy for me to
implement, and I know almost nothing about the AST).
 * It causes no backwards compatibility issues.

Ok, here's the pre-PEP. Please say what you think about it.

Have a good day,
Noam


PEP: XXX
Title: Allow Empty Subscript List Without Parentheses
Version: $Revision$
Last-Modified: $Date$
Author: Noam Raphael <[EMAIL PROTECTED]>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Jun-2006
Python-Version: 2.5?
Post-History: 30-Aug-2002

Abstract
========

This PEP suggests allowing the use of an empty subscript list, for
example ``x[]``, which is currently a syntax error. It is suggested
that in such a case, an empty tuple be passed as an argument to
the __getitem__ and __setitem__ methods. This is consistent with the
current behaviour of passing a tuple of n elements to those methods
when a subscript list of length n is used (for n = 1, only when it
includes a trailing comma).


Specification
=============

The Python grammar specifies that inside the square brackets trailing
an expression, a list of "subscripts", separated by commas, should be
given. If the list consists of a single subscript without a trailing
comma, a single object (an ellipsis, a slice, or any other object) is
passed to the resulting __getitem__ or __setitem__ call. If the list
consists of more than one subscript, or of a single subscript with a
trailing comma, a tuple is passed to the resulting __getitem__ or
__setitem__ call, with an item for each subscript.

Here is the formal definition of the grammar:

::
trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
subscriptlist: subscript (',' subscript)* [',']
subscript: '.' '.' '.' | test | [test] ':' [test] [sliceop]
sliceop: ':' [test]

This PEP suggests allowing an empty subscript list, with nothing
inside the square brackets. It would result in passing an empty tuple
to the resulting __getitem__ or __setitem__ call.

The change in the grammar is to make "subscriptlist" in the first
quoted line optional:

::
trailer: '(' [arglist] ')' | '[' [subscriptlist] ']' | '.' NAME
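
The mapping from subscript syntax to ``__getitem__`` arguments can be
checked with a few lines of Python that run today (the ``Probe`` class
is invented here purely for illustration):

```python
# A probe that returns whatever the interpreter passes to __getitem__.
class Probe:
    def __getitem__(self, key):
        return key

p = Probe()
print(p[1])       # 1       - single subscript: the object itself
print(p[1,])      # (1,)    - trailing comma: a 1-tuple
print(p[1, 2])    # (1, 2)  - several subscripts: an n-tuple
print(p[1:2])     # slice(1, 2, None) - a slice is still one object
print(p[()])      # ()      - today's only spelling of the empty tuple
# Under this PEP, p[] would become legal and also pass (), i.e. p[] == p[()].
```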


Motivation
==========

This suggestion makes it possible to refer to zero-dimensional arrays
elegantly. In NumPy, arrays can have any number of dimensions. To
refer to a value in a two-dimensional array, you write
``a[i, j]``. To refer to a value in a one-dimensional array,
you write ``a[i]``. You can also have a zero-dimensional array, which
holds a single value (a scalar). To refer to its value, you currently
need to write ``a[()]``, which is unexpected - the user may not even
know that writing ``a[i, j]`` constructs a tuple, so he
won't guess the ``a[()]`` syntax. If the suggestion is accepted, the
user will be able to write ``a[]`` in order to refer to the value, as
expected. It would even work without changing the NumPy package at all!

In the normal use of NumPy, you usually don't encounter
zero-dimensional arrays. However, the author of this PEP is designing
another library for managing multi-dimensional arrays of data. Its
purpose is similar to that of a spreadsheet - to analyze data and
preserve the relations between a source of a calculation and its
destination. In such an environment you may have many
multi-dimensional arrays - for example, the sales of several products
over several time periods. But you may also have several
zero-dimensional arrays, that is, single values - for example, the
income tax rate. It is desirable that access to the zero-dimensional
arrays be consistent with access to the multi-dimensional
arrays. Just using the name of the zero-dimensional array to obtain
its value isn't going to work - the array and the value it contains
have to be distinguished.


Rationale
=========

Passing an empty tuple to the __getitem__ or __setitem__ call was
chosen because it is consistent with the existing behaviour of passing
a tuple of n elements when a subscript list of length n is used.

Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-10 Thread Noam Raphael
Hello,

I'll try to answer the questions in one message. Sorry for not being
able to do it until now.

About the joke - it isn't one; I really need this.

About the timing - Of course, I can live with this getting into 2.6,
and I think that I may even be able to stay alive if this were
rejected. I still personally think that if people agree that it's a
good idea it might get in, since there's almost nothing to be decided
except for that - but of course, I can understand not wanting to rush
things too much.

About whether NumPy should return real scalars or 0-dimensional arrays
- I don't know.

About the use case - I think that's the real thing I didn't explain
well and needs explanation, so I will try to do it better this time.

I'm talking about something similar to a spreadsheet in that it saves
data, calculation results, and the way to produce the results.
However, it is not similar to a spreadsheet in that the data isn't
saved in an infinite two-dimensional array with numerical indices.
Instead, the data is saved in a few "tables", each storing a different
kind of data. The tables may have any desired number of dimensions,
and are indexed by meaningful indices, instead of by natural numbers.

For example, you may have a table called sales_data. It will store the
sales data in years from set([2003, 2004, 2005]), for car models from
set(['Subaru', 'Toyota', 'Ford']), for cities from set(['Jerusalem',
'Tel Aviv', 'Haifa']). To refer to the sales of Ford in Haifa in 2004,
you will simply write: sales_data[2004, 'Ford', 'Haifa']. If the table
is a source of data (that is, not calculated), you will be able to set
values by writing: sales_data[2004, 'Ford', 'Haifa'] = 1500.

Tables may be computed tables. For example, you may have a table which
holds for each year the total sales in that year, with the income tax
subtracted. It may be defined by a function like this:

lambda year: sum(sales_data[year, model, city] for model in models for
city in cities) / (1 + income_tax_rate)

Now, like in a spreadsheet, the function is kept, so that if you
change the data, the result will be automatically recalculated. So, if
you discovered a mistake in your data, you will be able to write:

sales_data[2004, 'Ford', 'Haifa'] = 2000

and total_sales[2004] will be automatically recalculated.

Now, note that the total_sales table depends also on the
income_tax_rate. This is a variable, just like sales_data. Unlike
sales_data, it's a single value. We should be able to change it, with
the result that all the cells of the total_sales table are
recalculated. But how will we do it? We can write

income_tax_rate = 0.18

but it will have a completely different meaning. The way to make the
income_tax_rate changeable is to think of it as a 0-dimensional table.
It makes sense: sales_data depends on 3 parameters (year, model,
city), total_sales depends on 1 parameter (year), and income_tax_rate
depends on 0 parameters. That's the only difference. So, thinking of
it like this, we will simply write:

income_tax_rate[] = 0.18

Now the system can know that the income tax rate has changed, and
recalculate what's needed. We will also have to change the previous
function a tiny bit, to:

lambda year: sum(sales_data[year, model, city] for model in models for
city in cities) / (1 + income_tax_rate[])

But it's fine - it just makes it clearer that income_tax_rate[] is a
part of the model that may change its value.
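
A minimal sketch of this model, runnable today with the ``x[()]``
spelling (the ``Table`` class and its change-notification scheme are
invented here for illustration, not from any released library):

```python
# A tiny "table" that notifies observers when a value changes.
class Table:
    def __init__(self):
        self._data = {}
        self.observers = []
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
        for recalc in self.observers:
            recalc()

income_tax = Table()
income_tax[()] = 0.18          # with the PEP: income_tax[] = 0.18

sales = Table()
sales[2004, 'Ford', 'Haifa'] = 1500

total_sales = {}
def recalc():
    total_sales[2004] = sales[2004, 'Ford', 'Haifa'] / (1 + income_tax[()])

sales.observers.append(recalc)
income_tax.observers.append(recalc)
recalc()

# Correcting the data triggers automatic recalculation, as described above.
sales[2004, 'Ford', 'Haifa'] = 2000
print(round(total_sales[2004], 2))   # 2000 / 1.18, about 1694.92
```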

I hope that I managed to explain the use case better this time -
please ask if my description isn't clear enough.

Thanks for your comments, please send more,
Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-10 Thread Noam Raphael
Hello,

2006/6/10, Nick Coghlan <[EMAIL PROTECTED]>:
> The closest parallel would be with return/yield, as those actually create real
> tuples the same way subscripts do, and allow the expression to be omitted
> entirely.
>
> By that parallel, however, an implicit subscript (if adopted) should be None
> rather than ().
>
> Adapting the table from the pre-PEP to describe return statements (and yield
> expressions):
>
>  return i, j, k  <-->  return (i, j, k)
>  return i, j <-->  return (i, j)
>  return i,   <-->  return (i, )
>  return i<-->  return (i)
>return ()# (No implicit equivalent)
>  return  <-->  return None
>
> With the status quo, however, subscripts are simply equivalent to the RHS of
> an assignment statement in *requiring* that the expression be non-empty:
>
>  x = i, j, k  <-->  x = (i, j, k)
>  x = i, j <-->  x = (i, j)
>  x = i,   <-->  x = (i, )
>  x = i<-->  x = (i)
> x = () # (No implicit equivalent)
> x = None   # (No implicit equivalent)
>
> The PEP doesn't make a sufficiently compelling case for introducing
> yet-another-variant on the implicit behaviour invoked when a particular
> subexpression is missing from a construct.
>
I hope that my (hopefully) better explanation made the use case more
compelling, but I want to add two points in favour of an empty tuple:

1. If you want, you can ignore the x[(i, j, k)] equivalence
completely, since it doesn't work all the time - for example, you can
write "x[1:2, 3:4]", but you can't write "x[(1:2, 3:4)]". You can
think of x[i, j, k] as a syntax for specifying a cell in a
3-dimensional array, resulting in a call to x.__getitem__ with a
3-tuple describing the subscript for each dimension. In that view,
"x[]", which is a syntax for specifying the cell of a 0-dimensional
array, should result in a __getitem__ call with an empty tuple, as
there are no subscripts to be described.

2. My equivalencies are better than yours :-), since they deal
with equivalencies for this specific syntax, while yours deal
with similar properties of a syntax for doing something completely
different.

> I guess I could have gone with my initial instinct of -1 and saved myself some
> mental exercise ;)

Why? Mental exercise is a good way to keep you mental ;)

Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-16 Thread Noam Raphael
Hello,

It seems to me that people don't object to my proposal, but don't find
it useful themselves either.

The question is, what to do next. I guess one possibility is to raise
this discussion again in a few months, when people are less
occupied with the 2.5 beta. This is ok, although I would prefer a
decision before that, because it might affect the design of the
library - should I find a permanent workaround, or one that I know
will be removed in the future?

If you do want to continue the discussion to reach a decision, please
do. You can say that if nobody else on python-dev is interested, it
shouldn't be implemented. You can examine my use case, say whether you
think it's reasonable, and suggest alternative solutions - or say that
you see how allowing an empty subscript list solves it elegantly (yes!).

My point is, I don't want this discussion to naturally die because
nobody is interested, since I am interested. So please say what you
think should happen to it, so we can reach a conclusion.

Now, if the discussion is to continue, Nick proposed an alternative:

2006/6/11, Nick Coghlan <[EMAIL PROTECTED]>:
> For your specific use cases, though, I'd be inclined to tweak the API a bit,
> and switch to using attributes for the single-valued data:
>
> tax_rates.income_tax = 0.18

It's probably ok, although I would prefer not having to artificially
group scalars just to make them attributes of something. I would
prefer remaining with one object, and having something like
income_tax.setvalue(), or even income_tax.value.

> Although the income tax rate should actually depend on the current financial
> year, since it can change over time as the government increases taxes ;)

But that's exactly why I prefer writing simply "income_tax[] = 0.18"
when it's a constant, which is completely analogous to
"income_tax[2005] = 0.17; income_tax[2006] = 0.18" when it depends on
something.

By the way, another thing about consistency: a friend of mine raised
the point that there isn't another example of forbidden empty brackets
- [], {}, (), and x() are all allowed.

And about the other thing Nick said:
> I guess I'm really only -0 on the idea of x[] invoking x.__getitem__(), and
> allowing the class to decide whether or not to define a default value for the
> subscript. I wouldn't implement it myself, but I wouldn't object strenuously
> if Guido decided it was OK :)

I would prefer an empty tuple, since invoking __getitem__ with no
arguments would be a special case: for all other possible subscript
lists, exactly one argument is passed to __getitem__. This leaves us
with one special case: a subscript list with one item and no
trailing comma results in __getitem__ not getting a tuple, whereas in
all other cases it does get a tuple. This works exactly like
parentheses: they don't mean a tuple only when there's one item inside
them and no trailing comma.

Good bye,
Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-16 Thread Noam Raphael
2006/6/16, Gareth McCaughan <[EMAIL PROTECTED]>:
> None of the above is intended to constitute argument for
> or against Noam's proposed change to Python. Python isn't
> primarily a language for mathematicians, and so much the
> better for Python.
>
Thanks for your explanation of mathematical zero-dimensional arrays! I
just wanted to say that I really got to this just from trying to make
a *computer program* as simple as possible - from what I know now,
with empty subscript lists not allowed, my library will have more
lines of code, will have more details of interface, and will require
longer code to operate it. I'm not saying that not changing it will be
terrible - I'm just saying that if changing something makes other
things simpler AND goes along with mathematical intuition, it might be
the right thing to do...

Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-17 Thread Noam Raphael
Hello,

2006/6/16, Josiah Carlson <[EMAIL PROTECTED]>:
> I'm not a mathematician, and I don't really work with arrays of any
> dimensionality, so the need for 0-D subscripting via arr[] while being
> cute, isn't compelling to my uses for Python.

Thanks for appreciating its cuteness...
>
> Now, I appreciate the desire to reduce code length and complexity, but
> from what I understand, the ultimate result of such a change to your
> code would be to go from:
> arr[()]
> to:
> arr[]
>
> I don't see how this can reduce lines of code in implementation or use.
> At most it is a two characters per use, and a change in documentation
> (specifying how you subscript 0-D arrays).  If you can show an example
> where actual code line count is reduced with this change, I can't
> guarantee that I would get behind this proposal in a few months (if the
> conversation starts up again), but it may make me feel less that your
> proposal is essentially about aesthetics.

I meant the extra code for writing a special class to handle scalars:
if I decide that the "x[()]" syntax is too ugly or too hard to type, I
will have to write a class which allows the syntax "x.value".

The extra parentheses might not seem to matter for code using that
library, but I intend for people to use it directly, in an interactive
way, just like you type an expression in a spreadsheet. I expect that
for such a usage, the extra parentheses will be slightly unfun.

I know that it's not such a big difference, but I'm not talking about
a big change to the language either - it affects less than 20 lines of
code (probably could be done with even less), and doesn't cause any
problems with anything.

I can imagine Guido designing the grammar, thinking, "Should I allow
an empty subscript list? No, why would anyone want such a thing?
Besides, if someone wants it, we can always add it later." - at
least, that's how I might think if I designed a language. So now, a
use has been found. Indeed, it is fairly rare. But why not allow it now?

Have a good week,
Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-17 Thread Noam Raphael
2006/6/17, "Martin v. Löwis" <[EMAIL PROTECTED]>:
> Noam Raphael wrote:
> > I meant the extra code for writing a special class to handle scalars,
> > if I decide that the "x[()]" syntax is too ugly or too hard to type,
> > so I write a special class which will allow the syntax "x.value".
>
> What I cannot understand is why you use a zero-dimensional array to
> represent a scalar. Scalars are directly supported in Python:
>
> x = 5

I need a zero-dimensional array as a single cell - an object that
holds a value that can change over time. It works just like a cell in
a spreadsheet: For example, say that if you change the value of cell
A1 to 0.18, cell A2 changes to 5. When using the library I design, you
would write "sheet1[0, 0] = 0.18", and, magically, "sheet1[0, 1]" will
become 5. But in my library, everything is meaningful and doesn't have
to be two-dimensional. So, if in the spreadsheet example, A1 meant the
income tax rate, you would write "income_tax[] = 0.18", and,
magically, "profit['Jerusalem', 2005]" will become 5.

I hope I managed to explain my use case, and why the simplest way to
treat scalars like income_tax is as zero-dimensional arrays.

> Also, in an assignment, what are you putting on the right-hand side?
> A read access from another zero-dimensional array?
>
I hope my example explained that, but you can put any object there -
for example, you can write "income_tax[] = 0.18"

(If I didn't yet manage to explain myself, please say so - it seems
that it's not a very simple example and I'm not a very good explainer,
at least in English.)

Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-17 Thread Noam Raphael
Hi, sorry for my repeated posts. I just wanted to say that I improved
my patch a little bit, so it does exactly the same thing, but with
less code: you can see for yourself at
http://python.pastebin.com/715221 - it changes exactly 10 lines of
code, and adds 8 more lines, all of them really short and
obvious.

I thought that it might convince someone that it's just a little
generalization of syntax, nothing frightening...

Noam

2006/6/17, Noam Raphael <[EMAIL PROTECTED]>:
> I know that it's not such a big difference, but I'm not talking about
> a big change to the language either - it affects less than 20 lines of
> code (probably could be done with even less), and doesn't cause any
> problems with anything.


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-18 Thread Noam Raphael
2006/6/18, Shane Hathaway <[EMAIL PROTECTED]>:
> Try to think more about how users will use your API.  You haven't
> specified where those names (sheet1, income_tax, and profit) are coming
> from.  What do you expect users of your library to do to bring those
> names into their namespace?
>
That's a good question. I'm going to do some bytecode hacks! Something
like this:

from spreadsheetlib import SourceCube, CalculatedCube
income_tax = SourceCube([])
income_tax[] = 0.18
years = set([2003, 2004, 2005])
profit = SourceCube([years])
profit[2003] = 1000; profit[2004] = 2000; profit[2005] = 2500
real_profit = CalculatedCube([years], lambda year: profit[year] / (1+
income_tax[]))
print real_profit[2004]
# prints 1694.9152542372883

It may be what Talin meant about a "higher level language", but I
don't really change the language - I only inspect the function to see
which other changeable objects it depends on. Those changeable objects
implement some sort of change notification protocol, which allows the
system to automatically recalculate the result when one of the values
it depends on changes.

(Actually, I intend to change the function to depend directly on the
changeable object instead of looking it up every time in the global
namespace, but I don't think that changes the explanation.)
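
The dependency discovery described above can be roughly approximated
without full bytecode rewriting, by reading names from the function's
code object. This sketch is illustrative, not the author's actual
implementation, and it only sees module-level names (``__code__`` was
spelled ``func_code`` in the Python of 2006):

```python
# Find the global names a formula references, via its code object.
def dependencies(func, namespace):
    return {name: namespace[name]
            for name in func.__code__.co_names
            if name in namespace}

profit = {2004: 2000}
income_tax_rate = 0.18
formula = lambda year: profit[year] / (1 + income_tax_rate)

deps = dependencies(formula, globals())
print(sorted(deps))   # ['income_tax_rate', 'profit']
```

A real system would then subscribe the formula to change notifications
from each discovered object.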

Note that requiring that all changeable objects be attributes of
some other object won't remove the need for bytecode hacking: the only
alternative is to explicitly specify a list of all the objects that
the function depends on, and then pass a function that takes these as
arguments. That would be really inconvenient.

But thanks for the suggestion!

Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-18 Thread Noam Raphael
2006/6/18, Guido van Rossum <[EMAIL PROTECTED]>:
> But more to the point, this discussion is pointless, since I won't
> accept the syntax change.

OK, too bad!

But don't say I haven't warned you, when you will all use my fabulous
package and get tired from typing all those extra parentheses! :)

Noam


Re: [Python-Dev] Empty Subscript PEP on Wiki - keep or toss?

2006-07-01 Thread Noam Raphael
Hello,

I posted it as a "pre-PEP", in the hope that it may become a PEP and
be accepted. As it happened, Guido said "no" at the end, so I stopped
pushing the subject.

I think that the main reason for the "no" was that my use case wasn't
convincing enough - the objections were that this wasn't useful
enough, not that it does anything harmful*. As the one who does think
it's useful, I have the tiniest hope that if, in the future, people
become familiar with the package and see the usefulness of
allowing an empty subscript list, the decision will change. If the
only result of me posting it as a PEP is a final "rejected" status
that will prevent any chance of that happening, I don't think I'll
bother to make it a PEP. If that's not the case, then I'll make it a
PEP and post it.

Have a good week,
Noam

* Yes, I know that adding an unneeded feature to the language can be
considered "harmful". You may not agree with my distinction in this
case. As it is, I barely consider this an "added feature" - I would
say it's mostly "a small generalization".

2006/6/30, Georg Brandl <[EMAIL PROTECTED]>:
> [EMAIL PROTECTED] wrote:
> > Noam Raphael posted an empty subscript PEP on the Python Wiki:
> >
> > http://wiki.python.org/moin/EmptySubscriptListPEP
> >
> > It's not linked to by any other pages on the wiki.  Is there a reason it
> > wasn't added to the peps repository?
>
> Perhaps the author forgot to submit it to the PEP editor, or he decided
> to abandon it after the mostly negative discussion here.
>
> Georg
>


[Python-Dev] IDLE development

2005-09-10 Thread Noam Raphael
Hello,

More than a year and a half ago, I posted a big patch to IDLE which
adds support for completion and much better calltips, along with some
other improvements. Since then, I had some mail conversations with
Kurt B. Kaiser, who is responsible for IDLE, which resulted in
nothing. My last mail, from Jul 10, saying (with more details) "I made
the minor changes you asked for, let's get it in, it's not very
complicated", went unanswered.

This is just one example of the fact that IDLE development has been
virtually nonexistent in recent months, because most patches were
simply ignored.

My colleagues and I use IDLE intensively - that is, a heavily patched
IDLE. It includes my patch and many other improvements made by me and
my friends.

The improved IDLE is MUCH better than the standard IDLE, especially
for interactive work. Since we would like to share our work with the
rest of the world, if nothing is changed we would start a new IDLE
fork soon, perhaps at python-hosting.com.

I really don't like that - maintaining a fork requires a lot of extra
work, and it is certain that many more people will enjoy our work if
it integrated in the standard Python distribution. But sending patches
and watching them stay open despite a continuous nagging is worse.

Please, either convince KBK to invest more time in IDLE development,
or find someone else who would take care of it. If you like, I would
happily help in the development.

I hope I am not sounding offensive. It's actually quite simple: if the
excellent development environment IDLE can't develop inside standard
Python, it should be developed outside it. As I said, I prefer the
first option.

Have a good week,
Noam Raphael


Re: [Python-Dev] IDLE development

2005-09-10 Thread Noam Raphael
On 9/11/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 9/10/05, Noam Raphael <[EMAIL PROTECTED]> wrote:
> > My colleagues and I use IDLE intensively - that is, a heavily patched
> > IDLE. It includes my patch and many other improvements made by me and
> > my friends.
> >
> > The improved IDLE is MUCH better than the standard IDLE, especially
> > for interactive work.
> 
> Could it be that this is a rather subjective judgement? It wouldn't be
> the first time that someone pushing for their personal set of
> functionality changes is overlooking the needs of other user groups.
> 
I don't think so, since:
1. These are added features, not functionality changes.
2. There are quite a lot of people using the improved IDLE where I
work, and I never heard anyone saying he prefers the standard IDLE -
on the contrary, many are asking how they can use the improved IDLE in
their homes.
3. Kurt agreed to integrate the change - he just didn't do it.

> > Since we would like to share our work with the
> > rest of the world, if nothing is changed we would start a new IDLE
> > fork soon, perhaps at python-hosting.com.
> 
> I have no problem with this. You might be able to save yourself some
> maintenance work by structuring your version as a set of subclasses
> rather than a set of patches (even if you distribute it as a complete
> working program). Many people have needs that aren't met by standard
> Python; they write their own modules or extensions and distribute
> these independently from Python; your case probably isn't all that
> different.
> 
I think that rewriting the patches as subclasses would be a lot of
work, and not a very good idea - if you change one line in a
function, copy-pasting it into a subclass and changing the line seems
a little weird to me, not to mention the cases where some refactoring
needs to be done. I think we would be talking about a separate package
- say, idleforklib instead of idlelib. You can always run diff to find
the differences between the two packages.

> Often the needs of certain user groups and the development speeds of
> such 3rd party modules are so different that it simply doesn't make
> sense to fold them in the Python distribution anyway -- consider what
> you would have to do if Kurt accepted your patches: you'll still have
> to wait until Python 2.5 is released before others can benefit from
> your changes, and if you come up with an improvement after that
> release, your next chance will be 18 months later...
> 
I don't think so - if IDLE is developed on the Python CVS, we can
still distribute a stand-alone package with IDLE from the CVS head,
for eager people. All others will get the changes a year later, which
isn't that bad. Perhaps it can even be less than a year - since IDLE
is a GUI application and not a library, so there isn't a lot of
backward compatibility to maintain, it seems to me that updated
versions can be shipped also with new minor versions of Python.

The advantages of developing IDLE on the Python CVS are that there is
no need to synchronize two versions, and a wider audience. Of course,
after you see the improved IDLE you will surely decide to immediately
import it into the Python CVS, so there's not much of a problem... :)

Noam


Re: [Python-Dev] IDLE development

2005-09-10 Thread Noam Raphael
On 9/11/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > Often the needs of certain user groups and the development speeds of
> > such 3rd party modules are so different that it simply doesn't make
> > sense to fold them in the Python distribution anyway -- consider what
> > you would have to do if Kurt accepted your patches: you'll still have
> > to wait until Python 2.5 is released before others can benefit from
> > your changes, and if you come up with an improvement after that
> > release, your next chance will be 18 months later...
> 
> Isn't separate distribution the way the *current* version of Idle was
> developed? I seem to recall it existing as IDLEFork for a long time so that it
> could have a more rapid release cycle before being rolled into the main
> distribution.

Yes, it is. I addressed how to maintain a more rapid release cycle
for IDLE while developing it in the Python CVS in my reply to
Guido.
> 
> This approach also allows a wider audience to asess the subjective benefits of
> any changes made - many more people will download and try out a separate IDE
> than will download and try out a patch to the main distribution. I'm such a
> one, even though I believe my main problems with Idle lie in the Tcl/tk
> toolkit (so I don't expect any application level changes to alter my opinion
> much).

Can you please explain what these problems are? A big problem with
Tcl/tk is that only one function call can be triggered by an event,
and I solved it for IDLE by writing a wrapper around Tkinter classes,
which calls all bound functions on an event. This, for example,
allows the yellow CallTip windows to disappear when the IDLE window
loses focus, instead of staying above all other windows.
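
The wrapper idea can be sketched GUI-free (the ``MultiBind`` class and
its method names are made up here, not taken from the patched IDLE):

```python
# Keep a list of callbacks per event and fire them all, instead of
# letting a new binding silently replace the previous one.
class MultiBind:
    def __init__(self):
        self._handlers = {}
    def bind(self, event, func):
        self._handlers.setdefault(event, []).append(func)
    def fire(self, event, *args):
        # Call every bound function for the event, in binding order.
        return [func(*args) for func in self._handlers.get(event, [])]

mb = MultiBind()
calls = []
mb.bind('<FocusOut>', lambda: calls.append('hide_calltip'))
mb.bind('<FocusOut>', lambda: calls.append('save_state'))
mb.fire('<FocusOut>')
print(calls)   # ['hide_calltip', 'save_state']
```

In real Tkinter code the wrapper would register a single Tk binding
per event sequence and dispatch to this list.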

Thanks,
Noam


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-29 Thread Noam Raphael
Hello,

I have thought about freezing for some time, and I think that it is a
fundamental need - the need to know, sometimes, that objects aren't
going to change.

This is mostly the need of containers. dicts need to know that the
objects that are used as keys aren't going to change, because if they
change, their hash value changes, and you end up with a data structure
in an inconsistent state. This is the need of sets too, and of heaps,
and binary trees, and so on.

I want to give another example: my colleagues and I designed something
which can be described as an "electronic spreadsheet in Python". We
called it a "table". The values in the table are Python objects, and
the functions which relate them are written in Python. Then comes the
problem: the user has, of course, access to the objects stored in the
table. What would happen if he changes them? The answer is that the
table would be in an inconsistent state, since something which should
be the return value of a function is now something else, and there's
no way for the table to know about that.

The solution is to have a "freeze" protocol. It may be called "frozen"
(like frozen(set([1,2,3]))), so that it will be clear that it does not
change the object itself. The definition of a frozen object is that
its value can't change - that is, if you compare it with another
object, you should get the same result as long as the other object
hasn't changed. As a rule, only frozen objects should be hashable.
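
A minimal sketch of what such a protocol might look like (the `frozen()`
function and `__frozen__` method are part of the proposal, not existing
Python; `FrozenList` and `List` are invented names):

```python
def frozen(obj):
    # Proposed built-in: return an immutable equivalent of obj.
    if hasattr(obj, '__frozen__'):
        return obj.__frozen__()
    return obj  # ints, strings, tuples, ... are already frozen

class FrozenList(tuple):
    """An immutable, hashable image of a list."""

class List(list):
    def __frozen__(self):
        return FrozenList(self)

d = {}
key = List([1, 2])
d[frozen(key)] = 'value'     # safe: the key can't change under the dict
print(frozen(key) in d)      # True
```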

I want to give another, different, use case for freezing objects. I
once thought about writing a graph package in Python - I mean a graph
with vertices and edges. The most obvious way to store a directed
graph is as a mapping (dict) from a node to the set of nodes that it
points to. Since I want to be able to find also which nodes point to a
specific node, I will store another mapping, from a node to the set of
nodes that point to it. Now, I want a method of the graph which will
return the set of nodes that a given node points to, for example to
let me write "if y in graph.adjacent_nodes(x):". The question is,
what will the adjacent_nodes method return? If it returns the set
which is a part of the data structure, there is nothing (not even a
convention!) that will prevent the user from playing with it. This
will corrupt the data structure, since the change won't be recorded in
the inverse mapping. adjacent_nodes could return a copy of the set,
but that's a waste if you only want to check whether an object is a
member of the set.
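
To illustrate the hazard, here is a hypothetical minimal graph class in
which `adjacent_nodes` hands out the internal set, exactly the unsafe
choice described above (all names invented for the example):

```python
class Graph:
    def __init__(self):
        self._out = {}   # node -> set of nodes it points to
        self._in = {}    # node -> set of nodes pointing to it

    def add_edge(self, a, b):
        self._out.setdefault(a, set()).add(b)
        self._in.setdefault(b, set()).add(a)

    def adjacent_nodes(self, node):
        # Unsafe: returns the internal set itself.
        return self._out.get(node, set())

g = Graph()
g.add_edge('x', 'y')
g.adjacent_nodes('x').discard('y')   # the user "plays" with the result
# The edge is now gone from _out, but _in still records it:
# the two mappings are inconsistent.
```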

I gave this example to say that the "frozen" protocol should (when
possible) return an object which doesn't really contain a copy of the
data, but rather gives an "image" of the original object. If the
original object changes while there are frozen copies of it, the data
will be copied, and all the frozen objects will then reference a
version of the data that will never change again.

This will solve the graph problem nicely - adjacent_nodes would simply
return a frozen copy of the set, and a copy operation would happen
only in the rare cases when the returned set is being modified. This
would also help the container use cases: they may call the frozen()
method on objects that should be inserted into the container, and
usually the data won't be copied. Some objects can't be created in
their final form, but can only be constructed step by step. This
means that they must be non-frozen objects. Sometimes they are
constructed in order to get into a container. Unless the frozen()
method is copy-on-change the way I described, all the data would have
to be copied again, just for the commitment that it won't change.

I don't mean to frighten, but in principle, this may mean that
mutable strings might be introduced, which would allow us to get rid
of all the cStringIO workarounds. Immutable (frozen) strings would be
constructed whenever they are needed, at a low performance cost
(remember that a frozen copy of a given object has to be constructed
only once - once it has been created, the same object can be returned
on additional frozen() calls.)
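
For reference, today's workaround for building a string incrementally is
a string buffer, shown here with `io.StringIO` (the modern equivalent of
`cStringIO`); under the proposal, one would instead edit a mutable
string in place and freeze it when done:

```python
import io

# Current workaround: avoid O(n**2) cost of repeated concatenation.
buf = io.StringIO()
for word in ('spam', ' and ', 'eggs'):
    buf.write(word)
result = buf.getvalue()
print(result)   # spam and eggs
```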

Copy-on-change of containers of non-frozen objects requires additional
complication: it requires frozen objects to have a way for setting a
callback that will be called when the original object was changed.
This is because the change makes the container of the original object
change, so it must drop its own frozen copy. This needs to happen only
once per frozen object, since after a change, all the containers drop
their frozen copies. I think this callback is conceptually similar to
the weakref callback.
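
A sketch of how such a change callback could behave (all names are
invented for illustration; a tuple stands in for a real frozen image to
keep the example short):

```python
class Freezable:
    def __init__(self, items):
        self._items = list(items)
        self._callbacks = []

    def frozen(self, on_change=None):
        # Optional callback, fired at most once: on the first change.
        if on_change is not None:
            self._callbacks.append(on_change)
        return tuple(self._items)    # stand-in for a real frozen image

    def append(self, x):
        callbacks, self._callbacks = self._callbacks, []
        for cb in callbacks:         # containers drop their frozen copies
            cb(self)
        self._items.append(x)

dropped = []
obj = Freezable([1, 2])
image = obj.frozen(on_change=dropped.append)
obj.append(3)    # callback fires once
obj.append(4)    # no callback the second time
```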

Just an example that copy-on-change (at least of containers of frozen
objects) is needed: sets. It was decided that you can test whether a
non-frozen set is a member of a set. I understand that it is done by
"temporarily freezing" the set, and that it caused some threading
issues. A copy-on-change mechanism might solve it more elegantly.

What do you think?

Noam
___

Re: [Python-Dev] [Python-checkins] commit of r41352 - in python/trunk: . Lib Lib/distutils Lib/distutils/command Lib/encodings

2005-10-29 Thread Noam Raphael
> That might be reasonable. I just noticed that it is convenient to do
>
> svn propset svn:ignore -F .cvsignore .
>
> Without a file, I wouldn't know how to edit the property, so I would
> probably do
>
> svn propget svn:ignore . > ignores
> vim ignores
> svn propset svn:ignore -F ignores .
> rm ignores
>

Won't "svn propedit svn:ignore ." do the trick?

Noam
___


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-30 Thread Noam Raphael
Hello,

It seems that we both agree that freezing is cool (-; . We disagree on
whether copy-on-write behaviour is desired. Your arguments against
copy-on-write are:
1. It's not needed.
2. It's complicated to implement.

But first of all, you didn't like my use cases. I want to take issue with that.

> In reading your description of a 'table of values', I can't help but be
> reminded of the wxPython (and wxWidget) wx.Grid and its semantics.  It
> offers arbitrary tables of values (whose editors and viewers you can
> change at will), which offers a mechanism by which you can "listen" to
> changes that occur to the contents of a cell.  I can't help but think
> that if you offered a protocol by which a user can signal that a cell
> has been changed, perhaps by writing the value to the table itself
> (table.SetValue(row, col, value)), every read a deepcopy (or a PEP 351
> freeze), etc., that both you and the users of your table would be much
> happier.

Perhaps I didn't make it clear. The difference between wxPython's Grid
and my table is that in the table, most values are *computed*. This
means that there's no point in changing the values themselves. They
are also used frequently as set members (I can describe why, but it's
a bit complicated.)

I want to say that even if sets weren't used, the objects in the table
should have been frozen. The fact that sets (and dicts) only allow
immutable objects as members/keys is just for protecting the user.
They could have declared, "you shouldn't change anything you insert -
as long as you don't, we'll function properly." The only reason why
you can't compute hash values of mutable objects is that you don't
want your user to make mistakes, and make the data structure
inconsistent.

> As for the graph issue, you've got a bigger problem than users just
> being able to edit edge lists, users can clear the entire dictionary of
> vertices (outgoing.clear()).  It seems to me that a more reasonable
> method to handle this particular case is to tell your users "don't
> modify the dictionaries or the edge lists", and/or store your edge lists
> as tuples instead of lists or dictionaries, and/or use an immutable
> dictionary (as offered by Barry in the PEP).

As I wrote before, telling my users "don't modify the edge lists" is
just like making lists hashable, and telling all Python users, "don't
modify lists that are dictionary keys." There's no way to tell the
users that - there's no convention for objects which should not be
changed. You can write it in the documentation, but who'll bother
looking there?

I don't think that your other suggestions will work: the data
structure of the graph itself can't reasonably be made of immutable
objects, because the graph itself is mutable - you can change it. It
could be built of immutable objects, but that would mean copying all
the data every time the graph changes.


Now, about copy-on-write:

> There's also this little issue of "copy on write" semantics with Python.
> Anyone who tells you that "copy on write" is easy, is probably hanging
> out with the same kind of people who say that "threading is easy".  Of
> course both are easy if you limit your uses to some small subset of
> interesting interactions, but "copy on write" gets far harder when you
> start thinking of dictionaries, lists, StringIOs, arrays, and all the
> possible user-defined classes, which may be mutated beyond obj[key] =
> value and/or obj.attr = value (some have obj.method() which mutates the
> object). As such, offering a callback mechanism similar to weak
> references is probably pretty close to impossible with CPython.

Let's limit ourselves to copy-on-write of objects which do not contain
nonfrozen objects. Perhaps it's enough - the table, the graph, and
strings, are perfect examples of these. Implementation doesn't seem
too complicated to me - whenever the object is about to change, and
there is a connected frozen copy, you make a shallow copy of the
object, point the frozen copy to it, release the reference to the
frozen copy, and continue as usual. That's all.

I really think that this kind of copy-on-write is "correct". The
temporary freezing of sets in order to check if they are members of
other sets is a not-very-nice way of implementing it. This kind of
copy-on-write would allow, in principle, for Python strings to become
mutable, with almost no speed penalty. It would allow my table, and
other containers, to automatically freeze the objects that get into
it, without having to trust the user on not changing the objects - and
remember that there's no way of *telling* him not to change the
objects.

Now, the computer scientist in me wants to explain (and think about)
freezing containers of nonfrozen objects. What I actually want is that
as long as an object doesn't change after it's frozen, the cost of
freezing would be nothing - that is, O(1). Think about a mutable
string object, which is used in the same way as the current, immutable
strings. It is constructed o

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-31 Thread Noam Raphael
Hello,

I have slept quite well, and talked about it with a few people, and I
still think I'm right.

About the users-changing-my-internal-data issue:

> Again, user semantics.  Tell your users not to modify entries, and/or
> you can make copies of objects you return.  If your users are too daft
> to read and/or follow directions, then they deserve to have their
> software not work.
...
> When someone complains that something doesn't work, I tell them to read
> the documentation.  If your users haven't been told to RTFM often enough
> to actually make it happen, then you need a RTFM-bat. Want to know how
> you make one?  You start wrapping the objects you return which segfaults
> the process if they change things. When they start asking, tell them it
> is documented quite clearly "do not to modify objects returned, or else".
> Then there's the other option, which I provide below.

I disagree. I read the manual when I don't know what something does.
If I can guess what it does (and this is where conventions are good),
I don't read the manual. And let's say I ought to read the complete
manual for every method that I use, and that I deserve a death
sentence (or a segmentation fault) if I don't. But let's say that I
wrote some software without reading the manual, and it worked. I have
moved on to other things, and suddenly a bug arises. When the poor
guy who needs to fix it goes over the code, everything looks
absolutely correct. Should he also read the complete manuals of every
library that I used, in order to fix that bug? And remember that in
this case, the object could have traveled between several places
(including functions in other libraries) before it was changed, and
the original data structure starts behaving weirdly.

You suggest two ways for solving the problem. The first is by copying
my mutable objects to immutable copies:

> Also from the sounds of it, you are storing both source and destination
> values in the same table...hrm, that sounds quite a bit like a
> spreadsheet.  How does every spreadsheet handle that again?  Oh yeah,
> they only ever store immutables (generally strings which are interpreted).
> But I suppose since you are (of course) storing mutable objects, you
> need to work a bit harder...so store mutables, and return immutable
> copies (which you can cache if you want, and invalidate when your
> application updates the results...like a wx.Grid update on changed).

This is basically ok. It's just that in my solution, for many objects
it's not necessary to make a complete copy just to prevent changing
the value: Making frozen copies of objects which can't reference
nonfrozen objects (sets, for example), costs O(1), thanks to the
copy-on-write.

Now, about the graph:

> So let me get this straight: You've got a graph.  You want to be able to
> change the graph, but you don't want your users to accidentally change
> the graph. Sounds to me like an API problem, not a freeze()/mutable problem.
> Want an API?
>
> class graph:
>...
>def IterOutgoing(self, node):
>...
>def IterIncoming(self, node):
>...
>def IsAdjacent(self, node1, node2):
>...
>def IterNodes(self):
>...
>def AddEdge(self, f_node, t_node):
>...
>def RemEdge(self, node1, node2):
>...
>def AddNode(self):
>...
>
> If you are reasonable in your implementation, all of the above
> operations can be fast, and you will never have to worry about your
> users accidentally mucking about with your internal data structures:
> because you aren't exposing them.  If you are really paranoid, you can
> take the next step and implement it in Pyrex or C, so that only a
> malicous user can muck about with internal structures, at which point
> you stop supporting them.

This will work. It's simply... well, not very beautiful. I have to
invent a lot of names, and my users need to remember them all. If I
give them a frozen set, with all the vertices than a vertex points to
(which is an absolutely reasonable API), they will know how to deal
with it without learning a lot of method names, thanks to the fact
that they are already familiar with sets, and that a lot of thought
has gone into the set interface.

> > Now, about copy-on-write:
...
>
> What you have written here is fairly unintelligible, but thankfully you
> clarify yourself...pity it still doesn't work, I explain below.

I'm sorry if I am sometimes hard to understand. English is not my
mother tongue, and it degrades as the hour gets later - and sometimes,
things are hard to explain. If I don't explain myself, please say so
and I'll try again. This is an excellent example - I wrote about
callbacks, and went to sleep. Let me try to explain again how it
*does* work.

The frozen() function, and the __frozen__ protocol, would get another
optional argument - an object to be notified when the *nonfrozen*
object has changed. It may be called at most once - only on the first
change to the object, and only 

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-31 Thread Noam Raphael
On 10/31/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
...
> And I'm going to point out why you are wrong.

I still don't think so. I think that I need to reclarify what I mean.

> > About the users-changing-my-internal-data issue:
...
> You can have a printout before it dies:
> "I'm crashing your program because something attempted to modify a data
> structure (here's the traceback), and you were told not to."
>
> Then again, you can even raise an exception when people try to change
> the object, as imdict does, as tuples do, etc.

Both solutions would solve the problem, but they would require me to
wrap the built-in set with something which doesn't allow changes. This
is a lot of work - but it's quite similar to what my solution would
actually do, in a single built-in function.
>
> > You suggest two ways for solving the problem. The first is by copying
> > my mutable objects to immutable copies:
>
> And by caching those results, then invalidating them when they are
> updated by your application.  This is the same as what you would like to
> do, except that I do not rely on copy-on-write semantics, which aren't
> any faster than freeze+cache by your application.

This isn't correct - freezing a set won't require a single copy to be
performed, as long as the frozen copy isn't saved after the original
is changed. Copy+cache always requires one copy.

...
> I never claimed it was beautiful, I claimed it would work.  And it does.
> There are 7 methods, which you can reduce if you play the special method
> game:
>
> RemEdge -> __delitem__((node, node))
> RemNode -> __delitem__(node) #forgot this method before
> IterNodes -> __iter__()
> IterOutgoing,IterIncoming -> IterAdjacent(node)
>
I just wanted to say that this game is of course a lot of fun, but it
doesn't simplify the interface.

> In any case, whether you choose to use freeze, or use a different API,
> this particular problem is solvable without copy-on-write semantics.

Right. But I think that a significant simplification of the API is a
nice bonus for my solution. And about those copy-on-write semantics -
it remains to be shown how complex they really are. Remember that we are talking
about frozen-copy-on-write, which I think would simplify matters
considerably - for example, there are at most two instances sharing
the same data, since the frozen copy can be returned again and again.

> > > > Now, about copy-on-write:
> > ...
> Thank you for the clarification (btw, your english is far better than
> any of the foreign languages I've been "taught" over the years).
Thanks! It seems that if you are forced to use a language from time to
time it improves a little...

...

> Even without validation, there are examples that force a high number of
> calls, which are not O(1), ammortized or otherwise.
>
[Snap - a very interesting example]
>
> Now, the actual time analysis on repeated freezings and such gets ugly.
> There are actually O(k) objects, which take up O(k**2) space.  When you
> modify object b[i][j] (which has just been frozen), you get O(k)
> callbacks, and when you call freeze(b), it actually results in O(k**2)
> time to re-copy the O(k**2) pointers to the O(k) objects.  It should be
> obvious that this IS NOT AMMORTIZABLE to original object creation time.
>
That's absolutely right. My amortized analysis is correct only if you
limit yourself to cases in which the original object doesn't change
after a frozen() call was made. In that case, it's ok to count the
O(k**2) copy with the O(k**2) object creation, because it's made only
once.

Why is it ok to analyze only that limited case? I am suggesting a
change to Python: that every object would be mutable, and would
support the frozen() protocol. When you evaluate my suggestion,
you need to take a program, and measure its performance in the current
Python and in a Python which implements my suggestion. This means that
the program should work also on the current Python. In that case, my
assumption is true - you won't change objects after you have frozen
them, simply because these objects (strings which are used as dict
keys, for example) can't be changed at all in the current Python
implementation!

I will write it in another way: I am proposing a change that will make
Python objects, including strings, mutable, and gives you other
advantages as well. I claim that it won't make existing Python
programs run slower in O() terms. It would allow you to do many things
that you can't do today; some of them would be fast, like editing a
string, and some of them would be less fast - for example, repeatedly
changing an object and freezing it.

I think that the performance penalty may be rather small - remember
that in programs which do not change strings, there would never be a
need to copy the string data at all. And since I think that usually
most of the dict lookups are for method or function names, there would
almost never be a need to construct a new object on dict lookup,
because you search for the same names ag

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-31 Thread Noam Raphael
I thought about something -
>
> I think that the performance penalty may be rather small - remember
> that in programs which do not change strings, there would never be a
> need to copy the string data at all. And since I think that usually
> most of the dict lookups are for method or function names, there would
> almost never be a need to constuct a new object on dict lookup,
> because you search for the same names again and again, and a new
> object is created only on the first frozen() call. You might even gain
> performance, because s += x would be faster.
>
Name lookups can take virtually the same time they take now - method
names can be saved from the beginning as frozen strings, so finding
them in a dict will take just another bit test - is the object frozen
- before doing exactly what is done now. Remember, the strings we are
familiar with are simply frozen strings...
___


Re: [Python-Dev] apparent ruminations on mutable immutables (was: PEP 351, the freeze protocol)

2005-11-01 Thread Noam Raphael
On 11/1/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
...
>
> I am an advocate for PEP 351.  However, I am against your proposed
> implementation/variant of PEP 351 because I don't believe it ads enough
> to warrant the additional complication and overhead necessary for every
> object (even tuples would need to get a .frozen_cache member).
>
> Give me a recursive freeze from PEP 351 (which handles objects that are
> duplicated, but errors out on circular references), and I'll be happy.
>
That's fine - but it doesn't mean that I must be happy with it.
>
...
> >
> > This isn't correct - freezing a set won't require a single copy to be
> > performed, as long as the frozen copy isn't saved after the original
> > is changed. Copy+cache always requires one copy.
>
> You are wrong, and you even say you are wrong..."freezing a set doesn't
> require a COPY, IF the frozen COPY isn't saved after the original is
> CHANGED". Creating an immutable set IS CREATING A COPY, so it ALSO
> copies, and you admit as much, but then say the equivalent of "copying
> isn't copying because I say so".

No, I am not wrong - I was just using misleading terms. I will call a
"frozen copy" a "frozen image". Here goes: "freezing a set doesn't
require a COPY, IF the frozen IMAGE isn't saved after the original is
CHANGED". I suggest that there would be a way to create a frozenset
without COPYING an O(n) amount of MEMORY. When a frozen set is created
by a call to frozen(x), it would not copy all the data, but would
rather reference the existing data, which was created by the
non-frozen set. Only if the original set changes while there's a
frozen set referencing the data would the MEMORY actually be copied.

I call it a "frozen copy" because it behaves as a frozen copy, even
though not all the memory is being copied. When you call the COPY
function in the COPY module with a string, it doesn't really copy
memory - the same string is returned. When you copy a file inside
subversion, it doesn't actually copy all the data associated with it,
but does something smarter, which takes O(1). The point is, for the
user, it's a copy. Whether or not memory is actually being copied, is
an implementation detail.
>
...
>
> I think that adding an additional attribute to literally every single
> object to handle the caching of 'frozen' objects, as well as a list to
> every object to handle callbacks which should be called on object
> mutation, along with a _call_stuff_when_mutated() method that handles
> these callback calls, IN ADDITION TO the __freeze__ method which is
> necessary to support this, is a little much, AND IS CERTAINLY NOT A
> SIMPLIFICATION!

I don't agree. You don't need to add a list to every object, since you
can store all those relations in one place, with a standard function
for registering them. Anyway, code written in Python (which is the
language we are discussing) WON'T BE COMPLICATED! The frozen
mechanism, along with two new protocols (__frozen__ and __changed__),
would be added automatically! The internal state of a class written in
Python can be automatically frozen, since it's basically a dict. Now
let's see if it's a simplification:

1. No Python code would have to be made more complicated because of the change.
2. There would be no need to find workarounds, like cStringIO, for the
fact that strings and tuples are immutable.
3. You would be able to put any kind of object into a set, or use it
as a dict key.
4. Classes (like the graph example) would be able to give users things
without having to make a choice between risking their users with
strange bugs, making a complicated interface, making very inefficient
methods, and writing complicated wrapper classes.

I will ask you: Is this a complication?
The answer is: it requires a significant change to the CPython
implementation. But as for the Python language: it's definitely a
simplification.
>
> Let us pause for a second and consider:
> Original PEP proposed 1 new method: __freeze__, which could be
> implemented as a subclass of the original object (now), and integrated
> into the original classes as time goes on.  One could /register/
> __freeze__ functions/methods a'la Pickle, at which point objects
> wouldn't even need a native freeze method.
>
> Your suggestion offers 2 new methods along with 2 new instance variables.
> Let's see, a callback handler, __freeze__, the cache, and the callback
> list.  Doesn't that seem a little excessive to you to support freezing?
> It does to me.  If Guido were to offer your implementation of freeze, or
> no freeze at all, I would opt for no freeze, as implementing your freeze
> on user-defined classes would be a pain in the ass, not to mention
> implementing them in C code would be more than I would care to do, and
> more than I would ask any of the core developers to work on.
>
As I said above: this suggestion would certainly require more change
in the Python implementation than your suggestion. But the Python
language would gain a lot more. Imp

Re: [Python-Dev] a different kind of reduce...

2005-11-01 Thread Noam Raphael
On 11/1/05, Reinhold Birkenfeld <[EMAIL PROTECTED]> wrote:
> > Hmm, using the function's own namespace is an interesting idea.  It
> > might also be a good place to put other functionals:
> >
> >results = f.map(data)
> >newf = f.partial(somearg)
>
> And we have solved the "map, filter and reduce are going away! Let's all
> weep together" problem with one strike!
>
> Reinhold

I have no problem with map and filter going away. About reduce -
please remember that you would need to add this method to every
callable, including every type (I mean the constructor). I am not sure
it is a good trade for throwing away one builtin, which is a perfectly
reasonable function.

Noam
___


Re: [Python-Dev] apparent ruminations on mutable immutables (was:PEP 351, the freeze protocol)

2005-11-01 Thread Noam Raphael
On 11/1/05, Delaney, Timothy (Tim) <[EMAIL PROTECTED]> wrote:
> Noam,
>
> There's a simple solution to all this - write a competing PEP. One of
> the two competing PEPs may be accepted.

I will. It may take some time, though.
>
> FWIW, I'm +1 on PEP 351 in general, and -1 on what you've proposed.
>
> PEP 351 is simple to explain, simple to implement and leaves things
> under the control of the developer. I think there are still some issues
> to be resolved, but the basic premise is exactly what I would want of a
> freeze protocol.
>
> Tim Delaney

It is true that PEP 351 is simpler. The problem is, that thanks to PEP
351 I have found a fundamental place in which the current Python
design is not optimal. It is not easy to fix it, because 1) it would
require a significant change to the current implementation, and 2)
people are so used to the current design that it is hard to convince
them that it's flawed.

The fact that the design discussion is long doesn't mean that the
result, for the Python programmer, would be complicated. It won't be -
my suggestion would cause almost no backward-compatibility problems.
Think about it - that clearly means that my suggestion simply can't
make Python programming *more* complicated.

Please consider new-style classes. I'm sure they required a great deal
of discussion, but they are simple to use -- and they are a good
thing. And I think that my suggestion would make things easier, more
than the new-style-classes change did. Features of new-style classes
are an advanced topic. The questions, "why can't I change my strings?"
"why do you need both a tuple and a list?" and maybe "why can't I add
my list to a set", are fundamental ones, which would all not be asked
at all if my suggestion is accepted.

Noam
___


Re: [Python-Dev] apparent ruminations on mutable immutables (was: PEP 351, the freeze protocol)

2005-11-01 Thread Noam Raphael
On 11/1/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
...
>
> I still consider it dead.
>"If the implementation is hard to explain, it's a bad idea."

It is sometimes true, but not always. It may mean two other things:
1. The one trying to explain is not talented enough.
2. The implementation is really not very simple. A hash table, used so
widely in Python, is really not a simple idea, and it's not that easy
to explain.
>
> Also, not all user-defined classes have a __dict__, and not all
> user-defined classes can have arbitrary attributes added to them.
>
> >>> class foo(object):
> ... __slots__ = ['lst']
> ... def __init__(self):
> ... self.lst = []
> ...
> >>> a = foo()
> >>> a.bar = 1
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> AttributeError: 'foo' object has no attribute 'bar'
> >>>
It doesn't matter. It only means that the implementation would also
have to make frozen copies of __slots__ items when freezing a
user-defined class.

I am afraid that this question proves that I didn't convey my idea to
you. If you like, please forgive my inability to explain it clearly,
and try again to understand my idea, by going over what I wrote again,
and thinking on it. You can also wait for the PEP that I intend to
write. And you can also forget about it, if you don't want to bother
with it - you've already helped a lot.

Noam
___


[Python-Dev] Why should the default hash(x) == id(x)?

2005-11-02 Thread Noam Raphael
Hello,

While writing my PEP about unifying mutable and immutable, I came upon this:

Is there a reason why the default __hash__ method returns the id of the object?

It is consistent with the default __eq__ behaviour, which is the same
as "is", but:

1. It can easily become inconsistent if someone implements __eq__ but
doesn't implement __hash__.
2. It is confusing: even someone who doesn't implement __eq__ may see
that his object is usable as a dict key, and expect it to be found by
other objects with the same "value".
3. If someone does want to associate values with object identities, he
can use id explicitly: dct[id(x)] = 3. That spells out what he wants.


Now, I just thought of a possible answer: "because he wants to store
in his dict both normal objects and objects of his user-defined type,
which turn out to be not equal to any other object."

This leads me to another question: why should the default __eq__
method be the same as "is"? If someone wants to check if two objects
are the same object, that's what the "is" operator is for. Why not
make the default __eq__ really compare the objects, that is, their
dicts and their slot-members?

I would be happy to get answers.

Noam


Re: [Python-Dev] apparent ruminations on mutable immutables (was: PEP 351, the freeze protocol)

2005-11-02 Thread Noam Raphael
Thank you for your encouraging words!

I am currently working on a PEP. I am sure that writing it is a good
idea, and that it would help with explaining this idea both to others
and to myself.

What I already wrote makes me think that it can be accomplished with
no really large changes to the language - only six built-in types are
affected, and there is no reason why existing code, both in C and in
Python, would stop working.

I hope others will be interested in the idea too when I finish writing
the PEP draft, so it can be discussed. Trying the idea out with PyPy
is a really nice suggestion - the implementation would probably be
much simpler there, and I'm sure that learning PyPy would be
interesting.

Thanks again, and I would really like to hear your comments when I
post the PEP draft,
Noam


Re: [Python-Dev] Why should the default hash(x) == id(x)?

2005-11-02 Thread Noam Raphael
On 11/2/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
...
>
> A quick search in the list archives via google search
>"site:mail.python.org object __hash__"
> Says that Guido wanted to remove the default __hash__ method for object
> in Python 2.4, but that never actually happened.
>
> http://www.python.org/sf/660098
> http://mail.python.org/pipermail/python-dev/2003-December/041375.html
>
> There may be more code which relies on the default behavior now, but
> fixing such things is easy.
>
Cool! If Guido also thinks that it should be gone, who am I to argue...

(Seriously, I am in favor of removing it. I really think that it is confusing.)

And if backwards-compatibility is a problem: You can, in Python 2.5,
show a warning when the default __hash__ method is being called,
saying that it is going to disappear in Python 2.6.
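For illustration, such a warning could be prototyped in pure Python
with a base class whose __hash__ warns before delegating to the
id-based default (a sketch of mine; the class names are invented, and
a real implementation would live in C):

```python
import warnings

class WarnOnDefaultHash(object):
    """Hypothetical sketch: warn whenever the id-based default
    __hash__ is used, before falling back to it."""
    def __hash__(self):
        warnings.warn(
            "the default id-based __hash__ is deprecated; "
            "define __hash__ explicitly",
            DeprecationWarning, stacklevel=2)
        return object.__hash__(self)  # the id-based default

class Point(WarnOnDefaultHash):
    def __init__(self, x, y):
        self.x, self.y = x, y

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    hash(Point(1, 2))
assert caught[0].category is DeprecationWarning
```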

[Snip - I will open a new thread about the equality operator]

> As for removing the default __hash__ for objects, I'm actually hovering
> around a -0, if only because it is sometimes useful to generate unique
> keys for dictionaries (which can be done right now with object() ), and
> I acknowledge that it would be easy to subclass and use that instead.
>
I can suggest a new class that will help in the cases where you do
want a dict of identities:

class ref(object):
    def __init__(self, obj):
        self._obj = obj
    def __call__(self):
        return self._obj
    def __eq__(self, other):
        return self._obj is other._obj
    def __hash__(self):
        return hash(id(self._obj))

It has the advantage over using ids as keys that it keeps a reference
to the object, so the object won't be garbage-collected.

It lets you make a dict of object identities just as easily as before,
in a more explicit and less error-prone way. Perhaps it should become
a builtin?
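For illustration, here is how such a class behaves in practice (a
sketch in today's Python; the isinstance guard and the capitalized
name Ref are my additions):

```python
class Ref(object):
    """Identity-based dict key that keeps its target alive - a sketch
    of the `ref` class from the message above."""
    def __init__(self, obj):
        self._obj = obj
    def __call__(self):
        return self._obj
    def __eq__(self, other):
        return isinstance(other, Ref) and self._obj is other._obj
    def __hash__(self):
        return hash(id(self._obj))

# Lists are unhashable, so they can't be dict keys directly -
# wrapping them in Ref keys the dict on identity instead.
a, b = [1, 2], [1, 2]              # equal by value, distinct objects
dct = {Ref(a): "first", Ref(b): "second"}
assert len(dct) == 2               # two identities, two entries
assert dct[Ref(a)] == "first"      # found again via identity
assert Ref(a)() is a               # the wrapped object stays reachable
```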

Noam


[Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-02 Thread Noam Raphael
I think it should.

(I copy here messages from the thread about the default hash method.)

On 11/2/05, Michael Chermside <[EMAIL PROTECTED]> wrote:
> > Why not make the default __eq__ really compare the objects, that is,
> > their dicts and their slot-members?
>
> Short answer: not the desired behavior. Longer answer: there are
> three common patterns in object design. There are "value" objects,
> which should be considered equal if all fields are equal. There are
> "identity" objects which are considered equal only when they are
> the same object. And then there are (somewhat less common) "value"
> objects in which a few fields don't count -- they may be used for
> caching a pre-computed result for example. The default __eq__
> behavior has to cater to one of these -- clearly either "value"
> objects or "identity" objects. Guido chose to cater to "identity"
> objects believing that they are actually more common in most
> situations. A beneficial side-effect is that the default behavior
> of __eq__ is QUITE simple to explain, and if the implementation is
> easy to explain then it may be a good idea.
>
This is a very nice observation. I wish to explain why I think that
the default __eq__ should compare values, not identities.

1. If you want to compare identities, you can always use "is". There
is currently no easy way to compare your user-defined classes by
value, in case they happen to be "value objects", in Michael's
terminology - you have to compare every single member. (Comparing the
__dict__ attributes is ugly, and will not always work). If the default
were to compare the objects by value, and they happen to be "identity
objects", you can always do:
def __eq__(self, other):
    return self is other

2. I believe that, counter to what Michael said, "value objects" are
more common than "identity objects" - at least among user-defined
classes, and especially among simple user-defined classes, where the
defaults matter most, since their writers often don't bother to define
all the appropriate protocols. Can you give examples of common
"identity objects"? I believe they usually deal with input/output -
with things that interact with the environment (files, for example). I
believe almost all "algorithmic" classes are "value objects". And I
think that value-based comparison will usually give the correct result
for "identity objects" too: if they do I/O, they will usually hold a
reference to an I/O object, like a file, which is an "identity object"
by itself, so the comparison will compare those I/O objects and return
False, since they are not the same object.

3. I think that value-based comparison is also quite easy to explain:
user-defined classes combine functions with a data structure. In
Python, the "data structure" is simply member names which reference
other objects. The default, value-based comparison checks whether two
objects have the same member names and whether those names reference
equal (by value) objects, and if so, returns True. I think that
explaining this is no harder than explaining the current dict
comparison.


Now, for Josiah's reply:

On 11/2/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> > This leads me to another question: why should the default __eq__
> > method be the same as "is"? If someone wants to check if two objects
> > are the same object, that's what the "is" operator is for. Why not
> > make the default __eq__ really compare the objects, that is, their
> > dicts and their slot-members?
>
> Using 'is' makes sense when the default hash is id (and actually in
> certain other cases as well).  Actually comparing the contents of an
> object is certainly not desirable with the default hash, and probably
> not desirable in the general case, because equality doesn't always
> depend on /all/ attributes of extension objects.
>
>Explicit is better than implicit.
>In the face of ambiguity, refuse the temptation to guess.
>
I hope that the default hash will stop being id, as Josiah showed
that Guido decided, so let's not discuss it.

Now, about the good point that sometimes the state doesn't depend on
all the attributes. Right. But the current default doesn't handle
those cases well either - you still have to write an equality operator
yourself. And I think this is not the common case.

I think that the meaning of "in the face of ambiguity, refuse the
temptation to guess" is that you should not write code whose behaviour
changes based on your guess of what the user meant. That is not the
case here - the value-based comparison is strictly defined. It may
just not be what the user wants - though in most cases, I think it
will be.

"Explicit is better than implicit" says only "better". Identity-based
comparison is just as implicit as value-based comparison.

(I want to add that there is a simple way to support value-based
comparison 

Re: [Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-02 Thread Noam Raphael
I've looked for classes in my /usr/lib/python2.4 directory. I won't go
over all the 7346 classes that were found there, but let's see:

"identity objects" that will continue to work because they contain
other "identity objects"

SocketServer, and everything which inherits from it (like HTTPServer)
Queue
csv (contains _csv objects)

"value objects" that would probably gain a meaningful equality operator

StringIO
ConfigParser
markupbase, HTMLParser
HexBin, BinHex
cgi.FieldStorage
AST Nodes

others
==
Cookie - inherits its __eq__ method from dict.

I'll stop here. I was not strictly scientific: I chose classes whose
purpose I thought I could easily guess, and perhaps discarded classes
that didn't look interesting to me, but I had no bad intentions when
choosing. I saw no class whose equality operator the change would
break, and I saw quite a lot of classes which didn't define an
equality operator and for which value-based comparison would be the
right way to compare them.

I'm getting more convinced of my opinion.

Noam


Re: [Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-02 Thread Noam Raphael
On 11/2/05, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> > Should the default equality operator compare values instead of
> > identities?
>
> No.  Look back into last year's python-dev postings where we agreed that
> identity would always imply equality.  There were a number of practical
> reasons.  Also, there are a number of places in CPython where that
> assumption is implicit.
>
Perhaps you meant something else, or I misunderstood? Identity
implying equality holds in value-based comparison too. If the default
__eq__ operator compared by value, I would expect it to do something
like:

def __eq__(self, other):
    if self is other:
        return True
    if type(self) is not type(other):
        return False
    # compare the __dict__ and any __slots__ members;
    # if they are all ==, return True
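A runnable sketch of that default (my own illustration, in today's
Python, covering both __dict__ and __slots__ members):

```python
def value_eq(self, other):
    """Sketch of a value-based default __eq__: identity short-circuit,
    type check, then comparison of __dict__ and any __slots__ members."""
    if self is other:
        return True
    if type(self) is not type(other):
        return NotImplemented
    if getattr(self, '__dict__', {}) != getattr(other, '__dict__', {}):
        return False
    return all(getattr(self, name, None) == getattr(other, name, None)
               for name in getattr(type(self), '__slots__', ()))

class Vec(object):
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x, self.y = x, y
    __eq__ = value_eq
    __hash__ = None   # value-based equality makes the id hash unsafe

assert Vec(1, 2) == Vec(1, 2)      # same values, different identities
assert Vec(1, 2) != Vec(1, 3)
```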

Noam


Re: [Python-Dev] Why should the default hash(x) == id(x)?

2005-11-04 Thread Noam Raphael
On 11/3/05, Greg Ewing <[EMAIL PROTECTED]> wrote:
> > 3. If someone does want to associate values with objects, he can
> > explicitly use id:
> > dct[id(x)] = 3.
>
> This is fragile. Once all references to x are dropped,
> it is possible for another object to be created having
> the same id that x used to have. The dict now
> unintentionally references the new object.
>
You are right. Please see the simple "ref" class that I wrote in my
previous post, which solves this problem.

Noam


Re: [Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-05 Thread Noam Raphael
On 11/3/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
...
>
> Right, but lists (dicts, tuples, etc.) are defined as containers, and
> their comparison operation is defined on their contents.  Objects are
> not defined as containers in the general case, so defining comparisons
> based on their contents (as opposed to identity) is just one of the two
> assumptions to be made.
>
> I personally like the current behavior, and I see no /compelling/ reason
> to change it.  You obviously feel so compelled for the behavior to
> change that you are willing to express your desires.  How about you do
> something more productive and produce a patch which implements the
> changes you want, verify that it passes tests in the standard library,
> then post it on sourceforge.  If someone is similarly compelled and
> agrees with you (so far I've not seen any public support for your
> proposal by any of the core developers), the discussion will restart,
> and it will be decided (not by you or I).

Thanks for the advice - I will try to do as you suggest.
>
>
> > To summarize, I think that value-based equality testing would usually
> > be what you want, and currently implementing it is a bit of a pain.
>
> Actually, implementing value-based equality testing, when you have a
> finite set of values you want to test, is quite easy.
>
> def __eq__(self, other):
> for i in self.__cmp_eq__:
> if getattr(self, i) != getattr(other, i):
> return False
> return True
>
> Add a simple metaclass that discovers all of those values automatically,
> and/or your own protocol for exclusion, and you are done.  Remember, not
> all 5-line functions should become builtin/default behavior, and this
> implementation shows that it is not a significant burden for you (or
> anyone else) to implement this in your own custom library.
>
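To make the quoted five-line sketch concrete, here is a runnable
version (the surrounding class and attribute names are invented for
the demo):

```python
class ValueEqMixin(object):
    """Equality over the attribute names listed in __cmp_eq__ - a
    runnable rendering of the protocol from the quoted message."""
    __cmp_eq__ = ()
    def __eq__(self, other):
        sentinel = object()   # never equal to a real attribute value
        return all(getattr(self, name) == getattr(other, name, sentinel)
                   for name in self.__cmp_eq__)

class Account(ValueEqMixin):
    __cmp_eq__ = ('owner', 'balance')   # _cache deliberately excluded
    def __init__(self, owner, balance):
        self.owner, self.balance = owner, balance
        self._cache = object()          # differs per instance, ignored

assert Account('noam', 10) == Account('noam', 10)
assert Account('noam', 10) != Account('noam', 11)
```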
You are right that not all 5-line functions should become
builtin/default behaviour. However, I personally think that this one
should, since:
1. It doesn't add complexity, or a new builtin.
2. Those five lines don't include the metaclass code, which will
probably take more than five lines and won't be trivial.
3. It will make other objects behave better, not only mine - other
classes will get a meaningful comparison operator, for free.
>
> P.S. One thing that you should remember is that even if your patch is
> accepted, and even if this is desirable, Python 2.5 is supposed to be
> released sometime next year (spring/summer?), and because it is a
> backwards incompatible change, it would need at least 2.6-2.7 before it
> becomes the default behavior without a __future__ import, which is
> another 3-4 years down the line.

I hope that the warning can go in by Python 2.5, so the change (which
I think will cause relatively few backwards-compatibility problems)
can go in by Python 2.6, which I think is less than 2 years down the
line.
>
> I understand you are passionate, really I do (you should see some of my
> proposals), but by the time these things get around to getting into
> mainline Python, there are high odds that you probably won't care about
> them much anymore (I've come to feel that way myself about many of my
> proposals), and I think it is a good idea to attempt to balance - when
> it comes to Python - "Now is better than never." and "Although never is
> often better than *right* now."
>
> Removing __hash__, changing __eq__, and trying to get in copy-on-write
> freezing (which is really copy-and-cache freezing), all read to me like
> "We gotta do this now!", which certainly isn't helping the proposal.
>
Thanks - I should really calm down a bit. I will try to go "safe and
slowly", and I hope that at the end I will succeed in making my own
small contribution to Python.

Noam


Re: [Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-05 Thread Noam Raphael
On 11/5/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
...
> > 1. It doesn't add complexity, or a new builtin.
>
> It changes default behavior (which I specified as a portion of my
> statement, which you quoted).
>
> And you are wrong, it adds complexity to the implementation of both
> class instantiation and the default comparison mechanism.  The former, I
> believe, you will find more difficult to patch than the comparison,
> though if you have not yet had adventures in writing C extension
> modules, modifying the default class instantiation may be the deal
> breaker for you (I personally would have no idea where to start).

Sorry, I meant complexity to the Python user - it won't require him to
learn more in order to write programs in Python.
>
> class eqMetaclass(type):
>     def __new__(cls, name, bases, dct):
>         if '__cmp_include__' in dct:
>             include = dict.fromkeys(dct['__cmp_include__'])
>         else:
>             include = dict.fromkeys(dct.keys())
>
>         for i in dct.get('__cmp_exclude__', ()):
>             _ = include.pop(i, None)
>
>         dct['__cmp_eq__'] = include.keys()
>         return type.__new__(cls, name, bases, dct)
>
> It took 10 lines of code, and was trivial (except for not-included
> multi-metaclass support code, which is being discussed in another thread).
>
> Oh, I suppose I should modify that __eq__ definition to be smarter about
> comparison...
>
> def __eq__(self, other):
>     if not hasattr(other, '__cmp_eq__'):
>         return False
>     if dict.fromkeys(self.__cmp_eq__) != \
>        dict.fromkeys(other.__cmp_eq__):
>         return False
>     for i in self.__cmp_eq__:
>         if getattr(self, i) != getattr(other, i):
>             return False
>     return True

Thanks for the implementation. It would be very useful in order to
explain my suggestion.

It's nice that it compares only attributes, not types. It makes it
possible for two people to write classes that can be equal to one
another.
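That cross-class behaviour can be demonstrated with a simplified,
runnable variant of the quoted metaclass (the class names and the
empty default include-list are my own choices for the demo):

```python
class EqMeta(type):
    """Simplified sketch of the quoted metaclass: record which
    attribute names take part in equality, then install an __eq__
    that compares only those names and their values - not types."""
    def __new__(mcs, name, bases, dct):
        include = [n for n in dct.get('__cmp_include__', ())
                   if n not in dct.get('__cmp_exclude__', ())]
        dct['__cmp_eq__'] = tuple(include)

        def __eq__(self, other):
            if not hasattr(other, '__cmp_eq__'):
                return NotImplemented
            if set(self.__cmp_eq__) != set(other.__cmp_eq__):
                return False
            return all(getattr(self, n) == getattr(other, n)
                       for n in self.__cmp_eq__)

        dct.setdefault('__eq__', __eq__)
        return type.__new__(mcs, name, bases, dct)

class Celsius(metaclass=EqMeta):
    __cmp_include__ = ('value',)
    def __init__(self, value):
        self.value = value

class Reading(metaclass=EqMeta):
    __cmp_include__ = ('value',)
    def __init__(self, value):
        self.value = value

# Two independently written classes whose instances compare equal:
assert Celsius(20) == Reading(20)
assert Celsius(20) != Celsius(21)
```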

>
> Wow, 20 lines of support code, how could one ever expect users to write
> that? ;)

This might mean that implementing it in C, once I find the right
place, won't be too difficult.

And I think that for most users it will be harder than it was for you,
and there are some subtleties in those lines.
>
>
> > 3. It will make other objects behave better, not only mine - other
> > classes will get a meaningful comparison operator, for free.
>
> You are asserting that the comparison previously wasn't "meaningful".  It has a
> meaning, though it may not be exactly what you wanted it to be, which is
> why Python allows users to define __eq__ operators to be exactly what
> they want, and which is why I don't find your uses compelling.
>
I think that value-based equality testing is a better default, since
in more cases it does what you want it to, and since in those cases
they won't have to write those 20 lines, or download them from
somewhere.
>
...
>
> From what I have previously learned from others in python-dev, the
> warnings machinery is slow, so one is to be wary of using warnings
> unless absolutely necessary. Regardless of it being absolutely necessary,
> it would be 2 years at least before the feature would actually make it
> into Python and become default behavior, IF it were desirable default
> behavior.

All right. I hope that those warnings will be ok - it's yet to be
seen. And about those 2 years - better later than never.
...
>
> You should also realize that you can make contributions to Python
> without changing the language or the implementation of the language.
> Read and review patches, help with bug reports, hang out on python-list
> and attempt to help the hundreds (if not thousands) of users who are
> asking for help, try to help new users in python-tutor, etc.

I confess that I don't do these a lot. I can say that from time to
time I teach beginners Python, and that where I work I help a lot of
other people with Python.

> If you
> have an idea for a language change, offer it up on python-list first
> (I've forgotten to do this more often than I would like to admit), and
> if it generally has more "cool" than "ick", then bring it back here.
>
I will. Thanks again.

Noam


Re: [Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-05 Thread Noam Raphael
On 11/6/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
...
> >
> > Sorry, I meant complexity to the Python user - it won't require him to
> > learn more in order to write programs in Python.
>
> Ahh, but it does add complexity.  Along with knowing __doc__, __slots__,
> __metaclass__, __init__, __new__, __cmp__, __eq__, ..., __str__,
> __repr__, __getitem__, __setitem__, __delitem__, __getattr__,
> __setattr__, __delattr__, ...
>
>
> The user must also know what __cmp_include__ and __cmp_exclude__ means
> in order to understand code which uses them, and they must understand
> that exclude entries overwrite include entries.
>
You are right. But that's Python - I think nobody knows the exact
details of what all of these do; you look them up in the
documentation. It is a complication, but of the type that I can live
with, if there's a reason.
>
> > > Wow, 20 lines of support code, how could one ever expect users to write
> > > that? ;)
> >
> > This might mean that implementing it in C, once I find the right
> > place, won't be too difficult.
> >
> > And I think that for most users it will be harder than it was for you,
> > and there are some subtleties in those lines.
>
> So put it in the Python Cookbook:
> http://aspn.activestate.com/ASPN/Cookbook/Python
>
A good idea.
>
> > > > 3. It will make other objects behave better, not only mine - other
> > > > classes will get a meaningful comparison operator, for free.
> > >
> > > You are asserting that the comparison previously wasn't "meaningful".  It has a
> > > meaning, though it may not be exactly what you wanted it to be, which is
> > > why Python allows users to define __eq__ operators to be exactly what
> > > they want, and which is why I don't find your uses compelling.
> > >
> > I think that value-based equality testing is a better default, since
> > in more cases it does what you want it to, and since in those cases
> > they won't have to write those 20 lines, or download them from
> > somewhere.
>
> You are making a value judgement on what people want to happen with
> default Python. Until others state that they want such an operation as a
> default, I'm going to consider this particular argument relatively
> unfounded.
>
All right. I will try to collect more examples for my proposal.
>
> > > From what I have previously learned from others in python-dev, the
> > > warnings machinery is slow, so one is to be wary of using warnings
> > > unless absolutely necessary. Regardless of it being absolutely necessary,
> > > it would be 2 years at least before the feature would actually make it
> > > into Python and become default behavior, IF it were desirable default
> > > behavior.
> >
> > All right. I hope that those warnings will be ok - it's yet to be
> > seen. And about those 2 years - better later than never.
>
> It won't be OK.  Every comparison using the default operator will incur
> a speed penalty while it checks the (pure Python) warning machinery to
> determine if the warning has been issued yet.  This alone makes the
> transition require a __future__ import.
>
How will the __future__ statement help? I think that the warning is
still needed, so that people using code that may stop working will
know about it. I see that they could add a __future__ import and check
whether their code still works, but that will catch far fewer
problems, because code would usually be run without the __future__
import.

If it really slows things down, it seems to me that the only solution
is to optimize the warnings module...

Noam


Re: [Python-Dev] Why should the default hash(x) == id(x)?

2005-11-05 Thread Noam Raphael
On 11/5/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> More generally, I claim that the current behaviour is better than
> *any* alternative. To refute this claim, you would have to come
> up with an alternative first.
>
The alternative is to drop the __hash__ method of user-defined classes
(as Guido already decided to do), and to make the default __eq__
method compare the two objects' __dict__ and slot members.

See the thread about default equality operator - Josiah Carlson posted
there a metaclass implementing this equality operator.

Noam


Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter

2005-11-12 Thread Noam Raphael
On 11/12/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> If I have a Gtk app I have to feed other (socket, callback) pairs to it.  It
> takes care of adding it to the select() call.  Python could dictate that the
> way to play ball is for other packages (Tkinter, PyGtk, wxPython, etc) to
> feed Python the (socket, callback) pair.  Then you have a uniform way to
> control event-driven applications.  Today, a package like Michiel's has no
> idea what sort of event loop it will encounter.  If Python provided the
> event loop API it would be the same no matter what widget set happened to be
> used.
>
> The sticking point is probably that a number of such packages presume they
> will always provide the main event loop and have no way to feed their
> sockets to another event loop controller.  That might present some hurdles
> for the various package writers/Python wrappers.
>
I think that in order to solve Michiel's problem, there's no need for
something like that, since probably neither of the "loops" is
listening to sockets.

Currently, Tkinter sets PyOS_InputHook to call its "dooneevent"
repeatedly while Python code isn't being executed. It turns out to
work excellently. All that is needed to make Tkinter and Michiel's
code run together is a way to say "add this callback to the input
hook" instead of the current "replace the current input hook with this
callback". Then, when the interpreter is idle, it will call all the
registered callbacks, one at a time, and everyone would be happy.

To make this work with IDLE, or other interactive shells written in
Python, you need to expose a function which will run all the
registered callbacks. Then IDLE can call that function repeatedly when
it's idle, and you'll get the same behaviour you have in the regular
interactive shell. Specifically for IDLE, I know where that place is -
since there's no way to generally invoke the input hook, I wrote a
patch that calls _tkinter.dooneevent(_tkinter.DONT_WAIT) in the right
place, and it works fine.

Concerning threads - please don't. The "do one event at a time while
the interpreter is idle" method works fine. Most programs aren't
designed to be thread-safe, and since Tkinter does many callbacks to
Python functions, you'll get unexpected behaviour if it's on another
thread.

I hope I made myself clear. This solution is simple, and works
whenever a "do one event" function is available.
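The "add a callback" idea can be sketched in pure Python (the function
names below are invented for illustration - this is not an actual
CPython API):

```python
# Hypothetical registry replacing the single PyOS_InputHook slot.
_input_hooks = []

def add_input_hook(callback):
    """Register a callback to be run while the interpreter is idle."""
    _input_hooks.append(callback)

def run_input_hooks():
    """Run every registered callback once.  An interactive shell such
    as IDLE would call this repeatedly while idle, instead of invoking
    a single toolkit-specific hook."""
    for callback in list(_input_hooks):
        callback()

events = []
add_input_hook(lambda: events.append('tk'))    # e.g. Tkinter's dooneevent
add_input_hook(lambda: events.append('gtk'))   # e.g. another toolkit's hook
run_input_hooks()
run_input_hooks()
assert events == ['tk', 'gtk', 'tk', 'gtk']    # every hook runs, in turn
```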

Have a good week,
Noam


Re: [Python-Dev] str.dedent

2005-11-12 Thread Noam Raphael
Following Avi's suggestion, can I raise this thread up again? I think
that Reinhold's .dedent() method can be a good idea after all.

The idea is to add a method called "dedent" to strings. It would do
exactly what the current textwrap.dedent function does. The motivation
is to be able to write multilined strings easily without damaging the
visual indentation of the source code, like this:

def foo():
    msg = '''\
        From: %s
        To: %s
        Subject: Host failure report for %s
        Date: %s

        %s
        '''.dedent() % (fr, ', '.join(to), host, time.ctime(), err)

Writing multilined strings without spaces in the beginning of lines
makes functions harder to read, since although the Python parser is
happy with it, it breaks the visual indentation.

On 9/15/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> From the sound of it, it's probably not worth endowing every string
> object with this method and hardcoding its implementation forever in C
> code. There are so many corner cases and variations on the
> functionality of "dedenting" a block that it's better to keep it as
> Python source code.

I've looked at the textwrap.dedent() function, and it's really simple
and well defined: given a string s, take s.expandtabs().split('\n'),
find the minimal number of leading whitespace characters over all the
lines (not counting lines containing nothing but whitespace), and
remove that many characters from the beginning of each line.

This means that the Python source code is simple, and there would be
no problem writing it in C.
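That description translates almost line for line into Python (a
simplified sketch of my own, not the stdlib source):

```python
def dedent(s):
    """Remove the longest common leading whitespace from every
    non-blank line of s, after expanding tabs (a sketch)."""
    lines = s.expandtabs().split('\n')
    non_blank = [line for line in lines if line.strip()]
    if not non_blank:
        return s
    margin = min(len(line) - len(line.lstrip()) for line in non_blank)
    return '\n'.join(line[margin:] if line.strip() else line
                     for line in lines)

assert dedent('    a\n      b\n') == 'a\n  b\n'
assert dedent('\tx\n\ty') == 'x\ny'
```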

On 9/15/05, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
>
> -1
>
> Let it continue to live in textwrap where the existing pure python code
> adequately serves all string-like objects.  It's not worth losing the
> duck typing by attaching new methods to str, unicode, UserString, and
> everything else aspiring to be string-like.
>
> String methods should be limited to generic string manipulations.
> String applications should be in other namespaces.  That is why we don't
> have str.md5(), str.crc32(), str.ziplib(), etc.
>
> Also, I don't want to encourage dedenting as a way of life --- programs
> using it often are likely to be doing things the hard way.
>
I think that the difference between "dedent" and "md5", "crc32" and
such is that making "dedent" a method helps you write code that is
easier to read.

Strings already have a lot of methods which don't make code clearer
the way "dedent" will, such as center, capitalize, expandtabs, and
many others. I think that given these, there's no reason not to add
"dedent" as a string method.

Noam


Re: [Python-Dev] str.dedent

2005-11-12 Thread Noam Raphael
On 11/12/05, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> > The motivation
> > is to be able to write multilined strings easily without damaging the
> > visual indentation of the source code
>
> That is somewhat misleading.  We already have that ability.  What is
> being proposed is moving existing code to a different namespace.  So the
> motivation is really something like:
>
>I want to write
>s = s.dedent()
>because it is too painful to write
>s = textwrap.dedent(s)
>
Sorry, I didn't mean to mislead. I wrote "easily" - I guess using the
current textwrap.dedent isn't really hard, but still, writing:

import textwrap
...

r = some_func(textwrap.dedent('''\
  line1
  line2'''))

seems harder to me than simply

r = some_func('''\
  line1
  line2'''.dedent())

This example brings up another reason why "dedent" as a method is a
good idea: it is a common convention to indent things according to the
last opening bracket. "dedent" as a function makes the indentation
grow by at least 7 characters, and by 16 characters if you don't do
"from textwrap import dedent".

Another reason to make it a method is that I think it focuses
attention at the string, which comes first, instead of at the
"textwrap.dedent", which is only there to make the code look nicer.

And, a last reason: making dedent a built-in method makes it a more
"official" way of doing things, and I think that this way of writing a
multiline string inside an indented block is really the best way to
do it.

Noam


Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter

2005-11-12 Thread Noam Raphael
On 11/13/05, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Noam Raphael wrote:
>
> > All that is needed to make Tkinter and Michiels'
> > code run together is a way to say "add this callback to the input
> > hook" instead of the current "replace the current input hook with this
> > callback". Then, when the interpreter is idle, it will call all the
> > registered callbacks, one at a time, and everyone would be happy.
>
> Except for those who don't like busy waiting.
>
I'm not sure I understand what you meant. If you mean that it would be
slow - a lot of people (including me) use Tkinter without a mainloop
from the interactive shell, and don't feel the difference. It uses
exactly the method I described.
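A rough Python-level sketch of the "additive" hook idea (hypothetical names - the real PyOS_InputHook is a single C function pointer, not a Python registry):

```python
# Hypothetical sketch: instead of a single replaceable hook slot, keep a
# list of callbacks and run them all whenever the interpreter is idle.
_idle_callbacks = []

def add_input_hook(callback):
    """Register a callback to be run while the interpreter waits for input."""
    _idle_callbacks.append(callback)

def run_input_hooks():
    """Called repeatedly while idle: give every registered toolkit a turn."""
    for callback in list(_idle_callbacks):
        callback()  # e.g. Tkinter's dooneevent(), another toolkit's event pump
```

With something like this, Tkinter and other toolkits could each register a callback instead of overwriting each other's hook.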

Noam


Re: [Python-Dev] str.dedent

2005-11-14 Thread Noam Raphael
On 11/14/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> We have to draw a line somewhere - otherwise you could
> just as well add all functions that accept single
> string arguments as methods to the basestring
> sub-classes.

Please read my first post in this thread - I think there's more reason
for 'dedent' to be a string method than there is, for example, for
'expandtabs', since it allows you to write clearer code.
>
> The point is that the presented use case does not
> originate in a common need (to dedent strings), but
> from a desire to write Python code with embedded
> indented triple-quoted strings which lies in the scope
> of the parser, not that of string objects.
>
That's a theoretical argument. In practice, if you do it in the
parser, you have two options:
1. Automatically dedent all strings.
2. Add a 'd' or some other letter before the string.

Option 1 breaks backwards compatibility, and makes the parser do
unexpected things. Option 2 adds another string-prefix letter, which
is confusing, and it will also be hard to find out what that letter
means. On the other hand, adding ".dedent()" at the end is very clear,
and is just as easy.

Now, about performance, please see the message I'll post in a few minutes...

Noam


Re: [Python-Dev] str.dedent

2005-11-14 Thread Noam Raphael
Just two additional notes:

On 9/15/05, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
>
> -1
>
> Let it continue to live in textwrap where the existing pure python code
> adequately serves all string-like objects.  It's not worth losing the
> duck typing by attaching new methods to str, unicode, UserString, and
> everything else aspiring to be string-like.

It may seem like the 'dedent' code would have to be written many
times, but I've checked the examples. Different versions may be needed
for 'str' and for 'unicode', but those types are going to
be unified. In UserString you'll have to add exactly one line:

def dedent(self): return self.data.dedent()

I've just taken the line created for 'isalpha' and replaced 'isalpha'
with 'dedent'. So in the long run, there will be exactly one
implementation of 'dedent' in the Python code. (I don't know of any
other objects which try to provide the full string interface.)
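For reference, that single shared implementation could be a small function along these lines - a simplified sketch of what textwrap.dedent does (it ignores mixed tabs/spaces and doesn't normalize whitespace-only lines):

```python
def dedent(text):
    """Strip the longest common leading whitespace from every non-blank line."""
    lines = text.split('\n')
    # collect the leading-whitespace prefix of each non-blank line
    prefixes = [line[:len(line) - len(line.lstrip())]
                for line in lines if line.strip()]
    common = prefixes[0] if prefixes else ''
    for prefix in prefixes[1:]:
        # shrink the candidate until it is a prefix of this line's indentation
        while not prefix.startswith(common):
            common = common[:-1]
    return '\n'.join(line[len(common):] if line.strip() else line
                     for line in lines)
```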

Another reason for preferring a 'dedent' method over a 'dedent'
function in some module is that it allows, sometime in the future,
adding an optimization to the compiler, so that it will dedent the
string at compile time (this can't work for a function, since the
function is looked up at run time). This would solve the performance
problem completely, so that there would be an easy way to write
multiline strings which do not interfere with the visual structure of
the code, without the need to worry about performance. I'm not saying
that this optimization has to be done now, just that 'dedent' as a
method makes it possible, which adds to the other arguments for making
it a method.

Noam


Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter

2005-11-14 Thread Noam Raphael
On 11/14/05, Michiel Jan Laurens de Hoon <[EMAIL PROTECTED]> wrote:
> Ronald Oussoren wrote:
>
> > I wonder why nobody has suggested a seperate thread for managing the
> > GUI and
> > using the hook in Python's event loop to issue the call to update_plot.
> >
> Ha. That's probably the best solution I've heard so far, short of adding
> a Tcl-like event loop API to Python.

No. It is definitely a bad solution.

Where I work, we do a lot of plotting from the interactive
interpreter, using Tkinter. I always wondered how it worked, and
assumed that it was done using threading. So when people started using
IDLE, and those plots didn't show up, I've found the solution of
calling the Tkinter main() function from a thread. Everything seemed
to work fine, until...

It didn't. Strange freezes started to appear, only when working from
IDLE. This made me investigate a bit, and I found that Tkinter
isn't run from a separate thread - the dooneevent() function is called
repeatedly by PyOS_InputHook while the interpreter is idle.

The conclusions:
1. Don't use threads when you don't have to. Tkinter does callbacks to
Python code, and most code isn't designed to work reliably in a
multithreaded environment.
2. The non-threading solution works *really* well - the fact is that I
hadn't noticed the difference between multi-threaded mode and
single-threaded mode, until things began to freeze in the
multi-threaded mode.

Noam


Re: [Python-Dev] str.dedent

2005-11-14 Thread Noam Raphael
On 11/14/05, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> so is putting the string constant in a global variable, outside the scope
> you're in, like you'd do with any other constant.

Usually when I use a constant a single time, I write it where I use
it, and don't give it a name. I don't do:

messagea = "The value of A is "

... (a long class definition)
print messagea, A

This is what I mean when I say "constant" - a value which is known
when I write the code, not an arbitrary value that may change and
which I would therefore write at the beginning of the program so
others know it's there.

There's no reason why multiline strings that are used only once
should be defined at the beginning of a program (think about a simple
CGI script, which prints HTML parts in a function.)
>
> (how about a new rule: you cannot post to a zombie thread on python-
> dev unless they've fixed/reviewed/applied or otherwise processed at least
> one tracker item earlier the same day.  there are hundreds of items on the
> bugs and patches trackers that could need some loving care)
>
I posted to this thread because it was relevant to a new post about
dedenting strings. Anyway, I looked at bug 1356720 (Ctrl+C for copy
does not work when caps-lock is on), and posted there a very simple
patch which will most probably solve the problem. I also looked at bug
1337987 (IDLE, F5 and wrong external file content. (on error!)). One
problem it raises is that IDLE doesn't have a "revert" command and
that it doesn't notice if the file was changed outside of IDLE. I am
planning to fix it.

The other problem reported in that bug is that tracebacks show
misleading code lines when the source file was changed but wasn't
reloaded into Python. Perhaps compiled code should record not only the
file name but also its modification time? Then, when a traceback
prints lines from a changed file, it could warn that the line shown
might not be the right one.

Noam


Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter

2005-11-14 Thread Noam Raphael
On 11/15/05, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> If you want to write portable code that keeps things running "in the
> background" while the users hack away at the standard interactive
> prompt, InputHook won't help you.
>
So probably it should be improved, or changed a bit, to work also on
Windows. Or perhaps it's Tkinter. Anyway, what I'm saying is - don't
use threads! Process events in the main thread while it doesn't run
the user's Python code. If he runs another thread - that's his
problem. The implicit event loop should never execute Python code
while a user's Python code is running in the main thread.

Noam


Re: [Python-Dev] str.dedent

2005-11-15 Thread Noam Raphael
Thanks for your examples. I understand that sometimes it's a good idea
not to write the HTML inside the function (although it may be nice to
sometimes write it just before the function - and if it's a method,
then we get the same indentation problem.)

However, as you said, sometimes it is desirable to write multiline
strings inside functions. You think it's OK to add whitespace to the
HTML code; I personally prefer not to add varying indentation to my
output according to the indentation level of the code that generated
it.

I just wanted to add another use case: long messages. Consider those
lines from idlelib/run.py:133

msg = "IDLE's subprocess can't connect to %s:%d.  This may be due "\
  "to your personal firewall configuration.  It is safe to "\
  "allow this internal connection because no data is visible on "\
  "external ports." % address
tkMessageBox.showerror("IDLE Subprocess Error", msg, parent=root)

and from idlelib/PyShell.py:734:

def display_port_binding_error(self):
tkMessageBox.showerror(
"Port Binding Error",
"IDLE can't bind TCP/IP port 8833, which is necessary to "
"communicate with its Python execution server.  Either "
"no networking is installed on this computer or another "
"process (another IDLE?) is using the port.  Run IDLE with the -n "
"command line switch to start without a subprocess and refer to "
"Help/IDLE Help 'Running without a subprocess' for further "
"details.",
master=self.tkconsole.text)

I know, of course, that it could be written using textwrap.dedent, but
I think that not having to load a module will encourage the use of
dedent; if I have to load a module, I might say, "oh, I can live with
all those marks around the text, there's no need for another module",
and then, any time I want to change that message, I have a lot of
editing work to do.

Noam


Re: [Python-Dev] str.dedent

2005-11-19 Thread Noam Raphael
On 11/19/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
> > You are missing an important point here: There are intentionally no line
> > breaks in this string; it must be a single line, or else showerror will
> > break it in funny ways. So converting it to a multi-line string would
> > break it, dedent or not.
>
> Only if you didn't include newline escapes, e.g.::
>
> msg = textwrap.dedent('''\
> IDLE's subprocess can't connect to %s:%d.  This may be due \
> to your personal firewall configuration.  It is safe to \
> allow this internal connection because no data is visible on \
> external ports.''' % address)
>

Unfortunately, it won't help, since the 'dedent' method won't treat
those spaces as indentation.

But if those messages were printed to the standard error, the line
breaks would be ok, and the use case valid.
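A quick demonstration of why the escaped-newline version doesn't help (runnable as-is with today's textwrap):

```python
import textwrap

# The backslash-escaped newlines make this a *single* line, so the
# indentation of the continuation lines ends up inside the string,
# where dedent cannot see it as indentation.
msg = textwrap.dedent('''\
    aaa \
    bbb''')
```

The leading indentation of the first line is stripped, but the spaces that indented the continuation line survive inside the string.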

Noam


Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison

2005-11-26 Thread Noam Raphael
Three weeks ago, I read this and thought, "well, you have two options
for a default comparison, one based on identity and one on value, both
are useful sometimes and Guido prefers identity, and it's OK." But
today I understood that I still think otherwise.

In two sentences: sometimes you wish to compare objects according to
"identity", and sometimes you wish to compare objects according to
"values". Identity-based comparison is done by the "is" operator;
Value-based comparison should be done by the == operator.

Let's take the car example, and expand it a bit. Let's say wheels have
attributes - say, diameter and manufacturer. Let's say those can't
change (which is reasonable), to make wheels hashable. There are two
ways to compare wheels: by value and by identity. Two wheels may have
the same value, that is, they have the same diameter and were created
by the same manufacturer. Two wheels may have the same identity, that
is, they are actually the same wheel.

We may want to compare wheels based on value, for example to make sure
that all the car's wheels fit together nicely: assert car.wheel1 ==
car.wheel2 == car.wheel3 == car.wheel4. We may want to compare wheels
based on identity, for example to make sure that we actually bought
four wheels in order to assemble the car: assert car.wheel1 is not
car.wheel2 and car.wheel3 is not car.wheel1 and car.wheel3 is not
car.wheel2...

We may want to associate values with wheels based on their values. For
example, it's reasonable to suppose that the price of every wheel of
the same model is the same. In that case, we'll write: price[wheel] =
25. We may want to associate values with wheels based on their
identities. For example, we may want to note that a specific wheel is
broken. For this, I'll first define a general class (I defined it
in one of the earlier discussions as well, because I believe it's
useful):

class Ref(object):
    """Wraps an object; equality and hash are based on the object's identity."""
    def __init__(self, obj):
        self._obj = obj
    def __call__(self):
        return self._obj
    def __eq__(self, other):
        return isinstance(other, Ref) and self._obj is other._obj
    def __hash__(self):
        return id(self._obj) ^ 0xBEEF

Now again, how will we say that a specific wheel is broken? Like this:

broken[Ref(wheel)] = True

Note that the Ref class also allows us to group wheels of the same
kind in a set, regardless of their __hash__ method.
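A self-contained demo of that point, restating the Ref class next to a value-equal Wheel class (Wheel's methods are illustrative only):

```python
class Ref(object):
    """The *value* of a Ref is the *identity* of the wrapped object."""
    def __init__(self, obj):
        self._obj = obj
    def __call__(self):
        return self._obj
    def __eq__(self, other):
        return isinstance(other, Ref) and self._obj is other._obj
    def __hash__(self):
        return id(self._obj) ^ 0xBEEF

class Wheel(object):
    """Toy class whose instances all compare equal by value."""
    def __eq__(self, other):
        return isinstance(other, Wheel)
    def __ne__(self, other):
        return not self == other
    def __hash__(self):
        return 0

w1, w2 = Wheel(), Wheel()
value_set = set([w1, w2])             # collapses: equal by value
identity_set = set([Ref(w1), Ref(w2)])  # keeps both: distinct identities
```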

I think that most objects, especially most user-defined objects, have
a *value*. I don't have an exact definition, but a hint is that two
objects that were created in the same way have the same value.
Sometimes we wish to compare objects based on their identity - in
those cases we use the "is" operator. Sometimes we wish to compare
objects based on their value - and that's what the == operator is for.
Sometimes we wish to use the value of objects as a dictionary key or
as a set member, and that's easy. Sometimes we wish to use the
identity of objects as a dictionary key or as a set member - and I
claim that we should do that by using the Ref class, whose *value* is
the object's *identity*, or by using a dict/set subclass, and not by
misusing the __hash__ and __eq__ methods.

I think that whenever value-based comparison is meaningful, the __eq__
and __hash__ should be value-based. Treating objects by identity
should be done explicitly, by the one who uses the objects, by using
the "is" operator or the Ref class. It should not be the job of the
object to decide which method (value or identity) is more useful - it
should allow the user to use both methods, by defining __eq__ and
__hash__ based on value.

Please give me examples which prove me wrong. I currently think that
the only objects for whom value-based comparison is not meaningful,
are objects which represent entities which are "outside" of the
process, or in other words, entities which are not "computational".
This includes files, sockets, possibly user-interface objects,
loggers, etc. I think that objects that represent purely "data", have
a "value" that they can be compared according to. Even wheels that
don't have any attributes are simply equal to other wheels, and not
equal to other objects. Since user-defined classes can interact with
the "environment" only through other objects or functions, it is
reasonable to suggest that they should get a value-based equality
operator. Many times the value is defined by the __dict__ and
__slots__ members, so it seems to me a reasonable default.
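A minimal sketch of such a value-based default, assuming the value is defined by __dict__ (ignoring __slots__ for brevity; the class and attribute names are illustrative only):

```python
class Value(object):
    """Hypothetical base class: __eq__ and __hash__ derived from __dict__."""
    def __eq__(self, other):
        return (type(self) is type(other)
                and self.__dict__ == other.__dict__)
    def __ne__(self, other):
        return not self == other
    def __hash__(self):
        # requires all attribute values to be hashable
        return hash(frozenset(self.__dict__.items()))

class Wheel(Value):
    def __init__(self, diameter, manufacturer):
        self.diameter = diameter
        self.manufacturer = manufacturer
```

Two wheels created the same way then compare equal and can key the same dictionary entry, while `is` still distinguishes them.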

I would greatly appreciate repliers that find a tiny bit of reason in
what I said (even if they don't agree), and not deny it all as a
complete load of rubbish.

Thanks,
Noam


Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison

2005-11-27 Thread Noam Raphael
On 11/27/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Noam Raphael wrote:
>  > I would greatly appreciate repliers that find a tiny bit of reason in
>  > what I said (even if they don't agree), and not deny it all as a
>  > complete load of rubbish.
>
> I don't understand what your message is. With this posting, did you
> suggest that somebody does something specific? If so, who is that one,
> and what should he do?

Perhaps I felt a bit attacked. It was probably my fault, and anyway, a
general message like this is not the proper way - I'm sorry.

>
> Anyway, a lot of your posting is what I thought was common knowledge;
> and with some of it, I disagree.

This is fine, of course.
> > We may want to compare wheels based on value, for example to make sure
> > that all the car's wheels fit together nicely: assert car.wheel1 ==
> > car.wheel2 == car.wheel3 == car.wheel4.
>
> I would never write it that way. This would suggest that the wheels
> have to be "the same". However, this is certainly not true for wheels:
> they have to have to be of the same make. Now, you write that wheels
> only carry manufacturer and diameter. However, I would expect that
> wheels grow additional attributes over time, like whether they are
> left or right, and what their wear level is. So to write your property,
> I would write
>
> car.wheel1.manufacturer_and_make() ==
> car.wheel2.manufacturer_and_make() ==
> car.wheel3.manufacturer_and_make() ==
> car.wheel4.manufacturer_and_make()
>
You may be right in the case of wheels. From time to time, in the real
(programming) world, I encounter objects that I wish to compare by
value - this is certainly the case for built-in objects, but is
sometimes the case for more complex objects.

> > We may want to associate values with wheels based on their values. For
> > example, it's reasonable to suppose that the price of every wheel of
> > the same model is the same. In that case, we'll write: price[wheel] =
> > 25.
>
> Again, I would not write it this way. I would find
>
> wheel.price()

Many times the objects are not yours to add attributes, or may have
__slots__ defined. The truth is that I prefer not to add attributes to
external objects even when it's possible.
>
> most natural. If I have the notion of a price list, then I would
> try to understand what the price list is keyed-by, e.g. model number:
>
> price[wheel.model] = 25
>
Sometimes there's no "key" - it's just the state of the object (what
if wheels don't have a model number?)

> > Now again, how will we say that a specific wheel is broken? Like this:
> >
> > broken[Ref(wheel)] = True
>
> If I want things to be keyed by identity, I would write
>
> broken = IdentityDictionary()
> ...
> broken[wheel] = True
>
> although I would prefer to write
>
> wheel.broken = True
>
I personally prefer the first method, but the second one is ok too.
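For concreteness, a minimal sketch of the IdentityDictionary idea from the quote (hypothetical class - not in the standard library):

```python
class IdentityDictionary(object):
    """Mapping keyed by object identity, ignoring __eq__/__hash__ entirely."""
    def __init__(self):
        # id(key) -> (key, value); storing the key keeps it alive, so its
        # id cannot be reused while the entry exists
        self._data = {}
    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)
    def __getitem__(self, key):
        return self._data[id(key)][1]
    def __contains__(self, key):
        return id(key) in self._data
```

Note that this even works for unhashable keys such as lists, since only their identity is used.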

> > I think that most objects, especially most user-defined objects, have
> > a *value*. I don't have an exact definition, but a hint is that two
> > objects that were created in the same way have the same value.
>
> Here I disagree. Consider the wheel example. I would expect that
> a wheel has a "wear level" or some such, and that this changes over
> time, and that it belongs to the "value" of the wheel ("value"
> being synonym to "state"). As this changes over time, it is certainly
> not that the object is created with that value.
>
> Think of lists: what is their value? Are they created with it?
>
My tongue failed me. I meant: created in the same way = have gone
through the same series of actions. That is:
a = []; a.append(5); a.extend([2,1]); a.pop()
b = []; b.append(5); b.extend([2,1]); b.pop()
a == b

> > Sometimes we wish to use the
> > identity of objects as a dictionary key or as a set member - and I
> > claim that we should do that by using the Ref class, whose *value* is
> > the object's *identity*, or by using a dict/set subclass, and not by
> > misusing the __hash__ and __eq__ methods.
>
> I think we should use a specific type of dictionary then.

That's OK too. My point was that the one who uses the objects should
explicitly specify whether he means value-based or identity-based
lookup. This means that if an object has a "value", it should not make
__eq__ and __hash__ be identity-based just to make identity-based
lookup easier and implicit.
>
> > I think that whenever value-based comparison is meaningful, the __eq__
> > and __hash__ should be value-based. Treating objects by identity
> > should be done explicitly, by the one who uses

Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison

2005-11-27 Thread Noam Raphael
On 11/27/05, Samuele Pedroni <[EMAIL PROTECTED]> wrote:
> well, this still belongs to comp.lang.python.
...
> not if you think python-dev is a forum for such discussions
> on OO thinking vs other paradigms.

Perhaps my style made it look like a discussion on OO thinking vs
other paradigms, but my conclusion is exactly about the issue of this
thread -
Jim suggested to drop default __hash__ and __eq__ for Python 3K. Guido
decided not to, because it's useful to use them for identity-based
comparison and lookup. I say that I disagree, because I think that
__hash__ and __eq__ should be used for value-based comparison and
lookup, and because if the user of the object does explicit
identity-based comparison/lookup, it doesn't matter to him whether
__hash__ and __eq__ are defined or not. I also suggested, in a way,
that it's OK to define a default value-based __eq__ method.

Noam


[Python-Dev] A missing piece of information in weakref documentation

2005-12-12 Thread Noam Raphael
Hello,

I now discovered that a callback registered when creating a weak
reference will be called only if the weak reference object is still
alive. This is not documented in the weakref module documentation.

(It's a good behaviour - it just ought to be documented.)
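A small script demonstrating the behaviour described (it relies on CPython's reference counting to make `del` collect the object immediately):

```python
import weakref

class C(object):
    pass

calls = []

# Case 1: the weak reference outlives the referent -> callback runs.
obj = C()
ref = weakref.ref(obj, lambda r: calls.append('called'))
del obj
assert calls == ['called']

# Case 2: the weak reference dies first -> no callback when the
# referent is collected.
obj2 = C()
ref2 = weakref.ref(obj2, lambda r: calls.append('called again'))
del ref2
del obj2
assert calls == ['called']
```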

Have a good day,
Noam


Re: [Python-Dev] A missing piece of information in weakref documentation

2005-12-12 Thread Noam Raphael
On 12/12/05, Aahz <[EMAIL PROTECTED]> wrote:
> Please submit a doc patch to SF (or even just a bug report if you don't
> have time).  The patch may be plain text or reST; no need for Latex.

Done - patch number 1379023.

Noam


[Python-Dev] A few questions about setobject

2005-12-26 Thread Noam Raphael
Hello,

I'm going over setobject.c/setobject.h, while trying to change them to
support cheap frozen-copying. I have a few questions.

1) This is a part of setobject.h:

typedef struct {
long hash;
PyObject *key;
} setentry;

typedef struct _setobject PySetObject;
struct _setobject {
...
setentry *table;
setentry *(*lookup)(PySetObject *so, PyObject *key, long hash);
setentry smalltable[PySet_MINSIZE];
...
};

It seems to me that setentry and _setobject are defined for every file
that includes Python.h. In the Python C API, in the section about
include files, it is written that:
"All user visible names defined by Python.h (except those defined by
the included standard headers) have one of the prefixes "Py" or "_Py".
Names beginning with "_Py" are for internal use by the Python
implementation and should not be used by extension writers. Structure
member names do not have a reserved prefix."

Is this desirable? Even if it is, it seems that the second sentence
contradicts the first sentence. Perhaps the header file should stick
with writing "struct { long hash; PyObject *key; }" three times (or
define it in a macro and then undefine it), and the typedef be left to
the .c file?


2) The hash table used by sets uses a dummy element for deleted
entries. The implementation goes to the trouble of allocating it,
managing its reference count, and deallocating it at the end. What is
the reason for that? It seems to me that the only requirement of the
dummy element is that it shouldn't be a pointer to a valid PyObject,
and as such I would think that defining it like

int dummy_int;
PyObject *dummy = (PyObject *)(&dummy_int);

would be enough, and that it shouldn't be INCREFed or DECREFed every
time it is used. I think it should be ok because it's never used
really as a PyObject. Am I missing something? (Ok, I now thought that
maybe it's because some parts don't treat dummy elements specially.
But it seems to me that most parts do treat them specially, so perhaps
it would be better to make a few changes so that all parts will treat
them specially?)


3) The type of the result of a binary operator applied on a set and a
frozenset is the type of the left set. You are welcome to ignore
this, but I just wanted to say that it seems to me better to make the
operator symmetric, and to return a frozenset only if both sets are
frozen.
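The current behaviour described here can be checked directly:

```python
# The result type of set/frozenset binary operators follows the left
# operand, as the message describes.
s = set([1, 2])
f = frozenset([2, 3])

left_set = s | f       # left operand is a set -> result is a set
left_frozen = f | s    # left operand is a frozenset -> result is a frozenset
```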


If you think that these questions belong to c.l.py, then please say so
and I will go away.

Have a good day,
Noam


Re: [Python-Dev] A few questions about setobject

2005-12-28 Thread Noam Raphael
On 12/28/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
> The setentry typedef clearly violates the principles of the API, so
> it should be renamed.
(And so will _setobject, although I think it will be much easier to remove it.)
>
> > Perhaps the header file should stick
> > with writing "struct { long hash; PyObject *key; }" three times (or
> > define it in a macro and then undefine it), and the typedef be left to
> > the .c file?
>
> That would not be conforming to the C language standard (although
> accepted by most compilers).

Can you explain why it is not conforming to the standard? Can't a
typedef be used interchangeably with the original type?

> Not sure what "this" refers to in your message: the text of the C API
> documentation certainly is desirable as it stands (although it should
> be clearer as to whether struct names should be prefixed).

> > Even if it is, it seems that the second sentence
> > contradicts the first sentence.
>
> Why does that seem so? To quote, the first sentence is
>
> 'All user visible names defined by Python.h (except those defined by
> the included standard headers) have one of the prefixes "Py" or "_Py".'
>
> and the second sentence is
>
> 'Names beginning with "_Py" are for internal use by the Python
> implementation and should not be used by extension writers.'
>
> I cannot see any contradiction between these.
>
Oops. It's the first and the third:
The first: All user visible names defined by Python.h (except those
defined by the included standard headers) have one of the prefixes
"Py" or "_Py".
The third: Structure member names do not have a reserved prefix.

I think that "structure member names" refers to things like setentry.
The third sentence contradicts the first since structure member names
are user visible names. Anyway, it seems to me best if the third
sentence will be removed and all names will start with Py or _Py.
>
> > I think it should be ok because it's never used
> > really as a PyObject. Am I missing something? (Ok, I now thought that
> > maybe it's because some parts don't treat dummy elements specially.
> > But it seems to me that most parts do treat them specially, so perhaps
> > it would be better to make a few changes so that all parts will treat
> > them specially?)
>
> In principle, you are right. One place that doesn't special-case the
> dummy is set_clear_internal (in fact, the Py_DEBUG assertion is
> completely useless there, AFAICT).
>
> The tricky question is: can we be certain that we find all places,
> in all code paths, where we have to special-case the dummy? Having
> PyObject* which don't point to PyObject is error-prone.
>
> Also, what would we gain? If you think it is speed: I doubt it. In
> one place, a comment suggests that actually seeing the dummy element
> is so much more unlikely than the other cases that, for performance,
> the test for the dummy is done last. We would lose additional speed
> in the cases where the dummy isn't yet considered.
>
Ok, I tried. It took me 25 minutes to change the code, while going
over every occurrence of "key" and "decref" in setobject.c. (It seems
to me that the dummy element can only be accessed from entry->key.)
Most of the code isn't bothered by the dummy element, since it uses
the data structure in a more abstract way. I think that it simplifies
the code, while adding a condition in only two places, which I don't
think should make anything noticeably slower. The result passes the
complete test suite. In one sentence: I think that it makes the code
slightly better, and the risk is pretty low.

I thought to post it as a patch, but sourceforge didn't work for me,
and it's not that long, so I paste it at the end of this message. Feel
free to do whatever you want with it.

> > 3) The type of the result of a binary operator applied on a set and a
> > frozenset is the type of the left set. You are welcomed to ignore
> > this, but I just wanted to say that it seems to me better to make the
> > operator symmetric, and to return a frozenset only if both sets are
> > frozen.
>
> How would you implement this? The result is obtained from copying the
> left operand, and then applying the other operand. This is done so
> that set subtyping becomes possible:
>
> >>> class myset(set):pass
> ...
> >>> x=myset([2,6])
> >>> y=set([2,6])
> >>> x.union(y)
> myset([2, 6])
>
> So if the result is not obtained by copying the left operand first,
> how would you compute the result type, so that this example still
> works?
>
The behaviour would change to match that of other types: the returned
value would be of the base type:
>>> class MyInt(int): pass
...
>>> x = MyInt(3)
>>> y = 5
>>> x.__add__(y)
8

I'm not talking about backwards compatibility here - I'm just asking
whether others also feel that the symmetric version is preferable.
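To make the proposal concrete, here is a minimal sketch of the symmetric rule in modern syntax (symmetric_union is a hypothetical name for illustration, not an existing API): the result is frozen only when both operands are frozen, and, as with the MyInt example, subclass information is dropped in favour of the base type.

```python
def symmetric_union(a, b):
    # Hypothetical sketch of the proposed symmetric rule: return a
    # frozenset only if *both* operands are frozen; otherwise return a
    # plain (base-type) set, like int subclasses do for arithmetic.
    result_type = frozenset if (isinstance(a, frozenset) and
                                isinstance(b, frozenset)) else set
    return result_type(a) | result_type(b)
```

With this rule a mutable operand on either side makes the result a mutable set, regardless of operand order.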

Ok, here's the diff:

=== modified file 'Objects/setobject.c'
--- Objects/setobject.c
+++ Objects/setobject.c
@@ -13,8 +13,12 @@
 /* This must be >= 1. */
 #define PERTURB_SHIFT 5

-/*

Re: [Python-Dev] Keep default comparisons - or add a second set?

2005-12-28 Thread Noam Raphael
And another example:

>>> a = 1+2j
>>> b = 2+1j
>>> a > b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: no ordering relation is defined for complex numbers

I have come to think that, setting backwards compatibility aside for a
while, the best thing for comparison operators to do is to raise a
TypeError by default, and to work only for types that it makes sense
to compare. I think it is more in the spirit of "explicit is better
than implicit", and I now have two reasons for that:
1. Things like "Decimal(3.0) == 3.0" will make more sense (raising an
exception which explains that Decimals should not be compared to
floats, instead of constantly returning False).
2. It is more forward-compatible - when it is discovered that two
types can sensibly be compared, the comparison can be defined without
changing existing behaviour that doesn't raise an exception.

Perhaps you can explain to me again why arbitrary objects should be
comparable? I don't see why sorting objects according to values should
work when the order has no real meaning. I don't see why you need all
objects to be comparable for storing them in containers with the
behaviour of dict or set.

If the reason is that you want containers that work among multiple
sessions, and are "identity-based" (that is, only one object can be
used as a key), you can attach to each object an id that isn't session
dependent, and use that instead of the default memory address.

It may be a reason for dropping the default "hash is id": suppose that
you want a persistent storage that will work like dicts but will not
be associated with one Python session (it may be exactly Zope's BTrees
- I don't know). Currently you have a problem with using __hash__ for
that purpose, since the hash value of an object can change between
sessions - that happens when it's the id of the object. Now suppose
that we have a "persistent" id function - it can be implemented by
using weakrefs to associate a unique value with an object on the first
time that the function is called, and storing it with the object when
serializing it. Also suppose that we drop the default hash method, so
where currently hash(x) is id(x), hash(x) will raise a TypeError. Then
our persistent storage can use the persistent id instead of the
default id, and it will work. (You might not want mutable objects to
be used as keys, but that's another problem - the data structure will
be consistent anyway.)
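A minimal sketch of the kind of session-independent id suggested above (persistent_id is a hypothetical name; a real persistence layer would also have to store the value alongside the serialized object):

```python
import itertools
import weakref

_next_pid = itertools.count(1)
_pids = weakref.WeakKeyDictionary()  # object -> assigned persistent id

def persistent_id(obj):
    # On first call, associate a fresh identifier with the object via a
    # weak reference; later calls return the same value.  Unlike id(),
    # the value does not depend on the object's memory address.
    try:
        return _pids[obj]
    except KeyError:
        pid = next(_next_pid)
        _pids[obj] = pid
        return pid
```

This only works for objects that support weak references, which is one of the practical caveats such a scheme would have to address.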

In fewer words: the built-in id() is just one way to assign identities
to objects. __hash__ shouldn't use it implicitly when there's no
value-based hash value - if it didn't, the rule that x == y implies
hash(x) == hash(y) would be preserved across different sessions, so
persistent objects would be able to use hash values.

Does it make sense to you?

Noam
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Keep default comparisons - or add a second set?

2005-12-28 Thread Noam Raphael
On 12/28/05, Adam Olsen <[EMAIL PROTECTED]> wrote:
> I agree for greaterthan and lessthan operators but I'm not convinced
> for equality.  Consider this in contrast to your example:
> >>> a = 1+2j
> >>> b = 1+2j
> >>> a is b
> False
> >>> a == b
> True
>
> Complex numbers can't be sorted but they can be tested for equality.
> Decimal numbers can't currently be tested for equality against a float
> but there's no loss-of-accuracy argument to prevent it (just a
> difficulty of implementation one.)
>
> If the comparison is to fail I would prefer an exception rather than
> an id-based fallback though.

I think we agree. I don't think that every type that supports equality
comparison should support order comparison. I think that if there's no
meaningful comparison (whether equality or order), an exception should
be raised.
>
> Speaking of id, there's no reason why "id(a) == id(b)" has to fail for
> mismatched types in the face of persistence so long as the result of
> id() has the same lifetime as the target object.  This could be done
> using weakrefs or by making an id type with a strong reference to the
> target object.

I don't mean to change the current behaviour of id() - I just meant
that an additional id function may be implemented, possibly by a
specific library (Zope, for instance), so the built-in one shouldn't
be used as a fallback default.

Noam


Re: [Python-Dev] Keep default comparisons - or add a second set?

2005-12-28 Thread Noam Raphael
On 12/29/05, Robert Brewer <[EMAIL PROTECTED]> wrote:
>
>  Just to keep myself sane...
>
>  def date_range(start=None, end=None):
>  if start == None:
>  start = datetime.date.today()
>  if end == None:
>  end = datetime.date.today()
>  return end - start
>
>  Are you saying the "if" statements will raise TypeError if start or end are
> dates? That would be a sad day for Python. Perhaps you're saying that there
> is a "meaningful comparison" between None and anything else, but please
> clarify if so.

Yes, I'm suggesting that they will raise a TypeError. Your example
shows that the change is not compatible with a lot of existing Python
code, which means that it's a Python 3000 thing. The following code
will continue to work:

def date_range(start=None, end=None):
if start is None:
start = datetime.date.today()
if end is None:
end = datetime.date.today()
return end - start

Using "is None" instead of "== None" is considered a better style even now.

Noam


[Python-Dev] set.copy documentation string

2005-12-28 Thread Noam Raphael
The docstring of set.copy is currently "Return a shallow copy of a set."

Perhaps "shallow" should be removed, since set members are supposed to
be immutable so there's no point in a deep copy?

Noam


[Python-Dev] When do sets shrink?

2005-12-28 Thread Noam Raphael
Hello,

If I do something like this:

s = set()
for i in xrange(1000000):
    s.add(i)
while s:
    s.pop()
gc.collect()

the memory consumption of the process remains the same even after the pops.

I checked the code (that's where I started from, really), and there's
nothing in set.pop or set.remove that resizes the table. And it turns
out that it's the same with dicts.

Should something be done about it?

Noam


Re: [Python-Dev] set.copy documentation string

2005-12-28 Thread Noam Raphael
On 12/29/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Noam Raphael wrote:
> > is currently "Return a shallow copy of a set."
> >
> > Perhaps "shallow" should be removed, since set members are supposed to
> > be immutable so there's no point in a deep copy?
>
> That still doesn't make copy return a deep copy, right? "shallow copy"
> is more precise than "copy", and correct - what is gained from
> removing it?

Perhaps it bothers the programmer with something that shouldn't bother
him. I mean that I might do help(set.copy), and then think, "Oh, it
returns a shallow copy. Wait a minute - 'shallow' means that I get a
new object, which references the same objects as the old one. Perhaps
I should use another function, which does deep copying? Let me think
about it - no. All members of a set are immutable, so it doesn't
matter." I think that in this case, the fact that the copy is shallow
is an implementation detail, since there's no sense in making a deep
copy of a set. Implementation details should probably not be a part of
the definition of what a method does.
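A small illustration of the point (plain modern Python, nothing CPython-internal): the copy shares its element objects with the original, but since set elements must be hashable, there is usually nothing deeper to copy anyway.

```python
s = set([(1, 2), "abc"])
t = s.copy()
assert t == s and t is not s   # an independent set object...
# ...whose elements are the very same objects as the original's:
assert all(any(x is y for y in s) for x in t)
```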

I know it's just a word, and that it doesn't matter a lot. But why not
make things a tiny bit better?

Noam


Re: [Python-Dev] When do sets shrink?

2005-12-28 Thread Noam Raphael
On 12/29/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Noam Raphael wrote:
> > Should something be done about it?
>
> It's very difficult to do something useful about it. Even if
> you manage to determine how much memory you want to release,
> it's nearly impossible to actually release the memory to the
> operating system, because of the many layers of memory
> management software that all need to agree that the memory
> should be reused somewhere else (instead of keeping it on
> that layer, just in case somebody using that layer wants
> some memory).
>
I checked - when doing the same thing with lists, all the memory was
released for use by other Python objects, and most of it was released
back to the operating system.

Noam


Re: [Python-Dev] set.copy documentation string

2005-12-28 Thread Noam Raphael
On 12/29/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> If "makes no sense" means "would not make a difference", then you
> are wrong. Objects in a set are not necessarily unmodifiable;
> they just have to be hashable.
>
Oh, you are right. I had thought so much about dropping the default
hash=id - or, more generally, about the idea that only frozen objects
should be hashable - that I forgot it's not the case yet... :)

(I used the term "frozen" instead of "immutable" since I think that
"immutable" is not defined very well, because tuples are considered
immutable even though their value can change if they reference mutable
objects.)

Thanks,
Noam


Re: [Python-Dev] When do sets shrink?

2005-12-28 Thread Noam Raphael
On 12/29/05, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> What could be done is to add a test for excess dummy entries and trigger
> a periodic resize operation.  That would make the memory available for
> other parts of the currently running script and possibly available for
> the O/S.
>
> The downside is slowing down a fine-grained operation like pop().  For
> dictionaries, this wasn't considered worth it.  For sets, I made the
> same design decision.  It wasn't an accident.  I don't plan on changing
> that decision unless we find a body of real world code that would be
> better-off with more frequent re-sizing.

The computer scientist in me prefers O() terms over changes in a
constant factor, but that's only me. Perhaps a note about it should be
added to the documentation, though?

Noam


Re: [Python-Dev] When do sets shrink?

2005-12-28 Thread Noam Raphael
On 12/29/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Noam Raphael wrote:
> > The computer scientist in me prefers O() terms over changes in a
> > constant factor, but that's only me.
>
> That remark, I don't understand. In a hash table, most "simple"
> operations are O(n) as the worst-case time, except for operations
> that may cause resizing, which are O(n**2) (worst case).
>
> So you are proposing that .pop() might trigger a resize, thus
> changing from O(n) worst case to O(n**2) worst case? Why would
> a computer scientist prefer that?
>
Perhaps I'm not that great a computer scientist, but simple operations
on hash tables are supposed to be O(1) in the average case (which is
what matters to me). This means that resizing is an O(n) operation in
the average case.

This means that just like list.append() and list.pop(), which are made
amortized O(1) operations, operations on a hash table can be made
amortized O(1) as well - all you need is to set the resize bounds so
that a resize operation always happens only after O(n) simple
operations.

> > Perhaps a note about it should be added to the documentation, though?
>
> Sure. Patches (to sf.net/projects/python) are welcome.

I will try to send one when sf becomes healthier.

Noam


Re: [Python-Dev] When do sets shrink?

2005-12-29 Thread Noam Raphael
On 12/29/05, Donovan Baarda <[EMAIL PROTECTED]> wrote:
> Without some sort of fancy overkill size hinting or history tracking,
> that's probably as good a heuristic as you can get.

I'm sorry, but that's not correct. There's a simple resize-scheduling
algorithm that is proven to take, summed up, amortized O(1) per simple
operation, and that keeps the amount of memory used always
proportional to the number of elements in the set.

I'm not saying that practically it must be used - I'm just saying that
it can't be called a heuristic, and that it doesn't involve any "fancy
overkill size hinting or history tracking". It actually means
something like this:
1. If you want to insert and the table is full, resize the table to
twice the current size.
2. If you delete and the number of elements turns out to be less than
a quarter of the size of the table, resize the table to half of the
current size.
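A toy model of that schedule (just bookkeeping, not a real hash table; the class name and the minimum size of 8 are arbitrary choices for illustration):

```python
class ResizingTable:
    """Model of the grow-at-full / shrink-at-quarter schedule."""
    MIN_SIZE = 8

    def __init__(self):
        self.size = self.MIN_SIZE  # number of table slots
        self.used = 0              # number of live elements

    def insert(self):
        if self.used == self.size:
            self.size *= 2         # rule 1: full -> double
        self.used += 1

    def delete(self):
        self.used -= 1
        if self.size > self.MIN_SIZE and self.used < self.size // 4:
            self.size //= 2        # rule 2: under a quarter full -> halve
```

After any sequence of operations the table size stays within a constant factor of the element count, which is the memory-proportionality claim above.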

Noam


Re: [Python-Dev] When do sets shrink?

2005-12-29 Thread Noam Raphael
On 12/29/05, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Noam Raphael wrote:
>
> > I'm not saying that practically it must be used - I'm just saying that
> > it can't be called a heuristic, and that it doesn't involve any "fancy
> > overkill size hinting or history tracking". It actually means
> > something like this:
> > 1. If you want to insert and the table is full, resize the table to
> > twice the current size.
> > 2. If you delete and the number of elements turns out to be less than
> > a quarter of the size of the table, resize the table to half of the
> > current size.
>
> sure sounds like a heuristic algorithm to me... (as in "not guaranteed to
> be optimal under all circumstances, even if it's probably quite good in all
> practical cases")

I'm not saying it's optimal, but it really is amortized O(1) per
insert/delete. I looked it up in "Introduction to Algorithms", which
gives a complicated proof. A simple explanation is that after every
resize the table is exactly half-full. Say it has n elements and the
table size is 2*n. To trigger the next resize, you have to do at least
n/2 removals of elements, or n insertions of elements, and only then
do you pay for a resize. So an O(n) resize operation happens only
after at least O(n) insertions/removals, each of which is O(1). This
means the total cost remains O(n) per n simple operations, which you
can state as amortized O(1) per simple operation.
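The amortized claim can also be checked mechanically with a small simulation (a sketch: each element moved during a resize counts as one unit of work, and the rules and minimum size are as in the scheme quoted above):

```python
def resize_work(ops, min_size=8):
    # Count the element-copying work done by the double/halve schedule
    # over a string of 'i' (insert) and 'd' (delete) operations.
    size, used, work = min_size, 0, 0
    for op in ops:
        if op == 'i':
            if used == size:
                size *= 2
                work += used       # growing copies all live elements
            used += 1
        else:
            used -= 1
            if size > min_size and used < size // 4:
                size //= 2
                work += used       # shrinking copies all live elements
    return work
```

For n inserts followed by n deletes, the total resize work stays below a small constant times 2*n - that is, O(1) amortized per operation.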

I hope that if you read this slowly it makes sense...

Noam


Re: [Python-Dev] When do sets shrink?

2005-12-31 Thread Noam Raphael
Hello,

I thought of another reason to resize the hash table when it has too
few elements. It's not only a matter of memory usage; it's also a
matter of time: iterating over a set or a dict requires scanning the
entire table. For example, iterating over a set which once had
1,000,000 members and now has 2 can take 1,000,000 operations every
time you traverse the (2) elements.
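A toy illustration of the effect (a plain list standing in for the hash table; this is not CPython's actual set implementation, just the shape of the cost):

```python
table = [None] * 1000000   # a table that once held 1,000,000 entries...
table[17] = 'a'
table[93] = 'b'            # ...but now holds only two live entries
# A scan cannot skip the empty slots, so it still visits every slot:
live = [x for x in table if x is not None]
assert live == ['a', 'b']
```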

Apologies:
1. It may be trivial to you - I'm sorry, I thought about it just now.
2. You can, of course, still do whatever tradeoff you like.

Noam


Re: [Python-Dev] When do sets shrink?

2005-12-31 Thread Noam Raphael
On 12/31/05, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> [Noam]
> > For example, iteration over a set which once had
> > 1,000,000 members and now has 2 can take 1,000,000 operations every
> > time you traverse all the (2) elements.
>
> Do you find that to be a common or plausible use case?

I don't have a concrete example at the moment, but a set which is
repeatedly filled with elements and then emptied by pop operations
doesn't seem that far-fetched to me.
>
> Was Guido's suggestion of s=set(s) unworkable for some reason?  dicts
> and sets emphasize fast lookups over fast iteration -- apps requiring
> many iterations over a collection may be better off converting to a list
> (which has no dummy entries or empty gaps between entries).

It's workable, but I think that most Python programmers haven't read
that specific message, and expect operations that should take a short
time to take a short time. Converting to a list won't help the
use-case above, and anyway, it's another thing I wouldn't expect
anyone to do - there's no reason that iterating over a set should take
a long time.

(I'm speaking of my point of view, which I believe is common. I don't
expect programs I write in Python to be super-fast - if that were the
case, I would write them in C. I do expect them to take a reasonable
amount of time, and in the case of iteration over a set, that means a
time proportional to the number of elements in the set.)
>
> Would the case be improved by incurring the time cost of 999,998 tests
> for possible resizing (one for each pop) and some non-trivial number of
> resize operations along the way (each requiring a full-iteration over
> the then current size)?
>
I believe it would. It seems to me that those 999,998 tests take not
much more than a machine clock cycle each, which means about a
millisecond on today's computers. The resize operations will take some
more milliseconds. None of it really matters, since probably
everything else will take much more time. I just ran this code

>>> s = set()
>>> for j in xrange(1000000):
... s.add(j)
...
>>> while s:
... tmp = s.pop()
...

It took 2.4 seconds to finish. And that's okay - I'm just saying that
a few additional clock ticks per operation will usually not matter
when the overall complexity stays the same, but changes in the order
of complexity can matter much more.

> Even if this unique case could be improved, what is the impact on common
> cases?  Would a downsizing scheme risk thrashing with the
> over-allocation scheme in cases with mixed adds and pops?
>
I think there shouldn't be any damage beyond those clock ticks. The
simple method I took from "Introduction to Algorithms" works no matter
what sequence of adds and pops you have.

> Is there any new information/research beyond what has been obvious from
> the moment the dict resizing scheme was born?

I wanted to say that there isn't any new information, and yet I don't
think I have to assume that everything in current Python is the best
it can be. All I did was find another reason why a downsizing scheme
might be good, and post it to ask whether people have thought about
it. If you have a document listing all the design decisions that went
into the dict implementation, please send it to me and I won't ask
about things that have already been considered.

But the answer is: yes. I believe that the current dict resizing
scheme was born before the iterator protocol was introduced, and that
may be why the current scheme doesn't try to minimize the number of
empty hash table entries.

Noam


Re: [Python-Dev] PEP 351

2006-02-11 Thread Noam Raphael
Hello,

I just wanted to say this: you can reject PEP 351, but please don't
reject the idea of frozen objects completely. I'm working on an idea
similar to that of the PEP, and I think it can be done elegantly,
without the concrete problems that Raymond pointed out. I haven't
worked on it in the last few weeks because of my job, but I hope to
come back to it soon and post a PEP and a reference implementation in
CPython.

My quick responses, mostly to try to convince that I know a bit about
what I'm talking about:

First, about the last point: I suggest that the function be named
frozen(x), which suggests that nothing happens to x - you only get a
"frozen x". I suggest that this operation be called not "freezing x"
but "making a frozen copy of x".

Now, back to the points in their original order. Frozen dicts - if you
want, you can decide that dicts aren't frozenable, and that's ok. But
if you do want to make frozen copies of dicts, it isn't really such a
problem - it's similar to hashing a tuple, which requires recursively
hashing all its elements; to make a frozen copy of a dict, you make a
frozen copy of each of its values.
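A rough sketch of what such a recursive frozen copy could look like (hypothetical: frozen() is the proposed function, and tuples/frozensets stand in here for the proposed frozen list/dict types):

```python
def frozen(x):
    # Make a hashable "frozen copy", recursing into values the same way
    # tuple hashing recurses into elements.  The original is untouched.
    if isinstance(x, dict):
        # stand-in for a frozen dict: a tuple of (key, frozen value) pairs
        return tuple(sorted((k, frozen(v)) for k, v in x.items()))
    if isinstance(x, (list, tuple)):
        return tuple(frozen(v) for v in x)   # stand-in for a frozen list
    if isinstance(x, (set, frozenset)):
        return frozenset(frozen(v) for v in x)
    return x  # assumed to be already hashable ("frozen")
```

Nothing happens to the original objects - only copies are made, which is why the operation reads as "making a frozen copy of x" rather than "freezing x".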

Treating all containers polymorphically - I don't suggest that. In my
suggestion, you may have frozen lists, frozen tuples (which are normal
tuples with frozen elements), frozen sets and frozen dicts.

Treating tuples as frozen lists - I don't suggest doing that either.
But if my suggestion is accepted, there would be no need for tuples -
frozen lists would be just as useful.

And about the other concerns:

> More important than all of the above is the thought that auto-freezing is
> like a bad C macro, it makes too much implicit and hides too much -- the
> supported methods change, there is a issue keeping in sync with the
> non-frozen original, etc.
>
> In my experience with frozensets, I've learned that freezing is not an
> incidental downstream effect; instead, it is an intentional, essential part
> of the design and needs to be explicit.

I think these concerns can only be judged against a real proposal,
along with an implementation. I have already implemented most of my
idea in CPython, and I think it's elegant and doesn't cause problems.
Of course, I may not be objective about the subject, but I only ask
that you wait for the real proposal before shooting it down.

To summarize, I see the faults in PEP 351. I think that another,
fairly similar idea might be a good one.

Have a good week,
Noam
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com