Re: [Python-Dev] Please reconsider PEP 479.

2014-11-26 Thread Hrvoje Niksic

On 11/26/2014 12:24 PM, Nick Coghlan wrote:

Now needs to be written out explicitly as:

def my_generator():
    ...
    try:
        yield next(it)
    except StopIteration:
        return
    ...


To retrieve a single value from an iterator, one can use the 
for/break/else idiom:


def my_generator():
    ...
    for val in it:
        yield val
        break
    else:
        return
    ...

In general, catching and raising StopIteration feels like something that 
should rarely be done by normal code.




Re: [Python-Dev] Python 2.x and 3.x use survey, 2014 edition

2014-12-17 Thread Hrvoje Niksic

On 12/16/2014 08:18 PM, R. David Murray wrote:

On Tue, 16 Dec 2014 10:48:07 -0800, Mark Roberts  wrote:

> Besides, using iteritems() and friends is generally a premature
> optimization, unless you know you'll have very large containers.
> Creating a list is cheap.

[...]

No.  A premature optimization is one that is made before doing any
performance analysis, so language features are irrelevant to that
labeling.  This doesn't mean you shouldn't use "better" idioms when they
are clear.


This is a relevant point. I would make it even stronger: using 
iteritems() is not a premature optimization, it is a statement of 
intent. More importantly, using items() in iteration is a statement of 
expectation that the dict will change during iteration. If this is not 
in fact the case, then items() is the wrong idiom for reasons of 
readability, not (just) efficiency.
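
For illustration, here is roughly what the two intents read like in
Python 2 (a toy dict; the values are arbitrary):

d = {'a': 1, 'b': 0, 'c': 2}

# Statement of intent: we are only looking, no snapshot needed.
for key, value in d.iteritems():
    print key, value

# Statement of expectation: the dict will change under us, so take a
# snapshot first -- deleting during iteritems() would be an error.
for key, value in d.items():
    if value == 0:
        del d[key]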




Re: [Python-Dev] How is obmalloc safe with "Invalid read of size 4" ?

2015-03-24 Thread Hrvoje Niksic

On 03/24/2015 03:28 PM, Karl Pickett wrote:

So we then tried running it under valgrind, and we got a lot of nasty
errors.  Even after reading the Misc/README.valgrind, which talks about
*uninitialized* reads being ok, I still don't see how reading from
*freed* memory would ever be safe, and why the suppression file thinks
thats ok:


PyObject_Free() is not reading *freed* memory, it is reading memory 
outside (right before) the allocated range. This is, of course, 
undefined behavior as far as C is concerned and an invalid read in the 
eyes of valgrind. Still, it's the kind of thing you can get away with if 
you are writing a heavily optimized allocator (and if your name starts 
with "Tim" and ends with "Peters").


README.valgrind explains in quite some detail why this is done. In 
short, it allows for a very fast check whether the memory passed to 
PyObject_Free() was originally allocated by system malloc or by Python's 
pool allocator.




Re: [Python-Dev] API design question: how to extend sys.settrace()?

2017-09-27 Thread Hrvoje Niksic

On 09/27/2017 02:56 PM, Victor Stinner wrote:

Hi,

In bpo-29400, it was proposed to add the ability to trace not only
function calls but also instructions at the bytecode level. I like the
idea, but I don't see how to extend sys.settrace() to add a new
"trace_instructions: bool" optional (keyword-only?) parameter without
breaking the backward compatibility. Should we add a new function
instead?


One possibility would be for settrace to query the capability on the 
part of the provided callable.


For example:

def settrace(tracefn):
if getattr(tracefn, '__settrace_trace_instructions__', False):
... we need to trace instructions ...

In general, the "trace function" can be upgraded to the concept of a 
"trace object" with (optional) methods under than __call__. This is 
extensible, easy to document, and fully supports the original 
"old=sys.gettrace(); ...; sys.settrace(old)" idiom.



Re: [Python-Dev] Make the stable API-ABI usable

2017-11-20 Thread Hrvoje Niksic

On 11/19/2017 12:50 PM, Serhiy Storchaka wrote:

But if PyTuple_GET_ITEM() is used for getting a reference to a C array
of items it can't be replaced with PyTuple_GetItem(). And actually there
is no replacement for this case in the limited API.

  PyObject **items = &PyTuple_GET_ITEM(tuple, 0);


That use case might be better covered with a new function, e.g. 
PyTuple_GetStorage, which would return the PyObject ** pointing to the 
first element of the internal array.


This function would serve two purposes:

* provide the performance benefits of PyTuple_GET_ITEM in tight loops, 
but without the drawback of exposing the PyTuple layout to the code that 
invokes the macro;


* allow invocation of APIs that expect a pointer to contiguous storage, 
such as STL algorithms that expect random access iterators.


Something similar is already available as PySequence_Fast_ITEMS, except 
that one is again a macro, and is tied to the PySequence_Fast API, which 
may not be appropriate for the kind of performance-critical code where 
PyTuple_GET_ITEM tends to be used. (That kind of code is designed to 
deal specifically with lists or tuples and doesn't benefit from implicit 
conversion of arbitrary sequences to a temporary list; that conversion 
would only serve to mask bugs.)



Re: [Python-Dev] Tricky way of of creating a generator via a comprehension expression

2017-11-24 Thread Hrvoje Niksic

Guido van Rossum writes:

And my mind boggles when considering a generator expression
containing yield that is returned from a function. I tried this
and cannot say I expected the outcome:

    def f():
        return ((yield i) for i in range(3))
    print(list(f()))

In both Python 2 and Python 3 this prints

    [0, None, 1, None, 2, None]

Even if there's a totally logical explanation for that, I still
don't like it, and I think yield in a comprehension should be
banned. From this it follows that we should also simply ban
yield  from comprehensions.



Serhiy Storchaka writes:

This behavior doesn't look correct to me and Ivan.

The behavior is surprising, but it seems quite consistent with how 
generator expressions are defined in the language. A generator 
expression is defined by the language reference as "compact generator 
notation in parentheses", which yields (sic!) a "new generator object".


I take that to mean that a generator expression is equivalent to 
defining and calling a generator function. f() can be transformed to:


def f():
    def _gen():
        for i in range(3):
            ret = yield i
            yield ret
    return _gen()

The transformed version shows that there are *two* yields per iteration 
(one explicitly written and one inserted by the transformation), which 
is the reason why 6 values are produced. The None values come from list 
constructor calling __next__() on the generator, which (as per 
documentation) sends None into the generator. This None value is yielded 
after the "i" is yielded, which is why Nones follow the numbers.


Hrvoje


Re: [Python-Dev] Reading Python source file

2015-11-18 Thread Hrvoje Niksic

On 11/18/2015 03:31 AM, Nick Coghlan wrote:

That behaviour is then inherited at the command line by both the -m
switch and the support for executing directories and zip archives.
When we consider that the "-c" switch also executes an in-memory
string, direct script execution is currently the odd one out in *not*
reading the entire source file into memory first, so Serhiy's proposed
simplification of the implementation makes sense to me.


Reading the whole script into memory will incur overhead when executing 
scripts that contain (potentially large) data embedded after the end of 
the script source.


The technique of reading data from sys.argv[0] is probably obsolete now 
that Python supports executing zipped archives, but it is popular in 
shell scripting and might still be used for self-extracting scripts that 
must support older Python versions. This feature doesn't affect imports 
and -c which are not expected to contain non-Python data.
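
For concreteness, a minimal sketch of the idiom (the marker is made up;
in Python the payload would have to be syntactically inert, e.g. comment
lines, since the parser reads to EOF):

import sys

MARKER = '# ===ARCHIVE===\n'

def payload():
    with open(sys.argv[0]) as f:
        for line in f:
            if line == MARKER:
                break
        return ''.join(line[2:] for line in f)  # strip the '# ' prefix

# ===ARCHIVE===
# embedded data lines would follow here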


Hrvoje



Re: [Python-Dev] Reading Python source file

2015-11-18 Thread Hrvoje Niksic

On 11/18/2015 04:48 PM, Guido van Rossum wrote:

That trick doesn't work unless the data looks like Python comments or
data (e.g. a docstring). Python has always insisted on being able to
parse until EOF. The only extreme case would be a small script
followed by e.g. 4 GB of comments (where the old parser would indeed
be more efficient). But unless you can point me to an occurrence of
this in the wild I'm going to speculate that you just made this up
based on the shell analogy (which isn't perfect).


If this never really worked in Python, feel free to drop the issue. I 
may be misremembering the language in which the scripts I saw using this 
technique years ago were written - most likely sh or Perl.


Hrvoje



Re: [Python-Dev] Inconsistency of PyModule_AddObject()

2016-04-27 Thread Hrvoje Niksic

On 04/27/2016 09:14 AM, Serhiy Storchaka wrote:

There are three functions (or at least three documented functions) in C
API that "steals" references: PyList_SetItem(), PyTuple_SetItem() and
PyModule_AddObject(). The first two "steals" references even on failure,
and this is well known behaviour. But PyModule_AddObject() "steals" a
reference only on success. There is nothing in the documentation that
points on this.


This inconsistency has caused bugs (or, more fairly, potential leaks) 
before, see http://bugs.python.org/issue1782


Unfortunately, the suggested Python 3 change to PyModule_AddObject was 
not accepted.



1. Add a new function PyModule_AddObject2(), that steals a reference
even on failure.


This sounds like a good idea, except the name could be prettier :), e.g. 
PyModule_InsertObject. PyModule_AddObject could be deprecated.


Hrvoje



Re: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

2014-01-07 Thread Hrvoje Niksic

On 01/07/2014 02:22 PM, Serhiy Storchaka wrote:

Most popular formatting codes in Mercurial sources:

 2519 %s
  493 %d
  102 %r
   48 %Y
   47 %M
   41 %H
   39 %S
   38 %m
   33 %i
   29 %b

[...]

Are you sure you're not including str[fp]time formats in the count?



Re: [Python-Dev] Python 3.4: Cherry-picking into rc2 and final

2014-02-19 Thread Hrvoje Niksic

On 02/19/2014 01:20 PM, Antoine Pitrou wrote:

On Tue, 18 Feb 2014 18:46:16 -0800
Guido van Rossum  wrote:

I do think there's one legitimate concern -- someone might pull a diff from
Larry's branch and then accidentally push it back to the public repo, and
then Larry would be in trouble if he was planning to rebase that diff. (The
joys of DVCS -- we never had this problem in the cvs or svn days...)


I don't think I understand the concern. Why is this different from any
other mistake someone may make when pushing code?
Also "rebase" is only really ok on private repos, as soon as something
is published you should use "merge".


If the branch were private, pushing to it would not count as 
"publishing", but would still provide the benefit of having a redundant 
server-side backup of the data. Being able to rebase without fuss is a 
possible legitimate reason to keep the branch private, which Guido 
provided in response to Matthias's question:


> sorry, but this is so wrong. Is there *any* reason why to keep
> this branch private?



Re: [Python-Dev] Intricacies of calling __eq__

2014-03-19 Thread Hrvoje Niksic

On 03/18/2014 10:19 PM, Paul Moore wrote:

Surely in the presence of threads the optimisation is invalid anyway


Why? As written, the code uses no synchronization primitives to ensure 
that the modifications to the dict are propagated at a particular point. 
As a consequence, it cannot rely on the modification done at a time that 
coincides with execution at HERE to be immediately propagated to all 
threads.


The optimization is as valid as a C compiler rearranging variable 
assignments, which also "breaks" unsynchronized threaded code.




Re: [Python-Dev] Returning None from methods that mutate object state

2014-05-19 Thread Hrvoje Niksic

On 05/17/2014 10:26 AM, Terry Reedy wrote:
> When list.pop was added, the convention was changed to
> "do not return the 'self' parameter"

Do you have a reference for this? It is my understanding that the 
convention is for mutators to return None, in order to make it clear 
that the change is destructive. For example, the tutorial at 
https://docs.python.org/3.4/tutorial/datastructures.html says:


"""
You might have noticed that methods like insert, remove or sort that 
modify the list have no return value printed – they return None. [1] 
This is a design principle for all mutable data structures in Python.

"""

Methods like list.pop and dict.pop would seem like a case of 
"practicality beats purity", because it's more convenient (and faster) to 
delete and retrieve the value in one go.
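
For example:

>>> lst = [3, 1, 2]
>>> lst.sort()     # pure mutator: returns None
>>> lst
[1, 2, 3]
>>> lst.pop()      # mutates *and* returns the value
3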




Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-06 Thread Hrvoje Niksic

On 06/04/2014 05:52 PM, Mark Lawrence wrote:

On 04/06/2014 16:32, Steve Dower wrote:


If copying into a separate list is a problem (memory-wise), re.finditer('\\S+', 
string) also provides the same behaviour and gives me the sliced string, so 
there's no need to index for anything.



Out of idle curiosity is there anything that stops MicroPython, or any
other implementation for that matter, from providing views of a string
rather than copying every time?  IIRC memoryviews in CPython rely on the
buffer protocol at the C API level, so since strings don't support this
protocol you can't take a memoryview of them.  Could this actually be
implemented in the future, is the underlying C code just too
complicated, or what?



Memory view of Unicode strings is controversial for two reasons:

1. It exposes the internal representation of the string. If memoryviews 
of strings were supported in Python 3, PEP 393 would not have been 
possible (without breaking that feature).


2. Even if it were OK to expose the internal representation, it might 
not be what the users expect. For example, memoryview("Hrvoje") would 
return a view of a 6-byte buffer, while memoryview("Nikšić") would 
return a view of a 12-byte UCS-2 buffer. The user of a memory view might 
expect to get UCS-2 (or UCS-4, or even UTF-8) in all cases.


An implementation that decided to export strings as memory views might 
be forced to make a decision about internal representation of strings, 
and then stick to it.


Bytes objects don't have these issues, which is why in Python 2.7 
memoryview("foo") works just fine, as does memoryview(b"foo") in Python 3.




Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-06 Thread Hrvoje Niksic

On 06/06/2014 05:59 PM, Terry Reedy wrote:

The other problem is that a small slice view of a large object keeps the
large object alive, so a view user needs to think carefully about
whether to make a copy or create a view, and later to copy views to
delete the base object. This is not for beginners.


And this was important enough that Java 7 actually removed the 
long-standing feature of String.substring creating a string that shares 
the character array with the original.


http://java-performance.info/changes-to-string-java-1-7-0_06/



Re: [Python-Dev] Rationale for different signatures of tuple.__new__ and namedtuple.__new__

2013-02-18 Thread Hrvoje Niksic

On 02/18/2013 03:32 PM, John Reid wrote:

I can do

tuple([1,2,3])

but not:

from collections import namedtuple
namedtuple('B', 'x y z')([1,2,3])

I get a TypeError: __new__() takes exactly 4 arguments (2 given)
However I can do:

namedtuple('B', 'x y z')._make([1,2,3])

So namedtuple's _make classmethod looks a lot like tuple's __new__().
What's the rationale for this? Wouldn't it be better to share the same
signature for __new__?


Sharing the constructor signature with tuple would break the common case of:

namedtuple('B', 'x y z')(1, 2, 3)



Re: [Python-Dev] can't assign to function call

2013-03-18 Thread Hrvoje Niksic

On 03/18/2013 03:23 PM, Chris Angelico wrote:

The languages that permit you to assign to a function call all have
some notion of a reference type.


Assigning to function calls is orthogonal to reference types.  For 
example, Python manages assignment to subscripts without having 
references just fine:


val = obj[index]  # val = obj.__getitem__(index)
obj[index] = val  # obj.__setitem__(index, val)

In analogy with that, Python could implement what looks like assignment 
to function call like this:


val = f(arg)  # val = f.__call__(arg)
f(arg) = val  # f.__setcall__(arg, val)

I am not arguing that this should be added, I'm only pointing out that 
Python's object customization is not fundamentally at odds with 
assignment to function calls.  Having said that, I am in fact arguing 
that Python doesn't need them.  All C++ uses of operator() overloads can 
be implemented with the subscript operator.


Even if one needs more kinds of assignment than there are operators, 
Python can provide them as easily as C++.  For example, 
std::vector::operator[] provides access to the container without error 
checking, while std::vector::at() checks bounds:


vec[i] = val  // no error checking
vec.at(i) = val   // error checking

This is trivially translated to Python as:

vec[i] = val  # primary functionality, use __setitem__
vec.at[i] = val   # secondary functionality, __setitem__ on a proxy
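
A sketch of that proxy in Python (the names are made up; the point is
the pattern, not the exact C++ semantics):

class _AtProxy(object):
    """Bounds-checked view of a sequence, the vec.at() analogue."""
    def __init__(self, seq):
        self._seq = seq
    def _check(self, i):
        if not 0 <= i < len(self._seq):
            raise IndexError(i)
    def __getitem__(self, i):
        self._check(i)
        return self._seq[i]
    def __setitem__(self, i, val):
        self._check(i)
        self._seq[i] = val

class Vec(list):
    @property
    def at(self):
        return _AtProxy(self)

vec = Vec([1, 2, 3])
vec.at[1] = 20    # checked write
vec[-1]           # 3: plain indexing accepts negative indices
vec.at[-1]        # IndexError: the checked proxy rejects them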



Re: [Python-Dev] can't assign to function call

2013-03-18 Thread Hrvoje Niksic

On 03/18/2013 04:40 PM, Steven D'Aprano wrote:

In analogy with that, Python could implement what looks like assignment to 
function call like this:

val = f(arg)  # val = f.__call__(arg)
f(arg) = val  # f.__setcall__(arg, val)


That's all very well, but what would it do? It's not enough to say
that the syntax could exist, we also need to have semantics.

I am not the best person to answer because I go on to argue that this 
syntax is not needed in Python at all (anything it can do can be 
implemented with __setitem__ at no loss of clarity).  Still, if such a 
feature existed in Python, I imagine people would use it to set the same 
resource that the function obtains, where such a thing is applicable.



Aside: I'd reverse the order of the arg, val in any such hypothetical
__setcall__, so as to support functions with zero or more arguments:


f(*args, **kwargs) = val  <=>  f.__setcall__(val, *args, **kwargs)


That would be a better design, I agree.



Re: [Python-Dev] Semantics of __int__(), __index__()

2013-04-03 Thread Hrvoje Niksic

On 04/03/2013 01:17 PM, Nick Coghlan wrote:

 > > I like Nick's answer to that: int *should* always return something of
 > > exact type int.  Otherwise you're always left wondering whether you
 > > have to do "int(int(x))", or perhaps even "int(int(int(x)))", to be
 > > absolutely sure of getting an int.
 >
 > Agreed.

Perhaps we should start emitting a DeprecationWarning for int subclasses
returned from __int__ and __index__ in 3.4?


Why would one want to be absolutely sure of getting an int?

It seems like a good feature that an __int__ implementation can choose 
to return an int subclass with additional (and optional) information. 
After all, int subclass instances should be usable everywhere where ints 
are, including in C code.  I can imagine numpy and similar projects 
would be making use of this ability already -- just think of uses for 
numpy's subclasses of "float".


If one wants to break the abstraction and be absolutely positively sure 
of getting an int and not a subclass thereof, they can write something 
like (0).__add__(obj).  But I suspect this will be extremely rare.
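
To make that concrete:

class TaggedInt(int):
    pass    # imagine extra attributes or methods here

x = TaggedInt(42)
print(type(x + 0))           # <class 'int'> -- int arithmetic returns exact ints
print(type((0).__add__(x)))  # <class 'int'> -- the explicit spelling above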



Re: [Python-Dev] Semantics of __int__(), __index__()

2013-04-04 Thread Hrvoje Niksic

Eric Snow:

On Wed, Apr 3, 2013 at 6:47 AM, Hrvoje Niksic  wrote:

It seems like a good feature that an __int__ implementation can choose to
return an int subclass with additional (and optional) information. After
all, int subclass instances should be usable everywhere where ints are,
including in C code.


Unless you want to try to use the concrete C-API in CPython.  In my
experience the concrete API is not very subclass friendly.


Nick:
> Using it with subclasses is an outright bug (except as part of
> a subclass implementation).

This is true for mutable objects like dicts and lists where calling 
things like PyDict_SetItem will happily circumvent the object.


But for ints and floats, all that the C code really cares about is the 
object's intrinsic value as returned by PyLong_AS_LONG and friends, 
which is constant and unchangeable by subclasses.


The typical reason to subclass int is to add more information or new 
methods on the instance, not to break basic arithmetic. Doing anything 
else breaks substitutability and is well outside the realm of "consenting 
adults". Someone who wants to change basic arithmetic is free to 
implement an independent int type.


Hrvoje



Re: [Python-Dev] PEP 409 and the stdlib

2013-05-21 Thread Hrvoje Niksic

On 05/20/2013 05:15 PM, Ethan Furman wrote:

1)  Do nothing and be happy I use 'raise ... from None' in my own libraries

2)  Change the wording of 'During handling of the above exception, another 
exception occurred' (no ideas as to what at
the moment)


The word "occurred" misleads one to think that, during handling of the 
real exception, an unrelated and unintended exception occurred.  This is 
not the case when the "raise" keyword is used.  In that case, the 
exception was intentionally *converted* from one type to another.  For 
the "raise" case a wording like the following might work better:


The above exception was converted to the following exception:
...

That makes it clear that the conversion was explicit and (hopefully) 
intentional, and that the latter exception supersedes the former.


Hrvoje


Re: [Python-Dev] PEP 409 and the stdlib

2013-05-21 Thread Hrvoje Niksic

On 05/21/2013 10:36 AM, Serhiy Storchaka wrote:

 The above exception was converted to the following exception:
 ...

That makes it clear that the conversion was explicit and (hopefully)
intentional, and that the latter exception supersedes the former.


How do you distinguish intentional and unintentional exceptions?


By the use of the "raise" keyword.  Given the code:

try:
    x = d['key']
except KeyError:
    raise BusinessError(...)

...the explicit raising is a giveaway that the new exception was quite 
intentional.


Hrvoje



Re: [Python-Dev] PEP 409 and the stdlib

2013-05-21 Thread Hrvoje Niksic

On 05/21/2013 11:56 AM, Serhiy Storchaka wrote:

try:
    x = d['key']
except KeyError:
    x = fallback('key')

def fallback(key):
    if key not in a:
        raise BusinessError(...)
    return 1 / a[key]  # possible TypeError, ZeroDivisionError, etc


Yes, in that case the exception will appear unintentional and you get 
the old message — it's on a best-effort basis.


Hrvoje



Re: [Python-Dev] PEP 409 and the stdlib

2013-05-21 Thread Hrvoje Niksic

On 05/21/2013 02:57 PM, Serhiy Storchaka wrote:

21.05.13 13:05, Hrvoje Niksic написав(ла):

On 05/21/2013 11:56 AM, Serhiy Storchaka wrote:

try:
    x = d['key']
except KeyError:
    x = fallback('key')

def fallback(key):
    if key not in a:
        raise BusinessError(...)
    return 1 / a[key]  # possible TypeError, ZeroDivisionError, etc


Yes, in that case the exception will appear unintentional and you get
the old message — it's on a best-effort basis.


In both cases the BusinessError exception raised explicitly. How do you
distinguish one case from another?


In my example code the "raise" keyword appears lexically inside the 
"except" clause.  The compiler would automatically emit a different 
raise opcode in that case.


NB in your example the "raise" is just as intentional, but invoked from 
a different function, which causes the above criterion to result in a 
false negative.  Even so, the behavior would be no worse than now; 
you'd just get the old message.


Hrvoje



Re: [Python-Dev] PEP 409 and the stdlib

2013-06-21 Thread Hrvoje Niksic

On 05/21/2013 10:36 AM, Serhiy Storchaka wrote:

21.05.13 10:17, Hrvoje Niksic написав(ла):

On 05/20/2013 05:15 PM, Ethan Furman wrote:

1)  Do nothing and be happy I use 'raise ... from None' in my own
libraries

2)  Change the wording of 'During handling of the above exception,
another exception occurred' (no ideas as to what at
the moment)


The word "occurred" misleads one to think that, during handling of the
real exception, an unrelated and unintended exception occurred.  This is
not the case when the "raise" keyword is used.  In that case, the
exception was intentionally *converted* from one type to another.  For
the "raise" case a wording like the following might work better:

 The above exception was converted to the following exception:
 ...

That makes it clear that the conversion was explicit and (hopefully)
intentional, and that the latter exception supersedes the former.


How do you distinguish intentional and unintentional exceptions?


By the use of the "raise" keyword.  Given the code:

try:
    x = bla['foo']
except KeyError:
    raise BusinessError(...)

...explicit raise is a giveaway that the exception replacement was quite 
intentional.


Hrvoje



Re: [Python-Dev] Add a "transformdict" to collections

2013-09-10 Thread Hrvoje Niksic

On 09/10/2013 02:24 PM, Paul Moore wrote:

td['FOO'] = 42
td['foo'] = 32
list(td.keys())


['FOO'] or ['foo']? Both answers are justifiable.


Note that the same question can be reasonably asked for dict itself:

>>> d = {}
>>> d[1.0] = 'foo'
>>> d[1] = 'bar'
>>> d
{1.0: 'bar'}

So, dict.__setitem__ only replaces the value, leaving the original key 
in place. transformdict should probably do the same, returning 'FOO' in 
your example.




Re: [Python-Dev] sys.intern should work on bytes

2013-09-23 Thread Hrvoje Niksic

On 09/20/2013 06:50 PM, PJ Eby wrote:

On Fri, Sep 20, 2013 at 9:54 AM, Jesus Cea  wrote:

Why str/bytes doesn't support weakrefs, beside memory use?


The typical use case for weakrefs is to break reference cycles,


Another typical use case, and the prime reason why languages without 
reference counting tend to introduce weak references, is managing object 
caches with automatic disposal of otherwise unused items. Such a cache 
is rarely necessary for primitive objects, so Python's choice to spare 
memory for weakrefs is quite justified.


However, if one wanted to implement their own sys.intern(), the 
inability to take weak references to strings would become a problem. 
This is one reason why sys.intern() directly fiddles with reference 
counts instead of reusing the weakref machinery. (The other, of course, 
being that intern predates weakrefs by many years.)
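
A sketch of what a user-level intern could look like if weak references
were available -- worked around here with a str subclass, since subclass
instances (unlike str itself) do accept weakrefs; the names are made up:

import weakref

class Symbol(str):
    pass    # plain str subclass: instances support weak references

_table = weakref.WeakValueDictionary()

def my_intern(s):
    # Return the canonical Symbol equal to s; the entry disappears
    # automatically once no one else holds a reference to it.
    sym = _table.get(s)
    if sym is None:
        sym = _table[s] = Symbol(s)
    return sym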




Re: [Python-Dev] type.__subclasses__() doesn't work

2013-10-09 Thread Hrvoje Niksic

On 10/09/2013 02:22 PM, Peter Otten wrote:

py> type.__subclasses__(type)
[, ]


The underlying problem seems to be that there is no helper function to
bypass the instance attribute.


Note that the problem is specific to the "type" type, which is its own 
metatype. With other types that get __subclasses__ from type, there is 
no problem with just calling __subclasses__():


>>> int.__subclasses__()
[<class 'bool'>]



Re: [Python-Dev] List mutation in list_repr?

2016-12-06 Thread Hrvoje Niksic

> and I also don’t see any clue in the source as to when [list mutation]
> would actually happen. Since inside the loop, the list object `v` is
> never accessed other than passing `v->ob_item[i]` to the recursive
> repr call, there shouldn’t be any mutation on the list object itself.

The individual object can have a reference to the list and (in extreme 
cases) do with it what it pleases:


class Evil:
    def __init__(self, l):
        self.l = l

    def __repr__(self):
        del self.l[:]
        return "evil"

l = []
l.append(Evil(l))
l.append(Evil(l))
print(l)

That is not something normal Python code does, but it shouldn't be 
allowed to crash the interpreter, either.




Re: [Python-Dev] multiprocessing not compatible with functional.partial

2009-02-12 Thread Hrvoje Niksic

Calvin Spealman wrote:

I don't think it would be unreasonable to consider either 1) making
functools.partial picklable (I don't know how feasible this is)


It's not only feasible, but quite easy and, I think, useful.  A 
"partial" instance is a simple triplet of (function, args, kwds), and it 
can be pickled as such.  For example:


>>> import copy_reg, functools
>>> def _reconstruct_partial(f, args, kwds):
... return functools.partial(f, *args, **(kwds or {}))
...
>>> def _reduce_partial(p):
... return _reconstruct_partial, (p.func, p.args, p.keywords)
...
>>> copy_reg.pickle(functools.partial, _reduce_partial)

Test:

>>> import operator, cPickle as cp
>>> p = functools.partial(operator.add, 3)
>>> p(10)
13
>>> cp.dumps(p)
'c__main__\n_reconstruct_partial\np1\n(coperator\nadd\np2\n(I3\ntp3\nNtRp4\n.'
>>> p2 = cp.loads(_)
>>> p2(10)
13

Ideally this should be implemented in the functools.partial object itself.


Re: [Python-Dev] PEP 372 -- Adding an ordered directory to collections ready for pronouncement

2009-03-04 Thread Hrvoje Niksic

Steven D'Aprano wrote:

Gisle Aas wrote:

Instead of introducing a sorteddict I would instead suggest that the 
future should bring an odict with a sort method; possibly also 
keys_sorted and items_sorted methods.


Instead of odict.sorted(), that can be spelled:

sorted(odict)  # sort the keys
sorted(odict.values())  # sort the values
sorted(odict.items())  # sort the (key, value) pairs


All of these are useful, but not when you want to sort the odict 
in-place.  Since ordered dict is all about order, a method for changing 
the underlying key order seems quite useful.  An odict.sort() would be 
easy to implement both in the current code (where it would delegate to 
self._keys.sort()) and in an alternative implementation using a linked 
list of keys (variants of mergesort work very nicely with linked lists).
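
For reference, the effect of such an in-place sort can be emulated on
the proposed collections.OrderedDict by rebuilding (a sketch, not the
actual proposed API):

from collections import OrderedDict

def sort_odict(od, key=None, reverse=False):
    # Same observable effect as an in-place odict.sort().
    items = sorted(od.items(), key=key, reverse=reverse)
    od.clear()
    od.update(items)

od = OrderedDict([('b', 2), ('c', 3), ('a', 1)])
sort_odict(od)
print(list(od))   # ['a', 'b', 'c']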



The idea of a SortedDict is that it should be sorted at all times,


A SortedDict would be nice to have, but we might rather want to use a 
balanced tree for its C implementation, i.e. not inherit from dict 
at all.



Re: [Python-Dev] asyncore fixes in Python 2.6 broke Zope's version of medusa

2009-03-06 Thread Hrvoje Niksic

Greg Ewing wrote:

Antoine Pitrou wrote:


For starters, since py3k is supposed to support non-blocking IO, why not write a
portable API to make a raw file or socket IO object non-blocking?


I think we need to be clearer what we mean when we talk
about non-blocking in this context. Normally when you're
using select/poll you *don't* make the underlying file
descriptor non-blocking in the OS sense. The non-blockingness
comes from the fact that you're using select/poll to make
sure the fd is ready before you access it.

So I don't think it makes sense to talk about having a
non-blocking API as a separate thing from a select/poll
wrapper. The select/poll wrapper *is* the non-blocking
API.


This is not necessarily the case.  In fact, modern sources often 
recommend using poll along with the non-blocking sockets for (among 
other things) performance reasons.  For example, when a non-blocking 
socket becomes readable, you don't read from it only once and go back to 
the event loop, you read from it in a loop until you get EAGAIN.  This 
allows for processing of fast-incoming data with fewer system calls.
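
The pattern looks roughly like this (a sketch; it assumes the socket
was put in non-blocking mode with setblocking(0)):

import errno
import socket

def drain(sock):
    # Read everything currently available, stopping at EAGAIN rather
    # than going back to the event loop after a single recv().
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except socket.error, e:        # Python 2 spelling, to match the era
            if e.args[0] in (errno.EAGAIN, errno.EWOULDBLOCK):
                break                  # kernel buffer drained for now
            raise
        if not data:
            break                      # peer closed the connection
        chunks.append(data)
    return ''.join(chunks)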


Linux's select(2) man page includes a similar advice with different 
motivation:


   Under Linux, select() may report a socket file descriptor as
   "ready for reading", while nevertheless a subsequent read
   blocks.  This could for example happen when data has arrived
   but upon examination has wrong checksum and is discarded.
   There may be other circumstances in which a file descriptor
   is spuriously reported as ready.  Thus it may be safer to
   use O_NONBLOCK on sockets that should not block.

Even if you don't agree that using O_NONBLOCK with select/poll is the 
best approach to non-blocking, I think there is enough existing practice 
of doing this to warrant separate consideration of non-blocking sockets 
(in the OS sense) and select/poll.



Re: [Python-Dev] asyncore fixes in Python 2.6 broke Zope's version of medusa

2009-03-06 Thread Hrvoje Niksic

Greg Ewing wrote:
> Even if you don't agree that using O_NONBLOCK with select/poll is the
> best approach to non-blocking, I think there is enough existing practice
> of doing this to warrant separate consideration of non-blocking sockets
> (in the OS sense) and select/poll.


I'm not saying there isn't merit in having support for
non-blocking file descriptors, only that it's not in
any sense a prerequisite or first step towards a
select/poll wrapper. They're orthogonal issues, even
if you might sometimes want to use them together.


In that case we are in agreement.  Looking back, I was somewhat confused 
by this paragraph:


So I don't think it makes sense to talk about having a
non-blocking API as a separate thing from a select/poll
wrapper. The select/poll wrapper *is* the non-blocking
API.

If they're orthogonal, then it does make sense to talk about having a 
separate non-blocking socket API and poll API, even if the latter can be 
used to implement non-blocking *functionality* (hypothetical Linux 
issues aside).



Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Hrvoje Niksic

Joachim König wrote:

To me, the flaw seems to be in the close() call (of the operating 
system). I'd expect the data to be in a persistent state once the 
close() returns.


I wouldn't, because that would mean that every cp -r would effectively 
do an fsync() for each individual file it copies, which would bog down 
in the case of copying many small files.  Operating systems aggressively 
buffer file systems for good reason: performance of the common case.



Why has this ext4 problem not come up for other filesystems?


It has come up for XFS many many times, for example 
https://launchpad.net/ubuntu/+bug/37435


ext3 was resilient to the problem because of its default allocation 
policy; now that ext4 has implemented the same optimization XFS had 
before, it shares the problems.



Re: [Python-Dev] Ext4 data loss

2009-03-11 Thread Hrvoje Niksic

Christian Heimes wrote:

Guido van Rossum wrote:

Let's not think too Unix-specific. If we add such an API it should do
something on Windows too -- the app shouldn't have to test for the
presence of the API. (And thus the API probably shouldn't be called
fsync.)


In my initial proposal one and a half hour earlier I suggested 'sync()'
as the name of the method and 'synced' as the name of the flag that
forces a fsync() call during the close operation.


Maybe it would make more sense for "synced" to force fsync() on each 
flush, not only on close.  I'm not sure how useful it is, but that's 
what "synced" would imply to me.  Maybe it would be best to avoid having 
such a variable, and expose a close_sync() method instead?
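
A sketch of both variants, as a thin wrapper (the names are invented
for illustration):

import os

class SyncedFile(object):
    def __init__(self, f, synced=False):
        self._f = f
        self.synced = synced    # if true, every flush also fsyncs

    def write(self, data):
        self._f.write(data)

    def flush(self):
        self._f.flush()
        if self.synced:
            os.fsync(self._f.fileno())   # force data to stable storage

    def close_sync(self):
        self._f.flush()
        os.fsync(self._f.fileno())
        self._f.close()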



Re: [Python-Dev] In-place operators

2009-03-18 Thread Hrvoje Niksic

Martin v. Löwis wrote:

Certainly, the doc string is wrong:

  isub(a, b) -- Same as a -= b.

That's not quite the same - you would have to write

  a = isub(a, b) -- Same as a -= b.


It seems a perfectly fine solution is simply to fix the docstring, 
exactly like that:


a = isub(a, b) -- Same as a -= b.

This both corrects the error when using isub with immutable types and 
educates user as to how a -= b actually works.



Re: [Python-Dev] Proposal: new list function: pack

2009-03-20 Thread Hrvoje Niksic

Isaac Morland wrote:

I propose this because i need a lot of times pack and slide function
over list and this one
combine the two in a generator way.


I've written functions with a subset of this functionality on more than 
one occasion.  Having it in itertools looks like it would be useful to a 
lot of people.



See the Python documentation for zip():

http://docs.python.org/library/functions.html#zip


zip can be used to achieve this purpose, but only with serious 
itertools-fu.  If I want to iterate over a list [1, 2, 3, 4] looking at 
pairs (1, 2) and (3, 4), it would be much nicer to write:


for a, b in itertools.pack(l, 2):
...

than

for a, b in itertools.izip(*[iter(l)]*2):
...

which is what the zip documentation proposes.  The former would be clear 
to anyone looking at the documentation of "pack" (and maybe even without 
it if we choose a better name), while the latter requires quite some 
deciphering, followed by a careful check of izip's documentation to 
confirm that it's actually legal to rely on argument evaluation order 
and on izip not peeking ahead at its iterables, as that code does.


izip is not the only contender for this pattern; something similar is 
possible using groupby, but it's hard to make it fit in an easily 
understandable line either.  This is the shortest I came up with:


def pack(iterable, n):
    cycler = (i for i in itertools.count() for j in xrange(n))
    return (g for k, g in
            itertools.groupby(iterable, lambda x: cycler.next()))

This has the nice property that it returns iterables rather than tuples, 
although tuples are probably good enough (they seem to be good enough 
for izip).


The name "pack" is a bit too cryptic, even by itertools standards, so it 
might be better to choose a name that conveys the intention of returning 
"groups of n adjacent elements" (group_adjacent?).  To fit with the rest 
of itertools, and to be really useful, the function shouldn't insist on 
sequences, but should accept any iterable.



http://drj11.wordpress.com/2009/01/28/my-python-dream-about-groups/


That posting ends with:

"""
It still scares me a bit.
This code is obviously ridiculous. I can’t help feeling I’ve missed a 
more Pythonic way of doing it.

"""

Looking at izip(*[iter(l)]*n), I tend to agree.


Re: [Python-Dev] speeding up PyObject_GetItem

2009-03-24 Thread Hrvoje Niksic

Nick Coghlan wrote:

Many of the routines in abstract.c check their parameters for NULL, as a
sanity check, and throw a SystemError if NULL is detected.  I'm tempted
to suggest making these checks only when Py_DEBUG is defined, but I
suspect if you wanted it that way, you would have changed it already. ;)

Assuming you really want the NULL checks in production Python, there are
5 checks for NULL even though there are only two parameters.  Seems like
overkill?


The main problem is that many of these methods are not only used
internally, but are *also* part of the public C API made available to
extension modules. We want misuse of the latter to trigger exceptions,
not segfault the interpreter.


Agreed, and more importantly, I have yet to be convinced that those NULL 
checks introduce a measurable slowdown.  Daniel, have you tried 
measuring the performance difference with only the NULL checks removed?



Re: [Python-Dev] suggestion for try/except program flow

2009-03-27 Thread Hrvoje Niksic

Mark Donald wrote:

I frequently have this situation:

try:
    try:
        raise Thing
    except Thing, e:
        # handle Thing exceptions
        raise
except:
    # handle all exceptions, including Thing


How about:

try:
    ... code that can raise Thing or another exception ...
except Exception, e:
    if isinstance(e, Thing):
        # handle Thing
    # generic exception handling


Re: [Python-Dev] suggestion for try/except program flow

2009-03-27 Thread Hrvoje Niksic

Mark Donald wrote:

Thanks for the idea, but I believe it has a different outcome. You'd
have to copy the generic handler into an except clause to get exactly
the behaviour I'm looking for, which is worse than nesting the try
blocks


Then simply change Exception to BaseException.  Since all exceptions 
should derive from BaseException, there should be no difference between 
that and a bare "except:" clause.
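
That is, keeping the shape of the earlier example (handle_thing and
handle_any are placeholders):

try:
    do_work()    # may raise Thing or anything else
except BaseException, e:
    if isinstance(e, Thing):
        handle_thing(e)
    handle_any(e)    # generic handling, including Thing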



Re: [Python-Dev] Let's update CObject API so it is safe and regular!

2009-04-02 Thread Hrvoje Niksic

Greg Ewing wrote:

Attaching some kind of type info to a CObject and having
an easy way of checking it makes sense to me. If the
existing CObject API can't be changed, maybe a new
enhanced one could be added.


I thought the entire *point* of CObject was that it's an opaque box 
without any info whatsoever, except that which is known and shared by 
its creator and its consumer.


If we're adding type information, then please make it a Python object 
rather than a C string.  That way the creator and the consumer can use a 
richer API to query the "type", such as by calling its methods or by 
inspecting it in some other way.  Instead of comparing strings with 
strcmp, it could use PyObject_RichCompareBool, which would allow a much 
more flexible way to define "types".  Using a PyObject also ensures that 
the lifecycle of the attached "type" is managed by the well-understood 
reference-counting mechanism.



Re: [Python-Dev] Let's update CObject API so it is safe and regular!

2009-04-03 Thread Hrvoje Niksic

Larry Hastings wrote:
If we're adding type information, then please make it a Python object 
rather than a C string.  That way the creator and the consumer can use 
a richer API to query the "type", such as by calling its methods or by 
inspecting it in some other way.


I'm not writing my patch that way; it would be too cumbersome for what 
is ostensibly an easy, light-weight API.  If you're going that route 
you might as well create a real PyTypeObject for the blob you're 
passing in.

Well, that's exactly the point: given a PyObject* tag, you can add any 
kind of type identification you need, including some Python type.  (It 
is assumed that the actual pointer you're passing is not a PyObject 
itself, of course, otherwise you wouldn't need PyCObject at all.)


I have no desire to compete with your patch, it was a suggestion for 
(what I see as) improvement.



[Python-Dev] Getting values stored inside sets

2009-04-03 Thread Hrvoje Niksic
I've stumbled upon an oddity using sets.  It's trivial to test if a 
value is in the set, but it appears to be impossible to retrieve a 
stored value, other than by iterating over the whole set.  Let me 
describe a concrete use case.


Imagine a set of objects identified by some piece of information, such 
as a "key" slot (guaranteed to be constant for any particular element). 
 The object could look like this:


class Element(object):
    def __init__(self, key):
        self.key = key
    def __eq__(self, other):
        return self.key == other
    def __hash__(self):
        return hash(self.key)
    # ...

Now imagine a set "s" of such objects.  I can add them to the set:

>>> s = set()
>>> s.add(Element('foo'))
>>> s.add(Element('bar'))

I can test membership using the keys:

>>> 'foo' in s
True
>>> 'blah' in s
False

But I can't seem to find a way to retrieve the element corresponding to 
'foo', at least not without iterating over the entire set.  Is this an 
oversight or an intentional feature?  Or am I just missing an obvious 
way to do this?


I know I can work around this by changing the set of elements to a dict 
that maps key -> element, but this feels unsatisfactory.  It's 
redundant, as the element already contains all the necessary 
information, and the set already knows how to use it, and the set must 
remember the original elements anyway, to be able to iterate over them, 
so why not allow one to retrieve them?  Secondly, the data structure I 
need conceptually *is* a set of elements, so it feels wrong to 
pigeonhole it into a dict.


This wasn't an isolated case, we stumbled on this several times while 
trying to use sets.  In comparison, STL sets don't have this limitation.


If this is not possible, I would like to propose either that set's 
__getitem__ translates key to value, so that s['foo'] would return the 
first element, or, if this is considered ugly, an equivalent method, 
such as s.get('foo').



Re: [Python-Dev] Getting values stored inside sets

2009-04-06 Thread Hrvoje Niksic

Raymond Hettinger wrote:

Hrvoje Niksic wrote:
I've stumbled upon an oddity using sets.  It's trivial to test if a 
value is in the set, but it appears to be impossible to retrieve a 
stored value, 


See:  http://code.activestate.com/recipes/499299/


Thanks, this is *really* good, the kind of idea that seems perfectly 
obvious once pointed out by someone else.  :-)  I'd still prefer sets to 
get this functionality so they can be used to implement, say, interning, 
but this is good enough for me.


In fact, I can derive from set and add a method similar to that in the 
recipe.  It can be a bit simpler than yours because it only needs to 
support operations needed by sets (__eq__ and __hash__), not arbitrary 
attributes.


class Set(set):
    def find(self, item, default=None):
        capt = _CaptureEq(item)
        if capt in self:
            return capt.match
        return default

class _CaptureEq(object):
    __slots__ = 'obj', 'match'
    def __init__(self, obj):
        self.obj = obj
    def __eq__(self, other):
        eq = (self.obj == other)
        if eq:
            self.match = other
        return eq
    def __hash__(self):
        return hash(self.obj)

>>> s = Set([1, 2, 3])
>>> s.find(2.0)
2


Re: [Python-Dev] PEP 383 (again)

2009-04-28 Thread Hrvoje Niksic

Lino Mastrodomenico wrote:

Let's suppose that I use Python 2.x or something else to create a file
with name b'\xff'. My (Linux) system has a sane configuration and the
filesystem encoding is UTF-8, so it's an invalid name but the kernel
will blindly accept it anyway.

With this PEP, Python 3.1 listdir() will convert b'\xff' to the string '\udcff'.


One question that really bothers me about this proposal is the following:

Assume a UTF-8 locale.  A file named b'\xff', being an invalid UTF-8 
sequence, will be converted to the half-surrogate '\udcff'.  However, a 
file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be 
converted to '\udcff'.  Those are quite different POSIX pathnames; how 
will Python know which one it was when I later pass '\udcff' to open()?


A poster hinted at this question, but I haven't seen it answered, yet.


[1]
I'm assuming that it's valid UTF8 because it passes through Python 2.5's 
'\xed\xb3\xbf'.decode('utf-8').  I don't claim to be a UTF-8 expert.



Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Hrvoje Niksic

Thomas Breuel wrote:
But the biggest problem with the proposal is that it isn't needed: if 
you want to be able to turn arbitrary byte sequences into unicode 
strings and back, just set your encoding to iso8859-15.  That already 
works and it doesn't require any changes.


Are you proposing to unconditionally encode file names as iso8859-15, or 
to do so only when undecodeable bytes are encountered?


If you unconditionally set encoding to iso8859-15, then you are 
effectively reverting to treating file names as bytes, regardless of the 
locale.  You're also angering a lot of European users who expect 
iso8859-2, etc.


If you switch to iso8859-15 only in the presence of undecodable UTF-8, 
then you have the same round-trip problem as the PEP: both b'\xff' and 
b'\xc3\xbf' will be converted to u'\u00ff' without a way to 
unambiguously recover the original file name.



Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-28 Thread Hrvoje Niksic

Lino Mastrodomenico wrote:

Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid character 
when
decoded with UTF-8, it should simply be considered an invalid UTF-8
sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not*
'\udcff').


"Should be considered" or "will be considered"?  Python 3.0's UTF-8 
decoder happily accepts it and returns u'\udcff':


>>> b'\xed\xb3\xbf'.decode('utf-8')
'\udcff'

If the PEP depends on this being changed, it should be mentioned in the PEP.


Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-29 Thread Hrvoje Niksic

Zooko O'Whielacronx wrote:
If you switch to iso8859-15 only in the presence of undecodable  
UTF-8, then you have the same round-trip problem as the PEP: both  
b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a  
way to unambiguously recover the original file name.


Why do you say that?  It seems to work as I expected here:

 >>> '\xff'.decode('iso-8859-15')
u'\xff'
 >>> '\xc3\xbf'.decode('iso-8859-15')
u'\xc3\xbf'


Here is what I mean by "switch to iso8859-15" only in the presence of 
undecodable UTF-8:


def file_name_to_unicode(fn, encoding):
    try:
        return fn.decode(encoding)
    except UnicodeDecodeError:
        return fn.decode('iso-8859-15')

Now, assume a UTF-8 locale and try to use it on the provided example 
file names.


>>> file_name_to_unicode(b'\xff', 'utf-8')
'ÿ'
>>> file_name_to_unicode(b'\xc3\xbf', 'utf-8')
'ÿ'

That is the ambiguity I was referring to -- two different byte sequences 
result in the same unicode string.



Re: [Python-Dev] Functions that steal references (Re: [pygame] [patch] minor memory leaks...)

2009-06-17 Thread Hrvoje Niksic

Christian Heimes wrote:

I assumed that since PyModule_AddObject is documented as stealing a
reference, it always stole a reference. But in reality it only does so
conditionally, when it succeeds.

As an aside, is this a general feature of functions
that steal references, or is PyModule_AddObject an
oddity?


IIRC, It's an oddity.


But it is a convenient oddity nonetheless.


Stealing references is sometimes convenient, but Greg was referring to 
functions that steal references *conditionally*, which is indeed an 
oddity.  Most functions and macros that steal references do so 
unconditionally, typically because they can't fail anyway.  Conditional 
stealing of references requires very careful thinking on the side of 
callers that care about not leaking references in the face of 
exceptions.  See http://bugs.python.org/issue1782 for an example.
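
For illustration, a minimal sketch of the careful calling pattern
(PyModule_AddObject is real; the surrounding names are hypothetical):

static int
add_answer(PyObject *module)
{
    PyObject *obj = PyLong_FromLong(42);
    if (obj == NULL)
        return -1;
    if (PyModule_AddObject(module, "answer", obj) < 0) {
        /* reference NOT stolen on failure: we must release it */
        Py_DECREF(obj);
        return -1;
    }
    /* reference stolen on success: obj must not be touched again */
    return 0;
}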



Re: [Python-Dev] Py_TPFLAGS_HEAPTYPE too overloaded

2009-07-30 Thread Hrvoje Niksic

Campbell Barton wrote:

I'm not expert enough in this area to know if malloc'ing PyTypeObject
and initializing has some other problems.


The only problem is that such types will be expected to be around 
forever - they are not reference-counted like heap types, so there is no 
mechanism to free them once they are no longer needed.



Re: [Python-Dev] PEP 389: argparse - new command line parsing module

2009-10-07 Thread Hrvoje Niksic

Paul Moore wrote:

Traceback (most recent call last):
  File "hello.py", line 13, in <module>
    main()
  File "hello.py", line 7, in main
    sys.stdout.flush()
IOError: [Errno 9] Bad file descriptor

(Question - is it *ever* possible for a Unix program to have invalid
file descriptors 0,1 and 2? At startup - I'm assuming anyone who does
os.close(1) knows what they are doing!)


Of course; simply use the >&- pseudo-redirection, which has been a 
standard sh feature (later inherited by ksh and bash, but not csh) for 
~30 years.  The error message is amusing, too:


$ python -c 'print "foo"' >&-
close failed in file object destructor:
Error in sys.excepthook:

Original exception was:


Adding an explicit flush results in a more understandable error message:

Traceback (most recent call last):
  File "", line 1, in 
IOError: [Errno 9] Bad file descriptor


Re: [Python-Dev] recursive closures - reference leak

2009-12-08 Thread Hrvoje Niksic

Kristján Valur Jónsson wrote:
The problem with this is that once you have called factorial() once, you 
end up with a recursive cycle.  „factorial“ has become a cell object, 
referencing the „helper“ function, which again refers to the outer cell 
object.  This requires „gc“ to clean up.  Also, it is entirely 
non-obvious.  the problem becomes worse if the inner function also 
refers to some large, temporary variable, since it will get caught up in 
the reference loop.


What problem are you referring to?  Python has a gc exactly to deal with 
situations like this one.  Surely you are aware that the cycle collector 
is invoked automatically without requiring user intervention.  What 
specific issue are you trying to work around?
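
For what it's worth, a minimal sketch of such a recursive closure,
showing the collector doing its job automatically:

import gc

def create():
    def fact(n):
        # fact refers to itself through the enclosing cell: a cycle
        return 1 if n <= 1 else n * fact(n - 1)
    return fact

f = create()
f = None             # drop the last external reference; the cycle remains
print(gc.collect())  # the collector reclaims it; prints a nonzero count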



Re: [Python-Dev] Possible patch for functools partial - Interested?

2010-05-17 Thread Hrvoje Niksic

On 05/14/2010 06:39 AM, Daniel Urban wrote:

I've made a new patch, in which the keywords attribute is a read-only
proxy of the dictionary.


What about backward compatibility?  This looks like an incompatible change.


Re: [Python-Dev] New regex module for 3.2?

2010-07-23 Thread Hrvoje Niksic

On 07/22/2010 01:34 PM, Georg Brandl wrote:

Timings (seconds to run the test suite):

re     26.689  26.015  26.008
regex  26.066  25.797  25.865

So, I thought there wasn't a difference in performance for this use case
(which is compiling a lot of regexes and matching most of them only a
few times in comparison).  However, I found that looking at the regex
caching is very important in this case: re._MAXCACHE is by default set to
100, and regex._MAXCACHE to 1024.  When I set re._MAXCACHE to 1024 before
running the test suite, I get times around 18 (!) seconds for re.


This seems to point to re being significantly *faster* than regex, even 
in matching, and as such may be something the author would want to look 
into.


Nick writes:

> That still fits with the compile/match performance trade-off changes
> between re and regex though.

The performance trade-off should make regex slower with sufficiently 
small compiled regex cache, when a lot of time is wasted on compilation. 
 But as the cache gets larger (and, for fairness, of the same size in 
both implementations), regex should outperform re.  Georg, would you 
care to measure if there is a difference in performance with an even 
larger cache?
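
A minimal sketch of the measurement I have in mind, with both caches
deliberately equal (run_test_suite is a hypothetical stand-in for the
timed workload):

import re, regex, time

re._MAXCACHE = 1024      # private CPython knob discussed above
regex._MAXCACHE = 1024   # same size in both implementations, for fairness

start = time.time()
run_test_suite()         # hypothetical placeholder for the workload
print(time.time() - start)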



Re: [Python-Dev] mkdir -p in python

2010-07-28 Thread Hrvoje Niksic

On 07/27/2010 06:18 PM, Alexander Belopolsky wrote:

On Tue, Jul 20, 2010 at 10:20 AM, R. David Murray  wrote:

 I'd go with putting it in shutil.


+1

I would also call it shutil.mktree which will go well with
shutil.rmtree next to it.


Note that mktree is not analogous to rmtree - while rmtree removes a 
directory tree beneath a specified directory, mktree would only create a 
single "branch", not an entire tree.  I'd imagine a mktree function to 
accept a data structure describing the tree to be created.
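
A minimal sketch of what I mean, with hypothetical names, representing
the tree as nested dictionaries:

import os

def mktree(base, tree):
    # 'tree' maps directory names to sub-trees of the same shape
    for name, subtree in tree.items():
        path = os.path.join(base, name)
        os.mkdir(path)
        mktree(path, subtree)

# mktree('.', {'project': {'src': {}, 'doc': {'api': {}}}})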


If you're going for a short name distinctive from mkdir, I propose 
mksubdirs.



Re: [Python-Dev] 'hasattr' is broken by design

2010-08-24 Thread Hrvoje Niksic

On 08/23/2010 04:56 PM, Guido van Rossum wrote:

On Mon, Aug 23, 2010 at 7:46 AM, Benjamin Peterson  wrote:

 2010/8/23 Yury Selivanov:

 1) I propose to change 'hasattr' behaviour in Python 3, making it to swallow 
only AttributeError exceptions (exactly like 'getattr').  Probably, Python 3.2 
release is our last chance.


 I would be in support of that.


I am cautiously in favor. The existing behavior is definitely a
mistake and a trap. But it has been depended on for almost 20 years
now.


I'll note that a similar incompatible change has made it to python2.6. 
This has bitten us in production:


class X(object):
    def __getattr__(self, name):
        raise KeyError, "error looking for %s" % (name,)

    def __iter__(self):
        yield 1

print list(X())

I would expect it to print [1], and in python2.5 it does.  In python2.6 
it raises a KeyError!  The attribute being looked up is an unexpected one:


{hrzagude5003}[~]$ python2.6 a.py
Traceback (most recent call last):
  File "a.py", line 8, in <module>
    print list(X())
  File "a.py", line 3, in __getattr__
    raise KeyError, "error looking for %s" % (name,)
KeyError: 'error looking for __length_hint__'

The __length_hint__ lookup expects either no exception or 
AttributeError, and will propagate others.  I'm not sure if this is a 
bug.  On the one hand, throwing anything except AttributeError from 
__getattr__ is bad style (which is why we fixed the bug by deriving our 
business exception from AttributeError), but the __length_hint__ check 
is supposed to be an internal optimization completely invisible to the 
caller of list().
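
For reference, the fix we applied can be sketched like this (the
exception name is hypothetical):

class BusinessLookupError(AttributeError):
    # deriving from AttributeError keeps hasattr() and internal probes
    # such as __length_hint__ working as expected
    pass

class X(object):
    def __getattr__(self, name):
        raise BusinessLookupError("error looking for %s" % (name,))

    def __iter__(self):
        yield 1

print list(X())   # prints [1] again, on both 2.5 and 2.6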


Being aware that this can be construed as an argument both in favor and 
against the change at hand, my point is that, if propagating 
non-AttributeError exceptions is done in checks intended to be 
invisible, it should certainly be done in hasattr, where it is at least 
obvious what is being done.  Other generic functions and operators, 
including boolean ones such as ==, happily propagate exceptions.


Also, don't expect that this won't break code out there.  It certainly 
will, it's only a matter of assessment whether such code was broken in a 
different, harder to detect way, to begin with.



Re: [Python-Dev] 'hasattr' is broken by design

2010-08-24 Thread Hrvoje Niksic

On 08/24/2010 02:31 PM, Benjamin Peterson wrote:

2010/8/24 Hrvoje Niksic:

 The __length_hint__ lookup expects either no exception or AttributeError,
 and will propagate others.  I'm not sure if this is a bug.  On the one hand,
 throwing anything except AttributeError from __getattr__ is bad style (which
 is why we fixed the bug by deriving our business exception from
 AttributeError), but the __length_hint__ check is supposed to be an internal
 optimization completely invisible to the caller of list().


__length_hint__ is internal and undocumented, so it can do whatever it wants.


Of course, but that's beside the point.  In this case __length_hint__ 
was neither implemented in the class, nor were we aware of its 
existence, and the code still broke (as in the example in previous 
mail).  The point I'm making is that:


a) a "business" case of throwing anything other than AttributeError from 
__getattr__ and friends is almost certainly a bug waiting to happen, and


b) making the proposed change is bound to break real, production code.

I still agree with the proposed change, but I wanted to also point out 
that it will cause breakage and illustrate it with a similar real-world 
example that occurred during migration to python 2.6.



Re: [Python-Dev] Buffer protocol for io.BytesIO?

2010-09-03 Thread Hrvoje Niksic

On 09/02/2010 10:35 PM, Antoine Pitrou wrote:

 Then it came to me then perhaps it would be too automatic. So I'm
currently floating between:
- add implicit buffer protocol support to BytesIO objects
- add explicit buffer protocol support through the call of a
   getbuffer() method, which would return a small intermediate object
   supporting the buffer protocol on behalf of the original BytesIO
   object

What do you think would be better?


getbuffer() sounds better.  Accessing buffer contents is a nice feature, 
but one shouldn't be able to do it by accident.
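
A usage sketch of the explicit variant, assuming getbuffer() returns a
memoryview over the internal buffer:

import io

b = io.BytesIO(b'abcdef')
with b.getbuffer() as view:
    view[0:3] = b'ABC'        # mutate in place, without an extra copy
print(b.getvalue())           # b'ABCdef'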



Re: [Python-Dev] Behaviour of max() and min() with equal keys

2010-09-08 Thread Hrvoje Niksic

On 09/07/2010 11:40 PM, Jeffrey Yasskin wrote:

Decimal may actually have this backwards. The idea would be that
min(*lst) == sorted(lst)[0], and max(*lst) == sorted(lst)[-1].


Here you mean "is" rather than "==", right?  The relations you spelled 
are guaranteed regardless of stability.


(This doesn't apply to Decimal.max and Decimal.min, which return new 
objects.)
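
To illustrate the identity relation for the built-ins:

>>> from decimal import Decimal
>>> a, b = Decimal('1'), Decimal('1.0')
>>> a == b
True
>>> min(a, b) is a                  # min() returns the first of equal keys
True
>>> min(a, b) is sorted([a, b])[0]  # matching sorted()'s stable order
True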



Re: [Python-Dev] Resource leaks warnings

2010-09-29 Thread Hrvoje Niksic

On 09/29/2010 02:42 PM, Antoine Pitrou wrote:

 It seems like a slippery slope. Sometimes you really don't care like
 when you're just hacking together a quick script.


Isn't the "with" statement appropriate in these cases?


A hacked-together quick script might contain code like:

parse(open(bla).read())

Compared to this, "with" adds a new indentation level and a new 
variable, while breaking the flow of the code:


with open(bla) as foo:
    contents = foo.read()
parse(contents)

People used to writing production code under stringent guidelines 
(introduced for good reason) will probably not be sympathetic to 
quick-hack usage patterns, but Python is used on both sides of the fence.



Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-26 Thread Hrvoje Niksic

On 10/26/2010 07:04 AM, Peter Ingebretson wrote:

I have a patch that adds a new function to the gc module.  The gc.remap()
function uses the tp_traverse mechanism to find all references to any keys
in a provided mapping, and remaps these references in-place to instead point
to the value corresponding to each key.


What about objects that don't implement tp_traverse because they cannot 
take part in cycles?


Changing immutable objects such as tuples and frozensets doesn't exactly 
sound appealing.



A potentially controversial aspect of this change is that the signature of the
visitproc has been modified to take (PyObject **) as an argument instead of
(PyObject *) so that a visitor can modify fields visited with Py_VISIT.


This sounds like a bad idea -- visitproc is not limited to visiting 
struct members.  Visited objects can be stored in data structures where 
their address cannot be directly obtained.  For example, in C++, you 
could have an std::map with PyObject* keys, and it wouldn't be legal to 
pass addresses of those.  Various C++ bindings also implement 
smart_ptr-style wrappers over PyObject* that handle Python reference 
counting, and those will also have problems with visitproc taking 
PyObject **.


And this is not just some oddball C++ thing.  Many extensions wrap 
arbitrary C objects which can reach Python data through other C objects, 
which expose the PyObject* only through a generic "void 
*get_user_data()"-style accessor.  For such objects to cooperate with 
the GC, they must be able to visit arbitrary PyObject pointers without 
knowing their address.  PyGTK and pygobject are the obvious examples of 
this, but I'm sure there are many others.


If you want to go this route, rather create an extended visit procedure 
(visitchangeproc?) that accepts a function that can change the 
reference.  A convenience function or macro could implement this for the 
common case of struct member or PyObject**.
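
A sketch of what I mean; none of these names exist in CPython today:

/* A visitor that may replace the reference it is handed. */
typedef int (*visitchangeproc)(PyObject **slot, void *arg);

/* Convenience helper for the common struct-member case. */
static int
visit_change_member(visitchangeproc visit, PyObject **slot, void *arg)
{
    if (*slot == NULL)
        return 0;
    return visit(slot, arg);
}

Objects that cannot expose an address (C++ containers, wrapped C
objects) would call the visitor with a temporary instead, then store the
possibly-changed pointer back through their own accessors.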



Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-27 Thread Hrvoje Niksic

On 10/26/2010 07:11 PM, Peter Ingebretson wrote:

The main argument is that preserving immutable objects increases the
complexity of remapping and does not actually solve many problems.
The primary reason for objects to be immutable is so that their
comparison operators and hash value can remain consistent.


There are other reasons as well (thread-safety), but I guess those don't 
really apply to python.  I guess one could defend the position that the 
tuple hasn't really "changed" if its elements merely get upgraded in 
this way, but it still feels wrong.


> Changing,
> for example, the contents of a tuple that a dictionary key references
> has the same effect as changing the identity of the tuple -- both
> modify the hash value of the key and thus invalidate the dictionary.
> The full reload processs needs to rehash collections invalidated by
> hash values changing, so we might as well modify the contents of
> tuples.

Do you also rehash when tuples of upgraded objects are used as dict keys?


Re: [Python-Dev] On breaking modules into packages Was: [issue10199] Move Demo/turtle under Lib/

2010-11-03 Thread Hrvoje Niksic

On 11/03/2010 01:47 AM, Ben Finney wrote:

 If someone wants to depend on some undocumented detail of the
 directory layout it's their problem (like people depending on bytecode
 and other stuff).


I would say that names without a single leading underscore are part of
the public API, whether documented or not.


I understand this reasoning, but I'd like to offer counter-examples. 
For instance, would you say that glob.glob0 and glob.glob1 are public 
API?  They're undocumented, they're not in __all__, but they don't have 
a leading underscore either, and source comments call them "helper 
functions."  I'm sure there is a lot of other examples like that, both 
in the standard library and in python packages out there.


Other than the existing practice, there is the matter of esthetics. 
Accepting underscore-less identifiers as automatically public leads to a 
proliferation of identifiers with leading underscores, which many people 
(myself included) plainly don't like.



Re: [Python-Dev] Breaking undocumented API

2010-11-10 Thread Hrvoje Niksic

On 11/10/2010 05:12 AM, Stephen J. Turnbull wrote:

But these identifiers will appear at the module level, not global, no?
Otherwise this technique couldn't be used.  I don't really understand
what Tres is talking about when he writes "modules that expect to be
imported this way".  The *imported* module shouldn't care, no?


I think he's referring to the choice of identifiers, and the usage 
examples given in the documentation and tutorials.  For example, in the 
original PyGTK, all identifiers included "Gtk" in the name, so it made 
sense to write "from pygtk import *" and then spell the class GtkWindow 
rather than the redundant pygtk.GtkWindow.  In that sense the module 
writer "expected" to be imported this way, although you are right that 
it doesn't matter in the least for the correct operation of the module 
itself.  For GTK 2, PyGTK switched to "gtk.Window", which effectively 
removes the temptation to import * from the module.


There are other examples of that school, most notably ctypes, but also 
Tkinter and the python2 threading module.  Fortunately it has become 
much less popular in the last ~5 years of Python history.



Re: [Python-Dev] Breaking undocumented API

2010-11-12 Thread Hrvoje Niksic

On 11/11/2010 11:24 PM, Greg Ewing wrote:

Nick Coghlan wrote:


 My personal opinion is that we should be trying to get the standard
 library to the point where __all__ definitions are unnecessary - if a
 name isn't in __all__, it should start with an underscore (and if that
 is true, then the __all__ definition becomes effectively redundant).


What about names imported from other modules that are used by
the module, but not intended for re-export? How would you
prevent them from turning up in help() etc. without using
__all__?


import foo as _foo

I believe I am not the only one who finds that practice ugly, but I find 
it just as ugly to underscore-ize every non-public helper function. 
__all__ is there for a reason, let's use it.  Maybe help() could 
automatically ignore stuff not in __all__, or display it but warn the 
user of non-public identifiers?



Re: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS

2010-11-22 Thread Hrvoje Niksic

On 11/22/2010 04:37 PM, Antoine Pitrou wrote:

+1.  The problem with int constants is that the int gets printed, not
the name, when you dump them for debugging purposes :)


Well, it's trivial to subclass int to something with a nicer __repr__. 
PyGTK uses that technique for wrapping C enums:


>>> gtk.PREVIEW_GRAYSCALE
<enum GTK_PREVIEW_GRAYSCALE of type GtkPreviewType>
>>> isinstance(gtk.PREVIEW_GRAYSCALE, int)
True
>>> gtk.PREVIEW_GRAYSCALE + 0
1
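
A minimal sketch of the technique in pure Python (the class name is
hypothetical; PyGTK's actual wrapper is implemented in C):

class NamedInt(int):
    def __new__(cls, value, name):
        self = super(NamedInt, cls).__new__(cls, value)
        self._name = name
        return self
    def __repr__(self):
        return '<%s %s>' % (type(self).__name__, self._name)

PREVIEW_GRAYSCALE = NamedInt(1, 'PREVIEW_GRAYSCALE')
# repr() gives '<NamedInt PREVIEW_GRAYSCALE>', yet PREVIEW_GRAYSCALE + 0 == 1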


Re: [Python-Dev] Using logging in the stdlib and its unit tests

2010-12-10 Thread Hrvoje Niksic



On 12/10/2010 10:47 AM, Stephen J. Turnbull wrote:

Vinay Sajip writes:

  >  Indeed, and the very first code sample in the logging documentation
  >  shows exactly the simplistic easy usage you're talking about. I
  >  can't see why anyone would be scared off by that example.

They're not scared by that example.  What you need is a paragraph
below it that says

 """
 Do you think the above is all you should need?  If so, you're
 right.  You can stop reading now.  If you think you need more,
 we've got that, too.  Read on (you may need more coffee).
 """


+1


Re: [Python-Dev] nonlocal x = value

2010-12-24 Thread Hrvoje Niksic

On 12/23/2010 10:03 PM, Laurens Van Houtven wrote:

On Thu, Dec 23, 2010 at 9:51 PM, Georg Brandl  wrote:

 Yes and no -- there may not be an ambiguity to the parser, but still to
 the human.  Except if you disallow the syntax in any case, requiring
 people to write

 nonlocal x = (3, y)

 which is then again inconsistent with ordinary assignment statements.


Right -- but (and hence the confusion) I was arguing for not mixing
global/nonlocal with assignment at all, and instead having nonlocal
and global only take one or more names. That would (obviously) remove
any such ambiguity ;-)


I would like to offer the opposing viewpoint: nonlocal x = value is a 
useful shortcut because nonlocal is used in closure callbacks where 
brevity matters.  The reason nonlocal is introduced is to change the 
variable, so it makes sense that the two can be done in the same line of 
code.
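
For example, a typical closure callback today reads:

def make_counter():
    n = 0
    def bump():
        nonlocal n
        n += 1
        return n
    return bump

With the proposed shortcut, the two lines in bump() would collapse into
a single declaration-plus-assignment (not valid syntax today):

    nonlocal n = n + 1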


As for global x = value being disallowed, I have been annoyed at times 
with that, so that sounds like a good argument to change both.


Requiring the parentheses for tuple creation sounds like a good 
compromise for resolving the ambiguity, consistent with similar 
limitations of the generator expression syntax.



Re: [Python-Dev] %-formatting depracation

2011-02-23 Thread Hrvoje Niksic

On 02/22/2011 11:03 PM, Antoine Pitrou wrote:

I think there are many people still finding %-style more practical for
simple uses,


It's also a clash of cultures. People coming from a C/Unix background 
typically find %-style format obvious and self-explanatory, while people 
coming from Java/DotNET background feel the same way about {}-style formats.



Re: [Python-Dev] I am now lost - committed, pulled, merged, what is "collapse"?

2011-03-21 Thread Hrvoje Niksic

On 03/21/2011 01:34 PM, Stephen J. Turnbull wrote:

  >  Subversion never ever creates versions in the repository that
  >  didn't before exist in some working copy.

John Arbash-Meinel disagrees with you, so I think I'll go with his
opinion


Besides, it's easy to confirm:

# create a repository and two checkouts:
[~/work]$ svnadmin create repo
[~/work]$ svn co file:///home/hniksic/work/repo checkout1
Checked out revision 0.
[~/work]$ svn co file:///home/hniksic/work/repo checkout2
Checked out revision 0.

# add a file to checkout 1
[~/work]$ cd checkout1
[~/work/checkout1]$ touch a && svn add a && svn commit -m c1
A a
Adding a
Transmitting file data .
Committed revision 1.

# now add a file to the second checkout without ever seeing
# the new file added to the first one
[~/work/checkout1]$ cd ../checkout2
[~/work/checkout2]$ touch b && svn add b && svn commit -m c2
A b
Adding b
Transmitting file data .
Committed revision 2.

The second commit would be rejected by a DVCS on the grounds of a merge 
with revision "1" never having happened.  What svn calls revision two is 
in reality based on revision 0, a fact the DVCS is aware of.


The message "committed revision 2", while technically accurate, is 
misleading if you believe the revision numbers to apply to the entire 
tree (as the svn manual will happily point out).  It doesn't indicate 
that what you have in your tree when the message is displayed can be 
very different from the state of a freshly-checked-out revision 2.  In 
this case, it's missing the file "a":


[~/work/checkout2]$ ls
b

This automatic merging often causes people who migrate to a DVCS to feel 
that they have to go through an unnecessary extra step in their 
workflows.  But once you grasp the "hole" in the svn workflow, what svn 
does (and what one used to take for granted) tends to become 
unacceptable, to put it mildly.


Hrvoje


Re: [Python-Dev] I am now lost - committed, pulled, merged, what is "collapse"?

2011-03-22 Thread Hrvoje Niksic

On 03/21/2011 05:44 PM, s...@pobox.com wrote:


Thanks for the example, Hrvoje.

 Hrvoje>  This automatic merging often causes people who migrate to a DVCS
 Hrvoje>  to feel that they have to go through an unnecessary extra step
 Hrvoje>  in their workflows.  But once you grasp the "hole" in the svn
 Hrvoje>  workflow, what svn does (and what one used to take for granted)
 Hrvoje>  tends to become unacceptable, to put it mildly.

In the run-up to a release when there is lots of activity happening, do you
find yourself in a race with other developers to push your changes cleanly?


I work on a small project in comparison to Python, so this doesn't 
happen to me personally.  But such a race is certain to happen on larger 
projects.  That doesn't mean, however, that we are helpless to prevent it. 
After all, one of the selling points of DVCS is the ability to support 
different integration workflows.


If you (we) are running into a push race with the other developers over 
the central repository's head, this could indicate that the project is 
large enough that the centralized workflow is no longer the appropriate 
one.  If you are not familiar with other DVCS workflows, take a look at, 
for example, chapter 5 of the "Pro Git" book, which describes the 
alternatives such as integrator-manager and dictator-lieutenant 
workflows: http://progit.org/book/ch5-1.html


Python obviously wouldn't benefit from a strict hierarchy implied by the 
dictator-lieutenants model, but perhaps it could switch to something 
between that and the integrator model for the releases. The release 
manager could act as the dictator (as well as integrator), while the 
core committers would be lieutenants (as well as developers).  Just a 
thought.


Hrvoje


[Python-Dev] PyObject_RichCompareBool identity shortcut

2011-04-27 Thread Hrvoje Niksic

The other day I was surprised to learn this:

>>> nan = float('nan')
>>> nan == nan
False
>>> [nan] == [nan]
True  # also True in tuples, dicts, etc.

# also:
>>> l = [nan]
>>> nan in l
True
>>> l.index(nan)
0
>>> l[0] == nan
False

The identity test is not in container comparators, but in 
PyObject_RichCompareBool:


/* Quick result when objects are the same.
   Guarantees that identity implies equality. */
if (v == w) {
    if (op == Py_EQ)
        return 1;
    else if (op == Py_NE)
        return 0;
}

The guarantee referred to in the comment is not only (AFAICT) 
undocumented, but contradicts the documentation, which states that the 
result should be the "equivalent of o1 op o2".


Calling PyObject_RichCompareBool is inconsistent with calling 
PyObject_RichCompare and converting its result to bool manually, 
something that wrappers (C++) and generators (cython) might reasonably 
want to do themselves, for various reasons.
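
The manual conversion in question is roughly (a sketch):

static int
compare_bool_no_shortcut(PyObject *v, PyObject *w, int op)
{
    PyObject *res = PyObject_RichCompare(v, w, op);
    int ok;
    if (res == NULL)
        return -1;                /* propagate the comparison error */
    ok = PyObject_IsTrue(res);    /* no v == w identity special case */
    Py_DECREF(res);
    return ok;
}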


If this is considered a bug, I can open an issue.

Hrvoje


Re: [Python-Dev] __import__ problems

2008-11-27 Thread Hrvoje Niksic

Mart Somermaa wrote:

There at least two workarounds:
  * the getattr approach documented in [3]


I can't comment on the rest, but the getattr seems overly complicated. 
If you need just the module, what's wrong with:


import sys

__import__(modname)
modobj = sys.modules[modname]


Re: [Python-Dev] __import__ problems

2008-11-28 Thread Hrvoje Niksic

Mart Somermaa wrote:

The variant proposed by Hrvoje Niksic:

 >>> __import__(modname)
 >>> mod = sys.modules[modname]

looks more appealing, but comes with the drawback that sys has to be 
imported for that purpose only.


That is not a real drawback, as "sys" will certainly be present in the 
system, so the "importing" boils down to a dict lookup and a variable 
assignment.


Having said that, I'd add that I found the behavior of __import__ 
counter-intuitive, but assumed there's a good reason for it.  If I 
hadn't known about sys.modules beforehand, I would have probably gone 
the chained-getattr route as well.



[Python-Dev] Python under valgrind

2008-11-28 Thread Hrvoje Niksic
A friend pointed out that running python under valgrind (simply 
"valgrind python") produces a lot of "invalid read" errors.  Reading up 
on Misc/README.valgrind only seems to describe why "uninitialized reads" 
should occur, not invalid ones.  For example:


$ valgrind python
[... lots of output ...]
==31428== Invalid read of size 4
==31428==at 0x808EBDF: PyObject_Free (in /usr/bin/python2.5)
==31428==by 0x810DD0A: (within /usr/bin/python2.5)
==31428==by 0x810DD34: PyNode_Free (in /usr/bin/python2.5)
==31428==by 0x80EDAD9: PyRun_InteractiveOneFlags (in /usr/bin/python2.5)
==31428==by 0x80EDDB7: PyRun_InteractiveLoopFlags (in /usr/bin/python2.5)
==31428==by 0x80EE515: PyRun_AnyFileExFlags (in /usr/bin/python2.5)
==31428==by 0x80595E6: Py_Main (in /usr/bin/python2.5)
==31428==by 0x8058961: main (in /usr/bin/python2.5)
==31428==  Address 0x43bf010 is 3,112 bytes inside a block of size 6,016 free'd
==31428==at 0x4024B4A: free (vg_replace_malloc.c:323)
==31428==by 0x8059C07: (within /usr/bin/python2.5)
==31428==by 0x80EDAA5: PyRun_InteractiveOneFlags (in /usr/bin/python2.5)
...

valgrind claims that Python reads 4 bytes inside a block on which free() 
has already been called.  Is valgrind wrong, or is Python really doing 
that?  Googling revealed previous reports of this, normally answered by 
a reference to README.valgrind.  But README.valgrind justifies reading 
from ununitialized memory, which doesn't help me understand how reading 
from the middle of a block of freed memory (more precisely, memory on 
which the libc free() has already been called) would be okay.


I suppose valgrind could be confused by PyFree's pool address validation 
that intentionally reads the memory just before the allocated block, and 
incorrectly attributes it to a previously allocated (and hence freed) 
block, but I can't prove that.  Has anyone investigated this kind of 
valgrind report?



Re: [Python-Dev] __import__ problems

2008-11-28 Thread Hrvoje Niksic

Mart Somermaa wrote:

I meant that you have to


import sys


only to access sys.modules (i.e. importing sys may not be necessary otherwise).


I understand that, but I'm arguing that that's a non-problem.  Importing 
sys is a regular thing in Python, not an exception.  You need sys to get 
to sys.argv, sys.exit, sys.stdout, etc. -- it's not like sys is an 
infrequently used module.  Since sys is always present, importing it is 
not an efficiency problem, either.



mod = __import__(modname, submodule=True)


with


import sys
__import__(modname)
mod = sys.modules[modname]


"import sys" is normally located near the beginning of the file (and 
needed by other things), so the actual code snippet would really contain 
only those two lines, which don't strike me as bad.  Ideally, __import__ 
would simply return the "tail" imported module in the first place, but I 
don't think introducing a boolean keyword argument really improves the 
design.



Re: [Python-Dev] Python under valgrind

2008-11-28 Thread Hrvoje Niksic

Amaury Forgeot d'Arc wrote:

Did you use the suppressions file as suggested in Misc/README.valgrind?


Thanks for the suggestion (as well as to Gustavo and Victor), but my 
question wasn't about how to suppress the messages, but about why the 
messages appear in the first place.  I think my last paragraph answers 
my own question, but I'm not sure.



Re: [Python-Dev] subprocess crossplatformness and async communication

2009-01-26 Thread Hrvoje Niksic

Nick Craig-Wood wrote:

But for the conversational case (eg using it to do a remote login) it
doesn't work at all :-

  run child
  send stuff to stdin
  child reads stdin and writes stdout


Can this really be made safe without an explicit flow control protocol, 
such as a pseudo-TTY?  stdio reads data from pipes such as stdin in 4K 
or so chunks.  I can easily imagine the child blocking while it waits 
for its stdin buffer to fill, while the parent in turn blocks waiting 
for the child's output arrive.


Shell pipelines (and the subprocess module as it stands) don't have this 
problem because they're unidirectional: you read input from one process 
and write output to another, but you typically don't feed data back to 
the process you've read it from.



[Python-Dev] C API for appending to arrays

2009-02-02 Thread Hrvoje Niksic
The array.array type is an excellent type for storing a large amount of 
"native" elements, such as integers, chars, doubles, etc., without 
involving the heavy machinery of numpy.  It's both blazingly fast and 
reasonably efficient with memory.  The one thing missing from the array 
module is the ability to directly access array values from C.


This might seem superfluous, as it's perfectly possible to manipulate 
array contents from Python/C using PyObject_CallMethod and friends.  The 
problem is that it requires the native values to be marshalled to Python 
objects, only to be immediately converted back to native values by the 
array code.  This can be a problem when, for example, a numeric array 
needs to be filled with contents, such as in this hypothetical example:


/* error checking and refcounting subtleties omitted for brevity */
PyObject *load_data(Source *src)
{
  PyObject *array_type = get_array_type();
  PyObject *array = PyObject_CallFunction(array_type, "c", 'd');
  PyObject *append = PyObject_GetAttrString(array, "append");
  while (!source_done(src)) {
double num = source_next(src);
PyObject *f = PyFloat_FromDouble(num);
PyObject *ret = PyObject_CallFunctionObjArgs(append, f, NULL);
if (!ret)
  return NULL;
Py_DECREF(ret);
Py_DECREF(f);
  }
  Py_DECREF(array_type);
  return array;
}

The inner loop must convert each C double to a Python Float, only for 
the array to immediately extract the double back from the Float and 
store it into the underlying array of C doubles.  This may seem like a 
nitpick, but it turns out that more than half of the time of this 
function is spent creating and deleting those short-lived floating-point 
objects.


Float creation is already well-optimized, so opportunities for speedup 
lie elsewhere.  The array object exposes a writable buffer, which can be 
used to store values directly.  For test purposes I created a faster 
"append" specialized for doubles, defined like this:


int array_append(PyObject *array, PyObject *appendfun, double val)
{
  PyObject *ret;
  double *buf;
  Py_ssize_t bufsize;
  static PyObject *zero;
  if (!zero)
zero = PyFloat_FromDouble(0);

  // append dummy zero value, created only once
  ret = PyObject_CallFunctionObjArgs(appendfun, zero, NULL);
  if (!ret)
return -1;
  Py_DECREF(ret);

  // append the element directly at the end of the C buffer
  PyObject_AsWriteBuffer(array, (void **) &buf, &bufsize);
  buf[bufsize / sizeof(double) - 1] = val;
  return 0;
}

This hack actually speeds up array creation by a significant percentage 
(30-40% in my case, and that's for code that was producing the values by 
parsing a large text file).


It turns out that an even faster method of creating an array is by using 
the fromstring() method.  fromstring() requires an actual string, not a 
buffer, so in C++ I created an std::vector<double> with a contiguous 
array of doubles, passed that array to PyString_FromStringAndSize, and 
called array.fromstring with the resulting string.  Despite all the 
unnecessary copying, the result was much faster than either of the 
previous versions.



Would it be possible for the array module to define a C interface for 
the most frequent operations on array objects, such as appending an 
item, and getting/setting an item?  Failing that, could we at least make 
fromstring() accept an arbitrary read buffer, not just an actual string?



Re: [Python-Dev] C API for appending to arrays

2009-02-03 Thread Hrvoje Niksic

Raymond Hettinger wrote:

[Hrvoje Niksic]
 The one thing missing from the array 
module is the ability to directly access array values from C.


Please put a feature request on the bug tracker.


Done, http://bugs.python.org/issue5141


Re: [Python-Dev] C API for appending to arrays

2009-02-04 Thread Hrvoje Niksic

Mike Klaas wrote:
Do you need to append, or are you just looking to create/manipulate an  
array with a bunch of c-float values?


Mostly real-life examples I've seen of this were creating an array from 
C values obtained from an external source, such as an on-disk file, or 
another process.  The code example was a (simplified and de-C++-ized) 
snippet of actual code.



I find As{Write/Read}Buffer sufficient for most of these tasks.


They improve things, as shown in the second example, but they're still 
cumbersome to use for appending/initialization of the array.


(Note that you can  
get the size of the resulting c array more easily than you are by  
using PyObject_Length).


Note that AsWriteBuffer always gives you the buffer size anyway -- you 
can't pass bufsize==NULL.  Since I have to call AsWriteBuffer in each 
iteration (because I don't know when the buffer will resize), calling 
PyObject_Length in addition to that doesn't buy much, if anything.


> I've included some example pyrex  code that populates a new
> array.array at c speed.
[...]

 cdef int NA
 NA = len(W1)
 W0 = array('d', [colTotal]) * NA


The thing is, when reading values from a file or a general iterator, you 
typically don't know the number of values in advance.  If I did, I would 
probably use an approach similar to yours.


Thanks for the code -- even if it doesn't help in this case, I 
appreciate it as an instructing example of the advanced usage of Pyrex.



[Python-Dev] Missing operator.call

2009-02-04 Thread Hrvoje Niksic
Is there a reason why the operator module doesn't have an operator.call 
function?  It would seem logical to be able to write:


map(operator.call, lst)

which calls each object in lst, just like map(operator.neg, lst) negates 
every object.  Of course, operator.call is equivalent to lambda x: x(), 
but such an equivalence exists for all functions in the operator module.


__call__ should also be provided for symmetry with other operators that 
correspond to special-name methods.


If there is interest in this and no reason why it shouldn't be done, I 
can write up an issue in the tracker and provide a patch.



Re: [Python-Dev] Missing operator.call

2009-02-04 Thread Hrvoje Niksic

Andrew Bennetts wrote:

A patch to add operator.caller(*args, **kwargs) may be a good idea.  Your
example would then be:

map(operator.caller(), lst)


Regarding the name, note that I proposed operator.call (and 
operator.__call__) because it corresponds to the __call__ special 
method, which is analogous to how operator.neg corresponds to __neg__, 
operator.add to __add__, etc.  The term "caller" implies creation of a 
new object that carries additional state, such as method name in 
operator.methodcaller, item in operator.itemgetter, or attr in 
operator.attrgetter.



Re: [Python-Dev] Missing operator.call

2009-02-04 Thread Hrvoje Niksic

Nick Coghlan wrote:

I'm somewhere between -0 and +0 though (-0 due to the lack of concrete
use cases, +0 because the improved consistency is appealing)


The operator module is one of the rare cases in python where consistency 
is valued more than concrete use cases.  But, for what it's worth, I 
really wished to use operator.call in my code, expected to find 
operator.call, and was quite surprised to find it missing.



Re: [Python-Dev] Missing operator.call

2009-02-05 Thread Hrvoje Niksic

Guido van Rossum wrote:

def call(o, *args, **kwds):
   return o(*args, **kwds)

which would make call a synonym for apply (and would also provide for
the first definition as a special case). However, with that API, it
isn't so easy anymore to pass the same arguments to all callables
(unless it is no arguments that you want to pass).


My version is in line with the other operators in the operator module.
The version that binds the arguments and returns a callable is already
available as functools.partial.


And it works well in the case I encountered.  In fact, it works even 
better because it allows things like map(call, l1, l2) to apply each 
element of l2 to the corresponding function in l1.


If there's no opposition to this, I'll post a patch to the tracker.


Re: [Python-Dev] RFC: Add a new builtin strarray type to Python?

2011-10-03 Thread Hrvoje Niksic

On 10/02/2011 06:34 PM, Alex Gaynor wrote:

There are a number of issues that are being conflated by this thread.

1) Should str += str be fast. In my opinion, the answer is an obvious and
resounding no. Strings are immutable, thus repeated string addition is
O(n**2). This is a natural and obvious conclusion. Attempts to change this
are only truly possible on CPython, and thus create a worse enviroment for
other Pythons, as well as a quite misleading, as they'll be extremely
brittle. It's worth noting that, to my knowledge, JVMs haven't attempted
hacks like this.


CPython is already misleading and ahead of JVM, because the str += str 
optimization has been applied to Python 2 some years ago - see

http://hg.python.org/cpython-fullhistory/rev/fb6ffd290cfb?revcount=480

I like Python's immutable strings and consider it a good default for 
strings.  Nevertheless a mutable string would be useful for those 
situations when you know you are about to manipulate a string-like 
object a number of times, where immutable strings require too many 
allocations.


I don't think Python needs a StringBuilder - constructing strings using 
a list of strings or StringIO is well-known and easy.  Mutable strings 
are useful for the cases where StringBuilder doesn't suffice because you 
need modifications other than appends.  This is analogous to file writes 
- in practice most of them are appends, but sometimes you also need to 
be able to seek and write stuff in the middle.
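
For completeness, the two well-known idioms referred to above:

parts = []
for word in ['a', 'b', 'c']:
    parts.append(word)
s = ''.join(parts)         # the list-of-strings idiom

import io
buf = io.StringIO()
for word in ['a', 'b', 'c']:
    buf.write(word)
s = buf.getvalue()         # the StringIO idiom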


Hrvoje


Re: [Python-Dev] Identifier API

2011-10-11 Thread Hrvoje Niksic

On 10/08/2011 04:54 PM, "Martin v. Löwis" wrote:

  tmp = PyObject_CallMethod(result, "update", "O", other);

would be replaced with

   PyObject *tmp;
   Py_identifier(update);
   ...
   tmp = PyObject_CallMethodId(result, &PyId_update, "O", other);


An alternative I am fond of is to avoid introducing a new type, and 
simply initialize a PyObject * and register its address.  For example:


  PyObject *tmp;
  static PyObject *s_update;// pick a naming convention

  PY_IDENTIFIER_INIT(update);
  tmp = PyObject_CallMethodObj(result, s_update, "O", other);

  (but also PyObject_GetAttr(o, s_update), etc.)

PY_IDENTIFIER_INIT(update) might expand to something like:

  if (!s_update) {
s_update = PyUnicode_InternFromString("update");
_Py_IdentifierAdd(&s_update);
  }

_PyIdentifierAdd adds the address of the variable to a global set of C 
variables that need to be decreffed and zeroed-out at interpreter shutdown.


The benefits of this approach are:
  * you don't need special "identifier" versions of functions such as
PyObject_CallMethod. In my example I invented a
PyObject_CallMethodObj, but adding that might be useful anyway.
  * a lot of Python/C code implements similar caching, often
leaking strings.

Hrvoje


Re: [Python-Dev] Identifier API

2011-10-11 Thread Hrvoje Niksic

On 10/11/2011 02:45 PM, Amaury Forgeot d'Arc wrote:

It should also check for errors; in this case the initialization is a
bit more verbose:
if (PY_IDENTIFIER_INIT(update) < 0)
;


Error checking is somewhat more controversial because behavior in case 
of error differs between situations and coding patterns.  I think it 
should be up to the calling code to check for s_update remaining NULL. 
In my example, I would expect PyObject_CallMethodObj and similar to 
raise InternalError when passed a NULL pointer.  Since their return 
values are already checked, this should be enough to cover the unlikely 
case of identifier creation failing.


Hrvoje


Re: [Python-Dev] Compiling the source without stat

2011-12-15 Thread Hrvoje Niksic

On 12/14/2011 05:04 PM, Hossein wrote:

If there is anything I should do


You can determine what the code that calls stat() is trying to do, and 
implement that with other primitives that your platform provides.  For 
example, you can determine whether a file exists by trying to open it in 
read-only mode and checking the error.  You can find whether a 
filesystem path names a directory by trying to chdir into it and 
checking the error.  You can find the size of a regular file by opening 
it and seeking to the end.  These substitutions would not be acceptable 
for a desktop system, but may be perfectly adequate for an embedded one 
that doesn't provide stat() in the first place.  Either way, I expect 
that you will need to modify the sources.
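
A minimal sketch of two such substitutions, assuming only open(), seek()
and tell() are available:

def file_exists(path):
    # existence check without stat(): try opening read-only
    try:
        f = open(path, 'rb')
    except IOError:
        return False
    f.close()
    return True

def file_size(path):
    # size of a regular file without stat(): seek to the end
    f = open(path, 'rb')
    try:
        f.seek(0, 2)       # 2 == SEEK_END
        return f.tell()
    finally:
        f.close()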


Finally, are you 100% sure that your platform doesn't provide another 
API similar to stat()?



Re: [Python-Dev] Status of the fix for the hash collision vulnerability

2012-01-18 Thread Hrvoje Niksic

On 01/17/2012 09:29 PM, "Martin v. Löwis" wrote:

I(0) = H & MASK
PERTURB(0) = H
I(n+1) = (5*I(n) + 1 + PERTURB(n)) & MASK
PERTURB(n+1) = PERTURB(n) >> 5

So if two objects O1 and O2 have the same hash value H, the sequence of
probed indices is the same for any MASK value. It will be a different
sequence, yes, but they will still collide on each and every slot.

This is the very nature of open addressing.


Open addressing can still deploy a collision resolution mechanism 
without this property. For example, double hashing uses a different hash 
function (applied to the key) to calculate PERTURB(0). To defeat it, the 
attacker would have to produce keys that hash the same using both hash 
functions.


Double hashing is not a good general solution for Python dicts because 
it complicates the interface of hash tables that support arbitrary keys. 
Still, it could be considered for dicts with known key types (built-ins 
could hardcode the alternative hash function) or for SafeDicts, if they 
are still considered.
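
A toy illustration of the probing scheme (not CPython code; h1 and h2
are assumed to be independent hash functions):

def probe_indices(key, mask, h1, h2):
    i = h1(key) & mask
    step = h2(key) | 1   # odd step visits every slot of a power-of-two table
    while True:
        yield i
        i = (i + step) & mask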


Hrvoje


Re: [Python-Dev] Hashing proposal: change only string-only dicts

2012-01-20 Thread Hrvoje Niksic

On 01/18/2012 06:55 PM, "Martin v. Löwis" wrote:

I was thinking about adding the field at the end,


Will this make all strings larger, or only those that create dict 
collisions?  Making all strings larger to fix this issue sounds like a 
really bad idea.


Also, would it be acceptable to simply not cache the alternate hash? 
The cached string hash is an optimization anyway.


Hrvoje


Re: [Python-Dev] Suggested addition to PEP 8 for context managers

2012-04-18 Thread Hrvoje Niksic

On 04/17/2012 04:21 PM, Guido van Rossum wrote:

I hope that's not what it says about slices -- that was meant for dict
displays. For slices it should be symmetrical. In this case I would
remove the spaces around the +, but it's okay to add spaces around the
: too. It does look odd to have an operator that binds tighter (the +)
surrounded by spaces while the operator that binds less tight (:) is
not.


The same oddity occurs with expressions in kwargs calls:

func(pos1, pos2, keyword=foo + bar)

I find myself wanting to add parentheses arround the + to make the code 
clearer.



Re: [Python-Dev] Adding types.build_class for 3.3

2012-05-07 Thread Hrvoje Niksic

On 05/07/2012 02:15 PM, Nick Coghlan wrote:

Benjamin's suggestion of a class method on type may be a good one,
though. Then the invocation (using all arguments) would be:

   mcl.build_class(name, bases, keywords, exec_body)

Works for me, so unless someone else can see a problem I've missed,
we'll go with that.


Note that to call mcl.build_class, you have to find a metaclass that 
works for bases, which is the job of build_class.  Putting it as a 
function in the operator module seems like a better solution.



Re: [Python-Dev] AST optimizer implemented in Python

2012-08-14 Thread Hrvoje Niksic

On 08/14/2012 03:32 PM, Victor Stinner wrote:

 I had the idea (perhaps not an original one) that peephole optimization would 
be much better
 done in python than in C.  The C code is clunky and unwieldly, wheras python 
would be much
 better suited, being able to use nifty regexes and the like.

 The problem is, there exists only bytecode disassembler, no corresponding 
assembler.


Why would you like to work on bytecode instead of AST? The AST
contains much more information, you can implement better optimizations


AST allows for better high-level optimizations, but a real peephole 
optimization pass is actually designed to optimize generated code.  This 
allows eliminating some inefficiencies which would be fairly hard to 
prevent at higher levels - wikipedia provides some examples.
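
A concrete example, using CPython's existing C peephole pass (constant
folding performed on the emitted bytecode):

import dis

# The peephole optimizer folds 1 + 2 at the instruction level: the
# disassembly shows a single LOAD_CONST 3 and no BINARY_ADD.
dis.dis(compile("x = 1 + 2", "<example>", "exec"))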



Re: [Python-Dev] Why not using the hash when comparing strings?

2012-10-19 Thread Hrvoje Niksic

On 10/19/2012 03:22 AM, Benjamin Peterson wrote:

It would be interesting to see how common it is for strings which have
their hash computed to be compared.


Since all identifier-like strings mentioned in Python are interned, and 
therefore have had their hash computed, I would imagine comparing them 
to be fairly common. After all, strings are often used as makeshift 
enums in Python.


On the flip side, those strings are typically small, so a measurable 
overall speed improvement brought by such a change seems unlikely.
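
For concreteness, the shortcut under discussion amounts to something 
like the following pseudo-Python, where cached_hash() and compare_bytes() 
are hypothetical stand-ins for the C-level operations:

    def str_eq(a, b):
        if a is b:
            return True
        if len(a) != len(b):
            return False
        ha, hb = cached_hash(a), cached_hash(b)   # -1 means "not computed"
        if ha != -1 and hb != -1 and ha != hb:
            return False       # hashes differ, so the strings can't be equal
        return compare_bytes(a, b)                # memcmp-style fallback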



Re: [Python-Dev] type vs. class terminology

2012-11-25 Thread Hrvoje Niksic

On 11/26/2012 06:01 AM, Chris Jerdonek wrote:

I would like to know when we should use "class" in the Python 3
documentation, and when we should use "type."  Are these terms
synonymous in Python 3, and do we have a preference for which to use
and when?


Some people like to use "class" for the subset of types created by 
Python's "class" statement or its moral equivalent (explicit invocation 
of the metaclass). It makes sense that "class" is used to create 
classes. The word "type" then refers to both classes and built-in and 
extension types, such as "list" or "array.array".
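
For example:

    class Foo:                   # a "class": created by the class statement
        pass

    Bar = type("Bar", (), {})    # the moral equivalent: explicit metaclass call

    import array
    array.array                  # a built-in/extension "type"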



Re: [Python-Dev] [Python-checkins] Daily reference leaks (aef7db0d3893): sum=287

2013-01-14 Thread Hrvoje Niksic

On 01/12/2013 02:46 PM, Eli Bendersky wrote:

The first report is legit, however. PyTuple_New(0) was called and its
return value wasn't checked for NULL.


The author might have been relying on Python caching the empty tuple.
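
The caching itself is easy to observe from Python -- a CPython 
implementation detail, and of course no excuse to skip the NULL check 
at the C level:

    a = ()
    b = tuple()
    assert a is b    # CPython keeps a single shared empty tuple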



Re: [Python-Dev] PEP 433: second try

2013-01-30 Thread Hrvoje Niksic

On 01/30/2013 01:00 PM, Victor Stinner wrote:

Disable inheritance by default
(...)
* It violates the principle of least surprise.  Developers using the
  os module may expect that Python respects the POSIX standard, and so
  that the close-on-exec flag is not set by default.


Oh, I just saw that Perl has been "violating POSIX" since Perl 1: the
close-on-exec flag is set on newly created file descriptors if their
number is greater than $SYSTEM_FD_MAX (which is usually 2).


I haven't checked the source, but I suspect this applies only to file 
descriptors opened with open(), not to explicit POSIX::* calls.  (The 
documentation of the latter doesn't mention close-on-exec at all.) 
Perl's open() contains functionality equivalent to Python's open() and 
subprocess.Popen(), the latter of which already closes on exec by default.
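
For reference, the explicit POSIX-level spelling of the flag from 
Python, via fcntl (illustrative):

    import fcntl, os

    fd = os.open("/dev/null", os.O_RDONLY)
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)  # opt in to cloexec
    os.close(fd)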




Re: [Python-Dev] PyObject_RichCompareBool identity shortcut

2011-04-28 Thread Hrvoje Niksic AVL HR

On 04/28/2011 04:31 AM, Stephen J. Turnbull wrote:

Are you saying you would expect that


 nan = float('nan')
 a = [1, ..., 499, nan, 501, ..., 999]  # meta-ellipsis, not Ellipsis
 a == a
False

??


I would expect l1 == l2, where l1 and l2 are both lists, to be 
semantically equivalent to len(l1) == len(l2) and all(imap(operator.eq, 
l1, l2)).  Currently it isn't, and that was the motivation for this thread.
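
A minimal demonstration, relying on CPython's identity shortcut in 
PyObject_RichCompareBool (map here playing the role of Python 2's imap):

    from operator import eq

    nan = float('nan')
    l1 = [nan]
    print(l1 == l1)              # True: the identity shortcut kicks in
    print(all(map(eq, l1, l1)))  # False: nan != nan element-wise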


If objects that break reflexivity of == are not allowed, this should be 
documented, and such objects banished from the standard library.


Hrvoje