[Python-Dev] Should collections.Counter check for int?

2009-05-13 Thread Hagen Fürstenau
I just noticed that while the docs say that "Counts are allowed to be
any integer value including zero or negative counts",
collections.Counter doesn't perform any check on the types of count
values. Instead, non-numerical values will lead to strange behaviour or
exceptions later on:

>>> c = collections.Counter({'a':'3', 'b':'20', 'c':'100'})
>>> c.most_common(2)
[('a', '3'), ('b', '20')]
>>> c+c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/local/hagenf/lib/python3.1/collections.py", line 467, in __add__
    if newcount > 0:
TypeError: unorderable types: str() > int()

I'd prefer Counter to refuse non-numerical values right away as the
present behaviour may hide bugs (e.g. a forgotten string->int
conversion). Any opinions? (And what about negative values or floats?)
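
A minimal sketch of the kind of up-front check suggested above. The helper
name checked_counter is hypothetical and not part of collections; whether
bool, floats or negative values should be accepted is exactly the open
question, so the choices below are assumptions.

```python
from collections import Counter
from numbers import Integral


def checked_counter(mapping):
    """Hypothetical helper: build a Counter, rejecting non-integer counts."""
    for key, count in mapping.items():
        # bool is a subclass of int; it is rejected here on the assumption
        # that a boolean count is more likely a bug than an intended value.
        if isinstance(count, bool) or not isinstance(count, Integral):
            raise TypeError(
                "count for %r must be an integer, got %s"
                % (key, type(count).__name__)
            )
    return Counter(mapping)


# The plain constructor accepts string counts without complaint.
unchecked = Counter({'a': '3', 'b': '20'})

# The checked variant fails at construction time instead of later on.
checked = checked_counter({'a': 3, 'b': 20})
```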

- Hagen
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Should collections.Counter check for int?

2009-05-16 Thread Hagen Fürstenau

>> I'd prefer Counter to refuse non-numerical values right away as the
>> present behaviour may hide bugs (e.g. a forgotten string->int
>> conversion). Any opinions? (And what about negative values or floats?)
>
> Please file a report on bugs.python.org so that there's a record of this
> issue.

Done: http://bugs.python.org/issue6038

- Hagen


[Python-Dev] Iterator version of contextlib.nested

2009-06-12 Thread Hagen Fürstenau
contextlib.nested has recently been deprecated on grounds of being 
unnecessary now that the with statement accepts multiple context 
managers. However, as has been mentioned before 
(http://mail.python.org/pipermail/python-dev/2009-May/089359.html), that 
doesn't cover the case of a variable number of context managers, i.e.


with contextlib.nested(*list_of_managers) as list_of_results:

or

with contextlib.nested(*iterator_of_managers):

It was suggested that in these use cases a custom context manager should 
be implemented. However, it seems that such an implementation would be 
an almost exact copy of the present code for "nested".


I'm proposing to add an iterator version of "nested" to contextlib 
(possibly called "inested"), which takes an iterable of context managers 
instead of a variable number of parameters. The implementation could be 
taken over from the present "nested", only changing "def 
nested(*managers)" to "def inested(managers)".


This has the advantage that an iterator can be passed to "inested", so 
that each context manager is created in the context of all previous 
ones, which was one of the reasons for introducing the multi-with 
statement in the first place. "contextlib.inested" would therefore be 
the generalization of the multi-with statement to a variable number of 
managers (and "contextlib.nested" would stay deprecated).
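
A rough sketch of what such an "inested" could look like. This is not the
implementation proposed here (which would reuse the body of "nested"); it
leans on contextlib.ExitStack, which only appeared in later Python versions
(3.3), and the name "inested" is the hypothetical one from this message.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def inested(managers):
    """Hypothetical sketch: enter every manager from an iterable in turn."""
    with ExitStack() as stack:
        # The iterable is consumed lazily, so a generator can create each
        # manager while all previously yielded managers are already entered.
        yield [stack.enter_context(manager) for manager in managers]
```

Used as "with inested(open(name) for name in names) as files:", each file
would be opened only after the previous ones have been entered, and all of
them would be closed in reverse order on exit.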


- Hagen


Re: [Python-Dev] Iterator version of contextlib.nested

2009-06-13 Thread Hagen Fürstenau
> The semantic change actually needed to make nested() more equivalent to
> the multi-with statement is for it to accept zero-argument callables
> that create context managers as arguments rather than pre-created
> context managers.

It seems to me that both passing callables which return managers and
passing a generator which yields managers achieve about the same thing.
Are you proposing the former just to avoid introducing a new interface?

> Rather than changing the name of the function, this could be done by
> inspecting the first argument for an "__enter__" method. If it has one,
> use the old semantics (and issue a DeprecationWarning as in 3.1).
> Otherwise, use the proposed new semantics.

I guess this is much too late for 3.1, but could we then at least
un-deprecate "contextlib.nested" for now? As it is, you get a
DeprecationWarning for something like

with contextlib.nested(*my_managers):

without any good way to get rid of it.

- Hagen


Re: [Python-Dev] Iterator version of contextlib.nested

2009-06-14 Thread Hagen Fürstenau
> I actually almost asked for that to be changed to a
> PendingDeprecationWarning when it was first added - Benjamin, do you
> mind if I downgrade this warning to a pending one post rc2?

I'm not sure what that would buy us. For the use case I mentioned it
would be just as annoying to get a PendingDeprecationWarning. But if the
warning was completely removed now, nested could still get deprecated in
3.2 as soon as some better mechanism for a variable number of managers
has been provided.

- Hagen


Re: [Python-Dev] Iterator version of contextlib.nested

2009-06-15 Thread Hagen Fürstenau
> Part of the justification for the new with-statement syntax was
> that nested() doesn't have a way to finalize the constructors
> if one of them fails.

I think the problem was a little bit more subtle: nested() gets passed
managers, so their __init__()s should all have run when the first
context is entered. The only problem comes up when the __exit__() of an
outer manager tries to suppress an exception raised by the __enter__()
of an inner one. This is a limited defect in that it doesn't affect the
common situation where no __exit__() tries to suppress any exceptions.
(In a quick glance over the std library I couldn't find a single
instance of an exception-suppressing __exit__().)
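
To make the uncommon case concrete, here is a small sketch of how the
multi-with statement treats it. The two classes are made up for
illustration: the outer manager suppresses every exception, the inner one
fails in __enter__().

```python
class SuppressAll:
    """Illustrative outer manager whose __exit__() suppresses everything."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return True


class FailsOnEnter:
    """Illustrative inner manager whose __enter__() raises."""

    def __enter__(self):
        raise RuntimeError("inner __enter__ failed")

    def __exit__(self, exc_type, exc, tb):
        return False


def body_runs():
    ran = False
    # The inner __enter__() raises inside the outer context, so the outer
    # __exit__() sees the exception and suppresses it; the body is skipped.
    with SuppressAll(), FailsOnEnter():
        ran = True
    return ran


result = body_runs()  # False: no exception escapes and the body never ran
```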

> And now
> that we have the new with-statement syntax, it mostly just
> represents a second-way-to-do-it (a second way that
> has the stated pitfall).

So the functionalities of nested() and multi-with overlap in the common
use cases, and each has its own limitation in an uncommon one. I agree
that this situation is unfortunate, but I think introducing support for
one uncommon case and removing it for another is not the way to go in
3.1. That's why I think nested() should stay un-deprecated until there
is a replacement which handles a superset of its use cases.

> The new statement was not designed to support passing in
> tuples of context-managers.  This issue was raised while
> the new with-statement was being designed and it was
> intentionally left-out (in part, because the use cases were
> questionable

FWIW, my use case (which made me notice the DeprecationWarning in the
first place) is in a command dispatch function, which looks at the
command to be executed and pre-processes its arguments in a uniform way.
Part of that pre-processing is entering the contexts of context managers
before passing them along (and exiting them when the command finishes or
raises an exception).

> and in-part because there were other ways
> to do it such as adding __enter__ and __exit__ to tuple).

Which has not been done for 3.1. Granted, you could subclass tuple and
add them yourself, but then you would mostly be copying what's already
implemented in nested().

> I suggest a PEP for 2.7 and 3.2 for building-out the
> with-statement to support tuples of context managers

That sounds like a good idea.

> IMO, this represents doing-it-the-right-way instead of preserving a
> construct that is known to be problematic.
> Leaving it in will enshrine it.

I don't see the problem with deprecating it only after a completely
suitable replacement is found. Why would it be any harder to deprecate
nested() in 3.2?

- Hagen



Re: [Python-Dev] Iterator version of contextlib.nested

2009-06-15 Thread Hagen Fürstenau
> Unlike a full DeprecationWarning, a PendingDeprecationWarning is ignored
> by default. You have to switch them on explicitly via code or a command
> line switch in order to see them.

Sorry, I should have made myself more familiar with the warnings
mechanism before writing. In that case I'm fine with a
PendingDeprecationWarning. :-)
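
For the record, a small sketch of switching such warnings on from code.
The function old_api is a made-up stand-in for a call like nested(); on
the command line the -W option serves the same purpose.

```python
import warnings


def old_api():
    """Made-up function that announces a pending deprecation."""
    warnings.warn(
        "old_api() is scheduled for deprecation",
        PendingDeprecationWarning,
        stacklevel=2,
    )
    return 42


with warnings.catch_warnings(record=True) as caught:
    # Explicitly enable all warnings for the duration of this block.
    warnings.simplefilter("always")
    value = old_api()

categories = [w.category for w in caught]
```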

- Hagen



Re: [Python-Dev] Mercurial: tag generation incorrect

2009-07-10 Thread Hagen Fürstenau
> be32850b093f is listed
> as having a child revision, 52b0a279fec6, and ISTM that *this*
> should be the revision that got tagged.

I think the tag is correct. Note that the concept of tagging is
different in Mercurial, where a tag can only refer to a revision
previous to the one where it is inserted in .hgtags. If I understand
correctly, all relevant tagging revisions from SVN are replaced by
Mercurial revisions setting tags, which then refer to their immediate
predecessors.

- Hagen


Re: [Python-Dev] Very Strange Argument Handling Behavior

2010-04-16 Thread Hagen Fürstenau
> This behavior seems pretty strange to me, indeed PyPy gives the
> TypeError for both attempts.  I just wanted to confirm that it was in
> fact intentional.

Oleg already answered why f(**{1:3}) raises a TypeError. But your
question seems to be rather why dict(**{1:3}) doesn't.

For functions implemented in Python, non-string arguments are always
rejected, but C functions (like the dict constructor) don't have to
reject them. I don't see any benefit in allowing them, but it's probably
not worth breaking code by disallowing them either.

I couldn't find this documented. Perhaps we should just say "don't rely
on being able to pass non-string keywords" somewhere?
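
A small probe along these lines shows the difference. The helper name is
invented for this sketch, and the outcome for C-implemented callables such
as dict may well depend on the interpreter and version, so nothing is
asserted about it here.

```python
def f(**kwargs):
    """Plain Python function accepting arbitrary keyword arguments."""
    return kwargs


def accepts_non_string_keywords(func):
    """Hypothetical probe: does func tolerate a non-string keyword?"""
    try:
        func(**{1: 3})
    except TypeError:
        return False
    return True


python_function = accepts_non_string_keywords(f)  # rejected with TypeError
dict_constructor = accepts_non_string_keywords(dict)  # implementation-defined
```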

- Hagen


Re: [Python-Dev] Ask a question for a script about re.findall Modlue

2010-05-22 Thread Hagen Fürstenau
> Your problem is easily explained however: the second argument to
> p.findall() should be an offset, not a flag set. (You are confusing
> re.findall() and p.findall().)

I filed a doc bug for this:

http://bugs.python.org/issue8785
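
A short illustration of the difference, with a made-up pattern and text:
flags belong to re.compile() (or to the module-level re.findall()), while
the second argument of the compiled pattern's findall() is a start offset.

```python
import re

text = "A a A a"

# Flags are given when compiling the pattern ...
p = re.compile(r"a", re.IGNORECASE)
all_matches = p.findall(text)

# ... while the second argument of p.findall() is the start position.
from_offset_2 = p.findall(text, 2)

# The module-level function takes flags as its third argument instead.
module_level = re.findall(r"a", text, re.IGNORECASE)
```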

Cheers,
Hagen





Re: [Python-Dev] (Not) delaying the 3.2 release

2010-09-16 Thread Hagen Fürstenau
> Why not? Since the I/O speed problem is fixed, I have no idea what you
> are referring to.  Please do be concrete.

There's still a performance issue with pickling, but if issue 3873 could
be resolved, Python 3 would actually be faster there.

- Hagen





Re: [Python-Dev] Prefetching on buffered IO files

2010-09-29 Thread Hagen Fürstenau
> Ow... I've always assumed that seek() is essentially free, because
> that's how a typical OS kernel implements it. If seek() is bad on
> GzipFile, how hard would it be to fix this?

I'd imagine that there's no easy way to make arbitrary seeks on a
GzipFile fast. But wouldn't it be enough to optimize small relative
(backwards) seeks?

> How common is the use case where you need to read a gzipped pickle
> *and* you need to leave the unzipped stream positioned exactly at the
> end of the pickle?

Not uncommon, I think. You need this for unpickling objects which were
dumped one after another into a GzipFile, right?
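
The use case in question looks roughly like this (an in-memory buffer and
made-up objects stand in for a real file): every load() has to leave the
stream exactly at the end of its pickle so that the next load() can start
there.

```python
import gzip
import io
import pickle

objects = [{"id": 1}, [1, 2, 3], "third"]

# Dump several objects one after another into a single gzip stream.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    for obj in objects:
        pickle.dump(obj, gz)

# Read them back in order; each load() must stop right after its pickle.
buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode="rb") as gz:
    loaded = [pickle.load(gz) for _ in objects]
```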

ISTM that the immediate performance issue can be solved by the present
patch, and there's room for future improvement by optimizing GzipFile
seeks and/or extending the IO API.

Cheers,
Hagen



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Hagen Fürstenau
>> During PEP 3003 discussion, it was suggested to handle it on a case by
>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
>> 3003.
> 
> It's covered by "As the standard library is not directly tied to the
> language definition it is not covered by this moratorium."

How is this restricted to the stdlib if it defines the set of valid
identifiers?

- Hagen



[Python-Dev] set iteration order

2011-02-26 Thread Hagen Fürstenau
Hi,

I just traced a change in behaviour between Python 3.1 and 3.2 back to a
possibly changed iteration order of sets, caused by the optimization in
issue #8685. Of course, this order shouldn't be relied on in the first
place, but the side effect of the optimization might be worth mentioning
in "What's new", maybe also pointing out that the old behaviour can be
simulated with {x for x in a if x not in b} in place of "a-b".
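
As a quick illustration with made-up sets: both spellings produce equal
sets, and only the iteration order of the results may differ, which is not
guaranteed either way.

```python
a = {"apple", "banana", "cherry", "date"}
b = {"banana", "date"}

difference = a - b
comprehension = {x for x in a if x not in b}

# Equal as sets; the order in which they iterate is an implementation detail.
same_elements = difference == comprehension
```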

Cheers,
Hagen



Re: [Python-Dev] set iteration order

2011-02-26 Thread Hagen Fürstenau
> Code with any dependence on the iteration order of unordered collections
> (other than the guarantee that d.keys() and d.values() match at any
> given time as long as d is unchanged) is buggy.

It's not a matter of dependence on iteration order, but of
reproducibility (in my case there were minor numerical differences due
to different iteration orders). I think we also warn about changes in
pseudorandom number sequences, although you could argue that no code
should depend on specific pseudorandom numbers.

Cheers,
Hagen



Re: [Python-Dev] set iteration order

2011-02-27 Thread Hagen Fürstenau
>> It's not a matter of dependence on iteration order, but of
>> reproducibility (in my case there were minor numerical differences due
>> to different iteration orders).
> 
> Can you give a code example?  I don’t understand your case.

It's a bit involved (that's why it took me a while to locate the
difference in behavior), but it boils down to a (learning) algorithm
that in principle should not care about order of input data, but will in
practice show slightly different numerical behavior. I ran into the
problem when trying to exactly reproduce previously published
experimental results. Of course, I should have anticipated this and
fixed some arbitrary order in the first place. I just thought a note
about this change might save someone in a similar situation some confusion.
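
A toy version of the effect, not my actual code: floating-point addition
is not associative, so summing the same values in a different order can
give a slightly different result. Fixing an arbitrary order (here simply
sorting, in a helper invented for this sketch) makes the result
reproducible.

```python
def reproducible_total(values):
    """Sum the values in sorted order, independent of the input order."""
    return sum(sorted(values))


# The same three numbers, added in two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# With a fixed order, a set and a list of the same values agree exactly.
from_set = reproducible_total({0.1, 0.2, 0.3})
from_list = reproducible_total([0.3, 0.2, 0.1])
```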

Cheers,
Hagen



Re: [Python-Dev] [RELEASED] Python 3.2.1 rc 1

2011-05-18 Thread Hagen Fürstenau
> On behalf of the Python development team, I am pleased to announce the
> first release candidate of Python 3.2.1.

Shouldn't there be a tag "v3.2.1rc1" in the hg repo?

Cheers,
Hagen



Re: [Python-Dev] [RELEASED] Python 3.2.1 rc 1

2011-05-18 Thread Hagen Fürstenau
> P.S. "Shouldn't" makes it sound as if there was a mistake.

Well, I thought there was. When do these tags get merged into "cpython"
then? "v3.2.1b1" is there, but "v3.2.1rc1" isn't:

http://hg.python.org/cpython/tags

Cheers,
Hagen



Re: [Python-Dev] [RELEASED] Python 3.2.1 rc 1

2011-05-19 Thread Hagen Fürstenau
> 3.2.1b1 was already merged back.  (And 3.2.1rc1 will also be merged back
> soon, since there will be a 3.2.1rc2.)

Thanks for the clarification! :-)

Cheers,
Hagen



Re: [Python-Dev] Python 3.x and bytes

2011-06-12 Thread Hagen Fürstenau
> EOH  = b'\r'[0]
> CHAR = b'C'[0]
> DATE = b'D'[0]
> FLOAT = b'F'[0]
> INT = b'I'[0]
> LOGICAL = b'L'[0]
> MEMO = b'M'[0]
> NUMBER = b'N'[0]
> 
> This is not beautiful code.

You still have the alternative

EOH = ord('\r')
CHAR = ord('C')
...

which looks fine to me.
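
Spelled out a little more, with a made-up two-byte record: indexing a
bytes object yields an int in Python 3, so constants built with ord()
compare directly against it.

```python
EOH = ord('\r')
CHAR = ord('C')

record = b'C\r'  # made-up record: a field type byte followed by EOH

# Indexing bytes gives ints, so both spellings name the same value.
same_value = CHAR == b'C'[0]
is_char_field = record[0] == CHAR
ends_header = record[1] == EOH
```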

Cheers,
Hagen



Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Hagen Fürstenau
> If the Unicode APIs only have correct unicode, sure.  If not you'll
> get errors translating to UTF-8 (and the byte APIs are supposed to
> pass bad names through unaltered.)  Kinda ironic, no?

As far as I can see all Python Unicode strings can be encoded to UTF-8,
even things like lone surrogates because Python doesn't care about them.
So both the Unicode API and the binary API would be fail-safe on Windows.
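
A sketch of the encoding in question. Note that this reflects current
Python 3 releases rather than the interpreter discussed in this message:
there the default "strict" handler rejects lone surrogates, and the
"surrogatepass" error handler has to be requested to get the behaviour
described above.

```python
lone = "\ud800"  # a lone high surrogate

# With "surrogatepass" the surrogate is encoded like any other code point.
encoded = lone.encode("utf-8", "surrogatepass")

# The same handler is needed to decode the bytes back again.
decoded = encoded.decode("utf-8", "surrogatepass")
```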

- Hagen



Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-07 Thread Hagen Fürstenau
>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>> even things like lone surrogates because Python doesn't care about them.
>> So both the Unicode API and the binary API would be fail-safe on Windows.
> 
> Python is broken and needs to be fixed.
> 
> http://bugs.python.org/issue3672
> http://bugs.python.org/issue3297

But the question of whether Python should care about lone surrogates or
not is at best tangential to the issue at hand.  If you have lone
surrogates in the Unicode API (and didn't raise an exception on the way
getting there), then the sensible thing is to encode them into lone
UTF-8 surrogates.  Even if you wanted to prevent lone surrogates,
encoding to UTF-8 for the binary API would not be the place to enforce it.

- Hagen


Re: [Python-Dev] Py3k: magical dir()

2008-12-19 Thread Hagen Fürstenau
> Is there some reason not to set tp_hash for rangeobject to
> PyObject_HashNotImplemented?

http://bugs.python.org/issue4701

- Hagen



Re: [Python-Dev] Fwd: Partial function application 'from the right'

2009-02-02 Thread Hagen Fürstenau
Ludvig Ericson wrote:
> Well, I was trying to be funny and was under the impression that Python
> 3.0 had Unicode identifiers, but apparently it doesn't. (I used …, not ...)

It does, but they may not contain characters of the category
"Punctuation, other":

>>> import unicodedata
>>> unicodedata.category("…")
'Po'
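
The same can be checked with str.isidentifier(); the example words are
made up, and the ellipsis is written as an escape to keep it unambiguous.

```python
import unicodedata

ellipsis_char = "\u2026"  # HORIZONTAL ELLIPSIS, the character used above

category = unicodedata.category(ellipsis_char)

# Letters outside ASCII are fine in identifiers, punctuation is not.
letters_ok = "gr\u00f6\u00dfe".isidentifier()
ellipsis_ok = ellipsis_char.isidentifier()
```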

- Hagen


Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Hagen Fürstenau
>> [...] some text drawing engines draw decomposed characters ("o"
>> followed by " ̈" -> "ö") differently compared to their composite
>> equivalents ("ö") and this may be perceived as better or worse. I'd
>> like to offer an option to replace some decomposed characters with
>> their composite equivalent before drawing but since other characters
>> may look worse, I don't want to do a full normalization.
> 
> Isn't this an issue properly solved by various normal forms?

I think he's rather describing the need for custom "abnormal forms".

- Hagen



Re: [Python-Dev] PEP 393 Summer of Code Project

2011-09-01 Thread Hagen Fürstenau
> Ok, I thought there was also a form normalized (denormalized?) to
> decomposed form. But I'll take your word.

If I understood the example correctly, he needs a mixed form, with some
characters decomposed and some composed (depending on which one looks
better in the given font). I agree that this sounds more like a font
problem, but it's a widespread font problem and it may be necessary to
address it in an application.

But this is only one example of why an application-specific concept of
graphemes different from the Unicode-defined normalized forms can be
useful. I think the very concept of a grapheme is context, language, and
culture specific. For example, in Chinese Pinyin it would be very
natural to write tone marks with composing diacritics (i.e. in
decomposed form). But then you have the vowel "ü" and it would be
strange to decompose it into an "u" and combining diaeresis. So
conceptually the most sensible representation of "lǜ" would be neither
the composed nor the decomposed normal form, and depending on its needs
an application might want to represent it in the mixed form (composing
the diaeresis with the "u", but leaving the grave accent separate).
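
In terms of code points, the three representations look like this (a
sketch using unicodedata.normalize; the variable names are just for
illustration):

```python
import unicodedata

# Mixed form: "l", precomposed u-diaeresis, then a combining grave accent.
mixed = "l\u00fc\u0300"

# NFC composes everything into U+01DC, NFD splits everything apart.
composed = unicodedata.normalize("NFC", mixed)
decomposed = unicodedata.normalize("NFD", mixed)

lengths = (len(composed), len(mixed), len(decomposed))
```

Neither normal form gives back the mixed string, so an application that
wants it has to build it itself.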

There must be many more examples where the conceptual context determines
the right composition, like for "ñ", which in Spanish is certainly a
grapheme, but in mathematics might be better represented as n-tilde. The
bottom line is that, while an array of Unicode code points is certainly
a generally useful data type (and PEP 393 is a great improvement in this
regard), an array of graphemes carries many subtleties and may not be
nearly as universal. Support in the spirit of unicodedata's
normalization function etc. is certainly a good thing, but we shouldn't
assume that everyone will want Python to do their graphemes for them.

- Hagen



Re: [Python-Dev] [RELEASED] Python 3.2.2

2011-09-04 Thread Hagen Fürstenau
> To download Python 3.2 visit:
> 
> http://www.python.org/download/releases/3.2/

It's a bit confusing that the download link is to 3.2 and not 3.2.2.

Cheers,
Hagen
