[Python-Dev] what is a dict_keys and where can I import it from?

2013-02-12 Thread Chris Withers

Hi all,

So, dicts in Python 3 return "something different" from their keys and 
values methods:


>>> dict(x=1, y=2).keys()
dict_keys(['y', 'x'])
>>> type(dict(x=1, y=2).keys())
<class 'dict_keys'>

I have vague memories of these things being referred to as views or some 
such? Where can I learn more?


More importantly, how can I tell if I have one of them?
I guess I can keep a reference to type({}.keys()) somewhere, but that 
feels a bit yucky. Should these things be in the types module?


cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] what is a dict_keys and where can I import it from?

2013-02-12 Thread Andrew Svetlov
Is collections.KeysView good for you?

On Tue, Feb 12, 2013 at 9:05 AM, Chris Withers  wrote:
> Hi all,
>
> So, dicts in Python 3 return "something different" from their keys and
> values methods:
>
> >>> dict(x=1, y=2).keys()
> dict_keys(['y', 'x'])
> >>> type(dict(x=1, y=2).keys())
> <class 'dict_keys'>
>
> I have vague memories of these things being referred to as views or some
> such? Where can I learn more?
>
> More importantly, how can I tell if I have one of them?
> I guess I can keep a reference to type({}.keys()) somewhere, but that feels
> a bit yucky. Should these things be in the types module?
>
> cheers,
>
> Chris
>



--
Thanks,
Andrew Svetlov


Re: [Python-Dev] Question regarding: Lib/_markupbase.py

2013-02-12 Thread Antoine Pitrou
On Mon, 11 Feb 2013 11:02:04 -0800,
Guido van Rossum wrote:
> Warning: see http://bugs.python.org/issue17170. Depending on the
> length of the string being scanned and the probability of finding the
> specific character, the proposed change could actually be a
> *pessimization*. OTOH if the character occurs many times, the slice
> will actually cause O(N**2) behavior. So yes, it depends greatly on
> the distribution of the input data.

That said, the savings are still puny unless you spend your time
calling str.find().

Regards

Antoine.




Re: [Python-Dev] what is a dict_keys and where can I import it from?

2013-02-12 Thread Steven D'Aprano

On 12/02/13 18:05, Chris Withers wrote:

> Hi all,
>
> So, dicts in Python 3 return "something different" from their keys and values
> methods:
>
> >>> dict(x=1, y=2).keys()
> dict_keys(['y', 'x'])
> >>> type(dict(x=1, y=2).keys())
> <class 'dict_keys'>
>
> I have vague memories of these things being referred to as views or some such?
> Where can I learn more?


The Fine Manual is usually a good place to refresh your vague memories :-)

http://docs.python.org/3/library/stdtypes.html#dict.keys

By the way, they're also in Python 2.7, only there the methods are called "viewkeys" (and "viewvalues", "viewitems") instead.
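A small Python 3 illustration of what the linked documentation calls dictionary view objects (an editorial example, not from the thread): views are live, so they reflect later changes to the dict, and keys views support set operators directly. The variable names are arbitrary.

```python
# Python 3: dict.keys(), .values() and .items() return live views, not lists.
d = {"x": 1, "y": 2}
keys, values, items = d.keys(), d.values(), d.items()

d["z"] = 3  # mutate the dict *after* creating the views

# The views reflect the change without being re-created.
print(sorted(keys))    # ['x', 'y', 'z']
print(sorted(values))  # [1, 2, 3]
print(len(items))      # 3

# Keys views are set-like, so set operators work on them directly.
print(keys & {"x", "w"})     # {'x'}
print(sorted(keys - {"x"}))  # ['y', 'z']
```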




> More importantly, how can I tell if I have one of them?


Depends on why you care. You may not care, but for those times when you do, their
abstract base classes are in collections.


py> from collections import KeysView
py> keys = {}.keys()
py> isinstance(keys, KeysView)
True
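A sketch of the same check for all three view types, assuming a current Python 3. Note that it imports from `collections.abc`: the ABCs have lived there since Python 3.3, and the `collections.KeysView`-style aliases used in this 2013 session were removed in Python 3.10.

```python
from collections.abc import ItemsView, KeysView, Set, ValuesView

d = {"x": 1, "y": 2}

# Each concrete dict view type is registered with its matching ABC.
print(isinstance(d.keys(), KeysView))      # True
print(isinstance(d.values(), ValuesView))  # True
print(isinstance(d.items(), ItemsView))    # True

# KeysView is also a Set; ValuesView is not (values may repeat or be unhashable).
print(isinstance(d.keys(), Set))    # True
print(isinstance(d.values(), Set))  # False
```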


An anomaly, which I cannot explain:

py> issubclass(type(keys), KeysView)
True
py> type(keys) is KeysView
False
py> type(keys).__mro__
(<class 'dict_keys'>, <class 'object'>)


This disturbs my calm, because I expect that if issubclass returns True, the 
two classes will either be identical, or the second will be in the MRO of the 
first. What have I missed?




> I guess I can keep a reference to type({}.keys()) somewhere, but that feels a
> bit yucky.


I remember Python 1.4 days when the only way to type-test something was:

    if type(something) is type([]):
        ...

so dynamically grabbing the type from a literal when needed does not seem the 
least bit yucky to me.
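To make the trade-off concrete, here is a sketch comparing the exact-type test on `type({}.keys())` with the ABC test. `Registry` is a hypothetical minimal `Mapping` subclass invented for this example; its inherited `keys()` returns a `collections.abc.KeysView`, which passes the ABC check but not the exact-type check.

```python
from collections.abc import KeysView, Mapping

# Grab the concrete type from a literal, as suggested above.
DictKeysType = type({}.keys())


class Registry(Mapping):
    """Hypothetical read-only mapping, used only for this illustration."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


plain = {"a": 1}.keys()             # a real dict_keys object
custom = Registry({"a": 1}).keys()  # Mapping.keys() mixin returns a KeysView

print(type(plain) is DictKeysType)   # True
print(type(custom) is DictKeysType)  # False: a different concrete class
print(isinstance(custom, KeysView))  # True: the ABC check accepts both
```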


--
Steven


Re: [Python-Dev] what is a dict_keys and where can I import it from?

2013-02-12 Thread Amaury Forgeot d'Arc
2013/2/12 Steven D'Aprano 

> An anomaly, which I cannot explain:
>
> py> issubclass(type(keys), KeysView)
> True
> py> type(keys) is KeysView
> False
> py> type(keys).__mro__
> (<class 'dict_keys'>, <class 'object'>)
>
>
> This disturbs my calm, because I expect that if issubclass returns True,
> the two classes will either be identical, or the second will be in the MRO
> of the first. What have I missed?
>

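Ah, the magic of ABCs...
KeysView's metaclass, ABCMeta, implements __instancecheck__ and __subclasscheck__,
so KeysView can claim any registered class as a (virtual) subclass even though
it never appears in that class's MRO.

This is set explicitly in Lib/collections/abc.py:
  KeysView.register(dict_keys)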
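Amaury's explanation can be reproduced with a toy ABC. `ViewLike` and `Concrete` below are hypothetical stand-ins for `KeysView` and `dict_keys`; the sketch shows that `register()` makes `issubclass` and `isinstance` succeed without adding the ABC to the registered class's MRO, which is exactly the "anomaly" Steven observed.

```python
from abc import ABC


class ViewLike(ABC):
    """Hypothetical ABC standing in for collections.abc.KeysView."""


class Concrete:
    """Hypothetical class standing in for dict_keys; it does not inherit ViewLike."""


ViewLike.register(Concrete)  # make Concrete a *virtual* subclass

print(issubclass(Concrete, ViewLike))    # True: ABCMeta.__subclasscheck__ consults the registry
print(isinstance(Concrete(), ViewLike))  # True: ABCMeta.__instancecheck__ does too
print(ViewLike in Concrete.__mro__)      # False: register() never changes the MRO
```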


-- 
Amaury Forgeot d'Arc


[Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Maciej Fijalkowski
Hi

We recently encountered a performance issue in stdlib for pypy. It
turned out that someone committed a performance "fix" that uses += for
strings instead of "".join() that was there before.

Now this hurts pypy (we can mitigate it to some degree though) and
possibly Jython and IronPython too.

How do people feel about generally not having += on long strings in
stdlib (since the refcount = 1 thing is a hack)?

What about other performance improvements in stdlib that are
problematic for pypy or others?

Personally I would like cleaner code in stdlib vs speeding up CPython.
Typically that also helps pypy so I'm not unbiased.
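For readers skimming the thread, a minimal sketch of the two styles under discussion. `render_plus` and `render_join` are hypothetical names; both produce the same string, but only the join version avoids relying on CPython's in-place resize trick, which applies only when the string being extended has no other references.

```python
def render_plus(pairs):
    """Repeated +=: may be quadratic unless CPython's in-place resize trick applies."""
    out = ""
    for key, value in pairs:
        out += "%s=%s\n" % (key, value)
    return out


def render_join(pairs):
    """Collects the pieces and joins them once, so the result is built a single time."""
    return "".join("%s=%s\n" % (key, value) for key, value in pairs)


pairs = [("a", 1), ("b", 2)]
print(render_join(pairs) == render_plus(pairs))  # True
```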

Cheers,
fijal


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou

Hi !

On Tue, 12 Feb 2013 23:03:04 +0200
Maciej Fijalkowski  wrote:
> 
> We recently encountered a performance issue in stdlib for pypy. It
> turned out that someone committed a performance "fix" that uses += for
> strings instead of "".join() that was there before.
> 
> Now this hurts pypy (we can mitigate it to some degree though) and
> possibly Jython and IronPython too.
> 
> How do people feel about generally not having += on long strings in
> stdlib (since the refcount = 1 thing is a hack)?

I agree that += should not be used as an optimization (on strings) in
the stdlib code. The optimization is there so that uncareful code does
not degenerate, but deliberately relying on it is a bit devilish.
(optimisare diabolicum :-))

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Brett Cannon
On Tue, Feb 12, 2013 at 4:06 PM, Antoine Pitrou  wrote:

>
> Hi !
>
> On Tue, 12 Feb 2013 23:03:04 +0200
> Maciej Fijalkowski  wrote:
> >
> > We recently encountered a performance issue in stdlib for pypy. It
> > turned out that someone committed a performance "fix" that uses += for
> > strings instead of "".join() that was there before.
> >
> > Now this hurts pypy (we can mitigate it to some degree though) and
> > possibly Jython and IronPython too.
> >
> > How do people feel about generally not having += on long strings in
> > stdlib (since the refcount = 1 thing is a hack)?
>
> I agree that += should not be used as an optimization (on strings) in
> the stdlib code. The optimization is there so that uncareful code does
> not degenerate, but deliberately relying on it is a bit devilish.
> (optimisare diabolicum :-))
>

Ditto from me. If you're going so far as to want to optimize Python code
then you probably are going to care enough to accelerate it in C, in which
case you can leave the Python code idiomatic.
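Brett's suggestion mirrors a pattern already used in the stdlib (for example, `heapq` defines pure-Python functions and then tries `from _heapq import *`). A sketch of that pattern, with `_fastjoin` as a hypothetical accelerator module that does not exist, so the idiomatic Python version stays in effect:

```python
def join_lines(lines):
    """Pure-Python version, written in the idiomatic style."""
    return "".join(line + "\n" for line in lines)


try:
    # Hypothetical C accelerator: if it existed, its join_lines would
    # replace the pure-Python one. It does not, so ImportError is raised.
    from _fastjoin import join_lines
except ImportError:
    pass
```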


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Maciej Fijalkowski
On Tue, Feb 12, 2013 at 11:16 PM, Brett Cannon  wrote:
>
>
>
> On Tue, Feb 12, 2013 at 4:06 PM, Antoine Pitrou  wrote:
>>
>>
>> Hi !
>>
>> On Tue, 12 Feb 2013 23:03:04 +0200
>> Maciej Fijalkowski  wrote:
>> >
>> > We recently encountered a performance issue in stdlib for pypy. It
>> > turned out that someone committed a performance "fix" that uses += for
>> > strings instead of "".join() that was there before.
>> >
>> > Now this hurts pypy (we can mitigate it to some degree though) and
>> > possibly Jython and IronPython too.
>> >
>> > How do people feel about generally not having += on long strings in
>> > stdlib (since the refcount = 1 thing is a hack)?
>>
>> I agree that += should not be used as an optimization (on strings) in
>> the stdlib code. The optimization is there so that uncareful code does
>> not degenerate, but deliberately relying on it is a bit devilish.
>> (optimisare diabolicum :-))
>
>
> Ditto from me. If you're going so far as to want to optimize Python code
> then you probably are going to care enough to accelerate it in C, in which
> case you can leave the Python code idiomatic.

I should actually reference the original CPython issue
http://bugs.python.org/issue1285086


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread fwierzbi...@gmail.com
On Tue, Feb 12, 2013 at 1:03 PM, Maciej Fijalkowski  wrote:
> Hi
>
> We recently encountered a performance issue in stdlib for pypy. It
> turned out that someone committed a performance "fix" that uses += for
> strings instead of "".join() that was there before.
>
> Now this hurts pypy (we can mitigate it to some degree though) and
> possibly Jython and IronPython too.
Just to confirm: Jython does not have optimizations for += on strings and
will do much better with the idiomatic "".join().

-Frank


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Tue, 12 Feb 2013 13:32:50 -0800
"fwierzbi...@gmail.com"  wrote:
> On Tue, Feb 12, 2013 at 1:03 PM, Maciej Fijalkowski  wrote:
> > Hi
> >
> > We recently encountered a performance issue in stdlib for pypy. It
> > turned out that someone committed a performance "fix" that uses += for
> > strings instead of "".join() that was there before.
> >
> > Now this hurts pypy (we can mitigate it to some degree though) and
> > possibly Jython and IronPython too.
> Just to confirm: Jython does not have optimizations for += on strings and
> will do much better with the idiomatic "".join().

For the record, io.StringIO should be quite fast in 3.3.
(except for the method call overhead that Guido is complaining
about :-))
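A sketch of using `io.StringIO` as a string builder, as Antoine suggests. `build_with_stringio` is a hypothetical helper name, and no timings are claimed here; how this compares with `"".join()` depends on the Python version and implementation.

```python
import io


def build_with_stringio(parts):
    """Accumulate text in an in-memory buffer and read it back once at the end."""
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()


print(build_with_stringio(["spam", " ", "eggs"]))  # spam eggs
```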

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Ned Batchelder

On 2/12/2013 4:16 PM, Brett Cannon wrote:
> On Tue, Feb 12, 2013 at 4:06 PM, Antoine Pitrou wrote:
>
> > Hi !
> >
> > On Tue, 12 Feb 2013 23:03:04 +0200
> > Maciej Fijalkowski <fij...@gmail.com> wrote:
> > >
> > > We recently encountered a performance issue in stdlib for pypy. It
> > > turned out that someone committed a performance "fix" that uses += for
> > > strings instead of "".join() that was there before.
> > >
> > > Now this hurts pypy (we can mitigate it to some degree though) and
> > > possibly Jython and IronPython too.
> > >
> > > How do people feel about generally not having += on long strings in
> > > stdlib (since the refcount = 1 thing is a hack)?
> >
> > I agree that += should not be used as an optimization (on strings) in
> > the stdlib code. The optimization is there so that uncareful code does
> > not degenerate, but deliberately relying on it is a bit devilish.
> > (optimisare diabolicum :-))
>
> Ditto from me. If you're going so far as to want to optimize Python
> code then you probably are going to care enough to accelerate it in C,
> in which case you can leave the Python code idiomatic.

But the only reason "".join() is a Python idiom in the first place is
because it was "the fast way" to do what everyone initially coded as
"s += ...". Just because we all learned a long time ago that joining was
the fast way to build a string doesn't mean that "".join() is the clean
idiomatic way to do it.


--Ned.






Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread R. David Murray
On Tue, 12 Feb 2013 16:40:38 -0500, Ned Batchelder  
wrote:
> On 2/12/2013 4:16 PM, Brett Cannon wrote:
> > On Tue, Feb 12, 2013 at 4:06 PM, Antoine Pitrou wrote:
> > On Tue, 12 Feb 2013 23:03:04 +0200
> > Maciej Fijalkowski <fij...@gmail.com> wrote:
> > >
> > > We recently encountered a performance issue in stdlib for pypy. It
> > > turned out that someone committed a performance "fix" that uses += for
> > > strings instead of "".join() that was there before.
> > >
> > > Now this hurts pypy (we can mitigate it to some degree though) and
> > > possibly Jython and IronPython too.
> > >
> > > How do people feel about generally not having += on long strings in
> > > stdlib (since the refcount = 1 thing is a hack)?
> >
> > I agree that += should not be used as an optimization (on strings) in
> > the stdlib code. The optimization is there so that uncareful code does
> > not degenerate, but deliberately relying on it is a bit devilish.
> > (optimisare diabolicum :-))
> >
> > Ditto from me. If you're going so far as to want to optimize Python 
> > code then you probably are going to care enough to accelerate it in C, 
> > in which case you can leave the Python code idiomatic.
> 
> But the only reason "".join() is a Python idiom in the first place is
> because it was "the fast way" to do what everyone initially coded as
> "s += ...". Just because we all learned a long time ago that joining was
> the fast way to build a string doesn't mean that "".join() is the clean
> idiomatic way to do it.

If 'idiomatic' (a terrible term) means "the standard way in this
language", which is how it is employed in the programming community,
then yes, "".join() is the idiomatic way to write that *in Python*,
and thus is cleaner code *in Python*.

--David


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Tue, 12 Feb 2013 16:40:38 -0500
Ned Batchelder  wrote:
> 
> But the only reason "".join() is a Python idiom in the first place is
> because it was "the fast way" to do what everyone initially coded as
> "s += ...". Just because we all learned a long time ago that joining was
> the fast way to build a string doesn't mean that "".join() is the clean
> idiomatic way to do it.

It's idiomatic because strings are immutable (by design, not because of
an optimization detail) and therefore concatenation *has* to imply
building a new string from scratch.
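Antoine's point can be seen directly: because a second name may refer to the same string, `+=` cannot in general modify the object in place, so it must produce a new string. (This is also why CPython's shortcut only applies when the string has no other references.) A minimal illustration with arbitrary values:

```python
a = "spam"
b = a          # a second reference to the same string object
a += "eggs"    # the shared object cannot be changed in place...

print(b)       # ...so b still sees the original value: spam
print(a)       # a is now bound to a new object: spameggs
print(a is b)  # False
```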

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Xavier Morel
On 2013-02-12, at 22:40 , Ned Batchelder wrote:
> But the only reason "".join() is a Python idiom in the first place is
> because it was "the fast way" to do what everyone initially coded as
> "s += ...". Just because we all learned a long time ago that joining was
> the fast way to build a string doesn't mean that "".join() is the clean
> idiomatic way to do it.

Well no, str.join is the idiomatic way to do it because it is:

> idiomatic |ˌidēəˈmatik|
> adjective
> 1 using, containing, or denoting expressions that are natural to a native 
> speaker 

or would you argue that the natural way for weathered Python developers
to concatenate strings is to *not* use str.join?

Of course usually idioms have original reasons for being, reasons which
are sometimes long gone (not unlike religious mandates or prohibitions).

For Python, ignoring the refcounting hack (which is not only CPython-specific
but *current* CPython-specific *and* doesn't apply to all cases), that
reason still exists: Python's strings are formally immutable, and repeated
concatenation of immutable strings is quadratic.

Thus str.join is idiomatic, and although it's possible (if difficult) to
change the idiom, straight string concatenation would make a terrible new
idiom, as it will behave either unreliably (current CPython) or simply
terribly (every other Python implementation).

No?
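Xavier's "quadratic" claim follows from a simple cost model, sketched below under the stated assumption that each concatenation writes the whole new string from scratch (no in-place resizing). Building an n-character string one character at a time then writes 1 + 2 + ... + n = n(n+1)/2 characters, versus n for a single join. The function names are hypothetical, and this counts characters rather than measuring time.

```python
def chars_copied_by_concatenation(pieces):
    """Cost model for s = s + piece in a loop, assuming no in-place resizing:
    every step builds the new, longer string from scratch."""
    total = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        total += length  # a fresh string of `length` characters is written
    return total


def chars_copied_by_join(pieces):
    """Cost model for "".join(pieces): each piece is copied exactly once."""
    return sum(len(piece) for piece in pieces)


pieces = ["X"] * 1000
print(chars_copied_by_concatenation(pieces))  # 500500, i.e. n*(n+1)/2
print(chars_copied_by_join(pieces))           # 1000, i.e. n
```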


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread MRAB

On 2013-02-12 21:44, Antoine Pitrou wrote:
> On Tue, 12 Feb 2013 16:40:38 -0500
> Ned Batchelder wrote:
>>
>> But the only reason "".join() is a Python idiom in the first place is
>> because it was "the fast way" to do what everyone initially coded as
>> "s += ...". Just because we all learned a long time ago that joining was
>> the fast way to build a string doesn't mean that "".join() is the clean
>> idiomatic way to do it.
>
> It's idiomatic because strings are immutable (by design, not because of
> an optimization detail) and therefore concatenation *has* to imply
> building a new string from scratch.


Tuples are much like immutable lists; sets were added, and then frozensets;
should we be adding mutable strings too (a bit like C#'s StringBuilder)?
(Just wondering...)



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Christian Heimes
Am 12.02.2013 22:32, schrieb Antoine Pitrou:
> For the record, io.StringIO should be quite fast in 3.3.
> (except for the method call overhead that Guido is complaining
> about :-))

AFAIK it's not the actual *call* of the method that is slow, but rather
attribute lookup and creation of bound method objects. If speed is of
the essence, code can cache the method object locally:

import io

strio = io.StringIO()
write = strio.write  # look up the bound method once, outside the loop
for element in elements:  # elements: any iterable of str
    write(element)
result = strio.getvalue()


Christian


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 1:28 AM, Christian Heimes  wrote:
> Am 12.02.2013 22:32, schrieb Antoine Pitrou:
>> For the record, io.StringIO should be quite fast in 3.3.
>> (except for the method call overhead that Guido is complaining
>> about :-))
>
> AFAIK it's not the actual *call* of the method that is slow, but rather
> attribute lookup and creation of bound method objects. If speed is of
> the essence, code can cache the method object locally:
>
> strio = io.StringIO()
> write = strio.write
> for element in elements:
>     write(element)
> result = strio.getvalue()

And this is a great example of muddying code in stdlib for the sake of
speeding up CPython.

Cheers,
fijal


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 1:20 AM, MRAB  wrote:
> On 2013-02-12 21:44, Antoine Pitrou wrote:
>>
>> On Tue, 12 Feb 2013 16:40:38 -0500
>> Ned Batchelder  wrote:
>>>
>>>
>>> But the only reason "".join() is a Python idiom in the first place is
>>> because it was "the fast way" to do what everyone initially coded as
>>> "s += ...". Just because we all learned a long time ago that joining was
>>> the fast way to build a string doesn't mean that "".join() is the clean
>>> idiomatic way to do it.
>>
>>
>> It's idiomatic because strings are immutable (by design, not because of
>> an optimization detail) and therefore concatenation *has* to imply
>> building a new string from scratch.
>>
> Tuples are much like immutable lists; sets were added, and then frozensets;
> should we be adding mutable strings too (a bit like C#'s StringBuilder)?
> (Just wondering...)

Isn't bytearray what you're looking for?
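Maciej's suggestion as a sketch: `bytearray` is mutable and `+=` extends it in place, so it can serve as a builder. The caveat is that it holds bytes, not text, so the pieces must be encoded and the result decoded. `build_bytes` and its UTF-8 default are assumptions made for this example.

```python
def build_bytes(parts, encoding="utf-8"):
    """Use a mutable bytearray as a builder, decoding once at the end."""
    buf = bytearray()
    for part in parts:
        buf += part.encode(encoding)  # extends the buffer in place (amortised growth)
    return buf.decode(encoding)


print(build_bytes(["héllo", " ", "wörld"]))  # héllo wörld
```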


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Nick Coghlan
On 13 Feb 2013 07:08, "Maciej Fijalkowski"  wrote:
>
> Hi
>
> We recently encountered a performance issue in stdlib for pypy. It
> turned out that someone committed a performance "fix" that uses += for
> strings instead of "".join() that was there before.
>
> Now this hurts pypy (we can mitigate it to some degree though) and
> possibly Jython and IronPython too.
>
> How do people feel about generally not having += on long strings in
> stdlib (since the refcount = 1 thing is a hack)?
>
> What about other performance improvements in stdlib that are
> problematic for pypy or others?
>
> Personally I would like cleaner code in stdlib vs speeding up CPython.

For the specific case of "Don't rely on the fragile refcounting hack in
CPython's string concatenation" I strongly agree. However, as a general
principle, I can't agree until speed.python.org is a going concern and we
can get a reasonable overview of any resulting performance implications.

Regards,
Nick.

> Typically that also helps pypy so I'm not unbiased.
>
> Cheers,
> fijal


[Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-12 Thread Christian Tismer

Hi friends,

*Efficient string concatenation* was a topic in 2004.
Armin Rigo proposed a patch with the same name as this subject line,
more precisely:

[Patches] [ python-Patches-980695 ] efficient string concatenation
on sourceforge.net, on 2004-06-28.

This patch was finally added to Python 2.4, released on 2004-11-30.

Some people might remember the larger discussion about whether such a patch
should be accepted at all, because it changes the programming style for many
of us from "don't do that, stupid" to "well, you may do it in CPython", which
has quite some impact on other implementations (is it fast on Jython, now?).

It changed, for instance, my programming and teaching style a lot, of course!

But I think nobody but people heavily involved in PyPy expected this:

Now, more than eight years after that patch appeared and made it into 2.4,
PyPy (!) still does *not* have it!

Obviously I was misled by other optimizations, and by the fact that
this patch was from a/the major author of PyPy who invented the initial
patch for CPython. That this would be in PyPy as well sooner or later was
without question for me. Wrong... ;-)

Yes, I agree that for PyPy it is much harder to implement without the
refcounting trick, and probably even more difficult in case of the JIT.

But nevertheless, I tried to find any reference to this missing crucial
optimization, with no success after an hour (*).

And I guess many other people are stepping into the same trap.

So I can imagine that PyPy loses some of its speed in many programs, because
Armin's great hack did not make it into PyPy, and this is not loudly declared
anywhere. I believe the efficiency of string concatenation is something
that people assume by default and add to the vague CPython compatibility
claim, if not explicitly told otherwise.



Some silly proof, using Python 2.7.3 vs PyPy 1.9:


$ cat strconc.py
#!/usr/bin/env python

from timeit import default_timer as timer

tim = timer()

s = ''
for i in xrange(10):
    s += 'X'

tim = timer() - tim

print 'time for {} concats = {:0.3f}'.format(len(s), tim)



$ python strconc.py
time for 10 concats = 0.028
$ pypy strconc.py
time for 10 concats = 0.804
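For readers on Python 3 (where `xrange` and the `print` statement no longer exist), a rough port of the benchmark above that also times the `"".join()` alternative. The size `n` is an arbitrary choice for this sketch, and no timing results are claimed; numbers will vary by interpreter and machine.

```python
from timeit import default_timer as timer


def concat_plus(n):
    """Build an n-character string with repeated += (the pattern under discussion)."""
    s = ""
    for _ in range(n):
        s += "X"
    return s


def concat_join(n):
    """Build the same string by joining once."""
    return "".join("X" for _ in range(n))


def time_it(func, n):
    """Return (length of result, elapsed seconds) for one call of func(n)."""
    start = timer()
    result = func(n)
    return len(result), timer() - start


if __name__ == "__main__":
    n = 100000  # arbitrary size chosen for this sketch
    for func in (concat_plus, concat_join):
        length, elapsed = time_it(func, n)
        print("{}: time for {} concats = {:0.3f}".format(func.__name__, length, elapsed))
```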


Something is needed: a patch for PyPy or for the documentation, I guess.

This is not just some unoptimized function in some module, but it is used
all over the place and became a very common pattern once it was introduced.

How ironic that a foreseen problem occurs *now*, and *there* :-)
cheers -- chris


(*)
http://pypy.readthedocs.org/en/latest/cpython_differences.html
http://pypy.org/compat.html
http://pypy.org/performance.html

--
Christian Tismer :^)   
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
14482 Potsdam: PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?   http://www.stackless.com/



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Alexandre Vassalotti
On Tue, Feb 12, 2013 at 1:44 PM, Antoine Pitrou  wrote:

> It's idiomatic because strings are immutable (by design, not because of
> an optimization detail) and therefore concatenation *has* to imply
> building a new string from scratch.
>

Not necessarily. It is totally possible to implement strings such that they
are immutable and concatenation takes O(1): ropes are the canonical example
of this.
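To illustrate Alexandre's point, here is a minimal rope sketch: concatenation creates a new node that points at its two operands, so it is O(1) and never mutates either one, and the characters are only copied when the rope is flattened with `str()`. The `Rope` class is a toy invented for this example (no rebalancing, no indexing), not a description of any real implementation.

```python
class Rope:
    """Toy immutable rope: concatenation builds a node instead of copying."""

    __slots__ = ("_leaf", "_left", "_right", "_len")

    def __init__(self, leaf="", left=None, right=None):
        self._leaf = leaf
        self._left = left
        self._right = right
        # Cache the length so len() and concatenation stay O(1).
        self._len = len(leaf) if left is None else len(left) + len(right)

    def __len__(self):
        return self._len

    def __add__(self, other):
        if not isinstance(other, Rope):
            other = Rope(other)
        return Rope(left=self, right=other)  # O(1): no characters are copied

    def __str__(self):
        # Flatten iteratively, visiting leaves left to right.
        out, stack = [], [self]
        while stack:
            node = stack.pop()
            if node._left is None:
                out.append(node._leaf)
            else:
                stack.append(node._right)
                stack.append(node._left)
        return "".join(out)


acc = Rope()
for _ in range(5):
    acc += "X"  # no __iadd__, so this rebinds acc to a new Rope each time
print(str(acc), len(acc))  # XXXXX 5
```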
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Christian Tismer

On 12.02.13 22:03, Maciej Fijalkowski wrote:
> Hi
>
> We recently encountered a performance issue in stdlib for pypy. It
> turned out that someone committed a performance "fix" that uses += for
> strings instead of "".join() that was there before.
>
> Now this hurts pypy (we can mitigate it to some degree though) and
> possibly Jython and IronPython too.
>
> How do people feel about generally not having += on long strings in
> stdlib (since the refcount = 1 thing is a hack)?
>
> What about other performance improvements in stdlib that are
> problematic for pypy or others?
>
> Personally I would like cleaner code in stdlib vs speeding up CPython.
> Typically that also helps pypy so I'm not unbiased.
>
> Cheers,
> fijal


Howdy.

Funny coincidence that this issue came up an hour after I asked about the
missing string_concat optimization on the pypy channel.

I did not read email while writing the "efficient string concatenation"
re-iteration.

Maybe we should use the time machine, go back and undo the
patch, although it still makes a lot of sense and is fastest, opcode-wise,
at least on CPython.

Which will not matter so much for PyPy, of course, because *that* goes away.

Alas, the damage to the mindsets has already happened, and the cure
will probably be as hard as the eviction of the print statement, after all.

But since I'm a complete Python 3.3 convert (with consequent changes
to my projects, which was not so trivial),
I intend to also start saying publicly that "s += t" is a pattern that should
not be used in the gigabyte domain, from 2013 on.

Actually a tad, because it contradicted normal programming patterns
in an appealing way. Way too sexy...

But let's toss it. Keep the past eight years in good memory as an exceptional
period of liberal abuse. Maybe we should add this as an addendum to the
"Zen of Python":
There are obviously good things, but "obvious" is the finest liar.



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Christian Tismer

On 13.02.13 02:09, Alexandre Vassalotti wrote:
> On Tue, Feb 12, 2013 at 1:44 PM, Antoine Pitrou wrote:
>
> > It's idiomatic because strings are immutable (by design, not because of
> > an optimization detail) and therefore concatenation *has* to imply
> > building a new string from scratch.
>
> Not necessarily. It is totally possible to implement strings such that they
> are immutable and concatenation takes O(1): ropes are the canonical
> example of this.


Ropes were implemented by Carl-Friedrich Bolz in 2007, as I remember.
No idea what the impact was, if any at all.
Would ropes be an answer (and a simple way to cope with string mutation
patterns) as an alternative implementation, and therefore still justify
the usage of that pattern?



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Terry Reedy

On 2/12/2013 6:20 PM, MRAB wrote:

> Tuples are much like immutable lists; sets were added, and then frozensets;
> should we be adding mutable strings too (a bit like C#'s StringBuilder)?
> (Just wondering...)

StringIO is effectively a mutable string with a file interface.
sio.write('abc') is the equivalent of lis.extend(['a', 'b', 'c']).
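To illustrate the "mutable string" reading of StringIO, here is a small sketch that uses only the standard `io.StringIO` API: appending with `write()`, overwriting in place after `seek()`, and shortening with `truncate()`. It is an illustration of the analogy, not a claim that StringIO is a drop-in replacement for `str`.

```python
import io

sio = io.StringIO()
sio.write("a")
pos = sio.tell()          # remember a position (an opaque cookie for text streams)
sio.write("bcdef")        # appending, like lis.extend(['b', 'c', 'd', 'e', 'f'])
print(sio.getvalue())     # abcdef

sio.seek(pos)
sio.write("XY")           # overwrite 'b' and 'c' in place
print(sio.getvalue())     # aXYdef

sio.truncate()            # drop everything after the current position
print(sio.getvalue())     # aXY
```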


--
Terry Jan Reedy



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Terry Reedy

On 2/12/2013 4:03 PM, Maciej Fijalkowski wrote:

> Hi
>
> We recently encountered a performance issue in stdlib for pypy. It
> turned out that someone committed a performance "fix" that uses += for
> strings instead of "".join() that was there before.
>
> Now this hurts pypy (we can mitigate it to some degree though) and
> possibly Jython and IronPython too.
>
> How do people feel about generally not having += on long strings in
> stdlib (since the refcount = 1 thing is a hack)?
>
> What about other performance improvements in stdlib that are
> problematic for pypy or others?
>
> Personally I would like cleaner code in stdlib vs speeding up CPython.
> Typically that also helps pypy so I'm not unbiased.

I agree. sum() refuses to sum strings specifically to encourage .join().

>>> sum(('x', 'b'), '')
Traceback (most recent call last):
  File "", line 1, in <module>
    sum(('x', 'b'), '')
TypeError: sum() can't sum strings [use ''.join(seq) instead]

The doc entry for sum says the same thing.
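The same behaviour can be checked in a script. This small sketch (the variable names are our own) catches the `TypeError` that `sum()` raises for a `str` start value and then uses the spelling the error message recommends; `str.join` sizes the result from all the pieces and builds it once, instead of creating an intermediate string per addition.

```python
pieces = ["x", "b"]

message = None
try:
    sum(pieces, "")                # a str start value is rejected outright
except TypeError as exc:
    message = str(exc)
print(message)                     # the message points at ''.join(seq)

joined = "".join(pieces)           # the recommended idiom
print(joined)                      # xb
```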

--
Terry Jan Reedy



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Wed, 13 Feb 2013 00:28:15 +0100
Christian Heimes wrote:
> Am 12.02.2013 22:32, schrieb Antoine Pitrou:
> > For the record, io.StringIO should be quite fast in 3.3.
> > (except for the method call overhead that Guido is complaining
> > about :-))
> 
> AFAIK it's not the actual *call* of the method that is slow, but rather
> attribute lookup and creation of bound method objects.

Take a look at http://bugs.python.org/issue17170
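Heimes's point is that, inside a hot loop, `sio.write(...)` pays for an attribute lookup (and, in the CPython versions discussed here, a fresh bound-method object) on every iteration. A common workaround, sketched below as an illustration only, is to look the method up once and call the saved bound method inside the loop. The function names are hypothetical and no timing figures are claimed; the gain depends on the interpreter and version.

```python
import io
from typing import Iterable


def build_naive(parts: Iterable[str]) -> str:
    sio = io.StringIO()
    for p in parts:
        sio.write(p)        # attribute lookup on every iteration
    return sio.getvalue()


def build_hoisted(parts: Iterable[str]) -> str:
    sio = io.StringIO()
    write = sio.write       # look up and bind once, outside the loop
    for p in parts:
        write(p)
    return sio.getvalue()


if __name__ == "__main__":
    import timeit

    data = ["X"] * 100_000
    for fn in (build_naive, build_hoisted):
        # Results vary by interpreter and version; measure, don't assume.
        t = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {t:.3f}s")
```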



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Wed, 13 Feb 2013 09:39:23 +1000
Nick Coghlan wrote:
> On 13 Feb 2013 07:08, "Maciej Fijalkowski" wrote:
> >
> > Hi
> >
> > We recently encountered a performance issue in stdlib for pypy. It
> > turned out that someone committed a performance "fix" that uses += for
> > strings instead of "".join() that was there before.
> >
> > Now this hurts pypy (we can mitigate it to some degree though) and
> > possibly Jython and IronPython too.
> >
> > How do people feel about generally not having += on long strings in
> > stdlib (since the refcount = 1 thing is a hack)?
> >
> > What about other performance improvements in stdlib that are
> > problematic for pypy or others?
> >
> > Personally I would like cleaner code in stdlib vs speeding up CPython.
> 
> For the specific case of "Don't rely on the fragile refcounting hack in
> CPython's string concatenation" I strongly agree. However, as a general
> principle, I can't agree until speed.python.org is a going concern and we
> can get a reasonable overview of any resulting performance implications.

Anybody can run the benchmark suite for himself, speed.p.o is
(fortunately) not a roadblock:
http://bugs.python.org/issue17170

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Wed, 13 Feb 2013 08:16:21 +0100
Antoine Pitrou wrote:
> Anybody can run the benchmark suite for himself, speed.p.o is
> (fortunately) not a roadblock:
> http://bugs.python.org/issue17170

And I meant to paste the repo URL actually:
http://hg.python.org/benchmarks/

Regards

Antoine.




Re: [Python-Dev] [pypy-dev] efficient string concatenation (yep, from 2004)

2013-02-12 Thread Maciej Fijalkowski
Hi Christian.

We have it, just not enabled by default. --objspace-with-strbuf I think

On Wed, Feb 13, 2013 at 1:53 AM, Christian Tismer wrote:
> Hi friends,
>
> efficient string concatenation has been a topic in 2004.
> Armin Rigo proposed a patch with the name of the subject,
> more precisely:
>
> [Patches] [ python-Patches-980695 ] efficient string concatenation
> on sourceforge.net, on 2004-06-28.
>
> This patch was finally added to Python 2.4 on 2004-11-30.
>
> Some people might remember the larger discussion about whether such a patch
> should be accepted at all, because it changes the programming style for many
> of us from "don't do that, stupid" to "well, you may do it in CPython", which
> has quite some impact on other implementations (is it fast on Jython, now?).
>
> It changed for instance my programming and teaching style a lot, of course!
>
> But I think nobody but people heavily involved in PyPy expected this:
>
> Now, more than eight years after that patch appeared and made it into 2.4,
> PyPy (!) still does _not_ have it!
>
> Obviously I was misled by other optimizations, and by the fact that
> this patch came from a (or rather *the*) major author of PyPy, who wrote
> the initial patch for CPython. That this would be in PyPy as well sooner
> or later was beyond question for me. Wrong... ;-)
>
> Yes, I agree that for PyPy it is much harder to implement without the
> refcounting trick, and probably even more difficult in case of the JIT.
>
> Nevertheless, I tried to find any reference to this missing, crucial
> optimization, with no success after an hour (*).
>
> And I guess many other people are falling into the same trap.
>
> So I can imagine that PyPy loses some of its speed in many programs,
> because Armin's great hack did not make it into PyPy, and this is not
> loudly declared anywhere. I believe the efficiency of string concatenation
> is something that people assume by default and fold into the vague CPython
> compatibility claim, unless explicitly told otherwise.
>
> 
>
> Some silly proof, using python 2.7.3 vs PyPy 1.9:
>
> $ cat strconc.py
> #!/usr/bin/env python
>
> from timeit import default_timer as timer
>
> tim = timer()
>
> s = ''
> for i in xrange(10):
>  s += 'X'
>
> tim = timer() - tim
>
> print 'time for {} concats = {:0.3f}'.format(len(s), tim)
>
>
> $ python strconc.py
> time for 10 concats = 0.028
> $ pypy strconc.py
> time for 10 concats = 0.804
>
>
> Something is needed - a patch for PyPy or for the documentation I guess.
>
> This is not just some unoptimized function in some module, but it is used
> all over the place and became a very common pattern since introduced.
>
> How ironic that a foreseen problem occurs _now_, and _there_ :-)
>
> cheers -- chris
>
>
> (*)
> http://pypy.readthedocs.org/en/latest/cpython_differences.html
> http://pypy.org/compat.html
> http://pypy.org/performance.html
>
> --
> Christian Tismer :^)   
> Software Consulting  : Have a break! Take a ride on Python's
> Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
> 14482 Potsdam: PGP key -> http://pgp.uni-mainz.de
> phone +49 173 24 18 776  fax +49 (30) 700143-0023
> PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
>   whom do you want to sponsor today?   http://www.stackless.com/
>
>
> ___
> pypy-dev mailing list
> pypy-...@python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
>
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-12 Thread Lennart Regebro
> Something is needed - a patch for PyPy or for the documentation I guess.

Not arguing that it wouldn't be good, but I disagree that it is needed.

This is only an issue when you, as in your proof, have a loop that
does concatenation. This is usually when looping over a list of
strings that should be concatenated together. Doing so in a loop with
concatenation may be the natural way for people new to Python, but the
"natural" way to do it in Python is with a ''.join() call.

This:

s = ''.join('X' for _ in xrange(x))

is more than twice as fast as your example in Python 2.7. It is in
fact also slower in PyPy 1.9 than in Python 2.7, but only by a factor
of two:

Python 2.7:
time for 1000 concats = 0.887
Pypy 1.9:
time for 1000 concats = 1.600

(And of course s = 'X' * x takes only about a hundredth of the time,
but that's cheating. ;-)
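For readers on Python 3, here is a rough port of the three approaches compared in this thread: Tismer's `+=` loop, the `''.join()` version, and the `'X' * n` "cheat". The function names and the default `n` are our own choices. Timings depend heavily on the interpreter, version and machine, so no expected numbers are given; run it under both CPython and PyPy to compare.

```python
from timeit import default_timer as timer
from typing import Dict


def concat_loop(n: int) -> str:
    s = ""
    for _ in range(n):
        # In CPython this is usually fast only because the interpreter can
        # resize s in place when nothing else references it (the "refcount
        # hack" discussed in this thread); other implementations need not.
        s += "X"
    return s


def concat_join(n: int) -> str:
    return "".join("X" for _ in range(n))


def concat_repeat(n: int) -> str:
    return "X" * n  # the "cheating" version


def bench(n: int = 100_000) -> Dict[str, float]:
    results = {}
    for fn in (concat_loop, concat_join, concat_repeat):
        start = timer()
        s = fn(n)
        results[fn.__name__] = timer() - start
        if len(s) != n:  # sanity check that each variant did the same work
            raise RuntimeError(f"{fn.__name__} produced {len(s)} chars")
    return results


if __name__ == "__main__":
    n = 100_000
    for name, secs in bench(n).items():
        print(f"{name}: time for {n} concats = {secs:.3f}")
```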

//Lennart


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Alexandre Vassalotti
On Tue, Feb 12, 2013 at 5:25 PM, Christian Tismer wrote:

> Would ropes be an answer (and a simple way to cope with string mutation
> patterns) as an alternative implementation, and therefore still justify
> the usage of that pattern?
>

I don't think so. Ropes are really useful when you work with gigabytes of
data, but unfortunately they don't make good general-purpose strings.
Monolithic arrays are much more efficient and simple for the typical
use-cases we have in Python.


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Nick Coghlan
On Wed, Feb 13, 2013 at 5:42 PM, Alexandre Vassalotti wrote:
> On Tue, Feb 12, 2013 at 5:25 PM, Christian Tismer wrote:
>>
>> Would ropes be an answer (and a simple way to cope with string mutation
>> patterns) as an alternative implementation, and therefore still justify
>> the usage of that pattern?
>
>
> I don't think so. Ropes are really useful when you work with gigabytes of
> data, but unfortunately they don't make good general-purpose strings.
> Monolithic arrays are much more efficient and simple for the typical
> use-cases we have in Python.

If I recall correctly, io.StringIO and io.BytesIO have been updated to
use ropes internally in 3.3. Writing to one of those and then calling
getvalue() at the end is the main alternative to the list+join trick
(when concatenating many small strings, the memory saving relative to
a list can be notable since strings have a fairly large per-instance
overhead).
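Nick describes two ways to accumulate many small strings: appending to a list and joining once at the end, or writing to an `io.StringIO` (or `io.BytesIO` for bytes) and calling `getvalue()` at the end. Both are sketched below with hypothetical function names. Which one is faster or uses less memory depends on the interpreter and version, so the sketch only shows that they produce the same result.

```python
import io
from typing import Iterable


def accumulate_with_list(parts: Iterable[str]) -> str:
    """The list + join idiom: collect pieces, join once at the end."""
    chunks = []
    for piece in parts:
        chunks.append(piece)
    return "".join(chunks)


def accumulate_with_stringio(parts: Iterable[str]) -> str:
    """The StringIO idiom: write pieces, read the whole buffer once."""
    buf = io.StringIO()
    for piece in parts:
        buf.write(piece)
    return buf.getvalue()


def accumulate_bytes(parts: Iterable[bytes]) -> bytes:
    """The same pattern for bytes, using io.BytesIO."""
    buf = io.BytesIO()
    for piece in parts:
        buf.write(piece)
    return buf.getvalue()


words = ["spam", "eggs", "ham"]
print(accumulate_with_list(words))       # spameggsham
print(accumulate_with_stringio(words))   # spameggsham
print(accumulate_bytes([b"ab", b"cd"]))  # b'abcd'
```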

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia