Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-10-26 Thread Chris Bergstresser
On Mon, Oct 26, 2009 at 11:38 AM, Guido van Rossum  wrote:
> - If sets were to grow an API to non-destructively access the object
> stored in the set for a particular key, then dicts should have such a
> method too.
>
> - Ditto for an API to non-destructively get an arbitrary element.
>
> - I'm far from convinced that we urgently need either API. But I'm
> also not convinced it's unneeded.

   These clearly aren't urgently needed, but I do think they're needed
and useful.  For those who want a use-case for getting an arbitrary
element from a set, I've run into the need several times over the last
year, and each time I'm a little surprised I had the need and a little
surprised there wasn't a good way of going about it.
   In the most recent example, I was writing some UI code.  I had a
set of all the open window references so I could clean them up at the
end of the program.  I needed to call some API function that required
a window reference as the first argument, but it returned a global
value common to all the window references.

   I like the proposed set.get() method, personally.  list.get(index)
gets the item at that index, dict.get(key) gets the item associated
with that key, set.get() gets an item, but doesn't place any
guarantees on which item is returned.  Makes sense to me.  I also like
the idea there aren't any guarantees about which item is returned--it
allows subclasses to implement different behavior (so one might always
return the last item placed in the set, another could always return a
random item, another could implement some round-robin behavior, and
all would fulfill the contract of the set class).
   The existing methods aren't great for accomplishing this, mainly
because they're obfuscatory.  "iter(s).next()" is probably clearest,
and at least will throw a StopIteration exception if the set is empty.
"for x in s: break" is just confusing, easy to mistake for
"for x in s: pass", and causes an unrevealing NameError if the
set is empty.  Add in all the other idioms for accomplishing the same
thing ("x, = s", etc.) and I think there's a good argument for adding
the method to sets, if only to provide a single obvious way of doing
it--and throwing a single, appropriate exception if the set is empty.
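
For reference, a minimal sketch of the idioms mentioned above, written in
current Python 3 spelling (where iter(s).next() becomes next(iter(s))).
The function names are made up for illustration; only the set operations
themselves come from the language.

```python
def first_via_iter(s):
    # Python 3 spelling of iter(s).next(); raises StopIteration if s is empty.
    return next(iter(s))


def first_via_loop(s):
    # "for x in s: break" binds x to one element and stops.
    # On an empty set the body never runs, so x is never bound and the
    # return below raises UnboundLocalError (a subclass of NameError).
    for x in s:
        break
    return x


def only_via_unpack(s):
    # "x, = s" only succeeds when the set has exactly one element;
    # any other size raises ValueError.
    (x,) = s
    return x


windows = {"main", "prefs", "about"}
assert first_via_iter(windows) in windows
assert first_via_loop(windows) in windows
assert len(windows) == 3  # neither idiom removes anything

# Each idiom reports an empty set with a different exception type.
for idiom in (first_via_iter, first_via_loop, only_via_unpack):
    try:
        idiom(set())
    except Exception as exc:
        print(idiom.__name__, "->", type(exc).__name__)
```

Note that the unpacking form is not really an "arbitrary element" idiom: it
fails on any set that does not hold exactly one item.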

-- Chris
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-10-27 Thread Chris Bergstresser
On Tue, Oct 27, 2009 at 11:06 AM, Georg Brandl  wrote:
> Sorry to nitpick, but there is no list.get().

   No?  How ... odd.  I guess it wouldn't have come up, but I was sure
there was a .get method which took an optional default parameter if
the index didn't exist, mirroring the dict method.  Still, I think my
point stands--it's a clear extrapolation from the existing dict.get().

-- Chris


Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-10-27 Thread Chris Bergstresser
On Tue, Oct 27, 2009 at 12:47 PM, Raymond Hettinger  wrote:
> [Chris Bergstresser]
>> Still, I think my
>> point stands--it's a clear extrapolation from the existing dict.get().
>
> Not really.  One looks up a key and supplies a default value if not found.
> The other, set.get(), doesn't have a key to look up.

Right, which is why dict.get() takes a key as an argument, while
the proposed set.get() wouldn't.

> A dict.get() can be meaningfully used in a loop (because the key can vary).
> A set.get() returns the same value over and over again (because there is no
> key).

I don't think "can it be used meaningfully in a loop?" is an
especially interesting or useful way of evaluating language features.
Besides, why would set.get() necessarily return the same value
over and over again?  I thought it would be defined to return an
arbitrary value--which could be the same value over and over again,
but could just as easily be defined to return a round-robin value, or
the minimum value, or some *other* value as the container defined it.
   The fact is, set.get() succinctly describes an action which is
otherwise obscure in Python.  It doesn't come up all that frequently,
but when it does the alternatives are poor at best.  Add in the
uncertainty about which is optimized (imagine the situation where the
set you're calling is actually a proxy for an object across the
network, and constructing an iterator is expensive) and you start to
see the value.

-- Chris


Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-10-30 Thread Chris Bergstresser
On Fri, Oct 30, 2009 at 8:29 PM, Steven D'Aprano  wrote:
>> > Iterating over an iterable is
>> > what iterators are for.
>
> set.get(), or set.pick() as Wikipedia calls it, isn't for iterating over
> sets. It is for getting an arbitrary element from the set.
>
> If the requirement that get/pick() cycles through the set's elements is
> the killer objection to this proposal, I'd be willing to drop it. I
> thought that was part of the OP's request, but apparently it isn't. I
> would argue that it's hardly "arbitrary" if you get the same element
> every time you call the method, but if others are happy with that
> behaviour, I'm not going to stand in the way.

   It's arbitrary in the sense that the API doesn't make any
assurances which item the caller will get, not that it's arbitrary for
any particular *implementation*.

> The purpose is to
> return an arbitrary item each time it is called. If two threads get the
> same element, that's not a problem; if one thread misses an element
> because another thread grabbed it first, that's not a problem either.
> If people prefer a random element instead, I have no problem with
> that -- personally I think that's overkill, but maybe that's just me.

   I also think returning a random one is overkill, in the base set.
And I'd have the base implementation do the simplest thing possible:
return a fixed element (either the first returned when iterated over,
or the last stored, or whatever's easiest based on the internals).
For me, leaving the choice of *which* element to return on each call
is a feature.  It allows subclasses to change the behavior to support
other use cases, like a random or round-robin behavior.
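
A rough sketch of how that could look.  Everything here is hypothetical:
set has no pick() method, and the class names are invented for the example.
The base class does the simplest thing (first element seen when iterating),
and the subclasses tighten or change the choice without breaking the
"some member, no promise which" contract.

```python
import random


class PickableSet(set):
    """Hypothetical base: pick() returns some member, no promise which."""

    def pick(self):
        for item in self:
            return item
        raise KeyError("pick from an empty set")


class RandomPickSet(PickableSet):
    """Returns a uniformly random member on each call."""

    def pick(self):
        if not self:
            raise KeyError("pick from an empty set")
        # random.choice needs a sequence, so copy to a tuple first (O(n)).
        return random.choice(tuple(self))


class RoundRobinPickSet(PickableSet):
    """Cycles through the members, restarting when the cycle runs out."""

    def __init__(self, *args):
        super().__init__(*args)
        self._cycle = iter(())

    def pick(self):
        if not self:
            raise KeyError("pick from an empty set")
        try:
            return next(self._cycle)
        except (StopIteration, RuntimeError):
            # Either the cycle is exhausted, or the set changed size since
            # the iterator was created; start a fresh pass in both cases.
            self._cycle = iter(self)
            return next(self._cycle)


rr = RoundRobinPickSet({"a", "b", "c"})
print([rr.pick() for _ in range(6)])  # two full passes over the members
```

Code written against the base contract works unchanged with any of the
three classes, which is the point of leaving the choice unspecified.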

-- Chris


Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-11-05 Thread Chris Bergstresser
On Wed, Nov 4, 2009 at 7:07 PM, Raymond Hettinger  wrote:
> [Steven D'Aprano]
>>> Anyway, given the level of opposition to the suggestion, I'm no longer
>>> willing to carry the flag for it. If anyone else -- perhaps the OP --
>>> feels they want to take it any further, be my guest.

   I feel pretty strongly that it's a wart in the language, and a
sufficiently strong one that it should be remedied.  I'm happy to
champion it, but haven't the faintest idea what that entails.

> Summarizing my opposition to a new set method:
> 1) there already are at least two succinct ways to get the same effect
> 2) those ways work with any container, not just sets
> 3) set implementations in other languages show that this isn't needed.
> 4) there is value to keeping the API compact
> 5) isn't needed for optimization (selecting the same value in a loop makes
> no sense)
> 6) absence of real-world code examples that would be meaningfully improved
>
> I would be happy to add an example to the docs so that this thread
> can finally end.

   Adding an example to the docs does not solve the problem, which is
if you come across the following code:

    for x in s:
        break

... it really looks like it does nothing.  It's only because of the
slightly idiosyncratic way Python handles variable scoping that it has
an effect at all, and that effect isn't overtly related to what the
code says, which is "Iterate over all the elements in this set, then
immediately stop after the first one".  s.get() or s.pick() are both
more succinct and more clear, saying "Get me an arbitrary element from
this set".  To make matters worse, "for x in s: break" fails silently
when s is empty, and "x = iter(s).next()" raises a StopIteration
exception.  Neither is clear.
   The obvious way, for newcomers, of achieving the effect is:

 x = s.pop()
 s.add(x)

... and that's simply horrible in terms of efficiency.  So the
"obvious" way of doing it in Python is wrong(TM), and the "correct"
way of doing it is obscure and raises misleading exceptions.
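
To make the comparison concrete, here is a small sketch of the two
approaches side by side, in Python 3 spelling.  The helper names are
invented for the example.  Besides the two extra mutations, the pop/add
route has a further limitation not raised above: it needs a mutable set,
so it cannot be used on a frozenset at all.

```python
def peek_by_pop_add(s):
    # The newcomer's approach: remove an element, then put it back.
    # That is two mutations of the set just to read one value.
    x = s.pop()  # raises KeyError if s is empty
    s.add(x)
    return x


def peek_by_iter(s):
    # Read-only alternative: no mutation, works on any iterable.
    return next(iter(s))  # raises StopIteration if s is empty


mutable = {"a", "b", "c"}
frozen = frozenset(mutable)

assert peek_by_pop_add(mutable) in mutable
assert peek_by_iter(mutable) in mutable
assert peek_by_iter(frozen) in frozen

try:
    peek_by_pop_add(frozen)
except AttributeError as exc:
    # frozenset has no pop() or add(), so the workaround cannot run.
    print("frozenset:", exc)
```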

I suppose, mulling things over, the method should be called
.pick(), which avoids any confusion with .get().  And, as I've stated,
I think it should return a member of the set, with no guarantees what
member of the set is returned.  It could be the same one every time,
or a random one, or the last one placed in the set.
   For cases where people want to cycle through the members of the set
in a predictable order, they can either copy the contents into a list
(sorted or unsorted) *or* subclass set and override the .pick() method
to place stronger guarantees on the API.

So, summarizing my responses:

1) the two succinct ways are unclear and not immediately obvious
2) the existing methods aren't needed for other objects
3) set implementations in other languages are irrelevant
4) this is a small, targeted change which would not make the API disordered or unruly
5) could very well be needed for optimization, in cases where
constructing an iterator is expensive
6) there have been several real-world examples posted which would be
improved by this change

-- Chris


Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-11-05 Thread Chris Bergstresser
On Thu, Nov 5, 2009 at 3:21 PM, "Martin v. Löwis"  wrote:
> There are two ways
>
> a) write a library that provides what you want, publish it on PyPI,
>   and report back in a few years of how many users your library has,
>   what they use it for, and why it should become builtin

This clearly isn't called for in this case.  We're talking about a
single function on a collection.  In this case, importing an
alternative set API (and maintaining the dependency) is more work than
just writing your own workaround.  The purpose of adding a method is
to spare everyone from writing their own workaround.

> b) write a PEP, wait a few years for the language moratorium to be
>   lifted, provide an implementation, and put the PEP for pronouncement.
>   Careful reading of the Moratorium PEP may allow shortening of the
>   wait.

   Clearly, I'll need to write up the PEP.

> In any case, it seems that this specific change will see some
> opposition. So you will need to convince the opposition, one way or
> the other.

   I doubt some of the people on either side are going to be
convinced.  I'd settle for convincing most of the fence-sitters, along
with a few of the loyal opposition.

-- Chris


Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-11-05 Thread Chris Bergstresser
On Thu, Nov 5, 2009 at 5:02 PM, Raymond Hettinger  wrote:
> Forgot to post the code.  It is short, fast, and easy.  It is explicit about
> handling the case with an empty input.  And it is specific about which value
> it returns (always the first iterated value; not an arbitrary one).  There's
> no guessing about what it does.  It gets the job done.

I'm trying to take this suggestion in the best possible light,
which is that you honestly think I didn't read past Chapter 3 of the
Python Tutorial, and I am therefore in fact unfamiliar with function
definitions.

-- Chris


Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-11-05 Thread Chris Bergstresser
On Thu, Nov 5, 2009 at 6:30 PM, geremy condra  wrote:
> I'm testing the speed because the claim was made that the pop/add
> approach was inefficient. Here's the full quote:
>
>>    The obvious way, for newcomers, of achieving the effect is:
>>
>>  x = s.pop()
>>  s.add(x)
>>
>> ... and that's simply horrible in terms of efficiency.  So the
>> "obvious" way of doing it in Python is wrong(TM), and the "correct"
>> way of doing it is obscure and raises misleading exceptions.

   I was talking mainly from a theoretical standpoint, and because the
library I'm working on is designed to work seamlessly over the
network.  In those cases, where the set the user is working with is
actually a proxy object across the wire, the time to acquire the
locks, remove the object, release the locks, reacquire the locks, add
the object, then rerelease the locks is *significantly* more expensive
than just noting the set hasn't changed and returning a cached object
from it.
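
A toy model of that cost difference.  Everything in it is hypothetical:
FakeRemoteSet stands in for a set on another machine and simply counts
simulated round trips, and the sketch assumes the proxy can learn the
remote version number cheaply (say, pushed by the server), which a real
design would have to arrange.

```python
class FakeRemoteSet:
    """Stand-in for a set living across the network (hypothetical)."""

    def __init__(self, items):
        self._items = set(items)
        self.version = 0      # bumped on every mutation
        self.round_trips = 0  # counts simulated network calls

    def pop(self):
        self.round_trips += 1
        self.version += 1
        return self._items.pop()

    def add(self, item):
        self.round_trips += 1
        self.version += 1
        self._items.add(item)

    def peek(self):
        self.round_trips += 1
        return next(iter(self._items)), self.version


class SetProxy:
    """Client-side proxy with two ways of fetching an arbitrary element."""

    def __init__(self, remote):
        self._remote = remote
        self._cached = None  # (item, version) from the last peek

    def pick_via_pop_add(self):
        # Two mutating round trips per call.
        item = self._remote.pop()
        self._remote.add(item)
        return item

    def pick(self):
        # Assumption: reading .version is free for the proxy.
        if self._cached is None or self._cached[1] != self._remote.version:
            self._cached = self._remote.peek()
        return self._cached[0]


remote = FakeRemoteSet({"w1", "w2"})
proxy = SetProxy(remote)

for _ in range(10):
    proxy.pick_via_pop_add()
print("pop/add round trips:", remote.round_trips)  # 20

remote.round_trips = 0
for _ in range(10):
    proxy.pick()
print("cached pick round trips:", remote.round_trips)  # 1
```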

-- Chris


Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

2009-11-05 Thread Chris Bergstresser
On Thu, Nov 5, 2009 at 11:43 PM, "Martin v. Löwis"  wrote:
> I read Raymond's suggestion rather as a question: why bother with a
> tedious, multi-year process, when a three-line function will achieve
> exactly the same?

   Because it doesn't achieve exactly the same.  What I want is a
sane, rational way to describe what I'm doing in the core API, so
other programmers learning the language don't spend the amount of time
I did perplexed that there was a .pop() and a .remove() and a
.discard(), but there wasn't a .pick().  I don't want to have to write
the same little helper function in every project to fill a deficiency
in the library.  I don't want to have to argue about the flaws in
solutions with race conditions, or the fact that cheap functions
become expensive functions when performed over the network, or that
there's a real value in having an atomic operation which throws a sane
exception when it fails, or how it's disturbing to my OCD core to have
an API which believes:

    if x in s:
        s.remove(x)

... is too confusing, so there should be a .discard() method, but ...

    for x in s:
        break

... is self-evident and obvious, so there's no need for a .pick().

-- Chris


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-21 Thread Chris Bergstresser
On Thu, Jan 21, 2010 at 9:49 PM, Tres Seaver  wrote:
> IIUC, optimizing your application using standard (non-JITed) profiling
> tools would still be a win for the app when run under the JIT, because
> you are going to be trimming code / using better algorithms, which will
> tend to provide "orthogonal" speedups to anything the JIT does.  The
> worst case would be that you hand-optimize the code to the point that the
> JIT can't help any longer, kind of like writing libc syscalls in
> assembler rather than C.

   You'd hope.  I don't think it's quite that simple, though.
   The problem is code might have completely different hotspots with
the JIT than without it.  The worst case in this scenario would be
that some code takes 1 second to run function A and 30 seconds to run
function B without the JIT, but 30 seconds to run function A and 1
second to run function B with the JIT.  The non-JIT profiler's telling you to
put all your effort into fixing function B, but you won't see any
significant performance gains under the JIT no matter how often you change it.
   Generally, that's not going to be the case.  But the broader
point--that you've no longer got an especially good idea of what's
taking time to run in your program--is still very valid.
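
The arithmetic behind that worst case can be spelled out in a few lines.
The timings are the ones from the example above; the helper names are
made up.  The sketch asks which function a non-JIT profile would flag,
and how much of the total run time could be saved, at the very best, by
making that function free in each setting.

```python
# Seconds per function, taken from the worst-case example above.
without_jit = {"A": 1.0, "B": 30.0}
with_jit = {"A": 30.0, "B": 1.0}


def hotspot(profile):
    # The function a profiler would flag: the one with the most time.
    return max(profile, key=profile.get)


def best_case_saving(profile, func):
    # Fraction of total run time saved if func cost nothing at all.
    return profile[func] / sum(profile.values())


flagged = hotspot(without_jit)  # what a non-JIT profiler points at
print("non-JIT profile flags:", flagged)
print("best case without JIT: %.0f%%"
      % (100 * best_case_saving(without_jit, flagged)))
print("best case with JIT:    %.0f%%"
      % (100 * best_case_saving(with_jit, flagged)))
```

With these numbers the flagged function accounts for 30 of 31 seconds
without the JIT but only 1 of 31 seconds with it.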

-- Chris


Re: [Python-Dev] Summary of 2 years of Python fuzzing

2010-01-27 Thread Chris Bergstresser
On Wed, Jan 27, 2010 at 2:54 AM, Ben Finney  wrote:
> Neal Norwitz  writes:
>> I definitely hope you continue to find and fix problems in Python. It
>> helps everyone who uses Python even those who will never know to thank
>> you. Who knows, someone might even write a book about Fusil someday
>> about a topic as obscure as Beautiful Testing. :-)
>
> Your suggested title is already taken, though, for exactly this purpose.
> The book “Beautiful Testing”, published by O'Reilly, might help
> <http://oreilly.com/catalog/9780596159825>.

   I suspect Neal already knows that, since he cowrote chapter 9
"Beautiful is Better than Ugly".

-- Chris


Re: [Python-Dev] Set the namespace free!

2010-07-22 Thread Chris Bergstresser
On Thu, Jul 22, 2010 at 10:37 AM, Antoine Pitrou  wrote:
> On Thu, 22 Jul 2010 16:54:58 +0100
> Georg Brandl  wrote:
>>
>> That also has the advantage of introducing a measure of much needed
>> compatibility with industry-leading web programming languages.
>
> Also, Python would gain much needed flexibility if we allowed indirect
> name lookup using `$$foo`. Current abstractions are too poor compared
> to best-of-breed OO languages such as PHP or Perl 5.

   Let's not forget additional lookup operators, like %foo, to specify
the kind of lookup we're interested in (whether we want the result as
a dict vs. list vs. whatever).  We could even allow overloading
(something like object.__$__) to allow objects to customize the
results of their lookup operations.
   Really, I think with this and a world-class regex implementation
we'll be well-positioned when the Internet finally hits it big.

-- Chris


[Python-Dev] Question on bz2 codec. Is this a bug?

2010-09-29 Thread Chris Bergstresser
Hi all --

   I looked through the bug tracker, but I didn't see this listed.  I
was trying to use the bz2 codec, but it seems like it's not very
useful in the current form (and I'm not sure if it's getting added
back to py3k, so maybe this is a moot point).  It looks like the codec
writes every piece of data fed to it as a separate compressed block.
This results in compressed files which are significantly larger than
the uncompressed files, if you're writing a lot of small bursts of
data.  It also leads to interesting oddities like this:

import codecs

with codecs.open('text.bz2', 'w', 'bz2') as f:
    for x in xrange(20):
        f.write('This is data %i\n' % x)

with codecs.open('text.bz2', 'r', 'bz2') as f:
    print f.read()

This prints "This is data 0" and exits, because the codec won't read
beyond the first compressed block.
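
The same effect can be reproduced without the codec, using only the bz2
module.  This is a sketch in Python 3 rather than the Python 2 code above,
and it imitates the behavior described (one complete stream per write)
instead of calling the codec itself.

```python
import bz2

chunks = [("This is data %i\n" % x).encode("ascii") for x in range(20)]
raw = b"".join(chunks)

# One complete bz2 stream per write, as described above.
per_write = b"".join(bz2.compress(chunk) for chunk in chunks)

# One stream for everything, via the incremental compressor.
compressor = bz2.BZ2Compressor()
single = b"".join(compressor.compress(chunk) for chunk in chunks)
single += compressor.flush()

# Sizes in bytes: uncompressed, stream-per-write, single stream.
print(len(raw), len(per_write), len(single))

# A single-stream decompressor stops at the end of the first stream and
# leaves everything after it in unused_data.
decompressor = bz2.BZ2Decompressor()
first = decompressor.decompress(per_write)
assert first == b"This is data 0\n"
assert decompressor.eof and decompressor.unused_data

# bz2.decompress() in Python 3.3+ walks all concatenated streams.
assert bz2.decompress(per_write) == raw
assert bz2.decompress(single) == raw
```

The per-write output is larger than the uncompressed input because every
tiny stream carries its own header, tables and trailer.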

My question is, is this known, intended behavior?  Should I open a bug
report?  Is it going away in py3k, so there's no real point in fixing
it?

-- Chris


Re: [Python-Dev] Question on bz2 codec. Is this a bug?

2010-09-29 Thread Chris Bergstresser
On Wed, Sep 29, 2010 at 5:23 PM, Antoine Pitrou  wrote:
> Anyway, the obvious way to write line-by-line to a bz2 file is to use
> the BZ2File class!

   The BZ2File class does not allow you to open a file for appending.
   Using the incremental encoder does work, which leads to the obvious
question of why the codecs.open() method doesn't use the incremental
method by default, at least in this case.

-- Chris


Re: [Python-Dev] Question on bz2 codec. Is this a bug?

2010-09-29 Thread Chris Bergstresser
On Wed, Sep 29, 2010 at 5:59 PM, Antoine Pitrou  wrote:
> On Wednesday, 29 September 2010 at 17:41 -0400, Chris Bergstresser
> wrote:
>> On Wed, Sep 29, 2010 at 5:23 PM, Antoine Pitrou  wrote:
>> > Anyway, the obvious way to write line-by-line to a bz2 file is to use
>> > the BZ2File class!
>>
>>    The BZ2File class does not allow you to open a file for appending.
>>    Using the incremental encoder does work,
>
> In what sense? Do you mean it adds a new bz2 stream at the end of the
> existing file?

   Yes.  If you open an existing bz2 file for appending and use the
incremental encoder to encode the data you write to it, you end up
with a single file containing two separate bz2 compressed blocks of
data.  The bunzip2 program handles multiple streams in a single file
correctly, and there's a bug open (complete with working patch) in the
Python tracker to handle them as well.
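
For readers on a newer interpreter, a short sketch of the append-then-read
round trip.  It relies on Python 3.3 or later, where the bz2 module gained
an append mode and multi-stream reading; it says nothing about the Python 2
behavior discussed above.  The file name and contents are arbitrary.

```python
import bz2
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "text.bz2")

with bz2.open(path, "wb") as f:  # writes the first stream
    f.write(b"first run\n")

with bz2.open(path, "ab") as f:  # appends a second, independent stream
    f.write(b"second run\n")

with bz2.open(path, "rb") as f:  # reads across both streams
    data = f.read()

assert data == b"first run\nsecond run\n"

# The file really does hold two streams: a single-stream decompressor
# returns only the first and reports leftover bytes.
with open(path, "rb") as f:
    on_disk = f.read()
one_stream = bz2.BZ2Decompressor()
assert one_stream.decompress(on_disk) == b"first run\n"
assert one_stream.unused_data
```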

-- Chris