Re: Dictionary keys (again) (was Re: lambda)
In article <[EMAIL PROTECTED]>,
Nick Coghlan <[EMAIL PROTECTED]> wrote:
> For a 'mutable key' to make sense, the following:
>
> lst = []
> dct = {lst: "Hi!"}
> print dct[[]]
> print dct[lst]
> lst.append(1)
> print dct[[1]]
> print dct[lst]
>
> Should print:
> Hi
> Hi
> Hi
> Hi
Yes, and what should the following do?
lst1 = [1]
lst2 = [2]
dct = {lst1: "1", lst2: "2"}
lst2[0]=1
lst1[0]=2
print dct[[1]]
print dct[[2]]
--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: [perl-python] combinatorics fun
In article <[EMAIL PROTECTED]>,
"Xah Lee" <[EMAIL PROTECTED]> wrote:
> combo(n) returns a collection with elements of pairs that is all
> possible combinations of 2 things from n. For example, combo(4)
> returns {'3,4' => ['3',4],'1,2' => [1,2],'1,3' => [1,3],'1,4' =>
> [1,4],'2,3' => ['2',3],'2,4' => ['2',4]}; Hash form is returned
> instead of array for this program.
def combo(n):
    return dict([('%d,%d' % (i, j), (i, j))
                 for j in range(n) for i in range(j)])
import pprint
pprint.pprint(combo(5))
output:
{'0,1': (0, 1),
'0,2': (0, 2),
'0,3': (0, 3),
'0,4': (0, 4),
'1,2': (1, 2),
'1,3': (1, 3),
'1,4': (1, 4),
'2,3': (2, 3),
'2,4': (2, 4),
'3,4': (3, 4)}
Note that I'm using 0-based indexing; use range(1, n+1) and range(1, j)
instead if you really need it to be 1-based.
Also, I'm using Python 2.3; I think in 2.4 you can take out the square
brackets (turning the list comprehension into a generator expression) and
it would still work.
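For reference, a sketch of that bracket-free spelling (it needs Python
2.4's generator expressions):

def combo(n):
    # same pairs as above, built from a generator expression (Python 2.4+)
    return dict(('%d,%d' % (i, j), (i, j))
                for j in range(n) for i in range(j))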
--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: Why doesn't join() call str() on its arguments?
In article <[EMAIL PROTECTED]>,
Leo Breebaart <[EMAIL PROTECTED]> wrote:

> What I can't find an explanation for is why str.join() doesn't
> automatically call str() on its arguments, so that e.g.
> str.join([1,2,4,5]) would yield "1245", and ditto for e.g.
> user-defined classes that have a __str__() defined.

That would be the wrong thing to do when the arguments are unicodes.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
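To illustrate the point above, a small Python 2 sketch (assuming the
default ASCII codec): joining unicodes works as-is, while an implicit
str() would blow up on them.

parts = [u'caf\xe9', u'th\xe9']
joined = u', '.join(parts)                # fine: joining unicodes gives a unicode
broken = u', '.join([str(s) for s in parts])
# str(u'caf\xe9') raises UnicodeEncodeError under the default ASCII codec,
# so an automatic str() inside join() would break this case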
Re: exercise: partition a list by equivalence
In article <[EMAIL PROTECTED]>,
"John Machin" <[EMAIL PROTECTED]> wrote:
> You don't need to think. This problem has been extensively picked over
> by folk who are a lot smarter than us, starting from 30 years ago.
> Google for "disjoint set" and "union-find". One gets the impression
> that the best possible algorithm would be slightly *worse* than O(n).
It can be solved by union-find
(e.g. with UnionFind from <http://www.ics.uci.edu/~eppstein/PADS/>):
import UnionFind
import sets

def merge(pairs):
    uf = UnionFind.UnionFind()
    items = sets.Set()
    for a, b in pairs:
        uf.union(a, b)
        items.add(a)
        items.add(b)
    leaders = {}
    for a in items:
        leaders.setdefault(uf[a], sets.Set()).add(a)
    return [list(component) for component in leaders.values()]
If you do all the unions before all the finds, as in this algorithm,
union-find is O(n).
However it might be easier to treat the input pairs as the edges of a
graph and apply a depth-first-search based graph connected components
algorithm.
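A minimal sketch of that alternative (not the PADS code, just an
iterative depth-first search over an adjacency map built from the pairs):

def merge_dfs(pairs):
    # build an undirected adjacency map from the input pairs
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    seen = {}
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # iterative DFS collecting one connected component
        component = []
        stack = [start]
        seen[start] = True
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen[other] = True
                    stack.append(other)
        components.append(component)
    return components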
--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: exercise: partition a list by equivalence
In article <[EMAIL PROTECTED]>,
David Eppstein <[EMAIL PROTECTED]> wrote:
> It can be solved by union-find
> (e.g. with UnionFind from <http://www.ics.uci.edu/~eppstein/PADS/>):
Here's a cleaned up version, after I added a proper iter() method to the
UnionFind data structure:
import UnionFind

def merge(pairs):
    uf = UnionFind.UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    components = {}
    for a in uf:
        components.setdefault(uf[a], []).append(a)
    return components.values()
> If you do all the unions before all the finds, as in this algorithm,
> union-find is O(n).
That was a little too sloppy. It is possible to solve the union-find
problem, with all unions before all finds, in time O(n). However, the
usual union-find data structure takes more time, O(n alpha(n)), where
alpha is the extremely slowly growing inverse Ackermann function.
> However it might be easier to treat the input pairs as the edges of a
> graph and apply a depth-first-search based graph connected components
> algorithm.
This would be O(n), though.
--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: exercise: partition a list by equivalence
In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] wrote:

> David Eppstein:
> > However it might be easier to treat the input pairs as the edges of a
> > graph and apply a depth-first-search based graph connected components
> > algorithm.
> > This would be O(n), though.
>
> Is the DFS the best one here (instead of BFS)?

I'm not sure it makes much difference.

> With the graph implementation that I've just shown here it becomes:
>
> . import graph
> . def merge(l):
> .     g = graph.Graph()
> .     g.addArcs(l)
> .     return g.connectedComponents()
> . print merge( [ [1,2], [2,4], [5,6] ] )
>
> I presume the complexity is O(n+a); n (the nodes) is proportional to
> the number of pairs, and a (the arcs) depends on the "intricacy" of the
> input pairs. For a complete graph, this can become slow (but there is a
> high probability that all the pairs are in the same connected
> component).

It's still linear in the input size, which is the best you could hope for.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: exercise: partition a list by equivalence
In article <[EMAIL PROTECTED]>,
"John Machin" <[EMAIL PROTECTED]> wrote:

> > it would be nice if the two working programs do not use some package.
> > This problem shouldn't need to.
>
> Look at the newsgroup again. By my count there are apparently-working
> versions from SIX people: Reinhold, bear.+, Brian Beck, David Eppstein,
> yourself, and me. Only David's uses stuff that's not in the Python 2.4
> off-the-shelf distribution.

Where "stuff" means a single 62-line file that I linked to in my post.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: [perl-python] generic equivalence partition
In article <[EMAIL PROTECTED]>,
"Xah Lee" <[EMAIL PROTECTED]> wrote:

> parti(aList, equalFunc)
>
> given a list aList of n elements, we want to return a list that is a
> range of numbers from 1 to n, partition by the predicate function of
> equivalence equalFunc. (a predicate function is a function that takes
> two arguments, and returns either True or False.)

In Python it is much more natural to use ranges from 0 to n-1.

In the worst case, this is going to have to take quadratic time (consider
an equalFunc that always returns false) so we might as well do something
really simple rather than trying to be clever.

def parti(aList, equalFunc):
    eqv = []
    for i in range(len(aList)):
        print i, eqv
        for L in eqv:
            if equalFunc(aList[i], aList[L[0]]):
                L.append(i)
                break
        else:
            eqv.append([i])
    return eqv

If you really want the ranges to be 1 to n, add one to each number in the
returned list-of-lists.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: [perl-python] generic equivalence partition
In article <[EMAIL PROTECTED]>,
David Eppstein <[EMAIL PROTECTED]> wrote:

> def parti(aList, equalFunc):
>     eqv = []
>     for i in range(len(aList)):
>         print i, eqv
>         for L in eqv:
>             if equalFunc(aList[i], aList[L[0]]):
>                 L.append(i)
>                 break
>         else:
>             eqv.append([i])
>     return eqv

Um, take out the print; that was just there for me to debug my code.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
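For reference, a sketch of how the function from the post above reads
with the debug print removed, together with a small usage example; the
case-insensitive predicate is just a made-up illustration.

def parti(aList, equalFunc):
    eqv = []
    for i in range(len(aList)):
        for L in eqv:
            if equalFunc(aList[i], aList[L[0]]):
                L.append(i)
                break
        else:
            eqv.append([i])
    return eqv

words = ['Apple', 'pear', 'APPLE', 'Pear', 'plum']
print parti(words, lambda a, b: a.lower() == b.lower())
# [[0, 2], [1, 3], [4]] -- indices grouped by case-insensitive equality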
Re: yield_all needed in Python
In article <[EMAIL PROTECTED]>,
Douglas Alan <[EMAIL PROTECTED]> wrote:

> > Certainly, if <yield_all iterator> == <a for-loop that re-yields each
> > item of the iterator>, I don't see how anything is gained except for
> > a few keystrokes.
>
> What's gained is making one's code more readable and maintainable,
> which is one of the primary reasons that I use Python.

I don't see a lot of difference in readability and maintainability
between the two versions. And if yield_all is going to expand into the
loop anyway, I'd prefer to make that obvious by using the for-loop
version, rather than using a keyword and pretending that passing the
iterators on has no overhead.

If we're talking about machinery behind the scenes to shortcut chains of
yield_all's, so that the time to pass items up through the chain is
smaller than it would be in the for-loop case, I'd think that would be a
better reason for a keyword, because it's not something that can be done
very easily without one in the current language.

I don't know how to make such shortcutting machinery faster than
logarithmic in the worst case (taking into account the possibility that
multiple generators could have yield_all's to the same iterator), but I
think it could be made nearly constant time in most situations. On the
other hand, I'm not convinced that this would be needed frequently enough
to warrant the complexity of trying to optimize it.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
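For concreteness, a small sketch of the for-loop version discussed in the
post above, where each level of a generator chain explicitly re-yields
the items of the level below it (the nested-list example is made up):

def leaves(tree):
    # yield the leaf values of a nested-list tree; the inner results are
    # passed up by an explicit loop, the pattern yield_all would abbreviate
    if isinstance(tree, list):
        for child in tree:
            for leaf in leaves(child):
                yield leaf
    else:
        yield tree

print list(leaves([1, [2, [3, 4]], 5]))    # [1, 2, 3, 4, 5]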
Re: [perl-python] a program to delete duplicate files
In article <[EMAIL PROTECTED]>,
"Xah Lee" <[EMAIL PROTECTED]> wrote:

> a absolute requirement in this problem is to minimize the number of
> comparison made between files. This is a part of the spec.

You need do no comparisons between files. Just use a sufficiently strong
hash algorithm (SHA-256 maybe?) and compare the hashes.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
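A minimal sketch of the approach in the post above, grouping files by a
strong digest of their contents; hashlib is used as a stand-in here,
since it postdates the Python versions discussed in this thread.

import hashlib

def duplicate_groups(filenames):
    # group filenames by the SHA-256 digest of their contents; any group
    # with more than one name holds byte-identical files
    # (reads each whole file into memory, which is fine for a sketch)
    groups = {}
    for name in filenames:
        f = open(name, 'rb')
        digest = hashlib.sha256(f.read()).hexdigest()
        f.close()
        groups.setdefault(digest, []).append(name)
    return [names for names in groups.values() if len(names) > 1]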
Re: [perl-python] a program to delete duplicate files
In article <[EMAIL PROTECTED]>,
Patrick Useldinger <[EMAIL PROTECTED]> wrote:

> > You need do no comparisons between files. Just use a sufficiently
> > strong hash algorithm (SHA-256 maybe?) and compare the hashes.
>
> That's not very efficient. IMO, it only makes sense in network-based
> operations such as rsync.

Well, but the spec didn't say efficiency was the primary criterion, it
said minimizing the number of comparisons was.

More seriously, the best I can think of that doesn't use a strong slow
hash would be to group files by (file size, cheap hash), then compare
each file in a group with a representative of each distinct file found
among earlier files in the same group -- that leads to an average of
about three reads per duplicated file copy: one to hash it, and two for
the comparison between it and its representative (almost all of the
comparisons will turn out equal but you still need to check unless you
use a strong hash). I'm assuming of course that there are too many files
and/or they're too large just to keep them all in core.

Anyone have any data on whether reading files and SHA-256'ing them (or
whatever other cryptographic hash you think is strong enough) is
I/O-bound or CPU-bound? That is, is three reads with very little CPU
overhead really cheaper than one read with a strong hash?

I guess it also depends on the number of files you expect to have
duplicates of. If most of the files exist in only one copy, it's clear
that the cheap hash will find them more cheaply than the expensive hash.
In that case you could combine the (file size, cheap hash) filtering with
the expensive hash and get only two reads per copy rather than three.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
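A sketch of the cheap pre-filter described in the post above, grouping by
(file size, hash of the first few kilobytes); the 8 KB prefix length and
the use of hashlib.md5 are arbitrary stand-ins, not part of the original
proposal.

import os, hashlib

def candidate_groups(filenames, prefix_bytes=8192):
    # files that differ in size or in a cheap hash of their first few
    # kilobytes cannot be duplicates; only the surviving groups need a
    # full comparison or a strong whole-file hash
    groups = {}
    for name in filenames:
        size = os.path.getsize(name)
        f = open(name, 'rb')
        cheap = hashlib.md5(f.read(prefix_bytes)).hexdigest()
        f.close()
        groups.setdefault((size, cheap), []).append(name)
    return [names for names in groups.values() if len(names) > 1]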
Re: [perl-python] a program to delete duplicate files
In article <[EMAIL PROTECTED]>,
Patrick Useldinger <[EMAIL PROTECTED]> wrote:

> > Well, but the spec didn't say efficiency was the primary criterion, it
> > said minimizing the number of comparisons was.
>
> That's exactly what my program does.

If you're doing any comparisons at all, you're not minimizing the number
of comparisons.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
In article <[EMAIL PROTECTED]>,
"John Machin" <[EMAIL PROTECTED]> wrote:

> Just look at the efficiency of processing N files of the same size S,
> where they differ after d bytes: [If they don't differ, d = S]

I think this misses the point. It's easy to find the files that are
different. Just a file size comparison will get most of them, and most of
the rest can be detected in the first few kbytes. So I think it's safe to
assume both of these filtering mechanisms would be incorporated into a
good duplicate detection code, and that any remaining tests that would be
performed are between files that are very likely to be the same as each
other.

The hard part is verifying that the files that look like duplicates
really are duplicates. To do so, for a group of m files that appear to be
the same, requires 2(m-1) reads through the whole files if you use a
comparison based method, or m reads if you use a strong hashing method.
You can't hope to cut the reads off early when using comparisons, because
the files won't be different.

The question to me is: when you have a group of m > 2 likely-to-be-equal
files, is 2(m-1) reads and very little computation better than m reads
and a strong hash computation? I haven't yet seen an answer to this, but
it's a question for experimentation rather than theory.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
In article <[EMAIL PROTECTED]>,
Patrick Useldinger <[EMAIL PROTECTED]> wrote:

> Shouldn't you add the additional comparison time that has to be done
> after hash calculation? Hashes do not give 100% guarantee.

When I've been talking about hashes, I've been assuming very strong
cryptographic hashes, good enough that you can trust equal results to
really be equal without having to verify by a comparison.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: a program to delete duplicate files
In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (John J. Lee) wrote:

> > If you read them in parallel, it's _at most_ m (m is the worst case
> > here), not 2(m-1). In my tests, it has always been significantly less
> > than m.
>
> Hmm, Patrick's right, David, isn't he?

Yes, I was only considering pairwise comparisons. As he says,
simultaneously comparing all files in a group would avoid repeated reads
without the CPU overhead of a strong hash. Assuming you use a system that
allows you to have enough files open at once...

> And I'm not sure what the trade off between disk seeks and disk reads
> does to the problem, in practice (with caching and realistic memory
> constraints).

Another interesting point.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: Pre-PEP: Dictionary accumulator methods
In article <[EMAIL PROTECTED]>,
"Raymond Hettinger" <[EMAIL PROTECTED]> wrote:

> The rationale is to replace the awkward and slow existing idioms for
> dictionary based accumulation:
>
>     d[key] = d.get(key, 0) + qty
>     d.setdefault(key, []).extend(values)
>
> In simplest form, those two statements would now be coded more readably
> as:
>
>     d.count(key)
>     d.appendlist(key, value)
>
> In their multi-value forms, they would now be coded as:
>
>     d.count(key, qty)
>     d.appendlist(key, *values)
>
> The error messages returned by the new methods are the same as those
> returned by the existing idioms.
>
> The get() method would continue to exist because it is useful for
> applications other than accumulation.
>
> The setdefault() method would continue to exist but would likely not
> make it into Py3.0.

The other dictionary based accumulation I've used but don't see here is
with sets as dictionary values, i.e.
dictionary.setdefault(key, set()).add(value). I suppose it would be
possible to appendlist then make a set from the list, but if you were to
take that minimalist philosophy to an extreme you wouldn't need count
either, you could just appendlist then use len.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
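A small example of the set-valued accumulation pattern mentioned in the
post above, building an index from each word to the set of line numbers
it appears on (using the set type that became a builtin in Python 2.4;
the sample lines are made up):

lines = ['the cat sat', 'the dog sat', 'the cat ran']
index = {}
for lineno, line in enumerate(lines):
    for word in line.split():
        index.setdefault(word, set()).add(lineno)
print index['cat']    # set([0, 2])
print index['sat']    # set([0, 1])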
Re: Pre-PEP: Dictionary accumulator methods
In article <[EMAIL PROTECTED]>,
"Raymond Hettinger" <[EMAIL PROTECTED]> wrote:

> Also, in all of my code base, I've not run across a single opportunity
> to use something like unionset().

In my code, this would be roughly tied with appendlist for frequency of
use.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: Help with generators outside of loops.
In article <[EMAIL PROTECTED]>,
"Robert Brewer" <[EMAIL PROTECTED]> wrote:

> > But I'm guessing that you can't index into a generator as if
> > it is a list.
>
> row = obj.ExecSQLQuery(sql, args).next()

I've made it a policy in my own code to always surround explicit calls to
next() with try ... except StopIteration ... guards.

Otherwise, if you don't guard the call and you get an unexpected
exception from the next(), within a call chain that includes a for-loop
over another generator, then that other for-loop will terminate without
any error messages and the cause of its termination can be very difficult
to track down.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
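The guard pattern described in the post above, in sketch form;
ExecSQLQuery and the surrounding names are taken from the quoted example
rather than from any real API.

try:
    row = obj.ExecSQLQuery(sql, args).next()
except StopIteration:
    # the query produced no rows; handle that explicitly instead of
    # letting StopIteration escape into some enclosing for-loop
    row = None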
Re: Help with generators outside of loops.
In article <[EMAIL PROTECTED]>,
Andrea Griffini <[EMAIL PROTECTED]> wrote:

> Isn't the handling of StopIteration confined in the very moment of
> calling .next()? This was what I expected... and from a simple test
> looks also what is happening...

Not if someone farther back in the call chain is looking for
StopIteration. Which could be the case if the call chain includes a
for-loop that is calling the next() method of another generator.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: guarding for StopIteration (WAS: Help with generators outside of loops.)
In article <[EMAIL PROTECTED]>,
Steven Bethard <[EMAIL PROTECTED]> wrote:

> David Eppstein wrote:
> > I've made it a policy in my own code to always surround explicit calls
> > to next() with try ... except StopIteration ... guards.
> >
> > Otherwise if you don't guard the call and you get an unexpected
> > exception from the next(), within a call chain that includes a
> > for-loop over another generator, then that other for-loop will
> > terminate without any error messages and the cause of its termination
> > can be very difficult to track down.
>
> Just to clarify here, the only time code raising a StopIteration will
> cause a for-loop to exit silently is if the StopIteration is raised in
> an __iter__ method, e.g.:

Sure. But in the situation I was attempting to describe, the __iter__
method calls the code of the outer generator, which (perhaps nested
several levels deep) may call an inner generator. In this case, the inner
generator's StopIteration can terminate the outer loop.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
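A small demonstration of the failure mode described in the post above,
under Python 2 generator semantics, with made-up generators:

def inner():
    yield 1

def outer(it):
    yield 'first'
    it.next()            # raises StopIteration once 'it' is exhausted
    yield 'never reached'

exhausted = inner()
list(exhausted)          # use up the single item
for x in outer(exhausted):
    print x
# prints only 'first': the unguarded StopIteration escaping from it.next()
# silently ends the outer for-loop instead of surfacing as an error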
Re: Pre-PEP: Dictionary accumulator methods
In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (Aahz) wrote:

> > I am surprised nobody suggested we put those two methods into a
> > separate module (say dictutils or even UserDict) as functions:
> >
> > from dictutils import tally, listappend
> >
> > tally(mydict, key)
> > listappend(mydict, key, value)
>
> That seems like a reasonable compromise.

The more messages I see on this thread, the more I think adding a
different new method for each commonly used kind of update is the wrong
solution. We already have methods that work pretty well and, I think,
read better than the new methods:

    mydict[key] += 1
    mydict[key].append(value)

The problem is merely that they don't work when key is missing, so we
need to resort to setdefault circumlocutions instead.

A better solution seems to be the one I've seen suggested here several
times, of changing the dict's behavior so that the setdefault is
automatic whenever trying to access a missing key. If this would be in a
separate module or separate subclass of dict, so much the better.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
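A sketch of the kind of dict subclass suggested in the post above, under
the hypothetical name autodict:

class autodict(dict):
    # a dict that fills in a missing key with default_factory() on access,
    # so mydict[key] += 1 and mydict[key].append(value) work on new keys
    def __init__(self, default_factory, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.default_factory = default_factory
    def __getitem__(self, key):
        if key not in self:
            self[key] = self.default_factory()
        return dict.__getitem__(self, key)

counts = autodict(int)
for word in 'the cat sat on the mat'.split():
    counts[word] += 1
print counts['the']    # 2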
Re: itertools to iter transition (WAS: Pre-PEP: Dictionary accumulator methods)
In article <[EMAIL PROTECTED]>,
Jack Diederich <[EMAIL PROTECTED]> wrote:

> I only included making iter a type to make it more symmetric with str
> being a type. iter is currently a function; as a practical matter I
> wouldn't mind if it doubled as a namespace, but that might make others
> flinch.

iter having the attributes currently residing as methods in itertools
sounds just fine to me.

I really don't like iter as a type instead of a function, though. It
sounds like a cool idea at first glance, but then you think about it and
realize that (unlike what happens with any class name) iter(x) is almost
never going to return an object of that type.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
Re: math - need divisors algorithm
> > Does anyone have suggested code for a compact, efficient, elegant,
> > most of all pythonic routine to produce a list of all the proper
> > divisors of an integer (given a list of prime factors/powers)

I have code for this in
<http://www.ics.uci.edu/~eppstein/numth/egypt/egypt.py> (look for the
function called, surprisingly, "divisors"). It's not very compact -- 155
lines from the start of the Pollard Rho factorization (thanks to Kirby
Urner) to the end of the divisors function, but it's reasonably efficient
even for largish numbers.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
--
http://mail.python.org/mailman/listinfo/python-list
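Not the egypt.py routine, just a minimal sketch of the underlying idea,
assuming the prime factorization is already available as a dict mapping
each prime to its exponent:

def divisors(factorization):
    # multiply out every combination of prime powers,
    # e.g. {2: 2, 3: 1} describes 12
    divs = [1]
    for prime, exponent in factorization.items():
        divs = [d * prime ** k for d in divs for k in range(exponent + 1)]
    divs.sort()
    return divs

print divisors({2: 2, 3: 1})    # [1, 2, 3, 4, 6, 12]; drop the last entry
                                # (the number itself) for proper divisors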
