Re: grouping a flat list of number by range
[EMAIL PROTECTED] wrote:
> i'm looking for a way to have a list of number grouped by consecutive
> interval, after a search, for example :
>
> [3, 6, 7, 8, 12, 13, 15]
>
> =>
>
> [[3, 4], [6,9], [12, 14], [15, 16]]
>
> (6, not following 3, so 3 => [3:4] ; 7, 8 following 6 so 6, 7, 8 =>
> [6:9], and so on)
>
> i was able to to it without generators/yield but i think it could be
> better with them, may be do you an idea?

Sure:

def group_intervals(it):
    it = iter(it)
    val = it.next()
    run = [val, val+1]
    for val in it:
        if val == run[1]:
            run[1] += 1
        else:
            yield run
            run = [val, val+1]
    yield run

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: argmax
David Isaac wrote:
> 2. Is this a good argmax (as long as I know the iterable is finite)?
> def argmax(iterable): return max(izip( iterable, count() ))[1]

Other than the subtle difference that Peter Otten pointed out, that's a
good method.  However, if the iterable is a list, it's cleaner (and more
efficient) to use seq.index(max(seq)).  That way you won't be creating
and comparing all those tuples.

def argmax(it):
    try:
        it.index
    except AttributeError:
        it = list(it)
        # Or if it would be too expensive to convert it to a list:
        #return -max((v, -i) for i, v in enumerate(it))[1]
    return it.index(max(it))

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Can Python format long integer 123456789 to 12,3456,789 ?
A.M wrote:
> Is there any built in feature in Python that can format long integer
> 123456789 to 12,3456,789 ?
The locale module can help you here:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> locale.format('%d', 123456789, True)
'123,456,789'
Be sure to read the caveats for setlocale in the module docs:
http://docs.python.org/lib/node323.html
I'd recommend calling setlocale only once, and always at the start of
your program.
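For example, a small helper (the function name here is made up) keeps the
one-time setlocale call and the formatting in one place:

import locale

locale.setlocale(locale.LC_ALL, '')   # once, at program startup

def format_int(n):
    # The third argument turns on the locale's grouping
    # (thousands separators)
    return locale.format('%d', n, True)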
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Can Python format long integer 123456789 to 12,3456,789 ?
John Machin wrote:
> A.M wrote:
> > Hi,
> >
> > Is there any built in feature in Python that can format long integer
> > 123456789 to 12,3456,789 ?
> >
>
> Sorry about my previous post. It would produce 123,456,789.
> "12,3456,789" is weird -- whose idea PHB or yours??

If it's not a typo, it's probably a regional thing.  See, e.g.,
http://en.wikipedia.org/wiki/Indian_numbering_system

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Function Verification
Ws wrote:
> I'm trying to write up a module that *safely* sets sys.stderr and
> sys.stdout, and am currently having troubles with the function
> verification. I need to assure that the function can indeed be called
> as the Python manual specifies that sys.stdout and sys.stderr should be
> defined (standard file-like objects, only requiring a function named
> "write").
> My problem is in verifying the class we're trying to redirect output
> to.
> This is what I have so far:
> def _VerifyOutputStream(fh):
> if 'write' not in dir(fh):
> raise AttributeError, "The Output Stream should have a write
> method."
> if not callable(fh.write):
> raise TypeError, "The Output Stream's write method is not
> callable."
> In the above _VerifyOutputStream function, how would I verify that the
> fh.write method requires only one argument, as the built-in file
> objects do?
Why not just call the function with an empty string?
def _VerifyOutputStream(fh):
    fh.write('')
Note that you don't need to manually check for AttributeError or
TypeError. Python will do that for you. It's generally better to act
first and ask forgiveness later.
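In other words, something along these lines (the function and parameter
names here are made up for illustration):

import sys

def redirect_output(stream):
    # Let any AttributeError/TypeError propagate naturally; if this
    # call works, the object is usable as an output stream.
    stream.write('')
    sys.stdout = sys.stderr = stream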
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: __getattr__ question
Laszlo Nagy wrote:
> So how can I tell if 'root.item3' COULD BE FOUND IN THE USUAL PLACES, or
> if it is something that was calculated by __getattr__ ?
> Of course technically, this is possible and I could give a horrible
> method that tells this...
> But is there an easy, reliable and thread safe way in the Python
> language to give the answer?
Why are you trying to do this in the first place? If you need to
distinguish between a "real" attribute and something your code returns,
you shouldn't mix them by defining __getattr__ to begin with.
If, as I suspect, you just want an easy way of accessing child objects
by name, why not rename "__getattr__" in your code to something like
"get"?
Then instead of
>>> root.item3
Use
>>> root.get('item3')
Alternately, make self.items an instance of a custom class with
__getattr__ defined. This way, root's attribute space won't be
cluttered up.
>>> root.items.item3
Either way is a few more characters to type, but it's far saner than
trying to distinguish between "real" and "fake" attributes.
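For illustration, a rough sketch of that second approach (the class and
attribute names here are made up):

class ItemContainer(object):
    """Expose a dict of child objects as attributes."""
    def __init__(self, children):
        self._children = children
    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name)

# Hypothetical usage:
#   root.items = ItemContainer({'item3': child3})
#   root.items.item3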
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: random.jumpahead: How to jump ahead exactly N steps?
Matthew Wilson wrote:
> The random.jumpahead documentation says this:
>
> Changed in version 2.3: Instead of jumping to a specific state, n steps
> ahead, jumpahead(n) jumps to another state likely to be separated by
> many steps..

This change was necessary because the random module got a new default
generator in 2.3.  The new generator uses the Mersenne Twister
algorithm.  Pre 2.3, Wichmann-Hill was used.  (For more details, search
for "jumpahead" in http://www.python.org/download/releases/2.3/NEWS.txt)

Unlike WH, there isn't a way to directly compute the Nth number in the
sequence using MT.  If you're curious as to why,
textbooks/journals/Google are your friends. :-)

> I really want a way to get to the Nth value in a random series started
> with a particular seed.  Is there any way to quickly do what jumpahead
> apparently used to do?

You can always use the old WH generator.  It's still available:

>>> import random
>>> wh = random.WichmannHill()
>>> N, SEED = 100, 0
>>> wh.seed(SEED)
>>> for i in range(N): dummy = wh.random()
...
>>> wh.random()
0.68591619673484816
>>> wh.seed(SEED)
>>> wh.jumpahead(N)
>>> wh.random()
0.68591619673484816

> I devised this function, but I suspect it runs really slowly:

Don't just suspect.  Experiment, too. :-)

> def trudgeforward(n):
>     '''Advance the random generator's state by n calls.'''
>     for _ in xrange(n): random.random()
>
> So any speed tips would be very appreciated.

Python's random generator is implemented in C and is quite fast.  In my
tests, your trudgeforward performs acceptably with n<~10.  "import
psyco" is usually worth a try when improving execution speed, but it
won't help you here.  All the real work is being done in C; the
overhead of the Python interpreter is negligible.

Hope that helps,
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: nested dictionary assignment goes too far
Jake Emerson wrote:
> However, when
> the process goes to insert the unique 'char_freq' into a nested
> dictionary the value gets put into ALL of the sub-keys
The way you're currently defining your dict:
rain_raw_dict = dict.fromkeys(distinctID,
                              {'N':-6999,'char_freq':-6999,...})
Is shorthand for:
tmp = {'N':-6999,'char_freq':-6999,...}
rain_raw_dict = {}
for key in distinctID:
    rain_raw_dict[key] = tmp
Note that tmp is a *reference*. Python does not magically create
copies for you; you have to be explicit. Unless you want a shared
value, dict.fromkeys should only be used with an immutable value (e.g.,
int or str).
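A quick interpreter session shows the sharing (any mutable value
behaves the same way):

>>> d = dict.fromkeys(['x', 'y'], [])
>>> d['x'] is d['y']
True
>>> d['x'].append(1)
>>> d['y']
[1]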
What you'll need to do is either:
tmp = {'N':-6999,'char_freq':-6999,...}
rain_raw_dict = {}
for key in distinctID:
    # explicitly make a (shallow) copy of tmp
    rain_raw_dict[key] = dict(tmp)
Or more simply:
rain_raw_dict = {}
for key in distinctID:
    rain_raw_dict[key] = {'N':-6999,'char_freq':-6999,...}
Or if you're a one-liner kinda guy,
rain_raw_dict = dict((key, {'N':-6999,'char_freq':-6999,...})
                     for key in distinctID)
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Variables in nested functions
[EMAIL PROTECTED] wrote:
> Is it possible to change the value of a variable in the outer function
> if you are in a nested inner function?

The typical kludge is to wrap the variable in the outer function inside
a mutable object, then pass it into the inner using a default argument:

def outer():
    a = "outer"
    def inner(wrapa=[a]):
        print wrapa[0]
        wrapa[0] = "inner"
    return inner

A cleaner solution is to use a class, and make "a" an instance variable.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Automatic methods in new-style classes
[EMAIL PROTECTED] wrote:
> Hey, I have the following code that has to send every command it
> receives to a list of backends.
> I would like to write each method like:
>
> flush = multimethod()

Here's one way, using a metaclass:

class multimethod(object):
    def transform(self, attr):
        def dispatch(self, *args, **kw):
            results = []
            for b in self.backends:
                results.append(getattr(b, attr)(*args, **kw))
            return results
        return dispatch

def multimethodmeta(name, bases, dict):
    """Transform each multimethod object into an actual method"""
    for attr in dict:
        if isinstance(dict[attr], multimethod):
            dict[attr] = dict[attr].transform(attr)
    return type(name, bases, dict)

class MultiBackend(object):
    __metaclass__ = multimethodmeta
    def __init__(self, backends):
        self.backends = backends
    add = multimethod()

class Foo(object):
    def add(self, x, y):
        print 'in Foo.add'
        return x + y

class Bar(object):
    def add(self, x, y):
        print 'in Bar.add'
        return str(x) + str(y)

m = MultiBackend([Foo(), Bar()])
print m.add(3, 4)

# Output:
# in Foo.add
# in Bar.add
# [7, '34']

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: loop beats generator expr creating large dict!?
George Young wrote:
> I am puzzled that creating large dicts with an explicit iterable of
> key,value pairs seems to be slow. I thought to save time by doing:
>
>palettes = dict((w,set(w)) for w in words)
>
> instead of:
>
>palettes={}
>for w in words:
> palettes[w]=set(w)
>
> where words is a list of 20 english words. But, in fact,
> timeit shows the generator expression takes 3.0 seconds
> and the "for" loop 2.1 seconds. Am I missing something?
Creating those 200,000 (w, set(w)) intermediate tuples isn't free. You
aren't doing that in for loop version. If you were:
# Slowest of all!
palettes={}
for w,s in ((w,set(w)) for w in words):
    palettes[w]=s
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: saving an exception
Bryan wrote:
> i would like to save an exception and reraise it at a later time.
>
> something similar to this:
>
> exception = None
> def foo():
> try:
> 1/0
> except Exception, e:
> exception = e
>
> if exception: raise exception
>
> with the above code, i'm able to successfully raise the exception, but the
> line number of the exception is at the place of the explicit raise instead
> of the where the exception originally occurred. is there anyway to fix
> this?
Sure: generate the stack trace when the real exception occurs. Check
out sys.exc_info() and the traceback module.
import sys
import traceback

exception = None

def foo():
    global exception
    try:
        1/0
    except Exception:
        # Build a new exception of the same type with the inner stack trace
        exctype = sys.exc_info()[0]
        exception = exctype('\nInner ' + traceback.format_exc().strip())

foo()
if exception:
    raise exception

# Output:
Traceback (most recent call last):
  File "foo.py", line 15, in ?
    raise exception
ZeroDivisionError:
Inner Traceback (most recent call last):
  File "foo.py", line 8, in foo
    1/0
ZeroDivisionError: integer division or modulo by zero
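By the way, if you don't actually need a new exception object, an
alternative is to stash the whole sys.exc_info() tuple and re-raise it
later with the three-argument form of raise, which preserves the
original traceback.  A rough, untested sketch:

import sys

exc_info = None

def foo():
    global exc_info
    try:
        1/0
    except Exception:
        # (type, value, traceback)
        exc_info = sys.exc_info()

foo()
if exc_info:
    raise exc_info[0], exc_info[1], exc_info[2]

One caveat: a saved traceback keeps its stack frames alive, so don't
hold onto it longer than necessary.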
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: random playing soundfiles according to rating.
[EMAIL PROTECTED] wrote:
> But i am stuck on how to do a random chooser that works according to my
> idea of choosing according to rating system. It seems to me to be a bit
> different that just choosing a weighted choice like so:
...
> And i am not sure i want to have to go through what will be hundreds of
> sound files and scale their ratings by hand so that they all add up to
> 100%. I just want to have a long list that i can add too whenever i
> want, and assign it a grade/rating according to my whims!
Indeed, manually normalizing all those weights would be a downright
sinful waste of time and effort.
The solution (to any problem, really) starts with how you conceptualize
it. For this problem, consider the interval [0, T), where T is the sum
of all the weights. This interval is made up of adjacent subintervals,
one for each weight. Now pick a random point in [0, T). Determine
which subinterval this point is in, and you're done.
import random

def choose_weighted(zlist):
    point = random.uniform(0, sum(weight for key, weight in zlist))
    for key, weight in zlist:  # which subinterval is point in?
        point -= weight
        if point < 0:
            return key
    return None  # will only happen if sum of weights <= 0
You'll get bogus results if you use negative weights, but that should
be obvious. Also note that by using random.uniform instead of
random.randrange, floating point weights are handled correctly.
Test it:
>>> data = (('foo', 1), ('bar', 2), ('skipme', 0), ('baz', 10))
>>> counts = dict((key, 0) for key, weight in data)
>>> for i in range(10000):
...     counts[choose_weighted(data)] += 1
...
>>> [(key, counts[key]) for key, weight in data]
[('foo', 749), ('bar', 1513), ('skipme', 0), ('baz', 7738)]
>>>
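By the way, if the list gets long, the linear scan in choose_weighted
can be replaced with a bisect lookup over precomputed cumulative sums.
A rough, untested sketch:

import bisect
import random

def make_chooser(zlist):
    """Precompute cumulative weights; each pick is then O(log n)."""
    keys, cumulative = [], []
    total = 0.0
    for key, weight in zlist:
        total += weight
        keys.append(key)
        cumulative.append(total)
    def choose():
        point = random.uniform(0, total)
        return keys[bisect.bisect_left(cumulative, point)]
    return choose

The tradeoff is that you have to rebuild the cumulative list whenever
the weights change.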
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: random playing soundfiles according to rating.
kpp9c wrote:
> I've been looking at some of the suggested approaches and looked a
> little at Michael's bit which works well bisect is a module i
> always struggle with (hee hee)
>
> I am intrigued by Ben's solution and Ben's distilled my problem quite
> nicely

Thanks!-)  Actually, you should use Michael's solution, not mine.  It
uses the same concept, but it finds the correct subinterval in O(log n)
steps (by using bisect on a cached list of cumulative sums).  My code
takes O(n) steps -- this is a big difference when you're dealing with
thousands of items.

> but, well i don't understand what "point" is doing with
> wieght for key, weight for zlist

This line:

    point = random.uniform(0, sum(weight for key, weight in zlist))

Is shorthand for:

    total = 0
    for key, weight in zlist:
        total += weight
    point = random.uniform(0, total)

> furthermore, it barfs in my
> interpreter... (Python 2.3)

Oops, that's because it uses generator expressions
(http://www.python.org/peps/pep-0289.html), a 2.4 feature.  Try
rewriting it longhand (see above).  The second line of the test code
will have to be changed too, i.e.:

>>> counts = dict([(key, 0) for key, weight in data])

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Self-identifying functions and macro-ish behavior
[EMAIL PROTECTED] wrote:
> How do I get some
> sort of macro behavior so I don't have to write the same thing over and
> over again, but which is also not neatly rolled up into a function,
> such as combining the return statements with a printing of ?

Decorators: http://www.python.org/peps/pep-0318.html

> My application has a bunch of functions that must do different things,
> then print out their names, and then each call another function before
> returning. I'd like to have the last function call and the return in
> one statement, because if I forget to manually type it in, things get
> messed up.
>
> (ok, I'm writing a parser and I keep track of the call level with a tab
> count, which gets printed before any text messages. So each text
> message has a tab count in accordance with how far down the parser is.
> Each time a grammar rule is entered or returned from, the tab count
> goes up or down. If I mess up and forget to call tabsup() or tabsdn(),
> the printing gets messed up. There are a lot of simple cheesy
> production rules, [I'm doing this largely as an exercise for myself,
> which is why I'm doing this parsing manually], so it's error-prone and
> tedious to type tabsup() each time I enter a function, and tabsdn()
> each time I return from a function, which may be from several different
> flow branches.)

def track(func):
    """Decorator to track calls to a set of functions"""
    def wrapper(*args, **kwargs):
        print " "*track.depth + func.__name__, args, kwargs or ""
        track.depth += 1
        result = func(*args, **kwargs)
        track.depth -= 1
        return result
    return wrapper
track.depth = 0

# Then to apply the decorator to a function, e.g.:
def f(x):
    return True
# Add this line somewhere after the function definition:
f = track(f)

# Alternately, if you're using Python 2.4 or newer, just define f as:
@track
def f(x):
    return True

# Test it:
@track
def fact(n):
    """Factorial of n, n! = n*(n-1)*(n-2)*...*3*2"""
    assert n >= 0
    if n < 2:
        return 1
    return n * fact(n-1)

@track
def comb(n, r):
    """Choose r items from n w/out repetition, n!/(r!*(n-r)!)"""
    assert n >= r
    return fact(n) / fact(r) / fact(n-r)

print comb(5, 3)

# Output:
"""
comb (5, 3)
 fact (5,)
  fact (4,)
   fact (3,)
    fact (2,)
     fact (1,)
 fact (3,)
  fact (2,)
   fact (1,)
 fact (2,)
  fact (1,)
10
"""

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: a little more help with python server-side scripting
John Salerno wrote:
> I contacted my domain host about how Python is implemented on their
> server, and got this response:
>
> ---
> Hello John,
>
> Please be informed that the implementation of python in our server is
> through mod_python integration with the apache.
>
> These are the steps needed for you to be able to run .py script directly
> from browser for your webpage:
>
> 1. Please use the below mentioned path for python:
> #!/usr/bin/env python
>
> Furthermore, update us with the script path, so that we can set the
> appropriate ownership and permissions of the script on the server.
>
> If you require any further assistance, feel free to contact us.
> ---
>
> Unfortunately, I don't completely understand what it is I need to do
> now. Where do I put the path they mentioned? And what do they mean by my
> script path?
The Python tutorial should fill in the blanks
(http://www.python.org/doc/tut/node4.html):
> 2.2.2 Executable Python Scripts
>
> On BSD'ish Unix systems, Python scripts can be made directly executable,
> like shell scripts, by putting the line
>
> #! /usr/bin/env python
>
> (assuming that the interpreter is on the user's PATH) at the beginning
> of the script and giving the file an executable mode. The "#!" must be
> the first two characters of the file. On some platforms, this first line
> must end with a Unix-style line ending ("\n"), not a Mac OS ("\r") or
> Windows ("\r\n") line ending. Note that the hash, or pound, character,
> "#", is used to start a comment in Python.
This answers your first question. Put the #! bit at the top of your
.py script. This way the web server will know how to run the script.
> The script can be given a executable mode, or permission, using the
> chmod command:
>
> $ chmod +x myscript.py
And this answers your second. Your host needs to know the path to your
script so they can use chmod to make it executable.
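For reference, the top of such a script might look something like this
(just a sketch -- the details depend on how your host has wired things
up):

#!/usr/bin/env python

# A CGI-style response: header block, blank line, then the body.
print "Content-Type: text/html"
print
print "<html><body>Hello from Python</body></html>"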
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: except clause not catching IndexError
Derek Schuff wrote:
> I have some code like this:
>
> for line in f:
>     toks = line.split()
>     try:
>         if int(toks[2],16) == qaddrs[i]+0x1000 and toks[0] == "200": #producer write
>             prod = int(toks[3], 16)
>         elif int(toks[2],16) == qaddrs[i]+0x1002 and toks[0] == "200": #consumer write
>             cons = int(toks[3], 16)
>         else:
>             continue
>     except IndexError: #happens if theres a partial line at the end of file
>         print "indexerror"
>         break
>
> However, when I run it, it seems that I'm not catching the IndexError:
>
> Traceback (most recent call last):
>   File "/home/dschuff/bin/speeds.py", line 202, in ?
>     if int(toks[2],16) == qaddrs[i]+0x1000 and toks[0] == "200": #producer write
> IndexError: list index out of range
>
> If i change the except IndexError to except Exception, it will catch it (but
> i believe it's still an IndexError).
> this is python 2.3 on Debian sarge.
>
> any ideas?

Sounds like IndexError has been redefined somewhere, e.g.:

IndexError = 'something entirely different'

foo = []
try:
    foo[42]
except IndexError:
    # will not catch the real IndexError; we're shadowing it
    pass

Try adding "print IndexError" right before your trouble spot, and see
if it outputs "exceptions.IndexError".

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: str.count is slow
[EMAIL PROTECTED] wrote:
> It seems to me that str.count is awfully slow. Is there some reason
> for this?
> Evidence:
>
> str.count time test
> import string
> import time
> import array
>
> s = string.printable * int(1e5) # 10**7 character string
> a = array.array('c', s)
> u = unicode(s)
> RIGHT_ANSWER = s.count('a')
>
> def main():
> print 'str:', time_call(s.count, 'a')
> print 'array: ', time_call(a.count, 'a')
> print 'unicode:', time_call(u.count, 'a')
>
> def time_call(f, *a):
> start = time.clock()
> assert RIGHT_ANSWER == f(*a)
> return time.clock()-start
>
> if __name__ == '__main__':
> main()
>
> ## end
>
> On my machine, the output is:
>
> str: 0.29365715475
> array: 0.448095498171
> unicode: 0.0243757237303
>
> If a unicode object can count characters so fast, why should an str
> object be ten times slower? Just curious, really - it's still fast
> enough for me (so far).
>
> This is with Python 2.4.1 on WinXP.
>
>
> Chris Perkins
Your evidence points to some unoptimized code in the underlying C
implementation of Python. As such, this should probably go to the
python-dev list (http://mail.python.org/mailman/listinfo/python-dev).
The problem is that the C library function memcmp is slow, and
str.count calls it frequently. See lines 2165+ in stringobject.c
(inside function string_count):
    r = 0;
    while (i < m) {
        if (!memcmp(s+i, sub, n)) {
            r++;
            i += n;
        } else {
            i++;
        }
    }
This could be optimized as:
    r = 0;
    while (i < m) {
        if (s[i] == *sub && !memcmp(s+i, sub, n)) {
            r++;
            i += n;
        } else {
            i++;
        }
    }
This tactic typically avoids most (sometimes all) of the calls to
memcmp. Other string search functions, including unicode.count,
unicode.index, and str.index, use this tactic, which is why you see
unicode.count performing better than str.count.
The above might be optimized further for cases such as yours, where a
single character appears many times in the string:
    r = 0;
    if (n == 1) {
        /* optimize for a single character */
        while (i < m) {
            if (s[i] == *sub)
                r++;
            i++;
        }
    } else {
        while (i < m) {
            if (s[i] == *sub && !memcmp(s+i, sub, n)) {
                r++;
                i += n;
            } else {
                i++;
            }
        }
    }
Note that there might be some subtle reason why neither of these
optimizations are done that I'm unaware of... in which case a comment
in the C source would help. :-)
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: changing params in while loop
robin wrote:
> i have this function inside a while-loop, which i'd like to loop
> forever, but i'm not sure about how to change the parameters of my
> function once it is running.
> what is the best way to do that? do i have to use threading or is there
> some simpler way?
Why not just do this inside the function? What exactly are you trying
to accomplish here? Threading could work here, but like regexes,
threads are not only tricky to get right but also tricky to know when
to use in the first place.
That being said, here's some example threading code to get you started
(PrinterThread.run is your function; the ThreadSafeStorage instance
holds your parameters):
import threading
import time

class ThreadSafeStorage(object):
    def __init__(self):
        object.__setattr__(self, '_lock', threading.RLock())
    def acquirelock(self):
        object.__getattribute__(self, '_lock').acquire()
    def releaselock(self):
        object.__getattribute__(self, '_lock').release()
    def __getattribute__(self, attr):
        if attr in ('acquirelock', 'releaselock'):
            return object.__getattribute__(self, attr)
        self.acquirelock()
        value = object.__getattribute__(self, attr)
        self.releaselock()
        return value
    def __setattr__(self, attr, value):
        self.acquirelock()
        object.__setattr__(self, attr, value)
        self.releaselock()

class PrinterThread(threading.Thread):
    """Prints the data in shared storage once per second."""
    storage = None
    def run(self):
        while not self.storage.killprinter:
            self.storage.acquirelock()
            print 'message:', self.storage.message
            print 'ticks:', self.storage.ticks
            self.storage.ticks += 1
            self.storage.releaselock()
            time.sleep(1)

data = ThreadSafeStorage()
data.killprinter = False
data.message = 'hello world'
data.ticks = 0

thread = PrinterThread()
thread.storage = data
thread.start()

# do some stuff in the main thread
time.sleep(3)
data.acquirelock()
data.message = 'modified ticks'
data.ticks = 100
data.releaselock()
time.sleep(3)
data.message = 'goodbye world'
time.sleep(1)

# notify printer thread that it needs to die
data.killprinter = True
thread.join()
# output:
"""
message: hello world
ticks: 0
message: hello world
ticks: 1
message: hello world
ticks: 2
message: modified ticks
ticks: 100
message: modified ticks
ticks: 101
message: modified ticks
ticks: 102
message: goodbye world
ticks: 103
"""
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Removing .DS_Store files from mac folders
David Pratt wrote:
> # Clean mac .DS_Store
> if current_file == '.DS_Store':
>     print 'a DS_Store item encountered'
>     os.remove(f)
...
> I can't figure why
> remove is not removing.

It looks like your indentation is off.  From what you posted, the
"print" line is prepended with 9 spaces, while the "os.remove" line is
prepended with a single tab.  Don't mix tabs and spaces.

Also, shouldn't that be "os.remove(current_file)"?

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Removing .DS_Store files from mac folders
David Pratt wrote:
> Hi Ben. Sorry about the cut and paste job into my email. It is part of a
> larger script. It is actually all tabbed. This will give you a better idea:
>
> for f in file_names:
>     current_file = os.path.basename(f)
>     print 'Current File: %s' % current_file
>
>     # Clean mac .DS_Store
>     if current_file == '.DS_Store':
>         print 'a DS_Store item encountered'
>         os.remove(f)

I'm no Mac expert, but could it be that OSX is recreating .DS_Store?
Try putting this above your os.remove call:

    import os
    from stat import ST_MTIME
    print 'Last modified:', os.stat(f)[ST_MTIME]

Then run your script a few times and see if the modified times are
different.  You might also try verifying that you get an exception when
attempting to open the file right after removing it.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Removing .DS_Store files from mac folders
David Pratt wrote:
> OSError: [Errno 2] No such file or directory: '.DS_Store'
Ah. You didn't mention a traceback earlier, so I assumed the code was
executing but you didn't see the file being removed.
> >>for f in file_names:
> >>current_file = os.path.basename(f)
> >>print 'Current File: %s' % current_file
> >>
> >># Clean mac .DS_Store
> >>if current_file == '.DS_Store':
> >>print 'a DS_Store item encountered'
> >>os.remove(f)
How are you creating file_names? More importantly, does it contain a
path (either absolute or relative to the current working directory)?
If not, you need an os.path.join, e.g.:
import os
for root_path, dir_names, file_names in os.walk('.'):
    # file_names as generated by os.walk contains file
    # names only (no path)
    for f in file_names:
        if f == '.DS_Store':
            full_path = os.path.join(root_path, f)
            os.remove(full_path)
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: string stripping issues
orangeDinosaur wrote:
> I am encountering a behavior I can think of reason for. Sometimes,
> when I use the .strip module for strings, it takes away more than what
> I've specified. For example:
>
> >>> a = 'Hughes. John\r\n'
>
> >>> a.strip('')
>
> returns:
>
> 'ughes. John\r\n'
>
> However, if I take another string, for example:
>
> >>> b = 'Kim, Dong-Hyun\r\n'
>
> >>> b.strip('')
>
> returns:
>
> 'Kim, Dong-Hyun\r\n'
>
> I don't understand why in one case it eats up the 'H' but in the next
> case it leaves the 'K' alone.
That method... I do not think it means what you think it means. The
argument to str.strip is a *set* of characters, e.g.:
>>> foo = 'abababaXabbaXababa'
>>> foo.strip('ab')
'XabbaX'
>>> foo.strip('aabababaab') # no difference!
'XabbaX'
For more info, see the string method docs:
http://docs.python.org/lib/string-methods.html
To do what you're trying to do, try this:
>>> prefix = 'hello '
>>> bar = 'hello world!'
>>> if bar.startswith(prefix): bar = bar[:len(prefix)]
...
>>> bar
'world!'
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: string stripping issues
Ben Cartwright wrote:
> orangeDinosaur wrote:
> > I am encountering a behavior I can think of reason for. Sometimes,
> > when I use the .strip module for strings, it takes away more than what
> > I've specified. For example:
> >
> > >>> a = 'Hughes. John\r\n'
> >
> > >>> a.strip('')
> >
> > returns:
> >
> > 'ughes. John\r\n'
> >
> > However, if I take another string, for example:
> >
> > >>> b = 'Kim, Dong-Hyun\r\n'
> >
> > >>> b.strip('')
> >
> > returns:
> >
> > 'Kim, Dong-Hyun\r\n'
> >
> > I don't understand why in one case it eats up the 'H' but in the next
> > case it leaves the 'K' alone.
>
>
> That method... I do not think it means what you think it means. The
> argument to str.strip is a *set* of characters, e.g.:
>
> >>> foo = 'abababaXabbaXababa'
> >>> foo.strip('ab')
> 'XabbaX'
> >>> foo.strip('aabababaab') # no difference!
> 'XabbaX'
>
> For more info, see the string method docs:
> http://docs.python.org/lib/string-methods.html
> To do what you're trying to do, try this:
>
> >>> prefix = 'hello '
> >>> bar = 'hello world!'
> >>> if bar.startswith(prefix): bar = bar[:len(prefix)]
> ...
> >>> bar
> 'world!'
Apologies, that should be:
>>> prefix = 'hello '
>>> bar = 'hello world!'
>>> if bar.startswith(prefix): bar = bar[len(prefix):]
...
>>> bar
'world!'
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: in need of some sorting help
ianaré wrote:
> However, i need the sorting done after the walk, due to the way the
> application works... should have specified that, sorry.
If your desired output is just a sorted list of files, there is no good
reason that you shouldn't be able sort in place. Unless your app is
doing something extremely funky, in which case this should do it:
root = self.path.GetValue() # wx.TextCtrl input
filter = self.fileType.GetValue().lower() # wx.TextCtrl input
not_type = self.not_type.GetValue() # wx.CheckBox input
matched_paths = {}
for base, dirs, walk_files in os.walk(root):
    main.Update()
    # i only need the part of the filename after the
    # user selected path:
    base = base.replace(root, '')
    matched_paths[base] = []
    for entry in walk_files:
        entry = os.path.join(base, entry)
        if not filter:
            match = True
        else:
            match = filter in entry.lower()
        if not_type:
            match = not match
        if match:
            matched_paths[base].append(entry)

def tolower(x): return x.lower()

files = []
# Combine into flat list, first sorting on base path, then full path
for base in sorted(matched_paths, key=tolower):
    files.extend(sorted(matched_paths[base], key=tolower))
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: slicing the end of a string in a list
John Salerno wrote:
> You can probably tell what I'm doing. Read a list of lines from a file,
> and then I want to slice off the '\n' character from each line. But
> after this code runs, the \n is still there. I thought it might have
> something to do with the fact that strings are immutable, but a test
> such as:
>
> switches[0][:-1]
>
> does slice off the \n character.
Actually, it creates a new string instance with the \n character
removed, then discards it. The original switches[0] string hasn't
changed.
>>> foo = 'Hello world!'
>>> foo[:-1]
'Hello world'
>>> foo
'Hello world!'
> So I guess the problem lies in the
> assignment or somewhere in there.
Yes. You are repeated assigning a new string instance to "line", which
is then never referenced again. If you want to update the switches
list, then instead of assigning to "line" inside the loop, you need:
switches[i] = switches[i][:-1]
> Also, is this the best way to index the list?
No, since the line variable is unused. This:
i = 0
for line in switches:
    line = switches[i][:-1]
    i += 1
Would be better written as:
for i in range(len(switches)):
    switches[i] = switches[i][:-1]
For most looping scenarios in Python, you shouldn't have to manually
increment a counter variable.
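For example, enumerate gives you the index and the item together:

for i, line in enumerate(switches):
    switches[i] = line[:-1]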
--Ben
PS - actually, you can accomplish all of the above in a single line of
code:
print [line[:-1] for line in open('C:\\switches.txt')]
--
http://mail.python.org/mailman/listinfo/python-list
Re: A simple question
Tuvas wrote:
> Why is the output list [[0, 1], [0, 1]] and not [[0,
> 1], [0, 0]]? And how can I make it work right?

http://www.python.org/doc/faq/programming.html#how-do-i-create-a-multidimensional-list

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Separating elements from a list according to preceding element
Rob Cowie wrote:
> I wish to derive two lists - each containing either tags to be
> included, or tags to be excluded. My idea was to take an element,
> examine what element precedes it and accordingly, insert it into the
> relevant list. However, I have not been successful.
>
> Is there a better way that I have not considered?

Maybe.  You could write a couple regexes, one to find the included
tags, and one for the excluded, then run re.findall on them both.  But
there's nothing fundamentally wrong with your method.

> If this method is
> suitable, how might I implement it?

tags = ['tag1', '+', 'tag2', '+', 'tag3', '-', 'tag4']
include, exclude = [], []
op = '+'
for cur in tags:
    if cur in '+-':
        op = cur
    else:
        if op == '+':
            include.append(cur)
        else:
            exclude.append(cur)

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Reading stdout and stderr of an external program
> I need to be able to read the stdout and stderr streams of an external
> program that I launch from my python script. os.system( 'my_prog' +
> '>& err.log' ) and was planning on monitoring err.log and to display
> its contents. Is this the best way to do this?
from subprocess import Popen, PIPE
stdout, stderr = Popen('my_prog', stdout=PIPE, stderr=PIPE).communicate()
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: mysteries of urllib/urllib2
On Jul 3, 9:43 am, Adrian Smith <[EMAIL PROTECTED]> wrote:
> The following (pinched
> from Dive Into Python) seems to work perfectly in Idle, but falls at
> the final hurdle when run as a cgi script - can anyone suggest
> anything I may have overlooked?
>
> request = urllib2.Request(some_URL)
> request.add_header('User-Agent', 'some_plausible_string')
> opener = urllib2.build_opener()
> data = opener.open(request).read()
Most likely the account that cgi script is running as does not have
permissions to access the net. Check the traceback to be sure. Put
this at the top of your cgi script:
import cgitb; cgitb.enable()
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: mysteries of urllib/urllib2
On Jul 3, 11:14 am, Adrian Smith <[EMAIL PROTECTED]> wrote:
> > > The following (pinched
> > > from Dive Into Python) seems to work perfectly in Idle, but
> > > falls at the final hurdle when run as a cgi script
> > Put this at the top of your cgi script:
>
> > import cgitb; cgitb.enable()
Did you even try this? Asking for Python help without posting the
traceback is like phoning your mechanic and saying, "My car is making
a generic rattling noise, can you tell me what the problem is without
looking under the hood?"
> Apparently there's a way to change the user-agent string
> by subclassing urllib's URLopener class, but that's beyond my comfort
> zone at present.
Untested:
import urllib
url = ('http://groups.google.com/group/Google-AJAX-Search-API/'
       'browse_thread/thread/a0eb87ad13b11762')
opener = urllib.FancyURLopener()
opener.addheaders = [('User-Agent', 'Fauxzilla 4.0')]
data = opener.open(url).read()
Hope that helps,
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Confused by Python and nested scoping (2.4.3)
Sean Givan wrote:
> def outer():
>     val = 10
>     def inner():
>         print val
>         val = 20
>     inner()
>     print val
>
> outer()
>
> ..I expected to print '10', then '20', but instead got an error:
>
>     print val
> UnboundLocalError: local variable 'val' referenced before assignment.
>
> I'm thinking this is some bug where the interpreter is getting ahead of
> itself, spotting the 'val = 20' line and warning me about something that
> doesn't need warning.  Or am I doing something wrong?

Short answer: no, it's not a Python bug.  If inner() must modify
variables defined in outer()'s scope, you'll need to use a containing
object.  E.g.:

class Storage(object):
    pass

def outer():
    data = Storage()
    data.val = 10
    def inner():
        print data.val
        data.val = 20
    inner()
    print data.val

Long answer: the interpreter (actually, the bytecode compiler) is
indeed looking ahead.  This is by design, and is why the "global"
keyword exists.  See
http://www.python.org/doc/faq/programming/#what-are-the-rules-for-local-and-global-variables-in-python

Things get more complex than that when nested function scopes are
involved.  But again, the behavior you observed is a design decision,
not a bug.  By BDFL declaration, there is no "parentscope" keyword
analogous to "global".  See PEP 227, specifically the "Rebinding names
in enclosing scopes" section:
http://www.python.org/dev/peps/pep-0227/

Hope that helps,
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Passing data attributes as method parameters
Panos Laganakos wrote:
> I'd like to know how its possible to pass a data attribute as a method
> parameter.
>
> Something in the form of:
>
> class MyClass:
>     def __init__(self):
>         self.a = 10
>         self.b = '20'
>
>     def my_method(self, param1=self.a, param2=self.b):
>         pass
>
> Seems to produce a NameError of 'self' not being defined.

Default arguments are statically bound, so you'll need to do something
like this:

class MyClass:
    def __init__(self):
        self.a = 10
        self.b = '20'

    def my_method(self, param1=None, param2=None):
        if param1 is None:
            param1 = self.a
        if param2 is None:
            param2 = self.b

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Multiple hierarchie and method overloading
Philippe Martin wrote:
> I have something like this:
>
> Class A:
>     def A_Func(self, p_param):
>         .
> Class B:
>     def A_Func(self):
>         .
>
> Class C (A,B):
>     A.__init__(self)
>     B.__init__(self)
>     .
>
>     self.A_Func()  #HERE I GET AN EXCEPTION "... takes at least 2
>     arguments (1 given).
>
> I renamed A_Func(self) to fix that ... but is there a cleaner way around ?

When using multiple inheritance, the order of the base classes matters!
E.g.:

class A(object):
    def f(self):
        print 'in A.f()'

class B(object):
    def f(self):
        print 'in B.f()'

class X(A, B): pass
class Y(B, A): pass

>>> x = X()
>>> x.f()
in A.f()
>>> y = Y()
>>> y.f()
in B.f()

If you want to call B.f() instead of A.f() for an X instance, you can
either rename B.f() like you've done, or do this:

>>> B.f(x)
in B.f()

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Splice two lists
[EMAIL PROTECTED] wrote:
> Is there a good way to splice two lists together without resorting to a
> manual loop? Say I had 2 lists:
>
> l1 = [a,b,c]
> l2 = [1,2,3]
>
> And I want a list:
>
> [a,1,b,2,c,3] as the result.

Our good friend itertools can help us out here:

>>> from itertools import chain, izip
>>> x = ['a', 'b', 'c']
>>> y = [1, 2, 3]
>>> list(chain(*izip(x, y)))
['a', 1, 'b', 2, 'c', 3]
>>> # You can splice more than two iterables at once too:
>>> z = ['x', 'y', 'z']
>>> list(chain(*izip(x, y, z)))
['a', 1, 'x', 'b', 2, 'y', 'c', 3, 'z']
>>> # Cleaner to define it as a function:
>>> def splice(*its): return list(chain(*izip(*its)))
...
>>> splice(x, y)
['a', 1, 'b', 2, 'c', 3]
>>> splice(x, y, z)
['a', 1, 'x', 'b', 2, 'y', 'c', 3, 'z']

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Splice two lists
[EMAIL PROTECTED] wrote:
> Thanks, this worked great.

Welcome. :-)

> Can you explain the syntax of the '*' on the
> return value of izip?  I've only ever seen this syntax with respect to
> variable number of args.

When used in a function call (as opposed to a function definition), *
is the "unpacking" operator.  Basically, it "flattens" an iterable into
arguments.  The docs mention it...

http://www.python.org/doc/2.4.2/tut/node6.html#SECTION00674
http://www.python.org/doc/faq/programming/#how-can-i-pass-optional-or-keyword-parameters-from-one-function-to-another

...but not in great detail.  You can apply * to an arbitrary
expression, e.g.:

>>> def f3(a, b, c): pass
...
>>> f3(1, 2, 3)
>>> f3(*range(3))
>>> f3(*[1, 2, 3])

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: i don't understand this RE example from the documentation
John Salerno wrote:
> John Salerno wrote:
> > Ok, I've been staring at this and figuring it out for a while. I'm close
> > to getting it, but I'm confused by the examples:
> >
> > (?(id/name)yes-pattern|no-pattern)
> > Will try to match with yes-pattern if the group with given id or name
> > exists, and with no-pattern if it doesn't. |no-pattern is optional and
> > can be omitted.
> >
> > For example, (<)?([EMAIL PROTECTED](?:\.\w+)+)(?(1)>) is a poor email matching
> > pattern, which will match with '<[EMAIL PROTECTED]>' as well as
> > '[EMAIL PROTECTED]', but not with '<[EMAIL PROTECTED]'. New in version 2.4.
> >
> > group(1) is the email address pattern, right? So why does the above RE
> > match '[EMAIL PROTECTED]'. If the email address exists, does the last part
> > of the RE: (?(1)>) mean that it has to end with a '>'?
>
> I think I got it. The group(1) is referring to the opening '<', not the
> email address. I had seen an earlier example that used group(0), so I
> thought maybe the groups were 0-based.

The groups *are* 0-based.  The 0th group is the whole match, e.g.:

>>> import re
>>> m = re.match(r'a(b*)', 'a')
>>> m.group(0)
'a'
>>> m.group(1)
''

And for the pattern you were looking at:

>>> m = re.match(r'(<)?([EMAIL PROTECTED](?:\.\w+)+)(?(1)>)', '<[EMAIL PROTECTED]>')
>>> m.group(0)
'<[EMAIL PROTECTED]>'
>>> m.group(1)
'<'
>>> m.group(2)
'[EMAIL PROTECTED]'

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: reusing parts of a string in RE matches?
John Salerno wrote:
> So my question is, how can find all occurrences of a pattern in a
> string, including overlapping matches? I figure it has something to do
> with look-ahead and look-behind, but I've only gotten this far:
>
> import re
> string = 'abababababababab'
> pattern = re.compile(r'ab(?=a)')
> m = pattern.findall(string)
>
> This matches all the 'ab' followed by an 'a', but it doesn't include the
> 'a'. What I'd like to do is find all the 'aba' matches. A regular
> findall() gives four results, but really there are seven.
>
> Is there a way to do this with just an RE pattern, or would I have to
> manually add the 'a' to the end of the matches?

Yes, and no extra for loops are needed!  You can define groups inside
the lookahead assertion:

>>> import re
>>> re.findall(r'(?=(aba))', 'abababababababab')
['aba', 'aba', 'aba', 'aba', 'aba', 'aba', 'aba']

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: reusing parts of a string in RE matches?
Murali wrote:
> > Yes, and no extra for loops are needed! You can define groups inside
> > the lookahead assertion:
> >
> > >>> import re
> > >>> re.findall(r'(?=(aba))', 'abababababababab')
> > ['aba', 'aba', 'aba', 'aba', 'aba', 'aba', 'aba']
>
> Wonderful and this works with any regexp, so
>
> import re
>
> def all_occurences(pat,str):
> return re.findall(r'(?=(%s))'%pat,str)
>
> all_occurences("a.a","abacadabcda") returns ["aba","aca","ada"] as
> required.
Careful. That won't work as expected for *all* regexps. Example:
>>> import re
>>> re.findall(r'(?=(a.*a))', 'abaca')
['abaca', 'aca']
Note that this does *not* find 'aba'. You might think that making it
non-greedy might help, but:
>>> re.findall(r'(?=(a.*?a))', 'abaca')
['aba', 'aca']
Nope, now it's not finding 'abaca'.
This is by design, though. From
http://www.regular-expressions.info/lookaround.html (a good read, by
the way):
"""As soon as the lookaround condition is satisfied, the regex engine
forgets about everything inside the lookaround. It will not backtrack
inside the lookaround to try different permutations."""
Moral of the story: keep lookahead assertions simple whenever
possible. :-)
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: File attributes
[EMAIL PROTECTED] wrote:
> I know how to "walk" a folder/directory using Python, but I'd like to
> check the archive bit for each file. Can anyone make suggestions on
> how I might do this? Thanks.
Since the archive bit is Windows-specific, your first place to check is
Mark Hammond's Python for Windows Extensions (aka win32all). It's a
quick and painless install; grab it here:
http://python.net/crew/skippy/win32/
Once you have that installed, look in the PyWin32.chm help file for the
function calls you need. If the documentation is too sparse, check
MSDN or google it.
For what you're trying to do:
import win32file
import win32con
def togglefileattribute(filename, fileattribute, value):
    """Turn a specific file attribute on or off, leaving the other
    attributes intact.
    """
    bitvector = win32file.GetFileAttributes(filename)
    if value:
        bitvector |= fileattribute
    else:
        bitvector &= ~fileattribute
    win32file.SetFileAttributes(filename, bitvector)
# Sample usage:
togglefileattribute('foo.txt', win32con.FILE_ATTRIBUTE_ARCHIVE, True)
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: File attributes
Ben Cartwright wrote:
> [EMAIL PROTECTED] wrote:
> > I know how to "walk" a folder/directory using Python, but I'd like to
> > check the archive bit for each file. Can anyone make suggestions on
> > how I might do this? Thanks.
>
>
> Since the archive bit is Windows-specific, your first place to check is
> Mark Hammond's Python for Windows Extensions (aka win32all). It's a
> quick and painless install; grab it here:
> http://python.net/crew/skippy/win32/
>
> Once you have that installed, look in the PyWin32.chm help file for the
> function calls you need. If the documentation is too sparse, check
> MSDN or google it.
>
> For what you're trying to do:
>
> import win32file
> import win32con
>
> def togglefileattribute(filename, fileattribute, value):
> """Turn a specific file attribute on or off, leaving the other
> attributes intact.
> """
> bitvector = win32file.GetFileAttributes(filename)
> if value:
> bitvector |= fileattribute
> else:
> bitvector &= ~fileattribute
> win32file.SetFileAttributes(filename, bitvector)
>
> # Sample usage:
> togglefileattribute('foo.txt', win32con.FILE_ATTRIBUTE_ARCHIVE, True)
Or to just check the value of the bit:
def fileattributeisset(filename, fileattr):
    return bool(win32file.GetFileAttributes(filename) & fileattr)
print fileattributeisset('foo.txt', win32con.FILE_ATTRIBUTE_ARCHIVE)
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Bind an instance of a base to a subclass - can this be done?
Lou Pecora wrote:
> I want to subclass a base class that is returned from a Standard Library
> function (particularly, subclass file which is returned from open). I
> would add some extra functionality and keep the base functions, too.
> But I am stuck.
>
> E.g.
>
> class myfile(file):
>def myreadline():
> #code here to return something read from file
>
> Then do something like (I know this isn't right, I'm just trying to
> convey the idea of what I would like)
>
> mf=myfile()
>
> mf=open("Afile","r")
>
> s=mf.myreadline() # Use my added function
>
> mf.close()# Use the original file function
>
>
> Possible in some way? Thanks in advance for any clues.
This:
>>> mf=myfile()
>>> mf=open("Afile","r")
Is actually creating an instance of myfile, then throwing it away,
replacing it with an instance of file. There are no variable type
declarations in Python.
To accomplish what you want, simply instantiate the subclass:
>>> mf=myfile("Afile","r")
You don't need to do anything tricky, like binding the instance of the
base class to a subclass. Python does actually support that, e.g.:
>>> class Base(object):
...     def f(self):
...         return 'base'
...
>>> class Subclass(Base):
...     def f(self):
...         return 'subclass'
...
>>> b = Base()
>>> b.__class__
<class '__main__.Base'>
>>> b.f()
'base'
>>> b.__class__ = Subclass
>>> b.__class__
<class '__main__.Subclass'>
>>> b.f()
'subclass'
But the above won't work for the built-in file type:
>>> f = file('foo')
>>> f.__class__
<type 'file'>
>>> f.__class__ = Subclass
TypeError: __class__ assignment: only for heap types
Again though, just instantiate the subclass. Much cleaner.
Or if that's not an option due to the way your module will be used,
just define your custom file methods as global functions that take a
file instance as a parameter. Python doesn't force you to use OOP for
everything.
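In other words, something like this (the body of myreadline here is
just an illustration):

class myfile(file):
    def myreadline(self):
        # illustrative: read a line and drop the trailing newline
        return self.readline().rstrip('\n')

mf = myfile("Afile", "r")   # instantiate the subclass directly
s = mf.myreadline()         # added method
mf.close()                  # inherited file method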
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: list comprehensions put non-names into namespaces!
[EMAIL PROTECTED] wrote:
>     Lonnie> List comprehensions appear to store their temporary result in a
>     Lonnie> variable named "_[1]" (or presumably "_[2]", "_[3]" etc for
>     Lonnie> nested comprehensions)
>
> Known issue.  Fixed in generator comprehensions.  Dunno about plans to fix
> it in list comprehensions.  I believe at some point in the future they may
> just go away or become syntactic sugar for a gen comp wrapped in a list()
> call.

The latter, starting in Python 3.0.  It won't be fixed before Python
3.0 because it has the potential to break existing 2.x code.  From PEP
289:

"""List comprehensions also "leak" their loop variable into the
surrounding scope. This will also change in Python 3.0, so that the
semantic definition of a list comprehension in Python 3.0 will be
equivalent to list(<generator expression>). Python 2.4 and beyond
should issue a deprecation warning if a list comprehension's loop
variable has the same name as a variable used in the immediately
surrounding scope."""

Source: http://www.python.org/dev/peps/pep-0289/
Also mentioned in PEP 3100.

Doesn't look like the deprecation warning was ever implemented for 2.4,
though.  On my 2.4.3:

>>> def f():
...     [x for x in range(10)]
...     print x
...
>>> f()
9
>>> # no warning yet..

2.5 is in alpha now, hopefully the warning will be added.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Speed up this code?
[EMAIL PROTECTED] wrote:
> I'm creating a program to calculate all primes numbers in a range of 0
> to n, where n is whatever the user wants it to be. I've worked out the
> algorithm and it works perfectly and is pretty fast, but the one thing
> seriously slowing down the program is the following code:
>
> def rmlist(original, deletions):
>return [i for i in original if i not in deletions]
>
> original will be a list of odd numbers and deletions will be numbers
> that are not prime, thus this code will return all items in original
> that are not in deletions. For n > 100,000 or so, the program takes a
> very long time to run, whereas it's fine for numbers up to 10,000.
>
> Does anybody know a faster way to do this? (finding the difference all
> items in list a that are not in list b)?
The "in" operator is expensive for lists because Python has to check,
on average, half the items in the list. Use a better data structure...
in this case, a set will do nicely. See the docs:
http://docs.python.org/lib/types-set.html
http://docs.python.org/tut/node7.html#SECTION00740
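For example (the numbers are arbitrary), converting deletions to a set
makes each membership test roughly constant time:

>>> original = [3, 5, 7, 9, 11, 13]
>>> deletions = set([9, 15, 21])
>>> [i for i in original if i not in deletions]
[3, 5, 7, 11, 13]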
Oh, and you didn't ask for it, but I'm sure you're going to get a dozen
pet implementations of prime generators from other c.l.py'ers. So
here's mine. :-)
def primes():
    """Generate prime numbers using the sieve of Eratosthenes."""
    yield 2
    marks = {}
    cur = 3
    while True:
        skip = marks.pop(cur, None)
        if skip is None:
            # unmarked number must be prime
            yield cur
            # mark ahead
            marks[cur*cur] = 2*cur
        else:
            n = cur + skip
            while n in marks:
                # x already marked as multiple of another prime
                n += skip
            # first unmarked multiple of this prime
            marks[n] = skip
        cur += 2
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: __getattr__ and functions that don't exist
Erik Johnson wrote:
> Thanks for your reply, Nick. My first thought was "Ahhh, now I see. That's
> slick!", but after playing with this a bit...
>
> >>> class Foo:
> ... def __getattr__(self, attr):
> ... def intercepted(*args):
> ... print "%s%s" % (attr, args)
> ... return intercepted
> ...
> >>> f = Foo()
> >>> f
> __repr__()
> Traceback (most recent call last):
> File "", line 1, in ?
> TypeError: __repr__ returned non-string (type NoneType)
>
>
> my thought is "Oh... that is some nasty voodoo there!" Especially
> if one wants to also preserve the basic functionality of __getattr__ so that
> it still works to just get an attribute where no arguments were given.
>
> I was thinking it would be clean to maintain an interface where you
> could call things like f.set_Spam('ham') and implement that as self.Spam =
> 'ham' without actually having to define all the set_XXX methods for all the
> different things I would want to set on my object (as opposed to just making
> an attribute assignment), but I am starting to think that is probably an
> idea I should just simply abandon.
Well, you could tweak __getattr__ as follows:
>>> class Foo:
...     def __getattr__(self, attr):
...         if attr.startswith('__'):
...             raise AttributeError
...         def intercepted(*args):
...             print "%s%s" % (attr, args)
...         return intercepted
But abandoning the whole idea is probably a good idea. How is defining
a magic set_XXX method cleaner than just setting the attribute? Python
is not C++/Java/C#. Accessors and mutators for simple attributes are
overkill. Keep it simple, you'll thank yourself for it later when
maintaining your code. :-)
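If you later decide an attribute really does need logic behind it, you
can switch to a property without changing the callers (a sketch;
'spam' is just an illustrative name):

class Foo(object):
    def __init__(self):
        self._spam = None
    def _get_spam(self):
        return self._spam
    def _set_spam(self, value):
        self._spam = value
    spam = property(_get_spam, _set_spam)

# Callers still just do foo.spam = 'ham' and read foo.spam.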
> I guess I don't quite follow the error above though. Can you explain
> exactly what happens with just the evaluation of f?
Sure. (Note, this is greatly simplified, but still somewhat complex.)
The Python interpreter does the following when you type in an
expression:
(1) evaluate the expression, store the result in temporary object
(2) attempt to access the object's __repr__ method
(3) if step 2 didn't raise an AttributeError, call the method, output
the result, and we're done
(4) if __getattr__ is defined for the object, call it with "__repr__"
as the argument
(5) if step 4 didn't raise an AttributeError, call the method, output
the result, and we're done
(6) repeat steps 2 through 5 for __str__
(7) as a last resort, output the default "<__main__.Foo instance at
0x...>" string
In your case, the intepreter hit step 4. f.__getattr__("__repr__")
returned the "intercepted" function, which was then called. However,
the "interpreted" function returned None. The interpreter was
expecting a string from __repr__, so it raised a TypeError.
Clear as mud, right? Cutting out the __getattr__ trickery, here's a
simplified scenario (gets to step 3 from above):
>>> class Bar(object):
...     def __repr__(self):
...         return None
...
>>> b = Bar()
>>> b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: __repr__ returned non-string (type NoneType)
Hope that helps! One other small thing... please avoid top posting.
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: genexp surprise (wart?)
Paul Rubin wrote:
> I tried to code the Sieve of Erastosthenes with generators:
>
> def sieve_all(n = 100):
>     # yield all primes up to n
>     stream = iter(xrange(2, n))
>     while True:
>         p = stream.next()
>         yield p
>         # filter out all multiples of p from stream
>         stream = (q for q in stream if q%p != 0)
>
> # print primes up to 100
> print list(sieve_all(100))
>
> but it didn't work.  I had to replace
>
>     stream = (q for q in stream if q%p != 0)
>
> with
>
>     def s1(p):
>         return (q for q in stream if q%p != 0)
>     stream = s1(p)
>
> or alternatively
>
>     stream = (lambda p,stream: \
>         (q for q in stream if q%p != 0)) (p, stream)

You do realize that you're creating a new level of generator nesting
with each iteration of the while loop, right?  You will quickly hit the
maximum recursion limit.  Try generating the first 1000 primes.

> I had thought that genexps worked like that automatically, i.e. the
> stuff inside the genexp was in its own scope.  If it's not real
> obvious what's happening instead, that's a sign that the current
> behavior is a wart.  (The problem is that p in my first genexp comes
> from the outer scope, and changes as the sieve iterates through the
> stream)

I don't see how it's a wart.  p is accessed (i.e., not set) by the
genexp.  Consistent with the function scoping rules in...

http://www.python.org/doc/faq/programming/#what-are-the-rules-for-local-and-global-variables-in-python

...Python treats p in the genexp as a non-local variable.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: advice on this little script
BartlebyScrivener wrote:
> What about a console beep? How do you add that?
>
> rpd
Just use ASCII code 007 (BEL/BEEP):
>>> import sys
>>> sys.stdout.write('\007')
Or if you're on Windows, use the winsound standard module.
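For example (the frequency and duration values are arbitrary):

import winsound
winsound.Beep(1000, 250)   # 1000 Hz for 250 milliseconds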
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: counting number of (overlapping) occurances
John wrote:
> This works but is a bit slow, I guess I'll have to live with it.
> Any chance this could be sped up in python?

Sure, to a point. Instead of:

    def countoverlap(s1, s2):
        return len([1 for i in xrange(len(s1)) if s1[i:].startswith(s2)])

Try this version, which takes smaller slices (resulting in 2x-5x speed
increase when dealing with a large s1 and a small s2):

    def countoverlap(s1, s2):
        L = len(s2)
        return len([1 for i in xrange(len(s1)-L+1) if s1[i:i+L] == s2])

And for a minor extra boost, this version eliminates the list
comprehension:

    def countoverlap(s1, s2):
        L = len(s2)
        cnt = 0
        for i in xrange(len(s1)-L+1):
            if s1[i:i+L] == s2:
                cnt += 1
        return cnt

Finally, if the execution speed of this function is vital to your
application, create a C extension. String functions like this one are
generally excellent candidates for extensionizing.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: How to pop random item from a list?
flamesrock wrote:
> whats the best way to pop a random item from a list??

    import random

    def popchoice(seq):
        # raises IndexError if seq is empty
        return seq.pop(random.randrange(len(seq)))

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: global namescape of multilevel hierarchy
Sakcee wrote:
> now in package.module.checkID function, i wnat to know what is the ID
> defiend in the calling scriipt

It's almost always a really bad idea to kludge scopes like this. If
you need to access a variable from the caller's scope in a module
function, make it an argument to that function. That's what arguments
are for in the first place!

But, if you must, you can use the inspect module:

    import inspect

    def checkID():
        ID = inspect.currentframe().f_back.f_locals['ID']
        print ID

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Printable string for 'self'
Don Taylor wrote:
> Is there a way to discover the original string form of the instance that
> is represented by self in a method?
>
> For example, if I have:
>
> fred = C()
> fred.meth(27)
>
> then I would like meth to be able to print something like:
>
> about to call meth(fred, 27) or
> about to call fred.meth(27)
>
> instead of:
>
> about to call meth(<__main__.C instance at 0x00A9D238>, 27)
Not a direct answer to your question, but this may be what you want:
If you give class C a __repr__ method you can avoid the default string.
>>> class C(object):
...     def __init__(self, name):
...         self.name = name
...     def __repr__(self):
...         return '%s(%r)' % (self.__class__.__name__, self.name)
...     def meth(self, y):
...         print 'about to call %r.meth(%r)' % (self, y)
...
>>> fred = C('Fred')
>>> fred.meth(27)
about to call C('Fred').meth(27)
>>> def meth2(x, y):
...     print 'about to call meth2(%r, %r)' % (x, y)
...
>>> meth2(fred, 42)
about to call meth2(C('Fred'), 42)
Of course, this doesn't tell you the name of the variable that points
to the instance. For that (here's your direct answer), you will have
to go to the source:
>>> import inspect
>>> import linecache
>>> def f(a, b):
...     print 'function call from parent scope:'
...     caller = inspect.currentframe().f_back
...     filename = caller.f_code.co_filename
...     linecache.checkcache(filename)
...     line = linecache.getline(filename, caller.f_lineno)
...     print '' + line.strip()
...     return a*b
...
>>> fred = 4
>>> x = f(3, fred)
function call from parent scope:
x = f(3, fred)
But defining __repr__ is far easier and more common.
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: "pow" (power) function
Russ wrote:
> I have a couple of questions for the number crunchers out there:

Sure, but the answers depend on the underlying Python implementation.
And if we're talking CPython, they also depend on the underlying C
implementation of libm (i.e., math.h).

> Does "pow(x,2)" simply square x, or does it first compute logarithms
> (as would be necessary if the exponent were not an integer)?

The former, using binary exponentiation (quite fast), assuming x is an
int or long.

If x is a float, Python coerces the 2 to 2.0, and CPython's float_pow()
function is called. This function calls libm's pow(), which in turn
uses logarithms.

> Does "x**0.5" use the same algorithm as "sqrt(x)", or does it use some
> other (perhaps less efficient) algorithm based on logarithms?

The latter, and that algorithm is libm's pow(). Except for a few
special cases that Python handles, all floating point exponentiation is
left to libm. Checking to see if the exponent is 0.5 is not one of
those special cases.

If you're curious, download the Python source, open up
Objects/floatobject.c, and check out float_pow(). The binary
exponentiation algorithms are in Objects/intobject.c:int_pow() and
Objects/longobject.c:long_pow().

The 0.5 special check (and any other special case optimizations) could,
in theory, be performed in the platform's libm. I'm not familiar enough
with any libm implementations to comment on whether this is ever done,
or if it's even worth doing... though I suspect that the 0.5 case is
not.

Hope that helps,
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: "pow" (power) function
Russ wrote:
> Ben Cartwright wrote:
> > Russ wrote:
>
> > > Does "pow(x,2)" simply square x, or does it first compute logarithms
> > > (as would be necessary if the exponent were not an integer)?
> >
> >
> > The former, using binary exponentiation (quite fast), assuming x is an
> > int or long.
> >
> > If x is a float, Python coerces the 2 to 2.0, and CPython's float_pow()
> > function is called. This function calls libm's pow(), which in turn
> > uses logarithms.
>
> I just did a little time test (which I should have done *before* my
> original post!), and 2.0**2 seems to be about twice as fast as
> pow(2.0,2). That seems consistent with your claim above.
Actually, the fact that x**y is faster than pow(x, y) has nothing to do
with the int vs. float issue. It's actually due to the way Python
parses operators versus builtin functions. Paul Rubin hit the nail on
the head when he suggested you check the bytecode:
>>> import dis
>>> dis.dis(lambda x, y: x**y)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 BINARY_POWER
              7 RETURN_VALUE
>>> dis.dis(lambda x, y: pow(x,y))
  1           0 LOAD_GLOBAL              0 (pow)
              3 LOAD_FAST                0 (x)
              6 LOAD_FAST                1 (y)
              9 CALL_FUNCTION            2
             12 RETURN_VALUE
LOAD_GLOBAL + CALL_FUNCTION is more expensive than LOAD_FAST,
especially when you're doing it a million times (which, coincidentally,
timeit does).
Anyway, if you want to see the int vs. float issue in action, try this:
>>> from timeit import Timer
>>> Timer('2**2').timeit()
0.12681011582321844
>>> Timer('2.0**2.0').timeit()
0.6011743438121
>>> Timer('2.0**2').timeit()
0.36681835556112219
>>> Timer('2**2.0').timeit()
0.37949818370600497
As you can see, the int version is much faster than the float version.
The last two cases, which also use the float version, have an
additional performance hit due to type coercion. The relative speed
differences are similar when using pow():
>>> Timer('pow(2, 2)').timeit()
0.33000968869157532
>>> Timer('pow(2.0, 2.0)').timeit()
0.50356362184709269
>>> Timer('pow(2.0, 2)').timeit()
0.55112938185857274
>>> Timer('pow(2, 2.0)').timeit()
0.55198819605811877
> I'm a bit surprised that pow() would use logarithms even if the
> exponent is an integer. I suppose that just checking for an integer
> exponent could blow away the gain that would be achieved by avoiding
> logarithms. On the other hand, I would think that using logarithms
> could introduce a tiny error (e.g., pow(2.0,2) = 3.96 <- made
> up result) that wouldn't occur with multiplication.
These are good questions to ask an expert in floating point arithmetic.
Which I'm not. :-)
> > > Does "x**0.5" use the same algorithm as "sqrt(x)", or does it use some
> > > other (perhaps less efficient) algorithm based on logarithms?
> >
> > The latter, and that algorithm is libm's pow(). Except for a few
> > special cases that Python handles, all floating point exponentation is
> > left to libm. Checking to see if the exponent is 0.5 is not one of
> > those special cases.
>
> I just did another little time test comparing 2.0**0.5 with sqrt(2.0).
> Surprisingly, 2.0**0.5 seems to take around a third less time.
Again, this is because of the operator vs. function lookup issue.
pow(2.0, 0.5) vs. sqrt(2.0) is a better comparison:
>>> from timeit import Timer
>>> Timer('pow(2.0, 0.5)').timeit()
0.51701437102815362
>>> Timer('sqrt(2.0)', 'from math import sqrt').timeit()
0.46649096722239847
> None of these differences are really significant unless one is doing
> super-heavy-duty number crunching, of course, but I was just curious.
> Thanks for the information.
Welcome. :-)
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Large algorithm issue -- 5x5 grid, need to fit 5 queens plus some squares
[EMAIL PROTECTED] wrote:
> The first named clearbrd() which takes no variables, and will reset the
> board to the 'no-queen' position.
(snip)
> The Code:
> #!/usr/bin/env python
> brd = [9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
> def clearbrd():
>     brd = [9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

clearbrd() isn't doing what you want it to. It should be written as:

    def clearbrd():
        global brd
        brd = [9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

Explanation:
http://www.python.org/doc/faq/programming/#how-do-you-set-a-global-variable-in-a-function

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: "pow" (power) function
Mike Ressler wrote:
> >>> timeit.Timer("pow(111,111)").timeit()
> 10.968398094177246
> >>> timeit.Timer("111**111").timeit()
> 10.04007887840271
> >>> timeit.Timer("111.**111.").timeit()
> 0.36576294898986816
>
> The pow and ** on integers take 10 seconds, but the float ** takes only
> 0.36 seconds. (The pow with floats takes ~ 0.7 seconds). Clearly
> typecasting to floats is coming in here somewhere. (Python 2.4.1 on
> Linux FC4.)
No, there is not floating point math going on when the operands to **
are both int or long. If there were, the following two commands would
have identical output:
>>> 111**111
107362012888474225801214565046695501959850723994224804804775911
17562507619578334702249122617009363462146610374309298696786
330067310159463303558666910091026017785587295539622142057315437
069730229375357546494103400699864397711L
>>> int(111.0**111.0)
107362012888474224720018046104893130890742038145054486592605938
348914231670972887594279283213585412743799339280552157756096410
839752020853099983680499334815422669184408961411319810030383904
886446681757296875373689157536249282560L
The first result is accurate. Work it out by hand if you don't believe
me. ;-) The second suffers from inaccuracies due to floating point's
limited precision.
Of course, getting exact results with huge numbers isn't cheap,
computationally. Because there's no type in C to represent arbitrarily
huge numbers, Python implements its own, called "long". There's a fair
amount of memory allocation, bit shifting, and other monkey business
going on behind the scenes in longobject.c.
Whenever possible, Python uses C's built-in signed long int type (known
simply as "int" on the Python side, and implemented in intobject.c).
On my platform, C's signed long int is 32 bits, so values range from
-2147483648 to 2147483647. I.e., -(2**31) to (2**31)-1.
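You can check the cutoff on your own box; a 32-bit build reports:

>>> import sys
>>> sys.maxint
2147483647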
As long as your exponentiation result is in this range, Python uses
int_pow(). When it overflows, long_pow() takes over. Both functions
use the binary exponentiation algorithm, but long_pow() is naturally
slower:
>>> from timeit import Timer
>>> Timer('2**28').timeit()
0.24572032043829495
>>> Timer('2**29').timeit()
0.25511642791934719
>>> Timer('2**30').timeit()
0.27746782979170348
>>> Timer('2**31').timeit() # overflow: 2**31 > 2147483647
2.8205724462504804
>>> Timer('2**32').timeit()
2.2251812151589547
>>> Timer('2**33').timeit()
2.406713399635
Floating point is a whole 'nother ball game:
>>> Timer('2.0**30.0').timeit()
0.33266301963840306
>>> Timer('2.0**31.0').timeit() # no threshold here!
0.33437446769630697
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Importing an output from another function
James Stroud wrote:
> Try this (I think its called "argument expansion", but I really don't
> know what its called, so I can't point you to docs):
>
> def Func1():
> choice = ('A', 'B', 'C')
> output = random.choice(choice)
> output2 = random.choice(choice)
> return output, output2
>
> def Func2(*items):
> print items
>
> output = Func1()
> Func2(*output1)
Single asterisk == "arbitrary argument list". Useful in certain
patterns, but not something you use every day.
Documentation is in the tutorial:
http://www.python.org/doc/current/tut/node6.html#SECTION00673
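A minimal sketch of the two sides of the asterisk (packing in the def,
unpacking at the call site):

    def show(*args):    # packs all positional arguments into a tuple
        print args

    pair = (1, 2)
    show(pair)          # one argument:  prints ((1, 2),)
    show(*pair)         # unpacked:      prints (1, 2)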
PS: Like "self" for class instance methods, "*args" is the
conventional name of the arbitrary argument list.
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: xmlrpclib and carriagereturn (\r)
Jonathan Ballet wrote:
> The problem is, xmlrpclib "eats" those carriage return characters when
> loading the XMLRPC request, and replace it by "\n". So I got "bla\n\nbla".
>
> When I sent back those parameters to others Windows clients (they are
> doing some kind of synchronisation throught the XMLRPC server), I send
> to them only "\n\n", which makes problems when rendering strings.
Did you develop the Windows client, too? If so, the client-side fix is
trivial: replace \n with \r\n in all renderable strings. Or update
both the client and the server to encode the strings, also trivial
using the base64 module.
If not, and you're in the unfortunate position of being forced to
support buggy third-party clients, read on.
> It seems that XMLRPC spec doesn't propose to eat carriage return
> characters : (from http://www.xmlrpc.com/spec)
(snip)
> It seems to be a rather strange comportement from xmlrpclib. Is it known ?
> So, what happens here ? How could I solve this problem ?
The XMLRPC spec doesn't say anything about CRs one way or the other.
Newline handling is necessarily left to the XML parser implementation.
In Python's case, xmlrpclib uses the xml.parsers.expat module, which
reads universal newlines and writes Unix-style newlines (\n). There's
no option to disable this feature.
You could modify xmlrpclib to use a different parser, but it would be
much easier to just hack the XML response right before it's sent out.
I'm assuming you used the SimpleXMLRPCServer module. Example:
from SimpleXMLRPCServer import *

class MyServer(SimpleXMLRPCServer):
    def _marshaled_dispatch(self, data, dm=None):
        response = SimpleXMLRPCDispatcher._marshaled_dispatch(self,
                                                              data, dm)
        return response.replace('\n', '\r\n')

server = MyServer(('localhost', 8000))
server.register_introspection_functions()
server.register_function(lambda x: x, 'echo')
server.serve_forever()
Hope that helps,
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: xmlrpclib and carriagereturn (\r)
Jonathan Ballet wrote:
> The problem is, xmlrpclib "eats" those carriage return characters when
> loading the XMLRPC request, and replace it by "\n". So I got "bla\n\nbla".
>
> When I sent back those parameters to others Windows clients (they are
> doing some kind of synchronisation throught the XMLRPC server), I send
> to them only "\n\n", which makes problems when rendering strings.

Whoops, just realized we're talking about "\n\r" here, not "\r\n".
Most of my previous reply doesn't apply to your situation, then.

As far as Python's expat parser is concerned, "\n\r" is two newlines:
one Unix-style and one Mac-style. It correctly (per XML specs)
normalizes both to Unix-style.

Is "\n\r" being used as a newline by your Windows clients, or is it a
control code? If the former, I'd sure like to know why. If the latter,
then you're submitting binary data and you shouldn't be using <string>
to begin with. Try <base64>.

If worst comes to worst and you have to stick with sending "\n\r"
intact in a <string> param, you'll need to modify xmlrpclib to use a
different (and technically noncompliant) XML parser. Here's an ugly
hack to do that out of the box:

    # In your server code:
    import xmlrpclib
    # This forces xmlrpclib to fall back on the obsolete xmllib module:
    xmlrpclib.ExpatParser = None

xmllib doesn't normalize newlines, so it's noncompliant. But this is
actually what you want.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: filter list fast
lars_woetmann wrote:
> I have a list I filter using another list and I would like this to be
> as fast as possible
> right now I do like this:
>
> [x for x in list1 if x not in list2]
>
> i tried using the method filter:
>
> filter(lambda x: x not in list2, list1)
>
> but it didn't make much difference, because of lambda I guess
> is there any way I can speed this up

Both of these techniques are O(n^2). You can reduce that to roughly
O(n) by using sets, since a hash-based membership test takes constant
time on average:

>>> set2 = set(list2)
>>> [x for x in list1 if x not in set2]

Checking to see if an item is in a set is much more efficient than a
list.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Can I use a conditional in a variable declaration?
[EMAIL PROTECTED] wrote:
> I've done this in Scheme, but I'm not sure I can in Python.
>
> I want the equivalent of this:
>
> if a == "yes":
>     answer = "go ahead"
> else:
>     answer = "stop"
>
> in this more compact form:
>
> a = (if a == "yes": "go ahead": "stop")
>
> is there such a form in Python? I tried playing around with lambda
> expressions, but I couldn't quite get it to work right.

There will be, in Python 2.5 (final release scheduled for August 2006):

>>> answer = "go ahead" if a=="yes" else "stop"

See:
http://mail.python.org/pipermail/python-dev/2005-September/056846.html
http://www.python.org/doc/peps/pep-0308/

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Function params with **? what do these mean?
Dave Hansen wrote:
> On 20 Mar 2006 15:45:36 -0800 in comp.lang.python,
> [EMAIL PROTECTED] (Aahz) wrote:
> >Personally, I think it's a Good Idea to stick with the semi-standard
> >names of *args and **kwargs to make searching easier...
>
> Agreed (though "kwargs" kinda makes my skin crawl).

Coincidentally, "kwargs" is the sound my cat makes when coughing up a
hairball.

Fortunately, **kw is also semi-standard.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: what's the general way of separating classes?
John Salerno wrote:
> bruno at modulix wrote:
>
> >> It seems like this can
> >> get out of hand, since modules are separate from one another and not
> >> compiled together. You'd end up with a lot of import statements.
> >
> > Sorry, but I don't see the correlation between compilation and import
> > here ?
>
> I meant that in a language like C#, which compiles all the separate
> files into one program, it is not necessary to have the equivalent of an
> import/include type of statement.

Er? Surely you've used C#'s "using" statement? Apples and oranges,
but:

C#'s "using Foo.Bar;" is roughly analogous to Python's "from foo.bar
import *".

C#'s "int x = Foo.Bar.f();" is roughly analogous to Python's "import
foo.bar; x = foo.bar.f()".

> You can just refer to the classes from
> any other file.

Iff they're in the same namespace. You can have multiple namespaces in
the same .NET assembly, you know.

> But in Python, without this behavior, you must
> explicitly import any external files.

That's true. Each Python file is essentially its own namespace. And
when, say, __init__.py does a "from submodule import *" it essentially
merges submodule's namespace into its own.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Simple py script to calc folder sizes
Caleb Hattingh wrote:
> Unless you have a nice tool handy, calculating many folder sizes for
> clearing disk space can be a click-fest nightmare. Looking around, I
> found Baobab (gui tool); the "du" linux/unix command-line tool; the
> extremely impressive tkdu: http://unpythonic.net/jeff/tkdu/ ; a python
> script I didn't really understand at
> http://vsbabu.org/webdev/zopedev/foldersize.html (are these "folder
> objects" zope thingies?); there are also tools that can add a
> "foldersize" column into Explorer on Windows
> (foldersize.sourceforge.net, for example); the superb freeCommander
> file-manager (win32) has the functionality built in, and so on.

You also might want to take a look at KDirStat
(http://kdirstat.sourceforge.net/) and its win32 counterpart,
WinDirStat (http://windirstat.sourceforge.net/).

> "du" is closest to what I was looking for, but is not immediately
> cross-platform: I know I can probably get it through Cygwin, and there
> is probably a win32 binary or clone around somewhere

Try http://unxutils.sourceforge.net/ ... much quicker to set up than
Cygwin. A pure Python port of du (and other unix utilities) would be
cool, though.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Using Dictionaries in Sets - dict objects are unhashable?
Gregory Piñero wrote:
> Hey guys,
>
> I don't understand why this isn't working for me. I'd like to be able
> to do this. Is there another short alternative to get this
> intersection?
>
> [Dbg]>>> set([{'a':1},{'b':2}]).intersection([{'a':1}])
> Traceback (most recent call last):
> File "", line 1, in ?
> TypeError: dict objects are unhashable
Assuming you're using Python 2.4+:
>>> d1 = {'a':1, 'b':2, 'c':3, 'd':5}
>>> d2 = {'a':1, 'c':7, 'e':6}
>>> dict((k, v) for k, v in d1.iteritems() if k in d2)
{'a': 1, 'c': 3}
Or if you're comparing key/value pairs instead of just keys:
>>> dict((k, v) for k, v in d1.iteritems() if k in d2 and d2[k]==v)
{'a': 1}
Finally, if you're on Python 2.3, use these versions (less efficient
but still functional):
>>> dict([(k, v) for k, v in d1.iteritems() if k in d2])
{'a': 1, 'c': 3}
>>> dict([(k, v) for k, v in d1.iteritems() if k in d2 and d2[k]==v])
{'a': 1}
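And if all you actually need is the common *keys*, plain sets will do
(2.4+; element order in the repr may vary):

>>> set(d1) & set(d2)
set(['a', 'c'])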
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: TypeError coercing to Unicode with field read from XML file
Randall Parker wrote:
> My problem is that once I parse the file with minidom and a field from
> it to another variable as shown with this line:
> IPAddr = self.SocketSettingsObj.IPAddress
>
> I get this error:
[...]
> if TargetIPAddrList[0] <> "" and TargetIPPortList[0] <> 0:
>     StillNeedSettings = False
>     TestSettingsStore.SettingsDictionary['TargetIPAddr'] = TargetIPAddrList[0]
>     TestSettingsStore.SettingsDictionary['TargetIPPort'] = TargetIPPortList[0]

TargetIPAddrList[0] and TargetIPPortList[0] are *not* a string and an
int, respectively. They're both DOM elements. If you want an int, you
have to explicitly cast the variable as an int. Type matters in
Python:

>>> '0' == 0
False

Back to your code: try a couple debugging print statements to see
exactly what your variables are. The built-in type() function should
help.

To fix the problem, you need to dig a little deeper in the DOM, e.g.:

    addr = TargetIPAddrList[0].firstChild.nodeValue
    try:
        port = int(TargetIPPortList[0].firstChild.nodeValue)
    except ValueError:
        # safely handle invalid strings for int
        port = 0

    if addr and port:
        StillNeedSettings = False
        TestSettingsStore.SettingsDictionary['TargetIPAddr'] = addr
        TestSettingsStore.SettingsDictionary['TargetIPPort'] = port

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: adding a new line of text in Tk
nigel wrote:
> w =Label(root, text="Congratulations you have made it this far,just a few more
> questions then i will be asking you some")
>
> The problem i have is where i have started to write some text"Congratulations
> you have made it this far,just a few more questions then i will be asking you
> some")
> I would actually like to add some text but it puts it all on one line.I would
> like to be able to tell it to start a new line.

Just use \n in your string, e.g.:

    w = Label(root, text="Line 1\nLine 2\nLine 3")

Or a triple-quoted string will do the trick:

    w = Label(root, text="""Line 1
Line 2
Line 3""")

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Simple py script to calc folder sizes
Caleb Hattingh wrote:
> Your code works on some folders but not others. For example, it works
> on my /usr/lib/python2.4 (the example you gave), but on other folders
> it terminates early with StopIteration exception on the
> os.walk().next() step.
>
> I haven't really looked at this closely enough yet, but it looks as
> though there may be an issue with permissions (and not having enough)
> on subfolders within a tree.
You're quite correct. Here's a version of John's code that handles
such cases:
import warnings

def foldersize(fdir):
    """Returns the size of all data in folder fdir in bytes"""
    try:
        root, dirs, files = os.walk(fdir).next()
    except StopIteration:
        warnings.warn("Could not access " + fdir)
        return 0
    files = [os.path.join(root, x) for x in files]
    dirs = [os.path.join(root, x) for x in dirs]
    return sum(map(os.path.getsize, files)) + sum(map(foldersize, dirs))
There's also another bug in the prettier() function that barfs on empty
directories, as it's taking the log of 0. The fix:
exponent = int(math.log(max(1, bytesize), 1024))
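Example use of the patched version (just a sketch, using the folder you
mentioned; prints a raw byte count):

    import os    # foldersize() needs it too
    print foldersize('/usr/lib/python2.4')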
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Newbie: splitting dictionary definition across two .py files
[EMAIL PROTECTED] wrote:
> I like to define a big dictionary in two
> files and use it my main file, build.py
>
> I want the definition to go into build_cfg.py and build_cfg_static.py.
>
> build_cfg_static.py:
> target_db = {}
> target_db['foo'] = 'bar'
>
> build_cfg.py
> target_db['xyz'] = 'abc'
>
> In build.py, I like to do
> from build_cfg_static import *
> from build_cfg import *
>
> ...now use target_db to access all elements. The problem looks like, I
> can't
> have the definition of target_db split across two files. I think they
> reside in different name spaces?
Yes. As it stands, importing build_cfg.py will fail with
"NameError: name 'target_db' is not defined".
Unless you're doing something ugly like exec() on its contents, .py
files need to be valid before they can be imported.
> Is there any way I can have the same
> dictionary definition split across two files?
Try this:
# build_cfg_static.py:
target_db = {}
target_db['foo'] = 'bar'
# build_cfg.py:
target_db = {}
target_db['xyz'] = 'abc'
# build.py:
from build_cfg_static import target_db
from build_cfg import target_db as merge_db
target_db.update(merge_db)
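After those two imports and the update, target_db holds the entries
from both files (dict key order is arbitrary, of course):

    print target_db
    # {'xyz': 'abc', 'foo': 'bar'}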
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: pre-PEP: The create statement
Michael Ekstrand wrote:
> Is there a natural way
> to extend this to other things, so that function creation can be
> modified? For example:
>
> create tracer fib(x):
>     # Return appropriate data here
>     pass
>
> tracer could create a function that logs its entry and exit; behavior
> could be modifiable at run time so that tracer can go away into oblivion.
>
> Given the current semantics of create, this wouldn't work. What would be
> reasonable syntax and semantics to make something like this possible?

The standard idiom is to use a function wrapper, e.g.

    def tracer(f):
        def wrapper(*args):
            print 'call', f, args
            result = f(*args)
            print f, args, '=', result
            return result
        return wrapper

    def fact(x):
        if not x:
            return 1
        return x * fact(x-1)
    fact = tracer(fact)  # wrap it

The decorator syntax was added in Python 2.4 to make the wrapper
application clearer:

    @tracer
    def fact(x):
        if not x:
            return 1
        return x * fact(x-1)

http://www.python.org/dev/peps/pep-0318

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: confusing behaviour of os.system
Todd wrote:
> I'm trying to run the following in python.
>
> os.system('/usr/bin/gnuclient -batch -l htmlize -eval "(htmlize-file
> \"test.c\")"')
Python is interpreting the \"s as "s before it's being passed to
os.system. Try doubling the backslashes.
>>> print '/usr/bin/gnuclient -batch -l htmlize -eval "(htmlize-file \"test.c\")"'
/usr/bin/gnuclient -batch -l htmlize -eval "(htmlize-file "test.c")"
>>> print '/usr/bin/gnuclient -batch -l htmlize -eval "(htmlize-file \\"test.c\\")"'
/usr/bin/gnuclient -batch -l htmlize -eval "(htmlize-file \"test.c\")"
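Another option (just a sketch) is a raw string, which leaves the
backslashes alone, so no doubling is needed:

    import os
    os.system(r'/usr/bin/gnuclient -batch -l htmlize -eval "(htmlize-file \"test.c\")"')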
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: how to make a generator use the last yielded value when it regains control
John Salerno wrote:
> It
> is meant to take a number and generate the next number that follows
> according to the Morris sequence. It works for a single number, but what
> I'd like it to do is either:
>
> 1. repeat indefinitely and have the number of times controlled elsewhere
> in the program (e.g., use the morris() generator in a for loop and use
> that context to tell it when to stop)
>
> 2. just make it a function that takes a second argument, that being the
> number of times you want it to repeat itself and create numbers in the
> sequence

Definitely go for (1). The Morris sequence is a great candidate to
implement as a generator. As a generator, it will be more flexible and
efficient than (2).

    def morris(num):
        """Generate the Morris sequence starting at num."""
        num = str(num)
        yield num
        while True:
            result, cur, run = [], None, 0
            for digit in num+'\n':
                if digit == cur:
                    run += 1
                else:
                    if cur is not None:
                        result.append(str(run))
                        result.append(cur)
                    cur, run = digit, 1
            num = ''.join(result)
            yield num

    # Example usage:
    from itertools import islice
    for n in islice(morris(1), 10):
        print n

    # Output:
    """
    1
    11
    21
    1211
    111221
    312211
    13112221
    1113213211
    31131211131221
    13211311123113112211
    """

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: how to make a generator use the last yielded value when it regains control
John Salerno wrote:
> Actually I was just thinking about this and it seems like, at least for
> my purpose (to simply return a list of numbers), I don't need a
> generator.
Yes, if it's just a list of numbers you need, a generator is more
flexibility than you need. A generator would only come in handy if,
say, you wanted to give your users the option of getting the next N
items in the sequence, *without* having to recompute everything from
scratch.
> My understanding of a generator is that you do something to
> each yielded value before returning to the generator (so that you might
> not return at all),
A generator is just an object that spits out values upon request; it
doesn't care what the caller does with those values.
There's many different ways to use generators; a few examples:
# Get a list of the first 10
from itertools import islice
m = [n for n in islice(morris(1), 10)]

# Prompt user between each iteration
for n in morris(1):
    if raw_input('keep going? ') != 'y':
        break
    print n

# Alternate way of writing the above
g = morris(1)
while raw_input('keep going? ') == 'y':
    print g.next()
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: very strange problem in 2.4
John Zenger wrote:
> Your list probably contains several references to the same object,
> instead of several different objects. This happens often when you use a
> technique like:
>
> list = [ object ] * 100

This is most likely what's going on. To the OP: please post the
relevant code, including how you create mylist and the definitions of
change_var_a and return_var_a.

I suspect you're doing something like this:

>>> class C(object):
...     def __init__(self, x):
...         self.x = x
...     def __repr__(self):
...         return 'C(%r)' % self.x
...
>>> mylist = [C(0)]*3 + [C(1)]*3
>>> mylist
[C(0), C(0), C(0), C(1), C(1), C(1)]
>>> mylist[0].x = 2
>>> mylist
[C(2), C(2), C(2), C(1), C(1), C(1)]

When you should do something like:

>>> mylist = [C(0) for i in range(3)] + [C(1) for i in range(3)]
>>> mylist
[C(0), C(0), C(0), C(1), C(1), C(1)]
>>> mylist[0].x = 2
>>> mylist
[C(2), C(0), C(0), C(1), C(1), C(1)]

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: Regular expression intricacies: why do REs skip some matches?
Tim Chase wrote:
> > In [1]: import re
> >
> > In [2]: aba_re = re.compile('aba')
> >
> > In [3]: aba_re.findall('abababa')
> > Out[3]: ['aba', 'aba']
> >
> > The return is two matches, whereas, I expected three. Why does this
> > regular expression work this way?
It's just the way regexes work. You may disagree, but it's more
intuitive that iterated pattern searching be non-overlapping by
default. See also:
>>> 'abababa'.count('aba')
2
> Well, if you don't need the actual results, just their
> count, you can use
>
> how_many = len(re.findall('(?=aba)', 'abababa')
>
> which will return 3. However, each result is empty:
>
> >>> print re.findall('(?=aba)', 'abababa')
> ['','','']
>
> You'd have to do some chicanary to get the actual pieces:
(snip)
Actually, you can just define a group inside the lookahead assertion:
>>> re.findall('(?=(aba))', 'abababa')
['aba', 'aba', 'aba']
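And if you want the match positions as well, finditer takes the same
pattern (sketch):

>>> [m.start() for m in re.finditer('(?=(aba))', 'abababa')]
[0, 2, 4]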
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: finish_endtag in sgmllib.py [Python 2.4]
Richard Hsu wrote:
> code:-
>
>     # Internal -- finish processing of end tag
>     def finish_endtag(self, tag):
>         if not tag:                      # <-- i am confused about this
>             found = len(self.stack) - 1
>             if found < 0:
>                 self.unknown_endtag(tag) # <-- and this
>                 return
>
> I am a little confused as to what is intended by " if not tag: "
> does it mean
> if tag == None or tag == "":   # ?

Technically, not quite. See http://docs.python.org/lib/truth.html

In practice, tag will indeed be a string type (shouldn't ever be None),
so 'tag == ""' would work just as well* as 'not tag'. However, it's
cleaner and clearer to use the latter.

* = barring some contrived custom string type

> tag here is suppose to be a string.
>
> so the only way it will be True is when its either None or its "", then
> we are essentially passing None or "" to self.unknown_endtag(tag) ??

Yes, a string of length zero will always be passed to unknown_endtag
here. Answering your implicit question, there's no good reason to write
it as "self.unknown_endtag(None)" instead.

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: RIIA in Python 2.5 alpha: "with... as"
Terry Reedy wrote:
> "Alexander Myodov" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
> > and even list comprehensions:
> > b1 = [l for l in a1]
> > print "l: %s" % l
>
> This will go away in 3.0. For now, del l if you wish.

Or use a generator expression:

>>> b1 = list(l for l in a1)
>>> l
NameError: name 'l' is not defined

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: About classes and OOP in Python
Michele Simionato wrote:
> Roy Smith wrote:
>
> > That being said, you can indeed have private data in Python. Just prefix
> > your variable names with two underscores (i.e. __foo), and they effectively
> > become private. Yes, you can bypass this if you really want to, but then
> > again, you can bypass private in C++ too.
>
> Wrong, _foo is a *private* name (in the sense "don't touch me!"), __foo
> on the contrary is a *protected* name ("touch me, touch me, don't worry
> I am protected against inheritance!").
> This is a common misconception, I made the error myself in the past.
Sure, if you only consider "private" and "protected" as they're defined
in a dictionary. But then you'd be ignoring the meanings of the
public/private/protected keywords in virtually every language that has
them. http://www.google.com/search?q=public+private+protected
Python doesn't have these keywords, but most Python programmers are at
least somewhat familiar with a language that does use them. For the
sake of clarity:
__foo ~= private = used internally by base class only
_foo ~= protected = used internally by base and derived classes
The Python docs use the above definitions.
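A quick sketch of what actually happens to a __foo name (the name
mangling is the only "protection" involved):

>>> class Base(object):
...     def __init__(self):
...         self.__secret = 1     # stored as _Base__secret
...         self._internal = 2    # naming convention only, no mangling
...
>>> b = Base()
>>> b._Base__secret
1
>>> b._internal
2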
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: How can I determine the property attributes on a class or instance?
mrdylan wrote:
> class TestMe(object):
>     def get(self):
>         pass
>     def set(self, v):
>         pass
>
>     p = property( get, set )
>
> t = TestMe()
> type(t.p)      #returns NoneType, what???
> t.p.__str__    #returns
> ---
>
> What is the best way to determine that the attribute t.p is actually a
> property object? Obviously I can test the __str__ or __repr__
> attributes using substring comparison but there must be a more elegant
> idiom.

Check the class instead of the instance:

>>> type(TestMe.p)
<type 'property'>
>>> type(t.__class__.p)
<type 'property'>
>>> isinstance(t.__class__.p, property)
True

--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: object instance after if isalpha()
Marcelo Urbano Lima wrote:
> class abc:
>     def __init__(self):
>         name='marcelo'
>
> print x.name
> Traceback (most recent call last):
>   File "1.py", line 12, in ?
>     print x.name
> AttributeError: abc instance has no attribute 'name'

In Python, you explicitly include a reference to an object when setting
or accessing the object's attributes... even when you're inside one of
that object's methods. I.e.:

    class abc:
        def __init__(self):
            self.name='marcelo'

When you omit the "self." bit, Python creates a variable local to
__init__() named "name", and the attribute is never set.

This is different from some other OO languages (e.g. C++/Java/C#'s
"this"), and may take some getting used to.

Hope that helps,
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
Re: CGI module: get form name
ej wrote:
> I'm not seeing how to get at the 'name' attribute of an HTML <form> element.
>
> form = cgi.FieldStorage()
>
> gives you a dictionary-like object that has keys for the various named
> elements *within* the form...
>
> I could easily replicate the form name in a hidden field, but there ought to
> be some way to get directly at the form name but I'm just not seeing it.

There isn't. This is a limitation of the CGI protocol, due to the way
HTTP requests work. I.e., the name attribute of <form> is *not*
included in form submissions. Regardless of whether the method is GET
or POST, it's only the fields' key/value pairs that are encoded and
sent off to the server.

If you need it, a hidden field is a good place for the form name. Or
you could use cookies.

Hope that helps,
--Ben
--
http://mail.python.org/mailman/listinfo/python-list
