range() vs xrange() Python2|3 issues for performance

2011-08-02 Thread harrismh777
The following is intended as a helpful small extension to the xrange() 
range() discussion brought up this past weekend by Billy Mays...


With Python2 you basically have two ways to get a range of numbers:
   range() , which returns a list,  and
  xrange() , which returns an iterator.

With Python3 you must use range(), which produces an iterator; while 
xrange() does not exist at all (at least not on 3.2).


   I have been doing some research in number theory related to Mersenne 
Primes and perfect numbers (perfects: those integers whose proper 
divisors, excluding the number itself, sum to the number)... the first 
few of those being 6, 28, 496, 8128, 33550336, etc.


   Never mind, but you know... are there infinitely many of them? 
... and of course, are there any "odd" perfect numbers... well, not under 
10^1500. I digress, sorry ...


   This brought up the whole range() xrange() thing for me again 
because Python in any case is just not fast enough (no brag, just fact). 
So my perfect number stuff is written in C, for the moment. But, what 
about the differences in performance (supposing we were to stay in 
Python for small numbers) between xrange() vs range() [on Python2] 
versus range() [on Python3]?   I have put my code snips below, with some 
explanation below that...  these will run on either Python2 or 
Python3... except that if you substitute xrange() for range() for 
Python2  they will throw an exception on Python3... doh.


So, here is PyPerfectNumbers.py 

def PNums(q):
    for i in range(2, q):
        m = 1
        s = 0
        while m <= i/2:
            if not i%m:
                s += m
            m += 1
        if i == s:
            print(i)
    return 0

def perf(n):
    sum = 0
    for i in range(1, n):
        if n % i == 0:
            sum += i
    return sum == n

fperf = lambda n: n == sum(i for i in range(1, n) if n % i == 0)

-/end---

PNums(8200) will crunch out the perfect numbers below 8200.

perf(33550336) will test to see if 33550336 is a perfect number

fperf(33550336) is the lambda equivalent of perf()


   These are coded with range().  The interesting thing to note is that 
xrange() on Python2 runs "considerably" faster than the same code using 
range() on Python3. For large perfect numbers (above 8128) the 
performance difference for perf() is orders of magnitude. Actually, 
range() on Python2 runs somewhat slower than xrange() on Python2, but 
things are much worse on Python3.
   This is something I never thought to test before Billy's question, 
because I had already decided to work in C for most of my integer 
stuff... like perfects. But now that it sparked my interest, I'm 
wondering if there might be some focus placed on range() performance in 
Python3 for the future, PEP?


kind regards,
--
m harris

FSF  ...free as in freedom/
http://webpages.charter.net/harrismh777/gnulinux/gnulinux.htm
--
http://mail.python.org/mailman/listinfo/python-list


Re: Notifications when process is killed

2011-08-02 Thread AndDM
On Aug 1, 5:39 pm, Andrea Di Mario  wrote:
> Thanks Thomas, it is what i'm looking for.
>
> Regards
>
> --
> Andrea Di Mario

Hi, I have a little problem; here is the code I use:

def receive_signal(signum, stack):
    logging.info('Received: %s' % signum)
    reactor.stop()

signal.signal(signal.SIGTERM, receive_signal)
signal.signal(signal.SIGHUP, receive_signal)
signal.signal(signal.SIGINT, receive_signal)

The function works for SIGHUP and SIGINT, but it doesn't work for
SIGTERM. I've tried with a simple killall and with the -15 option.
Do you have any ideas?

Thanks, regards.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: range() vs xrange() Python2|3 issues for performance

2011-08-02 Thread garabik-news-2005-05
harrismh777  wrote:
 these will run on either Python2 or 
> Python3... except that if you substitute xrange() for range() for 
> Python2  they will throw an exception on Python3... doh.

if 'xrange' not in dir(__builtins__):
    xrange = range

at the beginning of your program will fix that.

-- 
 ---
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__garabik @ kassiopeia.juls.savba.sk |
 ---
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python reading file memory cost

2011-08-02 Thread Peter Otten
Chris Rebert wrote:

>> The running result was that read a 500M file consume almost 2GB RAM, I
>> cannot figure it out, somebody help!
> 
> If you could store the floats themselves, rather than their string
> representations, that would be more space-efficient. You could then
> also use the `array` module, which is more space-efficient than lists
> (http://docs.python.org/library/array.html ). Numpy would also be
> worth investigating since multidimensional arrays are involved.
> 
> The next obvious question would then be: do you /really/ need /all/ of
> the data in memory at once?

This is what you (OP) should think about really hard before resorting to the 
optimizations mentioned above. Perhaps you can explain what you are doing 
with the data once you've loaded it into memory?

> Also, just so you're aware:
> http://docs.python.org/library/sys.html#sys.getsizeof

To give you an idea how memory usage explodes:

>>> line = "1.23 4.56 7.89 0.12\n"
>>> len(line) # size in the file
20
>>> sys.getsizeof(line)
60
>>> formatted = ["%2.6E" % float(x) for x in line.split()]
>>> sys.getsizeof(formatted) + sum(sys.getsizeof(s) for s in formatted)
312


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Notifications when process is killed

2011-08-02 Thread Thomas Rachel

Am 02.08.2011 09:30 schrieb AndDM:

> The function works for SIGHUP and SIGINT, but it doesn't work for
> SIGTERM. I've tried with a simple killall and with the -15 option.
> Do you have any ideas?


SIGTERM cannot be caught - it kills the process the hard way.

HTH,


Thomas
--
http://mail.python.org/mailman/listinfo/python-list


Re: Notifications when process is killed

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 8:30 AM, AndDM  wrote:
>        def receive_signal(signum, stack):
>                logging.info('Received: %s' % signum)
>                reactor.stop()
>        signal.signal(signal.SIGTERM, receive_signal)
>        signal.signal(signal.SIGHUP, receive_signal)
>        signal.signal(signal.SIGINT, receive_signal)
>
> The function works for SIGHUP and SIGINT, but it doesn't work for
> SIGTERM. I've tried with simple killall and with -15 option.
> Have you some ideas?

You won't be able to catch SIGTERM, as Thomas said, but if you need to
know what caused a process to end, the best way is to have code in the
parent process to catch SIGCHLD. When the child ends, for any reason,
its parent is sent SIGCHLD with some parameters, including the signal
number that caused the termination; you can then log anything you
want.
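
A minimal sketch of that pattern (untested; assumes a Unix system, and
uses a blocking os.waitpid() in the parent rather than an asynchronous
SIGCHLD handler, but the status decoding is the same):

import os, sys, time

pid = os.fork()
if pid == 0:
    # child: do the real work here
    time.sleep(60)
    sys.exit(0)

# parent: wait for the child and report how it ended
_, status = os.waitpid(pid, 0)
if os.WIFSIGNALED(status):
    print("child killed by signal %d" % os.WTERMSIG(status))
elif os.WIFEXITED(status):
    print("child exited with status %d" % os.WEXITSTATUS(status))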

Chris Angelico
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: range() vs xrange() Python2|3 issues for performance

2011-08-02 Thread Peter Otten
harrismh777 wrote:

> The following is intended as a helpful small extension to the xrange()
> range() discussion brought up this past weekend by Billy Mays...
> 
> With Python2 you basically have two ways to get a range of numbers:
> range() , which returns a list,  and
>xrange() , which returns an iterator.
> 
> With Python3 you must use range(), which produces an iterator; while
> xrange() does not exist at all (at least not on 3.2).
> 
> I have been doing some research in number theory related to Mersenne
> Primes and perfect numbers (perfects, those integers whose primary
> divisors when summed result in the number, not including the number
> itself)... the first few of those being  6, 28, 496, 8128, 33550336, etc
> 
> Never mind, but you know... are there an infinite number of them?
> ... and of course, are there any "odd" perfect numbers... well not under
> 10^1500   I digress, sorry ...
> 
> This brought up the whole range() xrange() thing for me again
> because Python in any case is just not fast enough (no brag, just fact).
> So my perfect number stuff is written in C, for the moment. But, what
> about the differences in performance (supposing we were to stay in
> Python for small numbers) between xrange() vs range() [on Python2]
> versus range() [on Python3]?   I have put my code snips below, with some
> explanation below that...  these will run on either Python2 or
> Python3... except that if you substitute xrange() for range() for
> Python2  they will throw an exception on Python3... doh.

try:
    range = xrange
except NameError:
    pass
> 
> So, here is PyPerfectNumbers.py 
> 
> def PNums(q):
>     for i in range(2, q):
>         m = 1
>         s = 0
>         while m <= i/2:

i/2 returns a float in Python 3; you should use i//2 for consistency.

>             if not i%m:
>                 s += m
>             m += 1
>         if i == s:
>             print(i)
>     return 0
> 
> def perf(n):
>     sum = 0
>     for i in range(1, n):
>         if n % i == 0:
>             sum += i
>     return sum == n
> 
> fperf = lambda n: n == sum(i for i in range(1, n) if n % i == 0)
> 
> -/end---
> 
> PNums(8200) will crunch out the perfect numbers below 8200.
> 
> perf(33550336) will test to see if 33550336 is a perfect number
> 
> fperf(33550336) is the lambda equivalent of perf()
> 
> 
> These are coded with range().  The interesting thing to note is that
> xrange() on Python2 runs "considerably" faster than the same code using
> range() on Python3. For large perfect numbers (above 8128) the
> performance difference for perf() is orders of magnitude. 

Python 3's range() is indeed slower, but not orders of magnitude:

$ python3.2 -m timeit -s"r = range(33550336)" "for i in r: pass"
10 loops, best of 3: 1.88 sec per loop
$ python2.7 -m timeit -s"r = xrange(33550336)" "for i in r: pass"
10 loops, best of 3: 1.62 sec per loop

$ cat tmp.py
try:
    range = xrange
except NameError:
    pass

def fperf(n):
    return n == sum(i for i in range(1, n) if not n % i)

if __name__ == "__main__":
    print(fperf(33550336))
$ time python2.7 tmp.py
True

real    0m6.481s
user    0m6.100s
sys     0m0.000s
$ time python3.2 tmp.py
True

real    0m7.925s
user    0m7.520s
sys     0m0.040s

I don't know what's causing the slowdown, maybe the int/long unification is 
to blame.

> Actually,
> range() on Python2 runs somewhat slower than xrange() on Python2, but
> things are much worse on Python3.
> This is something I never thought to test before Billy's question,
> because I had already decided to work in C for most of my integer
> stuff... like perfects. But now that it sparked my interest, I'm
> wondering if there might be some focus placed on range() performance in
> Python3 for the future, PEP?


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: range() vs xrange() Python2|3 issues for performance

2011-08-02 Thread Stefan Behnel

harrismh777, 02.08.2011 09:12:

> With Python2 you basically have two ways to get a range of numbers:
>    range() , which returns a list, and
>    xrange() , which returns an iterator.
>
> With Python3 you must use range(), which produces an iterator; while
> xrange() does not exist at all (at least not on 3.2).


That's a good thing. There should be one - and preferably only one - 
obvious way to do it.


iterable: range(N)
list: list(range(N))
tuple: tuple(range(N))
set: set(range(N))
...

Less special cases in the language.



> This brought up the whole range() xrange() thing for me again because
> Python in any case is just not fast enough (no brag, just fact). So my
> perfect number stuff is written in C, for the moment.


Or use Cython or PyPy, both of which are simpler ways to get your code up 
to speed.




> The interesting thing to note is that
> xrange() on Python2 runs "considerably" faster than the same code using
> range() on Python3.


Are you sure that's due to Py3 range() vs. Py2 xrange()? Py3 has a 
different implementation for integers (which is still being optimised, 
BTW). That's much more likely to make a difference here.


What version of Py3 were you using? If you used the latest, maybe even the 
latest hg version, you will notice that that's substantially faster for 
integers than, e.g. 3.1.x.


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Re: range() vs xrange() Python2|3 issues for performance

2011-08-02 Thread Thomas Rachel

Am 02.08.2011 09:12 schrieb harrismh777:

> The following is intended as a helpful small extension to the xrange()
> range() discussion brought up this past weekend by Billy Mays...
>
> With Python2 you basically have two ways to get a range of numbers:
>    range() , which returns a list, and
>    xrange() , which returns an iterator.

No. An iterable. As already said, iterators are the ones stopping 
forever when the end is reached.


Generally, iterables are allowed to iterate multiple times.

xrange() resp. range() yield iterables.
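
A quick interactive check (Python 2) makes the distinction visible: an
xrange object can be iterated over repeatedly, while the iterator you
get from iter() is a separate, one-shot object:

>>> r = xrange(3)
>>> iter(r) is r        # False: r is an iterable, not an iterator
False
>>> it = iter(r)
>>> iter(it) is it      # True: an iterator's __iter__ returns itself
True
>>> list(r), list(r)    # the iterable can be consumed again and again
([0, 1, 2], [0, 1, 2])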



> The interesting thing to note is that

> xrange() on Python2 runs "considerably" faster than the same code using
> range() on Python3. For large perfect numbers (above 8128) the
> performance difference for perf() is orders of magnitude. Actually,
> range() on Python2 runs somewhat slower than xrange() on Python2, but
> things are much worse on Python3.


That sounds strange at the first glance.



> This is something I never thought to test before Billy's question,
> because I had already decided to work in C for most of my integer
> stuff... like perfects. But now that it sparked my interest, I'm
> wondering if there might be some focus placed on range() performance in
> Python3 for the future, PEP?


I'm sure it is a matter of implementation. You cannot define by PEP what 
performance the implementations should have. Maybe range() in 3 is 
defined differently to xrange() in 2. I'm not so familiar with 3 to 
definitely confirm or decline that. The behaviour, though, seems to be 
the same, but range3 (as I call it now) has some more methods than 
xrange, like the rich comparison ones. Indexing works with all of them.



Thomas
--
http://mail.python.org/mailman/listinfo/python-list


Hardlink sub-directories and files

2011-08-02 Thread loial
I am trying to hardlink all files in a directory structure using
os.link.

This works fine for files, but the directory also contains
sub-directories (which themselves contain files and sub-directories).
However, I do not think it is possible to hard-link directories?

So presumably I would need to do a mkdir for each sub-directory
encountered? Or is there an easier way to hard-link everything in a
directory structure?

The requirement is for hard links, not symbolic links.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: range() vs xrange() Python2|3 issues for performance

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 10:20 AM, Stefan Behnel  wrote:
> What version of Py3 were you using? If you used the latest, maybe even the
> latest hg version, you will notice that that's substantially faster for
> integers than, e.g. 3.1.x.
>

I just tried this out, using a slightly modified script but the same guts:

-
import sys
print(sys.version)

import time
def PNums(q):
    start = time.clock()
    for i in range(2, q):
        m = 1
        s = 0
        while m <= i/2:
            if not i%m:
                s += m
            m += 1
        if i == s:
            print(i)
    print("Time: %f" % (time.clock() - start))
    return

# PNums(33550337)
PNums(1)

-

On my dual-core Windows laptop (it always saturates one core with
this, leaving the other for the rest of the system), the results show
no statistically significant difference between xrange and range in
Python 2, but notably slower overall performance in Python 3:

2.4.5 (#1, Dec 15 2009, 16:41:19)
[GCC 4.1.1]
Time: 14.474343
Using xrange: 14.415412

2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)]
Time: 8.990142
Using xrange: 9.015566

3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)]
Time: 24.401461

Since I don't have a build environment in which to test the latest
from hg, I switched to a Linux box. The following timings therefore
cannot be compared with the above ones.

3.3a0 (default:b95096303ed2, Jun  2 2011, 20:43:01)
[GCC 4.4.5]
Time: 34.39

2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5]
Time: 13.73
Using xrange: 13.67

My 3.3a0 is freshly pulled from hg, although I'm not sure if
sys.version has been correctly built. Once again, 2.6.6 shows no
significant difference between range and xrange, but 3 is noticeably
slower than 2. (I did several runs, but the variance between the runs
wasn't significant.)

Of course, this is all fairly moot; if you're doing really heavy
number crunching, CPython isn't the platform to use.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hardlink sub-directories and files

2011-08-02 Thread Peter Otten
loial wrote:

> I am trying to hardlink all files in a directory structure using
> os.link.
> 
> This works fine for files, but the directory also contains
> sub-directories (which themselves contain files and sub-directories).
> However, I do not think it is possible to hard-link directories?
>
> So presumably I would need to do a mkdir for each sub-directory
> encountered? Or is there an easier way to hard-link everything in a
> directory structure?
> 
> The requirement is for hard links, not symbolic links

You cannot make hardlinks for directories; that's a restriction imposed by 
the operating system.

Look to shutil.copytree() for a template of what you are trying to do; you 
might even monkeypatch it

# untested
copy2 = shutil.copy2
shutil.copy2 = os.link
try:
    shutil.copytree(source, dest)
finally:
    shutil.copy2 = copy2

if you are OK with a quick-and-dirty solution.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hardlink sub-directories and files

2011-08-02 Thread Thomas Jollans
On 02/08/11 11:32, loial wrote:
> I am trying to hardlink all files in a directory structure using
> os.link.
> 
> This works fine for files, but the directory also contains
> sub-directories (which themselves contain files and sub-directories).
> However, I do not think it is possible to hard-link directories?
>
> So presumably I would need to do a mkdir for each sub-directory
> encountered? Or is there an easier way to hard-link everything in a
> directory structure?
> 
> The requirement is for hard links, not symbolic links
> 

Yes, you have to mkdir everything. However, there is an easier way:

subprocess.Popen(['cp','-Rl','target','link'])

This is assuming that you're only supporting Unices with a working cp
program, but as you're using hard links, that's quite a safe bet, I
should think.
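
If you'd rather stay in pure Python, a minimal sketch of the
mkdir-plus-link walk (untested; assumes source and destination are on
the same filesystem, since hard links cannot span filesystems):

import os

def linktree(src, dst):
    os.mkdir(dst)                    # directories must be re-created
    for name in os.listdir(src):
        s = os.path.join(src, name)
        d = os.path.join(dst, name)
        if os.path.isdir(s):
            linktree(s, d)           # recurse into sub-directories
        else:
            os.link(s, d)            # files are hard-linked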

- Thomas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: range() vs xrange() Python2|3 issues for performance

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 10:05 AM, Peter Otten <[email protected]> wrote:
> i/2 returns a float in Python 3; you should use i//2 for consistency.
>

And I forgot to make this change before doing my tests. Redoing the
Python 3 ones with // quite drastically changes things!

3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)]
Time: 17.917331

That's a lot closer to the 10-14 seconds that Python 2 was doing, but
still somewhat worse. Of course, no surprises that 2.6 is faster than
2.4.

But now, here's a fairly small change that makes a LOT of difference,
and reveals the value of range/xrange. Notice that m is just being a
classic iteration counter (it starts at 1, increments to a top
limit)...

-
for m in xrange(1, i//2+1):
    if not i%m:
        s += m
-

This brings 2.6.5 down to 5.359383 seconds, or 4.703364 with xrange.
(I'm now changing two references from range to xrange.) Meanwhile, 3.2
has come down to 7.096237 seconds, and 2.4.5 to 8.61.

Comparing the latest 3.3a0 and 2.6.6 on the other box shows that
there's still a difference. Both of them improve with range instead of
manual incrementing, but 2.6.6 takes 6.95 seconds and 3.3a0 takes
13.83 (down from 13.73 and 34.39 in the previous test).

Conclusion: Python 3 is notably slower comparing floating point and
integer than Python 2 is comparing int and int. No surprises there!
But get everything working with integers, and use range() instead of
manually incrementing a variable, and things come much more even.

But as I said, CPython isn't the ideal language for heavy number
crunching. On the same Windows box, a Pike program using the same
algorithm took only 2.055 seconds. And a C program took 0.328 seconds.
But if you have other reasons for keeping it in Python, do keep it to
integers!

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Deeply nested dictionaries - should I look into a database or am I just doing it wrong?

2011-08-02 Thread BlueBird
I love named tuples, they rock for this kind of task: storing
complicated structure in a python compatible way, without too much
hassle.

And as far as load/save on disk is concerned, I simply use regular
python structure with safe eval [1]. I get all the flexibility that I
need for the file format, without the annoyance of writing a
conversion layer.

[1]: http://code.activestate.com/recipes/364469-safe-eval/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Complex sort on big files

2011-08-02 Thread Alistair Miles
Hi Dan,

Thanks for the reply.

On Mon, Aug 1, 2011 at 5:45 PM, Dan Stromberg  wrote:
>
> Python 2.x, or Python 3.x?

Currently Python 2.x.

> What are the types of your sort keys?

Both numbers and strings.

> If you're on 3.x and the key you need reversed is numeric, you can negate
> the key.

I did wonder about that. Would that not be doable also in Python 2.7,
using sorted(key=...)?

> If you're on 2.x, you can use an object with a __cmp__ method to compare
> objects however you require.

OK, right.

Looking at the HowTo/Sorting again [1] and the bit about cmp_to_key,
could you also achieve the same effect by returning a key with custom
implementations of rich comparison functions?

> You probably should timsort the chunks (which is the standard list_.sort() -
> it's a very good in-memory sort), and then merge them afterward using the
> merge step of merge sort.

Yes, that's what I understood by the activestate recipe [2].

So I guess my question boils down to, how do you do the merge step for
a complex sort? (Assuming each chunk had been completely sorted
first.)

Maybe the answer is also to construct a key with custom implementation
of rich comparisons?
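
For what it's worth, one sketch of such a key object (illustrative
only; it assumes two-field records, sorted ascending on the first field
and descending on the second):

class Key(object):
    def __init__(self, record):
        self.a, self.b = record[0], record[1]
    def __lt__(self, other):
        # ascending on a; swapping the b's reverses that comparison
        return (self.a, other.b) < (other.a, self.b)
    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

chunk.sort(key=Key) would then sort each chunk, and pushing
(Key(record), record) pairs onto a heapq gives the same ordering in the
merge step.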

Now I'm also wondering about the best way to sort each chunk. The
examples in [1] of complex sorts suggest the best way to do it is to
first sort by the secondary key, then sort by the primary key, relying
on the stability of the sort to get the desired outcome. But would it
not be better to call sorted() once, supplying a custom key function?

(As an aside, at the end of the section in [1] on Sort Stability and
Complex sorts, it says "The Timsort algorithm used in Python does
multiple sorts efficiently because it can take advantage of any
ordering already present in a dataset." - I get that that's true, but
I don't see how that's relevant to this strategy for doing complex
sorts. I.e., if you sort first by the secondary key, you don't get any
ordering that's helpful when you subsequently sort by the primary key.
...?)

(Sorry, another side question, I'm guessing reading a chunk of data
into a list and using Timsort, i.e., calling list.sort() or
sorted(mylist), is quicker than using bisect to keep the chunk sorted
as you build it?)

> heapq's not unreasonable for the merging, but I think it's more common to
> use a short list.

Do you mean a regular Python list, and calling min()?

> I have a bunch of Python sorts at
> http://stromberg.dnsalias.org/svn/sorts/compare/trunk/ - if you find your
> need is specialized (EG, 3.x sorting by a secondary key that's a string, in
> reverse), you could adapt one of these to do what you need.

Thanks. Re 3.x sorting by a secondary key that's a string, in reverse,
which one were you thinking of in particular?

> heapq is not bad, but if you find you need a logn datastructure, you might
> check out http://stromberg.dnsalias.org/~dstromberg/treap/ - a treap is also
> logn, but has a very good amortized cost because it sacrifices keeping
> things perfectly balanced (and rebalancing, and rebalancing...) to gain
> speed.  But still, you might be better off with a short list and min.

Thanks, that's really helpful.

Cheers,

Alistair

[1] http://wiki.python.org/moin/HowTo/Sorting/
[2] 
http://code.activestate.com/recipes/576755-sorting-big-files-the-python-26-way/

> On Mon, Aug 1, 2011 at 8:33 AM, aliman  wrote:
>>
>> Hi all,
>>
>> Apologies I'm sure this has been asked many times, but I'm trying to
>> figure out the most efficient way to do a complex sort on very large
>> files.
>>
>> I've read the recipe at [1] and understand that the way to sort a
>> large file is to break it into chunks, sort each chunk and write
>> sorted chunks to disk, then use heapq.merge to combine the chunks as
>> you read them.
>>
>> What I'm having trouble figuring out is what to do when I want to sort
>> by one key ascending then another key descending (a "complex sort").
>>
>> I understand that sorts are stable, so I could just repeat the whole
>> sort process once for each key in turn, but that would involve going
>> to and from disk once for each step in the sort, and I'm wondering if
>> there is a better way.
>>
>> I also thought you could apply the complex sort to each chunk before
>> writing it to disk, so each chunk was completely sorted, but then the
>> heapq.merge wouldn't work properly, because afaik you can only give it
>> one key.
>>
>> Any help much appreciated (I may well be missing something glaringly
>> obvious).
>>
>> Cheers,
>>
>> Alistair
>>
>> [1]
>> http://code.activestate.com/recipes/576755-sorting-big-files-the-python-26-way/
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
>



-- 
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health 
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: [email protected]
Tel: +44 (0)1865 287669
-- 
http://mail.python.org/mailman/listinfo/python-list

Notifications when process is killed

2011-08-02 Thread Andrea Di Mario
> You won't be able to catch SIGTERM, as Thomas said, but if you need to
> know what caused a process to end, the best way is to have code in the
> parent process to catch SIGCHLD. When the child ends, for any reason,
> its parent is sent SIGCHLD with some parameters, including the signal
> number that caused the termination; you can then log anything you
> want.

Hi, I understand. I've read that SIGKILL can't be caught, but nothing
about SIGTERM.
If I use SIGCHLD, will I have difficulty when the parent receives a
SIGTERM, or not?

Thanks, regards.

--
Andrea Di Mario
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Notifications when process is killed

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 11:36 AM, Andrea Di Mario  wrote:
> If I use SIGCHLD, will I have difficulty when the parent receives a SIGTERM, or not?

What you would do is create two processes. Set up your signal
handlers, then fork; in the parent, just watch for the child's death -
in the child, do all your work. When the parent receives SIGCHLD, it
can ascertain the cause of death.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: python reading file memory cost

2011-08-02 Thread 张彤
Thanks Peter! Your explanation is great!
And one more question:
Why it is still keeping the memory even when I del the large array in
interactive python mode?

-Original Message-
From: Peter Otten [mailto:[email protected]] 
Sent: Tuesday, August 02, 2011 4:26 PM
To: [email protected]
Subject: Re: python reading file memory cost

Chris Rebert wrote:

>> The running result was that read a 500M file consume almost 2GB RAM, 
>> I cannot figure it out, somebody help!
> 
> If you could store the floats themselves, rather than their string 
> representations, that would be more space-efficient. You could then 
> also use the `array` module, which is more space-efficient than lists 
> (http://docs.python.org/library/array.html ). Numpy would also be 
> worth investigating since multidimensional arrays are involved.
> 
> The next obvious question would then be: do you /really/ need /all/ of 
> the data in memory at once?

This is what you (OP) should think about really hard before resorting to the
optimizations mentioned above. Perhaps you can explain what you are doing
with the data once you've loaded it into memory?

> Also, just so you're aware:
> http://docs.python.org/library/sys.html#sys.getsizeof

To give you an idea how memory usage explodes:

>>> line = "1.23 4.56 7.89 0.12\n"
>>> len(line) # size in the file
20
>>> sys.getsizeof(line)
60
>>> formatted = ["%2.6E" % float(x) for x in line.split()]
>>> sys.getsizeof(formatted) + sum(sys.getsizeof(s) for s in formatted)
312




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python reading file memory cost

2011-08-02 Thread Thomas Jollans
On 02/08/11 13:00, 张彤 wrote:
> Thanks Peter! Your explanation is great!
> And one more question:
> Why does it still keep the memory even when I del the large array in
> interactive Python mode?

This is an optimisation of the way the Python interpreter allocates
memory: it holds on to memory it's not using any more for a while so it
can be easily re-used for new objects --- this is more efficient than
giving the memory back to the operating system only to request it again
shortly afterwards.

> 
> -Original Message-
> From: Peter Otten [mailto:[email protected]] 
> Sent: Tuesday, August 02, 2011 4:26 PM
> To: [email protected]
> Subject: Re: python reading file memory cost
> 
> Chris Rebert wrote:
> 
>>> The running result was that read a 500M file consume almost 2GB RAM, 
>>> I cannot figure it out, somebody help!
>>
>> If you could store the floats themselves, rather than their string 
>> representations, that would be more space-efficient. You could then 
>> also use the `array` module, which is more space-efficient than lists 
>> (http://docs.python.org/library/array.html ). Numpy would also be 
>> worth investigating since multidimensional arrays are involved.
>>
>> The next obvious question would then be: do you /really/ need /all/ of 
>> the data in memory at once?
> 
> This is what you (OP) should think about really hard before resorting to the
> optimizations mentioned above. Perhaps you can explain what you are doing
> with the data once you've loaded it into memory?
> 
>> Also, just so you're aware:
>> http://docs.python.org/library/sys.html#sys.getsizeof
> 
> To give you an idea how memory usage explodes:
> 
 line = "1.23 4.56 7.89 0.12\n"
 len(line) # size in the file
> 20
 sys.getsizeof(line)
> 60
 formatted = ["%2.6E" % float(x) for x in line.split()]
 sys.getsizeof(formatted) + sum(sys.getsizeof(s) for s in formatted)
> 312
> 
> 
> 
> 

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to solve it?

2011-08-02 Thread TheSaint
守株待兔 wrote:

> from matplotlib.matlab import *
maybe you didn't install it

http://matplotlib.sourceforge.net/

BTW you haven't mention what version of python you're running.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Notifications when process is killed

2011-08-02 Thread Kushal Kumaran
On Tue, Aug 2, 2011 at 1:56 PM, Thomas Rachel

wrote:
> Am 02.08.2011 09:30 schrieb AndDM:
>
>> The function works for SIGHUP and SIGINT, but it doesn't work for
>> SIGTERM. I've tried with a simple killall and with the -15 option.
>> Do you have any ideas?
>
> SIGTERM cannot be caught - it kills the process the hard way.
>

You must mean SIGKILL.  There's nothing special about SIGTERM, except
that it's the default for the kill command.

-- 
regards,
kushal
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Notifications when process is killed

2011-08-02 Thread Thomas Rachel

Am 02.08.2011 10:26 schrieb Thomas Rachel:

> Am 02.08.2011 09:30 schrieb AndDM:
>
>> The function works for SIGHUP and SIGINT, but it doesn't work for
>> SIGTERM. I've tried with a simple killall and with the -15 option.
>> Do you have any ideas?
>
> SIGTERM cannot be caught - it kills the process the hard way.


Thank you for pointing out (via email) that this is wrong:

SIGTERM is a normal signal which can be caught in the normal way.
SIGKILL is (besides SIGSTOP) the one which cannot be caught, blocked or 
ignored.



Thomas
--
http://mail.python.org/mailman/listinfo/python-list


Please code review.

2011-08-02 Thread Karim


Hello,

I need a generator to create the cell names in a spreadsheet (using 
pyuno) document, to assign values to the correct cells. The following 
code does this, but do you have any optimizations for it, for instance 
to get the alphabetic chars instead of hard-coding them?

Cheers
karim

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def _xrange_cellnames(rows, cols):
...     """Internal iterator function to compute excell table cellnames."""
...     cellnames = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
...     for row in xrange(1, rows+1):
...         for char in cellnames.replace('', ' ').split()[:cols]:
...             yield char + str(row)
...
>>> list(_xrange_cellnames(rows=3, cols=4))
['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'D2', 'A3', 'B3', 'C3', 'D3']


--
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Paul Kölle

Am 02.08.2011 13:45, schrieb Karim:

> Hello,
>
> I need a generator to create the cell names in a spreadsheet (using
> pyuno) document, to assign values to the correct cells. The following
> code does this, but do you have any optimizations for it, for instance
> to get the alphabetic chars instead of hard-coding them?

you can use:
import string
cellnames = string.ascii_uppercase

not sure why you need the .replace().split() stuff...


def _xrange_cellnames(rows, cols):
    cellnames = string.ascii_uppercase
    for row in xrange(1, rows+1):
        for char in cellnames[:cols]:
            yield char + str(row)

cheers
 Paul




Cheers
karim

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> def _xrange_cellnames(rows, cols):
... """Internal iterator function to compute excell table cellnames."""
... cellnames = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
... for row in xrange(1, rows+1):
... for char in cellnames.replace('', ' ').split()[:cols]:
... yield char + str(row)
...
 >>> list( _xrange_cellnames(rows=3,cols=4))
['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'D2', 'A3', 'B3', 'C3', 'D3']





--
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 12:45 PM, Karim  wrote:
> ...         for char in cellnames.replace('', ' ').split()[:cols]:

for char in cellnames[:cols]:

Strings are iterable over their characters. Alternatively, you could
use chr and ord, but it's probably cleaner and simpler to have the
string there. It also automatically and implicitly caps your columns
at 26. On the other hand, if you want to support more than 26 columns,
you may want to make your own generator function to yield 'A', 'B',...
'Z', 'AA', 'AB', etc.
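
One possible sketch of that (a hypothetical colname() helper, not from
the original post): convert a zero-based column index into a
spreadsheet-style name, which extends naturally past 26 columns:

from string import ascii_uppercase

def colname(n):
    # 0 -> 'A', 25 -> 'Z', 26 -> 'AA', 27 -> 'AB', ... (bijective base 26)
    name = ''
    while True:
        n, rem = divmod(n, 26)
        name = ascii_uppercase[rem] + name
        if n == 0:
            return name
        n -= 1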

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: where the function has problem? n = 900 is OK , but n = 1000 is ERROR

2011-08-02 Thread Billy Mays

On 08/01/2011 06:06 PM, Steven D'Aprano wrote:

> Does your definition of "fixed" mean "gives wrong results for n >= 4"?
>
> >>> fibo(4) == 3
> False

Well, I don't know if you're trolling or just dumb:

http://en.wikipedia.org/wiki/Fibonacci_number

In [2]: for i in range(10):
   ...:     print fibo(i)
   ...:
   ...:
0.0
1.0
1.0
2.0
3.0
5.0
8.0
13.0
21.0
34.0


--
Bill
--
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Martin Gracik
On Tue, Aug 2, 2011 at 1:45 PM, Karim  wrote:

>
> Hello,
>
> I need a generator to create the cell names in a spreadsheet (using
> pyuno) document, to assign values to the correct cells. The following
> code does this, but do you have any optimizations for it, for instance
> to get the alphabetic chars instead of hard-coding them?
>
> Cheers
> karim
>
> Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
> [GCC 4.5.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> def _xrange_cellnames(rows, cols):
> ... """Internal iterator function to compute excell table cellnames."""
> ... cellnames = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
> ... for row in xrange(1, rows+1):
> ... for char in cellnames.replace('', ' ').split()[:cols]:
> ... yield char + str(row)
> ...
> >>> list( _xrange_cellnames(rows=3,cols=4))
> ['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'D2', 'A3', 'B3', 'C3', 'D3']
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

You could use something like this:

Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from itertools import imap, product
>>> import string
>>>
>>> def get_cellnames(rows, cols):
...     return imap(str().join, product(string.ascii_uppercase[:cols],
...                                     imap(str, range(1, rows + 1))))
...
>>> print list(get_cellnames(rows=3, cols=4))
['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3', 'D1', 'D2', 'D3']
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Peter Otten
Karim wrote:

> I need a generator to create the cell names in a spreadsheet (using
> pyuno) document, to assign values to the correct cells.

Isn't there a way to use a (row, column) tuple directly? If so I'd prefer 
that. Also, there used to be an alternative format to address a spreadsheet 
cell with something like "R1C2".

> The following code does this but do you have some
> optimizations
> on it, for instance to get the alphabetic chars instead of hard-coding it.
> 
> Cheers
> karim
> 
> Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
> [GCC 4.5.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> def _xrange_cellnames(rows, cols):
> ...     """Internal iterator function to compute excell table cellnames."""
> ...     cellnames = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
> ...     for row in xrange(1, rows+1):
> ...         for char in cellnames.replace('', ' ').split()[:cols]:

That is interesting ;) But for maximum clarity use

for char in cellnames[:cols]:

instead.

> ... yield char + str(row)
> ...
>  >>> list( _xrange_cellnames(rows=3,cols=4))
> ['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'D2', 'A3', 'B3', 'C3', 'D3']

Here's my (untested) attempt to handle columns beyond "Z":

from itertools import chain, count, imap, islice, product
from string import ascii_uppercase

def columnnames():
    alpha = (ascii_uppercase,)
    return imap("".join, chain.from_iterable(
        product(*alpha*i) for i in count(1)))

def cellnames(columns, rows):
    for row in xrange(1, rows+1):
        for column in islice(columnnames(), columns):
            yield column + str(row)

if __name__ == "__main__":
    import sys
    print list(cellnames(*map(int, sys.argv[1:])))

I think the subject has come up before; goo^h^h^h the search engine of your 
choice is your friend.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hardlink sub-directories and files

2011-08-02 Thread Tim Chase

On 08/02/2011 04:32 AM, loial wrote:

> I am trying to hardlink all files in a directory structure using
> os.link.
> Or is there an easier way to hard-link everything in a directory
> structure?
>
> The requirement is for hard links, not symbolic links

While Peter & Thomas gave good answers, also be aware that 
hard-links can't cross mount-points (an OS limitation).  So if 
you have something mounted under the directory you're trying to 
hard-link-copy, attempting to create a hard-link will fail for 
things within that mount.


-tkc



--
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Karim


Thanks Paul,

I never used the string module.
In fact, cellnames.split('') raises an error with an empty separator. 
That's why I use the intermediate replace(), which accepts it.


Cheers
Karim

On 08/02/2011 02:04 PM, Paul Kölle wrote:

Am 02.08.2011 13:45, schrieb Karim:


Hello,

I need a generator to create the cellname in a excell (using pyuno)
document to assign value to
the correct cell. The following code does this but do you have some
optimizations
on it, for instance to get the alphabetic chars instead of 
hard-coding it.

you can use:
import string
cellnames = string.ascii_uppercase

not sure why you need the .replace().split() stuff...


def _xrange_cellnames(rows, cols):
    cellnames = string.ascii_uppercase
    for row in xrange(1, rows+1):
        for char in cellnames[:cols]:
            yield char + str(row)

cheers
 Paul




Cheers
karim

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def _xrange_cellnames(rows, cols):
... """Internal iterator function to compute excell table cellnames."""
... cellnames = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
... for row in xrange(1, rows+1):
... for char in cellnames.replace('', ' ').split()[:cols]:
... yield char + str(row)
...
>>> list( _xrange_cellnames(rows=3,cols=4))
['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'D2', 'A3', 'B3', 'C3', 'D3']







--
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Karim


Thanks Chris!

It seems I am blind, I should have seen it...
In fact I started with the (imaginary) need to use enumerate() to get 
some indices, but ended up with simpler code. Indeed, yours is simpler.

For the double-char extension, I will see if I need it in the future.

Cheers
Karim

On 08/02/2011 02:07 PM, Chris Angelico wrote:

On Tue, Aug 2, 2011 at 12:45 PM, Karim  wrote:

... for char in cellnames.replace('', ' ').split()[:cols]:

for char in cellnames[:cols]:

Strings are iterable over their characters. Alternatively, you could
use chr and ord, but it's probably cleaner and simpler to have the
string there. It also automatically and implicitly caps your columns
at 26. On the other hand, if you want to support more than 26 columns,
you may want to make your own generator function to yield 'A', 'B',...
'Z', 'AA', 'AB', etc.

ChrisA


--
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Karim

On 08/02/2011 02:27 PM, Peter Otten wrote:

> Karim wrote:
>
>> I need a generator to create the cell names in a spreadsheet (using
>> pyuno) document, to assign values to the correct cells.
>
> Isn't there a way to use a (row, column) tuple directly? If so I'd prefer
> that. Also, there used to be an alternative format to address a spreadsheet
> cell with something like "R1C2".



In fact, in pyuno I get the following code:


values = ( (22.5, 21.5, 121.5),
           (5615.3, 615.3, -615.3),
           (-2315.7, 315.7, 415.7) )
table.getCellByName("A2").setValue(22.5)
table.getCellByName("B2").setValue(5615.3)
table.getCellByName("C2").setValue(-2315.7)

Indeed, the values tuple is formatted like (row, column).
I want to write simply and get the cell name on demand via an iterator 
function, like this:


values = ( (22.5, 21.5, 121.5),
           (5615.3, 615.3, -615.3),
           (-2315.7, 315.7, 415.7) )

it = _xrange_cellnames(rows=len(values), cols=len(values[0]))

table.getCellByName(it.next()).setValue(22.5)
table.getCellByName(it.next()).setValue(5615.3)
table.getCellByName(it.next()).setValue(-2315.7)



The following code does this but do you have some
optimizations
on it, for instance to get the alphabetic chars instead of hard-coding it.

Cheers
karim

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
  >>>  def _xrange_cellnames(rows, cols):
...     """Internal iterator function to compute excell table cellnames."""
...     cellnames = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
...     for row in xrange(1, rows+1):
...         for char in cellnames.replace('', ' ').split()[:cols]:

That is interesting ;) But for maximum clarity use

for char in cellnames[:cols]:

instead.
Yes, I am blind ;o) I did not see the simplification. Simple is better 
than complex...



... yield char + str(row)
...
  >>>  list( _xrange_cellnames(rows=3,cols=4))
['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'D2', 'A3', 'B3', 'C3', 'D3']

Here's my (untested) attempt to handle columns beyond "Z":

from itertools import chain, count, imap, islice, product
from string import ascii_uppercase

def columnnames():
 alpha = (ascii_uppercase,)
 return imap("".join, chain.from_iterable(product(*alpha*i) for i in
count(1)))

def cellnames(columns, rows):
 for row in xrange(1, rows+1):
 for column in islice(columnnames(), columns):
 yield column + str(row)


if __name__ == "__main__":
 import sys
 print list(cellnames(*map(int, sys.argv[1:])))

I think the subject has come up before; goo^h^h^h the search engine of your
choice is your friend.


I will study this one and itertools modules, many thanks.

Cheers
Karim
--
http://mail.python.org/mailman/listinfo/python-list


Re: where the function has problem? n = 900 is OK , but n = 1000 is ERROR

2011-08-02 Thread Alain Ketterlin
Billy Mays
<[email protected]> writes:

> On 08/01/2011 06:06 PM, Steven D'Aprano wrote:
>> Does your definition of "fixed" mean "gives wrong results for n>= 4 "?

> Well, I don't know if you're trolling or just dumb:

Steven is right, and you look dumb.

>>> fibo(4)
3.0000000000000004

Even though the math is correct, your program is wrong. It doesn't even
produce integers. And it will fail with overflow for big values.

-- Alain.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Karim


Thanks Martin,

This is the generator expression version.
I can use either the generator function or the generator expression 
version.


Cheers
Karim

On 08/02/2011 02:47 PM, Martin Gracik wrote:

def get_cellnames2(rows, cols):
    rows = range(1, rows + 1)
    cols = string.ascii_uppercase[:cols]
    return ('%s%s' % (col, row) for row in rows for col in cols)


--
http://mail.python.org/mailman/listinfo/python-list


Re: where the function has problem? n = 900 is OK , but n = 1000 is ERROR

2011-08-02 Thread Billy Mays

On 08/02/2011 08:45 AM, Alain Ketterlin wrote:

> Even though the math is correct, your program is wrong. It doesn't even
> produce integers. And it will fail with overflow for big values.


If it would make you feel better I can use decimal.

Also, perhaps I can name my function billy_fibo(n), which is defined as 
billy_fibo(n) +error(n) = fibo(n), where error(n) can be made 
arbitrarily small.  This runs in constant time rather than linear 
(memoized) or exponential (fully recursive) at the cost of a minutia of 
accuracy.  I find this tradeoff acceptable.
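
(fibo() itself isn't shown in this digest; judging by the float output
quoted above, it is presumably something like Binet's closed form. A
sketch, assuming that:

from math import sqrt

def fibo(n):
    # Binet's formula: exact in real arithmetic, approximate in floats
    phi = (1 + sqrt(5)) / 2
    return (phi**n - (-phi)**(-n)) / sqrt(5)

which indeed gives 3.0000000000000004 for n = 4 and overflows a C double
around n = 1475.)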


--
Bill
--
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Peter Otten
Karim wrote:

> values = ( (22.5,21.5,121.5),
> (5615.3,615.3,-615.3),
> (-2315.7,315.7,415.7) )
> 
> it = _xrange_cellnames(rows=len(values), cols=len(values[0]))
> 
> table.getCellByName(it.next()).setValue(22.5)
> table.getCellByName(it.next()).setValue(5615.3)
> table.getCellByName(it.next()).setValue(-2315.7)

Some googling suggests that there exists a getCellByPosition() method. With 
that the above would become (untested):

for x, column in enumerate(values):
    for y, value in enumerate(column):
        table.getCellByPosition(x, y).setValue(value)


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Please code review.

2011-08-02 Thread Karim

On 08/02/2011 03:59 PM, Peter Otten wrote:

Karim wrote:


values = ( (22.5,21.5,121.5),
(5615.3,615.3,-615.3),
(-2315.7,315.7,415.7) )

it = _xrange_cellnames(rows=len(values), cols=len(values[0]))

table.getCellByName(it.next()).setValue(22.5)
table.getCellByName(it.next()).setValue(5615.3)
table.getCellByName(it.next()).setValue(-2315.7)

Some googling suggests that there exists a getCellByPosition() method. With
that the above would become (untested):

for x, column in enumerate(values):
    for y, value in enumerate(column):
        table.getCellByPosition(x, y).setValue(value)


Thanks for the tip I will check com.sun.star.text.TextTable API.

Regards
Karim
--
http://mail.python.org/mailman/listinfo/python-list


'Use-Once' Variables and Linear Objects

2011-08-02 Thread Neal Becker
I thought this was an interesting article

http://www.pipeline.com/~hbaker1/Use1Var.html

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: where the function has problem? n = 900 is OK , but n = 1000 is ERROR

2011-08-02 Thread Steven D'Aprano
Billy Mays wrote:

> On 08/02/2011 08:45 AM, Alain Ketterlin wrote:
>> produce integers. And it will fail with overflow for big values.
> 
> If it would make you feel better I can use decimal.
> 
> Also, perhaps I can name my function billy_fibo(n), which is defined as
> billy_fibo(n) +error(n) = fibo(n), where error(n) can be made
> arbitrarily small.

So you say, but I don't believe it. Given fibo, the function you provided
earlier, the error increases with N:

>>> fibo(82) - fib(82)  # fib returns the accurate Fibonacci number
160.0
>>> fibo(182) - fib(182)
2.92786721937918e+23

Hardly "arbitrarily small".


Your function also overflows for N = 1475:

>>> fibo(1475)
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 3, in fibo
OverflowError: (34, 'Numerical result out of range')


The correct value only has 307 digits, so it's not that large a number for
integer math. I won't show them all, but it starts and ends like this:

8077637632...87040886025


> This runs in constant time rather than linear  
> (memoized) 

A good memoisation scheme will run in constant time (amortised).


> or exponential (fully recursive) 

Good heavens no. Only the most naive recursive algorithm is exponential.
Good ones (note plural) are linear. Combine that with memoisation, and you
have amortised constant time.
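
For example, a minimal memoised sketch of the exact computation is
linear on first use and amortised constant after:

_fibs = [0, 1]

def fib(n):
    # exact integer Fibonacci with a growing cache
    while len(_fibs) <= n:
        _fibs.append(_fibs[-1] + _fibs[-2])
    return _fibs[n]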


> at the cost of a minutia of accuracy.

I'm reminded of a time my wife was travelling across the US with her band's
roadies. At some point crossing the miles and miles of highway through the
desert, she pointed out that they were lost and nowhere even close to the
city where they were supposed to be playing. The driver answered, "Who
cares, we're making great time!"

(True story.)


> I find this tradeoff acceptable. 

Given that Fibonacci numbers are mostly of interest to number theorists, who
care about the *actual* Fibonacci numbers and not almost-but-not-quite
Fibonacci numbers, I'm having a lot of difficulty imagining what sort of
application you have in mind that could legitimately make that trade-off.



-- 
Steven

-- 
http://mail.python.org/mailman/listinfo/python-list


Require information on python API for Subversion related work

2011-08-02 Thread Shambhu Rajak
Hi ,

I need an API that can be used to do the following operations on a 
Subversion repository:

1. Create branch
2. Check out
3. Check in
4. Merge

Regards,
Shambhu

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: where the function has problem? n = 900 is OK , but n = 1000 is ERROR

2011-08-02 Thread Billy Mays

On 08/02/2011 10:15 AM, Steven D'Aprano wrote:

> So you say, but I don't believe it. Given fibo, the function you provided
> earlier, the error increases with N:
>
> >>> fibo(82) - fib(82)  # fib returns the accurate Fibonacci number
> 160.0
> >>> fibo(182) - fib(182)
> 2.92786721937918e+23
>
> Hardly "arbitrarily small".


Perhaps the individual number is big, but compare that to:

(fibo(n) - fib(n)) / fib(n)

The number is still quite close to the actual answer.


> Your function also overflows for N = 1475:
>
> >>> fibo(1475)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 3, in fibo
> OverflowError: (34, 'Numerical result out of range')
>
> The correct value only has 307 digits, so it's not that large a number for
> integer math. I won't show them all, but it starts and ends like this:
>
> 8077637632...87040886025


Yes, I mentioned possibly using the decimal class, which I suppose does 
lose the constant access time depending on how it's implemented.



> A good memoisation scheme will run in constant time (amortised).


Amortized perhaps, but this assumes the call happening a number of 
times.  Also, this requires linear memory to store previous values.



> Good heavens no. Only the most naive recursive algorithm is exponential.
> Good ones (note plural) are linear. Combine that with memoisation, and you
> have amortised constant time.



Not all recursive functions can be memoized (or they can but for 
practically no benefit).  What I was getting at was that a closed form 
expression of a recurrence might be significantly faster at an 
acceptable loss in accuracy.  For an example, see the Ackermann function.



> Given that Fibonacci numbers are mostly of interest to number theorists, who
> care about the *actual* Fibonacci numbers and not almost-but-not-quite
> Fibonacci numbers, I'm having a lot of difficulty imagining what sort of
> application you have in mind that could legitimately make that trade-off.


I was trying to show that there is an alternate method of calculation. 
Accuracy losses are really a problem with the underlying machinery 
rather than the high level code.  If the recursive form of fib() were 
written in c, the integers would have overflown a long while ago 
compared to float.


One other note: Fibonacci numbers grow exponentially fast (with respect 
to the number of bits), and python's integer multiplication is 
super-linear (Karatsuba rather than FFT-based). If we are going to 
discuss the behavior of python's numeric types, then let's talk about how 
slow python will become for the nth Fibonacci integer, and how much space 
it will take, compared to the short, concise, and almost-as-close 
floating point form.


--
Bill
--
http://mail.python.org/mailman/listinfo/python-list


Re: What Programing Language are the Largest Website Written In?

2011-08-02 Thread ccc31807
On Jul 31, 2:38 pm, gavino  wrote:
> facebook is php
>
> myspace is microsoft
>
> aol was tcl and aolserver c embedding tcl interp
>
> priceline is lisp
>
> reddit is python was lisp orig
>
> amazon was perl
>
> livejournal was perl

Most of these are tech companies. Tech companies are very important,
but so are other kinds of companies. What do manufacturing companies
use, like Ford and Toyota, energy companies like BP and Exxon,
pharmaceutical companies, consumer product companies, and so on? What
about the big retailers, Sears, WalMart, Target, etc.?

CC.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Require information on python API for Subversion related work

2011-08-02 Thread Tim Golden

On 02/08/2011 14:02, Shambhu Rajak wrote:

> I need an API that can be used to do the following operations on a
> Subversion repository:
>
> 1. Create branch
> 2. Check out
> 3. Check in
> 4. Merge



http://pysvn.tigris.org/

(which is, by the way, the first Google hit for "Python Subversion 
bindings")


TJG
--
http://mail.python.org/mailman/listinfo/python-list


Re: range() vs xrange() Python2|3 issues for performance

2011-08-02 Thread Steven D'Aprano
harrismh777 wrote:

> The following is intended as a helpful small extension to the xrange()
> range() discussion brought up this past weekend by Billy Mays...
> 
> With Python2 you basically have two ways to get a range of numbers:
> range() , which returns a list,  and
>xrange() , which returns an iterator.


xrange does not return an iterator. It returns a lazy iterable object, which
is not the same thing.

In Python, "iterator" is not merely a generic term for something which can
be iterated over. An iterator is an object which obeys the iterator
protocol, which depends on at least two properties:

* the object must have a next() method, which returns values until the
iterator is exhausted, and then raises StopIteration;

(In Python 3, the method is __next__.)

* and the object must have an __iter__() method which returns itself.

(Also, "non-broken" iterators will continue to raise StopIteration once they
do so once. That is, they can't be reset or repeated.)

xrange objects fail on both accounts. (Likewise for range objects in Python
3.)
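
You can check both requirements directly (Python 2):

>>> r = xrange(3)
>>> hasattr(r, 'next'), hasattr(r, '__iter__')
(False, True)
>>> it = iter(r)
>>> hasattr(it, 'next'), iter(it) is it
(True, True)

xrange objects have no next() method, and iterating them hands back a
separate rangeiterator object rather than the xrange object itself.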


[...]
> These are coded with range().  The interesting thing to note is that
> xrange() on Python2 runs "considerably" faster than the same code using
> range() on Python3. For large perfect numbers (above 8128) the
> performance difference for perf() is orders of magnitude. Actually,
> range() on Python2 runs somewhat slower than xrange() on Python2, but
> things are much worse on Python3.

I find these results surprising, at least for numbers as small as 8128, and
suspect your timing code is inaccurate.

(But don't make the mistake of doing what I did, which was to attempt to
produce range(29000) in Python 2. After multiple *hours* of swapping, I
was finally able to kill the Python process and get control of my PC again.
Sigh.)

I would expect that, in general, Python 3.1 or 3.2 is slightly slower than
Python 2.6 or 2.7. (If you're using Python 3.0, stop now!) But in Python 2,
I wouldn't expect orders of magnitude difference in range and xrange. Using
the function perf(N) you gave, and a modified copy perfx(N) which simply
replaces xrange for range, I get these timings in Python2.6:

>>> from timeit import Timer
>>> t1 = Timer('perf(1)', 'from __main__ import perf')
>>> t2 = Timer('perfx(1)', 'from __main__ import perfx')
>>> min(t1.repeat(number=1000, repeat=5))
3.0614659786224365
>>> min(t2.repeat(number=1000, repeat=5))
2.8787298202514648

A small difference, but not an order of magnitude. In Python 3.1, I get
this:

>>> min(t1.repeat(number=1000, repeat=5))
3.4577009677886963


> This is something I never thought to test before Billy's question,
> because I had already decided to work in C for most of my integer
> stuff... like perfects. But now that it sparked my interest, I'm
> wondering if there might be some focus placed on range() performance in
> Python3 for the future, PEP?

Oh indubitably. I doubt it will need a PEP. Python 3.x is still quite young,
and the focus is on improving unicode support, but performance improvements
will usually be welcome.

However, at some point I would expect adding hand-crafted optimizations to
CPython will cease to be worthwhile. Guido is already talking about CPython
becoming the reference implementation, and PyPy the production
implementation because it's faster. PyPy's optimizing compiler is already
about twice as fast as CPython, and for at least one specially crafted
example, faster than C:

http://morepypy.blogspot.com/2011/02/pypy-faster-than-c-on-carefully-crafted.html



-- 
Steven

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Notifications when process is killed

2011-08-02 Thread Hansmeet Singh
you shouldn't have anything to worry about with SIGTERM if the parent is
watching for SIGCHLD

On Tue, Aug 2, 2011 at 3:44 AM, Chris Angelico  wrote:

> On Tue, Aug 2, 2011 at 11:36 AM, Andrea Di Mario 
> wrote:
> > If i use SIGCHLD, i will have difficult when parent receive a SIGTERM, or
> not?
>
> What you would do is create two processes. Set up your signal
> handlers, then fork; in the parent, just watch for the child's death -
> in the child, do all your work. When the parent receives SIGCHLD, it
> can ascertain the cause of death.
>
> ChrisA
> --
> http://mail.python.org/mailman/listinfo/python-list
>
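
A minimal sketch of that arrangement (illustrative only -- the child's
"work" here is just a sleep):

import os
import signal
import sys
import time

def on_sigchld(signum, frame):
    pid, status = os.waitpid(-1, os.WNOHANG)
    if pid == 0:
        return        # nothing to reap
    if os.WIFSIGNALED(status):
        print("child %d killed by signal %d" % (pid, os.WTERMSIG(status)))
    else:
        print("child %d exited with status %d" % (pid, os.WEXITSTATUS(status)))
    sys.exit(0)

signal.signal(signal.SIGCHLD, on_sigchld)  # inherited by the child; harmless there
pid = os.fork()
if pid == 0:
    time.sleep(60)    # child: do the real work here
    os._exit(0)
else:
    while True:       # parent: just supervise
        signal.pause()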
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [ANN] IPython 0.11 is officially out

2011-08-02 Thread Robert Kern

On 8/1/11 8:54 AM, Thorsten Kampe wrote:


The documentation[1] says "If you are upgrading to version 0.11 of
IPython, you will need to migrate your old ipythonrc or ipy_user_conf.py
configuration files to the new system. Read on for information on how to
do this." Unfortunately there is no more mentioning of "migration", so
the developers' approach seems to be: "read all about the new
configuration system and see if you can somehow duplicate your old
ipythonrc settings. Good luck!".


Or you can ask nicely on ipython-user, and we can help you migrate your old 
ipythonrc.


  http://mail.scipy.org/mailman/listinfo/ipython-user

You can basically start with the ipython_config.py that is generated the first 
time and edit it. It is fully commented and demonstrates every configurable 
option with the defaults commented out. You just uncomment the appropriate lines 
and put in your values.


You are right that the HOWTO migrate documentation is missing. It's an oversight 
that you can help remedy.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


python import error, what's wrong?

2011-08-02 Thread smith jack
I am using the pydev plugin in Eclipse, and everything works just as
well as usual, but now I have run into a confusing problem: I can
import a module written by myself successfully, but when I try to run
the program, an import error shows up. What's wrong?

the directory structure is as follows:

src
  org.test
  A.py
  org.lab
  B.py


The contents of A.py look like:
class A:
...

The contents of B.py look like this -- when I try to run B.py, the
python import error just appears. Why?

from org.test.A import A
a = A()
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: range() vs xrange() Python2|3 issues for performance

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 3:45 PM, Steven D'Aprano
 wrote:
> (But don't make the mistake of doing what I did, which was to attempt to
> produce range(29000) in Python 2. After multiple *hours* of swapping, I
> was finally able to kill the Python process and get control of my PC again.
> Sigh.)
>

That is sad. (And, I am only slightly ashamed to admit it, quite
funny.) But on the other hand, quite impressive that it didn't bomb
anywhere!

Whenever I'm about to do something really dangerous, I like to first
open an SSH session from another computer - bash running inside there
can usually kill any process fairly efficiently, without need for UI
interaction. Failing that, good ol' Ctrl-Alt-F1 to get to a console is
usually the next best, but sometimes logging in requires a lot of
resources.

This is actually one place where I'm really glad Python doesn't do
much with multiple threads (for instance, it won't gc on another
thread). A dual core CPU keeps everything happy!

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PyWart: os.path needs immediate attention!

2011-08-02 Thread rantingrick
On Aug 1, 3:19 am, Teemu Likonen  wrote:
> * 2011-07-30T10:57:29+10:00 * Steven D'Aprano wrote:
>
> > Teemu Likonen wrote:
> >> Pathnames and the separator for pathname components should be
> >> abstracted away, to a pathname object.
>
> > Been there, done that, floundered on the inability of people to work
> > out the details.
>
> >http://www.python.org/dev/peps/pep-0355/
>
> I'm very much a Lisp person and obviously got the idea of pathname
> objects from Common Lisp. Lazily I'm also learning Python too but at the
> moment I can't comment on the details of that PEP. Yet, generally I
> think that's the way to improve pathnames, not the "rantinrick's".

This thread was intended to expose another PyWart and get the
community juices flowing. os.path is broken and cannot be repaired
because os.path was an improper API to begin with. The only way to
solve this problem is to introduce a new Path object.

A new Path object is the answer.

Some have said "been there, done that" with a sarcastic and defeatist
point of view. I say we need to re-visit the proposal of PEP-0355 and
hash out something quickly. We also need to realize that one day or
another this Path object is going to become reality and the longer we
drag our feet getting it implemented the more painful the transition
is going to be.

I feel the Python community is in an awkward teenage stage at this
point, not really sure of itself or its direction: living only for
today, with no ability to project the future, and wasting too much time
arguing over minutiae. We need a collective wake-up call in the form of a slap
on the face. We need to start making the hard choices necessary to
clean up this library.

Python3000 was only the beginning! ONLY THE BEGINNING!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python import error, what's wrong?

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 4:52 PM, smith jack  wrote:
> from org.test.A import A

This is going to look for org/test/A.py but not for org.test/A.py -
are you able to rename your directories to not have dots?

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


How to define repeated string when using the re module?

2011-08-02 Thread smith jack
if it's for a single character this is very easy, e.g. c{m,n} means
the occurrence of c is between m and n.

if i want to constrain the occurrence of (.*?), how should i make
that work?  ((.*?)){1,3} seems not to work; is there any method to
define a repeated string using python regex?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PyWart: os.path needs immediate attention!

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 5:03 PM, rantingrick  wrote:
> This thread was intended to expose another PyWart and get the
> community juices flowing. os.path is broken and cannot be repaired
> because os.path was an improper API to begin with. The only way to
> solve this problem is to introduce a new Path object.
>
> A new Path object is the answer.

http://xkcd.com/927/

> I feel Python community is in an awkward teenage stage at this point
> not really sure of it's self or direction. Living only for today with
> no ability to project the future and wasting too much time arguing
> over minutiae. We need a collective wake-up-call in the form of a slap
> on the face. We need to start making the hard choices necessary to
> clean up this library.
>
> Python3000 was only the beginning! ONLY THE BEGINNING!

Some of us have reached the level of maturity necessary to understand
that stability is valuable. Also to notice when requirements
internally conflict - how are we going to develop the One Perfect API
without spending a lot of time arguing minutiae?

One thing I have learned in life is that mature products have their
warts for a reason, and that reason is usually compatibility. That's
not necessarily a good thing, but nor is it necessarily bad. For
instance, the Python source code is managed by automake. We could save
ourselves a LOT of trouble by simply moving to the future - a future
in which Linux is the only operating system we bother with, that
64-bit hardware and 64-bit OSes are everything, and so on. Why bother
supporting the past? But that "past" is actually a huge part of the
world today, too.

Large-scale adoption is an incredibly valuable thing, and you are
narrowing your adoption potential considerably if you do not support
these things. As an example, have you ever noticed how horribly
useless and skeletal the Python documentation is? Neither have I. It's
used by so many people that it gets eyeballs, and therefore time, to
fix up its failings. Compare with Pike, a much more obscure language
(syntactically similar to C, but under the covers quite similar to
Python); scroll down this list of constants from its Stdio module:

http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/Stdio.html

A good number of them simply say FIXME, and even those that _are_
documented have only brief explanations. For quite a few things, you
need to go direct to the language's source code. (Do a docs search for
FIXME and you'll find that this is not an isolated case.) That doesn't
happen with Python, largely a consequence (if somewhat indirectly) of
its being so widely used.

Sure you can make your life easier. But is it really better?

Chris Angelico
-- 
http://mail.python.org/mailman/listinfo/python-list


Early binding as an option

2011-08-02 Thread Chris Angelico
As I understand it, Python exclusively late-binds names; when you
define a function, nothing is ever pre-bound. This allows a huge
amount of flexibility (letting you "reach into" someone else's
function and change its behaviour), but it's flexibility that most
programs use seldom if at all.

First off: Is there any way to request/cause some names to be bound
early? and secondly, should there be?

Argument against: Late binding is a Good Thing, and having some things
bound early would be confusing.

Argument in favour: Efficiency is also a Good Thing, and with staples
like 'len', it's unlikely anyone will want to change them - yet the
interpreter still has to do a namespace lookup every time.

I would want the end programmer to have the ultimate power here (not
the module designer). Something along the lines of: This global name
will never change, so don't bother looking it up every time.

As an example of this difference, Pike uses early binding for some
things; when I did the perfect numbers testing in the other thread
(discussion thread, not thread of execution!), Pike performed
significantly better; I believe this is in part due to the formal
declarations of variables, and the consequential simplification of
local code, although since there are no globals being looked up here,
there's little to be gained from those.

Is this the realm of JIT compilation, or can it be done in regular CPython?

Chris Angelico
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Complex sort on big files

2011-08-02 Thread Dan Stromberg
On Tue, Aug 2, 2011 at 3:25 AM, Alistair Miles wrote:

> Hi Dan,
>
> Thanks for the reply.
>
> On Mon, Aug 1, 2011 at 5:45 PM, Dan Stromberg  wrote:
> >
> > Python 2.x, or Python 3.x?
>
> Currently Python 2.x.
>

So it sounds like you may want to move this code to 3.x in the future.


> > What are the types of your sort keys?
>
> Both numbers and strings.
>
> > If you're on 3.x and the key you need reversed is numeric, you can negate
> > the key.
>
> I did wonder about that. Would that not be doable also in Python 2.7,
> using sorted(key=...)?
>

Yes, probably, though in 2.x it's easy to just use __cmp__.


> > If you're on 2.x, you can use an object with a __cmp__ method to compare
> > objects however you require.
>
> OK, right.
>
> Looking at the HowTo/Sorting again [1] and the bit about cmp_to_key,
> could you also achieve the same effect by returning a key with custom
> implementations of rich comparison functions?
>

Yes, I believe so, though then you're back to a bit slower sorting.


> > You probably should timsort the chunks (which is the standard
> list_.sort() -
> > it's a very good in-memory sort), and then merge them afterward using the
> > merge step of merge sort.
>
> Yes, that's what I understood by the activestate recipe [2].
>
> So I guess my question boils down to, how do you do the merge step for
> a complex sort? (Assuming each chunk had been completely sorted
> first.)
>

Open one file for each sorted list on disk.  Save these files in a
file_list, indexed by an integer (of course, since this is a list) - doing
this means you have a mapping from an integer to a file.

Read the first value from each such file.  Note that one of these values
will become the least value in the result list - the least value in the
result file has to be the least value from one of the sorted sublists.

Fill some datastructure (list, heap, treap, etc.) with tuples
(value_from_sorted_list1, index_into_file_list1), (value_from_sorted_list2,
index_into_file_list2), etc.  Then pull out the least value from the
list/heap/treap, add it to the result file, read another value from the
_same_ file (which you can look up by its index in your file_list), and
add a replacement tuple to your list/heap/treap.

In this way, each step of the way, when merging n files, your
list/heap/treap always has n values - until you reach the end of one or more
of your sorted files.  Then you always have less than n.

When you hit EOF on some file, you just don't add anything back for that
file.  Resist the temptation to remove it from your file_list - that'll mess
up the file indexes in the list/heap/treap.

Continue as long as you have an open file.

We're using a tuple instead of a class, because comparing tuples tends to be
pretty fast.  We're using file indexes instead of files, because I don't
want to think about what happens when you compare two files with the <
operator.  :)
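
In code, the loop might look roughly like this for line-sorted text
files (a sketch; the names are illustrative):

import heapq

def merge_sorted_files(filenames, out_name):
    # one open file per sorted run; the list index identifies the file
    files = [open(name) for name in filenames]
    heap = []
    for index, f in enumerate(files):
        line = f.readline()
        if line:
            heapq.heappush(heap, (line, index))
    out = open(out_name, 'w')
    while heap:
        line, index = heapq.heappop(heap)
        out.write(line)
        replacement = files[index].readline()  # refill from the same file
        if replacement:                        # at EOF, just don't push
            heapq.heappush(heap, (replacement, index))
    out.close()
    for f in files:
        f.close()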

> Maybe the answer is also to construct a key with custom implementation
> of rich comparisons?
>

Could be related.


> Now I'm also wondering about the best way to sort each chunk. The
> examples in [1] of complex sorts suggest the best way to do it is to
> first sort by the secondary key, then sort by the primary key, relying
> on the stability of the sort to get the desired outcome. But would it
> not be better to call sorted() once, supplying a custom key function?
>

Function calls in CPython are expensive, so if you can sort (twice) using a
key (decorate, sort, undecorate), that's probably a performance win.  It's
kind of a Software Engineering lose though, so using a class with one or
more comparison methods can be a benefit too.


> (As an aside, at the end of the section in [1] on Sort Stability and
> Complex sorts, it says "The Timsort algorithm used in Python does
> multiple sorts efficiently because it can take advantage of any
> ordering already present in a dataset." - I get that that's true, but
> I don't see how that's relevant to this strategy for doing complex
> sorts. I.e., if you sort first by the secondary key, you don't get any
> ordering that's helpful when you subsequently sort by the primary key.
> ...?)
>

Agreed.  Sounds irrelevant.


> (Sorry, another side question, I'm guessing reading a chunk of data
> into a list and using Timsort, i.e., calling list.sort() or
> sorted(mylist), is quicker than using bisect to keep the chunk sorted
> as you build it?)
>

Oh yeah.  The bisection itself  is just O(logn), but inserting into the
correct place afterward is O(n).  So unless n is tiny, you're better off
with a heap or treap or red-black tree.  It's the difference between an
overall O(nlogn) algorithm and an O(n^2) algorithm - for large n, it's a big
difference.


> > heapq's not unreasonable for the merging, but I think it's more common to
> > use a short list.
>
> Do you mean a regular Python list, and calling min()?
>

Yes.  Depending on how many lists you wish to merge at a time.

Note that in terms of file I/O, it may

Re: How to define repeated string when using the re module?

2011-08-02 Thread MRAB

On 02/08/2011 17:20, smith jack wrote:

if it's for a single character this is very easy, e.g. c{m,n} means
the occurrence of c is between m and n.

if i want to constrain the occurrence of (.*?), how should i make
that work?  ((.*?)){1,3} seems not to work; is there any method to
define a repeated string using python regex?


Why do you think it's not working?
--
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Thomas Jollans
On 02/08/11 18:55, Chris Angelico wrote:
> As I understand it, Python exclusively late-binds names; when you
> define a function, nothing is ever pre-bound. This allows a huge
> amount of flexibility (letting you "reach into" someone else's
> function and change its behaviour), but it's flexibility that most
> programs use seldom if at all.
> 
> First off: Is there any way to request/cause some names to be bound
> early? and secondly, should there be?
> 
> Argument against: Late binding is a Good Thing, and having some things
> bound early would be confusing.

Also, simplicity is a good thing, and Python name binding and scoping is
very simple.

> 
> Argument in favour: Efficiency is also a Good Thing, and with staples
> like 'len', it's unlikely anyone will want to change them - yet the
> interpreter still has to do a namespace lookup every time.
> 
> I would want the end programmer to have the ultimate power here (not
> the module designer). Something along the lines of: This global name
> will never change, so don't bother looking it up every time.

What you can do, as a module author, is bind builtins/globals as default
function arguments, making them local variables.

def uses_len(arg0, len=len):
    # 'len' is now a fast local, bound once at definition time
    return len(arg0)

# you could probably write a decorator that strips away this kind of
# "builtin" argument

As the module user, there's no way AFAIK. However, is this really
useful? I suppose it would be possible to introduce a kind of "constant
globals" namespace that a JIT compiler could then use to optimise, but
how much would this help? If the content of this namespace is unknown at
the time the module is compiled (and it would be), then only a JIT
compiler could use the information - which doesn't mean it couldn't
improve performance significantly in some cases. More importantly, how
can the module user know which globals should be declared constant? Only
the module author can know which names are looked up often enough for
optimisation to make sense, or even which names are looked up at all.

I think this effect can only, and best, be achieved in Python by binding
relevant globals as locals in the module, and documenting which these
are for the benefit of users who might want to change builtins, and
would have to do it before importing the module.

Thomas

> 
> As an example of this difference, Pike uses early binding for some
> things; when I did the perfect numbers testing in the other thread
> (discussion thread, not thread of execution!), Pike performed
> significantly better; I believe this is in part due to the formal
> declarations of variables, and the consequential simplification of
> local code, although since there are no globals being looked up here,
> there's little to be gained from those.
> 
> Is this the realm of JIT compilation, or can it be done in regular CPython?
> 
> Chris Angelico

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to define repeated string when using the re module?

2011-08-02 Thread Chris Rebert
On Tue, Aug 2, 2011 at 9:20 AM, smith jack  wrote:
> if it's for a single character this is very easy, e.g. c{m,n} means
> the occurrence of c is between m and n.
>
> if i want to constrain the occurrence of (.*?), how should i make
> that work?  ((.*?)){1,3} seems not to work; is there any method to
> define a repeated string using python regex?

Don't parse HTML using regexes; use an HTML parser!
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Here's a survey of Python HTML parsing libraries:
http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Cheers,
Chris
--
http://rebertia.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: 'Use-Once' Variables and Linear Objects

2011-08-02 Thread Chris Rebert
On Tue, Aug 2, 2011 at 7:19 AM, Neal Becker  wrote:
> I thought this was an interesting article
>
> http://www.pipeline.com/~hbaker1/Use1Var.html

See also:
http://en.wikipedia.org/wiki/Uniqueness_type

Cheers,
Chris
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 6:18 PM, Thomas Jollans  wrote:
> I suppose it would be possible to introduce a kind of "constant
> globals" namespace that a JIT compiler could then use to optimise, but
> how much would this help?

Surely it must help a lot; looking up a name means doing string-keyed
dictionary operations. If "len" could be replaced with "@10794928",
where 10794928 is the actual address of the len object, then the
interpreter would skip the lookup entirely and go straight to the
object and call it.

But I don't really know how to go about profiling this to be sure. Any ideas?

Chris Angelico
-- 
http://mail.python.org/mailman/listinfo/python-list


how to sort a hash list without generating a new object?

2011-08-02 Thread smith jack
the source code is as follows

x={}
x['a'] = 11
x['c'] = 19
x['b'] = 13
print x

tmp = sorted(x.items(), key=lambda x: x[0])  # increasing order by
default; if i want a descending order, what should i do?
# after sorted is called, a new list is generated, and the hash list
x is not changed at all; how do i convert x to a sorted hash list
without generating a new object?
print tmp
print x
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Chris Rebert
On Tue, Aug 2, 2011 at 9:55 AM, Chris Angelico  wrote:
> As I understand it, Python exclusively late-binds names; when you
> define a function, nothing is ever pre-bound. This allows a huge
> amount of flexibility (letting you "reach into" someone else's
> function and change its behaviour), but it's flexibility that most
> programs use seldom if at all.
>
> First off: Is there any way to request/cause some names to be bound
> early?

Nope. Even if you "freeze" a variable's value via a closure, I don't
believe it gets particularly optimized.


> As an example of this difference, Pike uses early binding for some
> things; when I did the perfect numbers testing in the other thread
> (discussion thread, not thread of execution!), Pike performed
> significantly better; I believe this is in part due to the formal
> declarations of variables, and the consequential simplification of
> local code, although since there are no globals being looked up here,
> there's little to be gained from those.

"in part". There are very likely additional factors at work here.
Also, have you looked at Cython? I would guess that it can bypass a
lot of the late binding.

> Is this the realm of JIT compilation, or can it be done in regular CPython?

Smart enough JITers can infer that late binding is not being exploited
for certain variables and thus optimize them accordingly. Look how
fast some of the JavaScript VMs are, despite JavaScript also being
highly dynamic.

The CPython devs are reluctant to accept the increased complexity and
size of a JIT engine (see Unladen Swallow).
Anything else would probably involve a similarly unpalatable level of
complexity or require changing Python-the-language.

I'm pretty sure optional early binding has been proposed in the past;
try trawling the list archives.

Cheers,
Chris
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Thomas Jollans
On 02/08/11 19:42, Chris Angelico wrote:
> On Tue, Aug 2, 2011 at 6:18 PM, Thomas Jollans  wrote:
>> I suppose it would be possible to introduce a kind of "constant
>> globals" namespace that a JIT compiler could then use to optimise, but
>> how much would this help?
> 
> Surely it must help a lot; looking up a name means doing string-keyed
> dictionary operations. If "len" could be replaced with "@10794928",
> where 10794928 is the actual address of the len object, then the
> interpreter would skip the lookup entirely and go straight to the
> object and call it.
> 
> But I don't really know how to go about profiling this to be sure. Any ideas?

Well, you could run a direct comparison of one function where len is
global, and an identical function where len is local (which does not
do a dict lookup, am I right?)
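
Something along these lines, say (an illustrative sketch):

from timeit import timeit

def len_global(lines):
    total = 0
    for line in lines:
        total += len(line)      # builtin lookup on every iteration
    return total

def len_local(lines, len=len):  # 'len' pre-bound as a fast local
    total = 0
    for line in lines:
        total += len(line)
    return total

data = ["x" * 10] * 100000
print(timeit(lambda: len_global(data), number=100))
print(timeit(lambda: len_local(data), number=100))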

Of course the global dict lookup takes time, but I doubt that many
modules look up the same name often enough for it to actually be
significant.

I don't really know enough about either profiling or the CPython
interpreter, but I assume there's some function that's called to look up
globals; you could profile the Python interpreter (don't ask me with
which tool) and see how much time that function uses.

Thomas
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to sort a hash list without generating a new object?

2011-08-02 Thread Thomas Jollans
On 02/08/11 20:02, smith jack wrote:
> the source code is as follows
> 
> x={}
> x['a'] = 11
> x['c'] = 19
> x['b'] = 13
> print x
> 
> tmp = sorted(x.items(), key=lambda x: x[0])  # increasing order by
> default; if i want a descending order, what should i do?
> # after sorted is called, a new list is generated, and the hash list
> x is not changed at all; how do i convert x to a sorted hash list
> without generating a new object?
> print tmp
> print x

Python dictionaries are never ordered. Perhaps the
collections.OrderedDict class can do what you're looking for.

http://docs.python.org/py3k/library/collections.html#collections.OrderedDict
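
For example, with your data (requires Python 2.7 or 3.1+):

from collections import OrderedDict

x = {'a': 11, 'c': 19, 'b': 13}
by_key = OrderedDict(sorted(x.items()))
by_key_desc = OrderedDict(sorted(x.items(), reverse=True))

Note that this still builds a new object; there is no way to sort a
regular dict in place.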
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Spam

2011-08-02 Thread David
If you click the "more options" link, there is an option in the sub-
menu to report a post as spam.  You can also forward it along with the
offending e-mail address to [email protected]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Alain Ketterlin
Chris Angelico  writes:

> As I understand it, Python exclusively late-binds names; when you
> define a function, nothing is ever pre-bound. This allows a huge
> amount of flexibility (letting you "reach into" someone else's
> function and change its behaviour), but it's flexibility that most
> programs use seldom if at all.

I agree with you on your last remark, but unfortunately it's part of the
language. Therefore, there *are* programs that rely on the ability to
rebind 'len' and others. Changing this would require changing the
language, basically turning some builtins into keywords.

(BTW, the dynamic binding also has implications for security.)

[...]
> Argument in favour: Efficiency is also a Good Thing, and with staples
> like 'len', it's unlikely anyone will want to change them - yet the
> interpreter still has to do a namespace lookup every time.

Yes, and it can't do common subexpression elimination, code hoisting,
etc. Basically, nothing can be optimized, and the interpreter has to
execute bytecode that exactly represents source code.

> I would want the end programmer to have the ultimate power here (not
> the module designer). Something along the lines of: This global name
> will never change, so don't bother looking it up every time.

Maybe some module could provide specialized, "use-at-your-own-risk"
versions of some functions/operators. An example is '+' which can mean
so many things that any use of it probably spends more time finding the
right version than actually doing the work.

The problem with such pre-bound identifiers is that anybody with
performance problems would start peppering his/her code with things like
plus_float_float(x,y), leading to unreadable code, to all kinds of
strange errors, etc. Nobody really wants this probably.

[...]
> Is this the realm of JIT compilation, or can it be done in regular
> CPython?

No, it's a matter of language definition. A JIT can't do much here
(actually jitting is almost orthogonal to that question), at least it
couldn't do much better than CPython. It just has to go through all the
lookups. IIRC, unladden-swallow has tried the JIT route, using LLVM as
the backend. It seems they gave up.

-- Alain.
-- 
http://mail.python.org/mailman/listinfo/python-list


ANN: eGenix mx Base Distribution 3.2.1 (mxDateTime, mxTextTools, etc.)

2011-08-02 Thread eGenix Team: M.-A. Lemburg


ANNOUNCING

   eGenix.com mx Base Distribution

  Version 3.2.1 for Python 2.4 - 2.7

   Open Source Python extensions providing
 important and useful services
for Python programmers.

This announcement is also available on our web-site for online reading:
http://www.egenix.com/company/news/eGenix-mx-Base-Distribution-3.2.1-GA.html



ABOUT

The eGenix.com mx Base Distribution for Python is a collection of
professional quality software tools which enhance Python's usability
in many important areas such as fast text searching, date/time
processing and high speed data types.

The tools have a proven record of being portable across many Unix and
Windows platforms. You can write applications which use the tools on
Windows and then run them on Unix platforms without change due to the
consistent platform independent interfaces.

Contents of the distribution:

 * mxDateTime - Easy to use Date/Time Library for Python
 * mxTextTools - Fast Text Parsing and Processing Tools for Python
 * mxProxy - Object Access Control for Python
 * mxBeeBase - On-disk B+Tree Based Database Kit for Python
 * mxURL - Flexible URL Data-Type for Python
 * mxUID - Fast Universal Identifiers for Python
 * mxStack - Fast and Memory-Efficient Stack Type for Python
 * mxQueue - Fast and Memory-Efficient Queue Type for Python
 * mxTools - Fast Everyday Helpers for Python

All available packages have proven their stability and usefulness in
many mission critical applications and various commercial settings all
around the world.

For more information, please see the distribution page:

http://www.egenix.com/products/python/mxBase/



NEWS

The 3.2.1 release of the eGenix mx Base Distribution is the latest
release of our open-source Python extensions.

The new patch-level version includes a few important fixes:

* Fixed a segfault in mxDateTime.
* Fixed a possible buffer overflow in the mxDebugPrintf()
  function.
* Fixed a problem in mxSetup mx_autoconf: Python.h was not
  found by some tests.

If you are upgrading from eGenix mx Base 3.1.x, please also see the
eGenix mx Base Distribution 3.2.0 release notes for details on what
has changed and which new features are available:

http://www.egenix.com/company/news/eGenix-mx-Base-Distribution-3.2.0-GA.html

As always, we are providing pre-built binaries for all common
platforms: Windows 32/64-bit, Linux 32/64-bit, FreeBSD 32/64-bit, Mac
OS X 32/64-bit. Source code archives are available for installation on
all other Python platforms, such as Solaris, AIX, HP-UX, etc.

To simplify installation in Zope/Plone and other egg-based systems, we
have also precompiled egg distributions for all platforms. These are
available on our own PyPI-style index server for easy and automatic
download.

Whether you are using a pre-built package or the source distribution,
installation is a simple "python setup.py install" command in all
cases. The only difference is that the pre-built packages do not
require a compiler or the Python development packages to be installed.

For a list of changes, please refer to the eGenix mx Base Distribution
change log at

http://www.egenix.com/products/python/mxBase/changelog.html

and the change logs of the various included Python packages.



DOWNLOADS

The download archives and instructions for installing the packages can
be found on the eGenix mx Base Distribution page:

http://www.egenix.com/products/python/mxBase/



LICENSE

The eGenix mx Base package is distributed under the eGenix.com Public
License 1.1.0 which is an Open Source license similar to the Python
license. You can use the packages in both commercial and non-commercial
settings without fee or charge.

The package comes with full source code.



SUPPORT

Commercial support for this product is available from eGenix.com.
Please see

http://www.egenix.com/services/support/

for details about our support offerings.

Enjoy,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 02 2011)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Lang

Re: Early binding as an option

2011-08-02 Thread Teemu Likonen
* 2011-08-02T11:03:24-07:00 * Chris Rebert wrote:

> Smart enough JITers can infer that late binding is not being exploited
> for certain variables and thus optimize them accordingly. Look how
> fast some of the JavaScript VMs are, despite JavaScript also being
> highly dynamic.

Or Common Lisp. It has "packages" (namespaces for symbols). All the
standard symbols are in the COMMON-LISP (CL) package which is locked and
can't be modified. When the Lisp reader is parsing the source code
character stream it knows which package symbols belong to. So, for
example, at compile time there is the information that symbol + refers
to the standard CL:+ add function. There's no need to be smart.

The CL package is static and set in stone, but the programmer is free to
control symbols in other packages. For example, in some other package
programmer can import all symbols from the standard CL package except
shadow the CL:+ symbol. Then she can write her own extended version of +
function (which possibly falls back to CL:+ in some situations). The
dynamic effect with symbols is achieved through namespace tricks and yet
the compiler can always trust the symbols of the CL package.

I'm too much a beginner to tell if similar symbol concept is applicable
to Python.
-- 
http://mail.python.org/mailman/listinfo/python-list


what is the advantage of Django when comparing with LAMP and J2EE platform?

2011-08-02 Thread smith jack
There are so many choices for doing the same thing, so is there any
special advantage Django brings to the user?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: what is the advantage of Django when comparing with LAMP and J2EE platform?

2011-08-02 Thread Redcat
On Wed, 03 Aug 2011 03:28:14 +0800, smith jack wrote:

> There are so many choices for doing the same thing, so is there any
> special advantage Django brings to the user?

The ability to code in Python instead of Java is the biggest one to me.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Terry Reedy

On 8/2/2011 12:55 PM, Chris Angelico wrote:

As I understand it, Python exclusively late-binds names; when you
define a function, nothing is ever pre-bound.


By 'pre-bound' you presumably mean bound at definition time rather than 
call time. Default arg objects *are* pre-computed and pre-bound to 
internal slots at definition time.



Argument in favour: Efficiency is also a Good Thing, and with staples
like 'len', it's unlikely anyone will want to change them - yet the
interpreter still has to do a namespace lookup every time.


Three approaches to machine efficiency.

1. Better algorithm: Python's people efficiency makes this easier than 
in most other languages.


2. Hand-optimize the code that actually chew up time (as revealed by 
profiler). This often means removing repeated expressions *and* global 
names from inner loops.


_len = len
for line in somefile:
    n = _len(line)

*might* give a worthwhile speedup in a function if not too much else 
happens in the loop. But the CPython global name lookup code (in C) has 
been thoroughly examined and optimized as far as several developers 
could take it.


3. Convert critical code to native language (or C).

The idea of 'early binding' comes up periodically but I do not remember 
many concrete proposals.


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


m2crypto https, xmlrpc, keep_alive

2011-08-02 Thread Gelonida N
Hi,


Just started playing with m2cryptos xmlrpc

The code I'm using is:

import xmlrpclib
from M2Crypto.m2xmlrpclib import Server, SSL_Transport
from M2Crypto.SSL.Context import Context

ctx = Context()
# modify context
svr = Server(rpc_url, SSL_Transport(ctx), encoding='utf-8')
svr.mymethod1(1)
svr.mynethod2(2)


What I wondered is following.

Will each RPC call go through all the ssl negotiation with certificate
verification, etc?

If yes, is there something like the concept of a keep_alive connection,
that would avoid this overhead?

If yes.
What would I have to change in my above code.


Thanks in advance for some ideas

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: what is the advantage of Django when comparing with LAMP and J2EE platform?

2011-08-02 Thread Tim Johnson
* smith jack  [110802 11:37]:
> There are so many choices for doing the same thing, so is there any
> special advantage Django brings to the user?

Django is a python framework, J2EE is a java platform (my apologies
if I use 'framework' incorrectly). Our customers want PHP,perl or python,
not java. 

The definition for LAMP given at
http://en.wikipedia.org/wiki/LAMP_(software_bundle) - for what it is
worth includes python and defines LAMP as sort of generic (as I read
it). 

Thus django *could* be considered a LAMP bundle, perhaps.

-- 
Tim 
tim at johnsons-web dot com or akwebsoft dot com
http://www.akwebsoft.com
-- 
http://mail.python.org/mailman/listinfo/python-list


m2crypto https, xmlrpc, and cookies

2011-08-02 Thread Gelonida N
Hi,


Just started playing with m2cryptos xmlrpc

The code I'm using is:

import xmlrpclib
from M2Crypto.m2xmlrpclib import Server, SSL_Transport
from M2Crypto.SSL.Context import Context

ctx = Context()
# modify context
svr = Server(rpc_url, SSL_Transport(ctx), encoding='utf-8')
svr.mymethod1(1)
svr.mynethod2(2)


What I wondered is following:

When using my web browser certain cookies are sent with the request.

Is there any way to change my above code such, that cookies would be
collected and sent?


Thanks in advance for any pointers.


-- 
http://mail.python.org/mailman/listinfo/python-list


m2crypto https, xmlrpc and ignore server name mismatch

2011-08-02 Thread Gelonida N
Hi,


Just started playing with m2crypto's xmlrpc

The code I'm using is:

import xmlrpclib
from M2Crypto.m2xmlrpclib import Server, SSL_Transport
from M2Crypto.SSL.Context import Context

ctx = Context()
# modify context
svr = Server(rpc_url, SSL_Transport(ctx), encoding='utf-8')
svr.mymethod1(1)
svr.mymethod2(2)


What I wondered is following:

For testing I would like to ignore the fact, that the hostname in the
request is different from the hostname in the server certificate.

On the other hand I would like to verify that the server name from the
server's certificate matches certain criteria.

What would be the code to do this.

import xmlrpclib
from M2Crypto.m2xmlrpclib import Server, SSL_Transport
from M2Crypto.SSL.Context import Context


def check_func(server_certificate):
    hostname = get_hostname_from_cert(server_certificate)
    return hostname.endswith('.mydomain.com')

ctx = Context()
# modify context
# add code to ignore server name mismatch
# add code to call check_func. accept request only if it returns True
svr = Server(rpc_url, SSL_Transport(ctx), encoding='utf-8')
svr.mymethod1(1)
svr.mymethod2(2)


Thanks in advance for any pointers.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 9:23 PM, Terry Reedy  wrote:
> On 8/2/2011 12:55 PM, Chris Angelico wrote:
>>
>> As I understand it, Python exclusively late-binds names; when you
>> define a function, nothing is ever pre-bound.
>
> By 'pre-bound' you presumably mean bound at definition time rather than call
> time. Default arg objects *are* pre-computed and pre-bound to internal slots
> at definition time.

Of course; that's a different issue altogether. No, I'm talking about
the way a tight loop will involve repeated lookups for the same name.

Unfortunately, there is no way - by definition - to guarantee that a
binding won't change. Even in the example of getting the lengths of
lines in a file, it's entirely possible for __len__ to rebind the
global name "len" - so you can't rely on the repeated callings of
len() to be calling the same function.

But who WOULD do that? It's somewhat ridiculous to consider, and
there's a huge amount of code out there that does these repeated calls
and does not do stupid rebindings in the middle. So Python permits
crazy behaviour at the cost of the performance of normal behaviour.

With the local-variable-snapshot technique ("len = len"), can anything
be optimized, since the parser can guarantee that nothing ever
reassigns to it? If not, perhaps this would be a place where something
might be implemented:

@const(len,max) # maybe this
def maxline(f):
    @const len, max # or this
    n = 0
    for line in f:
        n = max(n, len(line))
    return n

Some notation like this could tell the interpreter, "I don't expect
'len' or 'max' to be rebound during the execution of this function.
You're free to produce wrong results if either is."

So... Would this potentially produce wrong results? Would it be of any
use, or would its benefit be only valued in times when the whole
function needs to be redone in C?

Chris Angelico
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Gelonida N
On 08/03/2011 12:08 AM, Chris Angelico wrote:
> With the local-variable-snapshot technique ("len = len"), can anything
> be optimized, since the parser can guarantee that nothing ever
> reassigns to it? If not, perhaps this would be a place where something
> might be implemented:
> 
> @const(len,max) # maybe this
> def maxline(f):
>     @const len, max # or this
>     n = 0
>     for line in f:
>         n = max(n, len(line))
>     return n
> 
> Some notation like this could tell the interpreter, "I don't expect
> 'len' or 'max' to be rebound during the execution of this function.
> You're free to produce wrong results if either is."
> 
> So... Would this potentially produce wrong results? Would it be of any
> use, or would its benefit be only valued in times when the whole
> function needs to be redone in C?
> 
> Chris Angelico

I think the idea of having pragmas / directives to tell the interpreter
that certain symbols could be bound early is interesting and might help
optimize some inner loops without having to explicitly assign to
local vars.

On the other hand: It might be interesting, that the early binding would
just take place when python is invoked with -O

Thus you could still do a lot of debug / tracing magic during
development and only production code would do the late binding.




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Chris Angelico
On Tue, Aug 2, 2011 at 11:21 PM, Gelonida N  wrote:
> On the other hand: It might be interesting, that the early binding would
> just take place when python is invoked with -O
>

This could be an excellent safety catch, but on the other hand, it
might destroy all value of the feature - once again, it would be
optimizing in the sole case where the code is probably better
rewritten in C.

Or would this be a sort of "half-way house" - this is where we need
more performance, let's spend two minutes tweaking it in Python rather
than dropping to C - to get some of the performance gains?

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-02 Thread Gelonida N
On 08/03/2011 12:26 AM, Chris Angelico wrote:
> On Tue, Aug 2, 2011 at 11:21 PM, Gelonida N  wrote:
>> On the other hand: It might be interesting, that the early binding would
>> just take place when python is invoked with -O
>>
> 
> This could be an excellent safety catch, but on the other hand, it
> might destroy all value of the feature - once again, it would be
> optimizing in the sole case where the code is probably better
> rewritten in C.
> 
> Or would this be a sort of "half-way house" - this is where we need
> more performance, let's spend two minutes tweaking it in Python rather
> than dropping to C - to get some of the performance gains?

Not really sure.

I would guess, that really tight inner loops should be rewritten with
Cython (never tried it though) or in C if performance is really that
critical.

On the other hand it could be nice to get a certain performance increase
with some pragmas without rendering the code completely unreadable.

I have loads of places in my code, where name lookups were required only
at compile time or at the first time when code is executed.

For every consecutive call a cached object-reference could be used.




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What Programing Language are the Largest Website Written In?

2011-08-02 Thread Xah Lee
On Jul 31, 11:38 am, gavino  wrote:
> On Jul 13, 1:04 pm, ccc31807  wrote:
>
> > On Jul 12, 7:54 am, Xah Lee  wrote:
>
> > > maybe this will be of interest.
>
> > > 〈What Programing Language Are the Largest Website Written 
> > > In?〉http://xahlee.org/comp/website_lang_popularity.html
>
> > About five years ago, I did some pretty extensive research, and
> > concluded that the biggest sites were written serverside with JSP.
> > Obviously, this wouldn't include The Biggest site, but if you were a
> > big, multinational corporation, or listed on the NYSE, you used JSP
> > for your web programming.
>
> > I doubt very seriously PHP is used for the biggest sites -- I'd still
> > guess JSP, or maybe a MS technology (not VB), but it's only a guess.
>
> > CC.
>
> facebook is php
>
> myspace is microsoft
>
> aol was tcl and aolserver c embedding tcl interp
>
> priceline is lisp
>
> reddit is python was lisp orig
>
> amazon was perl
>
> livejournal was perl

thanks Kevin. Rarely seen you useful. :)

 Xah
-- 
http://mail.python.org/mailman/listinfo/python-list


pygtk

2011-08-02 Thread 守株待兔
please see my attachment -- which widget are region1 and region2?
how do i make them?

re2
Description: Binary data
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Spam

2011-08-02 Thread Steven D'Aprano
David wrote:

> If you click the "more options" link, there is an option in the sub-
> menu to report a post as spam.  You can also forward it along with the
> offending e-mail address to [email protected]

What "more options" link? Are you referring to a specific program? If so,
which?

Remember, people are reading this via email: Thunderbird, Outlook, Outlook
Express, Gmail, Yahoo Mail, Kmail, mutt, and hundreds of other programs;
also via Usenet, again with multiple programs; on the web, via Google
Groups or any of a dozen or so different websites. They don't all have
a "more options" link.


-- 
Steven

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to sort a hash list without generating a new object?

2011-08-02 Thread Chris Rebert
On Tue, Aug 2, 2011 at 11:02 AM, smith jack  wrote:
> the source code is as follows
>
> x={}
> x['a'] = 11
> x['c'] = 19
> x['b'] = 13
> print x
>
> tmp = sorted(x.items(), key=lambda x: x[0])  # increasing order by
> default; if i want a descending order, what should i do?

Pass reverse=True. Read the fine documentation for sorted().

tmp = sorted(x.items(), key=lambda x:x[0], reverse=True)

> # after sorted is called, a list will be generated, and the hash list

It's not a hash list (http://en.wikipedia.org/wiki/Hash_list ), it's a
hash table. In Python, we call them dictionaries or dicts.

> x is not changed at all, how to convert x to a sorted hash list
> without generating a new object?

There is no such thing as a sorted hash table (unless you're using an
exotic variant).
Why do you care whether it generates a new object or not?

If you /really/ need a sorted mapping datatype, google for
"sorteddict" (which is quite distinct from OrderedDict).
Or look for a binary search tree or skip list implementation of some
sort; but these aren't commonly used in Python, so it may be hard to
find a good one.

Cheers,
Chris
--
http://rebertia.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Syntactic sugar for assignment statements: one value to multiple targets?

2011-08-02 Thread gc
Hi everyone! Longtime lurker, hardly an expert, but I've been using
Python for various projects since 2007 and love it.

I'm looking for either (A) suggestions on how to do a very common
operation elegantly and Pythonically, or (B) input on whether my
proposal is PEP-able, assuming there's no answer to A. (The proposal
is sort of like the inverse of PEP 3132; I don't think it has been
proposed before, sorry if I missed it.)

Anyway, I frequently need to initialize several variables to the same
value, as I'm sure many do. Sometimes the value is a constant, often
zero; sometimes it's more particular, such as defaultdict(list). I use
dict() below.

Target lists using comma separation are great, but they don't work
very well for this task. What I want is something like

a,b,c,d,e = *dict()

where * in this context means something like "assign separately to
all." I'm not sure that * would the best sugar for this, but the
normal meaning of * doesn't seem as if it would ever be valid in this
case, and it somehow feels right (to me, anyway).

Statements fitting the form above would get expanded during parsing to
a sequence of separate assignments (a = dict(); b = dict(); c = dict()
and so forth.) That's all there is to it. Compared to the patterns
below, it's svelte, less copy-paste-y (so it removes an opportunity
for inconsistency, where I remember to change a-d to defaultdict(list)
but forget with e), and it doesn't require me to keep count of the
number of variables I'm initializing.

This would update section 6.2 of the language reference and require a
small grammar expansion.

But: Is there already a good way to do this that I just don't know?
Below, I compare four obvious patterns, three of which are correct but
annoying and one of which is incorrect in a way which used to surprise
me when I was starting out.

# Option 1 (separate lines)
# Verbose and annoying, particularly when the varnames are long and of
# irregular length

a = dict()
b = dict()
c = dict()
d = dict()
e = dict()

# Option 2 (one line)
# More concise but still pretty annoying, and hard to read (alternates
# variables and assignments)

a = dict(); b = dict(); c = dict(); d = dict(); e = dict()

# Option 3 (multiple target list: this seems the most Pythonic, and is
# normally what I use)
# Concise, separates variables from assignments, but somewhat
# annoying; have to change individually and track numbers on both sides.

a,b,c,d,e = dict(),dict(),dict(),dict(),dict()

# Option 4 (iterable multiplication)
# Looks better, and if the dict() should be something else, you only
# have to change it once, but the extra brackets are ugly and you still
# have to keep count of the targets...

a,b,c,d,e = [dict()] * 5

# and it will bite you...

>>> a[1] = 1
>>> b
{1: 1}
>>> id(a) == id(b)
True

# Gotcha!

# Other forms of 4 also have this behavior:

a,b,c,d,e = ({},) * 5
>>> a[1] = 1
>>> b
{1: 1}

Alternatively, is there a version of iterable multiplication that
creates new objects rather than just copying the reference? That would
solve part of the problem, though it would still look clunky and you'd
still have to keep count.

Any thoughts? Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hardlink sub-directories and files

2011-08-02 Thread Dan Stromberg
On Tue, Aug 2, 2011 at 3:13 AM, Thomas Jollans  wrote:

> On 02/08/11 11:32, loial wrote:
> > I am trying to hardlink all files in a directory structure using
> > os.link.
> >
> > However I do not think it is possible to hard link directories ?
>

That is pretty true.  I've heard of hardlinked directories on Solaris, but
that's kind of an exception to the general rule.


> > So presumably I would need to do a mkdir for each sub-directory
> > encountered?
> > Or is there an easier way to hardlink everything in a directory
> > structure?.
> >
> > The requirement is for hard links, not symbolic links
> >
>
> Yes, you have to mkdir everything. However, there is an easier way:
>
> subprocess.Popen(['cp','-Rl','target','link'])
>
> This is assuming that you're only supporting Unices with a working cp
> program, but as you're using hard links, that's quite a safe bet, I
> should think.
>

A little more portable way:

$ cd from; find . -print | cpio -pdlv ../to
cpio: ./b linked to ../to/./b
../to/./b
cpio: ./a linked to ../to/./a
../to/./a
cpio: ./c linked to ../to/./c
../to/./c
../to/./d
cpio: ./d/1 linked to ../to/./d/1
../to/./d/1
cpio: ./d/2 linked to ../to/./d/2
../to/./d/2
cpio: ./d/3 linked to ../to/./d/3
../to/./d/3
0 blocks

However, you could do it without a shell command (IOW in pure python) using
os.path.walk().
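
Roughly like this with os.walk, for instance (an untested sketch):

import os

def hardlink_tree(src, dst):
    # directories can't be hardlinked, so mkdir them; link only files
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for name in filenames:
            os.link(os.path.join(dirpath, name),
                    os.path.join(target_dir, name))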
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Syntactic sugar for assignment statements: one value to multiple targets?

2011-08-02 Thread Chris Angelico
On Wed, Aug 3, 2011 at 2:45 AM, gc  wrote:
> Anyway, I frequently need to initialize several variables to the same
> value, as I'm sure many do. Sometimes the value is a constant, often
> zero; sometimes it's more particular, such as defaultdict(list). I use
> dict() below.

If it's an immutable value (such as a constant integer), you can use
syntax similar to C's chained assignment:

a=b=c=0

If you do this with dict(), though, it'll assign the same dictionary
to each of them - not much use.

> # Option 3 (multiple target list: this seems the most Pythonic, and is
> normally what I use)
> # Concise, separates variables from assignments, but somewhat
> annoying; have to change individually and track numbers on both sides.
>
> a,b,c,d,e = dict(),dict(),dict(),dict(),dict()

I think this is probably the best option, although I would be inclined
to use dictionary-literal syntax:
a,b,c,d,e = {},{},{},{},{}

It might be possible to do something weird with map(), but I think
it'll end up cleaner to do it this way.
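
For what it's worth, a map() version might look something like this (a
sketch; the lambda is just a throwaway factory, and you still have to
keep the count matched to the targets):

a, b, c, d, e = map(lambda _: {}, range(5))   # five distinct dicts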

Chris Angelico
-- 
http://mail.python.org/mailman/listinfo/python-list


with statement and context managers

2011-08-02 Thread Steven D'Aprano
I'm not greatly experienced with context managers and the with statement, so
I would like to check my logic.

Somebody (doesn't matter who, or where) stated that they frequently use this
idiom:

spam = MyContextManager(*args)
for ham in my_iter:
with spam:
 # do stuff


but to me that looks badly wrong. Surely the spam context manager object
will exit after the first iteration, and always raise an exception on the
second? But I don't quite understand context managers enough to be sure.


I've tested it with two examples:

# Simple example using built-in file context manager.

>>> spam = open('aaa')
>>> for ham in range(5):
...     with spam:
...         print ham
...
0
Traceback (most recent call last):
  File "", line 2, in 
ValueError: I/O operation on closed file


# Slightly more complex example.

>>> from contextlib import closing
>>> import urllib
>>> spam = closing(urllib.urlopen('http://www.python.org'))
>>> for ham in range(5):
...     with spam as page:
...         print ham, sum(len(line) for line in page)
...
0 18486
1
Traceback (most recent call last):
  File "", line 3, in 
  File "", line 3, in 
  File "/usr/local/lib/python2.7/socket.py", line 528, in next
line = self.readline()
  File "/usr/local/lib/python2.7/socket.py", line 424, in readline
recv = self._sock.recv
AttributeError: 'NoneType' object has no attribute 'recv'




Am I right to expect that the above idiom cannot work? If not, what sort of
context managers do work as shown?




-- 
Steven

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to sort a hash list without generating a new object?

2011-08-02 Thread Dan Stromberg
On Tue, Aug 2, 2011 at 5:53 PM, Chris Rebert  wrote:

> On Tue, Aug 2, 2011 at 11:02 AM, smith jack  wrote:
> > the source code is as follows
> >
> > x={}
> > x['a'] = 11
> > x['c'] = 19
> > x['b'] = 13
> > print x
>
> If you /really/ need a sorted mapping datatype, google for
> "sorteddict" (which is quite distinct from OrderedDict).
> Or look for a binary search tree or skip list implementation of some
> sort; but these aren't commonly used in Python, so it may be hard to
> find a good one.
>

I've found a need for such a thing a couple of times.

Anyway, here are some other possibilities:

http://stromberg.dnsalias.org/~dstromberg/treap/
http://pypi.python.org/pypi/bintrees/0.3.0

The treap code is considered by its author (me) production-quality, has an
extensive test suite, and is known to work on CPython 2.x, CPython 3.x, PyPy
and Jython.  The CPython versions can optionally be sped up with a Cython
variant of the same code (autogenerated from a single source file using m4).
While I test on all four regularly, lately I mostly run the pure Python
version in production on PyPy.

EG:

$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import treap
>>> t = treap.treap()
>>> import random
>>> for i in xrange(10):
...     t[random.random()] = random.random()
...
>>> print list(t.items())
[(0.049542221325585611, 0.60627903220498502), (0.26787423324282511,
0.95374362416785075), (0.45599886628328978, 0.57612454878587427),
(0.46375501394309371, 0.28130836755784228), (0.54144253493651362,
0.47941229743653202), (0.54584770558330997, 0.49062231291462766),
(0.5592476615748635, 0.39138521009523863), (0.73976131715214732,
0.99783565376628391), (0.7638117918732078, 0.55600393733208187),
(0.88094790991949967, 0.90033960217787801)]
>>>
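
For the simple case in the original post, though, sorting the keys on
demand may be all that's needed (a sketch; it iterates in key order
without building a new mapping object):

>>> x = {'a': 11, 'c': 19, 'b': 13}
>>> for key in sorted(x):
...     print key, x[key]
...
a 11
b 13
c 19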
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: with statement and context managers

2011-08-02 Thread Jack Diederich
On Tue, Aug 2, 2011 at 10:15 PM, Steven D'Aprano
 wrote:
> I'm not greatly experienced with context managers and the with statement, so
> I would like to check my logic.
>
> Somebody (doesn't matter who, or where) stated that they frequently use this
> idiom:
>
> spam = MyContextManager(*args)
> for ham in my_iter:
>    with spam:
>         # do stuff
>
[snip]
> # Simple example using built-in file context manager.
>
> >>> spam = open('aaa')
> >>> for ham in range(5):
> ...     with spam:
> ...         print ham
> ...
> 0
> Traceback (most recent call last):
>  File "", line 2, in 
> ValueError: I/O operation on closed file

file_context = lambda: open('aaa')
for i in range(3):
    with file_context():
        print "hello"

... but if the context is short it is clearer and time-saving to _not_
alias it.  If the context is sufficiently complicated then it is worth
making the complex code into a first-class context manager -
contextlib.contextmanager makes this very easy and extremely readable.
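
For example, a minimal sketch (file_lines is a made-up name; the
decorator turns the generator into a factory producing fresh, single-use
context managers):

from contextlib import contextmanager

@contextmanager
def file_lines(path):
    f = open(path)         # setup runs on entry
    try:
        yield f            # the body of the with block runs here
    finally:
        f.close()          # cleanup is guaranteed on exit

for i in range(3):
    with file_lines('aaa') as f:   # a fresh context each iteration
        print f.readline()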

-Jack
-- 
http://mail.python.org/mailman/listinfo/python-list


code generation

2011-08-02 Thread Rita
Hello,

This isn't much of a python question but a general algorithm question.

I plan to take the following string as input, and I would like to generate
something like this.

input: a->(b,c)->d
output:
parent a, child b c
parent b c child d

Are there any libraries or tools which will help me evaluate items like this
better? I am mostly looking for syntax and sanity checking.

Any thoughts?





-- 
--- Get your facts first, then you can distort them as you please.--
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: code generation

2011-08-02 Thread Dan Stromberg
Perhaps:

http://code.google.com/p/python-graph/

On Tue, Aug 2, 2011 at 8:03 PM, Rita  wrote:

> Hello,
>
> This isn't much of a python question but a general algorithm question.
>
> I plan to input the following string and I would like to generate something
> like this.
>
> input: a->(b,c)->d
> output:
> parent a, child b c
> parent b c child d
>
> Are there any libraries or tools which will help me evaluate items like
> this better? I am mostly looking for syntax and sanity checking.
>
> Any thoughts?
>
>
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
>
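
A minimal sketch of the parsing side, for what it's worth (assuming '->'
separates stages and parentheses group comma-separated names; parse_chain
is a made-up helper, and a graph library could take over from there):

def parse_chain(s):
    stages = []
    for part in s.split('->'):
        part = part.strip()
        if part.startswith('(') and part.endswith(')'):
            stages.append([p.strip() for p in part[1:-1].split(',')])
        else:
            stages.append([part])
    return stages

chain = parse_chain('a->(b,c)->d')
for parents, children in zip(chain, chain[1:]):
    print 'parent %s, child %s' % (' '.join(parents), ' '.join(children))

# prints:
# parent a, child b c
# parent b c, child d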
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: with statement and context managers

2011-08-02 Thread Nobody
On Wed, 03 Aug 2011 12:15:44 +1000, Steven D'Aprano wrote:

> I'm not greatly experienced with context managers and the with statement, so
> I would like to check my logic.
> 
> Somebody (doesn't matter who, or where) stated that they frequently use this
> idiom:
> 
> spam = MyContextManager(*args)
> for ham in my_iter:
> with spam:
>  # do stuff
> 
> 
> but to me that looks badly wrong. Surely the spam context manager object
> will exit after the first iteration, and always raise an exception on the
> second? But I don't quite understand context managers enough to be sure.

It depends upon the implementation of MyContextManager. If it's
implemented using the contextlib.contextmanager decorator, then you're
correct: you can only use it once. OTOH, if you implement your own class
with __enter__ and __exit__ methods, you can use the same context manager
object multiple times.
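
For example, a minimal sketch of a reusable one (Stopwatch is a made-up
class; each pass through the with statement calls __enter__ and __exit__
afresh on the same object):

import time

class Stopwatch(object):
    def __enter__(self):
        self.start = time.time()
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        print 'elapsed: %.6fs' % (time.time() - self.start)
        return False   # don't suppress exceptions

timer = Stopwatch()
for i in range(3):
    with timer:            # the same object can be entered repeatedly
        sum(xrange(100000))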

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Hardlink sub-directories and files

2011-08-02 Thread Kushal Kumaran
On Wed, Aug 3, 2011 at 7:29 AM, Dan Stromberg  wrote:
>
> On Tue, Aug 2, 2011 at 3:13 AM, Thomas Jollans  wrote:
>>
>> On 02/08/11 11:32, loial wrote:
>> > I am trying to hardlink all files in a directory structure using
>> > os.link.
>> >
>> > However I do not think it is possible to hard link directories ?
>
> That is pretty true.  I've heard of hardlinked directories on Solaris, but
> that's kind of an exception to the general rule.
>

In APUE, Richard Stevens says only root could do this, if it is
supported by the system at all.  In a footnote, he additionally
mentions he screwed up his filesystem by creating a loop of hardlinked
directories while writing that section of the book.

I suppose it is a good thing systems don't allow that now.

-- 
regards,
kushal
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: PyWart: os.path needs immediate attention!

2011-08-02 Thread alex23
Andrew Berg  wrote:
> He hasn't replied to his last two troll threads, though. It does seem
> odd to write a wall of text and then not respond to replies. To be fair,
> though, most replies either mock him or point out that he's a troll. :D

His recent rants do seem a lot more Xah-Lee-like; a huge nonsensical
diatribe that badly applies logic to try and mask its emotive agenda.
A lot of the same "great mind struggling against the tyranny of
idiots" subtext too.

Not so much swearing though.

-- 
http://mail.python.org/mailman/listinfo/python-list