Re: [Python-Dev] Python Benchmarks

2006-06-03 Thread john . m . camara
Here are my suggestions:

- While running benchmarks, don't listen to music, watch videos, use the
keyboard/mouse, or run anything other than the benchmark code.  Seems like
common sense to me.

- I would average the timings of runs instead of taking the minimum value, as
benchmarks could sometimes be running code that is not deterministic in its
calculations (e.g., code that uses random numbers that affect convergence).

- Before calculating the average, I would throw out samples outside 3
sigmas (the outliers).  This would eliminate samples that are out of whack
due to events outside our control.  To use this approach, the benchmark would
have to be run some minimum number of times.  I believe 30-40 samples would
be necessary, but I'm no expert in statistics.  I base this on my recollection
of a study I did on this some time in the late 90s.  I used to have a better
feel for the number of samples required for a given number of sigmas used to
identify the outliers, but I have to confess that I normally just use a
minimum of 100 samples to play it safe.  I'm sure that with a little
experimentation with benchmarks, the proper number of samples could be
determined.

Here is a passage I found at 
http://www.statsoft.com/textbook/stbasic.html#Correlationsf that is related.

'''Quantitative Approach to Outliers. Some researchers use quantitative methods 
to exclude outliers. For example, they exclude observations that are outside 
the range of ±2 standard deviations (or even ±1.5 sd's) around the group or
design cell mean. In some areas of research, such "cleaning" of the data is 
absolutely necessary. For example, in cognitive psychology research on reaction 
times, even if almost all scores in an experiment are in the range of 300-700 
milliseconds, just a few "distracted reactions" of 10-15 seconds will 
completely change the overall picture. Unfortunately, defining an outlier is 
subjective (as it should be), and the decisions concerning how to identify them 
must be made on an individual basis (taking into account specific experimental 
paradigms and/or "accepted practice" and general research experience in the 
respective area). It should also be noted that in some rare cases, the relative 
frequency of outliers across a number of groups or cells of a design can be
subjected to analysis and provide interpretable results. For
example, outliers could be indicative of the occurrence of a phenomenon that is 
qualitatively different than the typical pattern observed or expected in the 
sample, thus the relative frequency of outliers could provide evidence of a 
relative frequency of departure from the process or phenomenon that is typical 
for the majority of cases in a group.'''

Now, I personally feel that the 1.5 or 2 sigma approach is rather loose for
benchmarks, and the 3 sigmas I suggested might be too tight.  Experimentation
might show that 2.5 is more appropriate.  I usually use this approach when
reviewing data obtained by fairly accurate sensors, so being conservative
with 3 sigmas works well in those cases.

The last statement in the passage is worth noting, as a high ratio of outliers
could be used as an indication that the benchmark results for a particular run
are invalid.  This could be used to throw out bad results caused by someone
starting to listen to music while the benchmarks are running, antivirus
software starting up, etc.

- Another improvement can be obtained when both the old and new code are
available to be benchmarked together.  By running the benchmarks of both
versions together, we could eliminate the effects of noise, assuming that
noise at a given point in time applies to both sets of code.  Here is a
modified version of the code Andrew wrote previously, which shows this more
clearly than my words can.

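import time

def compute_old():
    x = 0
    for i in range(1000):
        for j in range(1000):
            x = x + 1

def compute_new():
    x = 0
    for i in range(1000):
        for j in range(1000):
            x += 1

def bench():
    # Originally time.clock(), which was removed in Python 3.8;
    # time.perf_counter() is the high-resolution replacement.
    t1 = time.perf_counter()
    compute_old()
    t2 = time.perf_counter()
    compute_new()
    t3 = time.perf_counter()
    return t2 - t1, t3 - t2

# Old and new code are timed back to back in each iteration, so
# transient system noise tends to hit both halves of a pair.
times_old = []
times_new = []
for i in range(1000):
    time_old, time_new = bench()
    times_old.append(time_old)
    times_new.append(time_new)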
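The statistical suggestions above (averaging instead of taking the minimum, discarding samples beyond k sigmas, flagging a run whose outlier ratio is too high, and comparing old and new code pair by pair) could be applied to lists like times_old and times_new roughly as follows. This is a minimal sketch: the function names and the 5% outlier-ratio threshold are assumptions for illustration, not values from this thread, and it does a single clipping pass rather than repeating until nothing more is rejected. A side note on sample size: with the sample standard deviation, no point in a set of n samples can lie more than (n-1)/sqrt(n) standard deviations from the mean, so a 3-sigma cut cannot reject anything until n is at least 11.

```python
import statistics


def sigma_clip(samples, k=3.0):
    """Split samples into (kept, rejected) using a single k-sigma cut."""
    samples = list(samples)
    if len(samples) < 2:
        return samples, []
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (n - 1)
    if sd == 0:
        return samples, []
    kept = [s for s in samples if abs(s - mean) <= k * sd]
    rejected = [s for s in samples if abs(s - mean) > k * sd]
    return kept, rejected


def summarize(samples, k=3.0, max_outlier_ratio=0.05):
    """Mean of the non-outliers, plus a validity flag for the whole run.

    max_outlier_ratio is an illustrative assumption, not a recommended value.
    """
    kept, rejected = sigma_clip(samples, k)
    ratio = len(rejected) / len(samples)
    return {
        "mean": statistics.mean(kept),
        "outlier_ratio": ratio,
        "valid": ratio <= max_outlier_ratio,
    }


def paired_difference(times_old, times_new, k=3.0):
    """Average per-iteration (old - new) difference after clipping outliers.

    Both halves of a pair come from the same bench() call, so noise that
    hits both of them largely cancels in the difference.
    """
    diffs = [old - new for old, new in zip(times_old, times_new)]
    kept, _ = sigma_clip(diffs, k)
    return statistics.mean(kept)


# Illustrative input only: 20 steady runs and one disturbed run.
runs = [1.0] * 20 + [10.0]
print(summarize(runs))
```

Applied to real data, summarize(times_old) and summarize(times_new) would give per-version averages, and paired_difference(times_old, times_new) the per-iteration gap; a run whose "valid" flag is False would be discarded and repeated.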

John
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python-Dev Digest, Vol 27, Issue 44

2005-10-11 Thread john . m . camara

 
> Date: Tue, 11 Oct 2005 09:51:06 -0400
> From: Tim Peters <[EMAIL PROTECTED]>
> Subject: Re: [Python-Dev] PythonCore\CurrentVersion
> To: Martin v. Löwis <[EMAIL PROTECTED]>
> Cc: python-dev@python.org
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> [Tim Peters]
> >>> never before this year -- maybe sys.path _used_ to contain the current
> >>> directory on Linux?).
>
> [Fred L. Drake, Jr.]
> >> It's been a long time since this was the case on Unix of any variety; I
> >> *think* this changed to the current state back before 2.0.
>
> [Martin v. Löwis]
> > Please check again:
> >
> > [GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import sys
> > >>> sys.path
> > ['', '/usr/lib/python23.zip', '/usr/lib/python2.3',
> > '/usr/lib/python2.3/plat-linux2', '/usr/lib/python2.3/lib-tk',
> > '/usr/lib/python2.3/lib-dynload',
> > '/usr/local/lib/python2.3/site-packages',
> > '/usr/lib/python2.3/site-packages',
> > '/usr/lib/python2.3/site-packages/Numeric',
> > '/usr/lib/python2.3/site-packages/gtk-2.0', '/usr/lib/site-python']
> >
> > We still have the empty string in sys.path, and it still
> > denotes the current directory.
>
> Well, that's in interactive mode, and I see sys.path[0] == "" on both
> Windows and Linux then. I don't see "" in sys.path on either box in
> batch mode, although I do see the absolutized path to the current
> directory in sys.path in batch mode on Windows but not on Linux -- but
> Mark Hammond says he doesn't see (any form of) the current directory
> in sys.path in batch mode on Windows.
>
> It's a bit confusing ;-)
>

I've been bitten by this in the past.  On Windows, it's a relative path in
interactive mode and an absolute path in non-interactive mode.
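Because the form of the current-directory entry in sys.path varies by platform and by interactive versus batch mode, as this thread shows, code that needs to compare entries could normalize them first. A minimal sketch; normalize_path_entry and current_dir_on_path are hypothetical helper names, and treating "" as the current directory follows Martin's description above:

```python
import os
import sys


def normalize_path_entry(entry):
    """Resolve a sys.path entry to an absolute, case-normalized path.

    An empty string denotes the current directory, so it is mapped to
    os.curdir; relative entries are resolved against the current
    working directory at the time of the call.
    """
    return os.path.normcase(os.path.abspath(entry or os.curdir))


def current_dir_on_path(path=None):
    """Report whether the current directory appears in path, in any form."""
    path = sys.path if path is None else path
    cwd = normalize_path_entry(os.getcwd())
    return any(normalize_path_entry(p) == cwd for p in path)


print(current_dir_on_path())
```

With this, "", ".", and the absolute form of the working directory all compare equal, which sidesteps the relative-versus-absolute difference between modes.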


Re: [Python-Dev] Autoloading? (Making Queue.Queue easier to use)

2005-10-12 Thread john . m . camara
Greg Ewing wrote:

> 
> Guido van Rossum wrote:
> 
> > I see no need. Code that *doesn't* need Queue but does use threading
> > shouldn't have to pay for loading Queue.py.
> 
> However, it does seem awkward to have a whole module
> providing just one small class that logically is so
> closely related to other threading facilities.
> 
> What we want in this kind of situation is some sort
> of autoloading mechanism, so you can import something
> from a module and have it trigger the loading of another
> module behind the scenes to provide it.
> 

Bad idea unless it is tied to a namespace, so that users know where this
auto-loaded functionality is coming from.  Otherwise it's just as bad as 'from
xxx import *'.

John M. Camara


Re: [Python-Dev] Autoloading? (Making Queue.Queue easier to use)

2005-10-12 Thread john . m . camara
> > Guido van Rossum writes:
> > Code that *doesn't* need Queue but does use threading
> > shouldn't have to pay for loading Queue.py.
> 
> Greg Ewing responds:
> > What we want in this kind of situation is some sort
> > of autoloading mechanism, so you can import something
> > from a module and have it trigger the loading of another
> > module behind the scenes to provide it.
> 
> John Camara comments:
> > Bad idea unless it is tied to a namespace.  So that users knows
> > where this auto-loaded functionality is coming from.  Otherwise
> > it's just as bad as 'from xxx import *'.
> 
> Michael Chermside comments:
> John, I think what Greg is suggesting is that we include Queue
> in the threading module, but that we use a Clever Trick(TM) to
> address Guido's point by not actually loading the Queue code
> until the first time (if ever) that it is used.
> 
> I'm not familiar with the clever trick Greg is proposing, but I
> do agree that _IF_ everything else were equal, then Queue seems
> to belong in the threading module. My biggest reason is that I
> think anyone who is new to threading probably shouldn't use any
> communication mechanism OTHER than Queue or something similar
> which has been carefully designed by someone knowledgeable.
> 
From Greg’s comments, I’m not sure whether he wants

import threading

to result in

‘Queue’ becoming available in the local namespace and bound/loaded when it is
first needed, thus saving him from typing ‘import Queue’ immediately
after ‘import threading’,

or

Queue becoming part of the threading namespace and bound/loaded when it is
first needed, so that it is accessible through ‘threading.Queue’.
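The second interpretation, where Queue is reached through the threading namespace but its module is only imported on first access, can be approximated with a module-level __getattr__ (PEP 562, Python 3.7+). A minimal sketch, assuming Python 3, where the Queue module is named queue; make_lazy_module and lazy_threading are hypothetical names, not part of the standard library:

```python
import importlib
import types


def make_lazy_module(name, lazy_attrs):
    """Build a module whose listed attributes are imported on first access.

    lazy_attrs maps attribute name -> (module name, object name).
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Only called when normal attribute lookup on the module fails.
        try:
            module_name, object_name = lazy_attrs[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(module_name), object_name)
        # Cache the result so later lookups bypass __getattr__ entirely.
        setattr(module, attr, value)
        return value

    module.__getattr__ = __getattr__
    return module


# Hypothetical stand-in for a threading module that exposes Queue lazily.
lazy_threading = make_lazy_module("lazy_threading", {"Queue": ("queue", "Queue")})
q = lazy_threading.Queue()  # first access imports the queue module
q.put("job")
print(q.get())  # job
```

In a real module file the same effect comes from defining a plain def __getattr__(name): at module level. Note that the attribute is still reached through the lazy_threading namespace, which keeps its origin visible to readers.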

When Greg says

> However, it does seem awkward to have a whole module
> providing just one small class that logically is so
> closely related to other threading facilities.

It sounds like he feels Queue should just be part of threading, but queues can
be used in other contexts besides threading, so having separate modules is a
good thing.

Delaying an import until it’s needed sounds like a great idea, and having
built-in support for it would be great.  Here are two possible suggestions
for the import statement:

import Queue asneeded
delayedimport Queue # can't think of a better name at this time
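The asneeded and delayedimport forms above are hypothetical syntax. A rough library-level approximation is importlib.util.LazyLoader (Python 3.5+), which creates the module object immediately but defers running the module's code until the first attribute access. The sketch below follows the lazy-import recipe in the importlib documentation, with an added shortcut for modules that are already loaded; lazy_import is a hypothetical helper name:

```python
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, deferring execution of its code until first use."""
    if name in sys.modules:
        # Already imported elsewhere; nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)  # locating the module is still eager
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up lazy loading; the body has not run yet
    return module


queue = lazy_import("queue")  # the Python 3 name of the Queue module
q = queue.Queue()             # first attribute access runs the module's code
```

One design consequence: because find_spec runs eagerly, a missing module is still reported at the lazy_import call, but errors raised while executing the module's code surface at first attribute access instead.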

But having one module autoload another on behalf of a client just doesn’t sit
well with me.  And what about the confusion it would cause?  Is Queue in the
threading module a reference to the Queue in the Queue module, or a new class
altogether?  If we go down this slippery slope, we will see modules like array,
struct, etc. getting referenced and auto-loaded on behalf of the client.  Where
will it end?

John M. Camara




Re: [Python-Dev] Autoloading? (Making Queue.Queue easier to use)

2005-10-12 Thread john . m . camara
> John Camara writes:
> > It sounds like he feels Queue should just be part of threading but queues
> > can be used in other contexts besides threading.  So having separate
> > modules is a good thing.
>
> Michael Chermside writes:
> Perhaps I am wrong here, but the Queue.Queue class is designed specifically
> for synchronization, and I have always been under the impression that
> it was probably NOT the best tool for normal queues that have nothing
> to do with threading. Why incur the overhead of synchronization locks
> when you don't intend to use them. I would advise against using Queue.Queue
> in any context besides threading.

I haven't used the Queue class before, as I normally use a list for a queue.
I had just assumed that Queue was a plain queue, perhaps optimized for
performance.  I would have expected the Queue class in the standard library
to have a different name if it wasn't just a queue.
Well, I should have known better than to make assumptions on this list. :)

I now see where Greg is coming from, but I'm still not comfortable having
it in the threading module.  To me, threads and queues are two different
beasts.
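Michael's point is that Queue.Queue pays for locking that a single-threaded queue doesn't need. For comparison, a minimal sketch in Python 3 (where the module is named queue): collections.deque as a plain FIFO, and queue.Queue used for what it was designed for, handing items between threads. The None sentinel used to stop the consumer is an illustrative convention, not part of the queue API:

```python
from collections import deque
import queue
import threading

# Single-threaded FIFO: no locking needed. deque.popleft() is O(1),
# whereas list.pop(0) shifts every remaining element.
pending = deque(["a", "b", "c"])
first = pending.popleft()  # "a"

# Multi-threaded FIFO: queue.Queue adds a lock and condition variables,
# so get() blocks until a producer thread puts an item.
shared = queue.Queue()
results = []


def consumer():
    while True:
        item = shared.get()
        if item is None:  # sentinel value signals shutdown
            break
        results.append(item)


worker = threading.Thread(target=consumer)
worker.start()
for item in ["x", "y", "z"]:
    shared.put(item)
shared.put(None)
worker.join()
print(results)  # ['x', 'y', 'z']
```

The blocking get() is exactly what makes Queue useful between threads and awkward as a general-purpose container.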

John M. Camara





Re: [Python-Dev] Autoloading? (Making Queue.Queue easier to use)

2005-10-12 Thread john . m . camara
> Skip writes:
> Is the Queue class very useful outside a multithreaded context?  The notion
> of a queue as a data structure has meaning outside of threaded applications.
> Its presence might seduce a new programmer into thinking it is subtly
> different than it really is.  A cursory test suggests that it works, though
> q.get() on a empty queue seems a bit counterproductive.  Also, Queue objects
> are probably quite a bit less efficient than lists.  Taken as a whole,
> perhaps a stronger attachment with the threading module isn't such a bad
> idea.
> 
Maybe Queue belongs in a module called synchronize to avoid any confusion.
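Skip's remark about q.get() on an empty queue refers to its default blocking behavior: in a single thread, with no other thread to put an item, a plain q.get() waits forever. The non-blocking and timed forms avoid that. A minimal sketch; try_get is a hypothetical helper, while get_nowait(), get(timeout=...) and queue.Empty are part of the standard Python 3 queue module:

```python
import queue


def try_get(q, default=None, timeout=None):
    """Return the next item from q, or `default` if none is available.

    timeout=None means do not wait at all (non-blocking get).
    """
    try:
        if timeout is None:
            return q.get_nowait()  # same as q.get(block=False)
        return q.get(timeout=timeout)  # wait up to `timeout` seconds
    except queue.Empty:
        return default


q = queue.Queue()
print(try_get(q))               # None (a plain q.get() here would block forever)
q.put("ready")
print(try_get(q, timeout=0.1))  # ready
```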

John M. Camara

