Re: [Python-Dev] Python Benchmarks
Here are my suggestions:

- While running benchmarks, don't listen to music, watch videos, use the keyboard/mouse, or run anything other than the benchmark code. Seems like common sense to me.

- I would average the timings of runs instead of taking the minimum value, as sometimes benchmarks could be running code that is not deterministic in its calculations (it could be using random numbers that affect convergence).

- Before calculating the average I would throw out samples outside 3 sigmas (the outliers). This would eliminate the samples that are out of whack due to events that are out of our control. To use this approach it would be necessary to run some minimum number of times. I believe 30-40 samples would be necessary, but I'm no expert in statistics. I base this on my recollection of a study on this I did some time in the late 90s. I used to have a better feel for the number of samples required based on the number of sigmas used to determine the outliers, but I have to confess that I just normally use a minimum of 100 samples to play it safe. I'm sure with a little experimentation with benchmarks the proper number of samples could be determined.

Here is a passage I found at http://www.statsoft.com/textbook/stbasic.html#Correlationsf that is related.

'''Quantitative Approach to Outliers. Some researchers use quantitative methods to exclude outliers. For example, they exclude observations that are outside the range of ±2 standard deviations (or even ±1.5 sd's) around the group or design cell mean. In some areas of research, such "cleaning" of the data is absolutely necessary. For example, in cognitive psychology research on reaction times, even if almost all scores in an experiment are in the range of 300-700 milliseconds, just a few "distracted reactions" of 10-15 seconds will completely change the overall picture.
Unfortunately, defining an outlier is subjective (as it should be), and the decisions concerning how to identify them must be made on an individual basis (taking into account specific experimental paradigms and/or "accepted practice" and general research experience in the respective area). It should also be noted that in some rare cases, the relative frequency of outliers across a number of groups or cells of a design can be subjected to analysis and provide interpretable results. For example, outliers could be indicative of the occurrence of a phenomenon that is qualitatively different than the typical pattern observed or expected in the sample, thus the relative frequency of outliers could provide evidence of a relative frequency of departure from the process or phenomenon that is typical for the majority of cases in a group.'''

Now I personally feel that using a 1.5 or 2 sigma approach is rather loose for the case of benchmarks, and my suggestion of 3 might be too tight. From experimentation we might find that 2.5 is more appropriate. I usually use this approach while reviewing data obtained by fairly accurate sensors, so being conservative and using 3 sigmas works well for those cases.

The last statement in the passage is worth noting, as a high ratio of outliers could be used as an indication that the benchmark results for a particular run are invalid. This could be used to throw out bad results due to someone starting to listen to music while the benchmarks are running, antivirus software starting to run, etc.

- Another improvement to benchmarks can be obtained when both the old and new code are available to be benchmarked together. By running the benchmarks of both codes together we could eliminate the effects of noise, if we assume noise at a given point in time would be applied to both sets of code. Here is a modified version of the code that Andrew wrote previously to show this more clearly than my words.
    import time

    def compute_old():
        x = 0
        for i in range(1000):
            for j in range(1000):
                x = x + 1

    def compute_new():
        x = 0
        for i in range(1000):
            for j in range(1000):
                x += 1

    def bench():
        # Time both versions back to back so that noise present at a
        # given moment affects both measurements.
        # Note: time.clock() was removed in Python 3.8;
        # time.perf_counter() is the replacement there.
        t1 = time.clock()
        compute_old()
        t2 = time.clock()
        compute_new()
        t3 = time.clock()
        return t2-t1, t3-t2

    times_old = []
    times_new = []
    for i in range(1000):
        time_old, time_new = bench()
        times_old.append(time_old)
        times_new.append(time_new)

John

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
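The outlier rejection and the "high ratio of outliers means a bad run" check described in this message could be sketched roughly as follows. This is only an illustration, not code from the thread: the function names, the single clipping pass, and the 5% cutoff for declaring a run invalid are all assumptions made for the example.

```python
import statistics

def sigma_clip(samples, n_sigma=3.0):
    """Split samples into (kept, rejected) using one pass of n-sigma clipping."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # needs at least two samples
    kept, rejected = [], []
    for s in samples:
        if abs(s - mean) <= n_sigma * sd:
            kept.append(s)
        else:
            rejected.append(s)
    return kept, rejected

def summarize(samples, n_sigma=3.0, max_outlier_ratio=0.05):
    """Average the kept samples and flag the run if too many were rejected.

    max_outlier_ratio is an assumed threshold, not a value from the thread.
    """
    kept, rejected = sigma_clip(samples, n_sigma)
    ratio = len(rejected) / len(samples)
    return {
        "mean": statistics.mean(kept),
        "outlier_ratio": ratio,
        "valid": ratio <= max_outlier_ratio,
    }

def mean_paired_difference(times_old, times_new):
    """Mean of per-run (old - new) differences from interleaved timings."""
    return statistics.mean(o - n for o, n in zip(times_old, times_new))
```

With the times_old and times_new lists collected by the loop above, summarize(times_old) and summarize(times_new) give the clipped averages, and mean_paired_difference(times_old, times_new) compares the two versions run by run.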
Re: [Python-Dev] Python-Dev Digest, Vol 27, Issue 44
> Date: Tue, 11 Oct 2005 09:51:06 -0400
> From: Tim Peters <[EMAIL PROTECTED]>
> Subject: Re: [Python-Dev] PythonCore\CurrentVersion
> To: Martin v. Löwis <[EMAIL PROTECTED]>
> Cc: python-dev@python.org
> Message-ID:
> <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> [Tim Peters]
> >>> never before this year -- maybe sys.path _used_ to contain the current
> >>> directory on Linux?).
>
> [Fred L. Drake, Jr.]
> >> It's been a long time since this was the case on Unix of any variety; I
> >> *think* this changed to the current state back before 2.0.
>
> [Martin v. Löwis]
> > Please check again:
> >
> > [GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import sys
> > >>> sys.path
> > ['', '/usr/lib/python23.zip', '/usr/lib/python2.3',
> > '/usr/lib/python2.3/plat-linux2', '/usr/lib/python2.3/lib-tk',
> > '/usr/lib/python2.3/lib-dynload',
> > '/usr/local/lib/python2.3/site-packages',
> > '/usr/lib/python2.3/site-packages',
> > '/usr/lib/python2.3/site-packages/Numeric',
> > '/usr/lib/python2.3/site-packages/gtk-2.0', '/usr/lib/site-python']
> >
> > We still have the empty string in sys.path, and it still
> > denotes the current directory.
>
> Well, that's in interactive mode, and I see sys.path[0] == "" on both
> Windows and Linux then. I don't see "" in sys.path on either box in
> batch mode, although I do see the absolutized path to the current
> directory in sys.path in batch mode on Windows but not on Linux -- but
> Mark Hammond says he doesn't see (any form of) the current directory
> in sys.path in batch mode on Windows.
>
> It's a bit confusing ;-)
>

Been bit by this in the past. On Windows, it's a relative path in interactive mode and an absolute path in non-interactive mode.
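One way to see which case applies on a given box is to print the first entry of sys.path and classify it, once from a script and once from the interactive prompt. A minimal sketch; describe_entry is a hypothetical helper written for this illustration, and what it reports depends on the platform and Python version.

```python
import os
import sys

def describe_entry(entry):
    """Classify a sys.path entry as empty, absolute, or relative."""
    if entry == "":
        return "empty string"
    if os.path.isabs(entry):
        return "absolute path"
    return "relative path"

# Run this both as a script and at the interactive prompt and compare.
if sys.path:
    print(repr(sys.path[0]), "->", describe_entry(sys.path[0]))
```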
Re: [Python-Dev] Autoloading? (Making Queue.Queue easier to use)
Greg Ewing wrote:
>
> Guido van Rossum wrote:
>
> > I see no need. Code that *doesn't* need Queue but does use threading
> > shouldn't have to pay for loading Queue.py.
>
> However, it does seem awkward to have a whole module
> providing just one small class that logically is so
> closely related to other threading facilities.
>
> What we want in this kind of situation is some sort
> of autoloading mechanism, so you can import something
> from a module and have it trigger the loading of another
> module behind the scenes to provide it.
>

Bad idea unless it is tied to a namespace, so that users know where this auto-loaded functionality is coming from. Otherwise it's just as bad as 'from xxx import *'.

John M. Camara
Re: [Python-Dev] Autoloading? (Making Queue.Queue easier to use)
> Guido van Rossum writes:
> > Code that *doesn't* need Queue but does use threading
> > shouldn't have to pay for loading Queue.py.
>
> Greg Ewing responds:
> > What we want in this kind of situation is some sort
> > of autoloading mechanism, so you can import something
> > from a module and have it trigger the loading of another
> > module behind the scenes to provide it.
>
> John Camera comments:
> > Bad idea unless it is tied to a namespace. So that users knows
> > where this auto-loaded functionality is coming from. Otherwise
> > it's just as bad as 'from xxx import *'.
>
> Michael Chermside comments:
> John, I think what Greg is suggesting is that we include Queue
> in the threading module, but that we use a Clever Trick(TM) to
> address Guido's point by not actually loading the Queue code
> until the first time (if ever) that it is used.
>
> I'm not familiar with the clever trick Greg is proposing, but I
> do agree that _IF_ everything else were equal, then Queue seems
> to belong in the threading module. My biggest reason is that I
> think anyone who is new to threading probably shouldn't use any
> communication mechanism OTHER than Queue or something similar
> which has been carefully designed by someone knowlegable.
>

From Greg's comments I'm not sure which of two things he wants:

- "import threading" makes Queue available in the local namespace, bound/loaded when it is first needed, which saves him from typing "import Queue" immediately after "import threading"; or

- Queue becomes part of the threading namespace and is bound/loaded when it is first needed, so that Queue becomes accessible through threading.Queue.

When Greg says

> However, it does seem awkward to have a whole module
> providing just one small class that logically is so
> closely related to other threading facilities.

it sounds like he feels Queue should just be part of threading, but queues can be used in other contexts besides threading. So having separate modules is a good thing.
The idea of delaying an import until it's needed sounds like a great idea, and having built-in support for this would be great. Here are 2 possible suggestions for the import statement:

    import Queue asneeded
    delayedimport Queue  # can't think of a better name at this time

But auto-loading a module by a module on behalf of a client just doesn't sit too well with me. Think of the confusion it would cause: is Queue in the threading module a reference to a Queue in a Queue module, or a new class altogether? If we go down this slippery slope we will see modules like array, struct, etc. getting referenced and auto-loaded on behalf of the client. Where will it end?

John M. Camara
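For illustration only, a delayed import that stays tied to an explicit name can be approximated without new syntax by a small proxy object. This is a sketch, not the mechanism Greg proposed: LazyModule is a hypothetical helper, and the example uses the Python 3 module name queue (the module was called Queue in Python 2).

```python
import importlib

class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself,
        # so _name and _module never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# The name on the left tells the reader where the functionality comes from.
Queue = LazyModule("queue")

# The real import happens here, on first use.
q = Queue.Queue()
q.put("job")
```

The caller still writes the module name explicitly, so there is no question about where Queue.Queue comes from; only the loading is deferred.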
Re: [Python-Dev] Autoloading? (Making Queue.Queue easier to use)
> John Camera writes:
> > It sounds like he feels Queue should just be part of threading but queues
> > can be used in other contexts besides threading. So having separate
> > modules is a good thing.
>
> Michael Chermside
> Perhaps I am wrong here, but the Queue.Queue class is designed specifically
> for synchronization, and I have always been under the impression that
> it was probably NOT the best tool for normal queues that have nothing
> to do with threading. Why incur the overhead of synchronization locks
> when you don't intend to use them. I would advise against using Queue.Queue
> in any context besides threading.

I haven't used the Queue class before, as I normally use a list for a queue. I just assumed a Queue was just a queue that was perhaps optimized for performance. I guess I would have expected the Queue class as defined in the standard library to have a different name if it wasn't just a queue. Well, I should have known better than to make assumptions on this list. :)

I now see where Greg is coming from, but I'm still not comfortable having it in the threading module. To me threads and queues are two different beasts.

John M. Camara
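The distinction Michael draws can be shown side by side. A minimal sketch, assuming Python 3 names (queue instead of Queue); collections.deque stands in for the plain, unsynchronized queue, and the None sentinel is just one common convention for signalling the end of the data.

```python
import queue
import threading
from collections import deque

# Plain FIFO with no locking: fine when only one thread touches it.
fifo = deque()
fifo.append("a")
fifo.append("b")
first = fifo.popleft()

def producer(q, n):
    """Put n integers on the queue, then a None sentinel."""
    for i in range(n):
        q.put(i)
    q.put(None)

def consume_all(n):
    """Collect everything a producer thread sends through a queue.Queue."""
    q = queue.Queue()
    worker = threading.Thread(target=producer, args=(q, n))
    worker.start()
    items = []
    while True:
        item = q.get()  # blocks until the producer has put something
        if item is None:
            break
        items.append(item)
    worker.join()
    return items
```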
Re: [Python-Dev] Autoloading? (Making Queue.Queue easier to use)
> Skip writes:
> Is the Queue class very useful outside a multithreaded context? The notion
> of a queue as a data structure has meaning outside of threaded applications.
> Its presence might seduce a new programmer into thinking it is subtly
> different than it really is. A cursory test suggests that it works, though
> q.get() on a empty queue seems a bit counterproductive. Also, Queue objects
> are probably quite a bit less efficient than lists. Taken as a whole,
> perhaps a stronger attachment with the threading module isn't such a bad
> idea.
>

Maybe Queue belongs in a module called synchronize to avoid any confusion.

John M. Camara
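Skip's remark about q.get() on an empty queue refers to the fact that a plain get() blocks until another thread puts an item. In single-threaded code the non-blocking form is the safer choice. A small sketch using the Python 3 module name queue; try_get is a hypothetical helper for this example.

```python
import queue

def try_get(q, default=None):
    """Return the next item, or default if the queue is currently empty."""
    try:
        return q.get_nowait()
    except queue.Empty:
        return default

q = queue.Queue()
print(try_get(q, "nothing queued"))  # empty queue: returns the default
q.put(1)
print(try_get(q))                    # returns the item that was put
```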