That helps a lot. It was stupid of me to not compare the output from
the two functions.
The second for loop is actually skiiping over some of the values and
hence it takes longer to run when the "plst" is generated.
On Wed, Jun 24, 2009 at 8:16 AM, Luke
Paireepinart wrote:
> Yash wrote:
>>
>> Hello,
>> I am a newbie to python. I am trying to practice writing a code in
>> python and am trying to write a function to generate prime numbers
>> using sieve of Eratosthenes. (code follows at the end of mail)
>>
>> The way it is coded, if I change the for loop over "j" from
>> for j in range( (2*i*(i + 1)), np, (2*i +1)):
>> to
>> for j in range( (2*i*(i + 1)), np, 2*(2*i +1)):
>> I should get the same output for the two (which I do).
>>
>> The second for loop has to make roughly half the iterations compared
>> to the first, because I increased the step-size by factor 2.
>> However when I time the code for the two different for loops, I
>> observe that the code takes longer to run as compared the first for
>> loop.
>>
>
> How do you know it's taking longer to run? How are you profiling your code?
I use the clock() function from time module to measure the total run time.
>
> I just ran your code through timeit, and you're correct, changing that line
> does make the code take longer to run.
> The reason for this is not extremely clear-cut and you'll have to do further
> analysis to confirm this (look into the "timeit" module among others)
> but here is my suspicion:
> try printing the return from primes
>
> print primes(1000)
>
> try the line the original way you had it, and then try it with the second
> line. You do NOT get the same output.
> In fact, it's significantly different.
> let's see exactly how many extra numbers we have with the second loop style
> (primes2 is the same function, but with your second loop definition.)
>
> x = set(primes(1))
> y = set(primes2(1))
> print len(y)
> print len(x)
> print len(y - x)
>
> this gives us the fact that there are 1881 incorrect results in "y"!
>
> Also, it claims there are 1229 primes less than 10,000 which seems
> unrealistically-high to me (though I very well may be incorrect.) so you may
> want to look into that.
>
> So what it comes down to is that the reason the second way is taking longer
> is because it's simply got a larger set that it's dealing with - when you
> iterate over the items to generate plst after your for loop, it's having to
> iterate over a larger set, and when returning the list, it's having to
> return a larger amount of data (although I don't think that would actually
> have any speed difference as it would just return the reference to the
> set...)
>
> The answer to your original question - why does it take longer to execute
> over a smaller range - it doesn't. It won't take longer to execute over a
> smaller range.
> When you suspect something like this, that is completely counter-intuitive,
> write a new program that tests JUST that feature, and compare based on that.
> As you've seen from this problem, there can be minor nuances in your program
> that effect runtime in very odd ways that you did not expect.
>
> Moral of the story: print your output, use various methods at your disposal
> to check that they're actually equal (use set subtraction / intersections
> and such, rather than relying on eyeballing it) when comparing data sets.
>
>
> Also a few style notes ( just helpful suggestions, not meant to be overly
> critical):
> 1) please, please! don't use semicolons at the end of your lines. It is NOT
> required syntax in Python and it therefore just looks cluttered /
> unnecessary.
> 2) you do not need to call print() as a function unless you're using Python
> 3.0, but it doesn't hurt to. you can just use print 'somestring' instead.
> 3) be consistent with your spacing. for example, in some cases you have
> lines ("else :") with an extra space before the colon, and sometimes you
> don't ("if plst[i]:"). I would suggest you use no space between the
> condition and the colon, but it doesn't really matter so long as you're
> consistent. Just don't be inconsistent.
> 4) don't put the condition of your "if" statement in parenthesis if it's not
> necessary (the line "if ( Nmax < 2 ) :" is functionally equivalent to "if
> Nmax < 2:")
> 5) you catch an exception if Nmax can't be coerced to an integer, output a
> "sorry, Nmax couldn't be converted" message, and then you go ahe