Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread Steven D'Aprano
On Sun, 18 Jul 2010 11:51:17 am Richard D. Moores wrote:
[...]
> > On my not-even-close-to-high-end PC, this generates one billion
> > digits in 22 minutes:

> My version took 218 secs.

Lucky for some :)

Your version, on my machine, took 16 minutes - an improvement over my
initial trial, but still much slower than on your PC.

I had initially thought that I was seeing the difference between local 
hard drive access and the speed of my network, as my home directory is 
on a remote machine, but I don't think this is the case. I modified 
your version to write to a file on the local hard drive instead of a 
remote file, but the speed hardly changed: 15 minutes instead of 16. 
The best I was able to get my version down to is 14 minutes, so I guess 
I just have to accept that my PC is seven times slower than yours :)


-- 
Steven D'Aprano


Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread Richard D. Moores
On Sat, Jul 17, 2010 at 18:01, Steven D'Aprano wrote:

> Having generated the digits, it might be useful to look for deviations
> from randomness. There should be approximately equal numbers of each
> digit (100,000,000 each of 0, 1, 2, ..., 9), of each digraph
> (10,000,000 each of 00, 01, 02, ..., 98, 99), trigraphs (1,000,000 each
> of 000, ..., 999) and so forth.

I've been doing a bit of that. I found approx. equal numbers of each
digit (including the zeros :) ). Then I thought I'd look at pairs of
the same digit ('00', '11', and so on). See my script. The results
for the 1-billion-digit file start at line 78, and look good to me. I
might try trigraphs where the 2nd digit is 2 more than the first, and
the third 2 more than the 2nd, wrapping past 9. E.g. '024', '135',
'791', '802'. Or maybe I've had enough. BTW Steve, my script avoids
the problem you mentioned, of counting 2 '55's in a '555' string. I
get only one, but 2 in '5555'. See line 18, in the while loop.

I was surprised that I could read in the whole billion-digit file with
one gulp without running out of memory. Memory usage went to 80% (from
the usual 35%), but no higher except at first, when I saw 98% for a few
seconds, and then a drop to 78-80% where it stayed.
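
If memory had been an issue, the tally could have been done in chunks
instead of one gulp. A minimal sketch (untested; the filename is made
up):

from collections import defaultdict

counts = defaultdict(int)
f = open('billion_digits.txt')   # hypothetical filename
while True:
    chunk = f.read(2**20)        # 1 MB at a time
    if not chunk:
        break
    for c in chunk:
        counts[c] += 1
f.close()
print sorted(counts.items())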

> The interesting question is, if you measure a deviation from the
> equality (and you will), is it statistically significant? If so, is
> it because of a problem with the random number generator, or with my
> algorithm for generating the sample digits?

I was pretty good at statistics long ago -- almost became a
statistician -- but I've pretty much lost what I had. Still, I'd bet
that the deviations I've seen so far are not significant.
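
For anyone who wants to check: a rough chi-square goodness-of-fit
sketch (not from my script; it assumes a counts dict mapping each
digit to its observed count, like the tally above):

def chi_square(counts, total=10**9):
    # compare observed digit counts against the uniform
    # expectation of total/10 per digit
    expected = total / 10.0
    return sum((counts.get(d, 0) - expected) ** 2 / expected
               for d in '0123456789')

# With 9 degrees of freedom, a value above ~16.9 would be significant
# at the 5% level; a value very close to 0 would be suspicious in the
# other direction.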

Thanks for the stimulating challenge, Steve.

Dick


Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread Steven D'Aprano
On Sun, 18 Jul 2010 06:49:39 pm Richard D. Moores wrote:

> I might try 
> trigraphs where the 2nd digit is 2 more than the first, and the third
> 2 more than the 2nd. E.g. '024', '135', '791', '802'. 

Why the restriction? There are only 1000 different trigraphs (10*10*10),
which is nothing.


> Or maybe I've 
> had enough. BTW Steve, my script avoids the problem you mentioned, of
> counting 2 '55's in a '555' string. I get only one, but 2 in '5555'.

Huh? What problem did I mention? 

Taking the string '555', you should get two digraphs: 55_ and _55.
In '5555' you should get three: 55__, _55_, __55. I'd do something like
this (untested):

trigraphs = {}
f = open('digits')
trigraph = f.read(3)  # read the first three digits
trigraphs[trigraph] = 1
while 1:
    c = f.read(1)
    if not c:
        break
    # slide the three-digit window along by one
    trigraph = trigraph[1:] + c
    if trigraph in trigraphs:
        trigraphs[trigraph] += 1
    else:
        trigraphs[trigraph] = 1
f.close()
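
To eyeball the result afterwards, something like this (again
untested):

ranked = sorted(trigraphs.items(), key=lambda kv: kv[1])
print ranked[:3]    # least common trigraphs
print ranked[-3:]   # most common trigraphs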





> See line 18, in the while loop.
>
> I was surprised that I could read in the whole billion file with one
> gulp without running out of memory.

Why? One billion bytes is less than a GB. It's a lot, but not *that* 
much.


> Memory usage went to 80% (from 
> the usual 35%), but no higher except at first, when I saw 98% for a
> few seconds, and then a drop to 78-80% where it stayed.

That suggests to me that your PC probably has 2GB of RAM. Am I close?



-- 
Steven D'Aprano


Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread Dave Angel

Richard D. Moores wrote:
> On Sat, Jul 17, 2010 at 18:01, Steven D'Aprano wrote:
>
>> import random
>> def random_digits(n):
>>     "Return n random digits with one call to random."
>>     return "%0*d" % (n, random.randrange(10**n))

Thanks for implementing what I was suggesting, using zero-fill for
getting a constant-width string from a number.  No need for extra
digits, or funny ranges.

> My version took 218 secs.
>
>> Having generated the digits, it might be useful to look for deviations
>> from randomness. There should be approximately equal numbers of each
>> digit (100,000,000 each of 0, 1, 2, ..., 9), of each digraph
>> (10,000,000 each of 00, 01, 02, ..., 98, 99), trigraphs (1,000,000 each
>> of 000, ..., 999) and so forth.
>
> Yes. I'll do some checking. Thanks for the tips.
>
>> The interesting question is, if you measure a deviation from the
>> equality (and you will), is it statistically significant? If so, is
>> it because of a problem with the random number generator, or with my
>> algorithm for generating the sample digits?
>
> Ah. Can't wait to see what turns up.
>
> Thanks, Steve.
>
> Dick

If you care about the randomness, it's important to measure deviations
from equal, and to make sure not only that they don't vary too much, but
also that they don't vary too little.  If you measured exactly 100
million 5's, you're very unlikely to have a real random string.
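
For instance, under true randomness each digit's count is roughly
Binomial(10**9, 0.1), so a back-of-the-envelope sketch of the spread
you should expect (my arithmetic, not one of the formal tests):

import math
n = 10**9
sd = math.sqrt(n * 0.1 * 0.9)   # std dev of each digit's count
print sd   # ~9487: typical deviations from 100,000,000
           # should be of this order, not zero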


There is a series of tests you could perform, but I no longer have any
references to which ones would be useful.  Years ago I inherited a random
number generator in which the individual values seemed to be quite
random, but adjacent pairs had some very definite patterns.  I ended up
writing a new generator, from scratch, which was both much faster and
much more random.  But I didn't do the testing myself.


DaveA



Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread Richard D. Moores
On Sun, Jul 18, 2010 at 02:26, Steven D'Aprano wrote:
> On Sun, 18 Jul 2010 06:49:39 pm Richard D. Moores wrote:
>
>> I might try
>> trigraphs where the 2nd digit is 2 more than the first, and the third
>> 2 more than the 2nd. E.g. '024', '135', '791', '802'.
>
> Why the restriction? There are only 1000 different trigraphs (10*10*10),
> which is nothing.

Just to see if I could do it.  It seemed interesting.
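
(For the record there are only ten such trigraphs; a quick one-liner
to generate them, wrapping past 9 as in my examples:)

targets = ['%d%d%d' % (d, (d + 2) % 10, (d + 4) % 10) for d in range(10)]
# ['024', '135', '246', '357', '468', '579', '680', '791', '802', '913']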

>> Or maybe I've
>> had enough. BTW Steve, my script avoids the problem you mentioned, of
>> counting 2 '55's in a '555' string. I get only one, but 2 in '5555'.
>
> Huh? What problem did I mention?

Sorry, that was Luke.

> Taking the string '555', you should get two digraphs: 55_ and _55.

That seems wrong to me. When I search on '99' and there's a
'999' I don't want to think I've found 2 instances of '99'.
But that's just my preference.  Instances should be distinct, IMO, and
not overlap.

> In '5555' you should get three: 55__, _55_, __55. I'd do something like
> this (untested):
>
> trigraphs = {}
> f = open('digits')
> trigraph = f.read(3)  # read the first three digits
> trigraphs[trigraph] = 1
> while 1:
>     c = f.read(1)
>     if not c:
>         break
>     trigraph = trigraph[1:] + c
>     if trigraph in trigraphs:
>         trigraphs[trigraph] += 1
>     else:
>         trigraphs[trigraph] = 1
>> See line 18, in the while loop.
>>
>> I was surprised that I could read in the whole billion file with one
>> gulp without running out of memory.
>
> Why? One billion bytes is less than a GB. It's a lot, but not *that*
> much.

I earlier reported that my laptop couldn't handle even 800 million.

>> Memory usage went to 80% (from
>> the usual 35%), but no higher except at first, when I saw 98% for a
>> few seconds, and then a drop to 78-80% where it stayed.
>
> That suggests to me that your PC probably has 2GB of RAM. Am I close?

No. 4GB.


Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread Steven D'Aprano
On Sun, 18 Jul 2010 08:30:05 pm Richard D. Moores wrote:

> > Taking the string '555', you should get two digraphs: 55_ and _55.
>
> That seems wrong to me. When I search on '99' and there's a
> '999' I don't want to think I've found 2 instances of '99'.
> But that's just my preference.  Instances should be distinct, IMO,
> and not overlap.

I think we're talking about different things here. You're (apparently) 
interested in searching for patterns, in which case looking for 
non-overlapping patterns is perfectly fine. I'm talking about testing 
the randomness of the generator by counting the frequency of digraphs 
and trigraphs, in which case you absolutely do want them to overlap. 
Otherwise, you're throwing away every second digraph, or two out of 
every three trigraphs, which could potentially hide a lot of 
non-randomness.
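
To make the distinction concrete, a tiny sketch in standard Python:

s = '5555'
print s.count('55')   # -> 2: str.count finds non-overlapping matches
print sum(1 for i in range(len(s) - 1) if s[i:i+2] == '55')
                      # -> 3: the overlapping count the frequency
                      # test needs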


> >> I was surprised that I could read in the whole billion file with
> >> one gulp without running out of memory.
> >
> > Why? One billion bytes is less than a GB. It's a lot, but not
> > *that* much.
>
> I earlier reported that my laptop couldn't handle even 800 million.

What do you mean, "couldn't handle"? Couldn't handle 800 million of 
what? Obviously not bytes, because your laptop *can* handle well over 
800 million bytes. It has 4GB of memory, after all :)

There's a big difference in memory usage between (say):

data = "1"*10**9  # a single string of one billion characters

and 

data = ["1"]*10**9  # a list of one billion separate strings

or even

number = 10**(10**9)-1  # a one billion digit longint

This is just an example, of course. As they say, the devil is in the 
details.
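
A rough way to see the difference (CPython 2.6 or later; sizes are
approximate and platform-dependent):

import sys
print sys.getsizeof("1" * 10**6)    # ~1 million bytes: one string
print sys.getsizeof(["1"] * 10**6)  # ~8 million bytes on a 64-bit
                                    # build, just for the list's
                                    # references to one shared string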


> >> Memory usage went to 80% (from
> >> the usual 35%), but no higher except at first, when I saw 98% for
> >> a few seconds, and then a drop to 78-80% where it stayed.
> >
> > That suggests to me that your PC probably has 2GB of RAM. Am I
> > close?
>
> No. 4GB.

Interesting. Presumably the rest of the memory is being used by the 
operating system and other running applications and background 
processes.



-- 
Steven D'Aprano


Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread Richard D. Moores
On Sun, Jul 18, 2010 at 05:49, Steven D'Aprano wrote:
> On Sun, 18 Jul 2010 08:30:05 pm Richard D. Moores wrote:
>
>> > Taking the string '555', you should get two digraphs: 55_ and _55.
>>
>> That seems wrong to me. When I search on '99' and there's a
>> '999' I don't want to think I've found 2 instances of '99'.
>> But that's just my preference.  Instances should be distinct, IMO,
>> and not overlap.
>
> I think we're talking about different things here.

Yes. I was as interested in finding non-overlapping patterns as
testing randomness, I suppose because we wouldn't have been sure about
the randomness anyway.

>You're (apparently)
> interested in searching for patterns, in which case looking for
> non-overlapping patterns is perfectly fine. I'm talking about testing
> the randomness of the generator by counting the frequency of digraphs
> and trigraphs, in which case you absolutely do want them to overlap.
> Otherwise, you're throwing away every second digraph, or two out of
> every three trigraphs, which could potentially hide a lot of
> non-randomness.
>
>
>> >> I was surprised that I could read in the whole billion file with
>> >> one gulp without running out of memory.
>> >
>> > Why? One billion bytes is less than a GB. It's a lot, but not
>> > *that* much.
>>
>> I earlier reported that my laptop couldn't handle even 800 million.
>
> What do you mean, "couldn't handle"? Couldn't handle 800 million of
> what? Obviously not bytes,

I meant what the context implied. Bytes. Look back in this thread to
see my description of my laptop's problems.

>because your laptop *can* handle well over
> 800 million bytes. It has 4GB of memory, after all :)
>
> There's a big difference in memory usage between (say):
>
> data = "1"*10**9  # a single string of one billion characters
>
> and
>
> data = ["1"]*10**9  # a list of one billion separate strings
>
> or even
>
> number = 10**(10**9)-1  # a one billion digit longint
>
> This is just an example, of course. As they say, the devil is in the
> details.

Overkill, Steve.

>> >> Memory usage went to 80% (from
>> >> the usual 35%), but no higher except at first, when I saw 98% for
>> >> a few seconds, and then a drop to 78-80% where it stayed.
>> >
>> > That suggests to me that your PC probably has 2GB of RAM. Am I
>> > close?
>>
>> No. 4GB.
>
> Interesting. Presumably the rest of the memory is being used by the
> operating system and other running applications and background
> processes.

I suppose so.

Dick


Re: [Tutor] Return error message for my script in Blender

2010-07-18 Thread Andrew Martin
Yeah ok I get it. I have to return something.  I looked at the sample code
provided on the book's website and found out what I am supposed to return.
Thanks. I appreciate the responses, especially to this bonehead question.


Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread R. Alan Monroe
> That's the goal of the latest version of my script. The best I've
> been able to do so far is a file with 800 million digits.

I don't think anyone else has suggested this: the numpy module can
generate random bytes and has a built-in tofile() method.
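
Something along these lines, perhaps (untested; assumes a reasonably
recent numpy, filename made up):

import numpy as np

f = open('numbers.txt', 'wb')
for _ in xrange(1000):
    # one million random ASCII digits per block
    block = np.random.randint(0, 10, 10**6).astype(np.uint8) + ord('0')
    block.tofile(f)
f.close()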

Alan



Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread bob gailer

Check this out:

import random, time
s = time.time()
cycles = 1000
d = "0123456789"*100
f = open("numbers.txt", "w")
for i in xrange(cycles):
    # each sample is a random permutation of d, so every 1000-char
    # block contains exactly 100 of each digit
    f.write(''.join(random.sample(d, 1000)))
f.close()
print time.time() - s

1 million digits in ~1.25 seconds.

Therefore 1 billion in ~21 minutes. 3 GHz processor, 2 GB RAM.

Changing the block length up or down seems to increase the time.

--
Bob Gailer
919-636-4239
Chapel Hill NC



Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread Alan Gauld

"Richard D. Moores"  wrote

I earlier reported that my laptop couldn't handle even 800 
million.


What do you mean, "couldn't handle"? Couldn't handle 800 million of
what? Obviously not bytes,


I meant what the context implied. Bytes. Look back in this thread to
see my description of my laptop's problems.


But you stored those in a list and then joined the list which meant 
you

actually at one point had two copies of the data, one in the list and
one in the string - that's >1.6billion bytes.

And these tests suuggest you only get about 2billion bytes of memory
to use which maybe explains why you were pushed to the limit at
800million.

HTH,

Alan G. 





Re: [Tutor] A file containing a string of 1 billion random digits.

2010-07-18 Thread Dave Angel

Steven D'Aprano wrote:
> On Sun, 18 Jul 2010 08:30:05 pm Richard D. Moores wrote:
>
> What do you mean, "couldn't handle"? Couldn't handle 800 million of
> what? Obviously not bytes, because your laptop *can* handle well over
> 800 million bytes. It has 4GB of memory, after all :)
>
> [...]
>
> This is just an example, of course. As they say, the devil is in the
> details.
>
>> Memory usage went to 80% (from
>> the usual 35%), but no higher except at first, when I saw 98% for
>> a few seconds, and then a drop to 78-80% where it stayed.
>
> That suggests to me that your PC probably has 2GB of RAM. Am I
> close?
>
>> No. 4GB.
>
> Interesting. Presumably the rest of the memory is being used by the
> operating system and other running applications and background
> processes.


The amount of physical RAM has little to do with the size that an
application can reach.  An application is limited by several things, but
physical RAM mostly affects performance, not the size of the app.

It is limited by the size of the swap file, and by how much of that swap
file is in use by other apps and by the system.  And a 32-bit app is
limited by the 4GB (virtual) address space, even if you have 8GB of
physical RAM.  And that virtual space is shared with various pieces of
the OS, with device-driver space, and sometimes with physical hardware
that's memory-mapped.

But you can run a 2GB app in 1GB of physical RAM.  That's what the
swap file is for.


DaveA
