Hi Bob
On Fri, May 24, 2013 at 10:14:36AM -0600, Bob Proulx wrote:
> Hi Ola, Marc,
>
> Ola Lundqvist wrote:
> > I'm also adding Bob into this mail chain as he was the one that wrote the
> > code for RANDOM in 2004.
>
> Actually I had suggested the randomization using $RANDOM in email in
> May 2003 and after discussion with Ola it settled on using #!/bin/bash
> to support it. I think it was released for use after Bug#191981 was
> coincidentally reported soon afterward for the same reason. Giving
> credit where credit is due David Weinehall suggested in Bug#260071 the
> RANDOM fallback code currently in use with /dev/urandom & cut which as
> I recall allowed the script to return to #!/bin/sh. :-)
:-)
> > Marc Haber wrote:
>
> > > Please consider the following patch:
> > > - RANDOM=$(dd if=/dev/urandom count=1 2> /dev/null | cksum |
> > > cut -c"1-5")
> > > + RANDOM=$(dd if=/dev/urandom count=1 2> /dev/null | cksum |
> > > cut -d' ' -f1)
> > > which will help cron-apt to gracefully handle the case where the
> > > checksum returned by cksum is shorter than four digits.
>
> Ah... cksum prints the cksum with %u. If the cksum ever results in a
> small integer then the width of the field may be shorter than four
> characters. In that case part of the second field would be pulled into
> the output since it is cutting strictly by character position. Here is
> a simulation.
>
> $ echo 123 512 | cut -c1-5
> 123 5
>
> I assume you discovered this case by inspection of the code rather
> than in actual practice. This would be a very unlikely case to ever
> hit in real life. If it did then it would produce one spurious cron
> email out of years of runs. I have a test program running right now
> looking for a random hit on this case and while it has been running
> for some time now and generated many zillions of test cases I have yet
> to produce a cksum shorter than four digits. Not impossible but
> statistically extremely unlikely. Out of curiosity I will leave my
> test case running for some more hours to see if I can produce an
> example. I don't know if it will obtain one before the heat death of
> the universe though.
With enough people on the planet, maybe? In any case it is good to fix
a bug.
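
For the archive, a minimal sketch of the three ways of picking out the
checksum, fed with a made-up short cksum line ("123 512") instead of
real /dev/urandom data. The helper names (by_chars, by_space, by_awk,
is_number) are hypothetical and only for illustration; they are not
part of cron-apt.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not taken from cron-apt.

# Old behaviour: keep the first five characters of the line.
by_chars() { printf '%s\n' "$1" | cut -c1-5; }

# Proposed patch: keep everything up to the first space.
by_space() { printf '%s\n' "$1" | cut -d' ' -f1; }

# awk variant: keep the first whitespace-separated field.
by_awk() { printf '%s\n' "$1" | awk '{print $1}'; }

# Succeeds only if the argument consists of digits and nothing else,
# i.e. if it is safe to use inside $(( ... )).
is_number() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

line="123 512"   # simulated cksum output with a three digit checksum
for f in by_chars by_space by_awk; do
    v=$($f "$line")
    if is_number "$v"; then
        echo "$f: '$v' is usable"
    else
        echo "$f: '$v' is not a plain integer"
    fi
done
```

Only the character-based variant yields a value ("123 5") that is not a
plain integer, which is what the later $(( ... )) arithmetic would
reject as a syntax error.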
> > > - RANDOM=$(dd if=/dev/urandom count=1 2> /dev/null | cksum |
> > > cut -c"1-5")
> > > + RANDOM=$(dd if=/dev/urandom count=1 2> /dev/null | cksum |
> > > cut -d' ' -f1)
>
> If I had my preference I would use awk instead of cut for splitting
> fields. Using awk is as standard as cut and so no difference in
> dependencies. But for splitting fields on whitespace awk is a better
> fit to the task than cut.
That would do the trick as well.
> RANDOM=$(dd if=/dev/urandom count=1 2> /dev/null | cksum | awk '{print$1}')
>
> Splitting on whitespace is more resilient to input differences than
> splitting on each space character. IMNHO using awk for field
> selection is almost always better than using cut. The variations in
> 'wc' output is a good example of where awk works as desired but cut
> would not.
Ah, I see. So if we happened to get a tab here, cut would fail.
I checked it and you are right.
ola@quartz:~/svn/fsp/cron-apt$ echo "1204231524"$'\t'"512"
1204231524 512
ola@quartz:~/svn/fsp/cron-apt$ echo "1204231524"$'\t'"512" | awk '{print$1}'
1204231524
ola@quartz:~/svn/fsp/cron-apt$ echo "1204231524"$'\t'"512" | cut -d' ' -f1
1204231524 512
I think we should use awk as well. Good, then I'll do so.
> > When testing this out I noticed the following:
> > ...
> > As you can see it gives different output. I do not really think that is
> > a real problem, but I think you two should have the possibility to comment
> > it before I apply the patch.
>
> This is okay. It is changing one random number for a different random
> number. The generation of the arithmetic remainder of ARG1 divided by
> ARG2 in the TIME=$(($RANDOM % $RUNSLEEP)) part washes the difference
> away. The result will be a different random value between 0-$RUNSLEEP.
Great. That is what I thought. I just wanted to check whether there was
some specific reason you used only the first digits.
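
To make the point concrete, a small sketch using the checksum value from
my tab test above (1204231524) and a RUNSLEEP of 3600. The variable
names full and short are made up for this example.

```shell
#!/bin/sh
# Illustration only; the variable names full and short are made up.
RUNSLEEP=3600
full=1204231524                                # whole checksum field
short=$(printf '%s\n' "$full" | cut -c1-5)     # first five digits: 12042

echo "full:  $(( $full % $RUNSLEEP ))"         # prints "full:  2724"
echo "short: $(( $short % $RUNSLEEP ))"        # prints "short: 1242"
```

Both results are valid delays below RUNSLEEP; they are simply different
numbers.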
> (And noting for the casual reader that RUNSLEEP defaults to 3600 and
> so this gives a random delay across an hour interval. The cron job
> runs in local time. As machines worldwide are located in different
> timezones the load on the upstream repositories will be distributed
> across the different timezones. Although some timezones will be more
> populous than others.)
>
> Since this has come up for discussion here it gives me an opportunity
> to cringe once again at using /dev/*random and the unportability of
> it and perhaps suggest using awk instead. In my own scripts I have
> been using portable awk for generating random numbers. Therefore I
> would suggest this instead and eliminate the use of /dev/*random.
> Noting that in Debian this works with either mawk or gawk.
>
> RANDOM=$(awk -v rs=$RUNSLEEP -v s=$$$(date +%M%S) \
>     'BEGIN{srand(s);print(int(rs*rand()));}')
I do not really think this is a good approach. The reason is that you
seed the random generator with the same minute and second every day,
which means that you will end up with no randomness at all.
> Command line test case:
>
> $ awk -v rs=3600 -v s=$$$(date +%M%S) \
>     'BEGIN{srand(s);print(int(rs*rand()));}'
> 1194
This is easy to see if you run the above command several times in quick
succession, like this:
ola@quartz:~/svn/fsp/cron-apt$ awk -v rs=3600 -v s=$$$(date +%M%S) \
    'BEGIN{srand(s);print(int(rs*rand()));}'
82
ola@quartz:~/svn/fsp/cron-apt$ awk -v rs=3600 -v s=$$$(date +%M%S) \
    'BEGIN{srand(s);print(int(rs*rand()));}'
82
ola@quartz:~/svn/fsp/cron-apt$ awk -v rs=3600 -v s=$$$(date +%M%S) \
    'BEGIN{srand(s);print(int(rs*rand()));}'
555
ola@quartz:~/svn/fsp/cron-apt$ awk -v rs=3600 -v s=$$$(date +%M%S) \
    'BEGIN{srand(s);print(int(rs*rand()));}'
555
The reason for this randomization is to make sure to be nice to
the servers providing the data. So even if we add the date to it, there
would be little use. We are very likely to have a lot of computers
having the same time and therefore causing a great peak load.
So thanks for the suggestion, but I'll stick to /dev/*random where
we know that we will get real random data.
> But I am unaware of any Debian kernel that doesn't support
> /dev/*random and therefore do not feel strongly about it. Any
> solution that works is okay with me.
:-)
I can use this variant in case /dev/urandom does not exist...
Actually I will do that. However I will change +%M%S to +%N to have
better randomness in it.
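
A rough sketch of that fallback idea, under stated assumptions: the
function names (delay_urandom, delay_awk, random_delay) are hypothetical
and this is not the code that went into cron-apt. Note also that %N is a
GNU date extension, so the fallback assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical, not from cron-apt.

# Preferred source: checksum of one block read from /dev/urandom.
delay_urandom() {
    rnd=$(dd if=/dev/urandom count=1 2>/dev/null | cksum | awk '{print $1}')
    echo $(( $rnd % $1 ))
}

# Fallback: awk seeded with the PID followed by the nanoseconds.
# %N is a GNU date extension (assumption: GNU coreutils is installed).
delay_awk() {
    awk -v rs="$1" -v s="$$$(date +%N)" \
        'BEGIN { srand(s); print int(rs * rand()) }'
}

# Pick a delay in the range 0 .. $1-1 seconds.
random_delay() {
    if [ -r /dev/urandom ]; then
        delay_urandom "$1"
    else
        delay_awk "$1"
    fi
}

RUNSLEEP=3600
TIME=$(random_delay "$RUNSLEEP")
echo "would sleep $TIME seconds"
```

The awk branch is only reached when /dev/urandom is not readable, so on
a normal Debian system the behaviour stays as before.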
> Thanks Ola for maintaining cron-apt!
You are most welcome. Thanks for your support in maintaining it.
Changes are uploaded now.
// Ola
> Bob
--
--- Inguza Technology AB --- MSc in Information Technology ----
/ [email protected] Annebergsslingan 37 \
| [email protected] 654 65 KARLSTAD |
| http://inguza.com/ Mobile: +46 (0)70-332 1551 |
\ gpg/f.p.: 7090 A92B 18FE 7994 0C36 4FE4 18A1 B1CF 0FE5 3DD9 /
---------------------------------------------------------------