APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Random errors in Perl


2007/12/27


There is a simple mistake I make frequently with random numbers. I'll use Perl to illustrate, but trust me: I can screw this up in just about any language. And I have.

To make it both easy and obvious, let's say we want to randomly choose one of three things. Again to make it easy, we'll say that the integers 0, 1 and 2 will work for whatever it is we have in mind. So here's some code:


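#!/usr/bin/perl
# Tally how often each method picks 0, 1 and 2.
srand;
while(1) {
    $x=int(rand(2)+.5);        # the mistake: adds .5 before truncating
    $y=int(rand(3));           # truncates a value from 0 up to (not including) 3
    $x[$x]++;                  # @x is the tally array, separate from scalar $x
    $y[$y]++;
    last if $count++ > 10000;  # the loop body runs 10,002 times in all
}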
print "x:  $x[0] $x[1] $x[2]\n";
print "y:  $y[0] $y[1] $y[2]\n";

exit 0;

What we are doing here is counting how many times each number gets picked - how many times we get 0, how many times we get 1, and how many times we get 2.

If you think that both the x and y arrays ought to contain roughly similar numbers in each element, you've made the mistake too. In fact, the code above will output something very much like this:

x:  2475 5033 2494
y:  3276 3343 3383

The "x" method is strongly biased toward the value of "1". That's because rand(2) returns values from 0 up to 2 (but never 2), and int() never rounds - it just throws away the fraction. Once .5 has been added, anything from 0.5 up to 1.5 comes out as "1", but only values below 0.5 come out as "0" and only values from 1.5 up come out as "2". So "1" owns half of the range while "0" and "2" get a quarter each. If that doesn't make sense to you, pretend for a moment that rand() could only generate integers or the .5 value between two integers - in other words, it could produce 0, 0.5, 1.0, and 1.5 only for rand(2) and 0, 0.5, 1.0, 1.5, 2.0 and 2.5 for rand(3). Here's what happens:

Value            0     0.5   1.0   1.5   2.0   2.5
int(value)       0     0     1     1     2     2
int(value+.5)    0     1     1     2     n/a   n/a

Do you see it now? Because int() doesn't round up as many of us seem to think it might, the x value clusters around "1".
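The same point can be checked without any randomness at all. This is only a sketch: it replaces rand(2) with evenly spaced values across the same range and counts where each one lands. The 1000-point grid and the names $steps and @hits are chosen for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sweep evenly spaced values across [0, 2) in place of rand(2), so the
# tallies are the same on every run. The grid size is an arbitrary choice.
my $steps = 1000;
my @hits  = (0, 0, 0);

for my $i (0 .. $steps - 1) {
    my $value = 2 * $i / $steps;     # 0, 0.002, 0.004 ... 1.998
    $hits[ int($value + .5) ]++;     # same expression as the "x" method
}

printf "%d gets %d of %d grid points\n", $_, $hits[$_], $steps for 0 .. 2;
```

On that grid, 0 and 2 each collect 250 points and 1 collects 500: the same 1:2:1 split that shows up in the random run.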

I've carelessly and unwittingly made that mistake many times. I know better, but I find myself doing this quite often. No doubt it comes from a desire to "round up", even though that's completely unnecessary for the task at hand. The most likely explanation is that seeing "int()" triggers memories of rounding and overrides my sense of what I'm actually trying to accomplish. I'm a bit on "automatic pilot" there, letting my subconscious fill in the details, and it wants to add .5 when it sees "int()".

When you are trying to choose three or more items, this mistake does damage, but limits it to the lower and upper elements. For example, the output when we use:

$x=int(rand(5)+.5);
$y=int(rand(6));

will look something like:

x:  994 2017 1985 2013 2009 984
y:  1697 1660 1687 1675 1622 1661
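Those "x" counts are close to what the widths of the intervals predict. As a rough sketch, assuming rand() is uniform, the two end values each cover half a unit of the range and every value in between covers a full unit. The helper name expected_counts is hypothetical, and 10002 is the number of passes the loop in the script above actually makes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expected tallies for int(rand($n) + .5) over $trials draws, assuming
# rand() is uniform on [0, $n). expected_counts is a hypothetical helper.
sub expected_counts {
    my ($n, $trials) = @_;
    my @expected = ($trials / $n) x ($n + 1);   # each value covers width 1 ...
    $expected[0]  /= 2;                          # ... except 0, which covers [0, .5)
    $expected[$n] /= 2;                          # ... and $n, which covers [$n - .5, $n)
    return @expected;
}

printf "%.1f ", $_ for expected_counts(5, 10002);
print "\n";
```

For rand(5)+.5 that works out to roughly 1000 for each end and roughly 2000 for each value in between, which is what the run above shows.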

Only 0 and 5 get shorted. Not great, but at least we do get some "randomness" (biased as it is). However, when limited to just two elements, we may not even notice the mistake at all, because in that one case the result is unbiased. To see that, this time pretend that rand(1) can only pick in increments of .25:

Value            0     0.25  0.50  0.75
int(value)       0     0     0     0
int(value+.5)    0     0     1     1

So if you are just picking between two choices, the mistake is harmless. But because it is harmful for larger sets, you really shouldn't get into the habit.
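The habit worth having is the plain int(rand($n)) form, which works the same way for any number of choices. A minimal sketch follows; pick_index and @choices are hypothetical names used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a whole number from 0 to $n - 1, each equally likely
# (assuming rand() itself is uniform). No + .5 anywhere.
sub pick_index {
    my ($n) = @_;
    return int(rand($n));
}

my @choices = ('red', 'green', 'blue');
my $index   = pick_index(scalar @choices);
print "picked $choices[$index]\n";

# An array subscript drops the fractional part of its index, so this
# common shorthand selects an element the same way.
my $also = $choices[ rand @choices ];
print "picked $also\n";
```

Passing the array's length rather than a hard-coded number means the code keeps working if the list of choices grows.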

Of course, if there were some reason that you actually wanted to introduce bias against the upper and lower bounds, this does the job. I don't think that comes up often.

There's another rounding mistake I often make - well, it's not really a mistake, but I do often forget that "printf" rounds:

printf "%.2f\n",15.3467

will produce "15.35". If a rounded display is all you needed (you didn't need the value rounded for any other reason), there's no reason to do anything else.
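If the rounded value is needed for something other than display, sprintf does the same formatting and hands back the string instead of printing it. The short sketch below also shows truncation for contrast; the variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $value = 15.3467;

# sprintf formats the same way printf does, but returns the result
# so it can be stored or compared.
my $rounded = sprintf("%.2f", $value);
print "$rounded\n";      # 15.35

# int() only truncates, so cutting to two places without rounding
# takes explicit arithmetic.
my $truncated = int($value * 100) / 100;
print "$truncated\n";    # 15.34
```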


