APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Random errors in Perl


© December 2007 Anthony Lawrence

There is a simple mistake I make frequently with random numbers. I'll use Perl to illustrate, but trust me: I can screw this up in just about any language. And I have.

To make it both easy and obvious, let's say we want to randomly choose one of three things. Again to make it easy, we'll say that the integers 0, 1 and 2 will work for whatever it is we have in mind. So here's some code:

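#!/usr/bin/perl
use strict;
use warnings;

srand;    # optional: modern Perl seeds rand() automatically on first use
my (@x, @y);
for (1 .. 10000) {
    my $x = int(rand(2) + .5);   # the "rounding" habit
    my $y = int(rand(3));        # plain truncation
    $x[$x]++;                    # count how often each value comes up
    $y[$y]++;
}
print "x:  $x[0] $x[1] $x[2]\n";
print "y:  $y[0] $y[1] $y[2]\n";

exit 0;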

What we are doing here is counting how many times each number gets picked - how many times we get 0, how many times we get 1, and how many times we get 2.

If you think that both the x and y arrays ought to contain roughly similar numbers in each element, you've made the mistake too. In fact, the code above will output something very much like this:

x:  2475 5033 2494
y:  3276 3343 3383

The "x" method is strongly biased toward the value "1". Here rand(2) returns a value between 0 and 2 (but never 2), and int() always "rounds down": it simply drops the fraction. Adding .5 first turns that into round-to-nearest, so anything from 0 up to .5 becomes 0, anything from .5 up to 1.5 becomes 1, and anything from 1.5 up to 2 becomes 2. The value 1 gets a slice of the range twice as wide as the slices for 0 and 2, so it comes up about half the time. If that doesn't make sense to you, pretend for a moment that rand() could only generate integers or the .5 value between two integers. In other words, it could produce only 0, 0.5, 1.0 and 1.5 for rand(2), and 0, 0.5, 1.0, 1.5, 2.0 and 2.5 for rand(3). Here's what happens:

Value            0    0.5   1.0   1.5   2.0   2.5
int(value)       0    0     1     1     2     2
int(value+.5)    0    1     1     2     n/a   n/a


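Do you see it now? Of the four equally likely values from rand(2), two become "1" once .5 is added, while "0" and "2" get only one each. We add .5 out of a vague feeling that int() needs help rounding, and that is exactly what makes the x values cluster around "1".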
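To see the exact proportions without the noise of real random numbers, here is a small sketch that extends the pretend-values idea from the table. Instead of calling rand(), it walks an evenly spaced grid of values across the range, so every part of the range gets equal weight. The tally_grid() helper and the 1200-step grid are hypothetical, chosen only for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: instead of calling rand($range), step through
# $steps evenly spaced values in [0, $range) and count which bucket
# $pick puts each one in. The counts are exact proportions, not samples.
sub tally_grid {
    my ($range, $steps, $pick) = @_;
    my @count;
    for my $i (0 .. $steps - 1) {
        my $v = $range * $i / $steps;    # stands in for one rand($range) result
        $count[ $pick->($v) ]++;
    }
    return @count;
}

my @x = tally_grid(2, 1200, sub { int($_[0] + .5) });   # the "rounding" habit
my @y = tally_grid(3, 1200, sub { int($_[0]) });        # plain truncation
print "x:  @x\n";    # x:  300 600 300
print "y:  @y\n";    # y:  400 400 400
```

The 1:2:1 ratio for x is the same lopsided split as the 2475/5033/2494 counts from the random run, while y comes out perfectly even.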

I've carelessly and unwittingly made that mistake many times. I know better, but I find myself doing this quite often. No doubt it comes from a desire to "round up", even though that's completely unnecessary for the task at hand. The most likely explanation is that seeing "int()" triggers memories of rounding and overrides my sense of what I'm actually trying to accomplish. I'm a bit on "automatic pilot" there, letting my subconscious fill in the details, and it wants to add .5 when it sees "int()".

When you are choosing among more items, this mistake still does damage, but only to the lowest and highest values. For example, here is the output if we change the two assignments to pick from six values (and print all six elements):

$x=int(rand(5)+.5);
$y=int(rand(6));

will look something like:

x:  994 2017 1985 2013 2009 984
y:  1697 1660 1687 1675 1622 1661

Only 0 and 5 get shorted: each shows up about half as often as the values in between. Not great, but at least we do get some "randomness" (biased as it is). However, when picking between just two values, we may never notice the mistake at all, because it's unbiased. To see why, this time pretend that rand() can only pick in increments of .25, so the mistaken int(rand(1)+.5) works from these values:

Value            0    0.25  0.50  0.75
int(value)       0    0     0     0
int(value+.5)    0    0     1     1


So when picking between just two choices, the mistake is harmless, but because it does real harm with larger sets, you really shouldn't get into the habit.
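The same reasoning can be turned into a formula. With the mistaken int(rand($n-1)+.5) picking among $n results, value $k comes up whenever the random number lands within .5 of $k, clipped to the range rand() can actually return. Each interior value owns a slice one unit wide, and the two end values own only half a unit each. The rounded_shares() sketch below is a hypothetical helper written just for this illustration; it computes those widths as fractions of the whole range:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: exact probability of each result 0 .. $n-1 when
# picking with the mistaken int(rand($n - 1) + .5). $n must be at least 2.
sub rounded_shares {
    my ($n) = @_;
    my $range = $n - 1;    # the mistaken code calls rand($n - 1)
    my @share;
    for my $k (0 .. $n - 1) {
        # int($v + .5) == $k exactly when $k - .5 <= $v < $k + .5,
        # but $v itself can only fall in [0, $range)
        my $lo = $k - .5 < 0      ? 0      : $k - .5;
        my $hi = $k + .5 > $range ? $range : $k + .5;
        push @share, ($hi - $lo) / $range;
    }
    return @share;
}

for my $n (2, 3, 6) {
    printf "%d choices: %s\n", $n,
        join ' ', map { sprintf('%.2f', $_) } rounded_shares($n);
}
# 2 choices: 0.50 0.50
# 3 choices: 0.25 0.50 0.25
# 6 choices: 0.10 0.20 0.20 0.20 0.20 0.10
```

For six choices that predicts roughly 1000 hits each for 0 and 5 and roughly 2000 for the others in 10,000 draws, which is what the run above produced. For two choices the shares are equal, which is why the mistake hides there.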

Of course, if there were some reason you actually wanted to bias against the upper and lower bounds, this does the job. I don't think that comes up often.
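For the usual case, where every item should be equally likely, the habit worth building is the "y" version: pass rand() the number of items and let int() truncate. In Perl you can get the count straight from the array, because rand() treats its argument as a scalar and an array in scalar context gives its length. The pick_one() name below is hypothetical, just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pick one element uniformly at random.
# rand(@items) sees the list length (3 for three items) and returns a value
# in [0, 3); int() truncates it to 0, 1 or 2, each covering an equal-width slice.
sub pick_one {
    my @items = @_;
    return $items[ int(rand(@items)) ];
}

my %seen;
$seen{ pick_one(qw(red green blue)) }++ for 1 .. 9000;
for my $color (sort keys %seen) {
    print "$color: $seen{$color}\n";    # each should be close to 3000
}
```

The exact counts vary from run to run, but none of the three colors should be favored the way "1" was in the x column.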

There's another rounding mistake I often make - well, it's not really a mistake, but I do often forget that "printf" rounds:

printf "%.2f\n", 15.3467;

will print "15.35". If a rounded value for display is all you need (that is, you don't need the rounded number for anything else), there's no reason to round it yourself before printing.
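Here is a short sketch contrasting that rounding with truncation via int(), the same drop-the-fraction behavior behind the random number mistake. It uses sprintf, which formats exactly like printf but returns the string instead of printing it; the variable names are just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $n = 15.3467;

# sprintf/printf round to the requested number of decimals ...
my $rounded = sprintf "%.2f", $n;    # "15.35"

# ... while int() just drops everything after the point,
# so scaling, truncating and scaling back gives 15.34
my $truncated = int($n * 100) / 100;

print "rounded:   $rounded\n";
print "truncated: $truncated\n";
```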


Got something to add? Send me email.




