There is a simple mistake I make frequently with random numbers. I'll
use Perl to illustrate, but trust me: I can screw this up in just about any language. And I have.
To make it both easy and obvious, let's say we want to randomly choose
one of three things. Again to make it easy, we'll say that the integers
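0, 1 and 2 will work for whatever it is we have in mind. So here's some
code:
while (1) {
    $x[int(rand(2) + .5)]++;    # the "x" method: add .5, then int()
    $y[int(rand(3))]++;         # the "y" method: int() alone
    last if $count++ > 10000;
}
print "x: $x[0] $x[1] $x[2]\n";
print "y: $y[0] $y[1] $y[2]\n";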
What we are doing here is counting how many times each number
gets picked - how many times we get 0, how many times we get 1, and
how many times we get 2.
If you think that both the x and y arrays ought to contain roughly
similar numbers in each element, you've made the mistake too. In
fact, the code above will output something very much like this:
x: 2475 5033 2494
y: 3276 3343 3383
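The "x" method is strongly biased toward the value of "1". That's
because rand(2) returns values between 0 and 2 (but never 2), so
adding .5 gives values between 0.5 and 2.5 (but never 2.5), and
int() always "rounds down". It returns "1" for anything from 1
up to (but not including) 2, a full unit of the range, while "0"
only gets the values from 0.5 up to 1 and "2" only gets those
from 2 up to 2.5. That biases the output toward
1. If that doesn't make sense to you, pretend for a moment that
rand() could only generate integers or the .5 value between two integers -
in other words, it could produce 0, 0.5, 1.0, and 1.5 only for rand(2)
and 0, 0.5, 1.0, 1.5, 2.0 and 2.5 for rand(3). Here's what
each method would then give us:
rand(2)   rand(2)+.5   int(rand(2)+.5)
0         0.5          0
0.5       1.0          1
1.0       1.5          1
1.5       2.0          2

rand(3)   int(rand(3))
0         0
0.5       0
1.0       1
1.5       1
2.0       2
2.5       2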
Do you see it now? Because int() doesn't round up as many
of us seem to think it might, the x value clusters around "1".
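The same thought experiment can be run with a much finer grid. What follows is only a sketch: grid_counts and its parameters are names made up for this illustration, and the evenly spaced grid stands in for rand() rather than reproducing it.

```perl
use strict;
use warnings;

# Hypothetical helper: walk an evenly spaced grid over [0, $range) in place
# of rand($range), add $offset, truncate with int(), and count the results.
sub grid_counts {
    my ($range, $offset, $steps) = @_;
    my @counts;
    for my $k (0 .. $steps - 1) {
        my $r = $range * $k / $steps;    # stands in for rand($range)
        $counts[ int($r + $offset) ]++;
    }
    return map { $_ // 0 } @counts;      # untouched slots count as 0
}

my @x = grid_counts(2, .5, 1200);    # the mistaken int(rand(2) + .5)
my @y = grid_counts(3, 0,  1200);    # plain int(rand(3))
print "x: @x\n";    # x: 300 600 300
print "y: @y\n";    # y: 400 400 400
```

With luck taken out of the picture, the middle value gets exactly twice as many hits as either end under the "x" method, while the "y" method splits the grid evenly.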
I've carelessly and unwittingly made that mistake many times.
I know better, but I find myself doing this quite often.
No doubt it comes from a desire to "round up", even though that's
completely unnecessary for the task at hand. The most likely
explanation is that seeing "int()" triggers memories of rounding
and overrides my sense of what I'm actually trying to accomplish.
I'm a bit on "automatic pilot" there, letting my subconscious
fill in the details, and it wants to add .5 when it sees "int()".
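For the record, the task at hand needs no rounding at all. Here is a minimal sketch of the plain version; pick_index and @things are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return a random index from 0 to $n - 1.
# rand($n) is always less than $n, so int() alone reaches every index
# and gives each one an equally wide slice of the range.
sub pick_index {
    my ($n) = @_;
    return int(rand($n));
}

my @things = ('red', 'green', 'blue');
my $choice = $things[ pick_index(scalar @things) ];
print "picked: $choice\n";
```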
When you are trying to choose three or more items, this
mistake does damage, but limits it to the lower and upper elements.
For example, the output when we use:
$x[int(rand(5) + .5)]++;
$y[int(rand(6))]++;
will look something like:
x: 994 2017 1985 2013 2009 984
y: 1697 1660 1687 1675 1622 1661
Only 0 and 5 get shorted. Not great, but at least we do get
some "randomness" (biased as it is). However, when limited to
just two elements, we may not even notice the mistake at all
because the result is unbiased. To see that, this time
pretend that rand() can only pick in increments of .25:
rand(1)   rand(1)+.5   int(rand(1)+.5)
0         0.5          0
0.25      0.75         0
0.5       1.0          1
0.75      1.25         1
So if you are just picking between two choices, the mistake is harmless,
but because it is harmful for larger sets, you really shouldn't
get into the habit.
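The arithmetic behind all three cases can be written down directly. Assuming the mistake always takes the form int(rand(n - 1) + .5) for n choices, the two end values each get a slice of width .5 out of a range of n - 1, and every value in between gets a full slice of width 1. The sketch below just evaluates that; mistaken_shares is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: the share of picks each value 0 .. $n - 1 should get
# from int(rand($n - 1) + .5), assuming rand() is uniform.
sub mistaken_shares {
    my ($n) = @_;
    my $range = $n - 1;
    return map {
        my $width = ($_ == 0 || $_ == $n - 1) ? .5 : 1;   # ends get half a slice
        $width / $range;
    } 0 .. $n - 1;
}

for my $n (2, 3, 6) {
    my @shares = map { sprintf('%.2f', $_) } mistaken_shares($n);
    print "$n choices: @shares\n";
}
# 2 choices: 0.50 0.50
# 3 choices: 0.25 0.50 0.25
# 6 choices: 0.10 0.20 0.20 0.20 0.20 0.10
```

The sample outputs above each add up to 10002 picks, so these shares predict roughly 2500 5001 2500 for three choices and 1000 2000 2000 2000 2000 1000 for six, which is about what the "x" lines show.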
Of course if there was some reason that you actually wanted to
introduce bias against the upper and lower bounds, this does the job.
I don't think that comes up often.
There's another rounding mistake I often make - well, it's not really
a mistake, but I do often forget that "printf" rounds:
will produce "15.35". If that's all you needed (you didn't need
the value rounded for any other reason), there's no reason to do anything else.
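As a small illustration, assuming a value of 15.3456 and a "%.2f" format (both chosen only for this sketch), here is printf rounding on output next to int(), which truncates, and sprintf, which is the usual way to keep the rounded text around if it is needed afterwards:

```perl
use strict;
use warnings;

my $value = 15.3456;               # illustrative value, assumed for this sketch

printf "%.2f\n", $value;           # prints 15.35 (rounded, not truncated)

# int() on the scaled value truncates instead, for comparison
my $truncated = int($value * 100) / 100;
print "$truncated\n";              # prints 15.34

# sprintf returns the rounded text so it can be reused later
my $rounded = sprintf "%.2f", $value;
print "$rounded\n";                # prints 15.35
```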
© 2010-03-14 Anthony Lawrence