Why do programmers count from zero?
Recently I came across a posting that was explaining "sort" to
someone else. The example given was sorting a list with multiple
columns, and because they wanted to sort on the second column, the
command was given as "sort +1".
It was the explanation that caught my eye:
Remember that the first word is 0, not 1 (this is due to
really old legacy issues that also crept into the C language).
I thought that was a pretty funny way to explain it. Using the
word "legacy" makes it sound like it had to be done this way
because something older would break if it were not, or perhaps that
you used to have to do it this way but now that computers are more
powerful it's a silly artifact. Not really true: it never HAD to be
done that way, but programmers think that way for good reason.
We're not programmers
When I started learning about computers, most of my peers in the field knew
at least a little bit about programming. You really had to if you
wanted to be able to do much of anything. That's not true today:
many tech support people know nothing about programming, sometimes
not even simple scripting. Even though I understand that they
really don't absolutely need that knowledge to do their jobs, it
still amazes me that so few have it, and it surprises me even more
that so many are apt to have a very negative attitude toward the subject.
But given that, it's not surprising that these people don't know
why programmers count from zero.
It's all about storage
We're going to dive right into the gory guts of that sort
command and see why it starts counting from zero. You don't need
to know a thing about programming, you just need to think about how
things can be stored in memory. That's what it's really all about:
putting things into memory, manipulating them in some way, and
letting you know the results. That is programming.
So, if we're going to put things in memory, we need to know
where we put them if we are going to do something to those things
later. Every single location in memory has an address. You get to a
particular address by storing a number in a CPU register. So if you
want to look at memory address 100, you'd stick 100 in one of the
CPU registers, and voila, that's pointing at memory address 100 (or
is pointing at nothing and is just storing 100 for you - but we're
not going to get into how CPU registers work here).
CPU registers can hold numbers from 0 up to however big they
are. That could mean numbers from 0 to 255 (an 8 bit register), 0
to 65535 (16 bits) and on up. But notice it's always "0 to
something". So if having 100 means you are pointing at memory
location 100, what does having 0 in there mean?
Well, it could have meant "you made a big mistake, the first
location is one". Actually, because of the kinds of mistakes
programmers sometimes make, it might have been useful and
interesting if CPUs had been designed that way. But they weren't,
and actually I've simplified things here, so of course now we need
to get a little more complicated.
When CPUs want to address memory, they usually do so
by using two registers: a base and an offset. So you store 100 in
one register, and 2 in another, and that means you are pointing at
memory address 102 (100 + 2). What's the point of this? Why use two
when one would do? Well, some of that is due to legacy issues
of early CPUs having hardware limitations: it's called
segmented addressing, and it will make you unhappy, so we won't get
into that here. But it also helps program design in general.
(That's not really how segmented addressing works but the rest
will be easier for you to understand if we pretend it is.)
If 100 is in our base register, and 0 is in the offset, we are
pointing exactly at address 100. Add 0 to 100 and you get 100.
Programmers use this to build higher level structures called arrays.
Let's say you are going to store 500 bytes starting at address
3,000, and another 500 bytes starting at address 7,000. You want to
compare those two collections of bytes, add 'em together, subtract
one from the other, whatever. You are working with CPU registers
and you want an easy way to access those bytes. Using the base plus
offset scheme makes that much easier, especially if you can have
more than one base register. Base register one stores 3,000, Base
register two stores 7,000 and to run through both areas in sequence
we just need to increment our offset register. In higher level
languages (which always use these low level registers underneath,
of course), we might call the stuff at 3,000 "dogs" and the stuff at
7,000 "cats", and "cats(70)" would be whatever was at address
7,070. Naturally, "cats(0)" is the contents of 7,000 itself. The
high level language could have arranged things so that "cats(1)"
was 7,000 and "cats(0)" was meaningless or an error (and again, for
various reasons that might have been useful). But high level
languages are closely tied to what has to happen underneath, and to
do that would mean a subtraction at the low level (or a complete
redesign of CPU hardware!). That would make the code that much
slower: it may not seem like much, but subtractions add up, and
manipulating arrays is a large part of what programs do. Nobody
wants slower code, and there's also the fact that the extra
subtraction introduces another place where the compiler (the thing
that turns the high level code into actual CPU instructions) could
be coded wrong and screw up.
So, this is why programmers count from zero. I think that if
sort were re-written today, it would probably not have that 0
offset syntax, but when it was written, almost all of the people
likely to use it understood base plus offset programming and it
would have seemed unnatural to them to say "sort +1" if they wanted
the first field. Obviously "+1" is the second field if you are
thinking of offsets, right? So there we were, and here we are
today. It is a legacy, isn't it?
It is interesting to think about what would be different if
CPUs treated base + 1 as equal to base, and raised a trap if an
attempt was made to address base plus zero. If that were the
design, programmers would count from 1, arrays would never start at
0, and certain types of programming errors couldn't be made. But
that's not how CPUs work, so programmers do count from zero.
Got something to add? Send me email.
© 2010-10-27 Tony Lawrence