See also Understanding Packed BCD numbers
Under ordinary circumstances, you don't have to know or care how numbers are represented within your programs. However, when you are transferring data files that contain numbers, you will have to convert if the storage formats are not identical. If the numbers are just integers, that's fairly easy because the only differences will be the length and the byte order: how many bytes the number takes up, and whether it is stored lsb or msb (least significant byte or most significant byte first). Once you know that, conversion is trivial.
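As a rough illustration of how little there is to the integer case, here is a minimal Perl sketch. The sample value 0x12345678 and the variable names are made up for the example; "N" and "V" are Perl's standard pack templates for 32-bit unsigned integers stored msb first and lsb first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same 32-bit integer written out in both byte orders.
my $n         = 0x12345678;      # arbitrary sample value
my $msb_first = pack("N", $n);   # most significant byte first
my $lsb_first = pack("V", $n);   # least significant byte first

print "msb first: ", uc(unpack("H*", $msb_first)), "\n";   # 12345678
print "lsb first: ", uc(unpack("H*", $lsb_first)), "\n";   # 78563412

# Reading data that was written lsb first only takes the right template...
my $back = unpack("V", $lsb_first);
print "read back: ", ($back == $n ? "same number" : "mismatch"), "\n";

# ...and converting between the two orders is just a byte reversal.
my $converted = scalar reverse $lsb_first;
print "reversed:  ", ($converted eq $msb_first ? "now msb first" : "mismatch"), "\n";
```

For 2 byte integers the matching templates are "n" and "v".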
Floating point numbers are a whole other game. For example, in December of 1983, I had to convert some Tandy Basic programs and data files to Xenix MBASIC. The Basic programs themselves were fairly challenging, but the data files were even more so. Tandy stored floating point numbers in what they called "XS128 notation" (Excess 128 is what they really meant) and MBASIC used packed BCD. At the time, I had never given a single thought to how floating point numbers are stored. As you surely realize, this was long before you could ask Google to find you something like MAD 3401 IEEE Floating-Point Notes, and the availability of computer oriented books was not anything like it is today. I was on my own, with only "od -cx", my wits, and pure stubbornness to go on. There was an explanation in the manuals, but it was typical geek-babble and it made my head hurt. It took me several hours of painful work to understand what I needed to do, and a few hours more to write programs to do it, but the project got done. I haven't had to do anything like that since then, and you may never have had to at all, but that doesn't mean that neither of us ever will. So rather than you getting a headache from trying to puzzle it out (because there's still a lot of techno-babble out there), I'll get you started.
The first thing you need to know is that your machine may give different results than mine. It probably won't unless you are using something odd, but if it does, don't panic: the theory is still the same; you just have a slightly different implementation. Here's a Perl program that is going to show us what's going on (you do not need to understand this script):
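#!/usr/bin/perl
# Show how single precision floats are stored: the value, the stored
# bytes in hex, then the bits split into sign, exponent and mantissa.
use strict;
use warnings;

showbits(0);
for (my $x = 1; $x < 16384; $x *= 2) {
    showbits($x);
}
showbits("5.75");
showbits("-.1");

sub showbits {
    my $x = shift;
    # "f>" (Perl 5.10 or later) packs the most significant byte first, so
    # the output is the same on little-endian machines such as x86.
    my $string = pack("f>", $x);
    my $y = uc(unpack("H*", $string));
    # Turn the 8 hex digits into one 32 character string of bits.
    my $bits = join "", map { sprintf("%08b", hex($_)) } ($y =~ /../g);
    print "$x\t$y\t";
    print substr($bits, 0, 1), " ";      # sign bit
    print substr($bits, 1, 8), " ";      # 8 exponent bits
    print substr($bits, 9, 23), "\n";    # 23 mantissa bits
}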
We're looking at single precision floating point numbers here. Double precision uses the same scheme, just more bits. Here's what the output looks like:
0     00000000  0 00000000 00000000000000000000000
1     3F800000  0 01111111 00000000000000000000000
2     40000000  0 10000000 00000000000000000000000
4     40800000  0 10000001 00000000000000000000000
8     41000000  0 10000010 00000000000000000000000
16    41800000  0 10000011 00000000000000000000000
32    42000000  0 10000100 00000000000000000000000
64    42800000  0 10000101 00000000000000000000000
128   43000000  0 10000110 00000000000000000000000
256   43800000  0 10000111 00000000000000000000000
512   44000000  0 10001000 00000000000000000000000
1024  44800000  0 10001001 00000000000000000000000
2048  45000000  0 10001010 00000000000000000000000
4096  45800000  0 10001011 00000000000000000000000
8192  46000000  0 10001100 00000000000000000000000
5.75  40B80000  0 10000001 01110000000000000000000
-.1   BDCCCCCD  1 01111011 10011001100110011001101
The first column is the number itself, and the second is what the stored format looks like in hex. After that come the actual bits; I've separated them in this odd way for a very good reason (which will become clear later). The value "5.75" is stored as "01000000101110000000000000000000" or "40B80000" (hex).
You might easily guess that the first bit is the sign bit. I think that's what I first grokked back in 1983 too. The next 8 bits are used for the exponent, and the last 23 are the value (the mantissa). As you will no doubt notice, the value bits from 0 to 8192 are all empty, so I must be crazy and there's no point in reading this trash any farther.
Well, actually there is. There's a hidden bit there that isn't stored but is always assumed. If you are really compulsive and counted the bits, you see that only 23 mantissa bits are there. The sign, the exponent and those 23 bits add up to the 32 bits (4 bytes) that are stored; the hidden bit makes the mantissa 24 bits and is always 1. The one exception is zero, which is a reserved value and has no hidden bit, so I've left it out of this list. So, if we add the hidden bit, the bits would look like:
1     0 01111111 100000000000000000000000
2     0 10000000 100000000000000000000000
4     0 10000001 100000000000000000000000
8     0 10000010 100000000000000000000000
16    0 10000011 100000000000000000000000
32    0 10000100 100000000000000000000000
64    0 10000101 100000000000000000000000
128   0 10000110 100000000000000000000000
256   0 10000111 100000000000000000000000
512   0 10001000 100000000000000000000000
1024  0 10001001 100000000000000000000000
2048  0 10001010 100000000000000000000000
4096  0 10001011 100000000000000000000000
8192  0 10001100 100000000000000000000000
5.75  0 10000001 101110000000000000000000
-.1   1 01111011 110011001100110011001101
But remember, it's what I showed above that is really there.
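If you'd rather pull the three fields apart with shifts and masks than with substr, something like the following sketch would do it. The name float_fields is invented for this example, and it assumes your Perl is 5.10 or later (for the ">" modifier) and that your machine's single precision format is the one described here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a single precision float into its three stored fields.
# float_fields() is a name made up for this sketch.
sub float_fields {
    my ($x) = @_;
    # "f>" forces most-significant-byte-first order; "N" then reads
    # those 4 bytes back as one unsigned 32-bit integer.
    my $word     = unpack("N", pack("f>", $x));
    my $sign     = ($word >> 31) & 0x1;    # 1 bit
    my $exponent = ($word >> 23) & 0xFF;   # 8 bits, stored with 127 added
    my $mantissa = $word & 0x7FFFFF;       # the 23 bits actually stored
    return ($sign, $exponent, $mantissa);
}

for my $x (1, 2048, 5.75, -.1) {
    my ($sign, $exponent, $mantissa) = float_fields($x);
    # OR in the hidden bit so that all 24 mantissa bits are shown.
    printf "%-6s %d %08b %024b\n", $x, $sign, $exponent, $mantissa | 0x800000;
}
```

The masks are just the field widths: 1 bit, 8 bits and 23 bits. The hidden bit is OR'ed in only for display; it is never in the stored data.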
One more thing: there's an implied decimal point after that hidden bit. To get the value of bits after the decimal point, start dividing by two: so the first bit after the (implied) decimal point is .5, the next is .25 and so on. We don't have to worry about any of that for the powers of two, because obviously those are whole numbers and the bits will be all 0. But down at the 5.75 we see that at work:
First, looking at the exponent for 5.75, we see that it is 129. Subtracting 127 gives us 2. So 1.0111 times 2^2 becomes 101.11 (simply move the point 2 places to the right to multiply by 4). So now we have 101 binary, which is 5, plus .5 plus .25 (.11) or 5.75 in total. Too quick?
Taking it in detail:
Exponent: 10000001, which is 129 (use the Javascript Bit Twiddler if you like). Subtracting 127 leaves us with 2.
Mantissa: 01110000000000000000000
Add in the implied bit and we have 101110000000000000000000, with implied decimal point that's 1.01110000000000000000000
Multiply that by 2^2 to get 101.110000000000000000000
That is 4 + 1 + .5 + .25 or 5.75
Look at 2048. The exponent is 128 + 8 + 2 or 138; subtract 127 and we get 11. Use the Bit Twiddler if you don't see that. The mantissa is all 0's, which with the implied bit makes this all 1.00000000000000000000000 times 2^11. What's 2^11? It's 2048, of course.
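The same steps can be sketched in a few lines of Perl, working from the hex form of the stored number. The name decode_float is invented for this example, and the sketch only handles ordinary numbers: it makes no attempt at zero or the other reserved patterns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rebuild the value of a single precision float from its 8 hex digits.
# decode_float() is a name made up for this sketch; it handles ordinary
# numbers only, not zero or the other reserved patterns.
sub decode_float {
    my ($hex) = @_;
    my $word     = hex($hex);
    my $sign     = ($word >> 31) & 0x1;
    my $exponent = ($word >> 23) & 0xFF;
    my $mantissa = $word & 0x7FFFFF;
    # Add the hidden bit, then divide by 2^23 so that the implied
    # point sits right after it: 1.xxxxx
    my $fraction = ($mantissa | 0x800000) / 2**23;
    # Undo the "add 127" and scale by that power of two.
    my $value = $fraction * 2**($exponent - 127);
    return $sign ? -$value : $value;
}

print decode_float("40B80000"), "\n";   # 5.75
print decode_float("45000000"), "\n";   # 2048
```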
Now the -.1. This actually can't be stored precisely, but the method is still the same. The exponent is 64 + 32 + 16 + 8 + 2 + 1 or 123. Subtract 127 and we get -4, which means the decimal point moves 4 places to the left, making our value .000110011001100110011001101. Now you understand why the exponent is stored after adding 127: it's so we can end up with negative exponents. If we calculate out the binary, that's .0625 + .03125 + .00390625 and on to ever smaller numbers which get us very, very close to .1 (but off slightly). The sign bit was set, so it's -.1.
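To see just how close "very, very close" is, a small sketch like this adds up the bits after the point, each one worth half of the one before it. The variable names are made up for the example, and the last two lines assume Perl 5.10 or later and the single precision format described here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The bits after the (implied) point for .1, as worked out in the text.
my $bits = "000110011001100110011001101";

my $sum   = 0;
my $place = 0.5;                 # the first bit after the point is worth .5
for my $bit (split //, $bits) {
    $sum += $place if $bit eq "1";
    $place /= 2;                 # each later bit is worth half as much
}
printf "%.12f\n", $sum;          # 0.100000001490

# The same value, this time letting pack and unpack do the rounding.
my $stored = unpack("f>", pack("f>", 0.1));
print $sum == $stored ? "same as the stored value\n" : "different\n";
```

So the stored number is a hair above .1: the error first shows up in the ninth decimal place.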
The Tandy (and DEC VAX, by the way) "excess 128" exponent storage simply changes the ranges of positive versus negative exponents - other than that, it works just like this.
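Finally, there are a few reserved patterns. All 0's is 0. An exponent of all 1's is reserved too: with a mantissa of all 0's it means infinity, in other words a result too large for the format to hold (you'd also get that from dividing a nonzero number by zero), and with anything else in the mantissa it means NaN (Not A Number), which is what something like zero divided by zero gives you. Results that are too small to hold end up as 0, or as one of the very tiny values that use an exponent of all 0's and have no hidden bit.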
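If you want to look at the reserved patterns on your own machine, a sketch along these lines will print them. The name float_hex is made up for the example; 9**9**9 is just a convenient way to overflow Perl's arithmetic into infinity, and subtracting infinity from itself gives a NaN. In the output, an exponent field of all 1's with a zero mantissa is infinity, and one with a nonzero mantissa is NaN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hex form of a number stored as a single precision float.
# float_hex() is a name made up for this sketch ("f>" needs Perl 5.10+).
sub float_hex {
    my ($x) = @_;
    return uc(unpack("H*", pack("f>", $x)));
}

my $inf = 9**9**9;        # far too large to hold, so it becomes infinity
my $nan = $inf - $inf;    # no sensible answer, so it becomes NaN

printf "%-5s %s\n", "0",    float_hex(0);       # 00000000
printf "%-5s %s\n", "inf",  float_hex($inf);    # 7F800000
printf "%-5s %s\n", "-inf", float_hex(-$inf);   # FF800000
# The exact NaN bits can differ from machine to machine; what matters is
# an exponent of all 1's with something other than 0 in the mantissa.
printf "%-5s %s\n", "nan",  float_hex($nan);
```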
That's it. Take a look at the link at the beginning if you want to go a little deeper, but this is probably all you need to get started.
FloatingPoint :
"Tandy stored floating point numbers in what they called "XS128 notation" (Excess 128 is what they really meant) and MBASIC used packed BCD."
Tandy used excess 128 because Microsoft used it in their 8 bit BASIC interpreters. Excess 128 worked well with 8 bit CPU's (like the Z80 in Tandy's TRS-80 boxes and the 6510 in the Commodore 64) because the conversion between ASCII and floating binary format could be carried out with simple binary adds, shifts and rotates -- operations that were relatively quick on these processors. The alternative, IEEE floating point format, was best implemented on 16 bit processors, where integer multiply and divide operations were common.
However, excess 128 format had a significant flaw, in that it didn't always accurately represent the fractional component of a number. Despite having 7 significant display digits when returned to ASCII format, excess 128 often caused computation errors that could drive a programmer batty. For example, it was not uncommon for an expression like 1.80-1.79 to result in .00999937 or something similar, a predictable result of trying to convert a base-10 fraction in a base-2 representation. I recall many different schemes that were concocted to work around this nonsense. It was what prompted me to turn to BCD routines in my assembly language programs.
Speaking of packed (or compressed) BCD, the MOS Technology 65xx processor family used in many 1970's and 1980's home computers (e.g., the Commodore 64, Atari, Apple II) could be made to do arithmetic in BCD by setting the decimal bit in the processor's status register with the SED mnemonic. If decimal mode was cleared (the CLD mnemonic), all arithmetic was performed on unsigned 8 bit numbers and a carry would occur during addition if the result exceeded an 8 bit value (the width of the .A or accumulator register). No half-carry was considered.
However, if decimal mode was set, a half-carry would automatically occur if, during addition, the value of the low order nybble went past $09 (this is MOS Technology notation for 0x09) -- $09 + $01 in BCD resulted in $10, not $0A as would be the case in binary mode. A full carry occurred if the accumulator went past $99 (it would roll back to $00 and the processor carry flag would be set). A similar action in reverse occurred if decimal mode subtraction was carried out.
There was a booby-trap in the C-64 that could catch the unwary when the processor was in decimal mode: if either an IRQ or NMI occurred, the kernel ROM handlers would push the processor status register onto the stack, thus preserving the decimal mode setting. However, the interrupt handler failed to return the processor to binary mode prior to continuing, which could result in a crash or at least bizarre behavior when some part of the interrupt handler used an ADC (add with carry) or SBC (subtract with borrow) operation -- the result would not be what was expected. The solution was to either disable IRQ's while decimal mode was in use or modify the interrupt vectors to point to code that would clear decimal mode before executing the main interrupt handler (the restoration of the status register from the stack at the end of the interrupt handler automatically placed the processor back into decimal mode). Of course, there was no way to disable an NMI, so if one occurred during a BCD routine, oh well!
BCD has the advantage of exactly converting from the ASCII representation of decimal numbers to the machine format and back. However, it takes more bytes to represent a given number, and BCD multiplication and division, as well as transcendental functions, tend to execute more slowly than their binary equivalents. Compromise, compromise...
--BigDumbDinosaur
I think you are confused somewhere. XS128 is the same concept as IEEE floating point, and they both are unable to accurately represent numbers. The more bits you can give to the mantissa, the closer you can get, but neither accurately represents something as simple as -.1 (as in the examples of the article).
--TonyLawrence
No confusion. Of course, both excess 128 and IEEE are closely related due to their reliance on binary exponentiation to represent large numbers in a small space. In fact, 4 byte IEEE has exactly the same number of significant decimal digits as excess 128, which is also a 4 byte notation. Excess 128 handles negativity in a different fashion, though. My above wording, after rereading it, does seem to imply that IEEE was more accurate than excess 128 -- that wasn't my intention.
Excess 128 notation was devised as a way to avoid the use of negative 8 bit numbers (i.e., values where bit 7 was set) in twos complement arithmetic. Signed arithmetic operations on 8 bit processors are inefficient and produce only 7 bits of significance, obviously. Plus most 8 bit CPU's do not have a specific means of handling overflow into the sign bit -- they simply set a flag in the status register and leave it up to the programmer to handle the overflow.
Contrast that to a 16 bit processor, in which twos complement produces 15 bit significance, or the 31 bits of a 32 bit CPU. Also, all modern 16 and 32 bit CPU's have the means to deal with sign overflow. Hence excess 128 tends to be faster on older 8 bit CPU's, whereas IEEE format comes into its own on 16, 32 and 64 bit processors.
--BigDumbDinosaur
In the IEEE standards for floating-point numbers (IEEE 754 and 854), single precision (32-bit) uses 8-bit exponents in excess 127 notation (i.e., the bias is 127). Double precision (64-bit) uses 11-bit exponents in excess 1023.
--FredFoobar
---September 16, 2004
Sat Feb 19 17:55:27 2005: Subject: Great article anonymous
Gracias, this is very informative and I've bookmarked this page. What did catch me for a while is that .0001 binary was written 0.625 instead of 0.0625 and I missed the small decimal point before the one in -.1. So for five minutes I went crazy wondering where the extra 10 factor came from. Now I understand it's -0.1=-0.0625+-0.03125, etc.