We know from previous coursework that signed integers are represented
internally in two's complement form. C++ and the instruction sets of
most contemporary computers also support `unsigned`
integer and character types, where the value is always interpreted
as a non-negative binary number and no sign bit is used.

Because most of us were born with eight fingers and two thumbs -- ten
digits -- the usual method of human counting is base ten. Positional
notation lets us write numbers that are greater than ten. When a program
reads a decimal number, it converts the base 10 positional notation
into the two's complement (or `unsigned`) binary representation
of the number. Similarly, when an integer value is displayed, the program
converts from the internal representation to our familiar base 10
positional notation, possibly with a leading minus sign.

Sometimes we want our program to manipulate the individual bits within a binary number. If the binary number is long, say 16 bits or more, it's quite cumbersome to write out all 16+ bits. This is where hexadecimal notation is convenient.

A hexadecimal digit can assume one of sixteen different values, where the digits 0 through 9 represent the values zero through nine and the letters A through F represent the values ten through fifteen. If we write the binary representation for each of the sixteen hexadecimal digits, we obtain:

Hex | Binary
----+--------
 0  | 0 0 0 0
 1  | 0 0 0 1
 2  | 0 0 1 0
 3  | 0 0 1 1
 4  | 0 1 0 0
 5  | 0 1 0 1
 6  | 0 1 1 0
 7  | 0 1 1 1
 8  | 1 0 0 0
 9  | 1 0 0 1
 A  | 1 0 1 0
 B  | 1 0 1 1
 C  | 1 1 0 0
 D  | 1 1 0 1
 E  | 1 1 1 0
 F  | 1 1 1 1

Hexadecimal notation provides a convenient, compact way to specify any of the sixteen states that four bits may assume. We've all heard of 8-bit bytes. Four bits are sometimes called a "nibble" because two nibbles make up a full byte. (Nerd humor.)

We can easily express longer bit patterns by writing out longer
strings of hexadecimal digits. We write hexadecimal constants in
C, C++ and Java in the same way -- by beginning the constant with
the characters "0x" followed by a sequence of hexadecimal digits.
Constants to be used in `long int` arithmetic, bit-wise
operations or variables should end in an '`L`' suffix to avoid
unwanted truncation. Truncation occurs when bits are thrown away
in order to fit a value into a variable of shorter length. So, in
order to write a 16-bit pattern, we need a hexadecimal
constant with four hexadecimal digits:

0xF83E | 1 1 1 1  1 0 0 0  0 0 1 1  1 1 1 0

When performing bit-wise operations, we often need to know how many bits are in the various datatypes. The Java language defines the length and range of its primitive datatypes. In C++, the range and length of primitive datatypes can vary with the computer instruction set (e.g., Intel x86, PowerPC, Itanium, Alpha) and the compiler (GNU versus a native compiler provided by the vendor). This variability can wreak havoc on programs that need to be portable and execute on different computer platforms.

Here is a short C++ program that displays the number of *bytes*
for each of the primitive datatypes.

#include <iostream>
using namespace std ;

int main(int argc, char *argv[])
{
    cout << "char          " << sizeof(char) << endl ;
    cout << "short int     " << sizeof(short int) << endl ;
    cout << "int           " << sizeof(int) << endl ;
    cout << "long int      " << sizeof(long int) << endl ;
    cout << "long long int " << sizeof(long long int) << endl ;
    cout << "float         " << sizeof(float) << endl ;
    cout << "double        " << sizeof(double) << endl ;
}

The C++ `sizeof` operator returns the number of bytes needed
to represent a particular datatype. The `sizeof` operator, by
the way, returns the number of bytes needed to store an array or a
user-defined type, too.

The phrase "bit twiddling" refers to operations that a program performs
on individual bits or short bit fields within a single binary value. A
*bit field* is a contiguous group of bits within a binary value.

Here is a code idiom that extracts a five-bit field from a 16-bit binary value. The bit field occupies bits 5 through 9 of the binary value, counting bits from the least significant bit (LSB), that is, where the LSB is bit number zero:

(value >> 5) & 0x001F

The shift operation positions the bit field so that its LSB becomes the LSB of the final result. Then we bit-wise AND the shifted value with the binary mask 0x001F, which clears all but the low five bits.

As an exercise, try writing C++ code to store a 5-bit binary value into
the same bit field. Hint: You will need to shift the 5-bit value into
position and you will need to clear the existing bit field. If you ever
have any doubts about your reasoning, draw a picture -- I usually have to.
Try running your code. Remember, you can display an integer value in
hexadecimal by inserting the `hex` format manipulator into the
output stream as shown here:

cout << hex << value << endl ;

Most people like to number bits starting from zero, beginning with the least significant bit. Thus, the LSB is numbered bit zero and the bit number corresponds to the power of two which the bit represents in positional binary notation. Be aware, however, that some vendors number bits in the opposite order, beginning with the most significant bit (MSB). Such numbering induces a "Lewis Black" moment.

Copyright © 2004-2013 Paul J. Drongowski