Hexadecimal and bit twiddling

Why the fascination with hexadecimal?

We know from previous coursework that integer numbers are represented internally using two's complement form. C++ and the instruction sets of most contemporary computers also support unsigned integer and character types where the value is always interpreted as a positive binary number and a sign bit is not used.

Because most of us were born with eight fingers and two thumbs -- ten digits -- the usual method of human counting is base ten. Positional notation lets us write numbers that are greater than ten. When a program reads a decimal number, it converts the base 10 positional notation into the two's complement (or unsigned) binary representation of the number. Similarly, when an integer value is displayed, the program converts from the internal representation to the our familiar base 10 positional notation, possibly with a leading minus sign.

Sometimes we want our program to manipulate the individual bits within a binary number. If the binary number is long, say 16 bits or more, it's quite cumbersome to write out all 16+ bits. This is where hexadecimal notation is convenient.

Nibbles and bits

A hexadecimal digit can assume one of sixteen different values where the digits 0 to 9 represent the first nine positive integer numbers and the letters A through F represent the positive integer numbers 10 through 15. If we write the binary representation for each of the sixteen hexadecimal digits, we obtain:

0	`0 0 0 0`
1	`0 0 0 1`
2	`0 0 1 0`
3	`0 0 1 1`
4	`0 1 0 0`
5	`0 1 0 1`
6	`0 1 1 0`
7	`0 1 1 1`
8	`1 0 0 0`
9	`1 0 0 1`
A	`1 0 1 0`
B	`1 0 1 1`
C	`1 1 0 0`
D	`1 1 0 1`
E	`1 1 1 0`
F	`1 1 1 1`

Hexadecimal notation provides a convenient, compact way to specify any of the sixteen states that four bits may assume. We've all heard of 8-bit bytes. Four bits are sometimes called a "nibble" because two nibbles make up a full byte. (Nerd humor.)

Longer bit patterns

We can easily express longer bit patterns by writing out longer strings of hexadecimal digits. We write hexadecimal constants in C, C++ and Java in the same way -- by beginning the constant with the characters "0x" followed by a sequence of hexadecimal digits. Constants to be used in long int arithmetic, bit-wise operations or variables must end in an 'L' to avoid unwanted truncation. Truncation occurs when bits are throw away in order to fit a value into a variable of shorter length, for example. So, in order to write a 16-bit pattern, we need a hexadecimal constant with 4 hexadecimal digits:

0xF83E

1 1 1 1

1 0 0 0

0 0 1 1

1 1 1 0

How long is a ...

When performing bit-wise operations, we often need to know how long (i.e., how many bits) are in the various datatypes. The Java language defines the length and range of primitive datatypes. In C++, the range and length of primitive datatypes can vary with computer instruction set (e.g., Intel x86, PowerPC, Itanium, Alpha) and compiler (GNU versus a native compiler provided by the vendor.) This variability can wreak havoc on programs that need to be portable and execute on different computer platforms.

Here is a short C++ program that displays the number of bytes for each of the primitive datatypes.

    #include <iostream>
    using namespace std ;

    int main(int argc, char *argv[])
    {
      cout << "char           " << sizeof(char) << endl ;
      cout << "short int      " << sizeof(short int) << endl ;
      cout << "int            " << sizeof(int) << endl ;
      cout << "long int       " << sizeof(long int) << endl ;
      cout << "long long int  " << sizeof(long long int) << endl ;
      cout << "float          " << sizeof(float) << endl ;
      cout << "double         " << sizeof(double) << endl ;
    }

The C++ sizeof operator returns the number of bytes needed to represent a particular datatype. The sizeof operator, by the way, returns the number of bytes needed to store an array or a user-defined type, too.

Bit twiddling

The phrase "bit twiddling" refers to operations that a program performs on individual bits or short bit fields within a single binary value. A bit field is a contiguous group of bits within a binary value.

Here is a code idiom that extracts a five bit field from a 16-bit binary value. The bit field is located in bits 5 through 9 of the binary value counting bits from the least significant bit (LSB), that is, where the least significant bit (LSB) is bit number zero:

    (value >> 5) & 0x001F

The shift operation positions the right side of the bit field such that the LSB of the bit field will be the LSB of the final result. Then we bit-wise AND the shifted value with the binary mask, 0x001F. We call this a "mask" because it allows only a few bits to "peak through" to the result.

As an exercise, try writing C++ code to store a 5-bit binary value into the same bit field. Hint: You will need to shift the 5-bit value into position and you will need to clear the existing bit field. If you ever have any doubts about your reasoning, draw a picture -- I usually have to. Try running your code. Remember, you can display an integer value in hexadecimal by inserting the hex format manipulator into the the output stream as shown here:

    cout << hex << value << endl ;

Most people like to number bits starting from zero, beginning with the least significant bit. Thus, the LSB is numbered bit zero and the bit number corresponds to the power of two which the bit represents in positional binary notation. Be aware, however, that some vendors number bits in the opposite order beginning with the most signficant bit (MSB.) Such numbering induces a "Lewis Black" moment.