Integer Promotion


The C programming language has the following built-in integer types, all of which come in both a "signed" and "unsigned" flavour:

char
short int
int
long int
long long int

("long long int" was added with the C standard of 1999 and so might not be available on all C compilers)

It's common to abbreviate these names by removing "int", for example:

char
short
int
long
long long

In this list, "char" has the lowest rank, whereas "long long int" has the highest rank. The C Standard does not define maximum ranges for these integer types, but it does define minimum ranges:

signed char : -127 to 127
unsigned char : 0 to 255
signed short : -32767 to 32767
unsigned short : 0 to 65535
signed int : -32767 to 32767
unsigned int : 0 to 65535
signed long : -2147483647 to 2147483647
unsigned long : 0 to 4294967295
signed long long : -9223372036854775807 to 9223372036854775807
unsigned long long : 0 to 18446744073709551615

A handy way of summarising those minimums is by the smallest width each type is allowed to have:

char : at least 8 bits
short : at least 16 bits
int : at least 16 bits
long : at least 32 bits
long long : at least 64 bits

Signed integer types can store negative numbers, whereas unsigned integer types can't. On the compiler I use, "int" has the following ranges:

int signed: -2147483648 to 2147483647
int unsigned: 0 to 4294967295

(It's -2147483648 instead of -2147483647 because my CPU uses the "Two's Complement" system for storing negative numbers)
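
If you want to see what your own compiler gives you, the limits are available as macros in <limits.h>. Here's a minimal sketch that checks the minimum ranges listed earlier and prints the real ranges of "int". The function names are made up for illustration, and LLONG_MAX and ULLONG_MAX assume a compiler that supports the 1999 standard:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Returns 1 if every limit meets the minimum the C Standard requires. */
int meets_minimum_ranges(void)
{
    return SCHAR_MAX  >= 127
        && UCHAR_MAX  >= 255
        && SHRT_MAX   >= 32767
        && USHRT_MAX  >= 65535
        && INT_MAX    >= 32767
        && UINT_MAX   >= 65535
        && LONG_MAX   >= 2147483647L
        && ULONG_MAX  >= 4294967295UL
        && LLONG_MAX  >= 9223372036854775807LL
        && ULLONG_MAX >= 18446744073709551615ULL;
}

/* Prints the real ranges of "int" on whatever compiler builds this. */
void print_int_ranges(void)
{
    assert(meets_minimum_ranges());  /* holds on any conforming compiler */
    printf("int signed: %d to %d\n", INT_MIN, INT_MAX);
    printf("int unsigned: 0 to %u\n", UINT_MAX);
    printf("bits per char: %d\n", CHAR_BIT);
}
```

Calling print_int_ranges() from main shows the figures for your own system, which may well differ from mine.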

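The default for an integer type in C is "signed" (plain "char" is the one exception: whether it behaves as signed or unsigned is up to the compiler), so if you define a variable as follows: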

short i;

then it's the same as either of the following:

short signed i;

signed short i;

(The "signed" can be placed either before or after the integer type)
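
To convince yourself that these spellings really name one and the same type, here's a minimal sketch. It assumes a compiler that supports the 2011 standard, because it relies on _Generic, and the function names are made up for illustration. It also probes plain "char", which is a type of its own whose signedness is left to the compiler:

```c
#include <assert.h>
#include <limits.h>

/* 1 if "short" and "signed short" name the same type. */
int short_is_signed_short(void)
{
    short i = 0;
    return _Generic(i, signed short: 1, default: 0);
}

/* 1 if plain "char" is a type of its own, distinct from both
   "signed char" and "unsigned char". */
int plain_char_is_distinct(void)
{
    char c = 0;
    return _Generic(c, signed char: 0, unsigned char: 0, default: 1);
}

/* 1 if plain "char" can hold negative values on this compiler. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

void check_spellings(void)
{
    short signed a = 1;
    signed short b = a;   /* same type, so no conversion takes place */

    assert(sizeof a == sizeof b);
    assert(short_is_signed_short());
    assert(plain_char_is_distinct());
}
```

The first two functions give the same answer on every compiler; only plain_char_is_signed() varies from one system to the next.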

In the C programming language, types with lower rank than "int" get second-class treatment:

char
short

You can't actually perform any sort of mathematical or bitwise operation on a type that has lower rank than "int". Before an operation can be performed on a value of one of these types, the value must first become either a "signed int" or an "unsigned int". Here are the details:

* signed char always promotes to signed int
* signed short always promotes to signed int
* unsigned char will promote to signed int if INT_MAX >= UCHAR_MAX, otherwise it will promote to unsigned int
* unsigned short will promote to signed int if INT_MAX >= USHRT_MAX, otherwise it will promote to unsigned int
* The plain char type will undergo the same promotion as whichever type it behaves like on the compiler in question (i.e. signed char or unsigned char)
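
The rules in this list can be checked against a real compiler. The following is only a sketch: it assumes a compiler that supports _Generic (added in the 2011 standard), and PROMOTED_NAME and the function names are made up for illustration. The unary plus operator is used because it forces the promotion without changing the value:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Names the type of an expression. Only the two possible results of
   integer promotion are listed. */
#define PROMOTED_NAME(x) \
    _Generic((x), int: "signed int", unsigned int: "unsigned int", \
             default: "other")

const char *promoted_schar(void)  { signed char v = 1;    return PROMOTED_NAME(+v); }
const char *promoted_sshort(void) { signed short v = 1;   return PROMOTED_NAME(+v); }
const char *promoted_uchar(void)  { unsigned char v = 1;  return PROMOTED_NAME(+v); }
const char *promoted_ushort(void) { unsigned short v = 1; return PROMOTED_NAME(+v); }

/* What the rules in the list predict for the two unsigned types. */
const char *predicted_uchar(void)
{
    return INT_MAX >= UCHAR_MAX ? "signed int" : "unsigned int";
}

const char *predicted_ushort(void)
{
    return INT_MAX >= USHRT_MAX ? "signed int" : "unsigned int";
}

void check_promotions(void)
{
    assert(strcmp(promoted_schar(),  "signed int") == 0);
    assert(strcmp(promoted_sshort(), "signed int") == 0);
    assert(strcmp(promoted_uchar(),  predicted_uchar())  == 0);
    assert(strcmp(promoted_ushort(), predicted_ushort()) == 0);
}
```

If check_promotions() returns without tripping an assertion, your compiler agrees with the list.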

So let's say we have the following code:

char unsigned i = 5;
i = ~i;    /* This is the "complement" operator
              it flips all of the bits */

Before the complement operator can be applied to this unsigned char, the unsigned char must undergo integer promotion. Depending on the system, this means the statement is equivalent to one of the following:

i = ~(signed int)i;     /* On most systems, it will 
                           promote to "signed int" */
i = ~(unsigned int)i;

That's how integer promotion works with unary operators. (A unary operator is an operator that takes only one operand).
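
You can observe the promotion from inside a program. In this rough sketch (the function names are made up for illustration), the complemented value is negative exactly when the "unsigned char" was promoted to "signed int", because flipping all the bits of a non-negative "signed int" sets its sign bit:

```c
#include <assert.h>
#include <limits.h>

/* 1 if the promoted, complemented value is negative, i.e. the operand
   was promoted to "signed int" before "~" was applied. */
int complement_is_negative(unsigned char c)
{
    return ~c < 0;
}

/* The snippet above as a function: flip the bits of an unsigned char. */
unsigned char flip_bits(unsigned char i)
{
    i = ~i;   /* promote, complement, convert back to unsigned char */
    return i;
}

void check_complement(void)
{
    /* The promotion rule decides which way this goes. */
    assert(complement_is_negative(5) == (INT_MAX >= UCHAR_MAX));
}
```

On a two's complement machine flip_bits(5) gives UCHAR_MAX - 5, which is 250 when a char has 8 bits. I wouldn't rely on that figure on a machine that stores negative numbers some other way.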

There are also binary operators, i.e. operators that have two operands. The problem with most binary operators (+, *, < and so on) is that the two operand types must be identical. If they aren't identical, then conversions must take place. Let's take an example. To start off with, let's pretend that the system we're working on is as follows:

CHAR_BIT == 64
1 == sizeof(char) == sizeof(short) == sizeof(int) == sizeof(long) == sizeof(long long)

Now let's create two integer variables:

short unsigned su = 54;
long ls = 78;

And now let's perform a binary operation by adding them together:

su + ls

The types in question here are:

(short unsigned) + (long signed)

There are three steps to performing this operation:

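1) If either operand is of lesser rank than "int", it must be promoted. In our example, "short unsigned" and "int" are both 64 bits wide and "int" spends one of those bits on the sign, so USHRT_MAX is greater than INT_MAX and the "short unsigned" will become an "int unsigned":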

(int unsigned) + (long signed)

The second step is as follows:

2) The type of lesser rank must become the type of higher rank. In this example, our "int" must become "long". How do we decide whether it becomes a "long signed" or a "long unsigned"? As follows:

"unsigned int" will become "signed long" if LONG_MAX >= UINT_MAX, otherwise it will become "unsigned long"

In our example, LONG_MAX is smaller than UINT_MAX, so the "int unsigned" will become a "long unsigned":

(long unsigned) + (long signed)

The final step is as follows:

3) You must match the signedness. If they're of different signedness, the signed one becomes unsigned. Therefore in our example, the "long signed" becomes a "long unsigned":

(long unsigned) + (long unsigned)

And there we have it, the two types are identical so the operation can now be carried out.
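
The third step is the one that tends to surprise people, so here's a small sketch of what it does to a negative number. The function names are made up for illustration. In both functions the "long signed" operand is converted to "long unsigned" before the operator is applied:

```c
#include <assert.h>
#include <limits.h>

/* Adds a signed long to an unsigned long. The signed operand becomes
   unsigned long first, and the result is unsigned long too. */
unsigned long add_mixed(long s, unsigned long u)
{
    return s + u;
}

/* Compares a signed long with an unsigned long. The same conversion
   applies, so a negative "s" turns into a huge positive value. */
int signed_less_than_unsigned(long s, unsigned long u)
{
    return s < u;
}

void check_mixed(void)
{
    assert(add_mixed(78, 54) == 132);
    assert(add_mixed(-1, 0) == ULONG_MAX);          /* -1 wraps around */
    assert(signed_less_than_unsigned(1, 2) == 1);
    assert(signed_less_than_unsigned(-1, 1) == 0);  /* -1 is not less than 1 here */
}
```

The conversion from signed to unsigned is defined in terms of values (the result is reduced modulo ULONG_MAX + 1), so these particular results don't depend on the machine.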

In this case, on this particular machine with this particular compiler, we started off with:

(short unsigned) + (long signed)

and we ended up with:

(long unsigned) + (long unsigned)

Therefore, the original operation was equal to:

(long unsigned)su  +  (long unsigned)ls
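
To find out what the same expression becomes on your own system, the three steps can be written out in terms of the macros in <limits.h> and compared with what the compiler actually does. This is only a sketch: it assumes a compiler that supports _Generic (added in the 2011 standard), and the function names are made up for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* The type the compiler actually gives to su + ls. */
const char *actual_sum_type(void)
{
    short unsigned su = 54;
    long ls = 78;
    return _Generic(su + ls,
                    long: "long signed",
                    unsigned long: "long unsigned",
                    default: "other");
}

/* The type the three steps predict for this compiler. */
const char *predicted_sum_type(void)
{
    /* Step 1: "short unsigned" promotes to "int signed" if it fits,
       otherwise to "int unsigned". */
    int promoted_is_unsigned = !(INT_MAX >= USHRT_MAX);

    /* Step 2: an "int signed" simply becomes "long signed". An
       "int unsigned" becomes "long signed" only if LONG_MAX >= UINT_MAX. */
    int left_is_unsigned = promoted_is_unsigned && !(LONG_MAX >= UINT_MAX);

    /* Step 3: if the signedness differs, the signed side becomes unsigned. */
    return left_is_unsigned ? "long unsigned" : "long signed";
}

void check_sum_type(void)
{
    assert(strcmp(actual_sum_type(), predicted_sum_type()) == 0);
}
```

On a system where "short" is narrower than "int", both functions report "long signed" rather than the "long unsigned" of the pretend system above.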

Note that on a different compiler, we could have ended up with different types. It is for this reason that we should use casts to ensure we get the types we want. For instance:

char unsigned i = 5;

i = ~(unsigned)i;

Here, I cast "i" to an "unsigned int" because I don't want it to become a "signed int". If it were to become a "signed int", then the bit-flipping operation would be performed upon a "signed int", and this "signed int" would then be converted to an "unsigned char" in order to store the result in "i". It's that combination I'm afraid of: flipping the sign bit yields a negative number, and which negative number it is depends on the number system the machine uses for storing negative numbers. Because the final conversion to "unsigned char" works on the value rather than on the bits, the result stored in "i" can differ from machine to machine.
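
Here's the cast wrapped up in a function, as a minimal sketch (the function names are made up for illustration). Because everything happens in unsigned arithmetic, the results in the assertions hold whatever number system the machine uses for negative numbers:

```c
#include <assert.h>
#include <limits.h>

/* Flips the bits of an unsigned char without going through "signed int". */
unsigned char flip_bits_portable(unsigned char i)
{
    i = ~(unsigned)i;   /* complement as "unsigned int", then reduce
                           modulo UCHAR_MAX + 1 on assignment */
    return i;
}

void check_flip(void)
{
    assert(flip_bits_portable(5) == UCHAR_MAX - 5);
    assert(flip_bits_portable(0) == UCHAR_MAX);
    assert(flip_bits_portable(UCHAR_MAX) == 0);
}
```

With an 8-bit char, flip_bits_portable(5) is 250.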


