# Arithmetic on mixed fundamental types in C++

For a weekend project of mine, I have had to think about mixed type arithmetic
on fundamental types in C++. In the process, I made sense of a few fundamental
things (no *pun* intended ;-) ) and I have decided to write them down.
Hopefully, writing about it will allow me to both clarify my thoughts and
remember the information!

## Arithmetic conversions

Applying binary operators to different types might seem trivial in C++, because it mostly just works. If you write the following code:

```
float flt{15.f};
long lng_a{30L};
long lng_b = lng_a + flt;
assert( lng_b == 45 );
```

and then run it, the value of `lng_b` will be 45. No surprises… Except
when you stop to think about what happened in the background and how many rules
were involved in the computation.^{[1]}

Naively (as seems to often be the case for me…), because of the performance
reputation of C++, I assumed that the addition expression above mapped to an
assembly language instruction^{[2]} to add two registers. Then, I
started thinking more seriously about the problem, and even though I am anything
but an expert in assembly, it brought me to this question: is there an opcode to
add an `int` to a `float`? Are there mixed type instructions for the CPUs? With
modern hardware, it is not as simple as we might think anymore, but as far as I
could find out, in most hardware, there is not. This means that at the hardware
level, both operands have to have the same representation to allow the operation,
which is not completely unreasonable. Thus, even for the simple expression in
the code above, conversions are needed to select a common type to apply the
operation on.

The C++ language standard explicitly states which conversions take place (rules inherited from C), which allows one to take control and override the behavior manually with a cast if preferred. This could be needed if, for instance, the default conversion introduces a loss of precision on a given platform, or if a specific wrapping behavior is required.

One should note that the type selected for the operation by the conversion rules
will be the type of both operands **and** of the return value. This means that
a supplementary conversion might happen if the type in which the result of the
operation is stored is not the one that would have been selected by the usual
conversions (as is the case in the example above). Something to keep in mind.

## Usual arithmetic conversions

The conversion rules applied before binary operations on fundamental types are
called the *usual arithmetic conversions* and can be found in
section **8 Expressions** of the C++ standard document^{[3]}.
For those like me who do not easily read “standardese”, information on the
subject with some explanations can be found in other places. That said,
I have had to read some of the standard’s sections relating to the topic and I
have found them not too hard to read. Might be a sign that I am slowly getting
assimilated…

In the discussion that follows, I will consider an operation `op` on two
operands `t1` and `t2`, respectively of types `T1` and `T2`. This can be
conceptually represented as:

```
T1 t1;
T2 t2;
t1 op t2;
```

In the discussion, I will consider the following cases:

- `T1` and `T2` are the same type (yes, conversions *can* happen…)
- `T1` is floating point and `T2` is integral (or vice versa)
- `T1` and `T2` are both floating point, but different types
- `T1` and `T2` are both integral, but different types

These are almost all the situations covered in paragraph 11 of section 8
of the standard (but the last point is actually split into several sub-sections).
The only case I am not considering is when one of the types (or both) is a scoped
enumeration (i.e. an `enum class`), because that had nothing to do with my
project and I simply did not think about it as much.

## Same type

Even if the types are not actually mixed, I had to consider the case where both
operands are of the same type, i.e., `T1 == T2`. Intuitively, nothing should
happen in this case, but that turns out to be a false assumption. Because
arithmetic operators in C++ do not accept any type smaller than `int`,
integral promotion will take place before the operation. This is described in
section **7.6 Integral promotions** of the standard and can be roughly
summarized as: any type smaller than `int` will be converted to `int` or
`unsigned int`. For instance, the following relation holds:

```
short a{0};
short b{1};
static_assert( std::is_same_v< int, decltype( a + b ) > );
```

Other than that, nothing else happens in terms of conversions. As the name
suggests, this applies only to integral types. I would assume that is because
the smallest floating point type is at least as large as an `int`, but I don’t
think that is guaranteed.

## Mixed integral and floating point types

Now, to look at mixed type arithmetic, the simplest case to start with is that
of integral and floating point mixed operations, i.e. either `T1` or `T2` is a
floating point type and the other is integral. In this case, the standard simply
mandates that the integer value be converted to the floating point type:

```
int + float => (float)int + float
unsigned long long - float => (float)unsigned long long - float
long double + unsigned => long double + (long double)unsigned
...
```

The casts illustrated here show at least what *conceptually* happens, if not
what *actually* happens; as far as I can tell, though, it is also what actually
happens. The type selected in this situation is not too surprising when you
think about it. At least for IEEE floating points, the range of the smallest
floating point type (`float`: 3.4×10^{38}) is much larger than that of the
largest integer type (`unsigned long long`: 1.84×10^{19}). Thus, neglecting
the issue of not being able to represent the value exactly when the mantissa of
the floating point type cannot hold the value of the integer type, the floating
point type will accommodate the integer type. On top of that, the fractional
part of the floating point value would necessarily be lost (by rounding,
truncating or some other choice) if the conversion went in the other direction.

So again, because of those two points, the standard here makes sense (at least to me!).

## Mixed floating points

Next on the scale of simplicity is the case where both arguments are of a (different) floating point type. In this case, the rule is simple: the smaller type is cast to the larger type before the operation.

```
double / float => double / (double)float
long double + double => long double + (long double)double
...
```

This makes sense. The value in the smaller sized variable will fit in the larger one, so no change in value.

## Mixed integrals

The final case is that of both operands being of integral types. Here, there are
a few more things to consider, since for the same type size, there are signed
and unsigned types (for instance, `int` and `unsigned int` must be the same size,
e.g. 4 bytes). This complicates matters a little and before we continue, we need
to first define the concept of integer conversion rank
(section **7.15 Integer conversion rank** of the
standard document), which will be used in deciding the conversions to apply for
mixed integer type arithmetic. Once these ranks are defined, the conversion
mandated by the standard is the first of the following four situations that
applies:

- both have the same signedness, independent of ranks;
- rank( unsigned ) >= rank( signed );
- rank( signed ) > rank( unsigned ), unsigned in signed range;
- rank( signed ) > rank( unsigned ), unsigned not in signed range;

Note that the ordering of the ranks that I have written in situations 3 and 4 is not mentioned in the standard, but the fact that situations 1 and 2 do not apply implies that the rank of the signed integer is strictly greater than that of the unsigned integer, so I wrote it explicitly.

### Integer conversion rank

From what I understand from reading the standard, the integer types in C++ are
not given explicit rank values, but the relative ordering of the ranks is
specified. This can be **loosely** interpreted as: the integer ranks are in
corresponding order of size, where the larger integral types have a higher rank.
In particular, the standard says (section 7.15, par. 1.3):

> The rank of `long long int` shall be greater than the rank of `long int`, which shall be greater than the rank of `int`, which shall be greater than the rank of `short int`, which shall be greater than the rank of `signed char`.

In order to remove any ambiguity, the standard adds quite a few details (there are 10 clauses to the section), but I believe that the following order of ranks, from smallest to highest, is mandated by the standard:

- `bool`
- `char`, `signed char`, `unsigned char`
- `short`, `unsigned short`
- `int`, `unsigned int`
- `long`, `unsigned long`
- `long long`, `unsigned long long`

where for a given type size, signed and unsigned types share their rank. I said
the rule of thumb as presented above loosely interprets the standard because the
standard does not explicitly mandate the sizes of `short`, `int`, `long`, and
the others. This freedom is to allow implementers to support the various
hardware architectures that exist. I think this is mostly an artifact of history,
since a lot of modern hardware is 32 or 64 bits, but it is still how
the standard is written. That said, it remains that on some machines, two types
could share the same size, e.g. on a particular architecture, `sizeof(long)`
could be the same as `sizeof(int)`. In such a case, the standard still stipulates
that those types’ ranks are different. Specifically, in the example given, `long`
would still have a higher rank than `int`.

### Same signedness

So, getting back to the mixed operations and the usual conversions, in the case
of two integral types with the same signedness, i.e. both `T1` and `T2` are
signed or both of them are unsigned, the standard mandates that the integer with
the smaller rank be converted (after promotion) to the integer type with the
higher rank.

```
long + int => long + (long) int
unsigned short * unsigned int => (unsigned int)unsigned short * unsigned int
...
```

The higher ranked integer will accommodate the values of the smaller ranked one
without problem, and there are no considerations of sign, so no possible loss of
value or overflow in the conversion (there is possible overflow in the *operation*,
but not in the *conversion*). This case is an easy one.

### Differing signedness, unsigned with larger or equal rank

In this case, the standard says that the signed integer will be converted to the unsigned type.

```
int + unsigned int => (unsigned int)int + unsigned int
short - unsigned int => (unsigned int)short - unsigned int
...
```

The fact that the operation then yields the correct answer is mandated by the
standard. In section **7.8 Integral conversions**, the standard says:

> If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^{n} where n is the number of bits used to represent the unsigned type). [Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note]

Because of the modulo 2^{n} arithmetic, this will give the correct
unsigned answer… most of the time. See the discussion in the
last section for an example where this rule yields
a surprising result.

This being the case, if you are putting the result of the operation into a variable, it is worth thinking about that variable’s type. If that type is not the type of the unsigned operand (or a larger unsigned integral type), you will incur a conversion. That is, while the operation is guaranteed to be correct by the standard, putting the result back into anything but a large enough unsigned integral type might not yield the result you expect. In a smaller unsigned integral type, there is at least another modulo conversion happening. If the type is signed (whether it is large enough or not), then the result is implementation defined, as stipulated by the standard, again in section 7.8:

> If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.

The standard does not specify what happens in this case and instead gives latitude
to the compiler vendor by saying the result is implementation defined. This means
that if you rely on this conversion, the behavior might not be portable (**not**
undefined as in the case of overflow, just not portable and tied to the compiler
you use). On two’s complement machines, this will give you wrapping behavior,
but relying on it is non portable (even if, from what I understand, most
hardware uses two’s complement these days). On other architectures, the behavior
will be different, so portable code should not rely on these conversions without
some kind of checks.

### Differing signedness, signed with larger rank, unsigned in range

Here, the standard says that the unsigned integral type is converted to the signed integral type.

```
long long int + unsigned long( value < long_long_int_max )
=> long long int + (long long int)unsigned long
```

Given that the unsigned integer is representable in the range of the signed
integer, the conversion will work as stipulated in section 7.8 of the
standard that I quoted in the previous part of this post (at least, that is my
understanding). So that should always give the correct answer since the unsigned
*value* is in range of the signed *type*.

### Differing signedness, signed with larger rank, unsigned not in range

Here, the standard says that both operands are converted to the unsigned type
with the same rank as the signed operand in the operation. The unsigned operand
is necessarily in range of that larger unsigned type (its rank, and thus its
set of representable values, is smaller), so its value is preserved. The signed
operand will be converted modulo 2^{n}. Thus the result should be right given
the modulo arithmetic, but with the usual caveats about what you do with the
result.

## Back to the first example

So coming back to the first example, let’s see if I can apply the rules to it.

```
float flt{15.f};
long lng_a{30L};
long lng_b = lng_a + flt;
```

According to the conversion rules, I would say that the `long` value will first
be converted to `float` to allow the addition, and that the resulting `float`
will then be truncated when stored back into a `long`^{[1]}, which is what the
standard mandates in section **7.10 Floating-integral conversions**:

> A prvalue of a floating-point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

The numbers above are small enough that it just works as expected! This is probably true for a lot of use cases, which is why I think I can stand by my initial affirmation that “applying binary operations to different types might seem trivial in C++, because it mostly just works”.

## Keep informed

As mentioned in the previous post, there is a (controversial?) proposal that has been brought to the C++ standards committee by JF Bastien which would make two’s complement the only allowed representation for signed integers. This could change some of the details of this article, namely the parts where conversion from unsigned to signed is implementation defined. So in C++20 or C++23, the information here could (already) be out of date.

Also, because of conversions, the following assert will actually fire as the operation will yield false even if the mathematics would suggest otherwise:

```
assert( -1 < 0u );
```

That is because this is a case where both integers have the same rank (the `-1`
literal is an `int` and the `0u` literal is an `unsigned int`), but differing
signedness. Here, according to the rules above, the signed integer is converted
to the unsigned type, which means `-1` becomes the largest unsigned integer,
which will not be smaller than 0. This kind of surprising behavior is currently
being discussed in the context of a proposal by Herb Sutter. Richard Smith
is proposing to bring consistency between the new three-way comparison operator
(a.k.a. the *spaceship operator* `<=>`) and the usual C comparison operators.
This might have no impact on what I discussed here or might change it
completely. I will admit that I am aware of the proposal, but I have not had
time to read it through.

In any case, the two proposals above, if they are adopted, will change some of what I discussed here, so keep informed if this matters to you!

## Notes

I would like to thank Patrice Roy for reading my post and giving me some advice on it. His time is greatly appreciated.

^{[1]} Here is a link to the code of the
first example in compiler explorer (put in a main function so it compiles). You
can see the `cvtsi2ss`, `addss` and `cvttss2si` instructions, which respectively
convert the `long` to a `float`, add the resulting `float` and the `flt`
variable, and convert the result back to a `long`.

^{[2]} I believe assembly instructions, assembly code, machine code, and
opcodes are roughly the same (according to Wikipedia, some assembly instructions
do not map directly to opcodes, but most do). In the context of this post, I
don’t think it makes much of a difference. Thus, I use the terms interchangeably,
but I might be assuming a bit. I am out of my depth in this domain.

^{[3]} The official published document must be purchased from the ISO
organization, but the draft papers are freely available and can be found on the
web. For instance, a C++17 draft paper (the latest draft before publication I
believe, but I might be wrong) can be found
here.