Conversions and initialisation

Now that we have a rudimentary grasp of the type system and object model, we’re going to learn how conversions work, and two methods of initialisation.

Don’t forget to open this lesson’s Compiler Explorer session! Unlike most other chapters—which use Catch2—we shall be printing to screen, since that should help you visualise what’s actually happening.

Chapter table of contents

  1. Conversions and initialisation
    1. Objectives and outcomes
    2. Acknowledgements
    3. Promotions
    4. Narrowing conversions
    5. Mixing expressions with different types
      1. Mixing int and char in an expression
      2. int-to-double and double-to-int conversions
      3. Mixing ints and doubles
    6. Explicit conversions
    7. Minimising conversions
      1. Braced initialisation
      2. Automatic type deduction
    8. Feedback
    9. Summary

Objectives and outcomes

Objective (you will develop): an understanding of conversions
Outcomes (you will be able to):
  • describe promotions and narrowing conversions.
  • recall the direction promotions take.
  • distinguish between implicit conversions and explicit conversions.

Objective (you will develop): skills in writing conversion operations
Outcomes (you will be able to):
  • synthesise code that uses:
    • promotions and explicit conversions
    • braced initialisation for built-in types
    • type deduction
    • expressions with mixed types
  • synthesise code that avoids implicit narrowing conversions.

Acknowledgements

Thank you to Vagrant Gautam, Lesley Lai, and Janet Cobb for providing feedback.

Promotions

Let’s start with the following program.

#include <fmt/format.h>

int main()
{
  char const letter = 'A'; // value is 65 in UTF-8
  fmt::print("letter: '{}'\n", letter);

  char const space = ' '; // value is 32 in UTF-8
  fmt::print("space: '{}'\n", space);

  fmt::print("letter + space: '{}'\n", letter + space); // 65+32 in UTF-8 represents 'a'
}

You should get the following output:

letter: 'A'
space: ' '
letter + space: '97'

It looks like we can’t directly add chars: although we can write an expression that adds two char objects, the value we get back seems to be an int. What’s going on there?

An 8-by-4 grid of cells with the entire first row filled. The first row is as in Chapter 3. The second row has merged all eight of its cells into one big cell. Its type is `double`, and its value is π to eight decimal places.

We looked at how objects with built-in types are represented in memory using an abstract machine in Chapter 3. With that machine, we saw that int is bigger than char, and that double is bigger than int. Also recall that values are bitwise representations of memory, with meaning given to them by types. Let’s take our value example from above and fit it into one of our object boxes.

An 8-by-1 grid of cells with the first cell filled. Its type is `char`, its value is `A`, and its binary representation is 0, 1, 0, 0, 0, 0, 0, 1.

We only need one row for this chapter. On this abstract machine, a byte is eight bits, which is consistent with most, if not all, modern computers. Our value only has seven significant bits, so it slips nicely into our char object, with a meaningless zero at the front.

In code, that’s initialising our char with the value 'A'.

char const letter = 'A';
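As a quick check of the diagram, we can write the bit pattern directly as a binary literal. This is a minimal sketch that assumes an ASCII-compatible character set (so 'A' is 65) and C++14 or later for the 0b prefix. The name letter_bits is made up for illustration, and static_assert asks the compiler to check a condition and refuse to compile if it is false.

```cpp
// Assumption: the character set is ASCII-compatible, so 'A' has the value 65.
constexpr char letter = 'A';

// 0b introduces a binary literal: 01000001 is 64 + 1.
constexpr int letter_bits = 0b01000001;

static_assert(letter_bits == 65, "01000001 in binary is 65 in decimal");
static_assert(letter == letter_bits, "'A' has the same significant bits as 65");
```

If either claim were false, the program would fail to compile instead of printing a surprising value at run-time.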

Now, what happens when we initialise an int object with our letter?

int const number = letter;
fmt::print("number: {}\n", number);

You should see number: 65 appear in your program’s output. Looks like the number is 65, but the letter is 'A': are the two values equal?

fmt::print("letter == number: {}\n", letter == number);

The system claims that they have the same value, but the character output is printing something completely different for letter and number! Remember that a value is how we interpret a bit sequence, and that types are responsible for doing that interpretation on our behalf.

An 8-by-1 grid of cells with the first cell filled as before. The next three cells are empty, and the final four cells are filled in to represent the integer 65. It has both the decimal representation for humans, and the binary representation for memory.

If we return to our abstract machine, we see that the value of number has the same significant bits as letter. Remember that leading zeroes are insignificant.

01000001
⇓
00000000 00000000 00000000 01000001

When we try to write one type’s value into an object of a different type—like we are here—we’re performing a conversion. This char-to-int conversion is a special kind of conversion called a promotion, because an int is guaranteed to be able to store every possible value that a char can hold, and it’s a lossless conversion: we lose zero information.

bool-to-int conversions are also promotions, for the same reason.
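The claim that promotions are lossless can be checked by the compiler. The sketch below assumes an ASCII-compatible character set. The name truth_as_int is made up for illustration, and std::numeric_limits (which appears later in this chapter) reports the smallest and largest values of a type. Each static_assert is a compile-time check.

```cpp
#include <limits>

// Assumption: the character set is ASCII-compatible, so 'A' has the value 65.
constexpr char letter = 'A';
constexpr int number = letter;      // char-to-int promotion
constexpr int truth_as_int = true;  // bool-to-int promotion

static_assert(number == 65, "the promoted value is unchanged");
static_assert(truth_as_int == 1, "true promotes to 1");

// Every char value lies inside int's range, which is why nothing can be lost.
static_assert(std::numeric_limits<int>::min() <= std::numeric_limits<char>::min(),
              "int's smallest value is no larger than char's");
static_assert(std::numeric_limits<char>::max() <= std::numeric_limits<int>::max(),
              "int's largest value is no smaller than char's");
```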

Okay, so char-to-int conversions are promotions. What about int-to-char conversions? Are those possible?

Narrowing conversions

Let’s go back to the abstract machine.

An 8-by-1 grid of cells with the first cell filled as before. The next three cells are empty, and the final four cells are filled in to represent the integer 65. It has both the decimal representation for humans, and the binary representation for memory.

If we were able to put the bits representing 'A' into an int, and the bit sequence didn’t change, then it stands to reason that we can transfer those bits back, and keep the same value, right? Let’s find out.

char const another_letter = number;
fmt::print("another_letter: {}\n", another_letter);
another_letter: A

Okay, so it looks like this particular case worked out. What about if we have a much bigger number? Say, 30017, which needs fifteen bits to represent it?

int number = 30'017; // quote between digits is a digit separator
fmt::print("number: {}\n", number);

char const mystery = number;
fmt::print("mystery: {}\n", mystery);

number = mystery;
fmt::print("number: {}\n", number);
number: 30017
mystery: A
number: 65

There’s now more information in number than a char can store, so the compiler first needs to compute a value that fits into a char object by wrapping the value around until it’s small enough to fit into the char object. Because we’re cramming a large value into a small space, we’re performing a lossy conversion. In other words, we lose information when we convert an int into a char. That’s why we get 65 back when we assign mystery to number.

The wide binary representation of 30017 is visually being 'fit' into the smaller `char` object by showing lines 'narrowing' it.

We call such conversions narrowing conversions, because we are putting a value into a region that might be too small for it to fit. All int-to-char conversions are narrowing, even if the value will fit into the char object, and similarly for int-to-bool conversions.
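The wrapping described above can be reproduced with ordinary arithmetic. This is a sketch under two assumptions: char is eight bits wide, and the character set is ASCII-compatible. The names big_number, low_byte, and back_again are made up for illustration. It also uses static_cast, covered later in this chapter, to make the narrowing explicit so that compilers don’t warn about it.

```cpp
// Assumptions: char is eight bits wide, and 'A' has the value 65.
constexpr int big_number = 30017;  // 01110101 01000001 in binary

// Keeping only the lowest eight bits is the same as taking the
// remainder after dividing by 256 (two to the power of eight).
constexpr int low_byte = big_number % 256;

// Narrow explicitly, then promote back to int.
constexpr char mystery = static_cast<char>(big_number);
constexpr int back_again = mystery;

static_assert(low_byte == 65, "30017 is 117 * 256 + 65");
static_assert(mystery == 'A', "only the lowest byte survives the narrowing");
static_assert(back_again != big_number, "the upper bits were lost for good");
```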

Mixing expressions with different types

Mixing int and char in an expression

When we mix ints and chars in expressions, the compiler will automatically convert the char to an int, and then evaluate the expression. For example, the expression 'A' + 32 has the type int.

int const number = 'A' + 32;
fmt::print("number: {}\n", number);

Of course, we can still store it in a char object.

char const letter = 'A' + 32;
fmt::print("letter: {}\n", letter);

Let’s revisit the char-plus-char example.

fmt::print("char + char: {}\n", 'A' + ' ');

Due to some really old rules we inherited from C, a language designed in the early seventies, performing arithmetic on chars immediately promotes char operands to ints. One reason for this is that single-byte arithmetic is really easy to overflow, so the result is stored in an object that won’t overflow because it’s bigger. We don’t get this treatment for ints, however: an int plus an int is still an int.
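We can ask the compiler to confirm the types of these expressions. The sketch below uses decltype and std::is_same from the standard header <type_traits>, neither of which has been introduced in this series yet. The names sum_of_chars and lowercase are made up for illustration, and the values assume an ASCII-compatible character set.

```cpp
#include <type_traits>

// decltype(expression) names the type of an expression without evaluating it.
static_assert(std::is_same<decltype('A' + ' '), int>::value, "char + char is an int");
static_assert(std::is_same<decltype('A' + 32), int>::value, "char + int is an int");
static_assert(std::is_same<decltype(65 + 32), int>::value, "int + int stays an int");

// Assumption: ASCII-compatible character set, so 'A' is 65 and ' ' is 32.
constexpr int sum_of_chars = 'A' + ' ';
static_assert(sum_of_chars == 97, "65 + 32 is 97");

// The int result can still be stored in a char, because 97 fits.
constexpr char lowercase = 'A' + ' ';
static_assert(lowercase == 'a', "97 is the value of 'a'");
```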

int-to-double and double-to-int conversions

We’ve devoted a fair chunk of time to the int/char relationship because it’s fairly simple, and their representations are related. What about int-to-double conversions?

#include <fmt/format.h>

int main()
{
  int const gross = 144;
  fmt::print("gross: {}\n", gross);

  double const rational_gross = gross; // int-to-double conversion
  fmt::print("rational_gross: {}\n", rational_gross);
}
gross: 144
rational_gross: 144

It’s the same as before. Our abstract machine’s double object is eight bytes wide, which is twice as wide as an int. An int-to-double conversion will usually preserve all the information, but if the int’s magnitude is too big for the double to represent exactly, a slight loss of precision occurs. I’ve never seen this happen on a modern, conventional PC, so I suspect it only happens on really exotic hardware: not the sort of stuff you’ll typically be using when learning C++. Because the loss is possible in principle, int-to-double conversions are still considered narrowing conversions.
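The precision loss is easier to see with an integer type that’s wider than int. The sketch below is an illustration under assumptions, not part of the chapter’s own examples: it uses long long, an integer type not otherwise covered here, and assumes that double is an IEEE 754 double-precision type, as on conventional PCs. All of the names are made up for illustration.

```cpp
#include <limits>

// digits is the number of binary digits a type can represent exactly:
// typically 31 for a 32-bit int, and 53 for an IEEE 754 double.
constexpr bool every_int_fits_in_double =
  std::numeric_limits<int>::digits <= std::numeric_limits<double>::digits;

// Assumption: double is IEEE 754 double precision.
// 9007199254740993 is 2^53 + 1, the smallest positive whole number
// that such a double cannot store exactly.
constexpr long long big = 9007199254740993LL;
constexpr double big_as_double = static_cast<double>(big);
constexpr long long back_again = static_cast<long long>(big_as_double);

static_assert(back_again != big, "the round trip lost precision");
```

On a platform where every_int_fits_in_double is true, no int value can lose precision this way, which matches the experience described above.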

double-to-int conversions, on the other hand, have more interesting and immediate consequences.

double const e = 2.71828183;
fmt::print("e: {}\n", e);

int const e_as_int = e;
fmt::print("e_as_int: {}\n", e_as_int);
e: 2.71828183
e_as_int: 2

Converting a double with a fractional component truncates (or discards) the fractional part. That’s why an int representation of e is two and not three. Instead of rounding, it’s discarding the fractional component entirely. double-to-int conversions are narrowing conversions, and we’re only allowed to convert doubles to ints if the value can be represented by an int. Anything outside these bounds (e.g. infinity, NaN, etc.) is a logic error.
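Truncation moves towards zero, so negative values don’t behave like “rounding down”. Here’s a minimal sketch that uses static_cast (covered later in this chapter) so that the narrowing is explicit. The names other than e are made up for illustration, and each static_assert is a compile-time check of the stated value.

```cpp
constexpr double e = 2.71828183;

constexpr int e_as_int = static_cast<int>(e);            // fraction discarded
constexpr int negative_e_as_int = static_cast<int>(-e);  // still towards zero
constexpr int almost_one = static_cast<int>(0.999);      // no rounding up

static_assert(e_as_int == 2, "2.71828183 truncates to 2");
static_assert(negative_e_as_int == -2, "-2.71828183 truncates to -2, not -3");
static_assert(almost_one == 0, "0.999 truncates to 0");
```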

#include <fmt/format.h>
#include <limits> // imports std::numeric_limits

int main()
{
  fmt::print("int min value: {}\n", std::numeric_limits<int>::min());
  fmt::print("int max value: {}\n", std::numeric_limits<int>::max());

  fmt::print("double min value: {}\n", std::numeric_limits<double>::min());
  fmt::print("double max value: {}\n", std::numeric_limits<double>::max());
  fmt::print("double lowest value: {}\n", std::numeric_limits<double>::lowest());
}
int min value: -2147483648
int max value: 2147483647
double min value: 2.2250738585072014e-308
double max value: 1.7976931348623157e+308
double lowest value: -1.7976931348623157e+308

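We can query the min/max values for int, double, and char using std::numeric_limits, with the type we want to query for inside angle brackets. Note that for double, min() is the smallest positive (normalised) value rather than the most negative one, which is why the most negative value has its own name: lowest().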
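These limits give us a way to check whether a double can be converted to an int before we attempt it. The sketch below is one possible approach rather than an established recipe. All of the names are made up for illustration, and it assumes that int’s limits are exactly representable as doubles, which holds for a 32-bit int and an IEEE 754 double.

```cpp
#include <limits>

// Assumption: both limits convert to double without losing precision.
constexpr double int_min = std::numeric_limits<int>::min();
constexpr double int_max = std::numeric_limits<int>::max();

constexpr double small_candidate = 2.71828183;
constexpr double huge_candidate = 1.0e300;

// A value is safe to convert only when it lies between the two limits.
constexpr bool small_fits = int_min <= small_candidate && small_candidate <= int_max;
constexpr bool huge_fits = int_min <= huge_candidate && huge_candidate <= int_max;

static_assert(small_fits, "2.71828183 is well inside int's range");
static_assert(!huge_fits, "1.0e300 is far too big for an int");
```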

Mixing ints and doubles

Earlier, we discussed expressions that mix int and char, and how they always convert chars to ints before evaluating the operation. Take a moment to think about the code below before you copy it across to Compiler Explorer. What do you think happens for expressions involving ints and doubles?

#include <fmt/format.h>

int main()
{
  int const one_gross = 144;
  double const average_hobbit_height = 0.915;
  fmt::print("one gross of hobbits standing on each other's shoulders: {}\n", one_gross * average_hobbit_height);
  fmt::print("average_hobbit_height == 1: {}\n", average_hobbit_height == 1);
}
one gross of hobbits standing on each other's shoulders: 131.76
average_hobbit_height == 1: false

When mixing ints and doubles in expressions, our ints are converted to doubles, and so the resulting expression is a double expression.
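Division makes the effect of this conversion especially visible. The following sketch uses decltype and std::is_same from <type_traits>, which this series hasn’t introduced, to check the type of a mixed expression. The names half and half_gross are made up for illustration.

```cpp
#include <type_traits>

constexpr int one_gross = 144;
constexpr double half = 0.5;

// The int operand is converted to double before the multiplication.
static_assert(std::is_same<decltype(one_gross * half), double>::value,
              "int * double is a double");
constexpr double half_gross = one_gross * half;
static_assert(half_gross == 72.0, "144 * 0.5 is 72.0");

// The conversion only happens when the types are actually mixed.
static_assert(7 / 2 == 3, "int / int stays an int, so the fraction is discarded");
static_assert(7 / 2.0 == 3.5, "int / double converts 7 to 7.0 first");
```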

Explicit conversions

What we’ve looked at so far in this chapter are called implicit conversions: that is, we’re converting values from one type to another, but we’re not stating that we ever intended to convert. This is often okay for lossless conversions: there’s no potential information loss, so we can get away without saying anything in many situations. Narrowing conversions, on the other hand, are a different story: it’s unclear as to whether or not you’re even aware that narrowing is happening.

Ordinarily, we should try to minimise the number of conversions that we make by choosing appropriate types ahead of time. While this is a nice ideal, we can’t completely avoid conversions. Fortunately, C++ has a nice, loud way to tell readers that your intention is to narrow: static_cast. This static_cast operator is known as an explicit cast, and is the preferred way to perform conversions (if you need to do a conversion at all).

#include <fmt/format.h>

int main()
{
  double const average_hobbit_height = 0.915;
  fmt::print("average_hobbit_height: {}\n", static_cast<int>(average_hobbit_height));
  fmt::print("30'017 as char: '{}'\n", static_cast<char>(30'017));
}

We achieve this explicit cast by saying static_cast<destination_type>(expression_to_convert). It’s a bit of a mouthful, but it’s almost impossible to miss when reading. The above code should print the following output.

average_hobbit_height: 0
30'017 as char: 'A'
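Since static_cast<int> truncates, you might wonder how to round instead. One common approach, shown here as a sketch and not a complete solution, is to add 0.5 before converting. It assumes the value is non-negative and comfortably within int’s range. The names other than average_hobbit_height are made up for illustration.

```cpp
constexpr double average_hobbit_height = 0.915;
constexpr double e = 2.71828183;

// Truncation: the fraction is simply discarded.
constexpr int truncated_height = static_cast<int>(average_hobbit_height);

// Rounding to nearest (assumption: the value is non-negative and in range).
// Adding 0.5 pushes fractions of one half or more past the next whole number.
constexpr int rounded_height = static_cast<int>(average_hobbit_height + 0.5);
constexpr int rounded_e = static_cast<int>(e + 0.5);

static_assert(truncated_height == 0, "0.915 truncates to 0");
static_assert(rounded_height == 1, "0.915 + 0.5 is 1.415, which truncates to 1");
static_assert(rounded_e == 3, "2.71828183 + 0.5 is 3.21828183, which truncates to 3");
```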

Minimising conversions

We’ve talked about how implicit lossless conversions are usually okay, and why implicit narrowing conversions are not. The previous section also hinted that it’s better to avoid conversions whenever possible. At this point, it would be reasonable for you to ask if there’s a way to clamp down on making implicit narrowing conversions. There are a few ways that we can put a stop to implicit narrowing conversions. We’ll look at two.

Braced initialisation

The first way is to replace the equals sign in our initialisation with braces ({ and }). This braced initialisation prohibits narrowing conversions. For now, just use it with built-in types: that is, don’t use it for std::string. In the program below, we attempt to initialise a char constant with an int variable.

#include <fmt/format.h>

int main()
{
  int whole_number{45};
  fmt::print("whole_number: {}\n", whole_number);

  char const letter{whole_number};
  fmt::print("letter: {}\n", letter);
}

We know from previous sections that char const letter = whole_number will work out okay, but now that we’re using braced initialisation, we get a compiler diagnostic[1].

error: non-constant-expression cannot be narrowed from type 'int' to 'char' in initializer list [-Wc++11-narrowing]
  char const letter{whole_number};
                    ^~~~~~~~~~~~
note: insert an explicit cast to silence this issue
  char const letter{whole_number};
                    ^~~~~~~~~~~~
                    static_cast<char>( )

This is good news: it means that the compiler is helping us catch subtle mistakes that could lead to catastrophic run-time errors[2]. What if whole_number were a constant? It turns out that for this very simple program, the compiler is able to determine that 45 will indeed fit in a char, and so it doesn’t give us a diagnostic. If the value weren’t right in front of us, or the value were too big (say, 450), then the compiler would issue an error instead of producing a program.

Similarly, this error happens when we try to narrow chars to bools, and doubles to ints. This is a good thing: we’re limiting our ability to make mistakes by relying on the compiler.

#include <fmt/format.h>

int main()
{
  char const letter{'A'};

  bool const truth{letter};
  fmt::print("truth: {}\n", truth);

  int const zero{0.0};
  fmt::print("zero: {}\n", zero);
}
error: non-constant-expression cannot be narrowed from type 'char' to 'bool' in initializer list [-Wc++11-narrowing]
  bool const truth{letter};
                   ^~~~~~
note: insert an explicit cast to silence this issue
  bool const truth{letter};
                   ^~~~~~
                   static_cast<bool>( )
error: type 'double' cannot be narrowed to 'int' in initializer list [-Wc++11-narrowing]
  int const zero{0.0};
                 ^~~
note: insert an explicit cast to silence this issue
  int const zero{0.0};
                 ^~~
                 static_cast<int>( )

The types don’t need to be an exact match: promotions and other lossless conversions are allowed because we know that the information will never be lost.

#include <fmt/format.h>

int main()
{
  char const letter{'A'};
  fmt::print("letter: {}\n", letter);

  int const letter_as_int{letter};
  fmt::print("letter_as_int: {}\n", letter_as_int);

  int const truth_as_number{true};
  fmt::print("truth_as_number: {}\n", truth_as_number);

  double const letter_as_double{letter};
  fmt::print("letter_as_double: {}\n", letter_as_double);
}
letter: A
letter_as_int: 65
truth_as_number: 1
letter_as_double: 65
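To tie the braced initialisation rules together, here’s a minimal sketch of what the compiler accepts and rejects when the source is a compile-time constant. It uses constexpr, which this chapter hasn’t introduced, to guarantee that whole_number is a constant. The rejected lines are commented out so that the example compiles. The names too_big and truncated are made up for illustration.

```cpp
constexpr int whole_number = 45;

// Accepted: whole_number is a compile-time constant, and 45 fits in a char.
constexpr char letter{whole_number};

// Rejected if uncommented: 450 is a constant, but it doesn't fit in a char
// (assuming char is eight bits wide).
// constexpr char too_big{450};

// Rejected if uncommented: double-to-int is narrowing, even for a whole number.
// constexpr int zero{0.0};

// An explicit conversion is still allowed inside the braces.
constexpr int truncated{static_cast<int>(0.915)};

static_assert(letter == 45, "the value is unchanged");
static_assert(truncated == 0, "the explicit cast truncates 0.915 to 0");
```

Try uncommenting each rejected line in turn to see the diagnostic your compiler produces.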

This braced initialisation is the first way to prevent implicit narrowing conversions. Let’s move on to the second approach.

Automatic type deduction

Up until now, we’ve been explicitly stating the type of each object.

int const meaning_of_life = 42;
double const pi = 3.14159265;
char const first_latin_letter = 'A';
bool const book_on_cxx = true;
std::string const book_name = "Applied Modern C++";

With the exception of std::string, we know that each type that we’ve looked at so far has its own literal. All literals are expressions, and all expressions have types. It’s reasonable to conclude that the literals we’re using at the moment have the same types as the ones we’re putting them beside. That is:

  • 42 has the type int
  • 3.14159265 has the type double
  • 'A' has the type char
  • true has the type bool

Since each literal has a type associated with it, spelling out the name of the type is redundant. We can instead let the compiler automatically deduce the type on our behalf, by looking at the type on the right-hand side of the initialisation.

#include <fmt/format.h>

int main()
{
  auto const meaning_of_life = 42;
  fmt::print("meaning_of_life: {}\n", meaning_of_life);

  auto const pi = 3.14159265;
  fmt::print("pi: {}\n", pi);

  auto const first_latin_letter = 'A';
  fmt::print("first_latin_letter: {}\n", first_latin_letter);

  auto const book_on_cxx = true;
  fmt::print("book_on_cxx: {}\n", book_on_cxx);
}
meaning_of_life: 42
pi: 3.14159265
first_latin_letter: A
book_on_cxx: true

The types of these auto-declared objects are identical to their explicit-type counterparts. The only difference is that we are asking the compiler to work out the type of our constant (or variable) based on the type of what’s on the right-hand side of the =. This almost reads like a mathematical description such as “let pi = 3.14159265”.
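If you’d like proof that the deduced types match, the compiler can check them. This sketch uses decltype and std::is_same from <type_traits>, neither of which this series has introduced yet. The objects are declared outside main only to keep the example short.

```cpp
#include <type_traits>

auto const meaning_of_life = 42;
auto const pi = 3.14159265;
auto const first_latin_letter = 'A';
auto const book_on_cxx = true;

// decltype(name) is the declared type of the object, including its const.
static_assert(std::is_same<decltype(meaning_of_life), int const>::value, "42 is an int");
static_assert(std::is_same<decltype(pi), double const>::value, "3.14159265 is a double");
static_assert(std::is_same<decltype(first_latin_letter), char const>::value, "'A' is a char");
static_assert(std::is_same<decltype(book_on_cxx), bool const>::value, "true is a bool");
```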

Relying on type deduction helps us produce programs that are more likely to be correct. In cases where we only care about working with integers, it’s less important that we specifically use int, and more important that we program against the interface that integers offer us. Code changes over time; when we code against interfaces and let the compiler choose types on our behalf, it means that maintenance updates more seamlessly propagate through our code. Another reason that this syntax improves correctness is that the compiler requires us to initialise our variables in order to deduce the type.

Our objects’ types are usually correct from the moment of declaration when we rely on type deduction. In cases where you really, really, really need to say what the type is, you can still do that. For example, to define a string using auto, we need to say the type on the right-hand side of the declaration:

auto const book_name = std::string("Applied Modern C++");

This sort of reads as “let the constant book_name be a std::string with the value "Applied Modern C++"”. A string literal on its own isn’t a std::string, so we need to name the type on the right-hand side for the compiler to deduce it. We can also say a built-in type’s name when using auto.

auto const gross = int{144};

When the right-hand side of = isn’t a literal, the situation is pretty much the same. The objects that we’re defining have the same type as whatever is on the right-hand side.

auto const two_squared = std::pow(2, 2); // std::pow is declared in <cmath>, and returns a double here
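The following sketch shows what auto deduces in these cases, including what happens if we forget to say std::string. It uses decltype and std::is_same from <type_traits>, which this series hasn’t introduced. The name raw_name is made up for illustration, and the pointer type it ends up with is outside the scope of this chapter.

```cpp
#include <cmath>
#include <string>
#include <type_traits>

// Without std::string(...), a string literal deduces to a pointer to its
// first character, not to a std::string.
auto const raw_name = "Applied Modern C++";
static_assert(std::is_same<decltype(raw_name), char const* const>::value,
              "a bare string literal doesn't deduce to std::string");

// Naming the type on the right-hand side gives us what we wanted.
auto const book_name = std::string("Applied Modern C++");
static_assert(std::is_same<decltype(book_name), std::string const>::value,
              "book_name is a std::string");

// std::pow returns a double when it's given two ints.
static_assert(std::is_same<decltype(std::pow(2, 2)), double>::value,
              "two_squared would be a double, not an int");
```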

As we progress, we’ll see more reasons for preferring this syntax. We’ll be using auto on the left-hand side of all object declarations from now on. I encourage you to make liberal use of it in your own code, or to at least stick with it until you finish the series, and only then evaluate whether or not you like the style. New stuff can be scary, so I completely understand if you’re cautious at the moment (I was staunchly against using auto when it was first shown to me). Give it some time, and you should find that your apprehension will eventually fade.

Feedback

If you’d like to provide feedback regarding this series, please file an issue on GitHub.

If you’re interested in reading future chapters, subscribe to my RSS feed to receive a notification at the time of publication. If you’d previously subscribed to my feed on my old website (www.cjdb.com.au), please be sure to note the new domain!

Summary

This chapter broke down several kinds of conversions, which fell into two broad categories: implicit and explicit. We also discussed why conversions should be avoided, and strategies for avoiding them.


  1. A diagnostic is any message that the compiler communicates to the programmer, including error messages, warnings, and notes.

  2. While something such as an implicit conversion gone wrong might seem small, under the right circumstances, it can be a billion-dollar mistake.