Now that we have a rudimentary grasp of the type system and object model, we’re going to learn how conversions work, and two methods of initialisation.
Don’t forget to open this lesson’s Compiler Explorer session! Unlike most other chapters—which use Catch2—we shall be printing to screen, since that should help you visualise what’s actually happening.
Objective (you will develop) | Outcome (you will be able to) |
---|---|
an understanding of conversions | |
skills in writing conversion operations | |
Thank you to Vagrant Gautam, Lesley Lai, and Janet Cobb for providing feedback.
Let’s start with the following program.
#include <fmt/format.h>
int main()
{
    char const letter = 'A'; // value is 65 in UTF-8
    fmt::print("letter: '{}'\n", letter);

    char const space = ' '; // value is 32 in UTF-8
    fmt::print("space: '{}'\n", space);

    fmt::print("letter + space: '{}'\n", letter + space); // 65+32 in UTF-8 represents 'a'
}
You should get the following output:
letter: 'A'
space: ' '
letter + space: '97'
It looks like we can’t directly add chars. Although we can add two char objects, the value we get back seems to be an int.
What’s going on there?
We looked at how objects with built-in types are represented in memory using an abstract machine in Chapter 3. With that machine, we saw that int is bigger than char, and that double is bigger than int. Also recall that values are bitwise representations of memory, with meaning given to them by types. Let’s take our value example from above and fit it into one of our object boxes.
We only need one row for this chapter. On this abstract machine, a byte is eight bits, which is consistent with most (if not all) modern computers. Our value only has seven significant bits, so it slips nicely into our char object, with a meaningless zero at the front.
In code, that’s initialising our char with the value 'A'.
char const letter = 'A';
Now, what happens when we initialise an int object with our letter?
int const number = letter;
fmt::print("number: {}\n", number);
You should see number: 65 appear in your program’s output. Looks like the number is 65, but the letter is 'A': are the two values equal?
fmt::print("letter == number: {}\n", letter == number);
The system claims that they have the same value, but the character output is printing something completely different for letter and number! Remember that a value is how we interpret a bit sequence, and that types are responsible for doing that interpretation on our behalf.
If we return to our abstract machine, we see that the value of number has the same significant bits as letter. Remember that leading zeroes are insignificant.
01000001
⇓
00000000 00000000 00000000 01000001
When we try to write one type’s value into an object of a different type (like we are here), we’re performing a conversion. This char-to-int conversion is a special kind of conversion called a promotion, because an int is guaranteed to be able to store every possible value that a char can hold, and it’s a lossless conversion: we lose zero information.
bool-to-int conversions are also a promotion, for the same reason.
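To make the "we lose zero information" claim concrete, here is a small sketch that sends every possible char value through an int and back again. The names survives_promotion and every_char_survives_promotion are hypothetical helpers written for this illustration, and the explicit static_cast conversion they use is covered later in this chapter.

```cpp
#include <limits> // imports std::numeric_limits

// Hypothetical helper: promotes c to int, then converts the int back to char.
// Returns true when the round trip gives back the original value.
constexpr bool survives_promotion(char const c)
{
    int const promoted = c;                  // char-to-int: a promotion
    return static_cast<char>(promoted) == c; // explicit trip back to char
}

// Hypothetical helper: checks every value that a char can hold.
constexpr bool every_char_survives_promotion()
{
    // The counter is an int so that it can step one past char's maximum
    // value, which is what ends the loop.
    for (int i = std::numeric_limits<char>::min(); i <= std::numeric_limits<char>::max(); ++i) {
        if (!survives_promotion(static_cast<char>(i))) {
            return false;
        }
    }
    return true;
}
```

Calling every_char_survives_promotion() from main and printing the result should report true, whichever values your platform's char happens to support.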
Okay, so char-to-int conversions are promotions. What about int-to-char conversions? Are those possible?
Let’s go back to the abstract machine.
If we were able to put the bits representing 'A' into an int, and the bit sequence didn’t change, then it stands to reason that we can transfer those bits back, and keep the same value, right? Let’s find out.
char const another_letter = number;
fmt::print("another_letter: {}\n", another_letter);
another_letter: A
Okay, so it looks like this particular case worked out. What about if we have a much bigger number? Say, 30017, which needs fifteen bits to represent it?
int number = 30'017; // quote between digits is a digit separator
fmt::print("number: {}\n", number);

char const mystery = number;
fmt::print("mystery: {}\n", mystery);

number = mystery;
fmt::print("number: {}\n", number);
number: 30017
mystery: A
number: 65
There’s now more information in number than a char can store, so the compiler first needs to compute a value that fits into a char object by wrapping the value around until it’s small enough to fit into the char object.
Because we’re cramming a large value into a small space, we’re performing a lossy conversion. In other words, we lose information when we convert an int into a char. That’s why we get 65 back when we assign mystery to number.
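The wrapping can be reproduced by hand. This is a minimal sketch, assuming an eight-bit char (so 256 possible values) and a non-negative input; wrap_into_byte is a hypothetical name made up for this illustration, not something the compiler or standard library provides.

```cpp
// Hypothetical helper: keeps only the part of value that fits into eight bits.
// Assumes that value is non-negative and that a byte is eight bits wide.
constexpr int wrap_into_byte(int const value)
{
    return value % 256; // 256 == 2^8, the number of values one byte can hold
}

// 30'017 == 117 * 256 + 65, so 65 is all that survives the conversion.
constexpr int wrapped = wrap_into_byte(30'017);

// The conversion performed by the compiler lands on the same value.
constexpr char mystery = static_cast<char>(30'017);
```

Both wrapped and mystery hold 65, which is why the earlier program printed 'A'.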
We call such conversions narrowing conversions, because we are putting a value into a region that might be too small for it to fit. All int-to-char conversions are narrowing, even if the value will fit into the char object, and similarly for int-to-bool conversions.
int and char in an expression
When we mix ints and chars in expressions, the compiler will automatically convert the char to an int, and then evaluate the result. For example, the expression 'A' + 32 has the type int.
int const number = 'A' + 32;
fmt::print("number: {}\n", number);
Of course, we can still store it in a char object.
char const letter = 'A' + 32;
fmt::print("letter: {}\n", letter);
Let’s revisit the char-plus-char example.
fmt::print("char + char: {}\n", 'A' + ' ');
Due to some really old rules we inherited from C, a language designed in the early seventies, performing arithmetic on chars immediately promotes char operands to ints. One of the reasons for this is that single-byte arithmetic is really easy to overflow, so the result is stored in an object that won’t overflow because it’s bigger. We don’t get this treatment for ints, however: an int plus an int is still an int.
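We can ask the compiler to confirm these rules. The sketch below uses decltype, std::is_same_v, and static_assert, none of which have been introduced in this series yet, purely as a way of checking the type of an expression; it assumes a compiler that supports C++17. The alias names char_plus_char and int_plus_int are made up for this illustration.

```cpp
#include <type_traits> // imports std::is_same_v

constexpr char letter = 'A';
constexpr char space = ' ';

// decltype(expression) names the type of expression without evaluating it.
using char_plus_char = decltype(letter + space);
using int_plus_int = decltype(1 + 2);

// These checks happen at compile time: the program won't build if one is false.
static_assert(std::is_same_v<char_plus_char, int>, "char + char is an int");
static_assert(std::is_same_v<int_plus_int, int>, "int + int is still an int");
```

If either rule didn't hold, the compiler would reject the program and print the message attached to the failing check.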
int-to-double and double-to-int conversions
We’ve devoted a fair chunk of time to the int/char relationship because it’s fairly simple, and their representations are related. What about int-to-double conversions?
#include <fmt/format.h>
int main()
{
    int const gross = 144;
    fmt::print("gross: {}\n", gross);

    double const rational_gross = gross;
    fmt::print("rational_gross: {}\n", rational_gross);
}
gross: 144
rational_gross: 144
It’s the same as before. Our abstract machine’s double object is eight bytes wide, which is twice as wide as an int. An int-to-double conversion will usually preserve all the information, but when the int’s magnitude is really big, a slight loss of precision can occur. That can only happen when an int has more significant bits than a double can store exactly (53 bits in the common IEEE 754 format), which isn’t the case for the 32-bit int on a modern, conventional PC. I’ve never seen it happen there, so I suspect it only affects really exotic hardware: not the sort of stuff you’ll typically be using when learning C++. Because the loss is possible in principle, int-to-double conversions are considered narrowing conversions.
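To see what that loss of precision looks like, we need an integer type that is wider than a 32-bit int. The sketch below borrows long long, a wider integer type that this series hasn't covered, and assumes that long long is 64 bits wide and that double is a 64-bit IEEE 754 type with 53 significant bits. The helper survives_round_trip is a hypothetical name.

```cpp
// Hypothetical helper: converts value to double and back again, then reports
// whether we got the original value back.
constexpr bool survives_round_trip(long long const value)
{
    double const as_double = static_cast<double>(value);
    return static_cast<long long>(as_double) == value;
}

// Assumption: double stores 53 significant bits (64-bit IEEE 754).
constexpr long long largest_32_bit_int = 2'147'483'647;        // 2^31 - 1
constexpr long long just_past_53_bits = 9'007'199'254'740'993; // 2^53 + 1
```

Under those assumptions, survives_round_trip(largest_32_bit_int) is true, while survives_round_trip(just_past_53_bits) is false: the nearest double is 2^53, so the trailing 1 is lost.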
double-to-int conversions, on the other hand, have more interesting and immediate consequences.
double const e = 2.71828183;
fmt::print("e: {}\n", e);

int const e_as_int = e;
fmt::print("e_as_int: {}\n", e_as_int);
e: 2.71828183
e_as_int: 2
Converting a double with a fractional component truncates (or discards) the fractional bit. That’s why an int representation of e is two and not three. Instead of rounding, it’s discarding the fractional component entirely.
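Truncation always discards toward zero, which matters for negative values. Here is a minimal sketch that spells the conversion out explicitly; to_int is a hypothetical helper name, and static_cast is covered later in this chapter.

```cpp
// Hypothetical helper: performs the same double-to-int conversion as the
// example above, but spelt explicitly.
constexpr int to_int(double const value)
{
    return static_cast<int>(value); // discards everything after the decimal point
}

constexpr int positive = to_int(2.71828183);    // 2, not 3
constexpr int nearly_three = to_int(2.999999);  // still 2
constexpr int negative = to_int(-2.71828183);   // -2, not -3
```

Notice that the negative value moves up toward zero rather than down to -3: nothing is ever rounded.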
double-to-int conversions are narrowing conversions, and we’re only allowed to convert doubles to ints if the truncated value can be represented by an int. Anything outside these bounds (e.g. infinity, NaN, etc.) is a logic error: the behaviour of such a conversion is undefined.
#include <fmt/format.h>
#include <limits> // imports std::numeric_limits
int main()
{
    fmt::print("int min value: {}\n", std::numeric_limits<int>::min());
    fmt::print("int max value: {}\n", std::numeric_limits<int>::max());

    fmt::print("double min value: {}\n", std::numeric_limits<double>::min());
    fmt::print("double max value: {}\n", std::numeric_limits<double>::max());
    fmt::print("double lowest value: {}\n", std::numeric_limits<double>::lowest());
}
int min value: -2147483648
int max value: 2147483647
double min value: 2.2250738585072014e-308
double max value: 1.7976931348623157e+308
double lowest value: -1.7976931348623157e+308
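We can query the min/max values for int, double, and char using std::numeric_limits, with the type we want to query for inside angle brackets. Note that for double, min() is the smallest positive normalised value rather than the most negative one, which is why the most negative value comes from lowest() instead.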
ints and doubles
We discussed char and cross-type operations involving int and char earlier, and how they always convert chars to ints before evaluating the operation. Take a moment to think about the code below before you copy it across to Compiler Explorer. What do you think happens for expressions involving ints and doubles?
#include <fmt/format.h>
int main()
{
    int const one_gross = 144;
    double const average_hobbit_height = 0.915;
    fmt::print("one gross of hobbits standing on each other's shoulders: {}\n", one_gross * average_hobbit_height);
    fmt::print("average_hobbit_height == 1: {}\n", average_hobbit_height == 1);
}
one gross of hobbits standing on each other's shoulders: 131.76
average_hobbit_height == 1: false
When mixing ints and doubles in expressions, our ints are converted to doubles, and so the resulting expression is a double expression.
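As with the char example, we can ask the compiler to confirm the type of a mixed expression. This sketch assumes C++17 and uses decltype, std::is_same_v, and static_assert purely as checking tools; the names mixed_product, int_division, and mixed_division are made up for this illustration.

```cpp
#include <type_traits> // imports std::is_same_v

constexpr int one_gross = 144;
constexpr double average_hobbit_height = 0.915;

// The int operand is converted to double before the multiplication happens.
using mixed_product = decltype(one_gross * average_hobbit_height);
static_assert(std::is_same_v<mixed_product, double>, "int * double is a double");

// The same conversion changes the result of a division.
constexpr int int_division = 1 / 2;        // both operands are ints: 0
constexpr double mixed_division = 1 / 2.0; // 1 is converted to 1.0: 0.5
```

The division pair shows why this matters in practice: changing just one operand to a double changes the answer from 0 to 0.5.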
What we’ve looked at so far in this chapter are called implicit conversions: that is, we’re converting values from one type to another, but we’re not stating that we ever intended to convert. This is often okay for lossless conversions: there’s no potential information loss, so we can get away without saying anything in many situations. Narrowing conversions, on the other hand, are a different story: it’s unclear as to whether or not you’re even aware that narrowing is happening.
Ordinarily, we should try to minimise the number of conversions that we make by choosing appropriate types ahead of time. While this is a nice ideal, we can’t completely avoid conversions. Fortunately, C++ has a nice, loud way to tell readers that your intention is to narrow, called static_cast. This static_cast operator is known as an explicit cast, and is the preferred way to perform conversions (if you need to do a conversion at all).
#include <fmt/format.h>
int main()
{
    double const average_hobbit_height = 0.915;
    fmt::print("average_hobbit_height: {}\n", static_cast<int>(average_hobbit_height));
    fmt::print("30'017 as char: '{}'\n", static_cast<char>(30'017));
}
We achieve this explicit cast by saying static_cast<destination_type>(expression_to_convert). It’s a bit of a mouthful, but it’s almost impossible to miss when reading. The above code should print the following output.
average_hobbit_height: 0
30'017 as char: 'A'
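An explicit cast earns its keep when the narrowing is the whole point of the code. Here is a small sketch of that situation; whole_percent is a hypothetical function invented for this example, not part of any library.

```cpp
// Hypothetical helper: converts a fraction such as 0.915 into a whole
// percentage, deliberately discarding anything after the decimal point.
constexpr int whole_percent(double const fraction)
{
    // The static_cast tells readers that the narrowing is intentional.
    return static_cast<int>(fraction * 100.0);
}
```

Someone reading whole_percent(0.915) can see at a glance that the author meant to throw the fractional part away, which an implicit conversion wouldn't communicate.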
We’ve talked about how implicit lossless conversions are usually okay, and why implicit narrowing conversions are not. The previous section also hinted that it’s better to avoid conversions whenever possible. At this point, it would be reasonable for you to ask if there’s a way to clamp down on making implicit narrowing conversions. There are a few ways that we can put a stop to implicit narrowing conversions. We’ll look at two.
The first way is to replace the equals sign in our initialisation with braces ({ and }). This braced-initialisation prohibits narrowing conversions. For now, just use it with built-in types: that is, don’t use it for std::string. In the program below, we attempt to initialise a char constant with an int variable.
#include <fmt/format.h>
int main()
{
    int whole_number{45};
    fmt::print("whole_number: {}\n", whole_number);

    char const letter{whole_number};
    fmt::print("letter: {}\n", letter);
}
We know from previous sections that char const letter = whole_number will work out okay, but now that we’re using this braced-initialisation, we get a compiler diagnostic[1].
error: non-constant-expression cannot be narrowed from type 'int' to 'char' in initializer list [-Wc++11-narrowing]
char const letter{whole_number};
^~~~~~~~~~~~
note: insert an explicit cast to silence this issue
char const letter{whole_number};
^~~~~~~~~~~~
static_cast<char>( )
This is good news: it means that the compiler is helping us catch subtle mistakes that could lead to catastrophic run-time errors[2]. What if whole_number were a constant? It turns out that for this very simple program, the compiler is able to determine that 45 will indeed fit in a char, and so it doesn’t give us a diagnostic. If the value weren’t right in front of us, or the value were too big (say, 450), then the compiler would issue an error instead of producing a program.
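The constant case looks like this in code. It is only a sketch: the commented-out lines are there to try for yourself, and the names too_big and broken are made up for the illustration.

```cpp
// Because whole_number is a constant whose value is known at compile time,
// the compiler can check that 45 fits in a char, so this isn't narrowing.
int const whole_number{45};
char const letter{whole_number};

// Uncommenting the next two lines should produce a narrowing diagnostic,
// because 450 can't be represented by an eight-bit char.
// int const too_big{450};
// char const broken{too_big};
```

Removing the const from whole_number brings back the diagnostic from earlier, because the compiler can no longer rely on the value being 45.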
Similarly, this error happens when we try to narrow chars to bools, and doubles to ints. This is a good thing: we’re limiting our ability to make mistakes by relying on the compiler.
#include <fmt/format.h>
int main()
{
    char const letter{'A'};
    bool const truth{letter};
    fmt::print("truth: {}\n", truth);

    int const zero{0.0};
    fmt::print("zero: {}\n", zero);
}
error: non-constant-expression cannot be narrowed from type 'char' to 'bool' in initializer list [-Wc++11-narrowing]
bool const truth{letter};
^~~~~~
note: insert an explicit cast to silence this issue
bool const truth{letter};
^~~~~~
static_cast<bool>( )
error: type 'double' cannot be narrowed to 'int' in initializer list [-Wc++11-narrowing]
int const zero{0.0};
^~~
note: insert an explicit cast to silence this issue
int const zero{0.0};
^~~
static_cast<int>( )
The types don’t need to be an exact match: promotions and other lossless conversions are allowed because we know that the information will never be lost.
#include <fmt/format.h>
int main()
{
    char const letter{'A'};
    fmt::print("letter: {}\n", letter);

    int const letter_as_int{letter};
    fmt::print("letter_as_int: {}\n", letter_as_int);

    int const truth_as_number{true};
    fmt::print("truth_as_number: {}\n", truth_as_number);

    double const letter_as_double{letter};
    fmt::print("letter_as_double: {}\n", letter_as_double);
}
letter: A
letter_as_int: 65
truth_as_number: 1
letter_as_double: 65
This braced initialisation is the first way to prevent implicit narrowing conversions. Let’s move on to the second approach.
Up until now, we’ve been explicitly stating the type of each object.
int const meaning_of_life = 42;
double const pi = 3.14159265;
char const first_latin_letter = 'A';
bool const book_on_cxx = true;
std::string const book_name = "Applied Modern C++";
With the exception of std::string, we know that each type that we’ve looked at so far has its own literal. All literals are expressions, and all expressions have types. It’s reasonable to conclude that the literals we’re using at the moment have the same types as the ones we’re putting them beside. That is:
42 has the type int
3.14159265 has the type double
'A' has the type char
true has the type bool
Since each literal has a type associated with it, spelling out the name of the type is redundant. We can instead let the compiler automatically deduce the type on our behalf, by looking at the type on the right-hand side of the initialisation.
#include <fmt/format.h>
int main()
{
    auto const meaning_of_life = 42;
    fmt::print("meaning_of_life: {}\n", meaning_of_life);

    auto const pi = 3.14159265;
    fmt::print("pi: {}\n", pi);

    auto const first_latin_letter = 'A';
    fmt::print("first_latin_letter: {}\n", first_latin_letter);

    auto const book_on_cxx = true;
    fmt::print("book_on_cxx: {}\n", book_on_cxx);
}
meaning_of_life: 42
pi: 3.14159265
first_latin_letter: A
book_on_cxx: true
These auto-declared types are identical to their explicit-type counterparts. The only difference is that we are asking the compiler to work out the type of our constant (or variable) based on the type of what’s on the right-hand side of the =. This almost reads like a mathematical description such as “let pi = 3.14159265”.
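If you'd like proof that the deduced types really are identical, the compiler can check them for us. This sketch assumes C++17, and uses decltype, std::is_same_v, and static_assert only as checking tools; none of them are needed in ordinary code that uses auto.

```cpp
#include <type_traits> // imports std::is_same_v

auto const meaning_of_life = 42;
auto const pi = 3.14159265;
auto const first_latin_letter = 'A';
auto const book_on_cxx = true;

// decltype(name) reports the type that the compiler deduced for name.
// The program won't build if any of these checks are false.
static_assert(std::is_same_v<decltype(meaning_of_life), int const>);
static_assert(std::is_same_v<decltype(pi), double const>);
static_assert(std::is_same_v<decltype(first_latin_letter), char const>);
static_assert(std::is_same_v<decltype(book_on_cxx), bool const>);
```

Each deduced type matches the one we spelt out by hand in the earlier declarations.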
Relying on type deduction helps us produce programs that are more likely to be correct. In cases where we only care about working with integers, it’s less important that we specifically use int, and more important that we program against the interface that integers offer us. Code changes over time; when we code against interfaces and let the compiler choose types on our behalf, it means that maintenance updates more seamlessly propagate through our code. Another reason that this syntax improves correctness is that the compiler requires us to initialise our variables in order to deduce the type.
Our objects’ types are usually correct from the moment of declaration when we rely on type deduction. In cases where you really, really, really need to say what the type is, you can still do that. For example, to define a string using auto, we need to say the type on the right-hand side of the declaration:
auto const book_name = std::string("Applied Modern C++");
This sort of reads as “let the constant book_name be a std::string with the value "Applied Modern C++"”. Because std::string isn’t a built-in type, we need to specify the type. We can also say a built-in type’s name when using auto.
auto const gross = int{144};
When the right-hand side of = isn’t a literal, the situation is pretty much the same. The objects that we’re defining have the same type as whatever is on the right-hand side.
auto const two_squared = std::pow(2, 2);
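It's worth checking what type that right-hand side actually has, because it might not be the one you expect. The sketch below assumes C++17 and uses decltype and std::is_same_v as checking tools; pow_result is a name made up for this illustration.

```cpp
#include <cmath>       // imports std::pow
#include <type_traits> // imports std::is_same_v

// decltype doesn't evaluate std::pow(2, 2); it only reports the call's type.
using pow_result = decltype(std::pow(2, 2));

// Even though both arguments are ints, the result is a double.
static_assert(std::is_same_v<pow_result, double>, "std::pow(int, int) returns a double");
```

So two_squared is a double constant rather than an int: auto faithfully follows whatever the expression produces.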
As we progress, we’ll see more reasons for preferring this syntax. We’ll be using auto on the left-hand side of all object declarations from now on. I encourage you to make liberal use of it in your own code, or to at least stick with it until you finish the series, and only then evaluate whether or not you like the style. New stuff can be scary, so I completely understand if you’re cautious at the moment (I was staunchly against using auto when it was first shown to me). Give it some time, and you should find that your apprehension will eventually fade.
This chapter broke down several kinds of conversions, which fell into two broad categories: implicit and explicit. We also discussed why conversions should be avoided, and strategies for avoiding them.
[1] A diagnostic is any message that the compiler communicates to the programmer, including error messages, warnings, and notes.
[2] While something such as an implicit conversion gone wrong might seem small, under the right circumstances, it can be a billion-dollar mistake.