Functions

Functions—sometimes known as subroutines—allow us to package code into a named operation that makes it available for reuse. We’ve already defined one function in Chapters 2 and 4: the nullary function main, which serves as the entry point for a program.

In this chapter, we learn how to define other functions (and some of their subtleties), how C++ models functions, and the lifetime of objects.

Table of contents

  1. Functions
    1. Table of contents
    2. Objectives and outcomes
    3. Acknowledgements
    4. Function definitions
      1. Defining functions
      2. Functions need to be declared before they can be used
      3. Don’t forget the return statement!
        1. main is a special function
        2. We don’t return const objects
      4. Overloads: same name, different parameters
      5. Passing parameters as references-to-const
        1. Don’t use references in return types yet
      6. auto as a parameter
    5. Understanding how the computer sees functions
      1. The Collatz conjecture
      2. A call to satisfies_collatz_conjecture
      3. Looking inside a stack frame
      4. Tying this back to references
    6. Feedback
    7. Summary

Objectives and outcomes

Objective (you will develop):
an understanding of functions and skills in writing code that leverages functions

Outcomes (you will be able to):
  • synthesise function definitions, including overloads.
  • identify a function’s return type, parameters, and name.
  • identify the difference between a value parameter and a reference parameter.
  • explain how the computer models and executes function calls.
  • outline how function overloads are chosen.
  • identify appropriate times to overload functions.
  • outline the differences between main and all other functions.

Acknowledgements

Thanks to Janet Cobb, Juliette H. L., Vagrant Gautam, Nicole Mazzuca, Arien Judge, and Tom Kunc, for providing technical feedback, and to Maren Pan for providing artistic advice.

Function definitions

Defining functions

Let’s suppose that we wanted to compute the hypotenuse of a right-angle triangle. The formula for this is $$c = \sqrt{a^2 + b^2}$$ In code, this would look like:

auto hypotenuse(double const a, double const b) -> double
{
  return std::sqrt((a * a) + (b * b)); // std::sqrt is in <cmath>
}

There are a few differences to main, but the structure is fairly similar. The key differences are that we have replaced int with auto, have a parameter list inside the parentheses, then follow up with -> double to specify the return type, and finally, we have a return-statement inside the function’s body.
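As a further illustration of the same shape, here is a minimal sketch of a second function that calls hypotenuse. The name right_triangle_perimeter is hypothetical and isn’t used elsewhere in this chapter.

```cpp
#include <cmath>

auto hypotenuse(double const a, double const b) -> double
{
  return std::sqrt((a * a) + (b * b)); // std::sqrt is in <cmath>
}

// Hypothetical helper: the perimeter of a right-angle triangle whose two
// shorter sides are a and b. It has the same structure as hypotenuse: a name,
// a parameter list, a trailing return type, and a return-statement.
auto right_triangle_perimeter(double const a, double const b) -> double
{
  return a + b + hypotenuse(a, b);
}
```

For the sides 3.0 and 4.0, hypotenuse gives 5.0, and so right_triangle_perimeter gives 12.0.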

Functions need to be declared before they can be used

Before a function can be called, it needs to be declared. A function declaration is like an announcement to the compiler that a function exists. Unlike languages such as C♯ and Java, the position of a function matters. For example, this program fails to compile, because the compiler doesn’t yet know about the existence of hypotenuse.

TEST_CASE("hypotenuse")
{
  CHECK(hypotenuse(3.0, 4.0) == 5.0);
}

auto hypotenuse(double const a, double const b) -> double
{
  return std::sqrt((a * a) + (b * b));
}
error: use of undeclared identifier 'hypotenuse'
  CHECK(hypotenuse(3.0, 4.0) == 5.0);
        ^
error: use of undeclared identifier 'hypotenuse'
error: use of undeclared identifier 'hypotenuse'
3 errors generated.

You can ignore the repeated diagnostics: in order to deliver a high-quality test framework, Catch2 does some behind-the-scenes work that tricks the compiler into thinking the function is called three times, even though it’s only called once.


The solution is to put the function definition before its first use.

auto hypotenuse(double const a, double const b) -> double
{
  return std::sqrt((a * a) + (b * b));
}

TEST_CASE("hypotenuse")
{
  CHECK(hypotenuse(3.0, 4.0) == 5.0);
}
===============================================================================
All tests passed (1 assertion in 1 test case)
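Moving the definition isn’t the only option. Since the compiler only needs a declaration before the first use, a sketch like the following should also work: it declares hypotenuse up front and defines it later. The caller is_three_four_five is a hypothetical stand-in for the test case, so that the example doesn’t depend on Catch2.

```cpp
#include <cmath>

// Declaration only: announces the name, parameters, and return type of
// hypotenuse, without providing a body.
auto hypotenuse(double const a, double const b) -> double;

// Hypothetical caller: this compiles because hypotenuse has been declared,
// even though its definition appears further down.
auto is_three_four_five() -> bool
{
  return hypotenuse(3.0, 4.0) == 5.0;
}

// Definition: provides the body for the function that was declared above.
auto hypotenuse(double const a, double const b) -> double
{
  return std::sqrt((a * a) + (b * b));
}
```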

Don’t forget the return statement!

If we comment out the return statement, then we’ll get three errors.

auto hypotenuse(double const a, double const b) -> double
{
  // return std::sqrt((a * a) + (b * b));
}
error: unused parameter 'a' [-Werror,-Wunused-parameter]
auto hypotenuse(double const a, double const b) -> double
                             ^
error: unused parameter 'b' [-Werror,-Wunused-parameter]
auto hypotenuse(double const a, double const b) -> double
                                             ^
error: non-void function does not return a value [-Werror,-Wreturn-type]
}
^

The third error is the interesting one in this context. It’s telling us that our function doesn’t return something, despite our interface saying that we’ll return a double. We get this error because our current setup is configured to have the compiler fail and issue a diagnostic when we forget the return statement. If we turned off all of the safety mechanisms, the program would compile, but calling hypotenuse would be undefined behaviour: the program might crash, or it might silently produce garbage.

If we don’t have a value that we wish to return, then we can use the type void to indicate as much.

#include <catch2/catch_test_macros.hpp>
#include <cmath>
#include <fmt/format.h>
#include <string>

auto hypotenuse(double const a, double const b) -> double
{
  return std::sqrt((a * a) + (b * b));
}

auto log(std::string const& message) -> void
{
  fmt::print("Checking '{}'\n", message);
}

TEST_CASE("hypotenuse")
{
  auto const triples = std::string("pythagorean triples");
  log(triples);
  CHECK(hypotenuse(3.0, 4.0) == 5.0);
}

void is a special type that is used almost exclusively as a placeholder to indicate “we won’t be returning anything”. It’s important to understand, however, that a void function does still return: it just doesn’t have a return value. Some languages (such as Rust and Haskell) spell void as () and call it the ‘unit type’.

main is a special function

At this point, you’re probably wondering why our usage of main doesn’t have a return-statement. We already know that main has some special rules attached to it, since it must have a return type of int. Another one of those rules is that if we reach the end of main and there isn’t a return-statement, the compiler will insert a return 0; on our behalf. It’s a programming convention to have main return 0 for “successful” program runs, and any other value for “unsuccessful” ones. Although nothing in our program can use that value, the operating system gets that value back and can expose it through various means (e.g. a shell).

Two other rules that are of interest are that we’re not allowed to call main, and that there can only be one overload of main.

We don’t return const objects

Something important to notice is that none of the return types in this chapter (or in this book) will ever take the form <type> const: they’ll always be <type>. const-qualifying a return type doesn’t achieve anything useful (it doesn’t force the caller to make a const object), and in many cases, will inhibit optimisation opportunities.

Overloads: same name, different parameters

Suppose that we’re sick of writing fmt::print("{}\n", x) and instead want to make a function so that we never forget to write the newline. We could do this:

auto println_int(int const x) -> void
{
  fmt::print("{}\n", x);
}

auto println_double(double const x) -> void
{
  fmt::print("{}\n", x);
}

auto println_char(char const x) -> void
{
  fmt::print("{}\n", x);
}

auto println_bool(bool const x) -> void
{
  fmt::print("{}\n", x);
}

auto println_string(std::string const& x) -> void
{
  fmt::print("{}\n", x);
}

This is kind of tedious and error-prone because we are embedding the type into the function’s name. It would be much better if we could instead just call the function println, right? C++ allows us to do this by creating overload sets: that is, a set of functions that share the same name, but have different parameters. A single function in an overload set is called an “overload”.

auto println(int const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(double const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(char const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(bool const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(std::string const& x) -> void
{
  fmt::print("{}\n", x);
}

Overloads don’t need to have the same number of parameters: we can have overloads that take fewer parameters, and we can have ones that take more. We can also change the return type of an overload!

auto println() -> void
{
  fmt::print("\n");
}

auto println(std::string const& x, std::string const& y) -> void
{
  fmt::print("{} {}\n", x, y);
}

auto println(int const x, int const y) -> int
{
  fmt::print("{} {}\n", x, y);
  return 2; // returning the number of parameters
}

We aren’t able to overload solely based on return types though.

auto println(int const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(int const x) -> int
{
  fmt::print("{}\n", x);
  return 1;
}

This will produce the error below.

error: functions that differ only in their return type cannot be overloaded
auto println(int const x) -> int
     ^
note: previous definition is here
auto println(int const x) -> void
~~~~ ^

Similarly, we can’t overload on top-level constness. These two functions are the same from the compiler’s perspective: both println(int).

auto println(int const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(int x) -> void
{
  fmt::print("{}\n", x);
}
error: redefinition of 'println'
auto println(int x) -> void
     ^
note: previous definition is here
auto println(int const x) -> void
     ^
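One consequence, sketched below under the assumption that we separate a declaration from its definition, is that the two may disagree on top-level const and still name the same function. The function twice is hypothetical.

```cpp
// Declaration: the parameter is written without const.
auto twice(int x) -> int;

// Definition: the parameter is written with a top-level const. The compiler
// treats this as the definition of the function declared above, namely
// twice(int), rather than as a second overload.
auto twice(int const x) -> int
{
  return 2 * x;
}
```

The const here only affects the body of the function: it stops us from modifying x inside twice, and callers can’t tell the difference.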

Overloads are chosen by the compiler based on how closely the arguments match an overload’s parameters. This process is known as overload resolution, and its finer points are an expert topic, but there are some things we can learn about it now. Suppose that we only defined the following println overloads.

auto println(int const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(std::string const& x) -> void
{
  fmt::print("{}\n", x);
}

int main()
{
  println(0);
  println("hello");
  println(0.0);
}

The first call to println has the argument 0, which has the type int. There happens to be an overload for println that takes a single int as a parameter: this is an exact match, and so the compiler chooses the println(int) overload. The second call to println has the argument "hello". This is a string literal: we don’t have an overload that directly takes the string literal type, so there are no direct matches. We do, however, have the overload println(std::string const&). Since the compiler can implicitly convert the string literal to a std::string (see footnote 1), it does that, and then calls println(std::string const&). The third call is more or less the same as the second in principle, except that double narrows to int.
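The three calls above can be checked with a sketch like the following, which swaps println for a hypothetical function, chosen_overload, that reports which overload the compiler picked instead of printing. The parameters are left unnamed (an assumption on our part, since unnamed parameters haven’t been covered) so that the unused-parameter diagnostic stays quiet.

```cpp
#include <string>

// Hypothetical stand-in for println(int): reports that it was chosen.
auto chosen_overload(int const) -> std::string
{
  return "int";
}

// Hypothetical stand-in for println(std::string const&).
auto chosen_overload(std::string const&) -> std::string
{
  return "std::string const&";
}
```

With only these two overloads, chosen_overload(0) and chosen_overload(0.0) both report "int", while chosen_overload("hello") reports "std::string const&".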

A problem arises, however, when there are no exact matches and multiple overloads that could be chosen. Which overload is chosen for println(0.0) when we add the following overload to our overload set?

auto println(int const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(std::string const& x) -> void
{
  fmt::print("{}\n", x);
}

auto println(char const x) -> void
{
  fmt::print("{}\n", x);
}

int main()
{
  println(0.0);
}

The answer is none of them. There’s no println(double) overload, so the compiler looks for other overloads that it might be able to convert double to, and finds two such overloads: println(int) and println(char). Both of these are considered to be “equal” choices in the compiler’s eyes, and so the best that it can do is provide a diagnostic and ask the programmer to make an explicit conversion. When the compiler can’t choose a specific overload, we call the overload set ambiguous.

error: call to 'println' is ambiguous
  println(0.0);
  ^~~~~~~
note: candidate function
auto println(int const x) -> void
     ^
note: candidate function
auto println(char const x) -> void
     ^
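One way to make the explicit conversion that the diagnostic asks for is static_cast, sketched below. We’re assuming static_cast here (it isn’t introduced in this chapter), and picked, resolve_as_int, and resolve_as_char are hypothetical names that report the chosen overload rather than printing.

```cpp
#include <string>

// Hypothetical stand-ins for println(int) and println(char).
auto picked(int const) -> std::string
{
  return "int";
}

auto picked(char const) -> std::string
{
  return "char";
}

// picked(0.0) would be ambiguous, so we convert the double ourselves. After
// the conversion, the argument is an exact match for exactly one overload.
auto resolve_as_int() -> std::string
{
  return picked(static_cast<int>(0.0));
}

auto resolve_as_char() -> std::string
{
  return picked(static_cast<char>(0.0));
}
```

The other way out is to add an overload that matches the argument exactly, which for println(0.0) would be println(double).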

Passing parameters as references-to-const

You may have noticed that the previous two sections both passed strings as std::string const&, rather than as std::string const like we’ve used for the built-in types. This is known as passing by reference-to-const. A reference is an alias to an object, sort of like a reference at the end of a paper or a hyperlink on the web. Rather than copying the entire contents into your function, you’re saying “when I say message, I really mean the std::string object triples from TEST_CASE("hypotenuse")”.

The reason we want to pass by reference-to-const is because copying is both a slow thing for computers to do, and because it means that our program will use more memory (often unnecessarily).

auto println_value(std::string const message) -> void
{
  fmt::print("{}\n", message.size());
}

auto println_reference(std::string const& message) -> void
{
  fmt::print("{}\n", message.size());
}

Whenever you make a claim about performance, you always need to back it up with a benchmark. println_reference(triples) is roughly ten times faster than println_value(triples), even though triples is only nineteen bytes long. As the length of a string increases, so too does the gap between passing by value and passing by reference: a string that’s the same length as this chapter’s manuscript (about 15kB) shows println_reference as 140x faster!

A lot of the code won’t make sense right now, but if you’re interested in seeing the raw numbers for yourself, you can find them on quick-bench.com. As indicated before: performance is only a concern if you’ve measured the results, so we’ll be running benchmarks every now and again. Once we’ve covered all the material needed to understand how to write benchmarks, we’ll learn how to write our own.

We need to cover two more things, and then we’ll explore why println_reference outpaces println_value by so much.

Don’t use references in return types yet

There are some rules that we must follow when doing auto f(std::string x) -> std::string const&. For now, always return by value, and we’ll revisit returning references in Module 3.
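As a minimal sketch of the recommended shape, the hypothetical function greeting below takes its parameter by reference-to-const, but returns its result by value.

```cpp
#include <string>

// Hypothetical example: the parameter is a reference-to-const, so name isn't
// copied on the way in. The return type is a plain std::string (not a
// reference), so the caller receives its own string object.
auto greeting(std::string const& name) -> std::string
{
  return "Hello, " + name + "!";
}
```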

auto as a parameter

We defined the following five println overloads much earlier on.

auto println(int const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(double const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(char const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(bool const x) -> void
{
  fmt::print("{}\n", x);
}

auto println(std::string const& x) -> void
{
  fmt::print("{}\n", x);
}

The function bodies are all identical, so it’s still somewhat tedious and error-prone for us to keep so many definitions around. We don’t need to write out and maintain all of these overloads by hand: we can instead get the compiler to generate overloads as necessary:

auto println(auto const& x) -> void
{
  fmt::print("{}\n", x);
}

int main()
{
  println(144);
  println(0.915);
  println('F');
  println(true);
  println(std::string("Elbereth Gilthoniel"));
}

Each of the above calls will instruct the compiler to generate a println overload for the type that we passed as an argument. Because we intend to pass std::string objects to println, we need to make sure that x is a reference-to-const.

You might think that using auto parameters for everything is the natural direction that we’d be taking, but this isn’t the case. Unlike constants and variables, we prefer spelling out parameters’ types whenever possible. Similarly to returning references, we’ll explore why in detail in Module 3. The general rule of thumb for now is to only parameterise your type when you have multiple function bodies that are textually identical.

Understanding how the computer sees functions

Why is passing a string object by reference so much faster than when we pass it by value? Well, this all comes down to how functions themselves are represented in memory. When we call a function, the computer needs to carve out some space in memory for the function. This region of memory is called a stack frame. Many languages have a concept of “the stack”, and it’s incredibly similar to that.

The Collatz conjecture

To demonstrate this, we’ll use the Collatz conjecture. The Collatz sequence is an algorithm that takes an integer n and either halves it if n is even, or triples it and adds 1 if n is odd. Mathematically, this looks like $$f(n) = \begin{cases}\frac{n}{2} & \text{if } n \equiv 0\ (\text{mod } 2),\\ 3n + 1 & \text{if } n \equiv 1\ (\text{mod } 2).\end{cases}$$ The Collatz conjecture is a question that has remained unsolved, which asks whether or not the Collatz sequence eventually reaches 1 for all positive integers. We can express a naive implementation of the conjecture as follows:

auto satisfies_collatz_conjecture(int const n) -> bool
{
  return (n <= 0)     ? false
       : (n == 1)     ? true
       : (n % 2 == 0) ? satisfies_collatz_conjecture(n / 2)
       :                satisfies_collatz_conjecture(3 * n + 1);
}

satisfies_collatz_conjecture is a recursive function that uses the conditional operator. The conditional operator allows programs to make decisions in an expression, and is similar to Python’s x if condition else y and Haskell’s if condition then x else y. We write it as condition ? x : y in C++. The program checks the value of condition and then evaluates one of x or y, depending on whether condition is respectively true or false. This Compiler Explorer session demonstrates simple examples of the operator.
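For readers who’d like to see the operator in isolation, here is a small sketch with two hypothetical functions: absolute_value makes a single decision, and sign chains two conditional operators in the same layout as satisfies_collatz_conjecture.

```cpp
// Hypothetical example: evaluates -x when x is negative, and x otherwise.
auto absolute_value(int const x) -> int
{
  return (x < 0) ? -x : x;
}

// Hypothetical example: the conditions are checked from top to bottom, and
// the final line acts as the "otherwise" case.
auto sign(int const x) -> int
{
  return (x < 0)  ? -1
       : (x == 0) ? 0
       :            1;
}
```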

A call to satisfies_collatz_conjecture

Ordinarily, we wouldn’t need to switch Compiler Explorer sessions in the middle of a chapter, but we need to use int main() for our examples to be simple, and we can’t when we’re using Catch2. Let’s consider the following listing.

int main()
{
  satisfies_collatz_conjecture(1);
  satisfies_collatz_conjecture(2);
}

If we were to pause program execution just before we call satisfies_collatz_conjecture, then the call stack will have a stack frame for main that sits on top of the operating system.

When we resume program execution, we call satisfies_collatz_conjecture(1), and push a satisfies_collatz_conjecture frame to the top of the call stack.

Since n == 1, we have a base case, and immediately return to main with the value true. When we return back to main, we pop the satisfies_collatz_conjecture frame from the top of the call stack.

The second call to satisfies_collatz_conjecture will have some more interesting effects. As before, when we call satisfies_collatz_conjecture(2), we push a satisfies_collatz_conjecture frame on to the call stack.

We don’t have a base-case this time, and so we need to call satisfies_collatz_conjecture(n / 2). We again push a satisfies_collatz_conjecture frame on to the stack, because we need to preserve the information that the first call has.

This second call has n == 1, so it immediately returns true, and we pop the top frame off the call stack.

From here, we return the value we got back from our second call to main, and pop the top frame off.

Finally, since we’re not going to make any further calls to satisfies_collatz_conjecture, main returns, and we exit to the operating system.
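To make the walk-through above concrete, here is a sketch of a hypothetical companion function, collatz_call_depth, that mirrors satisfies_collatz_conjecture but returns how many satisfies_collatz_conjecture frames would be on the call stack at the deepest point. It isn’t part of this chapter’s code, and like the original, it ignores the possibility of 3 * n + 1 overflowing an int.

```cpp
// Hypothetical companion to satisfies_collatz_conjecture. Both base cases
// (n <= 0 and n == 1) return immediately, so they only need the one frame.
// Every other call needs its own frame, plus however many frames the
// recursive call goes on to need.
auto collatz_call_depth(int const n) -> int
{
  return (n <= 1)     ? 1
       : (n % 2 == 0) ? 1 + collatz_call_depth(n / 2)
       :                1 + collatz_call_depth(3 * n + 1);
}
```

collatz_call_depth(1) is 1 and collatz_call_depth(2) is 2, which matches the one and two frames that we pushed above. collatz_call_depth(6) is 9, because the sequence visits 6, 3, 10, 5, 16, 8, 4, 2, and 1.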

Looking inside a stack frame

Now that we understand how stack frames are added when functions are called and removed when functions return, we should look at what a stack frame looks like from the memory perspective in our fictional abstract machine. For brevity, we’ll consider only the largest snapshot from the previous section. Every frame contains metadata that says who the function is, and who called it (and where they called it from). The first bit is used by any potential callees, and the latter is important for knowing where to return to. The top frame then has an int that’s used to store the parameter n. The next byte is used to store the result of any satisfies_collatz_conjecture calls we might make (it goes unused because we don’t make any further calls into satisfies_collatz_conjecture, but the compiler has no way of knowing this). The remaining three bytes don’t get used.

The middle frame also has a parameter object that is distinct from the top object. Its result object is separate from the top frame, and will contain what’s returned from the satisfies_collatz_conjecture that just returned.

The final frame is for main. It doesn’t have any parameter objects, and it has one (not two) result object. Since the result of the first function goes unused, the result of the second call can go in the same spot (thereby not needing to occupy another cell).

Putting this all together, our call stack looks like this. The operating system bit has been dropped off because that’s not relevant to this section.

Tying this back to references

We now have all the pieces in place to understand why passing strings as references-to-const is expeditious, as opposed to passing them by value. Suppose that we have the following code:

auto println_value(std::string const message) -> void
{
  fmt::print("{}\n", message);
}

auto println_reference(std::string const& message) -> void
{
  fmt::print("{}\n", message);
}

int main()
{
  auto const mushrooms = std::string("Grip, Fang, Wolf: all good boys!");
  println_value(mushrooms);
  println_reference(mushrooms);
}

Our main function will look like this:

When we call println_value, we copy the contents of our string as we pass mushrooms to the function. That means that every character of mushrooms needs to be recreated in message, and so its stack frame looks incredibly similar.

This might be made more clear if we look at the full call stack when we’re inside println_value.

As the length of the string grows, so too does the time required to copy its value. However, when we call println_reference, we do not duplicate the contents of mushrooms: instead, we’re telling the compiler that when we say message, we really mean mushrooms. The length of the string doesn’t influence anything in this case.

Similarly, here is what the full call stack looks like while calling println_reference.

For this reason, we pass large types and types that can grow in size (like std::string) by reference-to-const. We will give a first definition of what a “large type” is in Chapter 6, and properly explore what that means in Module 4 or beyond.
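The difference between a copy and an alias can also be observed from inside a program. The sketch below assumes the address-of operator (&), which hasn’t been covered yet: it yields where an object lives in memory, and two names refer to the same object exactly when their addresses are equal. Both function names are hypothetical.

```cpp
#include <string>

// message is passed by value, so it is a brand new object that lives in this
// function's stack frame. Its address can't be the same as original's.
auto value_parameter_is_alias(std::string const message,
                              std::string const& original) -> bool
{
  return &message == &original;
}

// message is passed by reference-to-const, so it is another name for
// whichever object the caller passed in.
auto reference_parameter_is_alias(std::string const& message,
                                  std::string const& original) -> bool
{
  return &message == &original;
}
```

Passing mushrooms as both arguments, value_parameter_is_alias(mushrooms, mushrooms) is false, while reference_parameter_is_alias(mushrooms, mushrooms) is true.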

Feedback

If you’d like to provide feedback regarding this series, please file an issue on GitHub.

If you’re interested in reading future chapters, subscribe to my RSS feed to receive a notification at the time of publication. If you’d previously subscribed to my feed on my old website (www.cjdb.com.au), please be sure to note the new domain!

Summary

This chapter introduced function definitions, overloads, immutable references, and parameterised functions. We learnt that we always need to provide a return statement for functions that don’t have a void return type, with main being the sole exception. We learnt how computers call functions at a machine level, and the implications this has when writing code. Finally, we lightly touched on how an overload is selected, and what it means to have an ambiguous overload set.


  1. It’s considered okay to implicitly convert a string literal into a std::string. We’ll revisit this in the next chapter.