A Prime Opportunity for Ranges

It’s been a while since my last blog. This isn’t the blog that I promised, but it’s still an important one. If you haven’t read Prepping Yourself to Conceptify Algorithms and Transforming std::find into std::ranges::find, please do go and read those: there are a lot of assumptions.

Motivation: The limitations of vanilla C++98 through C++17

Take a look at the toy code demoed below, and try to work out what it is that I’m intending to do.

// Vanilla C++17 code below
#include <cmath>
#include <iostream>

constexpr bool divides(int const x, int const y) noexcept { return y % x == 0; } // not to be confused with std::divides
constexpr bool is_even(int const x) noexcept { return divides(2, x); }
constexpr bool is_odd(int const x) noexcept { return not is_even(x); }
void handle_stream_state(std::istream& in, std::ios_base::iostate const validbits);

int main()
{
   for (auto input = 0; std::cin >> input;) {
      if (input == 2 or (input > 2 and is_odd(input))) {
         auto is_prime = true;
         switch (input) {
         case 2: [[fallthrough]];
         case 3:
            break;
         default:
            if (divides(3, input)) {
               is_prime = false;
               break;
            }

            for (auto n = 5; n * n < input; n += 6) {
                if (divides(n, input) or divides(n + 2, input)) {
                    is_prime = false;
                    break;
                }
            }
         }

         if (is_prime) {
            std::cout << input << '\n';
         }
      }
   }

   handle_stream_state(std::cin, std::ios_base::eofbit);
}

Read integers from the character input while the stream is good.

If input is odd (but not 1), or 2, then set is_prime to true.

If input is 2 or 3, then do nothing.

Otherwise, starting from 5, loop until n isn’t less than the square-root of input, with a step of 2, and exit the loop if n divides input (also set is_prime to false).

If is_prime is true, print out the input, plus a new line.

After the loop, check the state of std::cin.

Program terminates.

If this sounds like the first run-through of your analysis, then you’ve fallen victim to the limitations of imperative programming. You might have eventually had an ‘oh, I see, the program does X’, moment too. The above code doesn’t describe what is happening: rather it describes how the computation is to be executed. A description of what is intended in the above code would read more like this.

Print all prime numbers read from the character input, stopping only when the EOF character is read. Report if an error came up.

The difference between the two summaries is that the former description never states what the programmer wants to achieve, which makes it difficult to determine their intention, and painful to determine whether or not their code is actually correct. This leads us to a question:

Can we do better?

Certainly! There’s two paths that we can take: refactor the code into functions, or model our program in a different way. One scales; the other does not.

Refactoring: A first attempt

// Vanilla C++17 below
#include <cassert>
#include <cmath>
#include <iostream>
#include <stdexcept>

constexpr bool divides(int const x, int const y) noexcept { return y % x == 0; } // not to be confused with std::divides
constexpr bool is_even(int const x) noexcept { return divides(2, x); }
constexpr bool is_odd(int const x) noexcept { return not is_even(x); }

constexpr bool is_prime(int const input)
{
   if (input == 2 or (input > 2 and is_odd(input))) {
      auto is_prime = true;
      switch (input) {
      case 2: [[fallthrough]];
      case 3:
         break;
      default:
         if (divides(3, input)) {
            is_prime = false;
            break;
         }

         for (auto n = 5; n * n < input; n += 6) {
            if (divides(n, input) or divides(n + 2, input)) {
               is_prime = false;
               break;
            }
         }
      }

      if (is_prime) {
         return true;
      }
   }

   return false;
}

void handle_stream_state(std::istream& in, std::ios_base::iostate const validbits);

int main()
{
   try {
      std::cin.exceptions(std::ios_base::badbit);
      for (auto input = 0; std::cin >> input;) {
         if (is_prime(input)) {
            std::cout << input << '\n';
         }
      }

      handle_stream_state(std::cin, std::ios_base::eofbit);
   }
   catch (std::runtime_error const& e) {
      std::cerr << e.what() << '\n';
      return std::cin.bad() ? 1 : 2;
   }
}

main is a bit nicer to read, and it will be easier to deduce what the programmer’s intention is. That doesn’t mean that the program is clearer overall: all we’ve done is shift the work into another function (that might end up being banished to a library).

What if it were possible to describe what we’re after without needing to tell the computer exactly how to go about doing it? But first, an aside.

Programming paradigms

A paradigm is a way of thinking, and a programming paradigm is a way to classify a programming language based on how it enables us to think[1]. There are two general programming paradigms:

Imperative programming, which is stateful, and describes how computations are to be executed in order to solve some problem. Languages such as C and Pascal are imperative languages. C++ lets us write imperative programs (although we don’t need to do so).
Declarative programming, which describes what the solution to a problem is, and how to achieve it. Declarative programming is split into many categories, including, but not strictly limited to:
- Markup languages, such as HTML, LaTeX, and Markdown, are a way to annotate a text document to describe what you would like to communicate. Nowhere, in a Markdown file, do you consider yourself explaining how your web browser should render bold text, yet it does. This is because you don’t really care about how the software achieves it; only that it does.
- Functional programming, which focuses on computation as evaluating mathematical functions. State is avoided wherever possible, and immutable expressions are provided instead. Languages that primarily focus on functional composition include Haskell, Lisp, Scala, Kotlin, and Idris.
- Logic programming, which is based around formal logic. The building blocks for imperative programming and functional programming are respectively statements and expressions: for logic programs, the building blocks are facts and rules. Prolog is an example of a logic programming language.

Back to the prime number program

Many C++ programs are written in an imperative format, but we don’t need to do this. It’s possible to functionally compose C++ programs by relying on lightweight abstractions (which C++ has been designed to do well). Right now, how these are implemented is of less importance than getting out a program that does what we want (we’ll look at performance later).

Vanilla C++17 doesn’t let us achieve this, so we’ll get some help from the third-party library, range-v3, which implements ranges and range adaptors in a fashion that is compatible with C++14 and C++17.

// Using C++17 with range-v3
#include <iostream>
#include <range/v3/all.hpp>

constexpr bool divides(int const x, int const y) noexcept { return y % x == 0; } // not to be confused with std::divides
constexpr bool is_even(int const x) noexcept { return divides(2, x); }
constexpr bool is_odd(int const x) noexcept { return not is_even(x); }

void handle_stream_state(std::istream& in, std::ios_base::iostate const validbits);

int main()
{
   namespace view = ranges::view;
   auto const divisible_by_2_or_3 = [](auto const x) noexcept {
      return (x != 2) and (x != 3) and (x < 2 or is_even(x) or divides(3, x)); };

   auto const non_prime = [divisible_by_2_or_3](auto const x) noexcept {
      auto divisors = view::iota(5)
                    | view::stride(6)
                    | view::take_while([x](auto const d) noexcept { return d * d < x; });
      return divisible_by_2_or_3(x) or
         ranges::any_of(divisors, [x](auto const d) noexcept {
            return divides(d, x) or divides(d + 2, x);
         });
   };

   auto primes = ranges::istream<int>(std::cin) | view::remove_if(non_prime);
   handle_stream_state(std::cin, std::ios_base::eofbit)
   ranges::copy(primes, ranges::ostream_iterator<int>(std::cout, "\n"));
}

If this looks like foreign magic to you right now, don’t worry. First, note that we don’t actually describe how computation should look at any point in our program. There’s no state at all: just a handful of declarations and function calls.

What is going on?

You might be scratching your head if you’ve not seen this style of programming. Let’s try to change that. To begin, let’s translate the program to English.

Let divisible_by_2_or_3 be a unary predicate that checks if its input is divisible by 2 or 3, but isn’t equal to 2 or 3.

Let non_prime be a unary predicate that checks if its input satisfies divisible_by_2_or_3, or is divisible by any of the integers in the set {5, 7, 11, 13, 17, 19, ...}, up to the square root of the input.

Let primes be a range of integers read from the character input, where any that satisfy non_prime are removed from the range.

Copy primes to the character output, such that each integer is on a separate line.

Notice that we’ve four declarations and an expression: no explicitly managed state! This is a high-level description of what we want our program to do. There’s no mention of how we get integers from the character input, nor how we remove integers that aren’t prime. Similarly, our non_prime is quite short when compared to the larger is_prime in the previous sections.

Note that state is still present, but it’s behind the scenes. For example, we’re querying the state of std::cin, but not actually in charge of what happens.

Range adaptors

According to The One Ranges Proposal, which seeks to incorporate ranges into C++20, range adaptors

are utilities that transform a Range into a View with custom behaviours. These adaptors can be chained to create pipelines of range transformations that evaluate lazily as the resulting view is iterated.

…

The bitwise or operator is overloaded for the purpose of creating adaptor chain pipelines. The adaptors also support function call syntax with equivalent semantics.

We’ve already covered what a Range is, but not a View. A View is a Range that has constant-time copy, move, and assignment operators. This means that a pair of iterators in some wrapper is a view, because iterators are cheaply copyable, movable, and assignable. A container is not a view, because it copies each element when the container is copied, which is at least a linear operation.

The term ‘lazily’ means that no computation is performed until it is requested. This is in contrast to an eager operation, which is evaluated immediately. Eager operations are the motivators for lazy ones.

We use five range adaptors:

Range adaptor	Description
`view::iota`	Lazily creates a range of integers from the provided number to inifinity.
`view::remove_if`	Lazily discards an element in the range if it satisfies the given predicate.
`view::stride`	Lazily moves n elements away from the current element. Similar to `i += n`.
`view::take_while`	Lazily takes elements from a range while the given predicate is satisfied.
`ranges::istream`	Wraps an `istream` object inside a range adaptor so that it can be used like a range.

What’s the bitwise or doing? Chaining???

auto divisors = view::iota(5)
              | view::stride(6)
              | view::take_while([x](auto const d) noexcept { return d * d < x; });
// ...
auto primes = ranges::istream<int>(std::cin) | view::remove_if(non_prime);

If you’re familiar with the Unix shell or Microsoft’s PowerShell, then you’ll be familiar with the pipe operator. If you’re not familiar with either, then in a nutshell, the pipe operator allows us to feed data from one operation to another without need for any intermediate state. This is known as a composition operator, and overloads the bitwise or operator.

The former reads

Compose a range of integers called divisors, which starting from 5, contains every sixth integer, and continues while the square of the nth integer is less than x.

You can read the latter as

Compose a range that reads integers from the character input and then removes integers if they satisfy non_prime.

And how is that laziness evaluated? What are the eager operations?

Both ranges::copy and ranges::all_of are eager operations. The former has the same semantics as std::copy, but takes a range: that is, it copies the contents of one range to another. The latter works like std::all_of, but takes a range.

Almost the same, but not quite

Just as it’s possible to express an imperative program in different ways, it’s also possible to express our functionally-composed program differently.

// Using C++17 with range-v3
#include <iostream>
#include <range/v3/all.hpp>

constexpr bool divides(int const x, int const y) noexcept { return y % x == 0; }; // not to be confused with std::divides
constexpr bool is_even(int const x) noexcept { return divides(2, x); };
constexpr bool is_odd(int const x) noexcept { return not is_even(x); };

int main()
{
   namespace view = ranges::view;
   auto const divisible_by_2_or_3 = [](auto const x) noexcept {
      return (x != 2) and (x != 3) and (x < 2 or is_even(x) or divides(3, x)); };

   auto const is_prime = [divisible_by_2_or_3](auto const x) noexcept {
      auto divisors = view::iota(5)
                    | view::stride(6)
                    | view::take_while([x](auto const d) noexcept { return d * d < x; });
      return not divisible_by_2_or_3(x) and
         ranges::none_of(divisors, [x](auto const d) noexcept {
            return divides(d, x) or divides(d + 2, x);
         });
   };

   auto primes = ranges::istream<int>(std::cin) | view::filter(is_prime);
   ranges::copy(primes, ranges::ostream_iterator<int>(std::cout, "\n"));
}

Not that different, eh? The difference between this program and the previous is that instead of having a predicate not_prime, we have a predicate is_prime, which checks for the complement condition. To compensate for this, we turn our remove_if into a filter, which only accepts elements of ranges that satisfy the predicate (so it’s also a complement).

I want the semantics, but I don’t want the syntax

Alas, some people aren’t a fan of using the composition operator. If you think it’s weird, I strongly recommend you use it for a while (say, for a full project), and the syntax will eventually grow on you. If your concern is that some people might not know what the composition operator means, please keep in mind that there’s always something new to learn, and people pick things up quickly, especially when they can see examples and try things out. After all, at one point, no one knew how to write programs!

If you’re still not convinced, there’s a non-pipe syntax. I thoroughly discourage its use, especially since we need to build our range in reverse order.

// alternative 1
auto divisors = view::take_while(view::stride(view::iota(5), 6), [x](auto const d) noexcept { return d * d < x; });

// alternative 2
auto five_up = view::iota(5);
auto step_six = view::stride(five_up, 6);
auto divisors = view::take_while(step_six, [x](auto const d) noexcept { return d * d < x; });

Performance

Program size

I’ve put both one of the range-based programs and the vanilla C++ program into Compiler Explorer and compiled using Clang 6, both with libstdc++ (the C++ Standard Library that ships with GCC) and libc++ (the LLVM C++ Standard Library). Due to timeouts, GCC wasn’t a viable option, but the results can be checked locally, if you’re interested.

The compiler output on the left uses libstdc++; the output on the right uses libc++. On top, we have the implementation using ranges, and the bottom is vanilla C++. All four implementations were compiled using -O3 -DNDEBUG. It’s easy to see that the code on the bottom is smaller than the code on the top: about 55% of the size of the ranges solution when using libstdc++, and 80% of the size when using libc++.

That’s program size. What about runtimes?

Run-time performance

As I’m not familiar with macrobenchmarking, we’ll only be looking at microbenchmarking, using Google Benchmark.

Making things fair

Reading input from the character input requires communication with the operating system, which can be non-deterministic, so it’s not exactly a fair endeavour to use std::cin for this section. Instead, we’ll provide a pre-loaded istringstream object that contains the one hundred million integers in a random order (the seed remains consistent, so program re-runs will remain consistent).

Similarly, writing to any output stream requires communication with the OS, so we’ll avoid using both std::cout and ostringstream. Instead, we’ll use a pre-loaded vector.

Finally, we’ll need to do an operation on our output so that the optimiser doesn’t see that we’re not using it and remove code that we need to benchmark. This is achieved by reading over the output twice and reporting the number of primes and non-primes: that way, we can confirm that the output is consistent across both programs.

You can find the changes in this instance of Compiler Explorer.

Benchmarking apparatus

Everything was compiled and run using an Intel Core i7-8700 CPU @ 3.2GHz on this Docker image with six cores, 26’368 MB memory (clocked at 2133MHz), and 512 MB swap space allocated to the container. The operating system was Windows 10.0.17134 Build 17134.

The exact commands to the Ninja build system included:

/usr/bin/clang++ \
   -stdlib=libc++ \
   -O3 \
   -DNDEBUG \
   -flto=thin \
   -stdlib=libc++ \
   -lc++abi \
   benchmark/CMakeFiles/amcpp.gp2.is_prime_vanilla.dir/is_prime_vanilla.cpp.o \
   -o benchmark/amcpp.gp2.is_prime_vanilla \
   /root/.conan/data/google-benchmark/1.4.1/mpusz/stable/package/468fddffcd89d717847492001b37f7099807a722/lib/libbenchmark.a \
   -lpthread \
   /usr/lib/x86_64-linux-gnu/librt.so

Method

The programs were run in an alternating fashion, with the ranges implementation run first, then the vanilla implementation.

Results

The raw output of the results can be found on Pastebin. A summary is tabulated below:

Run	Vanilla C++ CPU (sec)	Ranges-based C++ CPU (sec)
1	54.809	53.246
2	55.245	52.720
3	54.990	53.390
4	54.766	53.054
Average	54.953	53.103

The vanilla solution runs at about 1.03x as long as the ranges-based implementation, and this scaled when I took my samples, starting from 5’000’000 integers all the way to the 100’000’000 recorded here.

Where can I learn more?

I’ll be hosting a two-day post-conference class at CppCon 2018, titled Generic Programming 2.0 with Concepts and Ranges. Ranges aren’t the only topics we’ll cover: there will be some exploration into what makes a concept, and the conditions where you can cultivate one, but we do spend a fair amount of time looking into ranges and even look at how a few range adaptors are implemented. You can find more infromation in the aforementioned link, or in this reddit post.

Eric Niebler also delivered ‘Ranges for the Standard Library’ at CppCon 2015, and ‘Introducing the Ranges TS’ at code::dive 2017. Eric is the code owner for the range-v3 library that was used in our examples, and you can play around with his tests and examples in the range-v3 repository to learn how things are used.

Finally, there’s a discussion channel over on cpplang.slack.com called #ranges. If you’re interested in chatting about this stuff, it would be worth heading over there.

Conclusion

In summary, a functionally-composed solution lets us describe what our intentions are, rather than how our code ought to be executed (that’s very much an implementation detail). Additionally, the functionally-composed code consistently takes ever-so-slightly less time to run! Note that this boost is without compiler support, so when ranges finally hit the Standard Library in C++20, they may be even better-suited to the task than they are now. Finally, other research has demonstrated that ranges lend themselves to parallelism quite well, which means that we can get even more performance gains than previously.

Of course, this article only explores determining if an integer is prime or not prime, and doesn’t claim that using ranges will always be a better choice for performance than imperative code. All of these point toward using ranges, but if performance is critical, it is important to benchmark your code. For non-performance-critical sections of code, I strongly encourage that you use ranges to functionally compose your data wherever possible.

Discussion

If you’d like to offer any feedback, please head to this specific GitHub issue, as I haven’t yet had the time to embed something such as Disqus into the blog. Please be aware of the code of conduct before posting.

Acknowledgements

I’d like to offer thanks to Corentin Jabot, Daniel Soutar, Gordon Brown, JC van Winkel, Kostas Kyrimis, and Tristan Brindle for their feedback on this material.