Performance of dynamic polymorphism

In my previous article that you can find here, I’ve discussed about polymorphism in C++. At the end of that article, I’ve mentioned that it’s better to avoid dynamic polymorphism if performance is critical for your application.

In this article, I will explain why I mentioned this and we will discuss about dynamic polymorphism performance and what is the alternative.

What’s wrong with dynamic polymorphism?

Just to start this off, I want to mention that dynamic polymorphism is also called runtime polymorphism.

That means that the proper implementation for a function will be chosen at run time(more info, in my previous article).

I think you can see already, but I will point it out anyway. The fact that the compiler has to do more things at run time, it means that our application will be slower. More instructions means more time needed to run the code, after all.

There are some main points why dynamic polymorphism might be “slow”:

  • Additional pointer stored in the object(the vptr)
  • Extra indirection(pointer dereference)
  • Not inlined
  • Data and instruction cache miss

I will get into more details on each point above later on with some examples and profiling data but for now, let’s see what alternative we have.

The alternative

As you probably guessed already, the alternative would be to use static polymorphism where applicable.

It’s pretty simple, static polymorphism means that the proper implementation for a function will be decided at compile time which, in turn, means that there are no more instructions to be run at run time.

Now, static polymorphism has it’s own issues, so I don’t want to say that you should use it in every instance. I won’t get in details of what are the static polymorphism’s problems since it’s not in the scope of the article, but you will probably deduce some from the examples below.

Dynamic vs static

So, let’s take some examples:

// dynamic polymorphism
class BaseDP
{
    virtual void setup() = 0;
    virtual void run() = 0;
    virtual void teardown() = 0;
public:
    virtual ~BaseDP() = default;
    void process()
    {
        setup();
        run();
        teardown();
    }
};

class DerivedDP : public BaseDP
{
    void setup() override { /*....*/ }
    void run() override { /*....*/ }
    void teardown() override { /*....*/ }
};
// static polymorphism
template <class Derived>
class BaseSP
{
public:
    void process()
    {
        static_cast<Derived*>(this)->setup();
        static_cast<Derived*>(this)->run();
        static_cast<Derived*>(this)->teardown();
    }
};

class DerivedSP : public base<DerivedSP>
{
public:
    void setup() { /*....*/ }
    void run() { /*....*/ }
    void teardown() { /*....*/ }
};

With these examples, we are now able to profile the differences between both implementations.

Additional pointer and extra indirection

First two points against dynamic polymorphism above are “additional pointer stored in the object” and “extra indirection“.

Based on the previous article, we already know that these points are true and it will add some overhead to our implementation in run time.

However, we can profile this using benchmarks. Let’s check it out on quick-bench.

static void DynamicPolymorphism(benchmark::State& state) {
  std::vector<BaseDP*> ptrs;
  for (int i = 0; i < 5000; i++)
  {
    ptrs.push_back(new DerivedDP);
  }

  // profiling
  for (auto _ : state) {
    for (const auto& ptr : ptrs)
      ptr->process();
  }

  for (auto& ptr : ptrs)
    delete ptr;
}
BENCHMARK(DynamicPolymorphism);
static void StaticPolymorphism(benchmark::State& state) {
  std::vector<BaseSP<DerivedSP>*> ptrs;
  for (int i = 0; i < 5000; i++)
  {
    ptrs.push_back(new BaseSP<DerivedSP>);
  }

  // profilling
  for (auto _ : state) {
    for (const auto& ptr : ptrs)
      ptr->process();
  }

  for (auto& ptr : ptrs)
    delete ptr;
}
BENCHMARK(StaticPolymorphism);

Well, I don’t think it’s even necessary to show results here, since you can probably already guess. The calls to the functions in the static polymorphism cases are optimized away. Full example can be found on quick-bench, here.

I will add the results anyway :).

It’s worth to mention that a call to a virtual function is 25% slower than a call to a normal function. That does not mean that the actual function body is 25% slower, only the actual call(because of the indirection).

Not inlined

This means that the compiler cannot inline simple implementations of the functions if pointers are used, since the call is resolved at runtime.

For example:

class Base
{
public:
    virtual bool doSomething() { return true; }
};

class Derived : public Base
{
    bool doSomething() override { return false; }
};

Base b;
b.doSomething(); // this can be inlined

Base* b1 = new Derived;
b1->doSomething(); // this cannot
delete b1;

Instruction and data cache miss

Let’s check if this is true.

Having the example above, we can use cachegrind, a handy tool from valgrind which will profile the cache for us.

We will use the same implementation as for benchmarking but in different binaries. Flags used: -std=c++20 -O3.

Basically, aside from the definition of the classes, we will have:

// dynamic
void DynamicPolymorphism()
{
    std::vector<BaseDP*> ptrs;
    for (int i = 0; i < 5000; i++)
    {
        ptrs.push_back(new DerivedDP);
    }

    for (const auto& ptr : ptrs)
      ptr->process();

    for (auto& ptr : ptrs)
      delete ptr;
}

// static
void StaticPolymorphism()
{
    std::vector<BaseSP<DerivedSP>*> ptrs;
    for (int i = 0; i < 5000; i++)
    {
        ptrs.push_back(new BaseSP<DerivedSP>);
    }

    for (const auto& ptr : ptrs)
      ptr->process();

    for (auto& ptr : ptrs)
      delete ptr;
}

Using cachegrind(for profiling) and kcachegrind(for visualizing the data), we have the following results.

As you can see from the pictures above, the results are clearly in the favor of static polymorphism.

DynamicStatic
Instruction cache miss0.400.20
Data cache miss23.463.35

Final thoughts

The point that I’m trying to make in this article is that it’s better to avoid dynamic polymorphism as much as possible if the performance of your application is a critical factor.

This article does not mean you should always avoid it and static polymorphism is the way to go. On the contrary, static polymorphism also has a lot of problems and might not be a good fit for your needs.

The only aim was to provide you the information necessary, so you can make the right choice.

5 thoughts on “Performance of dynamic polymorphism”

  1. Is it possible to use static polymorphism if the classes are not known till runtime? Say reading from a json

    1. Static polymorphism means that the call is resolved at compile time so, as long as this cannot be done, you will get compilation error.
      I’m not sure that I fully understood what you mean by “reading from a json” but I hope this answers your question.

  2. Pingback: Casting in C++ - cppdev

  3. SomeOtherCppDev

    I think your comparison is only applicable if you are using only a single implementation.
    This means you cannot use a single vector of ptrs for different implementations.
    Of course it is possible to use static polymorphism with multiple implementations but all the code that is affected by the ‚static’ interface will be templateized, and therefore instanciated for each and every implementation.

Leave a Reply

%d bloggers like this: