What are C++ variadic templates and fold expressions?

Introduction

Introduced in C++11, a variadic template is a function or class template that accepts a parameter pack. The pack can have arbitarary number of parameters having different types. At the end of compile-time, a variadic template function is translated into multiple solid functions that each of them accepts a certain number of parameters with certain types.

There is no explicit for-loop commands to iterate the pack parameters. Parameters were separated with recursive functions. Luckily, since C++17, fold expressions, which are way cleaner that recursive functions, are implemented to expand parameter packs.

Prerequisites

I am using GCC 11.2 and Clang 14 with flag -std=c++20 to compile the examples.

I assume you are familiar with templates, auto keyword, concepts, constexpr.

Definition

A simple variadic function can be defined as:

auto f(auto... args){
    // some code to use args    
}

Ellipsis (...) following auto is to say args is a parameter pack. Therefore, the function can be any of these:

auto f(){/* code */}
auto f(auto arg1){/* code */}
auto f(auto arg1, auto arg2){/* code */}
auto f(auto arg1, auto arg2, auto arg3){/* code */}
// and so forth

The type of parameters can also be accessed in the function:

template<typename ...Ts>
auto f(Ts... args){
    // some code to use args 
    // some code to use Ts   
}

Similiar to single parameter templates, you can add qualifiers such as const, &, and so on:

auto f(const auto&... args){/*code */}
/* OR */
template<typename ...Ts>
auto g(const Ts&... args){/*code*/}

Size of the pack

The size or number of parameters in a pack can be found with sizeof...() operator:

auto sum(auto... args){
    std::cout<< sizeof...(args);
}

Fold expression

Unfortunately, there is no for-loop style code to iterate the parameters of a pack. To do so a fold expression is used. For instance, a parameter pack with the fold expression of:

(E op ...)

for 4 parameters is expanded as

(E1 op (E2 op (E3 op E4)))

E is an expression that uses parameter pack, and op is a binary operator like +, *, &&, ||, ==, (,) and so on. So fold expression of + operator is the sum of all the parameters:

(E + ...)
// turns to
(E1 + (E2 + (E3+E4)))
// Equals to
E1 + E2 + E3 + E4

Let’s try it with an example:

auto f(auto... args){
    return (args+...);
    // turns to (arg1+ arg2 + arg3 + ...)
}

f(1); // 1
f(1,2); // 3

I said we can use an expression that uses the pack, OK, let’s get the norm of 1D, 2D, 3D, … vectors:

auto norm(auto... args){
    return sqrt(((args*args) +...)) ;
    // turns to  sqrt( arg1*arg1 + arg2*arg2 + ...)
}
norm(1); // 1
norm(3,4); // 5

The compiler creates two functions, norm<int>(int) and norm<int, int>(int, int), to handle the calls.

The expression can also be a call to another function or lambda

auto root(const auto& x){
    return sqrt(x);
}
auto f(const auto&... args){
    return ( root(args) + ...);
}
// turns to ( root(arg1) + root(arg2) + ...)
f(1); // 1
f(1,4); // 3

Remember perfect forwarding

Generic parameters are better be passed to inner functions via universal reference and perfect forwarding:

template<typename... Ts>
auto f(Ts &&... args)
{
    return ( g(std::forward<Ts>(args)) + ... );
}

However, in the examples of this post, for the sake of readability I drop the forwarding bit.

Pass pack to function

A function can pass its parameter pack to another function:

auto average(auto... args){
    auto s =  sum(args...); //pass to sum()
    return s / sizeof...(args);
}
auto sum(auto... args){
    return (args+...);
}

std::cout<< average(1.0,2);
std::cout<< average(1.0,2,3);

Besides a pack, You can also pass other parameters:

auto f(auto x, auto... args);
auto g(int i, double d, auto... args);

Pack in initializer_list

An initializer list can be created with a parameter pack inside curly braces

template<typename ... Ts>
void f(Ts... params){
    auto l = std::initializer_list<int>{params...};
}

That means the class constructors (or other functions) which accept initializer list can be called with a parameter pack:

template<typename ... Ts>
void f(Ts... params){
    auto v = std::vector<int>{params...};
    auto a = std::array<int,3>{params...};
}

f(1,3,5);

Analysis more fold operators

For - operator the outcome is a bit unusual:

(E - ...)
// Equals to
E1-E2+E3-E4

* operator gives multiplication of all expressions:

(E * ...)
// expanded as
E1 * E2 * E3 * E4

For && operator, all must be true to get a true result:

(E && ...)
// turns to
(E1 && (E2 && (E3 && E4)))
// Equals to
E1 && E2 && E3 && E4

Fold expression with initial expression

Fold expression accepts initial expression which is not dependent on the parameter pack:

(E op ... op I)

where I is the initial expression. So an expression with 4 parameters with + operator is expanded as

(E + ... + I)
// turns to
(E1 + (E2 + (E3 + (E4 + I))))
// Equals to
E1 + E2 + E3 + E4 + I

This is good if you want to add an initial value to the expansion or when the pack is empty.

Left fold expression

All the examples above were right fold expressions. If you bring ellipsis forward, we have left fold expression:

(... op E)
// expanded as
(((E1 op E2) op E3) op E4)

Some operators like +, *, && and , give the same results for left and right folds, but some don’t. For example, - operator gives different result with left fold:

(... - E)
// Equals to
E1-E2-E3-E4

Fold of empty pack

Only three operators accept empty pack:

  • && : returns true
  • ||: returns false
  • (,): returns void()
auto f(auto... args){
    return ( args && ...);
}
f(); // true
f(true, false);// false

Therefore, for other operators like + as in this example

auto f(auto... args){
    return (args+...);}
f(); // Error: empty fold

You get an error if you call f() with no parameter. To overcome this, fold with initial value is used:

auto f(auto... args){
    return (args+...+0);}
f(); // 0

Comma operator is something else

First, let’s refresh you on this: the outcome of comma-separated expressions in parentheses is the last one although all the expressions are evaluated in order:

int a = (1,2,3); // a=3
auto b = (1.2, "hi", 1+2); // b = 3
auto c = (b=b-1, b++, b); // c = 3;

Do not confuse this with the comma that is used to pass different parameters to a function.

For the comma operator, the order of evaluation is left to right. So the fold expression becomes:

(E , ...)
// turns to
(E1 , (E2 , (E3 , E4)))
// Equals to
(E1 , E2 , E3 , E4)

Can you show that for a left fold of comma operator the result is still the same as above?

An example for (,) operator is:

auto f(auto... args){
    
    return (args,...);
}
f(1); // 1
f("hi",2); // 2
f(0.2,2,'A'); // A

Just reminding you again that variadic template is concluded during compile-time and the outcome functions are available at runtime. So in this example, if you remove f(1) line, the compiled executable won’t have a function like f(int) in it.

We know based on comma separation rules, all the expressions are evaluated. Now we can write a function that prints its parameters:

void print(auto... args){
   ((std::cout<<args),...);
}
print(1, 1.2f, 'A');

Behind the scene, the code is expanded as

void print(int a, float b, char c){
    (std::cout<<a, std::cout<<b, std::cout<<c);
}

While it is more efficient to use +... fold expression, but as a challenge let’s calculate sum of parameters with (,) operator:

auto sum(auto... args){
    double s=0;
   ((s+=args),...);
   return s;
}
sum(1,2,2.2); // 5.2

Try write a variadic function that returns the minimum argument.

Even more with comma-separated expressions

In the previous section we said the pack is expanded for the expression:

((expression-for-args),...)

Well, the expression can be made of more comma-separated expressions where all of them are executed:

void f(auto... args){
   ((
       std::cout<< "arg="<< args,
       args==1?std::cout<<", it's one!, ":std::cout<<", not one!, ",
       std::is_same_v<int,decltype(args)>?std::cout<<"is int.\n":std::cout<<"not int.\n"       
    ),...);
}

So those expressions are executed for each parameter. is_same_v is a part of <type_traits> header. Running the example, we get:

f(1,2,4.5);
/* outcome:
arg=1, it's one!, is int.
arg=2, not one!, is int.
arg=4.5, not one!, not int.
*/

Increment pitfall

You may need to use an increamenting index in the fold expression:

auto f (auto... Is){
        int i=0;
        return ( (Is * (i++)) + ...);
} 
// Hoping for 
f(4,7,9);
// gives you
// 4*0 + 7*1 + 9*2

But the problem is that, + operator like many binary operators has no order of caluclation. The above fold is expanded like

4* (i++) + 7*(i++) + 9*(i++)

The compiler is free to calculate terms in any order. So, if it calculates them in any order except left-to-right, the result is not what we wanted. Let’s try right-to-left:

4*(2) + 7*(1) + 9*(0)

GCC compiler didn’t give any warning for this. But Clang gave:

warning: multiple unsequenced modifications to 'i' [-Wunsequenced]

To avoid this problem, expand the fold expression with (,) operator which has left-to-right sequence of calculation.

Case study 1: Print

Write a function that prints its variadic arguments:

  • Primitive type: print as they are,
  • Pointer type: print the target of the pointer,
  • STL cointainer (array, vector,…) with iterator: print all the members of the container.

Hint: STL containers can be filtered with a Concept (read my post on Concepts).

Solution:

#include<array>
#include<vector>
#include<iostream>
#include<type_traits>

template<class T>
concept hasIterator = requires (T a) 
{ 
    a.begin(); 
    a.end(); 
};

template<typename T>
void impl(const T& x){
    if constexpr (hasIterator<T>){
        std::cout<<"(";
        for (auto& m: x)
            std::cout<<m<<" ";
        std::cout<<")";
    }
    else if constexpr (std::is_pointer_v<T>)
        std::cout<<*x;
    else
        std::cout<<x;
}

void print(const auto&... args){
   ((impl(args)),...);
}

int main(){
    auto a = std::array<int,3>{9,8,7};
    auto v = std::vector<double>{1.5, 2.0};
    int* p = new int{3};
    print(1,4.5, a,v,p);
    return 0;
}

Case study 2: Types info

Write a function that returns the name and size of its variadic parameters in a form of vectors. The function handles int and std::array types but of course you can extend it to more types with more details.

Hint: to filter std::array use template specialization technique (or so-called meta-function).

Solution:

#include<array>
#include<iostream>
#include<type_traits>
#include<vector>
#include<tuple>

template <typename T>
struct is_std_array : public std::false_type{};

template <typename V, size_t n>
struct is_std_array<std::array<V, n>> 
                : public std::true_type{};


template<typename ... Ts>
auto GetInfo(const Ts& ... args){
    std::vector<std::string> types{};
    std::vector<int> sizes{};
    ( [&]()
    {
        if constexpr (std::is_same_v<int,Ts>){
            types.push_back("int");
            sizes.push_back(1);
        } else if constexpr (is_std_array<Ts>::value){
            types.push_back("std::array");
            sizes.push_back(args.size());
        }
    }
    ()
    ,...);
    return std::make_tuple(types,sizes);
}

int main(){

    std::array<int,5> arr;
    auto [types, sizes] = GetInfo(1,arr);
    
    for (size_t i=0; i<types.size();i++)
        std::cout<< "type:"<< types[i] 
                 <<" size:" << sizes[i]<<"\n";   

    return 0;
}

In this case I purposefully used a lambda function to show how much you can fit into a comma-separated expression. Can you take the lambda out and define a different function to replace it?

Case study 3: Recursive expansion

Fold expression replaced recursive functions, but it is always good to be able to read C++11 code. Can you write a recursive function that calculates the sum of its variadic parameters? To make it more like C++11 don’t use if-constexpr.

Solution:

// When there are 2 or more parameter
template <typename T, typename ...Ts>
double sum(const T& first, const Ts&... args){
    return first+sum(args...);
}

// When there is only one parameter left
template <typename T>
double sum(const T& last){
    return last;
}

int main(){
    return sum(1,2,3,4,5);
}

Now you know how much the fold expression of C++17 made our life easier.

Case study 4: Vector addition

Write a function add() which accepts many vectors, adds the vectors element-wise, and returns a result vector. In terms of performance, compare add() function with an overloaded + operator function doing the same job.

Hint: to get the inner type of the first vector in the variadic template you need to write a meta function.

solution:

This meta-function gets the inner type of the first vector. We need this type to define our output vector:

template<typename ...Ts>
struct GetFirst{};

template<typename First, typename ...Rest>
struct GetFirst<std::vector<First>, Rest...>{
    using Type = First;
};

Using the fold expression, the add function is:

template<typename ...Ts>
auto add (const Ts&... vecs){
    using T = typename GetFirst<Ts...>::Type;
    auto size = (vecs.size(),...);// choosing the size of the last vector
    std::vector<T> res(size);

    for (int i=0;i<size;i++){
        res[i] = ((vecs[i])+...);
    }
    return res;
}

This is the overloaded + function:

auto operator+(const std::vector<int>& a, const std::vector<int>& b){
    std::vector<int> res(a.size());
    
    for (int i=0;i<a.size();i++){
        res[i] = (a[i]+b[i]);
    }
    return res;
}

They are tested with:

int main(){
    std::vector<int> a(3,1);
    std::vector<int> b(3,2);
    std::vector<int> c(3,3);
    std::vector<int> d(3,4);

    auto res1 = add(a,b,c,d);
    auto res2 = a + b + c + d;

    return 0;
}

Both functions work fine, but add() runs 2.5 times faster than + operator using Clang compiler with the -O3 flag. This is because + operator a+b is calculated in a loop, then added to c in another loop, and so on. However, using add(), in one loop, all the vectors are summed. Anyhow, + operator can be overloaded with expression templates to be similar to add(), but that seems like overkill for small jobs.

Case study 5: nD vector

Create a mutli-dimensional template class containing a dynamic contiguous array of data (i.e. std::vector):

  • The array represents locations in 1D, 2D, 3D, … spaces.
  • An object is constructed with the sizes along each dimension: (size_x) for 1D, (size_x, size_y) for 2D, …
  • The class has a variadic get function to get elements: get(i) for 1D, get(i,j) for 2D, get(i,j,k) for 3D, … .
  • get() is not allowed to accept any form of array.
  • You get an error if you ask for an element with wrong number of indexes, e.g. (i,j) from a 1D, 3D, 4D, … array

If all elements of an ND array are fitted in a 1D array, the location of element (i,j,k,…) in the ND array is associated with the index of the 1D array as:

index_1D = i + j*size_x + k*size_x*size_y + ...;

Solution:

The code is a bit rough to be easier to read. A few points to consider:

  • getId(): contains the important points we talked about in this post,
  • std::integral: is a concept that limits parameters to be integers,
  • static_assert: to check the correct number of arguments injected,
  • GetCoeff(): recursive function to get coefficients of the above equation: 1, size_x, size_x*size_y, …
#include<array>
#include<vector>
#include<concepts>
#include<iostream>

template <typename T, size_t Dims>
struct VectorNd{
    std::vector<T> data{};
    std::array<size_t,Dims> sizes;
    std::array<size_t,Dims> coeffs;

    VectorNd()=delete;
    
    VectorNd(std::integral auto ... sizes_):sizes{sizes_...}{
        static_assert(sizeof...(sizes_)==Dims);
        data.reserve( (sizes_ *...));
        for (size_t i=0;i<Dims;i++)
            coeffs[i] = GetCoeff(i);
    };

    size_t GetCoeff(size_t i){
        if (i==0)
            return 1;
        return sizes[i-1]*GetCoeff(i-1);
    }

    size_t getId(std::integral auto ... Is){
        static_assert(sizeof...(Is)==Dims);
        size_t i=0;
        size_t id=0;
        ((id+=Is*coeffs[i++]),...);
        return id;
    }

    auto& get(std::integral auto ... Is){
        return data[getId(Is...)];
    }
};

The code can be used with the below code:

int main(){

    size_t sizex=5;
    size_t sizey=5;
    size_t sizez=5;

    VectorNd<double,2> a(sizex,sizey);
    VectorNd<int,3> b(sizex,sizey,sizez);

    // x is the fastest index (column major)
    for (int j=0;j<sizey;j++)
        for (int i=0; i<sizex;i++) 
            a.get(i,j) = i*j;

    std::cout<<a.get(2,2)<<"\n";
    std::cout<<a.getId(2,2)<<"\n";
    std::cout<<b.getId(2,2,3)<<"\n";

return 0;
}

The code also can be compared with the specific functions below. If you compare the assembly code when optimization is -O2 there is not much difference between the generic and specific functions:

auto getId2(size_t i, size_t j){
        return i + j*sizes[0];
}

auto getId3(size_t i, size_t j, size_t k){
        return i + j*sizes[0] + k*sizes[0]*sizes[1];
}

Final note, this example works the way Eigen library access elements of a matrix get(i,j). We could also write the program in a way that get() receives an std::span as a parameter pointing to {i,j,k,...} indexes, hence, variadic template wouldn’t be necessary. Could you give it a go?

Read more

If you liked this post, I have more like this click on the meta-programming tag below.

Subscribe

I notify you of my new posts

Latest Posts

Comments

1 comment
Peter 2-May-2023
There is a trivial typo in the section "Analysis more fold operators": (E - ...) // Equals to E1-E2+E3-E4 should be (E - ...) // Equals to E1-E2-E3-E4 or better (imho) (E - ...) // Equals to E1-(E2-(E3-E4))