CMake part 1: It is a programming language!

Introduction

CMake is a very popular build configuration generator for Fortran/C/C++ programs. Its real power lies in its basics, which are hard to find explained coherently in one place on the internet. This post, CMake part 1, covers the important concepts, syntax, and commands of CMake as a programming language. For each command, a reference to the manual is given for more details. I focus on installation, defining variables, if-conditions, loops, functions, and so on. I leave out the commands that create libraries and executables because they are explained in the next post.

Installation

First check if you already have it installed. In a terminal/PowerShell run

cmake --version

If it is not there, you can download CMake for Windows, macOS, and Linux from here and install it.

On Windows, you can also install it using choco. Open a PowerShell as administrator and run

choco install cmake

Some Linux distros come with CMake preinstalled. If not, it is available from their package manager. For example, on Ubuntu, you can install it via

sudo apt-get install cmake

Examples structure

The examples in this post are run the same way as in practical applications. There is a project folder that contains a CMakeLists.txt file and a build folder.

--myProject
  |
  ----- CMakeLists.txt
  |
  |
  ------ build

The CMake script is written in CMakeLists.txt. CMake is run from within the build folder with this command in a terminal:

cmake ..

or from anywhere in the file system:

 cmake -S path/to/myProject -B path/to/build

CMake automatically detects CMakeLists.txt and runs the commands in it.

Minimum requirement

CMake is an evolving language, and new features are added with every release. At the beginning of a CMake script, you can set the earliest CMake version that processes your script correctly:

cmake_minimum_required(VERSION 3.22)

Version 3.22 is the one I use for this post.

Project

Always set the name of the project in the script:

project(LinearSystemSolver)

You can also set the project version and language: C, CXX (for C++), Fortran:

project(LinearSystemSolver VERSION 1.2.0 LANGUAGES CXX)

See the manual for project here.
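
As a minimal sketch, the two commands above can be combined into a complete CMakeLists.txt. project() also defines variables such as PROJECT_NAME and PROJECT_VERSION from its arguments, which are printed here with the message command (explained below). The name and version are just the illustrative values used above, and LANGUAGES CXX assumes a C++ compiler is installed.

```cmake
# Minimal CMakeLists.txt sketch; name and version are illustrative values.
cmake_minimum_required(VERSION 3.22)
project(LinearSystemSolver VERSION 1.2.0 LANGUAGES CXX)

# project() defines these variables from its arguments.
message(STATUS "Project name:    ${PROJECT_NAME}")          # LinearSystemSolver
message(STATUS "Project version: ${PROJECT_VERSION}")       # 1.2.0
message(STATUS "Major version:   ${PROJECT_VERSION_MAJOR}") # 1
```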

Comment

Text that starts with # is considered a comment, and CMake ignores it.

# This is a comment
cmake_minimum_required(VERSION 3.22)
project(LinearSystemSolver) # Another comment

A multi-line comment is created with #[[…]]:

#[[ this is 
a long comment
for this code.]]

Script language

CMake is a dynamically-typed language like Python. A CMake script is composed of commands. Each command name is followed by parentheses that enclose its arguments and keywords.

  • Commands are case insensitive,
  • Variables are case sensitive,
  • Keywords are always written in upper case.

These two lines are the same:

command1(KEYWORD1 arg1)
COMMAND1(KEYWORD1 arg1)

Message

message writes its parameter on screen like print() in Python:

message("this is a message")

The message command has many modes for showing warnings, errors, and so on. Usually, mode STATUS is used to inform users that a step has started or finished:

message(STATUS "The compilation is finished.")

Note that STATUS is a CMake keyword and needs to be upper case.

Another important usage for message is debugging a CMake script. You can write the value of any suspicious variable on the screen.

See the manual for message here.
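
A short sketch of a few of these modes, including the debugging use mentioned above. The texts and the variable name buildFolder are made up for illustration. FATAL_ERROR is left commented out because it stops CMake immediately.

```cmake
set(buildFolder "build")  # hypothetical variable we want to inspect

message(STATUS "Configuration started")          # informational, prefixed with "--"
message(WARNING "Something looks suspicious")    # shows a warning, processing continues
message("buildFolder is '${buildFolder}'")       # plain message, handy for debugging
# message(FATAL_ERROR "Cannot continue")         # shows an error and stops CMake
```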

Normal variable

In CMake, there is no data type like char, integer, float, or class. All variables are strings (or text). A variable is defined or changed with the set command:

set(x hello)
set(y "hello")

Both x and y are set to hello. Here hello is a constant (a value).

I prefer quotation marks for constants because

  • you emphasize that it is a constant,
  • white space is handled correctly:

set(z "Hi there")

Without quotation marks, spaces separate the items of a list, as explained in the List section.

The value of a variable is accessed with ${variable}:

set(a "hi")
set(b ${a}) 
# b is "hi"

${a} is expanded to hi.

See the manual for set() here.
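
To make the effect of quotation marks concrete, here is a small sketch that sets the same text with and without them. The variable names are arbitrary.

```cmake
set(z "Hi there")   # one argument: a single string containing a space
set(w Hi there)     # two arguments: stored as the list "Hi;there"

message("${z}")     # Hi there
message("${w}")     # Hi;there
```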

Variable operations

We can merge variables to create a new one:

set(myPath "/home")
set(myDir "projectA")
set(myFile "${myPath}/${myDir}/main.cpp")
message("${myFile}") # will print /home/projectA/main.cpp

You have to dereference a variable with ${} to use its content (the if-condition is an exception, but ignore it for now):

set(file1 "sample.h")
set(header file1) # This is a Mistake!

Here header is set to the text “file1”. We wanted it to be the text “sample.h”, so the code is fixed like this:

set(file1 "sample.h")
set(header ${file1}) 

Dereferencing can happen recursively:

set(a "Final")
set(b a)
message( b ) # shows b
message( ${b} ) # shows a
message( ${${b}} ) # shows Final

This example shows:

  • CMake is funny,
  • A variable stores only a string,
  • Everything is a constant unless dereferenced to be treated as a variable (there are exceptions like if-condition and foreach),
  • The concept of reference-to-reference or pointer-to-pointer,
  • Why I like to set constants in quotations.

You can use this trick to build class-like data sets:

set(folder1-header "/folder1/a.h")
set(folder1-source "/folder1/a.cpp")

set(folder2-header "/folder2/b.h")
set(folder2-source "/folder2/b.cpp")

set(folder folder1)
# comment the above line
# and uncomment the below line, see the message
#set(folder folder2)

message(${${folder}-header})
message(${${folder}-source})

Unset variable

A variable can be cleared/deleted:

unset(a)
set(a) # set without value

We can check that a variable is set to a value that is not a false constant (such as 0, OFF, or an empty string) by

if (a)
    # do something
endif()
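
Note that if(a) is false both for an unset variable and for a variable holding a false constant. To test only whether a variable exists, if(DEFINED ...) can be used. A small sketch, with arbitrary variable names:

```cmake
set(a "OFF")   # a is defined, but holds a false constant
unset(b)       # b is not defined at all

if(a)
    message("a is true")           # not reached
endif()

if(DEFINED a)
    message("a is defined")        # reached
endif()

if(NOT DEFINED b)
    message("b is not defined")    # reached
endif()
```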

if-condition

A condition is written similarly to other programming languages:

if (<condition1>)
    # do something
elseif(<condition2>)
    # do another thing
else()
    # do default action
endif()

Some constant strings are translated to true/false:

  • True constants: TRUE, 1, Yes, Y, ON, …
  • False constants: FALSE, 0, NO, N, OFF, …

A variable name can be put as a condition. The condition is true when the variable is set to a value that is not a false constant:

unset(a) # emphasizing a is not set
set(b "sample.cpp")

if(a)
    # the code here is not run
endif()

if (b)
    # the code here is run
endif()

Note: In the above example, the if-condition receives the name of a variable and evaluates its value, so DO NOT use ${}. If you write if(${b}), the variable is expanded first, and the if-condition then treats the resulting value as a constant or as the name of another variable.

Conditions can be combined into compound conditions. They can be joined with AND and OR, negated with NOT, and grouped with parentheses. The most useful operator is STREQUAL, which checks if two strings are equal:

set(a "book")
set(b "book")
if (a STREQUAL b) # true condition
    message("they are equal.") 
endif()

You could also write if (${a} STREQUAL ${b}), but I prefer if (a STREQUAL b) because in this way we can have this rule:

Always use the name of a variable without ${} in conditions.

We can compare a variable with a constant:

set(a "book")
if (a STREQUAL "book") # a true condition
    message("they are equal.") 
endif()

Every variable is a string, but CMake can compare them as numbers with LESS, EQUAL, GREATER, and so forth:

if(1 EQUAL 01.0)  # a true condition
	message("1 equal to 01.0") 
endif()

See the manual for if() here.
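
The difference between numeric and string comparison, and a compound condition, can be sketched as follows. The variable name count and its value are only for illustration.

```cmake
set(count "10")

# Numeric comparison: 10 > 9
if(count GREATER 9)
    message("10 is greater than 9 as a number")        # reached
endif()

# String comparison is lexicographic: "1" comes before "9"
if(count STRLESS "9")
    message("\"10\" sorts before \"9\" as a string")   # reached
endif()

# Compound condition with NOT, AND, OR, and parentheses
if(NOT count EQUAL 0 AND (count LESS 100 OR count EQUAL 100))
    message("count is non-zero and at most 100")       # reached
endif()
```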

List

A list is defined with set as well, with space separation of items:

set(myFiles a.cpp b.cpp a.h)

Or in quotations with ; separation:

set(myFiles "sample1.h;sample2.h")

set(main "x.cpp")
set(myFiles "sample1.h;sample2.h;${main}")

CMake comes with the list command, which has a variety of keywords to modify a list, such as APPEND, POP_BACK, and REMOVE_ITEM. For example, the previous example could be rewritten as:

set(myFiles "sample1.h;sample2.h")

set(main "x.cpp")
list(APPEND myFiles ${main})
# So now myFiles="sample1.h;sample2.h;x.cpp"

When passing a list to add_* commands (e.g., add_library, add_executable), pass it as ${myFiles}. Do NOT pass it as "${myFiles}" because it will pass the list as a single string including semicolons.

You can try printing myFiles via:

message(${myFiles})
# sample1.hsample2.hx.cpp

All items are printed one after the other without the semicolon separators, indicating that the variable is treated as a list. To properly print a list, see section foreach loop. However, if you write:

message("${myFiles}")
# sample1.h;sample2.h;x.cpp

The content of the variable is considered a single string, which is then printed.

See the manual for list() here.
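
A few more list keywords in action, as a small sketch. The file names are the illustrative ones used above.

```cmake
set(myFiles a.cpp b.cpp a.h)

list(LENGTH myFiles count)       # count is "3"
list(GET myFiles 0 first)        # first is "a.cpp" (indices start at 0)
list(REMOVE_ITEM myFiles a.h)    # myFiles is now "a.cpp;b.cpp"
list(APPEND myFiles main.cpp)    # myFiles is now "a.cpp;b.cpp;main.cpp"

message("${count} ${first} ${myFiles}")
# 3 a.cpp a.cpp;b.cpp;main.cpp
```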

foreach loop

A numerical loop is defined as

foreach( i RANGE 1 5)
	message(${i})
endforeach()
# It will print 1 2 3 4 5

Note that the end of the range is included in contrast to Python.

A list can be iterated as

set(names "Jack;Kate;Sara")

foreach(name IN LISTS names)
	message(${name})
endforeach()
# It will print Jack Kate Sara

Notice that in foreach with IN LISTS, the same as in an if-condition, we drop ${} from the variable.

See the manual for foreach here.
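
Some other forms of foreach, sketched with made-up values: a range with a step, a loop over items written in place, and a loop that prints a list one item per line.

```cmake
# RANGE <start> <stop> <step>
foreach(i RANGE 0 10 5)
    message("${i}")
endforeach()
# prints 0, 5 and 10 on separate lines

# IN ITEMS loops over the values written directly in the command
foreach(file IN ITEMS a.cpp b.cpp)
    message("source: ${file}")
endforeach()

# Printing a list one item per line
set(myFiles "sample1.h;sample2.h;x.cpp")
foreach(file IN LISTS myFiles)
    message("${file}")
endforeach()
```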

Cached Variable

The state of normal variables is lost after a cmake run. To overcome this, we have cached variables, which are written to the CMakeCache.txt file in the build folder. Whenever we run the cmake command, they are loaded from that file. These variables aim to store user preferences on disk. Some examples of user preferences are:

  • installation directory,
  • build type (release or debug),
  • special compiler flags,
  • option to install some libraries.

They are created and set the first time that cmake is run. They are defined with this template:

set(<variable> <value>  CACHE <type> <docstring>)

An example is:

set(libAPath "/home/libA" CACHE PATH "info about libAPath for user")

After the first run, whenever cmake is called again, the line above is ignored because libAPath is already in the cache.

The idea behind it is that /home/libA is the default value, and a user is responsible for changing it to something that suits their needs. They can use the -D flag of cmake, the ccmake command, or cmake-gui to modify cached variables.

<type> tells ccmake and cmake-gui what we are expecting to get from the user. Types are:

  • FILEPATH: GUI shows a file selector dialog.
  • PATH: GUI shows a directory selector dialog.
  • STRING: GUI shows a textbox.
  • BOOL: GUI shows a checkbox.
  • INTERNAL: Hidden from GUI, for the developer.

A user can run the cmake-gui from a terminal with

cmake-gui -S pathToSourceFolder -B pathToBuildFolder 

You can also set a cached variable when running cmake with -D flag:

cmake -D <var>:<type>=<value>

See this example:

cmake -D compilesModule1:BOOL=ON -S path/to/source -B path/to/build

Every time you set a variable with -D, it overwrites the cached value.

Besides set, another way to create a boolean (ON/OFF) cache variable in a script is option:

option(hasModule1 "info about this option" ON)

which is the same as this:

set(hasModule1 "ON" CACHE BOOL "info about this option")

While it is not recommended, we can also overwrite a cached variable from the script every time cmake is run using FORCE:

set(libAPath "/home/libA" CACHE PATH "some info" FORCE)

Sometimes, as developers, we want to store some variables on disk without letting the user change them. Then we write

set(libAPath "/home/libA" CACHE INTERNAL "some info")

The INTERNAL variables are global variables accessible in every scope. FORCE is not necessary for them as they are always forced. Therefore, to work with them, we can write this:

if (NOT libAPath) # if it is not in the cache file
    # set the default value
    set(libAPath "/home/libA" CACHE INTERNAL "some info")
endif()
# work with libAPath

Never choose the same name for a cached and a normal variable unless you know what you are doing.

See the manual for set(), option(), and flags of cmake executable.
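
Putting the pieces together, here is a sketch of a CMakeLists.txt with two user preferences. The names dataDir and buildDocs, and the default path, are hypothetical. LANGUAGES NONE is used only so that the example does not need a compiler.

```cmake
cmake_minimum_required(VERSION 3.22)
project(CacheDemo LANGUAGES NONE)

# Defaults; they are used only if the entries are not in CMakeCache.txt yet.
set(dataDir "/home/data" CACHE PATH "Directory that holds input data")
option(buildDocs "Also build the documentation" OFF)

message(STATUS "dataDir   = ${dataDir}")
message(STATUS "buildDocs = ${buildDocs}")

if(buildDocs)
    message(STATUS "Documentation is enabled")
endif()
```

Running cmake -D buildDocs:BOOL=ON -S . -B build once stores ON in the cache, so later runs without -D keep reporting ON until the value is changed again or the cache is deleted.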

String

With the string command, you can find and replace, manipulate, and compare strings. You can even work with JSON strings. See the examples below:

string(TOUPPER "hello" a) # a is set to "HELLO"
string(LENGTH "hello" b) # b is "5" 
string(SUBSTRING "hello" 2 3 c) # c is "llo"

See the manual for string here.
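
A few more string operations, as a small sketch with made-up inputs:

```cmake
string(REPLACE ".cpp" ".o" objectName "main.cpp")  # objectName is "main.o"
string(TOLOWER "HeLLo" lower)                      # lower is "hello"
string(FIND "hello" "l" position)                  # position is "2" (first match, 0-based)

set(greeting "Hi")
string(APPEND greeting " there")                   # greeting is "Hi there"
```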

Math

A math expression is evaluated with this template (only integer arithmetic is supported):

math(EXPR output_variable math_expression)

For example:

math(EXPR x "5*(1+1-1)/5") # x will be 1

See the manual for math here.
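
Because math works on integers, division truncates. The sketch below shows this, together with the common pattern of incrementing a counter. The variable names are arbitrary.

```cmake
math(EXPR q "7 / 2")   # q is "3": integer division
math(EXPR r "7 % 2")   # r is "1": remainder

set(counter "4")
math(EXPR counter "${counter} + 1")   # counter is "5"
```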

File

With the file command you can

  • read and write files,
  • perform file system actions such as copying, removing, and renaming files,
  • upload or download files,
  • create or extract archives (zip, 7zip, …) and many more actions.

The keywords below are commonly used to get the list of files in the project:

  • GLOB for getting the list of files that match the given globbing expressions, without descending into subdirectories,
  • GLOB_RECURSE for getting the list of matching files in a directory and all its subdirectories.

You have to set globbing expressions to find the desired files. The example below finds the files with .h and .cpp extensions in the sub1 directory and its subdirectories. The results are stored in the myfiles variable:

file(GLOB_RECURSE myfiles LIST_DIRECTORIES false ${PROJECT_SOURCE_DIR}/sub1/*.cpp ${PROJECT_SOURCE_DIR}/sub1/*.h)

The last two terms are globbing expressions. You can add as many globbing expressions as you like.

Note that the CMake documentation does not recommend GLOB for collecting source files, because CMake cannot tell that it has to regenerate the build system when a source file is added or removed. For more info, see the manual for file() here.
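
Reading and writing files can be sketched as follows. The file name note.txt is made up, and the file is placed under CMAKE_CURRENT_BINARY_DIR so that the source folder stays clean.

```cmake
set(notePath "${CMAKE_CURRENT_BINARY_DIR}/note.txt")

file(WRITE  "${notePath}" "first line\n")    # create or overwrite the file
file(APPEND "${notePath}" "second line\n")   # add text to the end
file(READ   "${notePath}" content)           # load the whole file into a variable
message("${content}")

file(REMOVE "${notePath}")                   # delete the file again
```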

Function

A function in CMake is defined as

function(NameOfFunction arg1 arg2)
    # body of function
endfunction()

Here is a function that prints its arguments:

function(print a b)
    message("${a} ${b}")
endfunction()

print("March" "May")

The arguments are stored in the ARGV list, so for a function that accepts a varying number of arguments, we write:

function(print)
    foreach(arg IN LISTS ARGV)
       message(${arg})
    endforeach()
endfunction()

print("March" "May" "June")

The variables set in a function are local to the scope of the function and are not accessible outside:

function(doSomething)
    set(name "Sara")
endfunction()

doSomething()

if (NOT name)
    message("name is not set!") # This line is reached
endif()

However, a function has access to a copy of the variables in the scope it is called from, i.e. its parent scope:

set(name "Sara")

function(doSomething)
    message(${name})
endfunction()

doSomething() # prints Sara

We say it has access to a copy of the parent scope because if you change a parent variable in the function, it will not change it in the parent scope:

set(name "Sara")

function(doSomething)
    set(name "Jack")
    message(${name})
endfunction()

doSomething() # Jack
message(${name}) # Sara

If you want to change it in the parent scope, you have to set the variable again with PARENT_SCOPE:

set(name "Sara")

function(doSomething)
    # for local scope
    set(name "Jack")
    # for parent scope
    set(name "Jack" PARENT_SCOPE)
    message(${name})
endfunction()

doSomething() # Jack
message(${name}) # Jack

Now we can define a function that returns a value to its parent through a variable:

function(findName outFullName first last)
    set(${outFullName} "${first} ${last}" PARENT_SCOPE)
endfunction()

findName(fullName "Steve" "Jobs")

message(${fullName})

A function can be terminated early with the return() command.

See the manual for function here.
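
The sketch below combines an early return() with ARGC (the number of arguments passed) and ARGN (the arguments after the named ones). The function name printNames and its texts are made up for illustration.

```cmake
function(printNames title)
    # ARGC counts all passed arguments, including title
    if(ARGC LESS 2)
        message("${title}: no names given")
        return()   # leave the function early
    endif()

    # ARGN holds only the arguments after the named ones
    foreach(name IN LISTS ARGN)
        message("${title}: ${name}")
    endforeach()
endfunction()

printNames("Guest")                 # Guest: no names given
printNames("Guest" "Jack" "Kate")   # Guest: Jack, then Guest: Kate
```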

Macro

A macro is defined the same way as a function. However, while a function hides its content from the caller scope, a macro pastes its content at the place where it is called. Therefore, the variables and commands defined in the macro are exposed to the caller scope.

macro(setName)
    set(name "Sara")
endmacro()

setName()

message(${name}) 

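In a function, ARGV is a real variable that contains the list of arguments. In a macro, ARGV is not a variable; only the text ${ARGV} is replaced with the arguments before the macro body is run.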
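
A side-by-side sketch of the two behaviors. The names are made up. Inside the macro, the arguments are reached through ${ARGV}, because ARGV is not a variable there.

```cmake
function(setNameInFunction)
    set(functionName "Sara")   # local to the function
endfunction()

macro(setNameInMacro)
    set(macroName "Sara")      # set directly in the caller's scope
    foreach(arg ${ARGV})       # ${ARGV} is replaced by the arguments
        message("macro argument: ${arg}")
    endforeach()
endmacro()

setNameInFunction()
setNameInMacro("March" "May")

if(NOT DEFINED functionName)
    message("functionName is not visible here")   # reached
endif()
message("macroName is ${macroName}")              # macroName is Sara
```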

Macro vs Function

Generally, a function is the first pick as it leads toward clean code and fewer bugs. A macro can be used for wrapping commands that make changes in the scope they are called from, such as setting some variables.

For more detailed information on scope of function and macro see my post here.

Special CMake variables

Any variable that starts with CMAKE_ is reserved for CMake, which populates these variables when a script is run. We can store a list of the variables defined for our system in a file via:

cmake --system-information information.txt
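
Inside a script, these variables are read like any other variable. A small sketch that prints a few well-known ones follows. The project name is made up, and LANGUAGES NONE is used only so that no compiler is needed.

```cmake
cmake_minimum_required(VERSION 3.22)
project(VariablesDemo LANGUAGES NONE)

message(STATUS "CMake version:    ${CMAKE_VERSION}")
message(STATUS "Operating system: ${CMAKE_SYSTEM_NAME}")
message(STATUS "Top source dir:   ${CMAKE_SOURCE_DIR}")
message(STATUS "Top build dir:    ${CMAKE_BINARY_DIR}")
```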

add_subdirectory

In any subdirectory of a project, you can have a CMakeLists.txt file. Imagine we have a file system like this

-- myProject
    |
    ----- build
    |
    ----- library1
    |        |
    |        ---- CMakeLists.txt
    |
    ------ CMakeLists.txt

In library1/CMakeLists.txt we have

message("Hello from library1")
message(${CMAKE_CURRENT_SOURCE_DIR})
message(${PROJECT_SOURCE_DIR})

And in myProject/CMakeLists.txt we have these lines

project(myProject)
message("Hello from myProject")
message(${CMAKE_CURRENT_SOURCE_DIR})
message(${PROJECT_SOURCE_DIR})
add_subdirectory("library1")

When cmake runs, it first sets the CMAKE_CURRENT_SOURCE_DIR variable to the myProject path. When it reaches the add_subdirectory line, it jumps into library1/CMakeLists.txt, sets CMAKE_CURRENT_SOURCE_DIR to the library1 path, and runs the commands there. Afterward, CMake comes back to the project scope and sets CMAKE_CURRENT_SOURCE_DIR back to the myProject path. The variables set in the subdirectory scope are private and not visible to the project scope.

While CMAKE_CURRENT_SOURCE_DIR depends on the location of the CMakeLists.txt being processed, PROJECT_SOURCE_DIR is set to the folder of the nearest CMakeLists.txt, in the current directory or one of its parents, that calls project(). In this example only the top-level CMakeLists.txt calls project(), so PROJECT_SOURCE_DIR is the myProject path in both files.

A subdirectory usually contains some source files that need to be compiled.
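
Since variables set in a subdirectory are private, one way to hand a value back to the project scope is PARENT_SCOPE, the same keyword used with functions. The sketch below shows the content of the two files in one listing. The variable names and file names are hypothetical.

```cmake
# --- library1/CMakeLists.txt ---
set(localNote "only visible inside library1")
# PARENT_SCOPE sets the variable in the scope that called add_subdirectory
set(library1Sources "solver.cpp;solver.h" PARENT_SCOPE)

# --- myProject/CMakeLists.txt ---
cmake_minimum_required(VERSION 3.22)
project(myProject LANGUAGES NONE)
add_subdirectory("library1")

message("${library1Sources}")   # solver.cpp;solver.h
if(NOT DEFINED localNote)
    message("localNote is not visible in the project scope")   # reached
endif()
```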

Include

We can run a CMake script from another file with include. No new scope is created. It is as if the content of the file is pasted at the include() line.

Let’s create a file in the project folder, sample.txt, which contains

message("Hello from sample.txt")

In CMakeLists.txt file include it as:

include("sample.txt")

It will write the message on the screen.

If the file has the .cmake extension, it is called a module, and we don’t need to mention its extension. It is common to add the folder containing modules to the CMAKE_MODULE_PATH list variable, so that the include command automatically searches those folders for the mentioned module.

For example, let’s create a mymodules folder in the project directory. In that directory, we put the sample.cmake module, which contains

message("Hello from sample module")

So the file system will look like this

--myProject
  |
  ----- build
  |
  ----- mymodules
  |       |
  |       --- sample.cmake
  |
  ----- CMakeLists.txt

Now in CMakeLists.txt we can include the module as

project(myProject)
list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/mymodules")
include(sample) 

Running cmake you will see the message.

The difference between include and add_subdirectory is:

  • include is used to add modules that may contain functions, macros, instructions to install packages, and so forth.

  • add_subdirectory is used to add folders that contain source code to be compiled.

See the manual for include() here.
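
A typical use of a module is to collect helper functions. The sketch below shows the content of a hypothetical module helpers.cmake and of the CMakeLists.txt that includes it, in one listing. It also shows the OPTIONAL and RESULT_VARIABLE keywords of include for a module that may be missing. The names helpers, printBanner, and doesNotExist are made up.

```cmake
# --- mymodules/helpers.cmake (hypothetical module) ---
function(printBanner text)
    message("=== ${text} ===")
endfunction()

# --- CMakeLists.txt ---
cmake_minimum_required(VERSION 3.22)
project(myProject LANGUAGES NONE)
list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/mymodules")

include(helpers)         # makes printBanner available here
printBanner("Hello")     # === Hello ===

# OPTIONAL avoids an error if the module is missing;
# RESULT_VARIABLE receives the full path of the file, or NOTFOUND.
include(doesNotExist OPTIONAL RESULT_VARIABLE found)
if(NOT found)
    message("optional module was not found")
endif()
```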

Comments

3 comments
Mohammad Reza Bayat 28-Jul-2022
Hi Sorush. I always enjoy reading your posts. It was awesome. Have you heard of Bazle?
Sorush 2-Aug-2022
@Mohammad Reza Bayat, it makes me really happy, when my posts are helpful. Not heard of Bazel, thanks for the mention, I will read about it.
Pete Dietl 6-Apr-2023
I think instead of Message(myFile) # will print /home/projectA/main.cpp you meant Message(${myFile}) # will print /home/projectA/main.cpp