Goal
I want to write an MPI ping pong program where two processes send a piece of data back and forth to each other. This is a good example to measure the latency and bandwidth of a cluster machine. Since, currently, the MPI for C is developing stronger than C++ (see References), I am wrapping MPI-C commands in C++ classes here.
The ball
If we imagine the data transfer between two nodes is like a ping-pong game, the data is the ball. The ball can be a vector of an MPI type. Here for simplicity, is the integer:
struct Ball
{
Ball(int size){
data.resize(size);
data[0] = 0;
}
auto Size(){return data.size();}
auto& operator[](size_t i){return data[i];}
private: vector<int> data;
};
In this example, I ignore what is in data
array except 0-th element which counts the number of back-and-forths of the ball.
IPlayer
As I am still thinking in the world of ping pong, I call each process a player. We have different players, rank 0 and 1 which play the game and the other ranks which are idle. So to avoid nested if-conditions, I created IPlayer
interface:
struct IPlayer
{
virtual void Play()=0;
};
Idle
IPlayer
is adopted by the Idle
class which does nothing
struct Idle: IPlayer
{
void Play() override{};
};
Player
The players of the game adopt the same interface. The constructor of Player
needs to set some private members like who is the starter of the game. If the player’s rank is 0 its target is 1 and vice versa.
struct Player: IPlayer
{
Player(Ball& _ball, bool _iAmGameStarter):
ball(_ball), iAmGameStarter(_iAmGameStarter){
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// At least 2 processors needed
if (size<2){throw;}
target = (rank + 1) % 2;
};
// ... rest of the code
}
A player sends the ball and waits to receive it back. I used MPI_Send
to send the ball in a blocking way. It will receive the ball using MPI_Recv
which blocks the process until the ball buffer is reusable. Both block the buffer, the ball, to make sure there is not a race condition to read and write there. For more information, read more on the MPI race condition here.
void SendBall(){
MPI_Send( &ball[0] , ball.Size() , MPI_INT , target , 0 , MPI_COMM_WORLD);
}
void RecvBall(){
MPI_Recv( &ball[0], ball.Size() , MPI_INT, target, 0 , MPI_COMM_WORLD, &stat);
}
I have to override Play
of the interface. It basically calls SendBall()
and RecvBall()
in the order depending on who’s started the game. After each player receives the ball, they increment Data[0]
by one, firstly to count communications and secondly to show the program works fine.
void Play() override{
if (iAmGameStarter) SendBall();
RecvBall();
ball[0]++;
cout<< "Rank :" << rank <<" has the ball, No of throws: "<<ball[0]<<endl;
if (!iAmGameStarter) SendBall();
}
Code
The whole code is here. It is compiled with GCC 10.2, OpenMPI 4.0.
#include <mpi.h>
#include <stdio.h>
#include <memory>
#include<vector>
using namespace std;
struct Ball
{
Ball(int size){
data.resize(size);
data[0] = 0;
}
auto Size(){return data.size();}
auto& operator[](size_t i){return data[i];}
private: vector<int> data;
};
struct IPlayer
{
virtual void Play()=0;
};
struct Idle: IPlayer
{
void Play() override{};
};
struct Player: IPlayer
{
Player(Ball& _ball, bool _iAmGameStarter):
ball(_ball), iAmGameStarter(_iAmGameStarter){
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// At least 2 processors needed
if (size<2){throw;}
target = (rank + 1) % 2;
};
void SendBall(){
MPI_Send( &ball[0] , ball.Size() , MPI_INT , target , 0 , MPI_COMM_WORLD);
}
void RecvBall(){
MPI_Recv( &ball[0], ball.Size() , MPI_INT, target, 0 , MPI_COMM_WORLD, &stat);
}
void Play() override{
if (iAmGameStarter) SendBall();
RecvBall();
ball[0]++;
cout<< "Rank :" << rank <<" has the ball, No of throws: "<<ball[0]<<endl;
if (!iAmGameStarter) SendBall();
}
private:
Ball& ball;
int rank;
int size;
int target;
bool iAmGameStarter;
MPI_Request req;
MPI_Status stat;
};
auto MakeUniquePlayer(Ball& ball,int rank){
// Only rank 0 and 1 play the game
if (rank==0){
// Rank 0 starts the game
return unique_ptr<IPlayer> (new Player(ball, true));
} else if (rank==1)
{
return unique_ptr<IPlayer> (new Player(ball, false));
} else
{
return unique_ptr<IPlayer> (new Idle());
}
}
int main() {
MPI_Init(NULL, NULL);
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// Customize here
int ballSize = 1000000;
int warmupIterations = 0;
int mainIterations = 30;
Ball ball(ballSize);
auto player = MakeUniquePlayer(ball,rank);
// Warm up players before the game
for (size_t i = 0; i < warmupIterations; i++)
{
player->Play();
}
double start = MPI_Wtime();
// Main loop to be monitored
for (size_t i = 0; i < mainIterations; i++)
{
player->Play();
}
double end = MPI_Wtime();
auto elapsedTime = end-start;
// each iteration has 2 transfers: send and receive
auto transferTime = elapsedTime/(mainIterations * 2);
auto ballSizeInGigaByte = ballSize * 4.0 /* byte */ / 1000000000;
if (rank==0)
{
cout<< "Ball size (GB): "<< ballSizeInGigaByte<<endl;
cout<< "Transfer time (Sec): "<< transferTime <<endl;
cout<< "Bandwidth (GB/s): " << ballSizeInGigaByte/transferTime <<endl;
}
MPI_Finalize();
}
Validation
The first run is only 3 iterations to see how the ball moves back and forth. In all tests, only 2 processes were utilized.
BallSize = 1000,000
main iterations = 3
warm up iterations = 0
Rank :1 has the ball, No of throws: 1
Rank :0 has the ball, No of throws: 2
Rank :1 has the ball, No of throws: 3
Rank :0 has the ball, No of throws: 4
Rank :1 has the ball, No of throws: 5
Rank :0 has the ball, No of throws: 6
Bandwidth
The second run is to measure the bandwidth:
BallSize = 1000,000
warm up iterations = 10
iterations = 10,000
Ball size (GB): 0.004
Transfer time (Sec): 0.000531798
Bandwidth (GB/s): 7.52166
More like this
You can challenge your MPI skills by programming a traffic example I explained in this post.
References
I got ideas and codes from the below website(s)
OpenMPI EPCC MPI Exercise OLCF-tutorials Boost MPI MPI Deprecated the C++ bindings