A note on computer round-off error in physics

Introduction

A 16-bit floating-point number is stored roughly as below

0    |  100 0000 00 | 00 0001
sign    Mantissa       Exponent

One bit for the sign of the number, 6 bits for the exponent, and 9 bits for the significant digits, the mantissa. For more details on how the number is converted back to base 10, see here.

This is similar to writing numbers in scientific notation in base 10:

-123 = (-1) \times 0.123 \times 10^3

0.000456 = (+1) \times 0.456 \times 10^{-3}

Looking at this way of storing a floating-point number, we can draw the following conclusions:

  • The precision is finite and depends on the size of the mantissa. For example, if the mantissa can only store 3 digits:

x = 0.12345 \Rightarrow x = 0.123

  • Multiplication happens by calculating the mantissas and exponents separately and then rounding off:
\begin{aligned}
x &= 0.0123 \times 567 \\
  &= 0.123 \times 10^{-1} \times 0.567 \times 10^3 \\
  &= (0.123 \times 0.567) \times (10^{-1} \times 10^3) \\
  &= 0.069741 \times 10^2 \\
  &= 0.69741 \times 10 && \text{remove leading 0's} \\
  &= 0.697 \times 10 && \text{round-off}
\end{aligned}
  • Addition happens by bringing the small number to the same exponent as the large number, then adding the mantissas:
\begin{aligned}
x &= 567 + 0.123 \\
  &= 0.567 \times 10^3 + 0.000123 \times 10^3 \\
  &= 0.567123 \times 10^3 \\
  &= 0.567 \times 10^3 && \text{round-off}
\end{aligned}

Note that in the above example, the whole small number disappeared in the result.

Implication in physics

In physics codes, we have to ensure the precision is enough to capture small-magnitude phenomena. For example, in a system whose maximum temperature is 1000 °C, with single-precision numbers (about 7–8 significant digits) we cannot have a heat source at 0.00001 °C.

In systems with properties in different units, like pressure, density, and velocity, it becomes hard to track the precision of all of them. Therefore, they are usually normalized with reference values so that their accuracy can be measured relative to 1.

For example, let’s have a look at Bernoulli’s equation in a horizontal pipeline:

p + \frac{1}{2} \rho v^2 = c

Where p is the pressure, \rho the density of the fluid, v the velocity of the fluid, and c a constant. Before coding, we can normalize the equation. Let's assume the pressure is of the order of atmospheric pressure, p_0. A reference velocity can be found as

p_0 = \frac{1}{2} \rho v_0^2 \quad \Rightarrow \quad v_0 = \sqrt{2 p_0 / \rho}

Dividing both sides by p_0:

\begin{aligned}
\frac{p}{p_0} + \frac{1}{2} \frac{\rho v^2}{p_0} &= \frac{c}{p_0} \\
\frac{p}{p_0} + \left(\frac{v}{v_0}\right)^2 &= c_2 \\
P + V^2 &= c_2
\end{aligned}

So we code the equation with P and V, which are close to 1, rather than the absolute pressure and velocity, which can be of different orders, for example 5 \times 10^5 Pa and 5 m/s respectively. Note that normalization does not increase the accuracy; it only facilitates analyzing the accuracy.

A trick to improve the precision

There are situations where, during the simulation runtime, the correction to a property is much smaller than its initial value. For example, see the equation below:

p_{new} = p_{old} + \Delta p

p is the pressure of a system, and \Delta p is the correction to the pressure. This line might execute 1000 times in a simulation. If the pressure is 1,000,000 Pa but the corrections are of the order of 1 Pa, the numbers stored in a single-precision variable look like

1,000,001.2
1,000,123.4

and we cannot store

1,000,123.456

We are losing about 4 digits of precision, because during the 1000 steps of the simulation the first 4 digits never change. In this case, we can work with an alternative variable in the code:

\bar{p} = p - 1{,}000{,}000

So we code the program based on \bar{p}, which stores the numbers

1.2345678
123.45678

and when presenting the results, we convert them back to p.

Round-off error accumulation

In algorithms where round-off error accumulates, the outcome deviates from what is expected. In the example below, 1/3 is repeatedly added to a number, and then the same amount is subtracted from the outcome. The result is different from the initial value.
#include <iostream>
#include <iomanip>

int main()
{
  float initial = 1.0f;
  float result = initial;
  float one_third = 1.0f / 3.0f;
  int iterations = 100000;

  // repeatedly add 1/3 ...
  for (int i = 1; i < iterations; i++)
    result = result + one_third;

  // ... then subtract it the same number of times
  for (int i = 1; i < iterations; i++)
    result = result - one_third;

  std::cout << std::fixed << std::setprecision(8)
            << "initial= " << initial    // 1.00000000
            << "\nresult = " << result;  // 1.00017393
}

Note that 1/3 = 0.333\ldots cannot be stored exactly in floating-point memory. If we denote the relative round-off error of each step by \delta, we can write the above sequence as

\begin{aligned}
r_0 &= 1.0 \\
r_1 &= (1+\delta)(r_0 + \tfrac{1}{3}) \\
r_2 &= (1+\delta)(r_1 + \tfrac{1}{3}) \\
    &\;\;\vdots \\
r_n &= (1+\delta)(r_{n-1} + \tfrac{1}{3})
\end{aligned}

It can be concluded that the error of each term is carried into the next term, and we get an accumulation of round-off error.

Note that if the sequence were arranged so that terms were subtracted along the way, the errors could cancel each other out.

Tags ➡ C++ HPC
