Data Races, what are they, and why are they evil? – Part 2

In Part 1 of this blog, we described the differences between data races and race conditions and demonstrated some simple fixes through examples. You may have observed that fixing a data race does not necessarily eliminate the race condition. So what harm does a data race actually cause, and why do we want to get rid of them?

Before we step into the murky world of data races, let’s first take a step back to understand a new concept: “The benign data race”.

There is a strong and widely held perception that data races are acceptable in certain contexts, and they are therefore common in existing code.

This, as it turns out, is very wrong. Not only are these so-called benign data races technically incorrect and harmful, but new bugs may be introduced by future re-compilation if the code is meant to be portable.

We present some common C/C++ “benign” patterns from a paper by Hans-J. Boehm below, and briefly explain how these common patterns might go wrong:

1. Double-checked locking

Example 1
if (!init_flag) {
    lock(); 
    if (!init_flag) {
        my_data = ...;
        init_flag = true;
    }
    unlock(); 
} 
tmp = my_data;

This is a very famous pattern (known as DCLP), and it is also well known to be incorrect. The code breaks once compiler optimizations or hardware reordering come into play, which, in most real cases, they do.

For example, nothing prevents the compiler from moving the write init_flag = true ahead of the write to my_data. A second thread may then see init_flag already set and read my_data before it has been initialized.

A general, portable solution for this pattern did not exist until Java 1.5 (the volatile keyword) and C++11 (atomics and memory fences). Without these, we have to protect the whole pattern with a lock, at the cost of degraded performance.

For more details about DCLP, you can check the Wikipedia article, this page for Java, and Scott Meyers and Andrei Alexandrescu’s paper about C++.
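As an illustration of the C++11 fix, here is a minimal sketch of a race-free double-checked initialization using an atomic flag with acquire/release ordering. The names init_flag, my_data, and init_mutex mirror Example 1, and the initialization value is just a placeholder:

#include <atomic>
#include <mutex>

std::atomic<bool> init_flag{false};
std::mutex init_mutex;
int my_data;

int get_data() {
    // Fast path: the acquire load pairs with the release store below, so a
    // thread that sees init_flag == true also sees my_data fully written.
    if (!init_flag.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> guard(init_mutex);
        if (!init_flag.load(std::memory_order_relaxed)) {  // re-check under the lock
            my_data = 42;  // placeholder for the expensive initialization
            init_flag.store(true, std::memory_order_release);
        }
    }
    return my_data;
}

In many cases std::call_once, or a function-local static variable (whose initialization is thread-safe since C++11), is an even simpler way to get the same guarantee.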

2. Reading an imprecise value is allowed

There are cases where a read from a thread does not care whether another thread has updated the value or not, i.e., either the old value or the new value will result in a valid computation. In such cases, is a data race benign?

The answer, of course, is no, simply because a data race can cause the read to return a meaningless value that corresponds to neither the old value nor the new one.

Example 2
// Original code: the global `counter` is read once
int my_counter = counter;
int (* my_func) (int);
if (my_counter > my_old_counter) {
    ...
    my_func = ...;
}
...
if (my_counter > my_old_counter) {
    ...
    my_func(...);
}

// Transformed code: the compiler re-reads the global `counter`
int my_counter = counter;
int (* my_func) (int);
if (my_counter > my_old_counter) {
    ...
    my_func = ...;
}
...
// ANOTHER read of the global
int my_counter = counter;
if (my_counter > my_old_counter) {
    ...
    my_func(...);
}

Consider the first snippet, where my_counter reads the global variable counter without any protection. The global counter may be updated concurrently by other threads. Here the developer may think that reading a stale value of counter is acceptable because of the if (my_counter > my_old_counter) check.

However, suppose the compiler decides that keeping my_counter in a register across the two if-checks would force a spill. It is then perfectly legal for it to re-read the global counter instead, as shown in the second snippet. Under this transformed code, the two if-checks may see different values and disagree, which can lead to unexpected behavior, such as calling my_func without ever having assigned it.

Going a bit further down to the hardware level, assume we want to maintain an imprecise global 64-bit integer counter, and we make counter++ concurrent and unprotected. If the system only supports 32-bit indivisible writes, then while one thread is updating the counter from 2^32 - 1 to 2^32, another thread could read a value whose upper 32 bits are already updated but whose lower 32 bits are not, which makes the observed value completely meaningless.
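If an imprecise counter is genuinely acceptable, the portable remedy is to make the accesses atomic rather than leaving them racy. Below is a minimal C++11 sketch; counter, my_counter, and my_old_counter mirror Example 2, and the relaxed ordering reflects the assumption that only atomicity, not ordering, is needed:

#include <atomic>
#include <cstdint>

std::atomic<uint64_t> counter{0};
uint64_t my_old_counter = 0;

void increment() {
    // Indivisible increment, even on hardware without atomic 64-bit stores:
    // no thread can ever observe a half-updated value.
    counter.fetch_add(1, std::memory_order_relaxed);
}

void check() {
    // Read the global exactly once into a local. Because concurrent updates
    // to an atomic are well-defined behavior, the compiler cannot substitute
    // a fresh read of `counter` for `my_counter`, so both checks agree.
    uint64_t my_counter = counter.load(std::memory_order_relaxed);
    if (my_counter > my_old_counter) {
        // ... set my_func ...
    }
    // ...
    if (my_counter > my_old_counter) {
        // ... call my_func: only reachable if the first check also passed ...
    }
}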

3. Incorrect ad hoc synchronizations

Ad hoc (user-constructed) synchronization should not be built from ordinary memory operations; otherwise, we will generally face issues similar to those mentioned above.

Example 3
// Original code: `flag` is intended to be an ad hoc lock
while (!flag) {
    ...
}

// Transformed code: `flag` is read once, outside the loop
tmp = flag;
while (!tmp) {
    ...
}

For example, the busy-waiting loop in the first snippet can be transformed into the second one: since nothing inside the loop writes flag, the compiler is free to load it once into a register and test that copy instead. The transformed loop is likely to spin forever, yet under single-threaded semantics it is equivalent to the original.

Or if we want to use a busy-waiting loop as a memory barrier, such as in the example below:

Example 4
while (!flag) {}; // waiting for `data` to be updated
tmp = data;

Compilers or hardware could reorder the read of data ahead of the waiting loop, so this ad hoc memory barrier is essentially meaningless.
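The portable C++11 way to express this kind of hand-rolled synchronization is an atomic flag with release/acquire ordering, as in the sketch below. flag and data mirror Examples 3 and 4, and compute() is a hypothetical producer function:

#include <atomic>

std::atomic<bool> flag{false};
int data;

int compute();  // hypothetical expensive computation

void producer() {
    data = compute();                             // plain write to the payload
    flag.store(true, std::memory_order_release);  // publish: the write above cannot sink below this store
}

void consumer() {
    while (!flag.load(std::memory_order_acquire)) {
        // spin (or yield); the load cannot be hoisted out of the loop
    }
    int tmp = data;  // guaranteed to see the value written in producer()
    (void)tmp;
}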

4. Redundant write

Assume we have a function f(x):

Example 5
// count is a shared variable between threads
f(x) {
    ...;
    for (p = x; p != 0; p = p->next) {
        if (p->data < 0) count++;
    }
}

Notice that count is a shared variable and is only updated when a node's data field is negative. Now suppose we use this function in the following way:

Example 6
// Thread 1
count = 0;
f(positives);

// Thread 2
f(positives);
count = 0;

The argument positives is a linked list whose nodes contain only positive values. Hence, the only data race is between the two count = 0 writes. You may wonder how this could possibly go wrong, since all we ever do is write the same value to count.

However, things are different if the compiler inlines f(positives) and promotes the variable count to a register for the duration of the loop (a transformation the compiler may consider legitimate because count is already written unconditionally in the surrounding code). The instructions actually executed by each thread then become something like the following:

Example 7
// Thread 1
count = 0;     // step 2
reg1 = count;  // step 4
count = reg1;  // step 6

// Thread 2
reg2 = count;  // step 1
count = reg2;  // step 3
count = 0;     // step 5

Under this specific interleaving, count can end up holding its original value instead of being zeroed out: thread 2 saves the stale value into reg2 (step 1), thread 1 zeroes count (step 2), thread 2 restores the stale value (step 3), thread 1 saves that value into reg1 (step 4), thread 2 zeroes count again (step 5), and thread 1 finally writes the stale value back (step 6).
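One portable way to rule this out is to make count atomic: under the C++11 memory model the compiler may not invent extra writes to an atomic variable, so the spurious save/restore above is no longer a legal transformation, and the two zeroing writes stop racing with each other. A minimal sketch, with a placeholder node type:

#include <atomic>

struct node { int data; node* next; };

std::atomic<int> count{0};

void f(node* x) {
    for (node* p = x; p != nullptr; p = p->next) {
        if (p->data < 0)
            count.fetch_add(1, std::memory_order_relaxed);  // executed only when a value really is negative
    }
}

// Thread 1: count.store(0); f(positives);
// Thread 2: f(positives); count.store(0);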

5. Disjoint bit manipulation

This kind of “benign” data race is described as:

… data races between two memory operations where the programmer knows for sure that the two operations use or modify different bits in a shared variable.

These kinds of manipulations are not generally benign. Updating a few bits usually entails 1) reading the whole value, 2) updating the relevant bits, and 3) writing the value back. It is easy to see how a particular interleaving can cause some of the updates to be lost, as sketched below.
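As a small hypothetical illustration, consider two threads that each set a different bit of the same shared word:

unsigned x = 0;  // shared word holding several independent flag bits

// Thread 1 intends to set bit 0:  x |= 0x1;
// Thread 2 intends to set bit 1:  x |= 0x2;
//
// Each |= is really three steps: read x, OR in the bit, write x back.
// One possible interleaving:
//   Thread 1 reads 0, Thread 2 reads 0, Thread 1 writes 1, Thread 2 writes 2
// The final value is 2, so Thread 1's bit has been silently overwritten.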

Even when only one thread updates the bits, we can construct the following example:

Example 8
// Original code
x.bits1 = 1;
switch(i) {
    case 1:
        x.bits2 = 1;
        ...;
        break;
    case 2:
        x.bits2 = 1;
        ...;
        break;
    case 3:
        ...;
        break;
    default:
        x.bits2 = 1;
        ...;
        break;
}

// Transformed code: the write to x.bits2 is hoisted out of the switch
x.bits1 = 1;
tmp = x.bits2;
x.bits2 = 1;
switch(i) {
    case 1:
        ...;
        break;
    case 2:
        ...;
        break;
    case 3:
        x.bits2 = tmp;
        ...;
        break;
    default:
        ...;
        break;
}

The original code is the first snippet. x is a global variable whose bits represent different flags, and bits1 and bits2 are two bit-fields of x.

Notice that in the original code, case 3 does not update x.bits2, and it also happens to be the branch we actually take (assume the compiler does not know this, so the other branches are not pruned away). Now suppose a second thread only reads the value of x.bits2. This seems like a perfectly safe situation.

However, existing compilers may treat adjacent bit-fields (here, x.bits1 and x.bits2) as belonging to the same memory location. The write to x.bits1 makes the compiler believe this memory location is data-race free, which justifies optimizing the accesses to x.bits2 as well. The result is the second snippet, where the write to x.bits2 is hoisted out of the switch to shrink the code (imagine more branches that update x.bits2), and case 3 restores the previously saved value. If we still take case 3, the other thread's read of x.bits2 now races with the hoisted write and with the restoring write inside case 3, so it may observe a value of x.bits2 that the original program could never have produced.
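Two possible remedies, sketched under the assumption of a C++11 compiler: separate the two flags into distinct memory locations (a zero-width bit-field ends the current run of adjacent bit-fields), or simply serialize every access to x with a mutex. The field names mirror Example 8:

#include <mutex>

// Option 1: a zero-width bit-field splits bits1 and bits2 into separate
// memory locations, so concurrent accesses to them no longer interfere.
struct flags_split {
    unsigned bits1 : 1;
    unsigned       : 0;   // separator: the next bit-field starts a new memory location
    unsigned bits2 : 1;
};

// Option 2: keep the packed layout but protect every access with a mutex.
struct flags_packed {
    unsigned bits1 : 1;
    unsigned bits2 : 1;
};

std::mutex x_mutex;
flags_packed x;

void set_bits1() {
    std::lock_guard<std::mutex> guard(x_mutex);
    x.bits1 = 1;
}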

In summary, the evil of data races mostly lies in compiler optimizations, hardware reordering, and the underlying memory model. Most compiler optimizations assume the program is data-race free; when it is not, the program's behavior is undefined. The worst thing about undefined behavior is that it permits literally anything, even launching a missile, rather than merely taking a wrong path through your program.

If you are interested in further reading and discussion of data races, there’s another excellent blog explaining this.

How do I prevent my program from launching missiles unexpectedly?

Having made it this far, hopefully you are not utterly terrified by what data races can cause. To be fair, a data race could be “benign” in a sufficiently concrete context: when the exact architecture, compiler version, optimization passes, and memory model are all known, and the code is not intended to be portable.

Nevertheless, we strongly believe developers should treat potential data races at least as seriously as they treat compiler warnings.

Properly protecting concurrent shared-memory accesses not only prevents programs from falling foul of all the subtle bugs presented above, but also makes your code significantly more readable by making its concurrent semantics explicit. Besides, there are tools such as ThreadSanitizer, Intel Inspector, and Coderrect Racedetector that can efficiently reveal potential data races at different stages of development.
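For example, ThreadSanitizer ships with recent Clang and GCC and is enabled with a single compiler flag; a typical invocation might look like the following (the file name is a placeholder):

clang++ -std=c++11 -g -O1 -fsanitize=thread racy_program.cpp -o racy_program
./racy_program    # reported races include stack traces for both conflicting accesses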

Let’s just make sure we are not turning our software into an accidental missile launcher ;).
