Starter Cases #

Detect OpenMP Races #

This tutorial shows how to use Coderrect to detect races in a single-file, multi-threaded C program written with OpenMP.


Prerequisites

  • Our sample code relies on OpenMP to achieve parallelism. You will need a compiler that supports OpenMP. We will be using gcc, but clang and other modern alternatives also have OpenMP support. 

Detect a race in pi.c

We will start by detecting a race in a small single-file program, pi.c. The program is designed to compute pi=3.1415926…. We provide this example program in the Examples directory under the Coderrect installation, or you can copy the code below to your system.

//pi.c
#include <omp.h>
#include <stdio.h>

#define N 1000000000

int main () {

    double delta = 1.0/(double) N;
    int MAX_THREADS = omp_get_max_threads();
    // Compute parallel compute times for 1-MAX_THREADS
    for (int j=1; j<= MAX_THREADS; j++) {

        printf(" running on %d threads: ", j);
        omp_set_num_threads(j);

        double sum = 0.0;
        double start = omp_get_wtime();

        #pragma omp parallel for //reduction(+:sum)
        for (int i=0; i < N; i++) {
            double x = (i+0.5)*delta;
            sum += 4.0 / (1.0+x*x);
        }

        // Out of the parallel region, finalize computation
        double pi = delta * sum;
        double time_lapse = omp_get_wtime() - start;
        printf("PI = %.12g computed in %.4g seconds\n", pi, time_lapse);
    }
}

Check that pi.c can be compiled and run with the following commands:

gcc -fopenmp pi.c -o pi
./pi

You should see output that looks something like:

running on 1 threads: PI = 3.141592653589971 computed in 12.84 seconds
running on 2 threads: PI = 3.141593993623682 computed in 6.928 seconds
running on 3 threads: PI = 3.141594228301372 computed in 7.741 seconds
running on 4 threads: PI = 3.141595112697573 computed in 8.376 seconds

As you can see from the results, running with different numbers of threads produces different values of PI, which indicates a concurrency bug.


Run Coderrect

The easiest way to run the tool is by passing the build command to coderrect:

coderrect -t gcc -fopenmp pi.c

Remember, the command to build pi.c was gcc -fopenmp pi.c.

The -t switch generates a quick summary report in the terminal.

Calling coderrect in this way ensures all the required compilation flags can be passed on the command line. For a project using a build system such as make, the coderrect tool can be called with the same build command used to build the project. For an example, check out detecting races in a Makefile-based project.


Interpret the Results

The coderrect tool reports a quick summary of the most interesting races directly in the terminal for quick viewing. The tool also generates a more comprehensive report that can be viewed in a browser.


Terminal Report

The terminal report for pi.c should look something like the following:

==== Found a race between: 
line 22, column 13 in test.c AND line 22, column 17 in test.c
Shared variable:
 at line 16 of test.c
 16|        double sum = 0.0;
Thread 1:
 20|        for (int i=0; i < N; i++) {
 21|            double x = (i+0.5)*delta;
>22|            sum += 4.0 / (1.0+x*x);
 23|        }
 24|
>>>Stacktrace:
Thread 2:
 20|        for (int i=0; i < N; i++) {
 21|            double x = (i+0.5)*delta;
>22|            sum += 4.0 / (1.0+x*x);
 23|        }
 24|
>>>Stacktrace:

                 1 OpenMP races

Each reported race starts with a summary of where the race was found.

==== Found a race between: 
line 22, column 13 in test.c AND line 22, column 17 in test.c

Next the report shows the variable name and location on which the race occurs.

Shared variable:
 at line 16 of test.c
 16|        double sum = 0.0;

This shows that the race occurs on the variable sum declared on line 16.

To save you from looking up the race location in the code, the report also shows a preview of the file at that location.

Thread 1:
 20|        for (int i=0; i < N; i++) {
 21|            double x = (i+0.5)*delta;
>22|            sum += 4.0 / (1.0+x*x);
 23|        }

The code snippet shows that this racing access is a write to sum as part of an OpenMP parallel for loop.

Taking a closer look at the source code, we can see that the root cause is the commented-out reduction clause.

#pragma omp parallel for //reduction(+:sum)

Un-commenting reduction(+:sum) removes the data race on sum and allows the program to calculate pi correctly.
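
As a minimal sketch of the fix described above, the loop can be moved into a helper with the reduction clause enabled. The function name compute_pi and its parameter n are illustrative and not part of pi.c; the sketch assumes it is built with gcc -fopenmp as before.

```c
/* Midpoint-rule estimate of pi using n intervals.
   compute_pi is a hypothetical helper, not part of the original pi.c. */
double compute_pi(int n) {
    double delta = 1.0 / (double)n;
    double sum = 0.0;

    /* reduction(+:sum) gives each thread a private copy of sum and
       adds the private copies together when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) * delta;
        sum += 4.0 / (1.0 + x * x);
    }

    return delta * sum;
}
```

Because no thread writes to a shared sum inside the loop, the result should no longer depend on the number of threads beyond ordinary floating-point rounding.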


HTML Report

The terminal is great to get a quick idea about what races are reported, but the full report can be viewed in a browser.

We can also save the race report to a directory specified via the command option -o <dir>.

coderrect -o report gcc -fopenmp pi.c

This creates a directory named report and a file named index.html within that directory. To view the full report, open the index.html file in a browser.


Detect pthread races #

This tutorial shows how to use Coderrect to detect races in a single-file, multi-threaded C++ program written with the POSIX threads (Pthreads) library.

coderrect g++ pthread-race.cc -lpthread

Prerequisites

This tutorial assumes you have successfully installed the Coderrect software following the quick start.

Detecting races in pthread-race.cc

Our sample code pthread-race.cc starts two child threads from main. Each child thread increments a shared variable x by one, but there is a race.

Copy the source for pthread-race.cc below to your system.

#include <pthread.h>

#include <cstdlib>
#include <iostream>

using namespace std;
int x = 0;

void *PrintHello(void *threadid) {
    long tid;
    tid = (long)threadid;
    x++;
    cout << "Hello World! Thread ID, " << tid << endl;
    pthread_exit(NULL);
}

pthread_t load_data_in_thread(long id) {
    pthread_t thread;
    void *arg = (void *)id;
    int rc = pthread_create(&thread, NULL, PrintHello, arg);
    if (rc) {
        cout << "Error:unable to create thread," << rc << endl;
        exit(-1);
    }
    return thread;
}

int main() {
    pthread_t thread1, thread2;

    cout << "main() : creating thread 1 " << endl;
    thread1 = load_data_in_thread(1);
    cout << "main() : creating thread 2 " << endl;
    thread2 = load_data_in_thread(2);
    pthread_join(thread1, 0);
    // pthread_join(thread2,0);
    cout << "Final value of x: " << x << endl;
}

Check that pthread-race.cc can be compiled and run with the following commands:

g++ pthread-race.cc -lpthread
./a.out

If you run ./a.out multiple times, you should see output that looks something like:

main() : creating thread 1 
main() : creating thread 2 
Hello World! Thread ID, 1
Final value of x: 1
Hello World! Thread ID, 2

Run Coderrect

The easiest way to get started on a single file is to run

coderrect -t g++ pthread-race.cc -lpthread

The -t switch generates a quick summary report in the terminal. For more information on Coderrect configurations, please check out the reference page.

This will automatically detect the races and report the following in the terminal:

==== Found a race between: 
line 12, column 5 in pthread-race.cc  AND  line 38, column 41 in pthread-race.cc
Shared variable: 
 at line 7 of pthread-race.cc
 7|int x =0;
Thread 1: 
pthread-race.cc@12:5
Code snippet: 
 10| long tid;
 11|    tid = (long)threadid;
>12|    x++;
 13|    cout << "Hello World! Thread ID, " << tid << endl;
 14|    pthread_exit(NULL);
>>>Stack Trace:
>>>PrintHello(void*) [pthread-race.cc:20]
Thread 2: 
pthread-race.cc@38:41
Code snippet: 
 36|      //pthread_join(thread2,0);
 37|
>38|        cout << "Final value of x: " << x << endl;
 39|}
>>>Stack Trace:
>>>main()
>>>  std::basic_ostream<char, std::char_traits<char> >& std::operator<<<std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) [pthread-race.cc:38]
>>>    std::char_traits<char>::length(char const*) [/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/ostream:562]
detected 1 races in total.

2020/02/25 01:18:23 Generating the race report ...
To check the race report, please open './index.html' in your browser

Interpret the Results

Each reported race starts with a summary of where the race was found.

==== Found a race between: 
line 12, column 5 in pthread-race.cc  AND  line 38, column 41 in pthread-race.cc

Next the report shows the name and location of the variable on which the race occurs.

Shared variable: 
 at line 7 of pthread-race.cc
 7|int x =0;

This shows that the race occurs on the variable x declared on line 7.

Next the tool reports information about the two unsynchronized accesses to x.
For each of the two accesses a location, code snippet, and stack trace is shown.

The location shows the file, line, and column of the access.

pthread-race.cc@12:5
pthread-race.cc@38:41

So the two accesses occur in pthread-race.cc at line 12, column 5 and at line 38, column 41, respectively.

To save you from looking up these locations in the code, the report also shows a preview of the file at each location.

Code snippet: 
 10| long tid;
 11|    tid = (long)threadid;
>12|    x++;
 13|    cout << "Hello World! Thread ID, " << tid << endl;
 14|    pthread_exit(NULL);

The code snippet shows that this access is an unsynchronized write to x in each child thread.

The last piece of information shown for each access is the stack trace.

>>>Stack Trace:
>>>PrintHello(void*) [pthread-race.cc:20]

>>>main()
>>> std::basic_ostream<char, std::char_traits<char> >& std::operator<<<std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) [pthread-race.cc:38]
>>> std::char_traits<char>::length(char const*) [/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/ostream:562]

The stack trace shows the call stack under which the racing access occurred.
Each line in the call stack shows the name of the function, and the location the function was called from.

In the example above the stack trace shows PrintHello being called from line 20 in pthread-race.cc.
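
The report pairs the x++ in the child thread with the read of x in main, which runs without waiting for the second thread. One possible way to remove both the child/main race and the child/child race on x is sketched below, in C rather than the tutorial's C++. The names x_lock, increment, and run_two_threads are hypothetical and do not appear in pthread-race.cc; this is an illustration under those assumptions, not the tutorial's official fix.

```c
#include <pthread.h>
#include <stddef.h>

static int x = 0;
/* Hypothetical mutex guarding x; not present in pthread-race.cc. */
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

/* Child thread body: the increment is protected by the mutex so the
   two children cannot update x at the same time. */
static void *increment(void *arg) {
    (void)arg;
    pthread_mutex_lock(&x_lock);
    x++;
    pthread_mutex_unlock(&x_lock);
    return NULL;
}

/* Starts two children and joins BOTH before reading x.
   Returns the final value of x, or -1 if a thread could not be created. */
int run_two_threads(void) {
    pthread_t thread1, thread2;

    x = 0;
    if (pthread_create(&thread1, NULL, increment, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&thread2, NULL, increment, NULL) != 0) {
        pthread_join(thread1, NULL);
        return -1;
    }

    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL); /* the join that is commented out in the sample */
    return x;
}
```

Joining both threads orders every increment before the final read in the parent, and the mutex orders the two increments with respect to each other.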


HTML Report

The full report can be viewed in a browser.


Detect races in GPU kernels #

This tutorial shows how Coderrect detects block-level and warp-level race hazards in GPU/CUDA kernels. Both examples are from NVIDIA’s official documentation: block_error.cu and warp_error.cu. Note that if nvidia-cuda-toolkit is not already installed on your machine, you must first install it with the first command listed below.

$ sudo apt install nvidia-cuda-toolkit
$ coderrect -t nvcc block_error.cu
==== Found a race between: 
line 9, column 5 in block_error.cu AND line 14, column 25 in block_error.cu
Shared variable: 
smem at line 3 of block_error.cu
 3|__shared__ int smem[THREADS];
Thread 1: 
 7|{
 8|    int tx = threadIdx.x;
>9|    smem[tx] = data_in[tx] + tx;
 10|
 11|    if (tx == 0) {
>>>Stack Trace:
Thread 2: 
 12|        *sum_out = 0;
 13|        for (int i = 0; i < THREADS; ++i)
>14|            *sum_out += smem[i];
 15|    }
 16|}
>>>Stack Trace:
The OpenMP region this bug occurs:
/CUDA/benchmarks/t/block_error.cu
>27|    sumKernel<<<1, THREADS>>>(data_in, sum_out);
 28|    cudaDeviceSynchronize();
 29|
 30|    cudaFree(data_in);
 31|    cudaFree(sum_out);
 32|    return 0;
Gets called from:
>>>main
detected 1 races in total.
To check the race report, please open '/CUDA/benchmarks/t/.coderrect/report/index.html' in your browser

$ coderrect -t nvcc wrap_error.cu
==== Found a race between: 
line 12, column 5 in wrap_error.cu AND line 19, column 32 in wrap_error.cu
Shared variable: 
smem_first at line 5 of wrap_error.cu
 5|__shared__ int smem_first[THREADS];
Thread 1: 
 10|{
 11|    int tx = threadIdx.x;
>12|    smem_first[tx] = data_in[tx] + tx;
 13|    //__syncwarp();
 14|    if (tx % WARP_SIZE == 0) {
>>>Stack Trace:
Thread 2: 
 17|        smem_second[wx] = 0;
 18|        for (int i = 0; i < WARP_SIZE; ++i)
>19|            smem_second[wx] += smem_first[wx * WARP_SIZE + i];
 20|    }
 21|
>>>Stack Trace:
The OpenMP region this bug occurs:
/CUDA/benchmarks/t/wrap_error.cu
>40|    sumKernel<<<1, THREADS>>>(data_in, sum_out);
 41|    cudaDeviceSynchronize();
 42|
 43|    cudaFree(data_in);
 44|    cudaFree(sum_out);
 45|    return 0;

Note that in the above code line 13 is commented out, which disables the warp-level synchronization. 

13| //__syncwarp();

If line 13 is uncommented, the race will be fixed, and the tool will report no races: 

detected 0 races in total.

Detect races in a Makefile-based project #

This tutorial shows how to use Coderrect to detect races in a Makefile-based project using pbzip2-0.9.4 as the example. pbzip2 is a parallel implementation of the bzip2 file compressor, and it contains a known race condition in version 0.9.4.

git clone https://github.com/sctbenchmarks/sctbenchmarks.git
cd sctbenchmarks/1CB-0-2/pbzip2-0.9.4/bzip2-1.0.6
make clean
make
cd ..
cd pbzip2-0.9.4
make clean
coderrect make

Prerequisites

This tutorial assumes you have successfully installed the Coderrect software following the quick start.


Background

In pbzip2, the program will spawn consumer threads that (de)compress the input file and spawn an output thread that writes data to the output file. However, the main thread only joins the output thread but does not join the consumer threads. So there is an order violation bug between the time when the main thread is destroying resources and the time when consumer threads are using the resources.

An interleaving that triggers the error looks like:

// Main thread
void main(...) {
  ...
  for (i=0; i < numCPU; i++) {
    ret = pthread_create(..., consumer,
                              fifo);
    ...
  }
  ret = pthread_create(..., fileWriter,
                            OutFilename);
  ...
  // start reading in data
  producer(..., fifo);
  ...
  // wait until exit of thread
  pthread_join(output, NULL);
  ...
  fifo->empty = 1;
  ...
  // reclaim memory
  queueDelete(fifo);
  fifo = NULL;
}

// Consumer thread: in the failing interleaving, the lock below
// runs after queueDelete(fifo) in the main thread
void *decompress_consumer(void *q) {
  ...
  for (;;) {
    pthread_mutex_lock(fifo->mut);
    ...
  }
}

Since queueDelete releases the fifo queue used by the consumer threads, a later access to fifo->mut can result in a segmentation fault.
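
The ordering problem can be illustrated with a small, self-contained C sketch in which the parent joins every consumer before destroying the queue. This is not the pbzip2 source or its actual patch: queue_t, consumer, run_consumers, and NUM_CONSUMERS are hypothetical names chosen for the example, and the consumer does only a token amount of work.

```c
#include <pthread.h>
#include <stdlib.h>

#define NUM_CONSUMERS 4 /* hypothetical worker count */

/* Hypothetical stand-in for the pbzip2 queue: a heap-allocated mutex
   plus a counter of finished work items. */
typedef struct {
    pthread_mutex_t *mut;
    int processed;
} queue_t;

static void *consumer(void *arg) {
    queue_t *fifo = (queue_t *)arg;
    pthread_mutex_lock(fifo->mut); /* safe only while fifo is still alive */
    fifo->processed++;
    pthread_mutex_unlock(fifo->mut);
    return NULL;
}

/* Returns the number of work items processed, or -1 on allocation failure. */
int run_consumers(void) {
    pthread_t workers[NUM_CONSUMERS];
    int started = 0;

    queue_t *fifo = malloc(sizeof *fifo);
    if (fifo == NULL) return -1;
    fifo->mut = malloc(sizeof *fifo->mut);
    if (fifo->mut == NULL) { free(fifo); return -1; }
    pthread_mutex_init(fifo->mut, NULL);
    fifo->processed = 0;

    for (int i = 0; i < NUM_CONSUMERS; i++) {
        if (pthread_create(&workers[started], NULL, consumer, fifo) == 0) {
            started++;
        }
    }

    /* Join every consumer BEFORE tearing the queue down, so no thread
       can touch fifo->mut after it has been destroyed and freed. */
    for (int i = 0; i < started; i++) {
        pthread_join(workers[i], NULL);
    }

    int processed = fifo->processed;
    pthread_mutex_destroy(fifo->mut);
    free(fifo->mut);
    free(fifo);
    return processed;
}
```

The key design point is that the teardown happens only after all joins have returned; skipping the join loop would reproduce the kind of order violation described above.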


Detect the race using Coderrect

Make sure the code can be compiled:

cd sctbenchmarks/1CB-0-2/pbzip2-0.9.4/pbzip2-0.9.4
make

You should see a pbzip2 executable under the same folder.


Run Coderrect

make clean 
coderrect -t -o report make

NOTE: The make clean command ensures there are no pre-built binaries, so that Coderrect can analyze every piece of source code in the project.

The coderrect -t -o report make command compiles and analyzes the program. The reported races are stored under the ./report directory, as specified by the -o option.

The -t switch generates a quick summary report in the terminal.


Interpret the Results

The coderrect tool reports a quick summary of the most interesting races directly in the terminal for quick viewing. The tool also generates a more comprehensive report that can be viewed in the browser.

Terminal Report

The terminal race report looks like the following:

==== Found a race between: 
line 1048, column 3 in pbzip2.cpp  AND  line 553, column 28 in pbzip2.cpp
Shared variable: 
 at line 991 of pbzip2.cpp
 991| q = new queue;
Thread 1: 
pbzip2.cpp@1048:3
Code snippet: 
 1046| pthread_mutex_destroy(q->mut);
 1047| delete q->mut;
>1048| q->mut = NULL;
 1049| }
 1050|
>>>Stack Trace:
>>>main
>>>  queueDelete(queue*) [pbzip2.cpp:1912]
Thread 2: 
pbzip2.cpp@553:28
Code snippet: 
 551| for (;;)
 552| {
>553| pthread_mutex_lock(fifo->mut);
 554| while (fifo->empty)
 555| {
>>>Stack Trace:
>>>pthread_create [pbzip2.cpp:1818]
>>>  consumer_decompress(void*) [pbzip2.cpp:1818]

Each reported race starts with a summary of where the race was found.

==== Found a race between: 
line 1048, column 3 in pbzip2.cpp  AND  line 553, column 28 in pbzip2.cpp

Next the report shows the name and location of the variable on which the race occurs.

Shared variable: 
 at line 991 of pbzip2.cpp
 991| q = new queue;

This shows that the race occurs on the queue object allocated at line 991.

Next the tool reports information about the two unsynchronized accesses to queue.
For each of the two accesses a location, code snippet, and stack trace is shown.

The location shows the file, line, and column of the access.

Thread 1: 
pbzip2.cpp@1048:3

So the above access occurs in pbzip2.cpp at line 1048 column 3.
The report also shows a preview of the file at that location.

Code snippet: 
 1046| pthread_mutex_destroy(q->mut);
 1047| delete q->mut;
>1048| q->mut = NULL;
 1049| }
 1050|

The code snippet shows that this access is a write to q->mut (setting it to NULL).

HTML Report

The terminal is great to get a quick idea about what races are reported, but the full report can be viewed in a browser.


Detect Fortran OpenMP Races #

This tutorial shows how to use Coderrect to detect races in a single-file, multi-threaded Fortran program written with OpenMP.


Prerequisites

  • Our sample code relies on OpenMP to achieve parallelism. You will need a compiler that supports OpenMP. We will be using gfortran, but other modern alternatives should also have OpenMP support. 

Detecting a race

We will be using a benchmark from DataRaceBench, a suite of OpenMP data race benchmarks designed to evaluate the effectiveness of data race detection tools developed by a group at Lawrence Livermore National Lab. DataRaceBench is full of great test cases (try using it to evaluate coderrect!).

We will be using the DRB001-antidep1-orig-yes.f95 case from DataRaceBench version 1.3.0.1. You can get the source code from the DataRaceBench repository on GitHub, but we have included a snippet here for convenience.

!!!~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~!!!
!!! Copyright (c) 2017-20, Lawrence Livermore National Security, LLC
!!! and DataRaceBench project contributors. See the DataRaceBench/COPYRIGHT file for details.
!!!
!!! SPDX-License-Identifier: (BSD-3-Clause)
!!!~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~!!!

!A loop with loop-carried anti-dependence.
!Data race pair: a[i+1]@25:9 vs. a[i]@25:16

program DRB001_antidep1_orig_yes
use omp_lib
    implicit none
    integer :: i, len
    integer :: a(1000)

    len = 1000

    do i = 1, len
        a(i) = i
    end do

    !$omp parallel do
    do i = 1, len-1
        a(i) = a(i+1) + 1
    end do
    !$omp end parallel do

    print 100, a(500)
    100 format ('a(500)=',i3)
end program

Start by checking that the program compiles successfully on your machine. Coderrect works by intercepting compiler calls. If the code cannot be compiled, Coderrect cannot run its analysis. The DRB001 benchmark can be built and run with the following commands:

gfortran -fopenmp DRB001-antidep1-orig-yes.f95 -o DRB001
./DRB001

You should see output that looks something like:

a(500)=502

Your results may vary, however, because there is a data race in this code. In the parallel do loop, each iteration i reads a(i+1), the element that iteration i+1 writes.

This means that one thread could be executing a(1) = a(2) + 1 at the same time another thread is running a(2) = a(3) + 1. Both threads access a(2) in parallel, one reading it and one writing it, which causes a data race.
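
One common way to remove a loop-carried anti-dependence like this is to read from a snapshot of the array, so that no iteration reads an element that another iteration writes. The sketch below is a C analogue of the Fortran loop (0-based indices), not part of DataRaceBench; the names shift_add_parallel and LEN are hypothetical.

```c
#include <string.h>

#define LEN 1000 /* same length as the Fortran array a(1000) */

/* Computes a[i] = (old value of a[i+1]) + 1 for i = 0 .. LEN-2.
   shift_add_parallel is a hypothetical helper for illustration. */
void shift_add_parallel(int a[LEN]) {
    int old[LEN];
    memcpy(old, a, sizeof old); /* snapshot taken before the parallel loop */

    /* Every iteration reads only from old and writes only its own a[i],
       so iterations no longer conflict with each other. */
    #pragma omp parallel for
    for (int i = 0; i < LEN - 1; i++) {
        a[i] = old[i + 1] + 1;
    }
}
```

This produces the same values as the sequential loop, because in sequential order each a[i+1] is still unmodified when iteration i reads it. The trade-off is the extra copy of the array.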


Run Coderrect

The easiest way to run the tool is by passing the build command to coderrect:

coderrect -t gfortran -fopenmp DRB001-antidep1-orig-yes.f95 -o DRB001

Remember, the command to build DRB001 was gfortran -fopenmp DRB001-antidep1-orig-yes.f95 -o DRB001.

The -t switch generates a quick summary report in the terminal.

Calling coderrect in this way ensures all the required compilation flags can be passed on the command line. For a project using a build system such as make, the coderrect tool can be called with the same build command used to build the project. For an example, check out detecting races in a Makefile-based project.


Interpret the Results

The coderrect tool reports a quick summary of the most interesting races directly in the terminal for quick viewing. The tool also generates a more comprehensive report that can be viewed in a browser.

Terminal Report

The terminal report for DRB001 should look something like the following:

==== Found a data race between: 
line 25, column 0 in DRB001-antidep1-orig-yes.f95 AND line 25, column 1 in DRB001-antidep1-orig-yes.f95
Shared variable:
 at line 0 of 
 0|
Thread 1:
 23|    !$omp parallel do
 24|    do i = 1, len-1
>25|        a(i) = a(i+1) + 1
 26|    end do
 27|    !$omp end parallel do
>>>Stacktrace:
Thread 2:
 23|    !$omp parallel do
 24|    do i = 1, len-1
>25|        a(i) = a(i+1) + 1
 26|    end do
 27|    !$omp end parallel do
>>>Stacktrace:

Each reported race starts with a summary of where the race was found.

==== Found a race between: 
line 25, column 0 in DRB001-antidep1-orig-yes.f95 AND line 25, column 1 in DRB001-antidep1-orig-yes.f95

Next the report shows the variable name and location on which the race occurs (though this is sometimes not present for Fortran programs).

Shared variable:
 ...

To save you from looking up the race location in the code, the report also shows a preview of the file at that location.

Thread 1:
 23|    !$omp parallel do
 24|    do i = 1, len-1
>25|        a(i) = a(i+1) + 1
 26|    end do
 27|    !$omp end parallel do

The code snippet shows that this racing access is a write to a(i) as part of an OpenMP parallel do loop.

This is the race we expected to find for this DataRaceBench case. You can try running coderrect in the same way on the other DataRaceBench Fortran benchmarks at dataracebench/micro-benchmarks-fortran.


HTML Report

The terminal is great to get a quick idea about what races are reported, but the full report can be viewed in a browser.

We can also save the race report to a directory specified via the command option -o <dir>.

coderrect -o report gfortran -fopenmp DRB001-antidep1-orig-yes.f95 -o DRB001

This creates a directory named report and a file named index.html within that directory. To view the full report, open the index.html file in a browser.
