Starter Cases #
Detect OpenMP Races #
This tutorial shows how to use Coderrect to detect races in a single-file, multi-threaded C program written with OpenMP.
Prerequisites
- This tutorial assumes you have successfully installed the Coderrect software following the quick start.
- Our sample code relies on OpenMP to achieve parallelism. You will need a compiler that supports OpenMP. We will be using gcc, but clang and other modern alternatives also have OpenMP support.
Detect a race in pi.c
We will start by detecting a race in a small single-file program, pi.c. The program is designed to compute pi = 3.1415926… We provide this example program in the Examples directory under the Coderrect installation, or you can copy the code below to your system.
//pi.c
#include <omp.h>
#include <stdio.h>

#define N 1000000000

int main () {

    double delta = 1.0/(double) N;
    int MAX_THREADS = omp_get_max_threads();
    // Compute parallel compute times for 1-MAX_THREADS
    for (int j=1; j<= MAX_THREADS; j++) {

        printf(" running on %d threads: ", j);
        omp_set_num_threads(j);
        double sum = 0.0;
        double start = omp_get_wtime();

        #pragma omp parallel for //reduction(+:sum)
        for (int i=0; i < N; i++) {
            double x = (i+0.5)*delta;
            sum += 4.0 / (1.0+x*x);
        }

        // Out of the parallel region, finalize computation
        double pi = delta * sum;
        double time_lapse = omp_get_wtime() - start;
        printf("PI = %.12g computed in %.4g seconds\n", pi, time_lapse);
    }
}
Check that pi.c can be compiled and run with the following commands:
gcc -fopenmp pi.c -o pi
./pi
You should see output that looks something like:
running on 1 threads: PI = 3.141592653589971 computed in 12.84 seconds
running on 2 threads: PI = 3.141593993623682 computed in 6.928 seconds
running on 3 threads: PI = 3.141594228301372 computed in 7.741 seconds
running on 4 threads: PI = 3.141595112697573 computed in 8.376 seconds
As the results show, running with different numbers of threads produces different values of pi, which indicates a concurrency bug.
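To see why the results can drift, recall that an update such as sum += value is really a load, an add, and a store. The sketch below is a single-threaded simulation of one unlucky schedule, not a model of what the OpenMP runtime actually does; the function names racy_total and serialized_total are hypothetical.

```c
#include <assert.h>

/* Worst-case schedule: every simulated thread loads the shared value
 * before any of them stores, so each store overwrites the previous one. */
double racy_total(int threads, double inc) {
    assert(threads > 0);
    double shared = 0.0;
    double stale = shared;              /* all threads load the same value */
    for (int t = 0; t < threads; t++) {
        double local = stale + inc;     /* each thread adds to its stale copy */
        shared = local;                 /* and stores, clobbering the others */
    }
    return shared;
}

/* Race-free schedule: each load-add-store finishes before the next starts. */
double serialized_total(int threads, double inc) {
    assert(threads > 0);
    double shared = 0.0;
    for (int t = 0; t < threads; t++) {
        shared += inc;
    }
    return shared;
}
```

With four simulated threads and an increment of 1.0, the racy schedule keeps only one of the four increments, while the serialized schedule keeps all of them. Real runs fall somewhere in between, which is why the computed value changes with the thread count.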
Run Coderrect
The easiest way to run the tool is by passing the build command to coderrect:
coderrect -t gcc -fopenmp pi.c
Remember, the command to build pi.c was gcc -fopenmp pi.c.
The -t switch prints a quick summary report in the terminal.
Calling coderrect in this way ensures all the required compilation flags can be passed on the command line. For a project that uses a build system such as make, the coderrect tool can be called with the same build command used to build the project. For an example, check out detecting races in a Makefile-based project.
Interpret the Results
The coderrect tool reports a quick summary of the most interesting races directly in the terminal for quick viewing. The tool also generates a more comprehensive report that can be viewed in a browser.
Terminal Report
The terminal report for pi.c should look something like the following:
==== Found a race between:
line 22, column 13 in test.c AND line 22, column 17 in test.c
Shared variable:
 at line 16 of test.c
 16|        double sum = 0.0;
Thread 1:
 20|        for (int i=0; i < N; i++) {
 21|            double x = (i+0.5)*delta;
>22|            sum += 4.0 / (1.0+x*x);
 23|        }
 24|
>>>Stacktrace:
Thread 2:
 20|        for (int i=0; i < N; i++) {
 21|            double x = (i+0.5)*delta;
>22|            sum += 4.0 / (1.0+x*x);
 23|        }
 24|
>>>Stacktrace:

1 OpenMP races
Each reported race starts with a summary of where the race was found.
==== Found a race between: line 22, column 13 in test.c AND line 22, column 17 in test.c
Next the report shows the variable name and location on which the race occurs.
Shared variable:
 at line 16 of test.c
 16|        double sum = 0.0;
This shows that the race occurs on the variable sum declared on line 16.
Going and finding the race location in the code may be a little tedious so the report also shows a preview of the file at that location.
Thread 1:
 20|        for (int i=0; i < N; i++) {
 21|            double x = (i+0.5)*delta;
>22|            sum += 4.0 / (1.0+x*x);
 23|        }
The code snippet shows that this racing access is a write to sum as part of an OpenMP parallel for loop.
Taking a closer look at the source code we can see the root cause is the commented out “reduction”.
#pragma omp parallel for //reduction(+:sum)
Un-commenting reduction(+:sum) removes the data race on sum and allows the program to calculate pi correctly.
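As a minimal sketch of the corrected loop, the function below wraps the same midpoint-rule computation with the reduction clause enabled. The function name compute_pi and its parameter n are hypothetical (pi.c uses the macro N and does everything inside main); compile with gcc -fopenmp as before.

```c
#include <assert.h>

/* Midpoint-rule approximation of pi, the integral of 4/(1+x^2) over [0,1].
 * reduction(+:sum) gives each thread a private partial sum and combines
 * the partial sums when the loop ends, so no thread writes a shared sum. */
double compute_pi(int n) {
    assert(n > 0);
    double delta = 1.0 / (double)n;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) * delta;
        sum += 4.0 / (1.0 + x * x);
    }
    return delta * sum;
}
```

Because floating-point addition is not associative, the last few digits may still differ slightly between thread counts, but the result no longer depends on lost updates.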
HTML Report
The terminal is great to get a quick idea about what races are reported, but the full report can be viewed in a browser.
We can also save the race report to a directory specified with the -o <dir> option.
coderrect -o report gcc -fopenmp pi.c
This creates a directory named report and a file named index.html within that directory. To view the full report, open the index.html file in a browser.
Detect pthread races #
This tutorial shows how to use Coderrect to detect races in a single-file, multi-threaded C++ program written with the POSIX threads (Pthreads) library.
coderrect g++ pthread-race.cc -lpthread
Prerequisites
This tutorial assumes you have successfully installed the Coderrect software following the quick start.
Detecting races in pthread-race.cc
Our sample code, pthread-race.cc, starts two child threads in the main function. Each child thread increments a shared variable “x” by one, but there is a race.
Copy the source for pthread-race.cc below to your system.
#include <pthread.h>
#include <cstdlib>
#include <iostream>

using namespace std;

int x = 0;

void *PrintHello(void *threadid) {
    long tid;
    tid = (long)threadid;
    x++;
    cout << "Hello World! Thread ID, " << tid << endl;
    pthread_exit(NULL);
}

pthread_t load_data_in_thread(long id) {
    pthread_t thread;
    void *arg = (void *)id;
    int rc = pthread_create(&thread, NULL, PrintHello, arg);
    if (rc) {
        cout << "Error:unable to create thread," << rc << endl;
        exit(-1);
    }
    return thread;
}

int main() {
    pthread_t thread1, thread2;
    cout << "main() : creating thread 1 " << endl;
    thread1 = load_data_in_thread(1);
    cout << "main() : creating thread 2 " << endl;
    thread2 = load_data_in_thread(2);

    pthread_join(thread1, 0);
    // pthread_join(thread2,0);

    cout << "Final value of x: " << x << endl;
}
Check that pthread-race.cc can be compiled and run with the following commands:
g++ pthread-race.cc -lpthread
./a.out
If you run ./a.out multiple times, you should see output that looks something like:
main() : creating thread 1
main() : creating thread 2
Hello World! Thread ID, 1
Final value of x: 1
Hello World! Thread ID, 2
Run Coderrect
The easiest way to get started on a single file is to run
coderrect -t g++ pthread-race.cc -lpthread
The -t switch prints a quick summary report in the terminal. For more information on Coderrect configurations, please check out the reference page.
This will automatically detect the races and report the following in the terminal:
==== Found a race between:
line 12, column 5 in pthread-race.cc AND line 38, column 41 in pthread-race.cc
Shared variable:
 at line 7 of pthread-race.cc
 7|int x =0;
Thread 1:
pthread-race.cc@12:5
Code snippet:
 10|    long tid;
 11|    tid = (long)threadid;
>12|    x++;
 13|    cout << "Hello World! Thread ID, " << tid << endl;
 14|    pthread_exit(NULL);
>>>Stack Trace:
>>>PrintHello(void*) [pthread-race.cc:20]
Thread 2:
pthread-race.cc@38:41
Code snippet:
 36|    //pthread_join(thread2,0);
 37|
>38|    cout << "Final value of x: " << x << endl;
 39|}
>>>Stack Trace:
>>>main()
>>>  std::basic_ostream<char, std::char_traits<char> >& std::operator<<<std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) [pthread-race.cc:38]
>>>    std::char_traits<char>::length(char const*) [/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/ostream:562]
detected 1 races in total.

2020/02/25 01:18:23 Generating the race report ...
To check the race report, please open './index.html' in your browser
Interpret the Results
Each reported race starts with a summary of where the race was found.
==== Found a race between: line 12, column 5 in pthread-race.cc AND line 38, column 41 in pthread-race.cc
Next the report shows the name and location of the variable on which the race occurs.
Shared variable:
 at line 7 of pthread-race.cc
 7|int x =0;
This shows that the race occurs on the variable x declared on line 7.
Next the tool reports information about the two unsynchronized accesses to x.
For each of the two accesses a location, code snippet, and stack trace is shown.
The location shows the file, line, and column of the access.
pthread-race.cc@12:5
pthread-race.cc@38:41
So the two accesses occur in pthread-race.cc at line 12, column 5 and at line 38, column 41, respectively.
Going and finding this location in the code may be a little tedious so the report also shows a preview of the file at that location.
Code snippet:
 10|    long tid;
 11|    tid = (long)threadid;
>12|    x++;
 13|    cout << "Hello World! Thread ID, " << tid << endl;
 14|    pthread_exit(NULL);
The code snippet shows that this access is an unsynchronized write to x in each child thread.
The last piece of information shown for each access is the stack trace.
>>>Stack Trace:
>>>PrintHello(void*) [pthread-race.cc:20]

>>>main()
>>>  std::basic_ostream<char, std::char_traits<char> >& std::operator<<<std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) [pthread-race.cc:38]
>>>    std::char_traits<char>::length(char const*) [/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/ostream:562]
The stack trace shows the call stack under which the racing access occurred.
Each line in the call stack shows the name of the function, and the location the function was called from.
In the example above the stack trace shows PrintHello being called from line 20 in pthread-race.cc.
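The tutorial does not prescribe a fix, but one common way to remove this kind of race can be sketched as follows. The sketch is in C rather than C++ (the pthread calls are the same) and assumes two changes: every child thread is joined before the parent reads x, and the increment is guarded by a mutex so the children do not race with each other. The names run_children, increment, x_lock, and MAX_CHILDREN are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_CHILDREN 64

static int x = 0;
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each child increments x while holding the mutex. */
static void *increment(void *arg) {
    (void)arg;
    pthread_mutex_lock(&x_lock);
    x++;
    pthread_mutex_unlock(&x_lock);
    return NULL;
}

/* Start nthreads children, join every one that started, then read x.
 * Returns the final value of x, or -1 if a thread could not be created. */
int run_children(int nthreads) {
    assert(nthreads > 0 && nthreads <= MAX_CHILDREN);
    pthread_t threads[MAX_CHILDREN];
    int started = 0;

    x = 0;
    while (started < nthreads &&
           pthread_create(&threads[started], NULL, increment, NULL) == 0) {
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);   /* no child is left running */
    }
    return started == nthreads ? x : -1;  /* safe: all writers have finished */
}
```

Calling run_children(2) from main mirrors the two-thread example. Joining alone would be enough to silence the reported pair (the write at line 12 against the read at line 38); the mutex is there for the child-versus-child increments.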
HTML Report
The full report can be viewed in a browser.
Detect races in GPU kernels #
This tutorial shows how Coderrect detects block-level and warp-level race hazards in GPU/CUDA kernels. Both examples are from NVIDIA’s official documentation: block_error.cu and warp_error.cu. If nvidia-cuda-toolkit is not already installed on your machine, install it first with the apt command listed below.
$ sudo apt install nvidia-cuda-toolkit
$ coderrect -t nvcc block_error.cu
==== Found a race between:
line 9, column 5 in block_error.cu AND line 14, column 25 in block_error.cu
Shared variable:
smem at line 3 of block_error.cu
 3|__shared__ int smem[THREADS];
Thread 1:
 7|{
 8|    int tx = threadIdx.x;
>9|    smem[tx] = data_in[tx] + tx;
 10|
 11|    if (tx == 0) {
>>>Stack Trace:
Thread 2:
 12|        *sum_out = 0;
 13|        for (int i = 0; i < THREADS; ++i)
>14|            *sum_out += smem[i];
 15|    }
 16|}
>>>Stack Trace:
The OpenMP region this bug occurs:
/CUDA/benchmarks/t/block_error.cu
>27|    sumKernel<<<1, THREADS>>>(data_in, sum_out);
 28|    cudaDeviceSynchronize();
 29|
 30|    cudaFree(data_in);
 31|    cudaFree(sum_out);
 32|    return 0;
Gets called from:
>>>main

detected 1 races in total.

To check the race report, please open '/CUDA/benchmarks/t/.coderrect/report/index.html' in your browser
$ coderrect -t nvcc warp_error.cu
==== Found a race between:
line 12, column 5 in wrap_error.cu AND line 19, column 32 in wrap_error.cu
Shared variable:
smem_first at line 5 of warp_error.cu
 5|__shared__ int smem_first[THREADS];
Thread 1:
 10|{
 11|    int tx = threadIdx.x;
>12|    smem_first[tx] = data_in[tx] + tx;
 13|    //__syncwarp();
 14|    if (tx % WARP_SIZE == 0) {
>>>Stack Trace:
Thread 2:
 17|        smem_second[wx] = 0;
 18|        for (int i = 0; i < WARP_SIZE; ++i)
>19|            smem_second[wx] += smem_first[wx * WARP_SIZE + i];
 20|    }
 21|
>>>Stack Trace:
The OpenMP region this bug occurs:
/CUDA/benchmarks/t/warp_error.cu
>40|    sumKernel<<<1, THREADS>>>(data_in, sum_out);
 41|    cudaDeviceSynchronize();
 42|
 43|    cudaFree(data_in);
 44|    cudaFree(sum_out);
 45|    return 0;
Note that in the above code line 13 is commented out, which disables the warp-level synchronization.
13| //__syncwarp();
If line 13 is uncommented, the race will be fixed, and the tool will report no races:
detected 0 races in total.
Detect races in a Makefile-based project #
This tutorial shows how to use Coderrect to detect races in a Makefile-based project using pbzip2-0.9.4 as the example. pbzip2 is a parallel implementation of the bzip2 file compressor, and it contains a known race condition in version 0.9.4.
git clone https://github.com/sctbenchmarks/sctbenchmarks.git
cd sctbenchmarks/1CB-0-2/pbzip2-0.9.4/bzip2-1.0.6
make clean
make
cd ..
cd pbzip2-0.9.4
make clean
coderrect make
Prerequisites
This tutorial assumes you have successfully installed the Coderrect software following the quick start.
Background
In pbzip2, the program will spawn consumer threads that (de)compress the input file and spawn an output thread that writes data to the output file. However, the main thread only joins the output thread but does not join the consumer threads. So there is an order violation bug between the time when the main thread is destroying resources and the time when consumer threads are using the resources.
An interleaving that triggers the error looks like:
void main(...) {
    ...
    for (i=0; i < numCPU; i++) {
        ret = pthread_create(..., consumer, fifo);
        ...
    }
    ret = pthread_create(..., fileWriter, OutFilename);
    ...
    // start reading in data
    producer(..., fifo);
    ...
    // wait until exit of thread
    pthread_join(output, NULL);
    ...
    fifo->empty = 1;
    ...
    // reclaim memory
    queueDelete(fifo);
    fifo = NULL;
}

void *decompress_consumer(void *q) {
    ...
    for (;;) {
        pthread_mutex_lock(fifo->mut);
        ...
    }
}
Since queueDelete will release the fifo queue used by consumer threads, the access on fifo->mut will result in a segmentation fault.
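The ordering that avoids this kind of use-after-free can be sketched in C with pthreads: join every consumer before destroying the queue they share. This is a minimal illustration, not pbzip2's code or its actual patch. The names work_queue, consumer, run_pipeline, and NUM_CONSUMERS are hypothetical, and the "work" is just a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_CONSUMERS 4

typedef struct {
    pthread_mutex_t mut;
    int remaining;   /* work items still to be consumed */
    int processed;   /* work items consumed so far */
} work_queue;

/* Consumer: take one item at a time under the lock until none remain. */
static void *consumer(void *arg) {
    work_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->mut);
        if (q->remaining == 0) {
            pthread_mutex_unlock(&q->mut);
            break;
        }
        q->remaining--;
        q->processed++;
        pthread_mutex_unlock(&q->mut);
    }
    return NULL;
}

/* Returns the number of items processed, or -1 on allocation failure. */
int run_pipeline(int items) {
    assert(items >= 0);
    work_queue *q = malloc(sizeof *q);
    if (q == NULL) return -1;
    pthread_mutex_init(&q->mut, NULL);
    q->remaining = items;
    q->processed = 0;

    pthread_t consumers[NUM_CONSUMERS];
    int started = 0;
    while (started < NUM_CONSUMERS &&
           pthread_create(&consumers[started], NULL, consumer, q) == 0) {
        started++;
    }

    /* Join every consumer BEFORE tearing the queue down. */
    for (int i = 0; i < started; i++) {
        pthread_join(consumers[i], NULL);
    }

    int processed = q->processed;   /* safe: no consumer is still running */
    pthread_mutex_destroy(&q->mut);
    free(q);
    return processed;
}
```

If the join loop were removed, the mutex could be destroyed and the queue freed while a consumer is still inside pthread_mutex_lock, which is the failure described above.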
Detect the race using Coderrect
Make sure the code can be compiled
cd sctbenchmarks/1CB-0-2/pbzip2-0.9.4/pbzip2-0.9.4
make
You should see a pbzip2 executable under the same folder.
Run Coderrect
make clean
coderrect -t -o report make
NOTE: The make clean command ensures there are no pre-built binaries, so that Coderrect is able to analyze every source file in the project.
The coderrect -t -o report make command compiles and analyzes the program. The reported races are stored under the ./report directory, as specified by the -o option.
The -t switch prints a quick summary report in the terminal.
Interpret the Results
The coderrect tool reports a quick summary of the most interesting races directly in the terminal for quick viewing. The tool also generates a more comprehensive report that can be viewed in the browser.
Terminal Report
The terminal race report looks like the following:
==== Found a race between:
line 1048, column 3 in pbzip2.cpp AND line 553, column 28 in pbzip2.cpp
Shared variable:
 at line 991 of pbzip2.cpp
 991|  q = new queue;
Thread 1:
pbzip2.cpp@1048:3
Code snippet:
 1046|    pthread_mutex_destroy(q->mut);
 1047|    delete q->mut;
>1048|    q->mut = NULL;
 1049|  }
 1050|
>>>Stack Trace:
>>>main
>>>  queueDelete(queue*) [pbzip2.cpp:1912]
Thread 2:
pbzip2.cpp@553:28
Code snippet:
 551|  for (;;)
 552|  {
>553|    pthread_mutex_lock(fifo->mut);
 554|    while (fifo->empty)
 555|    {
>>>Stack Trace:
>>>pthread_create [pbzip2.cpp:1818]
>>>  consumer_decompress(void*) [pbzip2.cpp:1818]
Each reported race starts with a summary of where the race was found.
==== Found a race between: line 1048, column 3 in pbzip2.cpp AND line 553, column 28 in pbzip2.cpp
Next the report shows the name and location of the variable on which the race occurs.
Shared variable:
 at line 991 of pbzip2.cpp
 991|  q = new queue;
This shows that the race occurs on the variable queue allocated at line 991.
Next the tool reports information about the two unsynchronized accesses to queue.
For each of the two accesses a location, code snippet, and stack trace is shown.
The location shows the file, line, and column of the access.
Thread 1: pbzip2.cpp@1048:3
So the above access occurs in pbzip2.cpp at line 1048 column 3.
The report also shows a preview of the file at that location.
Code snippet:
 1046|    pthread_mutex_destroy(q->mut);
 1047|    delete q->mut;
>1048|    q->mut = NULL;
 1049|  }
 1050|
The code snippet shows that this access is a write to q->mut, which sets it to NULL.
HTML Report
The terminal is great to get a quick idea about what races are reported, but the full report can be viewed in a browser.
Detect Fortran OpenMP Races #
This tutorial shows how to use Coderrect to detect races in a single-file, multi-threaded Fortran program written with OpenMP.
Prerequisites
- This tutorial assumes you have successfully installed the Coderrect software following the quick start.
- Our sample code relies on OpenMP to achieve parallelism. You will need a compiler that supports OpenMP. We will be using gfortran, but other modern alternatives should also have OpenMP support.
Detecting a race
We will be using a benchmark from DataRaceBench, a suite of OpenMP data race benchmarks designed to evaluate the effectiveness of data race detection tools developed by a group at Lawrence Livermore National Lab. DataRaceBench is full of great test cases (try using it to evaluate coderrect!).
We will be using the DRB001-antidep1-orig-yes.f95 case from DataRaceBench version 1.3.0.1. You can get the source code from the DataRaceBench repository on github, but we have included a snippet here for convenience.
!!!~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~!!!
!!! Copyright (c) 2017-20, Lawrence Livermore National Security, LLC
!!! and DataRaceBench project contributors. See the DataRaceBench/COPYRIGHT file for details.
!!!
!!! SPDX-License-Identifier: (BSD-3-Clause)
!!!~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~!!!

!A loop with loop-carried anti-dependence.
!Data race pair: a[i+1]@25:9 vs. a[i]@25:16

program DRB001_antidep1_orig_yes
use omp_lib
    implicit none

    integer :: i, len
    integer :: a(1000)

    len = 1000
    do i = 1, len
        a(i) = i
    end do

    !$omp parallel do
    do i = 1, len-1
        a(i) = a(i+1) + 1
    end do
    !$omp end parallel do

    print 100, a(500)
    100 format ('a(500)=',i3)
end program
Start by checking that the program compiles successfully on your machine. Coderrect works by intercepting compiler calls. If the code cannot be compiled, Coderrect cannot run its analysis. The DRB001 benchmark can be built and run with the following commands:
gfortran -fopenmp DRB001-antidep1-orig-yes.f95 -o DRB001
./DRB001
You should see output that looks something like:
a(500)=502
Your results may vary, however, because there is a data race in this code. In the parallel do loop, iteration i reads a(i+1), the element that iteration i+1 writes.
This means that one thread could be executing a(1) = a(2) + 1 at the same time another thread is running a(2) = a(3) + 1. Both threads are accessing a(2) in parallel, causing a data race.
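One common way to break a loop-carried anti-dependence like this is to read from an unmodified snapshot of the array and write to the original. The sketch below is a C transcription with 0-based indexing, not part of DataRaceBench; shift_add_one, old, and LEN are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define LEN 1000

/* Race-free version of a[i] = a[i+1] + 1: every iteration reads only
 * from the snapshot `old`, which no iteration writes, and writes only
 * its own element of `a`. */
void shift_add_one(int a[LEN]) {
    assert(a != NULL);
    int old[LEN];
    memcpy(old, a, sizeof old);   /* snapshot taken before the parallel loop */

    #pragma omp parallel for
    for (int i = 0; i < LEN - 1; i++) {
        a[i] = old[i + 1] + 1;
    }
}
```

Initializing a[k] = k + 1 (the C counterpart of a(i) = i) and calling shift_add_one(a) leaves a[499] equal to 502, the same value a serial run of the Fortran loop prints for a(500). The cost is one extra copy of the array.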
Run Coderrect
The easiest way to run the tool is by passing the build command to coderrect:
coderrect -t gfortran -fopenmp DRB001-antidep1-orig-yes.f95 -o DRB001
Remember, the command to build DRB001 was gfortran -fopenmp DRB001-antidep1-orig-yes.f95 -o DRB001.
The -t switch prints a quick summary report in the terminal.
Calling coderrect in this way ensures all the required compilation flags can be passed on the command line. For a project that uses a build system such as make, the coderrect tool can be called with the same build command used to build the project. For an example, check out detecting races in a Makefile-based project.
Interpret the Results
The coderrect tool reports a quick summary of the most interesting races directly in the terminal for quick viewing. The tool also generates a more comprehensive report that can be viewed in a browser.
Terminal Report
The terminal report for DRB001 should look something like the following:
==== Found a data race between:
line 25, column 0 in DRB001-antidep1-orig-yes.f95 AND line 25, column 1 in DRB001-antidep1-orig-yes.f95
Shared variable:
 at line 0 of
 0|
Thread 1:
 23|    !$omp parallel do
 24|    do i = 1, len-1
>25|        a(i) = a(i+1) + 1
 26|    end do
 27|    !$omp end parallel do
>>>Stacktrace:
Thread 2:
 23|    !$omp parallel do
 24|    do i = 1, len-1
>25|        a(i) = a(i+1) + 1
 26|    end do
 27|    !$omp end parallel do
>>>Stacktrace:
Each reported race starts with a summary of where the race was found.
==== Found a race between: line 25, column 0 in DRB001-antidep1-orig-yes.f95 AND line 25, column 1 in DRB001-antidep1-orig-yes.f95
Next the report shows the variable name and location on which the race occurs, though this is sometimes not present for Fortran programs.
Shared variable: ...
Going and finding the race location in the code may be a little tedious so the report also shows a preview of the file at that location.
Thread 1:
 23|    !$omp parallel do
 24|    do i = 1, len-1
>25|        a(i) = a(i+1) + 1
 26|    end do
 27|    !$omp end parallel do
The code snippet shows that this racing access is a write to a(i) as part of an OpenMP parallel do loop.
This is the race we expected to find for this DataRaceBench case. You can try running coderrect in the same way on the other DataRaceBench Fortran benchmarks at dataracebench/micro-benchmarks-fortran.
HTML Report
The terminal is great to get a quick idea about what races are reported, but the full report can be viewed in a browser.
We can also save the race report to a directory specified with the -o <dir> option.
coderrect -o report gfortran -fopenmp DRB001-antidep1-orig-yes.f95 -o DRB001
This creates a directory named report and a file named index.html within that directory. To view the full report, open the index.html file in a browser.