Advanced Cases #

Analyze static/dynamic library code #

How do you detect races in library code that has no “main” function? This tutorial showcases this feature with a simple RingBuffer library, libringbuffer.a, available on GitHub.

This tutorial assumes that you have gone through one of the three starter case tutorials and have successfully run Coderrect.

Check out the code and run Coderrect with the following commands:

$ git clone https://github.com/coderrect/tutorial.git
$ cd tutorial/ringbuffer && cmake .
$ coderrect -t make

Coderrect will detect two public APIs in this library:

1) RingBuffer::Consume       2) RingBuffer::Publish    

Please select APIs by entering their numbers or names (e.g. 1,2,RingBuffer::Consume,RingBuffer::Publish): 

In the terminal, type 1,2 and press Enter. Coderrect will report a race:

==== Found a race between: 
line 24, column 13 in ringbuffer_lib.cpp AND line 31, column 13 in ringbuffer_lib.cpp
Thread 1: 
 22|            buffer_[write_pos_] = value;
 23|            write_pos_++;
>24|            available_++;
 25|            return true;
 26|        }
>>>Stack Trace:
>>>pthread_create
>>>  coderrect_cb.1
>>>    RingBuffer::Publish(int)
Thread 2: 
 29|    }
 30|    bool RingBuffer::Consume(int *r) {
>31|        if (available_ == 0) {
 32|            return false;
 33|        }
>>>Stack Trace:
>>>pthread_create
>>>  coderrect_cb.2
>>>    RingBuffer::Consume(int*)
detected 1 races in total.

Behind the scenes, Coderrect “simulates” the library’s behavior, in which RingBuffer::Publish and RingBuffer::Consume can be executed simultaneously by multiple threads. Even though there is no “main” function, Coderrect creates two concurrent threads, one invoking each API, and detects the race.
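The reported race exists because Publish and Consume both access available_ without synchronization. One possible fix is to guard the shared state with a mutex. The sketch below is not the tutorial's library: the class name LockedRingBuffer, the std::vector storage, and the PublishAndConsume driver are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical thread-safe variant of the tutorial's RingBuffer.
// Every access to write_pos_ and available_ happens under mutex_.
class LockedRingBuffer {
public:
    explicit LockedRingBuffer(std::size_t capacity)
        : buffer_(capacity), write_pos_(0), available_(0) {
        if (capacity == 0)
            throw std::invalid_argument("capacity must be greater than 0");
    }

    bool Publish(int value) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (available_ == buffer_.size()) return false;  // buffer is full
        buffer_[write_pos_] = value;
        write_pos_ = (write_pos_ + 1) % buffer_.size();
        ++available_;
        return true;
    }

    bool Consume(int *r) {
        assert(r != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);
        if (available_ == 0) return false;               // buffer is empty
        // Oldest unread slot, computed without going negative.
        std::size_t read_pos =
            (write_pos_ + buffer_.size() - available_) % buffer_.size();
        *r = buffer_[read_pos];
        --available_;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<int> buffer_;
    std::size_t write_pos_;
    std::size_t available_;
};

// Hypothetical driver: one thread publishes 1..count, another consumes
// count values. Returns the sum of the consumed values.
long PublishAndConsume(std::size_t capacity, int count) {
    LockedRingBuffer ring(capacity);
    long sum = 0;  // written only by the consumer, read after join()
    std::thread producer([&ring, count] {
        for (int i = 1; i <= count; ++i)
            while (!ring.Publish(i)) std::this_thread::yield();
    });
    std::thread consumer([&ring, &sum, count] {
        int value = 0;
        for (int i = 0; i < count; ++i) {
            while (!ring.Consume(&value)) std::this_thread::yield();
            sum += value;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

A single coarse lock keeps the sketch short. A lock-free single-producer/single-consumer design would be another option, at the cost of more careful reasoning about memory ordering.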


Configuring Entry Points 

The library contains two public APIs, “RingBuffer::Publish” and “RingBuffer::Consume”, which can be called by multiple threads on a shared RingBuffer. To detect races in this library, users can also specify these APIs as entry points in a configuration file named “.coderrect.json”:

// .coderrect.json
{
  "entryPoints": [
    "RingBuffer::Publish",
    "RingBuffer::Consume"
  ]
}

Note: the class qualifier “RingBuffer::” can be omitted when no ambiguity exists.

Place the above configuration file in the directory where you run Coderrect, then run coderrect -t make again. You will see that the same race is reported.


More Advanced Option

Even better, configuring the entry points by hand is not required: Coderrect can discover them automatically with the option -racedetect.analyzeApi.

$ coderrect -t -racedetect.analyzeApi make

You will see that the same race as before is reported without any configuration. 

Caveat: Coderrect currently does not infer whether one API must be executed before another. When such an ordering exists, false positives may be reported. To avoid them, configure the entry points explicitly.
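To make the caveat concrete, here is a minimal sketch of a library with an ordering contract. The Registry class and its methods are hypothetical and are not part of the tutorial's code. If Init and Lookup were both analyzed as concurrent entry points, the write in Init and the read in Lookup would look like a race, even though the contract rules that interleaving out.

```cpp
#include <cassert>

// Hypothetical library whose documented contract says that Init() must
// return before any thread calls Lookup().
class Registry {
public:
    void Init(int default_value) {
        default_value_ = default_value;  // write; single-threaded by contract
        initialized_ = true;
    }

    int Lookup() const {
        assert(initialized_ && "Init() must be called first");
        return default_value_;           // read; happens only after Init()
    }

private:
    int default_value_ = 0;
    bool initialized_ = false;
};
```

In a case like this, listing only the APIs that can really run concurrently (here, Lookup) as entry points would be one way to avoid the spurious report.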


The full library code is shown below.

//"ringbuffer_lib.h"
#ifndef RINGBUFFER_LIB_H
#define RINGBUFFER_LIB_H

#include <cstddef>
#include <stdexcept>
    class RingBuffer {
    private:
        int *buffer_;
        size_t write_pos_;
        size_t available_;
        size_t capacity_;

    public:
        RingBuffer(size_t capacity);
        ~RingBuffer();

        bool Publish(int value);
        bool Consume(int *r);
    };

#endif //RINGBUFFER_LIB_H

//"ringbuffer_lib.cpp"
  
#include "ringbuffer_lib.h"

    RingBuffer::RingBuffer(size_t capacity) : capacity_(capacity) {
        if (capacity == 0)
            throw std::invalid_argument("capacity must be greater than 0");

        buffer_ = new int[capacity];
        available_ = 0;
        write_pos_ = 0;
    }
    RingBuffer::~RingBuffer() {
        if (buffer_ != nullptr)
            delete[] buffer_;
    }
    bool RingBuffer::Publish(int value) {
        if (available_ < capacity_) {
            if (write_pos_ >= capacity_) {
                write_pos_ = 0;
            }
            buffer_[write_pos_] = value;
            write_pos_++;
            available_++;
            return true;
        }

        return false;
    }
    bool RingBuffer::Consume(int *r) {
        if (available_ == 0) {
            return false;
        }
        int next_slot = write_pos_ - available_;
        if (next_slot < 0) {
            next_slot += capacity_;
        }
        *r = buffer_[next_slot];
        available_--;
        return true;
    }

Here is the file “CMakeLists.txt”:

cmake_minimum_required(VERSION 3.13)
project(ringbuffer)
set(CMAKE_CXX_STANDARD 11)
find_package(Threads)
add_library(ringbuffer STATIC
            ringbuffer_lib.cpp
            ringbuffer_lib.h)

Coderrect Fast and Exhaust Modes #

At different phases of software development, developers face different time and resource constraints. While editing code in an IDE, users may want to run Coderrect repeatedly and see results as fast as possible. While testing code before a release, users may want to run Coderrect overnight on machines with large memory and many cores to exhaustively check for all possible errors.


Pay-as-you-go detection

To fit different usage scenarios, Coderrect offers pay-as-you-go detection: the more time you can afford to run Coderrect, the better the results you will likely get, i.e., more code coverage and higher precision.

Coderrect has three modes: default, fast, and exhaust. These modes can be configured by the -mode option:

$ coderrect -mode=[default|fast|exhaust] 

Default mode aims to fit most uses. It strikes a balance between performance, code coverage, and precision. The default mode is fast and scalable: it can analyze over a million lines of C/C++ code, such as the Linux kernel, in less than ten minutes.

Fast mode is optimized for speed. For large code bases it can be many times faster than the default mode, although it may lose some analysis precision. During development, users may prefer to run Coderrect with “-mode=fast” to reduce the waiting time.

$ coderrect -mode=fast make

Exhaust mode is optimized for coverage and will exhaustively check all possible races under all code paths, all files and modules, as well as all call chains and dependencies. 

At nightly build time, when there is enough time to run (e.g., eight hours), you may prefer to run Coderrect with “-mode=exhaust” to find all possible race conditions as precisely as possible:

$ coderrect -mode=exhaust make

Sample Performance Results on Redis

The fast mode finishes analyzing the Redis server in less than two minutes and detects 12 races.

The exhaust mode finishes analyzing the Redis server in around 75 minutes and detects 22 races.


Blacklist and rank race conditions through configuration #

How do you configure Coderrect to intentionally ignore certain race conditions or to rank them by priority? This tutorial showcases this feature on an open-source project: memcached.

This tutorial assumes that you have gone through one of the three starter case tutorials and have successfully run Coderrect.

Run the following commands to clone the Git repository of memcached, install the libevent dependency, configure the build, and then run Coderrect:

$ git clone https://github.com/memcached/memcached.git
$ cd memcached
$ apt-get install libevent-dev autotools-dev automake
$ ./autogen.sh
$ ./configure
$ coderrect -t make

Detecting races

In a few seconds, you should see a list of executables built by memcached:

The project creates multiple executables. Please select one from the list below
to detect races. 
In the future, you can specify the executable using the option "-e" if you know 
which ones you want to analyze. 
    coderrect -e your_executable_name1,your_executable_name2 your_build_command_line
 1) timedrun
 2) sizes
 3) memcached-debug
 4) memcached
 5) testapp

Select 3 to detect races in memcached-debug. Coderrect will report:

line 128, column 9 in assoc.c AND line 107, column 28 in assoc.c
Shared variable: 
hashpower at line 35 of assoc.c
 35|unsigned int hashpower = HASHPOWER_DEFAULT;
Thread 1: 
 126|        if (settings.verbose > 1)
 127|            fprintf(stderr, "Hash table expansion starting\n");
>128|        hashpower++;
 129|        expanding = true;
 130|        expand_bucket = 0;
>>>Stack Trace:
>>>pthread_create [assoc.c:277]
>>>  assoc_maintenance_thread [assoc.c:277]
>>>    assoc_expand [assoc.c:256]
Thread 2: 
 105|
 106|    if (expanding &&
>107|        (oldbucket = (hv & hashmask(hashpower - 1))) >= expand_bucket)
 108|    {
 109|        pos = &old_hashtable[oldbucket];
>>>Stack Trace:
>>>pthread_create [crawler.c:505]
>>>  item_crawler_thread [crawler.c:505]
>>>    crawler_expired_eval [crawler.c:419]
>>>      do_item_unlink_nolock [crawler.c:219]
>>>        assoc_delete [items.c:531]
>>>          _hashitem_before [assoc.c:172]
...

...
...
...
detected 25 races in total.

Coderrect detected 23 shared data races, 2 mismatched API issues, and 4 TOCTOU issues. However, the reported races may include false positives.


Blacklisting races

To ignore races detected in certain parts of the code that you don’t care about, you can configure Coderrect in three ways:

1) ignore the function entirely through option “ignoreRacesInFunctions”,

2) ignore certain locations by file name and line number through option “ignoreRacesAtLocations”, and

3) ignore certain variables by their name through option “ignoreRaceVariables”.

All you need to do is write a config file named .coderrect.json

// .coderrect.json
{
  "ignoreRacesInFunctions": [
        "assoc_expand",
        "logger_create"
    ],
  "ignoreRacesAtLocations": [
        "items.c:1277",
        "extstore.c:493"
    ],
  "ignoreRaceVariables": [
        "crawlers",
        "wrap"
    ]
}

and put it in the directory where you run Coderrect (in this case, the root directory of memcached).

The config file above tells Coderrect to skip all functions whose names match “assoc_expand” or “logger_create”, to ignore potential races at line 1277 of “items.c” and line 493 of “extstore.c”, and to ignore races on the global variables “crawlers” and “wrap”.

Now rerun Coderrect:

$ make clean && coderrect -t make

This time, only 19 shared data races, 2 mismatched API issues, and 3 TOCTOU issues are reported, and all of them are likely real races.


Configure race priority

If you think certain races are more or less critical than others, you can configure their priority through the options “highPriorityRaces”, “lowPriorityRaces”, “highPriorityFiles”, and “lowPriorityFiles”:

"highPriorityRaces": [
        "heads",
        "tails"
    ],
"lowPriorityRaces": [
        "stats*",
        "settings*"
    ]

The config above tells Coderrect to rank races whose variable names match “heads” and “tails” higher in the report, and those that match “stats*” and “settings*” lower in the report.

"lowPriorityFiles": [
        "*debug*"
    ],
"highPriorityFiles": [
        "*mem*"
    ]

The config above tells Coderrect to rank races whose file names match “*mem*” higher in the report, and those that match “*debug*” lower in the report.
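The exact pattern syntax is not spelled out here. As a rough mental model only, the toy matcher below assumes that “*” matches any (possibly empty) run of characters and that every other character matches itself. WildcardMatch and WildcardExamples are hypothetical helpers, not Coderrect code, and the variable and file names in the examples are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy glob matcher: '*' matches any run of characters (including none);
// all other characters must match literally. This is an assumed model of
// the pattern semantics, not Coderrect's implementation.
bool WildcardMatch(const std::string &pattern, const std::string &name) {
    std::size_t p = 0, n = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    std::size_t resume = 0;                // where that '*' started matching
    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;                    // remember the star, match empty
            resume = n;
        } else if (p < pattern.size() && pattern[p] == name[n]) {
            ++p;
            ++n;
        } else if (star != std::string::npos) {
            p = star + 1;                  // backtrack: let the star absorb
            n = ++resume;                  // one more character
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars
    return p == pattern.size();
}

// Illustrative checks against the patterns used in the configs above.
void WildcardExamples() {
    assert(WildcardMatch("stats*", "stats_state"));
    assert(WildcardMatch("*mem*", "memcached.c"));
    assert(!WildcardMatch("*debug*", "assoc.c"));
    assert(WildcardMatch("heads", "heads"));
    assert(!WildcardMatch("heads", "headsx"));
}
```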


Can They Run in Parallel? A Must-Not-Run-In-Parallel Analysis #

This tutorial assumes that you have gone through one of the three starter case tutorials and have successfully run Coderrect.

While locks and semaphores are frequently used to prevent data races in critical sections, developers also use standard synchronization primitives such as condition variables and barriers to prevent threads from running in parallel on a pair of functions. Coderrect performs a fairly precise “must-not-run-in-parallel” analysis that takes into consideration the most common synchronization operations, such as pthread fork, join, wait, signal, broadcast, and barrier, as well as mutually exclusive branch conditions. More technical details can be found in a research paper.
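As a small illustration of ordering through join, the sketch below runs two functions in separate threads that both touch a shared variable without a lock. Because the first thread is joined before the second is created, the two functions can never overlap. The function and variable names are hypothetical, and std::thread is used here for brevity; the text above names the pthread APIs, so treating std::thread the same way is an assumption.

```cpp
#include <cassert>
#include <thread>

// Shared state accessed by both phases without any lock.
static int g_table_size = 0;

void BuildTable() { g_table_size = 128; }   // runs in the first thread
int ReadTable() { return g_table_size; }    // runs in the second thread

// The join() between the two threads orders BuildTable strictly before
// ReadTable, so the unlocked accesses to g_table_size cannot race.
int RunPhases() {
    std::thread builder(BuildTable);
    builder.join();                         // BuildTable has finished here
    int observed = 0;
    std::thread reader([&observed] { observed = ReadTable(); });
    reader.join();
    assert(observed == 128);
    return observed;
}
```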

If the user’s code relies on customized synchronization to prevent threads from running in parallel, such as the once-only execution in the Open vSwitch (OVS) project, Coderrect may not recognize it and hence may report false warnings.
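A hand-rolled once-only guard might look like the sketch below. It is a simplified illustration and is not the Open vSwitch implementation: InitModuleOnce, g_init_claimed, g_init_runs, and CallFromManyThreads are hypothetical names. Only the thread that wins the compare-and-swap executes the guarded body, so the body cannot race with itself, yet a tool that does not model this idiom could still see two threads reaching the same write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

static std::atomic<bool> g_init_claimed{false};
static int g_init_runs = 0;  // written only by the single winning thread

// Hypothetical custom "run once" guard built on an atomic flag.
void InitModuleOnce() {
    bool expected = false;
    if (g_init_claimed.compare_exchange_strong(expected, true)) {
        ++g_init_runs;       // executes in exactly one thread
    }
}

// Calls InitModuleOnce from several threads and reports how many times
// the guarded body actually ran.
int CallFromManyThreads(int thread_count) {
    assert(thread_count > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back(InitModuleOnce);
    for (auto &t : threads) t.join();
    return g_init_runs;      // safe to read: all threads have been joined
}
```

Note that this simplified guard does not make losing threads wait for the winner to finish; a production version would need that as well.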

To address the issue, Coderrect supports user-specified not-run-in-parallel APIs through the configuration file .coderrect.json.

For example, adding the following to .coderrect.json tells Coderrect that function1 and function2 cannot run in parallel. Coderrect will then not report data races between these two functions (or between code called from them).

// .coderrect.json
"notParallelFunctionPairs": {
    "function1" : "function2"
}

You can add multiple pairs of such functions, and you can use a wildcard (*) to match any function. For example, bioInit and bioProcessBackgroundJobs are one pair of not-run-in-parallel functions. ovsrcu_init_module and ovsrcu_init_module are another pair, meaning that ovsrcu_init_module can only be executed once and cannot race with itself.

taosThreadToOpenNewFile and * are another pair, meaning that taosThreadToOpenNewFile can only be executed in a sequential environment and thus cannot race with any other function.

"notParallelFunctionPairs": {
    "bioInit":"bioProcessBackgroundJobs",
    "ovsrcu_init_module":"ovsrcu_init_module",
    "taosThreadToOpenNewFile" : "*"
}

Note that for each pair declared in notParallelFunctionPairs, the order of the two functions is interchangeable.
