Can We Detect and Debug Multi-Threading Issues?

I recently came across a question on StackOverflow asking, “How to detect and debug multi-threading problems?” This is an interesting, albeit broad, question. Those who took computer science classes in college likely remember how quickly cryptic errors and erratic behavior appear once threads are introduced.

While there are some good answers from clearly smart people on the StackOverflow question, none truly addresses the question being asked, because there is no real answer.

Is it possible to detect and debug problems coming from multi-threaded code?

This is a sub-question quoted from the StackOverflow question, and fortunately, it does have an answer. In theory, it is definitely possible to detect and debug problems coming from multi-threaded code. In practice, less so.

There are many different types of concurrency bugs, and each comes with its own techniques and challenges for detecting and debugging. One of the most commonly discussed types is the data race, which involves two threads accessing the same memory in parallel.

Data races are notoriously difficult to detect and debug because they depend on a specific timing to be observed. Not only can you as the programmer not reliably control the timing, but trying to debug can even change the timing and prevent the race from occurring.

In my experience, data races are often exposed only when some non-deterministic Heisenbug cannot be explained and someone manually inspects the code searching for a potential data race. The manual inspection can be tedious, but there is a concrete set of circumstances to search for.

A data race occurs when:

  1. Two threads access the same location in memory (with at least one access being a write)
  2. Both threads are running in parallel
  3. The accesses are unsynchronized (e.g. they are not guarded by a lock)

Once some non-deterministic behavior is observed, you can manually search your code for accesses that match the three criteria listed above. This process is generally tedious, and in complex software there is no guarantee that manual effort alone will succeed.
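To make the three criteria concrete, here is a minimal C++ sketch. The function names racy_count and guarded_count are hypothetical and chosen only for illustration. The first function deliberately meets all three criteria (and is therefore undefined behavior in C++), while the second guards the same access with a lock.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Deliberately racy, for illustration only: this is undefined behavior in C++.
long racy_count(int num_threads, int increments) {
    long counter = 0;  // (1) one memory location, written by every thread
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        // (2) the threads run in parallel
        threads.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i) {
                ++counter;  // (3) unsynchronized read-modify-write
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;  // may be less than num_threads * increments
}

// Same work, but every access to the counter is guarded by a mutex,
// so criterion (3) no longer holds and there is no data race.
long guarded_count(int num_threads, int increments) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, &m, increments] {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<std::mutex> lock(m);  // released at end of scope
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Calling guarded_count(4, 10000) always returns 40000. The racy version may return the same value on many runs, which is exactly why this kind of bug tends to hide until it matters.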

Are there any special logging frameworks, debugging techniques, or code inspectors to solve multi-threading issues?

This question is a paraphrased quote from the original StackOverflow question, and is why I stated earlier that the question has no real answer.

While there are plenty of logging frameworks, debugging techniques, and code inspectors designed for multi-threaded issues, none of them solves the problem. Multi-threaded bugs remain among the most challenging problems faced by software today. With that said, there are a few good tools that can be extremely helpful, particularly for data race detection.

Tools like Intel Inspector and Google’s Thread Sanitizer can profile the execution of a program and report if a data race is observed at runtime. However, these tools rely on the data race actually occurring during the profiled execution.

As mentioned above, a data race depends on specific timing to be observed. If the race is not triggered during the execution, these tools will not report it.
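A small sketch of this limitation, assuming a hypothetical function run_once whose second write to a shared variable only happens on a rarely taken path. Clang and GCC enable Thread Sanitizer with the -fsanitize=thread flag. The file name in the comment is made up.

```cpp
#include <cassert>
#include <thread>

// Possible build line (file name is hypothetical):
//   clang++ -std=c++17 -fsanitize=thread -g -pthread example.cpp

// Hypothetical example: the second thread writes `shared` only when
// `rare_path` is true, so the race exists only on that path.
int run_once(bool rare_path) {
    int shared = 0;  // plain int: no lock, no atomic
    std::thread writer([&] { shared = 1; });
    std::thread maybe_writer([&] {
        if (rare_path) {
            shared = 2;  // unsynchronized write, races with `writer`
        }
    });
    writer.join();
    maybe_writer.join();
    return shared;  // read after both joins, so this read is safe
}
```

With rare_path set to false, only one thread ever touches the variable, so a run under a dynamic detector has nothing to report even though the bug is still in the source. Only a run that actually takes the rare path gives the tool a chance to flag the conflicting writes.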

We are working on a data race detector that addresses the limitations of tools like Intel Inspector. Our tool, Coderrect Scanner, does not rely on profiling the execution, but scans the source code directly and reasons about any possible races. By not relying on actually executing the code, we are able to eliminate the most difficult part of detecting data races: their non-deterministic nature during execution.

Conclusion

Even in the year 2020, with all of the advancements in computing and software development, the question “How to detect and debug multi-threading problems?” is nearly unanswerable. Software is increasingly relying on parallelism, yet concurrency bugs remain one of the hardest problems in computing. As software moves towards a highly parallel future, developers will likely rely more and more on tools like Thread Sanitizer and the Coderrect Scanner.
