HPC, what and why?
Over a century ago, the U.S. population had grown so large that it took several years to compute the U.S. Census results on a tabulating machine. Nowadays, it would take less than a second. The reason is simple: today’s computers are far faster than the machines of 100 years ago.
The ability to compute at high speeds is called high-performance computing (HPC). One of the best-known types of HPC system is the supercomputer. Today, the world’s fastest supercomputer can perform over 4 × 10¹⁷ complex calculations in one second, which is more than four thousand million million times faster than a tabulating machine. HPC benefits the world in many ways, such as predicting weather in “real time”, enabling advanced artificial intelligence (AI), and finding drugs for COVID-19.
HPC programming is different, and it is hard
I have a piece of C/C++/Python code running on my PC, but it is too slow. Can I run it on a supercomputer and get the result faster? Yes, that’s possible, but some hard work is needed.
HPC programming is different from programming for a PC or a mobile phone. A supercomputer contains many compute nodes that work together to complete one or more tasks. This is called parallel processing. It’s similar to having millions of PCs networked together, combining compute power to complete tasks faster. Historically, parallel programming has been very error-prone, because concurrency bugs such as race conditions and deadlocks are easy to introduce into parallel programs and notoriously difficult to detect and debug.
“You can port your code to run on a supercomputer and get good speedups using parallelization models such as OpenMP,” says Jeff Huang, a professor of Computer Science at Texas A&M. “However, be cautious of race conditions. These bugs are nasty. You can get non-deterministic results that are non-reproducible because of them. Weeks can easily be lost, if not months, tracking down a race condition at scale.”
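To make the warning concrete, here is a minimal sketch of the kind of race condition an OpenMP loop can contain, together with one common fix. The function names `racy_sum` and `safe_sum` are made up for this illustration and do not come from any tool or paper mentioned here. A race condition arises when two threads access the same memory without coordination and at least one of them writes to it.

```c
/* Illustrative only: racy_sum and safe_sum are hypothetical names. */

/* Racy version: all threads update the shared variable `sum` without
   synchronization, so some additions can be lost when OpenMP is enabled. */
long racy_sum(const int *data, int n) {
    long sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        sum += data[i];   /* unsynchronized read-modify-write on shared `sum` */
    }
    return sum;
}

/* Race-free version: the reduction clause gives each thread a private
   partial sum and combines the partial sums once the loop finishes. */
long safe_sum(const int *data, int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```

When built with an OpenMP-enabled compiler (for example `gcc -fopenmp`), `racy_sum` may return different totals from run to run, which is the non-deterministic behavior described in the quote, while `safe_sum` should always return the exact sum. Without OpenMP enabled the pragmas are ignored and both functions run serially.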
Coderrect: a new tool to rescue HPC code
Huang and a team of researchers recently published a new technique that helps ensure that HPC programs written with OpenMP are race-free, that is, free of erroneous race conditions. The technique and its tool, named OMPRacer, were developed at Coderrect Inc. and will be presented at this year’s International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’20). OMPRacer is now part of Coderrect, a powerful debugging tool for multithreaded programs.
“What’s really exciting about Coderrect is that it’s designed to catch all possible races at compile time so you can fix the races before the code actually runs,” says Huang. “You then have peace of mind before the code starts that it’s going to do the right thing and not mess anything up because of a race condition.”
Because Coderrect does not rely on any run-time information, it is also privacy-preserving: the user’s data is never disclosed to Coderrect.
“Despite the fact that Coderrect knows nothing about the computing data at run time, it is very good at detecting races by only looking at the code itself and is extremely fast in doing so, thanks to a new type of data-flow analysis we invented in this work,” Huang adds.
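As a rough illustration of why some races can be spotted from the code alone, consider the two loops sketched below. This is not Coderrect’s analysis or its output; the function names are hypothetical, and the example only shows a pattern that is visible in the source without running the program.

```c
/* Illustrative only: both function names are hypothetical. */

/* Racy if parallelized: iteration i reads a[i - 1], which iteration i - 1
   writes. The overlap between reads and writes is visible from the index
   expressions alone, with no input data needed. */
void running_total_racy(long *a, int n) {
    #pragma omp parallel for
    for (int i = 1; i < n; i++) {
        a[i] = a[i] + a[i - 1];   /* depends on a value another iteration writes */
    }
}

/* Race-free: each iteration reads only in[i] and writes only out[i],
   so no two iterations touch the same element (assuming in and out
   do not overlap). */
void scale_independent(const long *in, long *out, int n, long k) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = k * in[i];
    }
}
```

In the first loop the conflict follows from the subscripts `i` and `i - 1`, whatever values the array holds. In the second loop every iteration works on its own element, so the parallel version computes the same result as the serial one.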
Huang believes that Coderrect will benefit the HPC industry in a huge range of applications.
“From scientific simulations to drug discovery to countless data-driven applications in AI – they all need large amounts of computing resources,” says Huang. “HPC is becoming more and more critical.”
- Bradley Swain, Texas A&M University and Coderrect Inc.
- Yanze Li, Texas A&M University
- Peiming Liu, Texas A&M University
- Ignacio Laguna, Lawrence Livermore National Laboratory
- Giorgis Georgakoudis, Lawrence Livermore National Laboratory
- Jeff Huang, Texas A&M University