{"id":2415,"date":"2020-08-28T11:13:34","date_gmt":"2020-08-28T17:13:34","guid":{"rendered":"https:\/\/coderrect.com\/?p=2415"},"modified":"2022-04-18T09:15:45","modified_gmt":"2022-04-18T15:15:45","slug":"100x-faster-pi-calculation-with-one-line-of-parallelization","status":"publish","type":"post","link":"https:\/\/coderrect.com\/100x-faster-pi-calculation-with-one-line-of-parallelization\/","title":{"rendered":"100x Faster Calculation of Pi with just One Line of Code?"},"content":{"rendered":"
\n\t\n\t\n\t\n\t
\n\t\t\n\t\t

Archives<\/h1>\n\t<\/div>\n<\/header>\n\n\n

<\/h3>\n\n\n\n

Every year on March 14th (3/14), we celebrate Pi Day, named for the Greek letter "π". Most people know that the value of π is approximately 3.14, and some can recite it to a few more decimal places, but do you know how to calculate it?

In this article, we break down a simple numeric method for calculating π in C. We also show how to make the code 100x faster by adding a single line of OpenMP parallelization.

$$\pi = \int_0^1 \frac{4}{1+x^2}\,dx \;\approx\; \sum_{i=0}^{N-1} \frac{4}{1+x_i^2}\,\Delta, \qquad x_i = (i+0.5)\,\Delta, \quad \Delta = \frac{1}{N}$$

*Pi numeric integration*

The formula shown above uses numeric integration (a midpoint-rule approximation of the integral) to calculate π. When N is large enough, the computed sum gets arbitrarily close to π.
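To see why the integral itself equals π, note that the antiderivative of $\frac{4}{1+x^2}$ is $4\arctan(x)$:

$$\int_0^1 \frac{4}{1+x^2}\,dx = 4\arctan(x)\Big|_0^1 = 4\left(\frac{\pi}{4} - 0\right) = \pi$$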

The code shown below implements this numeric method in C:

```c
#include <stdio.h>

#define N 1000000000

int main() {
    double delta = 1.0 / (double)N;    // width of each interval
    double sum = 0.0;
    for (int i = 0; i < N; i++) {
        double x = (i + 0.5) * delta;  // midpoint of the i-th interval
        sum += 4.0 / (1.0 + x * x);    // height of the i-th rectangle
    }
    double pi = delta * sum;
    printf("PI = %.12g\n", pi);
    return 0;
}
```

Compile and run this code in your terminal:

```
$ gcc pi.c -o pi
$ ./pi
```

It will print the result:

```
PI = 3.14159265359
```

The calculation takes around 3-6 seconds on a typical modern machine.
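If you want to measure the runtime yourself, the shell's standard `time` utility is enough (the exact numbers will vary with your hardware and compiler optimization flags):

```
$ time ./pi
```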



## 100X Speedup with OpenMP Parallelization

There are many ways to make the above code run faster, but one *extremely* easy way is OpenMP parallelization: add the following line right before the "for" loop.

```c
#pragma omp parallel for reduction(+:sum)
```

The line above looks like a comment in the code. However, it actually tells the compiler to parallelize the "for" loop using multithreading, so the code runs much faster than before when run on a multicore machine.
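If you want to convince yourself that such a pragma really spawns threads, a tiny probe program helps. This is a minimal sketch using the standard OpenMP runtime calls `omp_get_thread_num()` and `omp_get_num_threads()`:

```c
#include <omp.h>
#include <stdio.h>

int main() {
    // Every thread in the parallel region prints its own id.
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```

Compiled with `gcc -fopenmp`, this typically prints one line per available hardware thread.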

The code shown below is the full version; we added an outer loop that re-runs the computation with different numbers of threads:

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000000

int main() {
    double delta = 1.0 / (double)N;
    int MAX_THREADS = omp_get_max_threads();
    // Measure the parallel compute time for 1..MAX_THREADS threads
    for (int j = 1; j <= MAX_THREADS; j++) {
        printf(" running on %d threads: ", j);
        omp_set_num_threads(j);
        double sum = 0.0;
        double start = omp_get_wtime();
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            double x = (i + 0.5) * delta;
            sum += 4.0 / (1.0 + x * x);
        }
        // Out of the parallel region, finalize the computation
        double pi = delta * sum;
        double time_lapse = omp_get_wtime() - start;
        printf("PI = %.12g computed in %.4g seconds\n", pi, time_lapse);
    }
    return 0;
}
```

Pass the "-fopenmp" flag to gcc, then run the code:

```
$ gcc -fopenmp pi.c -o pi
$ ./pi
```

It will print the following results:

```
running on 1 threads: PI = 3.14159265359 computed in 3.4 seconds
running on 2 threads: PI = 3.14159265359 computed in 1.705 seconds
running on 3 threads: PI = 3.14159265359 computed in 1.137 seconds
running on 4 threads: PI = 3.14159265359 computed in 0.858 seconds
running on 5 threads: PI = 3.14159265359 computed in 0.687 seconds
running on 6 threads: PI = 3.14159265359 computed in 0.574 seconds
running on 7 threads: PI = 3.14159265359 computed in 0.4946 seconds
running on 8 threads: PI = 3.14159265359 computed in 0.4324 seconds
...
running on 126 threads: PI = 3.14159265359 computed in 0.03336 seconds
running on 127 threads: PI = 3.14159265359 computed in 0.03072 seconds
running on 128 threads: PI = 3.14159265359 computed in 0.0281 seconds
```

As we can see, the code computes the same result, but with 128 threads it runs more than 100x faster than before (3.4 s / 0.0281 s ≈ 121x).



## reduction(+:sum)

Pay attention to `reduction(+:sum)`. It is very important here: it is a **reduction clause** provided by OpenMP, and it tells the compiler that `sum` is a reduction variable combined with the `+` operator. The compiler then gives each thread its own private copy of `sum` and, at the end of the loop, combines the per-thread partial sums into the shared `sum` inside a synchronized (critical) section, so the final result is correct.
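Conceptually, the reduction clause is equivalent to privatizing the accumulator and doing the combine step under synchronization by hand. Here is a simplified sketch of that hand-written version (what any particular compiler actually emits may differ):

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000000

int main() {
    double delta = 1.0 / (double)N;
    double sum = 0.0;
    #pragma omp parallel
    {
        double local_sum = 0.0;                // private partial sum per thread
        #pragma omp for                        // split iterations across threads
        for (int i = 0; i < N; i++) {
            double x = (i + 0.5) * delta;
            local_sum += 4.0 / (1.0 + x * x);  // no shared state, no race
        }
        #pragma omp critical                   // synchronized combine step
        sum += local_sum;
    }
    printf("PI = %.12g\n", delta * sum);
    return 0;
}
```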

If `reduction(+:sum)` is not added (a common mistake for OpenMP beginners), the code will not only calculate an incorrect result but will also run slower than the serial version:

```
running on 1 threads: PI = 3.14159265359 computed in 3.403 seconds
running on 2 threads: PI = 1.43071963533 computed in 5.701 seconds
running on 3 threads: PI = 1.34809292029 computed in 5.424 seconds
running on 4 threads: PI = 1.08601649484 computed in 5.929 seconds
running on 5 threads: PI = 0.903822885536 computed in 6.059 seconds
running on 6 threads: PI = 0.62312876914 computed in 5.552 seconds
running on 7 threads: PI = 0.596798835465 computed in 5.976 seconds
running on 8 threads: PI = 0.50340501324 computed in 6.086 seconds
```

When `reduction(+:sum)` is not added, a data race exists on the shared variable `sum`. The race makes the results wrong, and because every thread keeps writing to the same cache line that holds `sum`, it also ruins cache performance, which is why the parallel runs end up slower than the serial one.
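To see why updates get lost, note that `sum += ...` is not a single atomic operation; it is a load-add-store sequence. A minimal illustration (the function name `racy_update` is ours, and the interleaving in the comments is just one possible schedule):

```c
// What "sum += 4.0 / (1.0 + x * x)" really does under the hood:
void racy_update(double *sum, double x) {
    double tmp = *sum;            // 1. load the shared value
    tmp += 4.0 / (1.0 + x * x);   // 2. add in a local/register
    *sum = tmp;                   // 3. store back, possibly overwriting
                                  //    another thread's concurrent store
}
// One possible interleaving, with *sum initially 0:
//   Thread 1: tmp = 0        Thread 2: tmp = 0
//   Thread 1: *sum = a       Thread 2: *sum = b   <-- Thread 1's update is lost
```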

To give readers peace of mind, we'd like to introduce Coderrect as an easy way to catch this race, and others like it. Next up, we show how to use Coderrect on this example.



## Prerequisites

This tutorial assumes you have successfully installed the Coderrect software following our quick start guide.



## Detect the race using Coderrect

Compile the code with `coderrect`:

```
$ coderrect -t gcc -fopenmp pi.c -o pi
```

The `coderrect` command will compile and analyze the program; the `-t` flag prints the race report to the terminal.

```
==== Found a data race between:
line 22, column 13 in pi.c AND line 22, column 17 in pi.c
Shared variable:
at line 16 of pi.c
16| double sum = 0.0;
Thread 1:
20| for (int i=0; i < N; i++) {
21| double x = (i+0.5)*delta;
>22| sum += 4.0 / (1.0+x*x);
23| }
24|
>>>Stacktrace:
Thread 2:
20| for (int i=0; i < N; i++) {
21| double x = (i+0.5)*delta;
>22| sum += 4.0 / (1.0+x*x);
23| }
24|
>>>Stacktrace:

1 OpenMP races
```

## Interpret the Results

The reported race starts with a summary of where the race was found.

```
==== Found a data race between:
line 22, column 13 in pi.c AND line 22, column 17 in pi.c
```

Next, the report shows the name and location of the variable on which the race occurs.

```
Shared variable:
at line 16 of pi.c
16| double sum = 0.0;
```

This output shows that the race occurs on the variable `sum` allocated at line 16, and reports information about the two unsynchronized accesses to `sum`.

```
20| for (int i=0; i < N; i++) {
21| double x = (i+0.5)*delta;
>22| sum += 4.0 / (1.0+x*x);
23| }
24|
```

The code snippet shows that line 22 has racy accesses, both read and write, to `sum`.
