Enabling the use of GPUs in a scientific application has the potential to increase performance by offloading computationally intensive parts to the accelerator. Although GPUs can, in theory, deliver a hundred times or more the floating-point performance of conventional CPUs, porting an application to a GPU does not guarantee a performance increase of that magnitude. GPU programming is widely accepted to be more difficult than CPU programming; a frequent consequence is code that makes poor use of the available resources and thus delivers suboptimal performance. In some pathological cases, a GPU-enabled application can even run slower than a CPU-only solution.
OpenMP is reputedly easy to learn: simply add a handful of compiler directives to a code and you end up with a parallel application. Although this is broadly true, it does not tell the full story. OpenMP is indeed easier to learn than, say, MPI, but this does not necessarily translate into "easier to program", and the result is often poor performance in hybrid MPI-OpenMP applications. This article looks at the most common pitfalls that programmers should be aware of and gives some tips on how to avoid them.
The HMPP Hybrid Compiler is a directive-based compiler for building parallel, GPU-accelerated applications.
Based on C and Fortran directives, HMPP offers a high-level abstraction for hybrid programming that leverages the computing power of stream processors without the complexity associated with low-level GPU programming.