Today’s increasingly heterogeneous and specialized hardware mandates the use of complex programming models such as data-parallel languages (e.g., CUDA or OpenCL) or the development of specialized domain-specific compilers. As a result, adding accelerator support to a larger code base is not only costly, but also introduces additional complexity that hinders long-term maintenance. With Polly-ACC, we introduce a new compiler that brings us closer to the dream of automatically exploiting GPU accelerators without the need for manual re-targeting of existing code bases. Starting from a sequential program representation, we automatically generate a hybrid program that, in combination with a new data management system, transparently runs suitable code regions on the available accelerator. Our approach is almost regression-free for a wide range of applications while improving the performance of a range of compute kernels as well as two complete applications: a Lattice Boltzmann simulation and a Cactus ADM solver for the Einstein equations. Polly-ACC is an important first step towards reducing the initial cost of accelerator usage and freeing software developer time for the development of new algorithms that can further increase the benefit of modern hardware. Polly-ACC is funded by the Swiss Platform for Advanced Scientific Computing (PASC) initiative.
Image: 3D rendering of a stencil execution on the GPU
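
As a rough illustration of the kind of sequential code region the abstract refers to, the following C sketch shows a plain 5-point stencil sweep with no accelerator-specific constructs. It is not taken from the Polly-ACC sources: the names `N` and `stencil_sweep` and the 0.2 weighting are hypothetical, and whether a given compiler actually offloads such a loop nest depends on its own analysis and profitability heuristics. The assumption here is that loop nests with constant bounds and simple array subscripts are typical candidates for automatic offloading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid size, chosen only for illustration. */
enum { N = 16 };

/* One sweep of a 5-point stencil over the interior of an N x N grid.
 * The code is purely sequential C: no CUDA, OpenCL, or pragmas.
 * Loop bounds are constants and every subscript is of the form
 * (loop variable + constant), which keeps the data accesses easy
 * for a compiler to analyze. Boundary cells of `out` are not written. */
void stencil_sweep(float in[N][N], float out[N][N])
{
    assert(in != NULL && out != NULL);
    for (int i = 1; i < N - 1; ++i) {
        for (int j = 1; j < N - 1; ++j) {
            out[i][j] = 0.2f * (in[i][j]
                                + in[i - 1][j] + in[i + 1][j]
                                + in[i][j - 1] + in[i][j + 1]);
        }
    }
}
```

A caller would allocate two `float[N][N]` buffers, fill the input, and call `stencil_sweep(in, out)`. In a hybrid program of the kind described above, the data transfers to and from the accelerator would be handled by the compiler and its runtime rather than written by hand.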