In my PhD work I show that it is possible to run unmodified Python/NumPy code on modern GPUs. This is done by using the Bohrium runtime system to translate the NumPy array operations into an array based bytecode sequence. Executing these byte-codes on two GPUs from different vendors shows great performance gains. Scientists working with computer simulations should be allowed to focus on their field of research and not spend excessive amounts of time learning exotic programming models and languages. We have with Bohrium achieved very promising results by starting out with a relatively simple approach. This has lead to more specialized methods as I have shown with the work done with both specialized, and parametrizied kernels. Both have their benefits and recognizable use cases. We achieved clear performance benefits without any significant negative impact on overall application performance. Even in the cases where we were not able to gain any performance boost by specialization, the added cost, for kernel generation and extra bookkeeping, is minimal. Many of the lessons learned developing and optimizing the Bohrium GPU vector engine has proven to be valuable in a broader perspective, which has made it possible to generalize the developments and made them benefit the complete Bohrium project.
The Niels Bohr Institute, Faculty of Science, University of Copenhagen, 2015