Abstract
This paper presents a new technique for introducing and tuning parallelism for heterogeneous shared-memory systems (comprising a mixture of CPUs and GPUs). The technique combines algorithmic skeletons (such as farms and pipelines), Monte Carlo tree search (MCTS) for deriving mappings of tasks to available hardware resources, and refactoring tool support for applying the patterns and mappings in an easy and effective way. Using our approach, we demonstrate easily obtainable, significant and scalable speedups on a number of case studies, reaching up to 41× over the sequential code on a 24-core machine with one GPU. We also demonstrate that the speedups obtained by mappings derived by the MCTS algorithm are within 5–15% of the best-obtained manual parallelisation.
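To make the idea of deriving a task-to-hardware mapping with Monte Carlo tree search more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical four-stage pipeline in which each stage is placed on either a CPU core or the single GPU, and a toy cost model in which the slowest resource bounds throughput. The stage timings, the names `CPU_TIME`, `GPU_TIME`, `cost`, `Node`, `ucb1` and `mcts`, and the choice of UCB1 as the selection rule are all assumptions made for illustration.

```python
import math
import random

# Hypothetical per-stage service times (arbitrary units); not taken from the paper.
CPU_TIME = [4.0, 9.0, 2.0, 7.0]
GPU_TIME = [3.0, 2.0, 5.0, 1.5]
DEVICES = ("cpu", "gpu")


def cost(mapping):
    """Toy cost model (an assumption): the slowest resource bounds the pipeline.

    CPU stages each get their own core, so the slowest CPU stage counts.
    GPU stages share the single GPU, so their times add up.
    """
    cpu = max((CPU_TIME[i] for i, d in enumerate(mapping) if d == "cpu"), default=0.0)
    gpu = sum(GPU_TIME[i] for i, d in enumerate(mapping) if d == "gpu")
    return max(cpu, gpu)


class Node:
    """A search-tree node holding a partial mapping for stages 0..k-1."""

    def __init__(self, mapping, parent=None):
        self.mapping = mapping
        self.parent = parent
        self.children = {}  # device name -> child Node
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return len(self.mapping) == len(CPU_TIME)

    def fully_expanded(self):
        return len(self.children) == len(DEVICES)


def ucb1(child, parent_visits, c=1.4):
    """UCB1 score: mean reward plus an exploration bonus."""
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(iterations=2000, seed=0):
    """Return the best (mapping, cost) seen during the search."""
    rng = random.Random(seed)
    root = Node(())
    best_mapping, best_cost = None, float("inf")
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes by UCB1.
        while not node.is_terminal() and node.fully_expanded():
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: add one untried device choice for the next stage.
        if not node.is_terminal():
            device = rng.choice([d for d in DEVICES if d not in node.children])
            child = Node(node.mapping + (device,), parent=node)
            node.children[device] = child
            node = child
        # Simulation: complete the mapping with random device choices.
        mapping = list(node.mapping)
        while len(mapping) < len(CPU_TIME):
            mapping.append(rng.choice(DEVICES))
        total = cost(mapping)
        if total < best_cost:
            best_mapping, best_cost = tuple(mapping), total
        reward = 1.0 / total  # lower cost gives higher reward
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_mapping, best_cost


if __name__ == "__main__":
    print(mcts())
```

On an instance this small the 16 possible mappings could simply be enumerated; tree search becomes worthwhile only when the number of tasks and resources makes exhaustive enumeration impractical. A realistic cost model would also come from profiling rather than from fixed constants as assumed here.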
Original language | English |
---|---|
Pages (from-to) | 583-602 |
Number of pages | 20 |
Journal | International Journal of Parallel Programming |
Volume | 48 |
Publication status | Published - 10 Jun 2020 |
Keywords
- Heterogeneous Architecture
- Monte Carlo Method
- Refactoring
- Scheduling Algorithm