TY - GEN
T1 - Mapping parallel programs to heterogeneous CPU/GPU architectures using a Monte Carlo Tree Search
AU - Goli, Mehdi
AU - McCall, John
AU - Brown, Christopher
AU - Janjic, Vladimir
AU - Hammond, Kevin
PY - 2013/7/15
Y1 - 2013/7/15
AB - The single-core processor, which has dominated for over 30 years, is now obsolete: recent trends are moving increasingly towards parallel systems, demanding a huge shift in programming techniques and practices. Moreover, we are rapidly moving towards an age where almost all programming will target parallel systems. Parallel hardware is rapidly evolving, with large heterogeneous systems, typically comprising a mixture of CPUs and GPUs, becoming mainstream. With this increasing heterogeneity comes increasing complexity: not only does the programmer have to worry about where and how to express the parallelism, they must also express an efficient mapping of resources to the available system. This generally requires in-depth expert knowledge that most application programmers do not have. In this paper we describe a new technique that automatically derives optimal mappings for an application onto a heterogeneous architecture, using a Monte Carlo Tree Search (MCTS) algorithm. Our technique exploits high-level design patterns, targeting a set of well-specified parallel skeletons. We demonstrate on a convolution example that our MCTS technique obtains speedups within 5% of those achieved by a hand-tuned version of the same application.
KW - Heterogeneous Architecture
KW - Heuristic Algorithm
KW - Monte Carlo Tree Search
KW - Parallel Programming
KW - Static Mapping
UR - http://www.scopus.com/inward/record.url?scp=84881588847&partnerID=8YFLogxK
U2 - 10.1109/CEC.2013.6557926
DO - 10.1109/CEC.2013.6557926
M3 - Conference contribution
AN - SCOPUS:84881588847
SN - 9781479904532
SP - 2932
EP - 2939
BT - 2013 IEEE Congress on Evolutionary Computation, CEC 2013
PB - IEEE
T2 - 2013 IEEE Congress on Evolutionary Computation, CEC 2013
Y2 - 20 June 2013 through 23 June 2013
ER -