TY - GEN
T1 - Learning-Based Dynamic Pinning of Parallelized Applications in Many-Core Systems
AU - Chasparis, Georgios C.
AU - Janjic, Vladimir
AU - Rossbory, Michael
AU - Hammond, Kevin
PY - 2019/3/21
Y1 - 2019/3/21
N2 - This paper introduces a learning-based framework for dynamic placement of threads of parallel applications to the cores of Non-Uniform Memory Access (NUMA) architectures. Adaptation takes place in two levels, where at the first level each thread independently decides on which group of cores (NUMA node) it will execute, and on the second level it decides to which particular core from the group it will be pinned. Naturally, these two adaptation levels run on different time-scales: a low-frequency switching for the NUMA-node adaptation, and a high-frequency switching for the CPU-node level adaptation. In addition, the learning dynamics have been designed to handle measurement noise and rapid variations in the performance of the threads. The advantage of the proposed learning scheme is the ability to easily incorporate any multi-objective criterion and easily adapt to performance variations during runtime. Our objective is to demonstrate that this framework is appropriate for supervising parallel processes and intervening with respect to better resource allocation. Under the multi-objective criterion of maximizing total completed instructions per second (i.e., both computational and memory-access instructions), we compare the performance of the proposed scheme with the Linux operating system scheduler. We have observed that performance improvement could be significant especially under limited availability of resources and under irregular memory-access patterns.
AB - This paper introduces a learning-based framework for dynamic placement of threads of parallel applications to the cores of Non-Uniform Memory Access (NUMA) architectures. Adaptation takes place in two levels, where at the first level each thread independently decides on which group of cores (NUMA node) it will execute, and on the second level it decides to which particular core from the group it will be pinned. Naturally, these two adaptation levels run on different time-scales: a low-frequency switching for the NUMA-node adaptation, and a high-frequency switching for the CPU-node level adaptation. In addition, the learning dynamics have been designed to handle measurement noise and rapid variations in the performance of the threads. The advantage of the proposed learning scheme is the ability to easily incorporate any multi-objective criterion and easily adapt to performance variations during runtime. Our objective is to demonstrate that this framework is appropriate for supervising parallel processes and intervening with respect to better resource allocation. Under the multi-objective criterion of maximizing total completed instructions per second (i.e., both computational and memory-access instructions), we compare the performance of the proposed scheme with the Linux operating system scheduler. We have observed that performance improvement could be significant especially under limited availability of resources and under irregular memory-access patterns.
UR - http://www.scopus.com/inward/record.url?scp=85063880618&partnerID=8YFLogxK
U2 - 10.1109/EMPDP.2019.8671569
DO - 10.1109/EMPDP.2019.8671569
M3 - Conference contribution
AN - SCOPUS:85063880618
SN - 9781728116457
T3 - 2019 27TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP)
SP - 1
EP - 8
BT - Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019
PB - IEEE
T2 - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019
Y2 - 13 February 2019 through 15 February 2019
ER -