Computer Science and Engineering

Dissertation Defense

Rethinking Context Management of Data Parallel Processors in an Era of Irregular Computing

Jonathan Beaumont

Data parallel architectures such as general-purpose GPUs and those using SIMD extensions have become increasingly prevalent in high-performance computing due to their power efficiency, high throughput, and relative ease of programming. They offer greater flexibility and cost efficiency than custom ASICs, and greater performance per watt than multicore systems. However, an emerging class of irregular workloads threatens the continued ubiquity of these platforms as general solutions. Indirect memory accesses and conditional execution leave hardware resources significantly underutilized. The nondeterministic behavior of these workloads, combined with the massive context size associated with data parallel architectures, makes it difficult to manage resources and achieve the desired performance.

This dissertation explores new strategies for scheduling irregular computational tasks. Specifically, we characterize the performance loss associated with current thread block scheduling policies in GPU architectures and evaluate possible extensions to enable better performance. Irregular workloads exhibit common patterns that allow the architecture to respond dynamically to changing execution conditions. We analyze how these strategies can entail high overhead in many-thread architectures due to their large context sizes, and we explore methods to limit this cost. Our solution achieves significant increases in throughput with minor augmentations to traditional GPU architectures and full support for legacy software.

We further identify potential correctness issues when generalizing these strategies to heterogeneous multicore SIMD systems. After presenting data motivating support for context switching in these systems, we demonstrate how modifications to their runtimes guarantee correctness, and we propose simple extensions to the ISA that enable the benefits of these dynamic solutions.

Sponsored by

CSE

Faculty Host

Trevor Mudge