Dissertation Defense
Enabling Efficient Resource Utilization on Multitasking Throughput Processors
ABSTRACT: Graphics processing units (GPUs) are increasingly adopted in modern computer systems
beyond their traditional role of processing graphics, to accelerate data-parallel applications.
The single instruction multiple thread (SIMT) programming model used by OpenCL and CUDA
enables programmers to easily offload data-parallel kernels to GPUs and achieve performance
improvements of several orders of magnitude with high energy efficiency. As a result, many
supercomputers, cloud services, and data centers are utilizing GPUs as general-purpose throughput
processors. In these environments, enabling efficient resource utilization of GPUs with multitasking
becomes an important problem.
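As a brief illustration of the SIMT offload model referred to above (not drawn from the thesis), a minimal CUDA kernel might look like the following sketch, in which each GPU thread processes one element of a data-parallel workload:

    // Illustrative sketch only: a data-parallel kernel offloaded to the GPU
    // under the SIMT model. Each thread operates on one array element.
    __global__ void scale(float *data, float factor, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)
            data[i] *= factor;                          // one element per thread
    }

    // Host-side launch: a grid of 256-thread blocks covers all n elements.
    // scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);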
This thesis proposes a framework with hardware/software extensions to enable efficient resource
utilization on multitasking GPUs. The framework identifies characteristics arising from the SIMT
programming model and determines the best policy among multiple candidates, either at compile
time or through runtime software. It gathers runtime statistics from hardware performance counters
to guide decisions in the runtime software, and implements the individual mechanisms in hardware
with low overhead. The framework consists of three components. First, the framework uses
compiler hints to improve resource utilization when running memory-intensive kernels alone, which
can provide synergistic improvement when multitasking is enabled. Second, the framework provides
a collaborative preemption mechanism for efficient preemptive multitasking, which can handle both
latency-sensitive and throughput-oriented applications. Lastly, the framework implements a dynamic
resource management scheme to maximize resource utilization across multiple kernels.