Efficient Data Movement for Accelerating Data-Intensive Applications
This event is free and open to the public.
Virtual Event: Zoom. Passcode: 203540
Abstract: Data-intensive applications are being deployed across the board: in the autonomous vehicle (AV) industry, the financial sector, social network analysis, and elsewhere. The current surge in generated data, coupled with the need to analyze it in a timely manner, keeps the computational demand of these applications growing. Massively parallel systems (e.g., GPUs, PIMs) have partially accelerated these applications. However, such systems are still hindered from reaching their full potential by the ever-widening gap between compute throughput and memory bandwidth.
In this dissertation, we characterize the execution of data-intensive applications and identify key inefficiencies across the memory hierarchy, including the interconnects. Based on our observations, we architect (1) heterogeneous data movement over the interconnect, (2) offloading of computation to the memory subsystem, and (3) profit-based on-chip cache usage. First, we propose a compute-capable interconnect that processes and fuses irregular commutative operations propagating toward the same destination. Second, we use multicast to support the one-to-many data movement prevalent in graph-based workloads. Third, we deliver optimizations tailored to the spatio-temporal locality of applications: depending on that locality, we either move data at fine granularity in lieu of conventional cache-line transfers or dedicate more on-chip resources. Finally, we propose a hardware-software co-design that translates the two-way gather-scatter operations common in data-intensive applications (e.g., graph analytics) into a one-way offload operation. Our solution combines compiler-assisted annotations of gather-scatter primitives with a compute engine placed near the last-level cache to process offloaded primitives efficiently.
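As a rough illustration of why fusing commutative operations bound for the same destination reduces data movement, the following Python sketch models the idea in software. It is not the dissertation's hardware design: the function names (`fuse_updates`, `apply_updates`), the use of addition as the commutative operation, and the sample updates are all assumptions made for this example.

```python
from collections import defaultdict


def fuse_updates(updates):
    """Combine (destination, value) add-updates that share a destination.

    Addition is commutative and associative, so the updates can be merged
    in any order while in flight (hypothetical software model).
    """
    fused = defaultdict(int)
    for dest, value in updates:
        fused[dest] += value
    return dict(fused)


def apply_updates(memory, updates):
    """Apply add-updates to a dict that stands in for memory."""
    for dest, value in updates:
        memory[dest] = memory.get(dest, 0) + value
    return memory


# Illustrative irregular updates: several target the same destinations.
updates = [(3, 1), (7, 2), (3, 4), (7, 1), (3, 5)]

fused = fuse_updates(updates)
unfused_result = apply_updates({}, updates)
fused_result = apply_updates({}, fused.items())

print(f"messages before fusion: {len(updates)}, after fusion: {len(fused)}")
print(f"same final memory state: {unfused_result == fused_result}")
```

In this toy model, five update messages collapse into two while the final memory contents stay the same, which is the property that makes it safe to fuse such operations before they reach their destination.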