Optimizing Spare Linear Algebra on Reconfigurable Architecture
This event is free and open to the publicAdd to Google Calendar
Virtual dissertation defense (Passcode: 177456)
The rise of cloud computing and deep machine learning in recent years have led to a tremendous growth of workloads that are not only large, but also have highly sparse representations. A large fraction of machine learning problems are formulated as sparse linear algebra problems in which the entries in the matrices are mostly zeros. Not surprisingly, optimizing linear algebra algorithms to take advantage of this sparseness is critical for efficient computation on these large datasets.
This thesis presents a detailed analysis of several approaches to sparse matrix-matrix multiplication, a core computation of linear algebra kernels. While the arithmetic count of operations for the nonzero elements of the matrices are the same regardless of the algorithm used to perform matrix-matrix multiplication, there is significant variation in the overhead of navigating the sparse data structures to match the nonzero elements with the correct indices. This work explores approaches to minimize these overheads as well as the number of memory accesses for sparse matrices. To provide concrete numbers, the thesis examines inner product, outer product and row-wise algorithms on Transmuter, a flexible accelerator that can reconfigure its cache and crossbars at runtime to meet the demands of the program. This thesis shows how the reconfigurability of Transmuter can improve complex workloads that contain multiple kernels with varying compute and memory requirements, such as the Graphsage deep neural network and the Sinkhorn algorithm for optimal transport distance. Finally, we examine a novel Transmuter feature: register-to-register queues for efficient communication between its processing elements, enabling systolic array style computation for signal processing algorithms.