Cooperative Data and Computation Partitioning for Decentralized Architectures
Scalability of future wide-issue processor designs is severely hampered by centralized resources such as register files, memories, and interconnect networks. While centralized resources ease both hardware design and compiler code generation, they can become performance bottlenecks as access latencies grow with larger designs. The natural solution to this problem is to adapt the architecture to use smaller, decentralized resources. Decentralized architectures use smaller, faster components and exploit distributed instruction-level parallelism across those components to increase performance. While such processors can be more efficient and scalable, they shift a substantial burden onto the compiler, which must carefully account for the distributed resources.
In this talk, I will describe my work on compiler code generation technology for decentralized architectures. Code generation is one of the most difficult challenges in achieving high performance in the presence of decentralized resources. First, I will present two techniques for partitioning data accesses across distributed data memories. Next, I will present an efficient, region-based algorithm for partitioning computation operations across multiple processing elements. Finally, I will show how these techniques combine to form a compiler method that extracts data-aware fine-grain threads of computation to effectively exploit decentralized architectures.
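To make the computation-partitioning problem concrete, here is a minimal, purely illustrative sketch (not the speaker's algorithm; all names are hypothetical): a greedy pass that assigns dataflow-graph operations to processing elements (PEs), favoring the PE that already holds an operation's predecessors so cross-PE communication stays low, while lightly penalizing overloaded PEs.

```python
# Hypothetical greedy partitioner for illustration only.
# ops:  operation ids in topological order
# deps: op -> list of predecessor ops
# Returns a dict mapping each op to a PE index.

def partition(ops, deps, num_pes):
    assign = {}
    load = [0] * num_pes  # rough balance metric: ops per PE
    for op in ops:
        # Count how many predecessors already live on each PE.
        affinity = [0] * num_pes
        for pred in deps.get(op, []):
            affinity[assign[pred]] += 1
        # Prefer locality first, then the less-loaded PE.
        best = max(range(num_pes),
                   key=lambda pe: (affinity[pe], -load[pe]))
        assign[op] = best
        load[best] += 1
    return assign

# Tiny example: a diamond dependence graph mapped onto 2 PEs.
ops = ["a", "b", "c", "d"]
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
placement = partition(ops, deps, 2)
print(placement)
```

A real region-based partitioner must also model communication latency, register pressure, and the coupling between data placement and computation placement that the talk addresses; this sketch shows only the basic locality-versus-balance tension.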