Routing and Topology Reconfiguration for Networks-on-Chip Runtime Health
As silicon technology evolves, chip multi-processor (CMP) and system-on-chip (SoC)
designs are shifting dramatically from a limited number of robust, homogeneous logic blocks to
billions of fragile transistors integrated into complex, heterogeneous cores and IPs. This increased integration has
compelled architects to design resource-heavy, complex and power-hungry on-chip interconnects,
moving towards network-on-chip (NoC) structures. In addition, the waning reliability of silicon poses
a great threat to these communication structures as they could potentially be a single point of failure.
Further, the heterogeneity and short time-to-market of upcoming devices make it nearly impossible
to thoroughly verify NoC architectures and optimize them for power at design time. Failure of NoC
architectures to meet correctness, reliability and power-budget requirements has detrimental effects
on the runtime operation of NoC-based CMPs and SoCs. Therefore, highly efficient detection and
reconfiguration mechanisms are becoming a key requisite to unlock the full potential of future CMPs
and SoCs. Such mechanisms can overcome both functional bugs that escaped design-time verification
and device failures due to an unreliable silicon substrate. Runtime reconfiguration can also be
leveraged to optimize communication paths dynamically, in particular to minimize
power dissipation and prevent overheating of the NoC structures.
The goal of this dissertation is to develop mechanisms to mitigate threats to NoC runtime health.
The proposed solutions are based on monitoring the execution activity of NoCs in a localized and
distributed manner using lightweight checkers. Based on the events observed, the routing scheme and
the network topology are updated at runtime to avoid repeating the same failures in future operation.
This thesis specifically focuses on three aspects of NoC runtime health: correct behavior to avoid
functional bugs, reliable execution to circumvent faults and power-aware reconfiguration to avert
overheating emergencies. The work presented in the thesis will enable designers to aggressively push
heterogeneity and time-to-market limits with respect to NoC design.
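The monitor-and-reconfigure loop described above can be illustrated with a small sketch. It is not the dissertation's algorithm: it assumes a 2D mesh NoC, models a checker report as a single faulty bidirectional link, and recomputes shortest-path next-hop routing tables with breadth-first search over the surviving links. The names `mesh_links`, `routing_tables`, `NocReconfigurator` and `on_checker_event` are hypothetical. The sketch also ignores deadlock freedom, which a real NoC reconfiguration scheme must guarantee (for example through restricted turn models or up*/down* routing).

```python
from collections import deque

Node = tuple[int, int]  # (x, y) router coordinate in an assumed 2D mesh


def mesh_links(width: int, height: int) -> set[frozenset[Node]]:
    """Return the bidirectional links of a width x height 2D mesh."""
    links = set()
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < height:
                links.add(frozenset({(x, y), (x, y + 1)}))
    return links


def routing_tables(nodes, links):
    """For every destination, run BFS outward over the healthy links and
    record each router's next hop toward that destination (shortest path)."""
    adj = {n: [] for n in nodes}
    for link in links:
        a, b = tuple(link)
        adj[a].append(b)
        adj[b].append(a)
    tables = {n: {} for n in nodes}
    for dest in nodes:
        seen = {dest}
        frontier = deque([dest])
        while frontier:
            cur = frontier.popleft()
            for nb in sorted(adj[cur]):  # sorted for deterministic tables
                if nb not in seen:
                    seen.add(nb)
                    tables[nb][dest] = cur  # nb forwards to cur to reach dest
                    frontier.append(nb)
    return tables


class NocReconfigurator:
    """Hypothetical controller: holds the live topology and routing tables."""

    def __init__(self, width: int, height: int):
        self.nodes = [(x, y) for x in range(width) for y in range(height)]
        self.links = mesh_links(width, height)
        self.tables = routing_tables(self.nodes, self.links)

    def on_checker_event(self, a: Node, b: Node) -> None:
        """Called when a (hypothetical) lightweight checker flags link a-b as
        faulty: drop the link from the topology and recompute all routes."""
        self.links.discard(frozenset({a, b}))
        self.tables = routing_tables(self.nodes, self.links)

    def route(self, src: Node, dest: Node):
        """Follow next hops from src to dest; None if dest is unreachable."""
        path = [src]
        while path[-1] != dest:
            nxt = self.tables[path[-1]].get(dest)
            if nxt is None:
                return None
            path.append(nxt)
        return path


if __name__ == "__main__":
    noc = NocReconfigurator(3, 3)
    print(noc.route((0, 0), (2, 0)))  # → [(0, 0), (1, 0), (2, 0)]
    noc.on_checker_event((1, 0), (2, 0))  # checker reports a faulty link
    print(noc.route((0, 0), (2, 0)))  # → [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]
```

Recomputing every table on each event keeps the sketch short. A distributed implementation would more plausibly update only the affected routers, and the same recomputation could, under the same assumptions, serve power-aware reconfiguration by removing links that are deliberately power-gated rather than faulty.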