Dissertation Defense
Energy Efficient Heterogeneous Processor Architectures for General Purpose Applications
In light of the end of Dennard scaling, significant changes to the core microarchitecture are critical to enable continued performance scaling. Rising transistor counts have enabled general-purpose heterogeneous multicore architectures that combine Out-of-Order (OoO) and In-Order (InO) cores. InO cores are attractive for their low relative area and energy costs. However, their utility is limited by the faster OoO cores, which can speculatively reorder instructions and create highly optimized instruction schedules. Existing systems run high-performance application phases on the energy-hungry OoO cores and migrate to the slower InO cores primarily for non-critical phases, limiting potential energy savings. Higher energy savings can be achieved by utilizing the low-power InO cores more frequently. This thesis offers better trade-offs between energy efficiency and performance through architectures that harness heterogeneity to improve InO performance, and thus InO utilization.
This thesis observes that the majority of the dynamically reordered instruction schedules that give the OoO core its performance advantage tend to be repetitive. Since schedule generation consumes significant energy, it is wasteful to recreate identical schedules on the OoO core. Moreover, the thesis finds that the InO core, with some modifications, can attain near-OoO performance by executing dynamically reordered code. The proposed DynaMOS architecture exploits heterogeneity by creating optimized schedules on the OoO core and reusing them on the InO core, thereby achieving high performance at a fraction of the OoO core's energy cost. The thesis further proposes the Mirage multicore architecture, which allows schedule migration between one OoO core and many InO cores, increasing system throughput within the same energy and area budget. Finally, this thesis describes scheduling techniques that identify repeatable low-performance application phases at the granularity of a few hundred instructions. By predicting the onset of a low-performance phase, the scheduler can preemptively migrate to the InO core for higher energy savings.
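To illustrate the schedule-reuse idea behind DynaMOS, the following is a minimal, purely hypothetical Python sketch of a memoization controller: a recurring trace executes once on the OoO core, its dynamically generated schedule is recorded, and later occurrences replay that schedule on the low-power InO core. All class, method, and parameter names here are illustrative assumptions, not the thesis's actual hardware mechanisms, which involve trace detection, schedule caching, and register-state management well beyond this sketch.

```python
from collections import defaultdict


class ScheduleMemoizationController:
    """Hypothetical controller: decide whether a recurring trace can replay
    its recorded OoO schedule on the InO core, or must run on the OoO core."""

    def __init__(self, reuse_threshold=2):
        self.schedule_cache = {}            # trace id -> recorded instruction order
        self.occurrences = defaultdict(int) # how often each trace has been seen
        self.reuse_threshold = reuse_threshold

    def on_trace(self, trace_id, ooo_schedule=None):
        """Return (core, schedule) for this occurrence of the trace."""
        self.occurrences[trace_id] += 1

        # If a schedule was recorded earlier and the trace is repetitive,
        # replay the memoized schedule on the low-power InO core.
        if (trace_id in self.schedule_cache
                and self.occurrences[trace_id] >= self.reuse_threshold):
            return ("InO", self.schedule_cache[trace_id])

        # Otherwise run on the OoO core and record the dynamically generated
        # schedule so that future occurrences can reuse it.
        if ooo_schedule is not None:
            self.schedule_cache[trace_id] = ooo_schedule
        return ("OoO", ooo_schedule)


# Illustrative use: a loop body (trace 7) runs once on the OoO core, which
# records its reordered schedule; later iterations replay it on the InO core.
ctrl = ScheduleMemoizationController()
print(ctrl.on_trace(7, ooo_schedule=["ld r1", "ld r2", "add r3", "st r3"]))  # ('OoO', ...)
print(ctrl.on_trace(7))                                                      # ('InO', ...)
print(ctrl.on_trace(7))                                                      # ('InO', ...)
```

The same controller shape suggests how the phase-prediction scheduler differs: rather than waiting for a trace to recur, it would predict an oncoming low-performance phase and migrate to the InO core before the phase begins; that predictive logic is not captured in this sketch.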