Dissertation Defense

Dependable Computing On Inexact Hardware Through Anomaly Detection

Daya Shanker Khudia

Transistor reliability is declining as transistor sizes continue to shrink, and aggressive voltage scaling makes the problem worse. Scaled-down transistors are more susceptible to transient faults as well as permanent in-field hardware failures. To reap the benefits of continued technology scaling, the decreasing reliability of devices must be addressed at very low cost for the mainstream commodity market. At the same time, scaling yields diminishing marginal returns in energy efficiency and performance. More than at any other time in its history, the semiconductor industry stands at the crossroads of growing unreliability and the need to improve energy efficiency.

These challenges of technology scaling can be tackled by dividing target applications into two categories: traditional applications, which have relatively strict correctness requirements on their outputs, and an emerging class of soft applications from domains such as multimedia, machine learning, and computer vision, which inherently tolerate inaccuracy to a certain degree. Traditional applications can be protected against hardware failures by low-cost detection and protection methods, while soft applications can trade off output quality for better performance or energy efficiency.

For traditional applications, I propose efficient software-only application analysis and transformation solutions to detect transient faults in data flow and control flow. The data flow solution draws its intelligence from dynamic application information such as control flow, memory, and value profiling. The control flow protection technique achieves its efficiency by simplifying the signature calculations in each basic block and by performing checks at a coarse-grained level. For soft applications, I develop a quality control technique for cases where these applications run on an approximation accelerator. The solution employs continuous lightweight checking to ensure that the approximation is controlled and the application output is acceptable. Overall, I have developed efficient and practical solutions that produce dependable results on commodity systems built from inexact hardware.
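
As a rough illustration of the continuous lightweight checking idea for soft applications, the sketch below runs an approximate routine, applies a cheap check to each result, and falls back to exact computation only when the check flags an anomaly. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names (run_with_quality_control, approx_sqrt, residual), the square-root workload, and the threshold value are all hypothetical and chosen only to keep the example small.

```python
import math
from typing import Callable, List, Tuple


def run_with_quality_control(
    inputs: List[float],
    approx_fn: Callable[[float], float],
    exact_fn: Callable[[float], float],
    error_estimate: Callable[[float, float], float],
    threshold: float,
) -> Tuple[List[float], int]:
    """Run approx_fn on every input and re-execute exactly when the check fires.

    Hypothetical helper: error_estimate must be much cheaper than exact_fn
    for the scheme to pay off.
    """
    outputs: List[float] = []
    recomputed = 0
    for x in inputs:
        y = approx_fn(x)
        # Lightweight check: estimate the error without computing the exact answer.
        if error_estimate(x, y) > threshold:
            y = exact_fn(x)  # Anomaly detected, fall back to exact execution.
            recomputed += 1
        outputs.append(y)
    return outputs, recomputed


def approx_sqrt(x: float) -> float:
    """Stand-in for an approximation accelerator: linear fit of sqrt near x = 1."""
    return 0.5 * x + 0.5


def residual(x: float, y: float) -> float:
    """Cheap check for sqrt: one multiply and one subtract instead of a sqrt."""
    return abs(y * y - x)


if __name__ == "__main__":
    outs, fixes = run_with_quality_control(
        [1.0, 1.21, 4.0, 9.0], approx_sqrt, math.sqrt, residual, threshold=0.1
    )
    print(outs, fixes)
```

In this toy setting the inputs near 1 pass the check and keep their approximate results, while the inputs far from 1 are detected and recomputed. The threshold sets the trade-off: a looser value keeps more approximate results (more savings, lower quality), and a tighter value triggers more exact re-execution.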

Sponsored by

Scott Mahlke

Faculty Host

Scott Mahlke