Dissertation Defense

Enabling Program Analysis Through Deterministic Replay and Optimistic Hybrid Analysis

David Devecsery

As software continues to evolve, software systems increase in complexity. With software
systems composed of many distinct but interacting components, today's system program-
mers, users, and administrators find themselves requiring automated ways to find, under-
stand, and handle system mis-behavior. Recent information breaches such as the Equifax
breach of 2017, and the Heartbleed vulnerability of 2014 show the need to understand and
debug prior states of computer systems.
In this thesis I focus on enabling practical entire-system retroactive analysis, allowing
programmers, users, and system administrators to diagnose and understand the impact of
these devastating mishaps. I focus primarly on two techniques. First, I discuss a novel
deterministic record, and replay system which enables fast, practical recollection of entire
systems of computer state. Second, I discuss optimistic hybrid analysis, a novel optimiza-
tion method capable of dramatically accelerating retroactive program analysis.
Record and replay systems greatly aid in solving a variety of problems, such as fault
tolerance, forensic analysis, and information providence. These solutions, however, assume
ubiquitous recording of any application which may have a problem. Current record and
replay systems are forced to trade-off between disk space and replay speed. This trade-off
viihas historically made it impractical to both record and replay large histories of system-level
computation. I present Arnold, a novel record and replay system which efficiently records
and replays years of computation on a commodity hard-drive. Arnold combines caching
with a unique process-group granularity of recording to produce both small, and quickly
recalled recordings. My experiments show that under a desktop workload, Arnold could
store 4 years of computation on a commodity 4TB hard drive.
Dynamic analysis is used to retroactively identify and address many forms of system
mis-behaviors including: programming errors, data-races, private information leakage, and
memory errors. Unfortunately, the runtime overhead of dynamic analysis has precluded
its adoption in many instances. I present a new dynamic analysis methodology called
optimistic hybrid analysis (OHA). OHA uses knowledge of the past to predict program
behaviors in the future. These predictions, or likely invariants are speculatively assumed
true by a static analysis. This creates a static analysis which can be far more accurate than
its traditional counterpart. Once this predicated static analysis is created, it is speculatively
used to optimize a final dynamic analysis, creating a far more efficient dynamic analysis
than otherwise possible. I demonstrate the effectiveness of OHA by creating an optimistic
hybrid backward slicer, OptSlice, and optimistic data-race detector OptFT. OptSlice and
OptFT are just as accurate as their traditional hybrid counterparts, but run on average 8.3x
and 1.6x faster respectively.
In this thesis I demonstrate that Arnold's ability to record and replay entire computer
systems, combined with optimistic hybrid analysis's ability to quickly analyze prior com-
putation, enable a practical and useful entire system retroactive analysis that has been pre-
viously unrealized.

Sponsored by

Peter Chen