Enabling Fairness in Cloud Computing Infrastructures
Cloud-scale datacenter management systems use techniques such as virtualization and containerization to provide performance isolation while maximizing utilization of the underlying hardware. However, these techniques do not provide complete performance isolation, because VMs and containers still compete for non-reservable shared resources such as caches, network, and I/O bandwidth. The resulting degradation in application performance is especially challenging to address in datacenters housing tens of thousands of VMs. Addressing this problem in production datacenters requires a non-intrusive, scalable solution that 1) detects performance intrusion, 2) identifies both the intrusive VMs/containers causing the interference and the resource(s) for which they are competing, 3) estimates the magnitude of the performance intrusion, and 4) mitigates the performance degradation of applications suffering from QoS violations. In this context, we present solutions that detect performance-intrusive VMs and identify the root causes of interference from among the arbitrary VMs and containers running in shared datacenters, across four key hardware resources: network, I/O, cache, and CPU. We use robust statistical approaches that require no special profiling phase, in stark contrast to a wide body of prior work that assumes application-level information is acquired before execution. By detecting performance degradation and identifying the root-cause VMs and containers and their metrics, we improve the performance of applications executing in large-scale datacenters.