Computer Engineering Seminar
Physically Dense Server Architectures
Distributed, in-memory key-value stores have emerged as one of
today's most important data center workloads. Because they are
critical to the scalability of modern web services, vast resources are
dedicated solely to key-value stores to ensure that quality-of-service
guarantees are met. These resources include many
server racks to store terabytes (possibly petabytes) of key-value
data, the power necessary to run all of the machines, networking
equipment and bandwidth, and the data center warehouses used to
house the racks.
There is, however, a mismatch between the key-value store
software and the commodity servers on which it is run, leading to
inefficient use of resources. The primary cause of this inefficiency
is the overhead incurred from processing individual network
packets, which typically carry small payloads of less than a few
kilobytes and require minimal compute resources. Thus, one of the
key challenges as we enter the petascale era is how best to adjust
to the paradigm shift from compute-centric data centers to storage-
centric data centers.
This dissertation presents a hardware/software solution that
addresses the inefficiency issues present in the modern data
centers on which key-value stores are currently deployed. First, it
proposes two physical server designs, both of which use 3D-
stacking technology and low-power CPUs to improve density and
efficiency. The first 3D architecture, Mercury, consists of stacks
of low-power CPUs with 3D-stacked DRAM, as well as NICs.
The second architecture, Iridium, replaces DRAM with 3D
NAND Flash to improve density.
The second portion of this dissertation proposes an enhanced
version of the Mercury server design, called KeyVault, that
incorporates integrated, zero-copy network interfaces along with
an integrated switching fabric. In order to fully utilize the
integrated networking hardware, as well as reduce the response
time of requests, a custom networking protocol is proposed. Unlike
prior work on accelerating key-value stores (e.g., by completely
bypassing the CPU and OS when processing requests), this work
bypasses the CPU and OS only when placing network payloads
into a process's memory. The insight behind this is that because
most of the overhead comes from processing packets in the OS
kernel, and not from the request processing itself, direct placement of
a packet's payload is sufficient to provide higher throughput and
lower latency than prior approaches. The need for complex
hardware or software is also eliminated.