Computer Engineering Seminar

Physically Dense Server Architectures

Anthony Thomas Gutierrez

Distributed, in-memory key-value stores have emerged as one of

today's most important data center workloads. Because they are critical to

the scalability of modern web services, vast resources are

dedicated solely to key-value stores to ensure that quality

of service guarantees are met. These resources include: many

server racks to store terabytes (possibly petabytes) of key-value

data, the power necessary to run all of the machines, networking

equipment and bandwidth, and the data center warehouses used to

house the racks.

There is, however, a mismatch between the key-value store

software and the commodity servers on which it is run, leading to

inefficient use of resources. The primary cause of this inefficiency

is the overhead incurred from processing individual network

packets, which typically carry small payloads of less than a few

kilobytes and require minimal compute resources. Thus, one of the

key challenges as we enter the peta-scale era is how best to adjust

to the paradigm shift from compute-centric data centers to

storage-centric data centers.

This dissertation presents a hardware/software solution that

addresses the inefficiency issues present in the modern data

centers on which key-value stores are currently deployed. First, it

proposes two physical server designs, both of which use

3D-stacking technology and low-power CPUs to improve density and

efficiency. The first 3D architecture, Mercury, consists of stacks

of low-power CPUs with 3D-stacked DRAM, as well as NICs.

The second architecture, Iridium, replaces DRAM with 3D

NAND Flash to improve density.

The second portion of this dissertation proposes an enhanced

version of the Mercury server design, called KeyVault, that

incorporates integrated, zero-copy network interfaces along with

an integrated switching fabric. In order to fully utilize the

integrated networking hardware, as well as reduce the response

time of requests, a custom networking protocol is proposed. Unlike

prior works on accelerating key-value stores (e.g., by completely

bypassing the CPU and OS when processing requests), this work

bypasses the CPU and OS only when placing network payloads

into a process's memory. The insight behind this is that, because

most of the overhead comes from processing packets in the OS

kernel (and not from the request processing itself), direct placement of

a packet's payload is sufficient to provide higher throughput and

lower latency than prior approaches. The need for complex

hardware or software is also eliminated.

Sponsored by

Professor Trevor N. Mudge