CloudLab Hardware

CloudLab will be a distributed infrastructure, with clusters at three sites. (We hope to add more in the future.) Each site will be a variation on a “reference” architecture. The reference architecture comprises approximately 5,000 cores and 300–500 terabytes of storage in the latest virtualization-capable hardware. CloudLab will provide 2x 10 Gbps network interfaces to every node via software-defined networking (at least OpenFlow; we hope to provide other SDN technologies as well). A 100 Gbps full-mesh SDN interconnect lets researchers instantiate a wide range of in-cluster experimental topologies, e.g., fat trees, rings, and hypercubes. Each site will leverage CC-NIE infrastructure to provide at least one connection to AL2S, the SDN-based 100 Gbps network that is part of Internet2's Innovation Platform; this will enable high-speed, end-to-end SDN between all CloudLab sites. CloudLab will provide two major types of storage: per-server storage (a mix of high-performance flash and high-capacity magnetic disks, at a ratio of about one disk per four cores) and a centralized storage system. This mix enables a range of experiments with file systems, storage technologies, and big data, while providing convenient, reliable file systems to researchers who are not interested in storage experiments. Our reference infrastructure is sized to enable valuable experiments (dozens of concurrent small experiments, or a few medium-to-large ones).
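
To make the in-cluster topologies concrete, here is a minimal sketch of a profile written with geni-lib, the Python library used to describe GENI/CloudLab experiments; it requests a small ring of bare-metal nodes, using two experiment interfaces per node (matching the dual 10 Gbps NICs described above). The module names follow the standard geni.portal API; the node names and ring size are placeholders, not part of the reference architecture.

```python
# Sketch of a geni-lib profile requesting a ring topology; node names and
# the ring size are placeholders.
import geni.portal as portal

pc = portal.Context()
request = pc.makeRequestRSpec()

RING_SIZE = 4
nodes = [request.RawPC("node%d" % i) for i in range(RING_SIZE)]

# Connect node i to node (i+1) mod N with a dedicated point-to-point link,
# consuming one of the two 10 Gbps experiment interfaces on each node.
for i in range(RING_SIZE):
    link = request.Link("ring-link-%d" % i)
    link.addInterface(nodes[i].addInterface("if-right-%d" % i))
    link.addInterface(nodes[(i + 1) % RING_SIZE].addInterface("if-left-%d" % i))

pc.printRequestRSpec(request)
```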

Diverse

We are building CloudLab with a different commercial partner at each site. This gives us diversity: while sharing a common reference architecture, each site has a slightly different focus and implementation of similar concepts. This means that researchers can evaluate whether the behavior of their systems is tightly bound to a particular realization of the architecture, or whether their findings are more universal. It also ensures that the needs of particular research communities (for example, storage or green computing) are specifically addressed by at least one cluster.

Open and interoperable

CloudLab will be federated with a wealth of existing research infrastructure, giving users access to a diverse set of hardware resources at dozens of locations. It will be a member of the GENI federation, meaning that GENI users can access CloudLab with their existing accounts, and CloudLab users have access to all of the hardware resources federated with GENI.

CloudLab sites interconnect with each other via IP and Layer-2 links to regional/national research networks using techniques now being adopted by many campuses under the NSF CC-NIE/CC-IIE program. Thus CloudLab experiments can also connect at Layer-2 to the core GENI Network, US Ignite cities, and advanced HPC clusters across the US. A single experiment can span all of these resources: in addition to CloudLab's own clusters, it might include GENI Racks (small clusters distributed across the United States), local fiber in a US Ignite city (cities with advanced high-speed municipal networks), and cyber-physical systems such as the UMASS CASA distributed weather radar system.
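
Because the sites are stitched together at Layer 2, a single profile can also place nodes at more than one cluster. The sketch below shows one plausible way to express this in geni-lib by binding each node to a site's aggregate via its component manager URN; the specific URNs are assumptions (typical CloudLab values) and do not come from the text above.

```python
# Sketch: a two-site experiment with one LAN spanning both clusters.
# The component-manager URNs below are assumed, not taken from this page.
import geni.portal as portal

pc = portal.Context()
request = pc.makeRequestRSpec()

SITES = {
    "utah": "urn:publicid:IDN+utah.cloudlab.us+authority+cm",
    "wisc": "urn:publicid:IDN+wisc.cloudlab.us+authority+cm",
}

lan = request.LAN("cross-site-lan")
for name, cm_urn in SITES.items():
    node = request.RawPC("node-%s" % name)
    node.component_manager_id = cm_urn        # pin this node to one site
    lan.addInterface(node.addInterface("if0"))

pc.printRequestRSpec(request)
```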

[Map: CloudLab sites and other GENI sites]

Utah (HP)

This hardware is now available for use! It represents the first half of the Utah cluster; the remainder will be built in about a year.

The University of Utah is partnering with HP to build a cluster with 64-bit ARM processors and OpenFlow 1.3 support throughout. This cluster will consist of 7 HP Moonshot chassis, each containing 45 eight-core ARM servers (315 servers and 2,520 cores in total); each server has 64 GB of RAM (20 TB total) and 120 GB of SATA flash storage (38 TB total). Each chassis has two "top of rack" (ToR) switches, and each server has two 10 Gb NICs, one connected to each of the ToRs. Each ToR has 4x 40 Gb of uplink capacity to a large core switch, for a total of 900 Gbps of connectivity within the chassis and 320 Gbps of connectivity to the core. One allocation option will be an entire chassis at a time; when allocated this way, the user will have complete administrative access to the ToR switches in addition to the nodes. Users allocating entire chassis will also be given administrative access to a "slice" of the core switch using MDC (Multitenant Device Context), which gives the user a complete virtual switch, including full control over layer 2 and 3 features and a dedicated OpenFlow datapath.
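
The per-chassis and cluster-wide figures above follow directly from the per-server numbers; the short calculation below simply re-derives them.

```python
# Re-deriving the Utah/HP totals quoted above from the per-server figures.
chassis = 7
servers_per_chassis = 45
cores_per_server = 8
ram_gb_per_server = 64
flash_gb_per_server = 120

servers = chassis * servers_per_chassis             # 315 servers
cores = servers * cores_per_server                  # 2,520 cores
ram_tb = servers * ram_gb_per_server / 1000.0       # ~20 TB of RAM
flash_tb = servers * flash_gb_per_server / 1000.0   # ~38 TB of flash

# Bandwidth within one chassis: 45 servers x 2 x 10 Gbps to the ToRs;
# uplink to the core: 2 ToRs x 4 x 40 Gbps.
in_chassis_gbps = servers_per_chassis * 2 * 10      # 900 Gbps
core_uplink_gbps = 2 * 4 * 40                       # 320 Gbps

print(servers, cores, ram_tb, flash_tb, in_chassis_gbps, core_uplink_gbps)
```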

The specifics of the hardware are:

m400 nodes: 45 per chassis, 315 total
  • Processor/Chipset: Applied Micro X-Gene system-on-chip
  • Eight 64-bit ARMv8 (Atlas/A57) cores at 2.4 GHz
  • MSlim cluster: 4 additional ARM A5 processors at 500 MHz
    • Security co-processor for encryption acceleration
    • Packet DMA engine with advanced options such as checksumming during DMA transfer
  • 64 GB of ECC RAM (8x 8 GB DDR3-1600 SO-DIMMs)
    • 4 memory channels
  • 120 GB of flash (SATA3 / M.2, Micron M500, hardware AES-256 encryption)
  • Dual-port Mellanox ConnectX-3 10 Gb NIC (connected via PCIe v3.0, 8 lanes)
    • Supports SR-IOV
    • Supports RoCE
    • TCP/UDP/IP offload
  • Rated at about 50 W per cartridge
  • Out-of-band management (serial console and power control) for each m400 individually through iLO
  • Power and temperature monitoring available for all nodes individually
45XGc switches: 2 per chassis, 14 total
  • 45 10Gb ports (one to each m400 in the chassis)
  • 4x40Gb ports (used for uplink to the core)
  • Latency under 1 microsecond (cut-through)
  • 128K entry MAC table, 16K entry IPv4 routing table
  • Supports DCB, FCoE, IPv4, IPv6, TRILL, 802.1Qbg
  • QoS with eight traffic classes
  • Runs Comware 7
  • OpenFlow 1.3.1 supported
  • 9 MB packet buffers
  • Connected to m400s via in-chassis traces
Core switch: HP 12910
  • 4 24-port 40 Gb FC-series linecards (96 ports total) (JG889A)
  • 5 "Type B" 3.84 Tbps fabric modules
  • 12K TCAM entries
  • Latency: 6 to 16 microseconds (store and forward)
  • VOQ queuing
  • Runs Comware 7
  • OpenFlow 1.3 (coming soon)
  • Each Moonshot switch is connected to its own ASIC on the linecard
  • 4GB packet buffer per ASIC
  • Connected to the Moonshot switches via copper direct-attach cables
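
To target these specific machines from a profile, a node can be pinned to a hardware type; the sketch below assumes the scheduler exposes these cartridges under the type name "m400" (matching the node name used above).

```python
# Sketch: requesting one of the HP Moonshot ARM cartridges by hardware type.
# The "m400" type string is assumed to match the node name used above.
import geni.portal as portal

pc = portal.Context()
request = pc.makeRequestRSpec()

node = request.RawPC("arm-node")
node.hardware_type = "m400"      # pin the request to the ARM64 m400 nodes
pc.printRequestRSpec(request)
```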

Wisconsin (Cisco)

This hardware is now available for use! It represents the first half of the Wisconsin cluster; the remainder will be built in about a year.

The University of Wisconsin-Madison has partnered with Cisco Systems to build a powerful and diverse cluster that closely reflects the technology and architecture used in modern commercial data centers. The initial cluster will have 100 servers with a total of 1,600 cores connected in a Clos fat-tree topology. Future acquisitions in 2015 and 2016 will grow the system to at least 240 servers. The servers are currently broken into two categories, each offering different capabilities and enabling different types of cloud experiments. In the initial cluster all servers will have the same CPU (2x 8 cores at 2.4 GHz), RAM (128 GB), and network (2x 10 Gbps to the ToR) configuration, but will differ in their storage configurations. Each of the ninety servers in the first category will have 2x 1.2 TB disks. We expect these to be used for experimenting with exciting new cloud architectures and paradigms, management frameworks, and applications. Each of the ten servers in the second category will have a larger number of slower disks (1x 1 TB plus 12x 3 TB donated by Seagate). This category is targeted toward supporting experiments that stress storage throughput. Each server will also have a 480 GB SSD to enable sophisticated experiments that explore storage hierarchies in the cloud.
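
For experiments that need extra scratch space rather than raw control over the disks, a profile can also carve local block storage out of a node. Below is a minimal sketch, assuming geni-lib's Blockstore abstraction; the size and mount point are placeholders, not tied to the Wisconsin hardware described above.

```python
# Sketch: asking for local scratch storage on one node, e.g. to stage data
# for a storage or Hadoop-style experiment. Size and mount point are
# placeholders.
import geni.portal as portal

pc = portal.Context()
request = pc.makeRequestRSpec()

node = request.RawPC("storage-node")
bs = node.Blockstore("bs0", "/mydata")   # local blockstore mounted at /mydata
bs.size = "200GB"                        # allocated from the node's local disks
pc.printRequestRSpec(request)
```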

The servers use Nexus switches from Cisco for top-of-rack (ToR) switching. Each ToR Nexus is connected to six spine switches via dedicated 40 Gbps links (the resulting leaf oversubscription is worked out after the hardware details below). Each spine will connect via a 40 Gbps link to a Nexus WAN switch for campus connectivity to Internet2 and the other two CloudLab facilities. We selected the Cisco Nexus series because it offers several unique features that enable broad and deep instrumentation, as well as a wide variety of cloud networking experiments. Examples of these features include OpenFlow 1.0; monitoring instantaneous queue lengths in individual ports; tracing control-plane actions at fine time scales; and support for a wide range of routing protocols.

The specific details of the hardware are as follows:

Compute/Storage: 90x Cisco UCS SFF 220 M4, 10x Cisco UCS LFF 240 M4 nodes: 100 total

  • Chipset: Intel C610 series chipset
  • Processor: 2x Intel E5-2630 v3 85W 8C at 2.40 GHz for a total of 16 cores (plus Hyper-Threading)
    • Haswell architecture with EM64T instruction set
    • Each core has 32KB L1 instruction and data caches
    • Each core has a 256KB L2 cache
    • All cores share a 20MB L3 cache
    • Support for VT-x/VT-d virtualization, AVX2 vector and SSE4.2 SIMD extensions, and AES instructions
  • RAM: 128 GB of ECC RAM (8x 16 GB DDR4 2133 MHz PC4-17000 dual rank RDIMMs)
    • 4 memory channels
  • Storage: 525 TB Total
    • Each 220 M4 node has the following disk configuration, for a total of 150 TB:
      • 2x 1.2 TB 10K RPM 6G SAS SFF HDD
      • 1x 480 GB 6G SAS SSD
    • Each 240 M4 node has the following disk configuration, for a total of 375 TB:
      • 1x 1 TB 7.2K RPM SAS 3.5" HDD
      • 1x 480 GB 6G SAS SSD
      • 12x 3 TB 3.5" HDD donated by Seagate
  • Network: Each 220 M4 and 240 M4 has:
    • Cisco UCS VIC1227 VIC MLOM - Dual Port 10Gb SFP+ (connected via PCIe v3.0, 8 lanes)
    • Supports SR-IOV, VMQ, Netqueue and up to 256 virtual adapters.
    • Onboard Intel i350 1Gb
  • Out-of-band management (serial console and power control) for each node individually through mLOM
  • Power and temperature monitoring available for all nodes individually

Network

  • Cisco Nexus C3172PQ Leaf Switch
    • 48 SFP+ 10G Ethernet Ports
    • 6 QSFP 40G Ethernet Ports
    • 1.4-Tbps switching capacity
    • Forwarding rate of up to 1 Bpps
    • Ultra-low latency cut-through switching technology
  • Cisco Nexus C3132Q Spine Switch
    • 32 QSFP 40G Ethernet Ports
    • 2.5 Tbps switching capacity
    • Forwarding rate up to 1.4 Bpps
    • Ultra-low latency cut-through switching technology
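
Given the leaf port counts above (48x 10G server-facing ports and 6x 40G uplinks per leaf), the worst-case oversubscription at the leaf layer falls out directly; the calculation below assumes every server-facing port is populated.

```python
# Leaf-layer oversubscription for the Wisconsin fabric, assuming all 48
# 10G server-facing ports are in use and all six 40G uplinks go to spines.
downlink_gbps = 48 * 10          # 480 Gbps of server-facing capacity per leaf
uplink_gbps = 6 * 40             # 240 Gbps of spine-facing capacity per leaf

print(downlink_gbps / uplink_gbps)   # 2.0 -> a 2:1 oversubscribed leaf layer
```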

Clemson (Dell)

The Clemson system, developed in close cooperation with Dell, will have three major components: bulk block storage, low-density storage for MapReduce/Hadoop-like computing, and generic VM nodes used to provision virtual machines. All nodes have 16 cores (2 CPUs), one on-board 1 Gb Ethernet port, and a dual-port 56/40/10 Gbps card. Bulk storage nodes will provide block-level services to all nodes over a dedicated 10 Gbps Ethernet network. Each storage node will have 12x 4 TB disk drives plus 8x 1 TB disks, and is configured with 256 GB of memory. Hadoop nodes have 4x 1 TB disks and 256 GB of memory. VM nodes have 256 GB of memory each. The large memory configuration reflects the need for significant memory in today's VMs and allows us to increase performance by reducing paging/swapping in the VMs.
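
The raw per-node capacities implied by those disk counts are straightforward to total up; the snippet below just does the arithmetic for one node of each type.

```python
# Raw per-node capacity implied by the Clemson disk counts above.
storage_node_tb = 12 * 4 + 8 * 1     # 56 TB raw per bulk-storage node
hadoop_node_tb = 4 * 1               # 4 TB raw per Hadoop node
print(storage_node_tb, hadoop_node_tb)
```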

The focus of this system will be on provisioning significantly sized environments that can be linked to national and international resources. It will also be able to connect directly to Clemson's Condo HPC system, which has nearly 2,000 nodes. Interactions between bare-metal systems in the Condo cluster and VMs in the cloud system will allow prototyping of next-generation HPC and CS environments in an SDN-enabled network.

CloudLab users will have access to all hardware that is federated with the GENI testbed. This comprises thousands of cores and hundreds of terabytes of storage, spread across dozens of sites around the country. This will help CloudLab users build highly distributed infrastructures, such as CDN-type services, and it will enable applications that require low latency to end users and devices. This federation also provides access to a variety of resources that go beyond simple clusters, such as campus-scale wireless networks.

Most of this equipment is interconnected (and connected to CloudLab) through a programmable layer 2 network.