Hyperscale Data Centers Driving Next-Generation 400G Ethernet Interconnects

Rita Horner, Senior Technical Marketing Manager, Synopsys

The exponential growth of data traffic due to smartphones, media applications, video streaming, and the broad range of connected devices has driven the construction of high-capacity hyperscale data centers that can quickly scale and respond to these dynamic workloads.

According to Cisco’s 2018 Global Cloud Index Forecast, hyperscale data centers are projected to grow to 628 centers by 2021 (up from 338 centers at the end of 2016), representing 53% of all installed data center servers. The report also notes that by 2021, traffic within hyperscale data centers will account for 55% of the total traffic within all data centers, and 94% of all workloads will be processed by cloud data centers versus only 6% by traditional data centers.

As hyperscale data centers transition to faster, flatter, and more scalable network architectures, such as the two-tier leaf-spine architecture shown in Figure 1, the need for higher bandwidth with efficient connectivity increases.

Figure 1: Leaf-and-spine architecture

The leaf-spine architecture requires massive interconnects, as each leaf switch fans out to every spine switch, maximizing connectivity between servers. Hardware accelerators, artificial intelligence, and deep learning functions in data centers all consume high bandwidth, forcing high-end data centers to quickly move to next-generation interconnects operating at higher data rates. For this reason, the majority of hyperscale data centers using 100 Gb/s Ethernet links will need to transition to 200 Gb/s and 400 Gb/s Ethernet links to achieve higher throughput.

The move toward 400 Gb/s Ethernet promises both power and area savings, as 400 Gb/s optical modules are expected to consume only about 2.5x the power of 100 Gb/s modules while maintaining the same small form factors, increasing interconnect density.
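
Scaled per bit, the savings are substantial: four times the bandwidth for roughly 2.5 times the power means each bit costs about 0.625x the energy. A quick back-of-the-envelope check, where the 2.5x ratio comes from this article and the absolute 100 Gb/s module power is an illustrative assumption only:

# Rough power-per-bit comparison for 100G vs. 400G optical modules.
# The 2.5x relative power figure comes from the article; the absolute
# 100G module power (4.5 W) is an illustrative assumption.

P_100G_W = 4.5                # assumed 100 Gb/s module power (illustrative)
P_400G_W = 2.5 * P_100G_W     # 400 Gb/s module at ~2.5x the power

pj_per_bit_100g = P_100G_W / 100e9 * 1e12   # picojoules per bit
pj_per_bit_400g = P_400G_W / 400e9 * 1e12

print(f"100G: {pj_per_bit_100g:.1f} pJ/bit")                              # ~45 pJ/bit
print(f"400G: {pj_per_bit_400g:.1f} pJ/bit")                              # ~28 pJ/bit
print(f"Energy-per-bit ratio: {pj_per_bit_400g / pj_per_bit_100g:.3f}")   # 0.625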

With the completion of the 200 Gb/s and 400 Gb/s IEEE 802.3bs specification (now part of the IEEE 802.3-2018 release) and of the 50 Gb/s, 100 Gb/s, and 200 Gb/s IEEE 802.3cd specifications, hyperscale data centers will start moving to 200/400 Gb/s interconnects by the end of 2018. Given the recently announced vendor demonstrations and the expected availability of components and optical modules, this transition is inevitable.

As shown in Figure 2, the majority of 100/200/400 Gb/s Ethernet links are based on multi-lane 25 Gb/s or 50 Gb/s electrical interfaces. Next-generation 100/200/400 Gb/s Ethernet rates will be based on the new 100 Gb/s serial specification that the IEEE 802.3ck working group is defining, enabling Ethernet beyond 400 Gb/s.

Figure 2: Evolution of Ethernet speeds
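
The lane arithmetic behind Figure 2 is straightforward: the aggregate rate is the lane count multiplied by the per-lane electrical rate. A short sketch summarizing the interfaces discussed in this article (illustrative, not an exhaustive list of IEEE 802.3 variants):

# Aggregate Ethernet rate = lane count x per-lane electrical rate.
# The entries reflect interfaces discussed in this article; they are
# illustrative, not an exhaustive list of IEEE 802.3 variants.

generations = [
    # (label, lanes, per-lane rate in Gb/s)
    ("100GbE over 25G lanes",     4, 25),
    ("50GbE  (50GAUI-1)",         1, 50),
    ("100GbE (100GAUI-2)",        2, 50),
    ("200GbE (200GAUI-4)",        4, 50),
    ("400GbE (400GAUI-8)",        8, 50),
    ("Next-gen 400GbE (802.3ck)", 4, 100),   # 100 Gb/s serial lanes
]

for label, lanes, lane_rate in generations:
    print(f"{label:28s} {lanes} x {lane_rate:>3} Gb/s = {lanes * lane_rate} Gb/s")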

As 100 Gb/s Ethernet matures in hyperscale data centers, the cost-sensitive enterprise data center ecosystem will start taking advantage of current-generation technologies and begin its own transition from today’s 10 Gb/s and 40 Gb/s links to 100 Gb/s and higher rates.

Network engineers can choose interconnects based on application space, length requirements, density, form factor, power consumption, and available products. This article describes the different variants of 50 Gb/s, 100 Gb/s, 200 Gb/s, and 400 Gb/s interconnects that are based on single- and multi-lane 50 Gb/s Ethernet, including the following (a simple reach-based selection sketch follows the list):

  • Electrical chip-to-chip and chip-to-module interfaces for connection to optical modules: multi-mode fiber (MMF) covers rack-to-rack interconnect lengths of a few hundred meters, while longer, kilometer (km)-range connections between buildings and smaller data centers can only be supported with single-mode fiber (SMF)
  • Shielded balanced copper cabling for short distances of a few meters (m), targeting box-to-box and intra-rack connections
  • Electrical backplanes for chassis-based systems, for interconnects within the chassis
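
These media classes are distinguished mainly by reach. The following minimal sketch (a hypothetical helper, not part of any standard or product) shows how a reach requirement maps onto the categories above; the thresholds simply restate the reaches cited in this article.

# Hypothetical helper: choose an interconnect category from the required reach.
# Thresholds mirror the reaches cited in this article (in-chassis backplane,
# ~3 m copper within a rack, MMF to roughly 100 m, SMF for longer links);
# they are illustrative guidelines, not normative limits.

def select_interconnect(reach_m: float, in_chassis: bool = False) -> str:
    if in_chassis:
        return "Electrical backplane (e.g., 50/100/200GBASE-KR variants)"
    if reach_m <= 3:
        return "Shielded twinaxial copper cable (e.g., 50/100/200GBASE-CR variants)"
    if reach_m <= 100:
        return "Optical module over multi-mode fiber (e.g., -SR port types)"
    return "Optical module over single-mode fiber (e.g., -DR/-FR/-LR port types)"

print(select_interconnect(2))      # intra-rack: copper cable
print(select_interconnect(80))     # rack-to-rack: MMF optics
print(select_interconnect(2000))   # between buildings: SMF optics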

Chip-to-Chip and Chip-to-Module Interconnects

Chip-to-chip (C2C) and chip-to-module (C2M) are the simplest forms of interconnects, consisting of a short printed circuit board (PCB) trace with one or no connector. The chip-to-chip electrical interface connects two ICs on the same PCB, while the chip-to-module interface connects a port ASIC to a module device containing a signal-conditioning IC, such as a retimer, on a separate PCB.

The IEEE 802.3 has defined attachment unit interfaces (AUIs) based on 50 Gb/s per-lane electrical signaling for different types of optical modules. Depending on the interconnect length and throughput requirements, a network implementer may select different chip-to-module interfaces for connection to an optical module. For example:


50GAUI-1 is a single-lane C2M interface to:

  • 50GBASE-SR: supports 50 Gb/s serial transmission over a single lane (total of two fibers) of MMF cabling with a reach of at least 100 m
  • 50GBASE-FR: supports 50 Gb/s serial transmission over one-wavelength SMF cabling with a reach of at least 2 km
  • 50GBASE-LR: supports 50 Gb/s serial transmission over one-wavelength SMF cabling with a reach of at least 10 km

100GAUI-2 is a two-lane C2M interface to:

  • 100GBASE-SR2: supports 100 Gb/s transmission over two lanes (total of four fibers) of MMF cabling with a reach of at least 100 m
  • 100GBASE-DR: supports 100 Gb/s serial transmission over one-wavelength duplex SMF cabling with a reach of at least 500 m
     

200GAUI-4 is a four-lane C2M interface to: 

  • 200GBASE-SR4: supports 200 Gb/s transmission over four lanes (total of eight fibers) of MMF cabling with a reach of at least 100 m
  • 200GBASE-DR4: supports 200 Gb/s transmission over four lanes of SMF cabling with a reach of at least 500 m
  • 200GBASE-FR4: supports 200 Gb/s transmission over four wavelength division multiplexed (WDM) lanes (total of two fibers) of SMF cabling with a reach of at least 2 km
  • 200GBASE-LR4: supports 200 Gb/s transmission over four WDM lanes of SMF cabling with a reach of at least 10 km
     

400GAUI-8 is an eight-lane C2M interface that includes:

  • 400GBASE-FR8: supports 400 Gb/s transmission over eight WDM lanes of SMF cabling with a reach of at least 2 km
  • 400GBASE-LR8: supports 400 Gb/s transmission over eight WDM lanes of SMF cabling with a reach of at least 10 km
  • 400GBASE-DR4: supports 400 Gb/s transmission over four lanes of SMF cabling with a reach of at least 500 m
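
The port-type names above follow a regular pattern: the numeric prefix gives the aggregate rate, the two-letter suffix encodes the reach class listed in the bullets (SR: MMF, at least 100 m; DR: SMF, at least 500 m; FR: SMF, at least 2 km; LR: SMF, at least 10 km), and a trailing digit, when present, gives the lane or wavelength count. A minimal decoding sketch, purely for illustration and not a complete parser of IEEE nomenclature:

# Illustrative decoder for the IEEE 802.3 port-type names listed above.
# Reach classes restate the bullets in this article; this is a sketch,
# not a complete or normative parser of IEEE naming.
import re

REACH_CLASS = {
    "SR": "MMF, reach of at least 100 m",
    "DR": "SMF, reach of at least 500 m",
    "FR": "SMF, reach of at least 2 km",
    "LR": "SMF, reach of at least 10 km",
}

def decode(port_type: str):
    m = re.fullmatch(r"(\d+)GBASE-([A-Z]{2})(\d*)", port_type)
    rate_gbps = int(m.group(1))
    reach = REACH_CLASS[m.group(2)]
    lanes = int(m.group(3)) if m.group(3) else 1
    return rate_gbps, lanes, reach

for name in ["50GBASE-FR", "100GBASE-DR", "200GBASE-SR4", "400GBASE-DR4"]:
    rate, lanes, reach = decode(name)
    print(f"{name}: {rate} Gb/s over {lanes} lane(s)/wavelength(s), {reach}")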


There are many optical module form factors used in Ethernet links. Ideally, higher-rate modules use the same form factors as previous generations for backward compatibility and ease of adoption. In the past, however, larger form-factor modules were introduced to the market first; as technologies matured and form factors were optimized, they were replaced by smaller, denser, and lower-cost alternatives. This was observed with the adoption of 25 Gb/s as well as 10 Gb/s Ethernet links, where the initial bulky modules were replaced by much smaller pluggable (SFP+) modules.

The module form factors are defined by different multi-source agreements (MSAs). The higher-density pluggable optical transceivers (Figure 3) that support 400 Gb/s Ethernet links are:
 

  • QSFP-DD (Quad small form-factor pluggable, double density): supports eight lanes of 50 Gb/s PAM-4, providing solutions up to 400 Gb/s Ethernet
  • OSFP (Octal small form-factor pluggable): supports 8x50 Gb/s for 400 Gb/s Ethernet

Image courtesy of QSFP-DD: http://qsfp-dd.com; image courtesy of OSFP: http://osfpmsa.org

Figure 3: 400 Gb/s transceiver form factors - the QSFP-DD (left) and OSFP (right) 
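
Because both form factors preserve the footprint of earlier pluggables, moving to 400 Gb/s multiplies faceplate bandwidth rather than faceplate area. A rough illustration for a hypothetical 1U switch with 32 pluggable ports (the port count is an assumption for the example, not a figure from either MSA):

# Rough faceplate-bandwidth comparison for a hypothetical 1U, 32-port switch.
# The 32-port count is an illustrative assumption; the per-module rates
# follow the form factors described above.

ports_per_1u = 32   # assumed port count, for illustration only

for module, rate_gbps in [("QSFP28 (4 x 25 Gb/s)", 100),
                          ("QSFP-DD / OSFP (8 x 50 Gb/s)", 400)]:
    total_tbps = ports_per_1u * rate_gbps / 1000
    print(f"{module:30s} -> {total_tbps:.1f} Tb/s per rack unit")
# 100G modules: 3.2 Tb/s per RU; 400G modules: 12.8 Tb/s per RU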

Shielded Balanced Copper Cabling

In addition to the C2C and C2M portside interfaces for optical connectivity, the IEEE 802.3 has also defined the electrical specifications for 50 Gb/s, 100 Gb/s, and 200 Gb/s transmission over single- or multi-lane shielded twinaxial copper cabling:
 

  • 50GBASE-CR: 50 Gb/s transmission over one lane of shielded twinaxial copper cabling with a reach of at least 3 m
  • 100GBASE-CR2: 100 Gb/s transmission over two lanes of shielded twinaxial copper cabling with a reach of at least 3 m
  • 200GBASE-CR4: 200 Gb/s transmission over four lanes of shielded twinaxial copper cabling with a reach of at least 3 m


These short copper-cable interconnects are the most cost-effective cabling solution for connectivity within a rack. Such cables are used to connect servers to uplink switches, which may be mounted either at the top or in the middle of the rack to minimize interconnect lengths.

Electrical Backplane Interface

Larger, more complex switches and servers are chassis-based and need interconnects within the box. The IEEE 802.3cd defined the electrical specifications for 50 Gb/s, 100 Gb/s, and 200 Gb/s transmission across backplanes:

 

  • 50GBASE-KR: 50 Gb/s transmission over a one-lane backplane channel with total insertion loss of less than 30 dB at 13.28125 GHz
  • 100GBASE-KR2: 100 Gb/s transmission over a two-lane backplane channel with total insertion loss of less than 30 dB at 13.28125 GHz
  • 200GBASE-KR4: 200 Gb/s transmission over a four-lane backplane channel with total insertion loss of less than 30 dB at 13.28125 GHz
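
The 13.28125 GHz point at which the loss budget is specified is simply the Nyquist frequency of the 50 Gb/s PAM-4 signaling used on each lane, where the 53.125 Gb/s per-lane line rate includes FEC overhead:

# Where the 13.28125 GHz loss-budget frequency comes from:
# each 50 Gb/s lane runs PAM-4 (2 bits per symbol) at a 53.125 Gb/s
# line rate (including FEC overhead), and the insertion-loss budget is
# specified at the Nyquist frequency, i.e., half the symbol rate.

line_rate_gbps = 53.125   # per-lane line rate with FEC overhead
bits_per_symbol = 2       # PAM-4 carries 2 bits per symbol

baud_rate_gbd = line_rate_gbps / bits_per_symbol   # 26.5625 GBd
nyquist_ghz = baud_rate_gbd / 2                    # 13.28125 GHz

print(f"Symbol rate:       {baud_rate_gbd} GBd")
print(f"Nyquist frequency: {nyquist_ghz} GHz")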

Summary

The rising data rates required to process high-performance workloads such as deep learning and video streaming in hyperscale data centers are driving the need for faster, flatter, and more scalable network architectures operating at 400 Gb/s and beyond. The increased bandwidth demand is also driving changes in both Ethernet interconnects and PHY technologies. It is important for system and SoC designers to understand the characteristics of different types of interconnects and the PHY technologies for their target applications.

Designers need reliable interface IP that supports the different 50 Gb/s Ethernet electrical interfaces in a single, channel-agnostic PHY, verified and licensed from a single IP vendor. This combination provides flexibility, optimal cost, and ROI, as well as short time-to-market. Synopsys’ silicon-proven DesignWare 56G Ethernet PHY IP has the features and capabilities needed to drive and aggregate the different 50GE/100GE/200GE/400GE interfaces required for ASIC applications in hyperscale data centers.