Mingchi Liu, Sr. Staff Technical Marketing Manager, Synopsys
Reducing power while simultaneously boosting performance is a major challenge at advanced process nodes. As process geometries shrink, dynamic and leakage power scale differently. However, innovations in process, IP, and system-level implementation are addressing this challenge. For example, IP designers are developing new foundation IP featuring long channel lengths, low VDDmin with read/write assist for SRAM, back bias, and multiple low-power modes. The latest innovation is embedded MRAM (eMRAM), which reduces both system-on-chip (SoC) and system power. This article explains how eMRAM can dramatically reduce power for SoC designs targeting 22nm and smaller.
Embedded Magnetoresistive Random Access Memory (eMRAM) differs from conventional embedded memories such as SRAM and Flash, which use electric charge to store information. eMRAM stores data using electron spin (i.e., "spintronics"). At the heart of eMRAM is a stack of ferromagnetic and nonmagnetic layers called a magnetic tunnel junction (MTJ). The MTJ holds its magnetic polarization virtually forever when unpowered, making eMRAM a type of non-volatile memory (NVM) alongside Flash, FeRAM, and EEPROM.
To system designers in the 1960s and 1970s, the standard way of thinking was “look to volatile memories when speed and density are important but power is not” and “look to non-volatile memories when low power is important but density and speed are not.” However, today’s memory landscape includes so many options that making such a strict distinction is no longer possible. The “ideal memory” for any given design can combine the strengths of many technologies. One such candidate is the next generation of eMRAM, based on spin-torque technology, known as STT-MRAM. eMRAM promises to combine the nonvolatility of Flash, the density of DRAM, the speed of SRAM, and write endurance not found in any other existing memory technology.
Several memory technologies are used in advanced-node SoC designs, including SRAM, eFlash, eMRAM, PCRAM, and ReRAM (Table 1). As Moore's law continues, eFlash development is slowing down and currently stops at 28nm. At 22nm and below, the only way to utilize Flash is die stacking, or system-in-package (SiP). eMRAM is a promising candidate to replace SRAM and flash ahead of PCRAM and ReRAM. Compared to SRAM, eMRAM provides smaller area, lower dynamic power, lower leakage, higher capacity, better radiation immunity, lower cost, and non-volatility. Compared to PCRAM and ReRAM, eMRAM has a simpler manufacturing process, longer endurance, and production-level yields. Compared to external flash, eMRAM offers a smaller system-level form factor, higher performance, longer battery life, an SRAM-like interface, a better user experience, shorter system design turnaround time (TAT), higher yield, predictable product cost, and a stable supply that avoids the shortages inherent to the flash market. Compared to embedded flash, eMRAM enables designs to align with Moore's law in advanced nodes from 22nm down to FinFET processes.
|  | SRAM | DRAM | FLASH (NAND) | PCRAM | RRAM | MRAM (STT-MRAM) |
| --- | --- | --- | --- | --- | --- | --- |
| Architecture | Planar | Discrete 3D | Monolithic 3D | Planar | Planar | Planar |
| Device | 6T | 1T/1C | 1T | 1T ǁ 1BJT/1R | 1T ǁ 1BJT/1R | 1T/1MTJ |
| Feature size | 7nm | 18nm | 19nm | 20nm | 27nm | 40nm |
| Cell size | 40-60F² | 6-8F² | 4F² | 4F² | 4-6F² | 8-14F² |
| Capacity | 16Mb | 16Gb/Die | 1Tb/Die | 16Gb | 16Gb | 1Gb |
| Endurance | ∞ | 10¹⁶ | 10⁵ | 10⁹ | 10⁹ | 10¹⁶ |
| Write energy | 8pW/bit/MHz | 100fJ/bit | 10fJ/bit | 5pJ/bit | 5pJ/bit | 5pJ/bit |
| Leakage |  |  | ~0.8x | 0.01x |  | 0.01x |
| Cost | 1x | 0.1x-1x | 0.01x | 0.1x | 0.1x | 1x |
Source: Synopsys
Table 1: Comparison of standard memory options.
While an SRAM bitcell consists of six transistors, an eMRAM bitcell requires only one transistor, resulting in a dramatically smaller area. As modern SoC designs require more memory, smaller area becomes increasingly important. SRAM can occupy 30% to 45% of an SoC die. In frame buffer applications, that share can grow as high as 50%, and for AI applications it can be up to 70% of the die. By using eMRAM instead of SRAM, AI applications can reduce the area required for memory by 25%. eMRAM is a perfect fit for applications with large memory requirements.
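As a rough illustration of what these percentages mean at the die level, the short sketch below translates a memory-area saving into a die-area saving. The 25% memory-area reduction and the memory-fraction figures come from the text above; the simple two-term (memory plus logic) area model is an assumption for illustration only.

```python
# Back-of-the-envelope sketch translating the memory-area savings quoted above
# into die-level savings. The 25% memory-area reduction and the memory-fraction
# figures come from the article text; the two-term area model is an assumption.

def die_area_reduction(memory_fraction, memory_area_reduction):
    """Fractional die-area reduction when only the memory portion shrinks."""
    return memory_fraction * memory_area_reduction

# Memory share of the die: general SoC ~30-45%, frame buffer ~50%, AI ~70%
for label, mem_frac in [("general SoC", 0.45), ("frame buffer", 0.50), ("AI SoC", 0.70)]:
    saving = die_area_reduction(mem_frac, 0.25)   # 25% smaller memory with eMRAM
    print(f"{label:12s}: ~{saving * 100:.1f}% smaller die")
```

The larger the share of the die devoted to memory, the more of the eMRAM area advantage shows up as a smaller (or more capable) chip.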
Figure 1: Comparing SRAM and eMRAM bitcell architectures
During a write operation in SRAM, all six transistors in the bitcell are active (Figure 1). Up to four transistors flip if the existing bit cell value is the opposite of the write data, and the two pass gates must turn on to allow data to flow from the bit lines into the latch. eMRAM, on the other hand, requires only one transistor for both read and write operations, resulting in lower dynamic power. In addition, leakage power in SRAM occurs in both the array and the peripheral logic, such as column/row decoders, word line drivers, sense amps, read/write assist circuits, level shifters, power gating cells, and the self-timing path. In eMRAM, the array itself is in an off state, so any leakage comes only from the peripheral logic. No power supply is needed to retain the contents of MRAM.
This is great news for digital designers, who no longer need to rely on traditional approaches to reducing standby power such as deep sleep and array back bias. SRAM requires several steps to enter and exit its retention (deep sleep) mode, resulting in a longer response time, and it can even consume more power overall if the sleep duration is not long enough. SRAM array bias also adds cost and complexity to SoC design by requiring external power supplies. With eMRAM arrays, designers can expect lower leakage.
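The deep-sleep trade-off can be written as a simple break-even condition; the symbols below are generic and illustrative rather than taken from a specific SRAM datasheet. Retention only saves energy when the leakage avoided during sleep exceeds the entry/exit overhead:

```latex
% Rough break-even condition for SRAM deep-sleep (retention) mode.
% E_enter + E_exit : energy overhead of entering and exiting retention
% P_active, P_ret  : leakage power when active vs. in retention
% t_sleep          : time spent in retention
\[
  (P_{\mathrm{active}} - P_{\mathrm{ret}})\, t_{\mathrm{sleep}} \;>\; E_{\mathrm{enter}} + E_{\mathrm{exit}}
  \qquad\Longrightarrow\qquad
  t_{\mathrm{sleep}} \;>\; \frac{E_{\mathrm{enter}} + E_{\mathrm{exit}}}{P_{\mathrm{active}} - P_{\mathrm{ret}}}
\]
```

An eMRAM array avoids this calculation entirely: it can simply be powered off, retains its contents, and incurs no retention-mode entry or exit overhead.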
eMRAM is available from many foundries because, in a given process technology, it is much simpler to develop than RRAM or PCRAM. For example, as shown in Figure 2, only three additional masks are needed for eMRAM in the back end of line (BEOL) process. The front end of line (FEOL) process remains unchanged, which makes IP development much easier. Standalone (non-embedded) MRAM chips are also commercially available today. The market acceptance of eMRAM is far ahead of RRAM's.
Figure 2: MTJ cell for STT-MRAM. Source: Lam Research
eMRAM's non-volatile nature is a perfect fit for low-power designs and battery-powered IoT applications. If data needs to be reused after sleep or power-down, the CPU must first flush SRAM data to Flash memory and then read it back after power-up. Each transfer charges and discharges the resistance and capacitance along the data path and the I/Os. When charging an RC node from 0 to 1, roughly 50% of the energy drawn from the supply reaches the node and the rest is dissipated as heat; when discharging from 1 to 0, all of the stored energy is wasted. With eMRAM, the memory does not need to go through this charge/discharge round trip, resulting in lower system-level power consumption (Figure 3).
Figure 3: System power comparison for different combinations. Source: Qualcomm & TDK, IEDM, 2015
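The 50%/100% figures follow from basic RC energy bookkeeping. The sketch below assumes a simple first-order model in which each memory-path and I/O node is a capacitance charged through some series resistance; it illustrates the argument above rather than any measured data.

```latex
% Energy bookkeeping for charging and discharging a node of capacitance C
% through a series resistance R from a supply at V_DD:
\begin{align*}
  E_{\mathrm{supply}}      &= C V_{DD}^{2}              && \text{drawn from the supply while charging} \\
  E_{\mathrm{stored}}      &= \tfrac{1}{2} C V_{DD}^{2} && \text{stored on the node (about half)} \\
  E_{\mathrm{R,charge}}    &= \tfrac{1}{2} C V_{DD}^{2} && \text{dissipated in the resistance, independent of } R \\
  E_{\mathrm{R,discharge}} &= \tfrac{1}{2} C V_{DD}^{2} && \text{all stored energy is lost when discharging}
\end{align*}
```

Over a full save-and-restore round trip to external Flash, every toggled node therefore costs roughly C·V² from the supply, an expense the eMRAM-based flow in Figure 3 avoids.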
The maximum capacity of an eMRAM macro can be as high as 1Gb, while SRAM's maximum capacity is normally around 2Mb per macro. A single die can therefore hold more memory with eMRAM, or a design with eMRAM can reduce its die size while keeping the same amount of memory it would have had with SRAM.
SRAM bitcells are vulnerable to alpha particle strikes. The capacitance inside an SRAM bit cell is very small compared to that of logic, so even a small charge deposit caused by radiation can flip the stored value, resulting in a soft error. eMRAM uses the MTJ (Figure 2) to store data and is naturally immune to radiation. Even accounting for the peripheral circuitry around the MRAM bit cells, the overall radiation immunity of MRAM is much higher.
Applications such as smartphones, wireless audio earbuds, and wearables require smaller form factors to allow more stylish designs or to free space for a bigger battery. With a Flash SiP, the package height cannot be reduced; without SiP, the PCB becomes larger. By using eMRAM, a designer can use flip-chip packaging, which has the smallest height of all packaging choices and offers low IR drop, improving performance in ways that are extremely important for SoC design. For applications requiring frequent firmware updates, eMRAM can store boot code as well as intermediate data generated during operation, such as GPS satellite maps and sensor data from engines, providing a smoother consumer experience. eMRAM performance is much higher than flash, resulting in higher chip performance and faster remote firmware updates.
eMRAM uses an SRAM-style interface and does not require a SPI interface or a new bus protocol. Digital designers can integrate eMRAM macros just like regular SRAM, as shown in Figure 4.
Figure 4: Converting an SoC with on-chip SRAM and external Flash to a non-volatile SoC with eMRAM
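As a minimal illustration of this drop-in behavior, the behavioral sketch below (plain Python, not RTL, with hypothetical class names rather than any Synopsys deliverable) drives the same address/data/write-enable access sequence into either memory model; only the eMRAM model keeps its contents across a power cycle.

```python
# Minimal behavioral sketch (plain Python, not RTL) of why eMRAM integration is
# SRAM-like: both models expose the same address/data/write-enable access, and
# only the eMRAM model keeps its contents across a power cycle. Class and
# method names are hypothetical, not part of any Synopsys deliverable.

class SramModel:
    def __init__(self, depth):
        self.depth = depth
        self.mem = {}                      # volatile storage

    def access(self, addr, wdata=None, we=False):
        """One access: write when we=True, otherwise read."""
        if we:
            self.mem[addr] = wdata
            return None
        return self.mem.get(addr, 0)       # uninitialized locations read as 0

    def power_cycle(self):
        self.mem = {}                      # SRAM loses state when unpowered


class EmramModel(SramModel):
    def power_cycle(self):
        pass                               # non-volatile: contents survive power-down


# The same driver sequence works for either memory -- no new bus protocol needed.
for mem in (SramModel(1024), EmramModel(1024)):
    mem.access(0x10, wdata=0xCAFE, we=True)
    mem.power_cycle()
    print(type(mem).__name__, hex(mem.access(0x10)))
# prints: SramModel 0x0, then EmramModel 0xcafe
```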
Automotive microcontroller units (MCUs) need embedded memory, and typical MCUs have used embedded flash. But embedded flash is not currently available in 22nm and below, preventing MCU designers from leveraging the benefits of smaller geometry processes. eMRAM is a perfect solution for MCU designers looking to move to advanced nodes. It is stable and can meet automotive temperature grade requirements.
MRAM has advanced into the embedded space, where it must replace SRAM across a wide range of configurations. For this reason, Synopsys offers eMRAM compiler IP rather than fixed hard macros. An eMRAM compiler gives designers just-in-time generation of eMRAM hard macros within a few minutes. With a full front-end view of an eMRAM instance from the compiler, a designer can evaluate the memory and kick off the design immediately. This greatly reduces TAT and enables faster time to market.
Synopsys TestMAX STAR Memory System® (SMS) solution tests, repairs, and diagnoses both on-chip memories (single/dual/two/multi-port RAM, register file, and ROM, including CPU and GPU caches, CAM, and eFlash) and off-chip memories (DDR/LPDDR/HBM). By collaborating with leading foundries, Synopsys has augmented SMS with algorithms specific to eMRAM architectures, including trimming/calibration capabilities. Synopsys also offers an ISO 26262 certified STAR ECC solution that can be leveraged to improve the manufacturing yield of eMRAM as well as in-field reliability for memories in application areas such as automotive, military, and aerospace. The SMS solution for eMRAM is silicon validated and offers capabilities such as at-speed test, high test coverage with march algorithms, and programmability via JTAG. The STAR Memory System's eMRAM algorithms target the failure mechanisms of embedded MRAM and other non-volatile memories during production and in-field test. Support for multiple background patterns and complex addressing modes accelerates automated test equipment (ATE) vector generation, resulting in the highest test coverage for eMRAM, maximized manufacturing yield, and improved SoC reliability. In addition, augmented design acceleration capabilities in the STAR Memory System automate the planning, generation, insertion, and verification of the test and repair logic for embedded MRAMs, reducing the overall integration effort.
While eMRAM technology offers adequate endurance and read/write latencies, its susceptibility to process variation can cause reliability issues. One drawback of the MTJ bit cell is its small read window: the difference between the high and low resistance states is typically only 2-3x, making it much more difficult to sense the value of an MTJ bit cell than an SRAM bit cell. In addition, eMRAM switching is a stochastic process, which means that while reducing write current improves energy efficiency, it also increases the probability of write errors and degrades yield.
To meet an acceptable eMRAM yield and maintain in-field reliability, designers need to implement a sophisticated error correcting code (ECC) solution. The ECC math shows that, to achieve a given chip failure rate (CFR), the memory bit error rate (BER) foundries must achieve becomes increasingly stringent as array sizes grow. Assuming random defects for a 64Mb memory array, an application targeting the most stringent automotive ASIL-D level (equivalent to an SoC-level FIT rate of 10) would need at least DECTED (Double Error Correct, Triple Error Detect) ECC with the MTJ bit cell BER levels that foundries can achieve today. Figure 5 shows that to achieve 99% yield on a 64Mb eMRAM macro without ECC, the foundry bit cell D0 needs to be under 0.1ppm; adding 1-bit or 2-bit error-correct ECC relaxes the bit cell D0 requirement to 1ppm or 10ppm, respectively. STAR Memory System ECC automatically generates ECC Verilog code, testbenches, and scripts for single-port and multiport eMRAM memories, which can greatly improve eMRAM macro yield.
Figure 5: To achieve 99% yield of a 64Mb eMRAM macro without ECC (blue line), the foundry bit cell D0 needs to be under 0.1ppm. Adding 1-bit or 2-bit ECC relaxes the bit cell D0 requirement to 1ppm or 10ppm, respectively.
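For intuition on why ECC relaxes the required bit cell D0, the sketch below implements a simplified binomial yield model. It assumes independent, random single-bit defects and per-word correction only, with no redundancy or repair, so it reproduces the trend of Figure 5 rather than its exact curves; the word widths and check-bit counts are illustrative assumptions.

```python
# Simplified yield model illustrating how per-word ECC relaxes the bit-cell
# defect density (D0) needed for a target macro yield. Assumes independent,
# random single-bit defects and no redundancy/repair, so it shows the trend of
# Figure 5, not its exact curves. Word widths and check-bit counts are assumed.

from math import comb

def macro_yield(bit_defect_prob, total_bits, word_bits, correctable):
    """Probability that every word has at most `correctable` defective bits."""
    p = bit_defect_prob
    word_ok = sum(comb(word_bits, k) * p**k * (1 - p)**(word_bits - k)
                  for k in range(correctable + 1))
    return word_ok ** (total_bits // word_bits)

MACRO_BITS = 64 * 1024 * 1024          # 64 Mb eMRAM macro
for label, word, t in [("no ECC", 64, 0), ("1-bit correct", 72, 1), ("2-bit correct", 78, 2)]:
    for ppm in (0.001, 0.01, 0.1, 1, 10, 100):
        y = macro_yield(ppm * 1e-6, MACRO_BITS, word, t)
        print(f"{label:14s} D0={ppm:>7} ppm  yield={y:6.1%}")
```

Running the sweep shows the same qualitative behavior as Figure 5: each added level of correction tolerates roughly an order of magnitude higher bit-cell defect density for the same macro yield.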
Synopsys works with leading foundries to provide eMRAM IP, with silicon-proven 28nm eMRAM already in volume production. The Synopsys eMRAM IP in 22nm is also silicon proven, and eMRAM IP for FinFET nodes is in development.
eMRAM is a promising memory technology for low-power SoCs that require high endurance and small area. Discrete MRAM devices are already available, and designers are leveraging embedded MRAM for greater PPA efficiency. Because no two SoCs have the same configuration requirements, Synopsys offers eMRAM compiler IP that can generate a wide variety of configurations to meet specific design requirements.