Power- and Area-Efficient Floating Point Unit for Synopsys ARC Processors
The Synopsys ARC® Floating Point Unit (FPU) options for ARC EM, ARC HS, ARC VPX, ARC EV, and ARC NPX processors add performance-efficient hardware acceleration for floating point math. The following FPU versions are available:

ARC FPU for ARC EM – Supports ARC EM4, EM6, EM5D, EM7D, EM9D and EM11D processors

The ARC FPU for the EM processor family provides full single-precision (SP) hardware support, with double-precision (DP) acceleration extensions that speed up some of the more common double-precision operations. It is designed to be compact in area and low in power, matching the requirements of the applications that ARC EM processors target.

ARC FPU for ARC HS – Supports the ARC HS3x processors

The ARC FPU for the HS3x processors supports IEEE-754 compliant single- and double-precision operations. The HS double-precision hardware support is comprehensive, with a larger set of DP instructions and 64-bit data paths to and from the HS core registers. All the SP instructions in this FPU have DP equivalents, along with full DP to SP conversions. The HS3x FPU is designed to take advantage of the 10-stage pipeline and high-performance capabilities of the ARC HS3x processor cores.

ARC Fast FPU for ARC HS – Supports the ARC HS4x processors

The ARC Fast FPU for the HS processors supports IEEE-754 compliant half-, single-, and double-precision operations. The FPU features reduced latency and increased performance. The HS double-precision hardware support is comprehensive, with a large set of DP instructions and 64-bit data paths to and from the HS core registers. All the SP instructions in this FPU have DP equivalents, along with full DP to SP conversions. Half-precision (HP) support is included for converting between HP and SP representations. The Fast HS FPU is designed to take advantage of the 10-stage, superscalar pipeline and high-performance capabilities of the ARC HS4x processors.
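
To make the HP-to-SP conversion concrete, the following C sketch widens an IEEE 754 binary16 bit pattern to a binary32 value in software. It only illustrates how the two formats relate. The function name half_bits_to_float is hypothetical, and this is not the FPU's instruction sequence or an ARC MetaWare library API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen an IEEE 754 binary16 bit pattern to a float.
 * binary16 layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
 * binary32 layout: 1 sign bit, 8 exponent bits (bias 127), 23 mantissa bits. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, mantissa carried over. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent by 127 - 15 = 112. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: shift until the implicit leading 1 appears. */
        uint32_t e = 113u; /* 127 - 15 + 1 */
        while ((man & 0x400u) == 0) {
            man <<= 1;
            e--;
        }
        man &= 0x3FFu;
        bits = sign | (e << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret the bits without aliasing issues */
    return f;
}
```

Every binary16 value is exactly representable in binary32, so widening never rounds. Only the opposite direction (SP to HP) needs a rounding decision.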

Both the EM and HS FPUs are supported by the ARC MetaWare C/C++ Compiler. When used with the compiler, both of these ARC FPUs comply with the IEEE-754-2008 Standard for Binary Floating Point Arithmetic.

ARC Vector FPU (VFPU) for ARC VPX – Integrated within the VPX DSP processors

The ARC VPX VLIW/SIMD family of DSPs offers up to three parallel floating point processing pipelines. The optional IEEE-754 compliant vector floating point units support both full (32-bit) and half (16-bit) precision floating point operations. The VPX cores also have the option to add a dedicated math engine vector floating point pipe, supporting an extensive set of math functions including: div, √x, 1/√x, sin(x), cos(x), log_2 (x), 2^x, and e^x.

ARC VFPU for ARC EV – Supports the ARC EV7x processors

The vector FPU (VFPU) can be integrated into the EV7x embedded vision processor's vector DSP core. The VFPU is IEEE 754-compliant and targets ADAS, self-driving vehicle, powertrain, and automotive sensor fusion (linear algebra) applications. Combined with the ARC MetaWare EV Development Toolkit, the VFPU offers performance levels of up to 328 Gigaflops for single precision operations and 655 Gigaflops for half precision operations.

ARC TFPU for ARC NPX – Supports the ARC NPX processors

The tensor FPU (TFPU) can be integrated into the NPX NPU IP’s data paths to add BF16 and FP16 computations to the convolution and tensor accelerators. The TFPU uses the existing data paths, minimizing the area impact of the additional floating point capabilities. The TFPU option can be combined with the data compression (OCP MX) option to add microscaling DMA capabilities.
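
For readers unfamiliar with BF16, the C sketch below shows how the format relates to single precision: a BF16 value is the upper 16 bits of an IEEE 754 binary32 value (same sign and 8-bit exponent, mantissa cut to 7 bits). The function names are hypothetical, and the round-to-nearest-even narrowing is an assumption chosen for illustration; the rounding behavior of the TFPU is not stated here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrow a float to a BF16 bit pattern.
 * Assumption: round to nearest, ties to even. */
uint16_t float_to_bf16_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        /* NaN: truncate, but force a mantissa bit so it stays a NaN. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
     * This rounds to nearest and sends exact ties to the even neighbor. */
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding) >> 16);
}

/* Hypothetical helper: widen a BF16 bit pattern back to a float.
 * Widening is exact: the 16 dropped mantissa bits are simply zero. */
float bf16_bits_to_float(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because BF16 keeps the full 8-bit exponent of single precision, it covers the same dynamic range as SP with less precision, whereas FP16 trades range for three extra mantissa bits.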

Silicon-Efficient Floating Point Extensions for ARC Processors

Synopsys' ARC® FPX Floating Point Extensions add high-performance single- and double-precision math instructions to the ARC 600 and ARC 700 processor families. ARC FPX dramatically accelerates computations where data sets have a large dynamic range and when high precision is required.

When used with the ARC MetaWare compiler, Synopsys ARC FPX complies with the IEEE-754 Standard for Binary Floating Point Arithmetic. Synopsys ARC cores with Synopsys ARC FPX provide an ideal solution for system-on-chips (SoCs) that perform graphics and image processing, complex computations or control algorithms, especially where power and area budgets are constrained.

Synopsys ARC Fast Floating Point Unit for ARC HS4x Processors Datasheet
Synopsys ARC Floating Point Unit for ARC EM Processors Datasheet
Synopsys ARC Floating Point Unit for ARC HS3x Processors Datasheet
Synopsys ARC Floating Point Extensions Datasheet

Very small die area and power

ARC FPU and FPX are both implemented using the APEX extensibility feature of the Synopsys ARC processor architectures
In contrast to the very large floating point co-processors required by competitive cores, ARC FPU, VFPU, and FPX instructions are integrated into the ARC cores themselves at build time
Synopsys' approach achieves similar floating point performance to a co-processor, but with much smaller die area and power

Flexible configuration options

SoC designers using ARC FPU can specify single precision extensions, double precision or both, as required in their application
SoC designers using ARC VFPU can specify half precision, single precision, or both single and double precision, as required in their application

Compiler math library optimizes performance

The ARC MetaWare compiler takes full advantage of the Synopsys ARC FPX and FPU instructions to accelerate transcendental and other functions specified in IEEE-754
The ARC MetaWare EV OpenCL C compiler takes full advantage of the Synopsys ARC VFPU instructions to accelerate transcendental and other functions specified in IEEE-754

FPU (ARC EM and ARC HS Families)

IEEE 754-2008 compliant
Full hardware single-precision support with double-precision acceleration
Full hardware single-precision (SP) and double-precision (DP) support (FPU for HS only)
Support for float-to-integer and integer-to-float conversion
Full support for SP to DP conversion (FPU for HS only)
Full 64-bit data paths to and from core registers (FPU for HS only)
Full clock-gating support for power efficiency
Power save features on all data paths and intermediate registers
Optional divide and square root support
Optional fused multiply/add and multiply/subtract
Single-cycle multiplier with two-cycle multiply option for higher frequency operation
Fewer than 14K gates in a single-precision, area-optimized configuration
Peak performance 1.0 Mflops / MHz

VFPU (ARC EV Family)

IEEE-754 compliant
Configurable: 1, 2, or 3 vector FPU pipes per core
Up to 512 GFLOPS (16-bit FP multiply-add, 4 cores @ 1 GHz). Alternatively, VFPUs are coupled with VLIW slotting that allows all three units to execute in parallel, providing up to 200 GFLOPS with dedicated parallel execution of algebra math functions at 12.5 GFLOPS
Supports both single (32-bit) and half (16-bit) floating point precision
ISA: multiply-add/sub, compare, min/max, reductions, convert, property
Supports 4-way SIMD vector math (Division, SQRT, 1/SQRT, log2(x), 2^x, Sine, Cosine, Arctan)
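
The quoted peak of 512 GFLOPS (16-bit FP multiply-add, 4 cores at 1 GHz) can be unpacked with simple arithmetic. The C sketch below uses two hypothetical helper functions and makes no assumption about how the operations are distributed across VFPU pipes or SIMD lanes.

```c
/* Peak GFLOPS = cores x clock (GHz) x FLOPs issued per core per cycle. */
double peak_gflops(unsigned cores, double clock_ghz, double flops_per_cycle)
{
    return cores * clock_ghz * flops_per_cycle;
}

/* Inverse: FLOPs per core per cycle implied by a quoted peak figure. */
double implied_flops_per_cycle(double gflops, unsigned cores, double clock_ghz)
{
    return gflops / (cores * clock_ghz);
}
```

For example, implied_flops_per_cycle(512.0, 4, 1.0) gives 128 FLOPs per core per cycle. If a multiply-add is counted as two FLOPs, which is the usual convention but an assumption here, that corresponds to 64 half-precision multiply-adds per core per cycle.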

FPX (ARC 600 and ARC 700 Families)

Single Precision

MUL, ADD, SUB implemented directly in hardware
3 CPU cycles latency per instruction, pipelined
13x to 23x faster than an optimized software library
Approx. 10K - 20K gates

Double Precision

5 CPU cycles per ADD or SUB instruction, 7 CPU cycles per MUL instruction
9x to 19x faster than an optimized software library
Peak performance: 200 Kflops / MHz
Approx. 25K - 30K gates
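
The double-precision peak figure is consistent with the cycle counts listed above, assuming "5 CPU cycles per ADD or SUB" means one such instruction completes every 5 cycles (an interpretation, not something stated here):

```latex
\[
\text{DP ADD/SUB: } \frac{1\ \text{flop}}{5\ \text{cycles}}
  = 0.2\ \text{flops/cycle}
  = 200\ \text{Kflops/MHz},
\qquad
\text{DP MUL: } \frac{1\ \text{flop}}{7\ \text{cycles}}
  \approx 143\ \text{Kflops/MHz}.
\]
```

Under the same reading, a multiply-heavy double-precision workload would sit below the 200 Kflops/MHz peak, since each MUL takes 7 cycles rather than 5.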

ARC MetaWare Math Library for FPU and FPX

Optimized for Synopsys ARC FPX and FPU hardware
Provides additional arithmetic and transcendental functions
Complies with IEEE-754
Allows re-linking of existing object files

Description:	Fast floating point option for the ARC HS4x and HS4xD processors
Name:	dwc_arc_hs_fast_fpu_option
Version:	4.10a
ECCN:	3E991/NLR
Product Type:	DesignWare Cores
Documentation:	Synopsys ARC Fast Floating Point Unit for ARC HS4x Processors Datasheet (PDF)
Download:	arc_hs_processor
Product Code:	F490-0

Description:	High performance vector floating point unit option for EV6x, EV7x, VPX processor families
Name:	dwc_vector_fpu_option
Version:	2.20b
ECCN:	3E991/NLR
Product Type:	DesignWare Cores
Documentation:	Contact Us for More Information
Download:	ev-vision_processor
Product Code:	B869-0, B870-0, B871-0, C173-0, C623-0, C624-0, C769-0, D107-0, E941-0, E942-0, E943-0

Description:	IEEE 754 compliant single and/or double precision floating point unit for ARC EM processor cores.
Name:	dwc_arc_fpu
Version:	5.70b
ECCN:	3E991/NLR
Product Type:	DesignWare Cores
Documentation:	Synopsys ARC Floating Point Unit for ARC EM Processors Datasheet (PDF); Synopsys ARC Floating Point Unit for ARC HS3x Processors Datasheet (PDF)
Download:	arc_em_processor
Product Code:	A630-0