Introduction

With technological advancements in artificial intelligence and electric autonomous vehicles, demand for custom, extensible processors is growing at an unprecedented pace. The RISC-V open standard Instruction Set Architecture (ISA), with its modular design and collaborative community, has been instrumental in ushering in a new wave of processor designs that enable these technologies. Standards have always accelerated innovation by serving as the foundational building blocks of new technologies, from machine screws to the URL, HTML, and HTTP standards of the web. The standard RISC-V ISA enables designers to create highly efficient processors while saving software development time, thereby shortening time-to-market.

While the standard architecture enables custom designs through a common specification, specific applications may require customization beyond the standard. The RISC-V community recognizes this need and evaluates many of these customizations for adoption back into the standard. Developing processor IP that combines the RISC-V vector extension (RVV) with custom DSP instructions promises attractive benefits for low-power embedded applications that require signal processing capabilities. This is especially true for microcontrollers (MCUs), which are extremely sensitive to power, area, and performance trade-offs.

Evolution of Microcontrollers

Over the past several decades, MCUs have evolved from simple embedded systems to sophisticated, connected devices. In the late 1990s, MCUs saw significant improvements in performance and power efficiency, along with the integration of analog-to-digital converters (ADCs), timers, and UARTs onto a single chip. During the 2010s, MCUs drove the IoT boom by integrating wireless connectivity technologies such as Wi-Fi, Bluetooth, and Zigbee, as well as security modules, into the hardware. Vendors now offer DSP-enhanced versions of their general-purpose CPUs that can be programmed in the field. Figure 1 illustrates this convergence: vector processing, which has evolved from use in supercomputers to integration into MCUs, now enables efficient execution of signal processing alongside control-oriented tasks.


Figure 1: Evolution of microcontrollers combining general purpose and SIMD/DSP functionality 

Vector processors are designed to operate on one-dimensional arrays of data, a category that includes Single Instruction, Multiple Data (SIMD) architectures. Common SIMD extensions in application processors include Intel’s MMX, SSE, and AVX, Arm’s Neon, and Motorola-Freescale’s AltiVec for PowerPC. This hybrid general-purpose + SIMD/DSP architecture also extends to MCUs.

Unlike concurrent workloads that perform different computations simultaneously, SIMD executes the same instruction in parallel on different data elements. Common examples include adjusting the contrast of a digital image or the volume of digital audio. The basic architecture includes a scalar register file (integer and/or floating-point) and a separate vector register file. Typically, the smallest SIMD element within a vector is 8 bits wide. A 64-bit vector register can therefore be partitioned in several programmable ways: 1x64-bit, 2x32-bit, 4x16-bit, or 8x8-bit. This flexibility allows engineers to choose the narrowest word width that still delivers the accuracy the application needs.
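The lane partitioning described above can be illustrated in portable C. The following is a minimal sketch, not any processor's actual instruction: it emulates a lane-wise add on a 64-bit register in software, and the function names `lane_msb_mask` and `add_lanes` are hypothetical, introduced here only for illustration. It shows how the same 64-bit value behaves differently depending on the selected lane width, because carries are not allowed to cross lane boundaries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the most significant bit of every lane set. */
uint64_t lane_msb_mask(unsigned lane_bits)
{
    assert(lane_bits == 8 || lane_bits == 16 ||
           lane_bits == 32 || lane_bits == 64);
    switch (lane_bits) {
    case 8:  return 0x8080808080808080ULL;  /* 8 lanes of 8 bits  */
    case 16: return 0x8000800080008000ULL;  /* 4 lanes of 16 bits */
    case 32: return 0x8000000080000000ULL;  /* 2 lanes of 32 bits */
    default: return 0x8000000000000000ULL;  /* 1 lane of 64 bits  */
    }
}

/* Software emulation of a lane-wise wrapping add on a 64-bit register.
 * Each lane is added independently; a carry out of one lane never
 * spills into its neighbor. */
uint64_t add_lanes(uint64_t a, uint64_t b, unsigned lane_bits)
{
    const uint64_t msb = lane_msb_mask(lane_bits);
    /* Add everything except each lane's top bit, so no carry can
     * propagate across a lane boundary. */
    const uint64_t low = (a & ~msb) + (b & ~msb);
    /* Fold the top bits back in with XOR (a carry-less add). */
    return low ^ ((a ^ b) & msb);
}
```

For example, `add_lanes(0xFF, 0x01, 8)` wraps to `0x00` inside the lowest 8-bit lane, while `add_lanes(0xFF, 0x01, 16)` yields `0x0100` because the same bits now belong to one 16-bit lane. SIMD hardware performs the equivalent operation in a single instruction rather than with masking arithmetic.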

Recently, MCUs have increasingly adopted advanced data-processing techniques such as SIMD instructions and vectorized AI/ML operations for neural networks. In these designs, a single MCU handles front-end digital signal processing of sensor data, such as filtering and sensor fusion, together with a back end that runs AI/ML models for voice trigger, object detection, and other low-power AIoT applications. For example, Synopsys ARC® EMxD processors combine efficient DSP and AI/ML processing and benefit from reduced power consumption, simple design, software reuse, and system cost savings.

MCUs were the first category of processors to adopt RISC-V and remain a key growth driver for RISC-V in the automotive and consumer markets. MCUs benefit significantly from RISC-V’s modular and simple design, which allows for greater efficiency and flexibility in cost-sensitive, low-power, and diverse embedded applications.  

Rise of RISC-V and Open Standard Model

The RISC-V open Instruction Set Architecture (ISA) shows how close global collaboration on open-source software and hardware development can accelerate technical progress. Although RISC-V is not “open source” in the same way as software (where the actual code is freely available), it is an open specification: the community can participate in discussions and contribute to the ratification of the latest changes to the specification.

Key benefits of RISC-V processors include configurability, extensibility, and software compatibility, which together foster a rich ecosystem.

The RISC-V ecosystem today includes over 4,300 members spanning physical hardware, IP, systems-on-chip (SoCs), development boards, the full software stack from toolchains to operating systems, tool suppliers, debug vendors, emulators and simulators, verification services, and educational materials. There are more than 10 billion RISC-V cores in the market today, and tens of thousands of engineers are working on RISC-V initiatives globally.

Vector Processing Benefits in Microcontrollers with RISC-V Vector Specification

The RISC-V Vector Specification version 1.0 (RVV 1.0) is a ratified vector processing extension of the RISC-V ISA. Vector processing enhances performance through data parallelism. Common workloads include image and audio signal processing; graphics rendering, animation, and game physics; data compression and cryptographic operations such as encryption; and machine learning at the edge for tasks such as inference, feature extraction, and data preprocessing.

RVV 1.0 combines an efficient control plane with a performant data plane and is designed with software development in mind. Its adjustable vector lengths, ranging from 32-bit to 2048-bit, allow engineers to configure their designs to specific performance requirements. Designers can also leverage RVV’s built-in efficiencies and performance enhancements, such as vector chaining, to achieve even greater improvements. Application software developers can write vector-length-agnostic code that runs regardless of the actual vector length of the hardware it executes on, which allows for maximum software reuse. RVV’s simplified design optimizes for smaller code size rather than memory utilization. For example, each vector uses the same number of elements, as defined by the vector length, regardless of whether all elements are in use. This approach requires fewer instructions to execute the code, which also results in greater power efficiency. Additionally, RVV benefits from a rich software development environment: open-source toolchains such as LLVM and GCC support auto-vectorization, which further simplifies the development of vector processing applications.
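The vector-length-agnostic style mentioned above is usually expressed as a strip-mining loop: on each pass the code asks how many elements the hardware can handle, processes that many, and advances. The sketch below emulates the pattern in portable C under stated assumptions. The function `set_vl` is a hypothetical stand-in for the role played by RVV's `vsetvli` instruction, and the `vlmax` parameter models the hardware-dependent maximum number of elements per vector; on real hardware the inner loop would be a single vector add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for vsetvli: returns how many elements to
 * process on this pass, never more than the hardware maximum. */
size_t set_vl(size_t remaining, size_t vlmax)
{
    return remaining < vlmax ? remaining : vlmax;
}

/* Vector-length-agnostic element-wise add. The same source works for
 * any vlmax; only the number of loop passes changes. */
void vla_add_i32(const int32_t *a, const int32_t *b, int32_t *out,
                 size_t n, size_t vlmax)
{
    assert(vlmax > 0);
    while (n > 0) {
        size_t vl = set_vl(n, vlmax);
        /* Emulates one vector add over vl elements. The unsigned
         * arithmetic gives wrapping behavior without signed overflow. */
        for (size_t i = 0; i < vl; i++) {
            out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
        }
        a += vl;
        b += vl;
        out += vl;
        n -= vl;
    }
}
```

Calling `vla_add_i32` with `vlmax` set to 4 or to 16 produces identical results for the same inputs. The tail of the array needs no separate scalar loop, because the final pass simply requests fewer elements.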

Enhancing RVV with Custom DSP Instructions for Efficient Signal Processing

Optional custom extensions tailored to specific applications have gained popularity among SoC designers. Although RVV supports some fixed-point operations, adding DSP instructions for signal processing applications (such as FFT, FIR, and matrix multiplication) and multimedia processing applications (including audio, video, and image processing) can further optimize the processor’s power, performance, and area (PPA).  
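To make the fixed-point workloads above concrete, the following sketch shows a Q15 dot product with rounding and saturation, the inner kernel of FIR filters and matrix multiplication. It is a plain C reference written for illustration, not the instruction set of any specific processor; the names `sat_q15` and `dot_q15` are hypothetical. DSP instructions typically fuse the multiply, accumulate, round, and saturate steps that are written out separately here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate value to the signed 16-bit (Q15) range. */
int16_t sat_q15(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Q15 dot product: inputs represent values in [-1, 1) scaled by 2^15.
 * Products are Q30 and are accumulated in 64 bits so that long
 * vectors cannot overflow the accumulator. */
int16_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    /* Round to nearest and rescale Q30 -> Q15. Assumes an arithmetic
     * right shift for negative values, which mainstream compilers
     * provide although the C standard leaves it implementation-defined. */
    acc = (acc + (1 << 14)) >> 15;
    return sat_q15(acc);
}
```

With both inputs set to {0.5, 0.5} in Q15 (the value 16384), `dot_q15` returns 16384, which is 0.5. A result that would exceed the Q15 range is clamped to 32767 instead of wrapping around.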

The Synopsys ARC-V™ RMX-100D Series processor (Figure 2) integrates the RVV 1.0 standard with custom DSP instructions, creating a highly optimized and cost-effective solution for efficient signal processing in low-power embedded applications. Integrating DSP and RVV capabilities yields significant improvements in cycle count performance and power efficiency.


Figure 2: Synopsys ARC-V RMX-100D Processor IP Block Diagram

Figure 3 illustrates significant improvements in cycle count performance and efficiency for algorithms commonly used in signal processing: Vector Add, Vector Dot Product, Matrix Multiply, Fast Fourier Transform (FFT), and Finite Impulse Response (FIR). The Synopsys ARC-V RMX-100D processor achieves these gains by combining RVV with DSP instructions, compared with standard RVV-only implementations.


Figure 3: Speed-up vs RVV, when adding custom DSP instructions with RMX-100D Processor

Conclusion

The RISC-V instruction set architecture (ISA), which enables modular and extensible design implementations, provides an ideal foundation for low-power embedded applications. By extending RVV with DSP capabilities, baseline RISC-V implementations can achieve significant improvements in cycle count performance and power efficiency while maintaining backward compatibility and maximum software reuse across a wide range of signal processing workloads. Synopsys’ ARC-V RMX-100D and RMX-500D series processors incorporate RVV 1.0 with custom DSP instructions, offering highly optimized and cost-effective solutions for low-power embedded applications that require advanced signal processing.

For more information, visit: Synopsys RISC-V ARC-V Processor IP and RISC-V International – RISC-V: The Open Standard RISC Instruction Set Architecture (riscv.org)

Synopsys IP Technical Bulletin
