Gordon Cooper, Staff Product Marketing Manager, Synopsys
Facial recognition, the challenge of identifying and verifying faces in photographs, is a simple problem for humans but a complex problem for computers. After sixty years of research, recent deep learning advancements enable widespread use of face recognition technology across a broad range of applications. Access control for your battery-powered device? Authenticating users in banking applications? Replacing the need for car keys in your automobile? More personalized patient care at hospitals? All are use cases that could benefit from facial recognition. Implementing facial recognition in edge device applications will require efficient, low-power processors with dedicated support for the latest artificial intelligence (AI) algorithms.
Because computerized facial recognition measures a human’s unique physical characteristics, it is a type of biometrics. High accuracy can be difficult to achieve because of the variability of human faces. Any captured image could include variations in expression, position, pose, orientation, or lighting conditions. Individual faces also vary in skin color, facial hair, and the presence of eyewear. Still, facial recognition is easier to deploy and implement than other biometric techniques like iris or voice recognition. And unlike fingerprint recognition, facial recognition does not require any physical interaction by the end user.
For a computer, facial recognition can be broken down into four steps (Figure 1). Face detection locates one or more faces in the image. Face normalization aligns the captured face to be consistent with the faces already stored in the database. Feature extraction measures key features, such as the distance between the eyes or between points on the lips, for pattern matching. Finally, face matching compares the extracted features of the input face against the extracted features of the faces in the database.
Figure 1: Steps of facial recognition
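The final matching step can be illustrated with a minimal Python sketch. It assumes that feature extraction has already reduced each face to a fixed-length numeric vector and that matching is a nearest-neighbor search with a distance threshold. The function names, the four-element vectors, and the threshold value are all hypothetical choices made for illustration and do not come from any particular product.

```python
import math

def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_face(probe, database, threshold):
    """Return the name of the closest enrolled face, or None when no
    enrolled face lies within `threshold` of the probe vector."""
    best_name, best_dist = None, float("inf")
    for name, features in database.items():
        dist = euclidean_distance(probe, features)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

# Hypothetical enrolled faces; real feature vectors are much longer.
enrolled = {
    "alice": [0.10, 0.80, 0.30, 0.55],
    "bob": [0.90, 0.20, 0.70, 0.10],
}

print(match_face([0.12, 0.79, 0.28, 0.56], enrolled, threshold=0.1))
print(match_face([0.50, 0.50, 0.50, 0.50], enrolled, threshold=0.1))
```

The threshold is what separates identification from rejection: without it, every input face would be matched to whichever enrolled face happens to be closest.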
Facial recognition techniques appeared as early as the 1960s when Woodrow (Woody) Bledsoe, a mathematician and computer scientist, first applied computers to the challenge of recognizing human faces. Working with Helen Chan and Charles Bisson at Panoramic Research, Inc. (PRI), Bledsoe helped develop a system in which the coordinate locations of various facial features (eyes, nose, hairline, mouth, etc.) were manually extracted and recorded from a batch of photos (such as a book of mugshots). The metrics were then entered into a computer database. Given a new photo, the system determined which stored photo most closely resembled it. This manual approach to capturing facial characteristics had its limitations, but Bledsoe continued his work in pattern matching and automated reasoning and is considered one of the founders of AI.
In 1991, Matthew Turk and Alex Pentland, from MIT’s Media Laboratory Vision and Modeling Group, had a breakthrough when, applying algorithms based on linear algebra, they were the first to use computers to automatically extract facial features from photos. In 2001, Paul Viola of Mitsubishi Electric Research Labs and Michael Jones of Compaq released a paper detailing their rapid object detection algorithm, dubbed Viola-Jones. Viola-Jones required full-view, frontal, upright faces but could detect them in real time with high accuracy. It remained the standard for facial detection for many years until it was overtaken by deep learning techniques.
The advent of deep learning techniques was a game changer for facial recognition. Trained neural networks (NNs) provided a significant jump in accuracy over programmed algorithms like Viola-Jones. In 2014, Facebook released DeepFace, a nine-layer neural network with over 120 million connection weights (or coefficients), trained on four million images uploaded by Facebook users (Figure 2). DeepFace reached 97.35% accuracy on the Labeled Faces in the Wild (LFW) benchmark. In 2015, Google researchers developed FaceNet, which reached 99.63% accuracy on the same benchmark. FaceNet is more than a single neural network graph: it is based on at least two convolutional neural network (CNN) architectures.
Figure 2: Outline of the DeepFace architecture
Both DeepFace and FaceNet, with their large numbers of coefficients, are best suited to online, non-real-time applications such as sorting pictures in the cloud. A large coefficient set requires large memories, a significant amount of computation, and a lot of data movement, all of which make real-time, low-power implementations difficult to impossible. As market interest in facial recognition shifts to low-power implementations on edge devices such as smartphones, consumer cameras, or laptops, performance limitations require more efficient neural networks and power-efficient neural network engines.
Another strong reason for the move from cloud-based facial recognition to edge devices is individual privacy. Face recognition, especially in surveillance use cases, can passively capture faces. Recording the faces of unsuspecting people and sending unique identifying information to a central or cloud database is causing privacy concerns for many consumers. One advantage of facial recognition done entirely at the network edge is that the computations are performed locally on the device, so identifying data does not have to be sent to a cloud-based service.
Although there are many low-power edge device facial recognition applications, perhaps the most difficult to achieve is an always-on facial recognition system providing access control to a battery-operated device like a laptop, tablet, or cellphone. A user expects hands-free and instantaneous facial recognition. This means the electronics need to be constantly looking for a face. One of the best solutions is to divide the facial recognition tasks across two processors.
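To make the memory cost concrete, the short Python calculation below estimates how much storage the coefficients alone would need. It uses the figure of 120 million weights quoted above for DeepFace as a lower bound. The bit widths are assumptions chosen for illustration; the text does not say what precision any given deployment uses.

```python
def weight_storage_mb(num_weights, bits_per_weight):
    """Storage for the coefficients alone, in megabytes (10**6 bytes)."""
    return num_weights * bits_per_weight / 8 / 1e6

# "Over 120 million" weights, so this is a lower bound.
DEEPFACE_WEIGHTS = 120_000_000

# Assumed precisions: 32-bit float, 16-bit, and 8-bit quantized.
for bits in (32, 16, 8):
    size = weight_storage_mb(DEEPFACE_WEIGHTS, bits)
    print(f"{bits}-bit weights: {size:.0f} MB")
```

Even at 8 bits per weight the coefficients occupy on the order of 120 MB, far more than the on-chip memory of a typical embedded processor, so they must be streamed from external memory on every inference.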
Small microcontrollers with DSP capabilities – such as the Synopsys DesignWare® ARC® EM9D processor IP family – excel at always-on performance for AIoT applications. While small AI-enabled microcontrollers don’t have the horsepower to handle the full facial recognition neural network tasks, they can detect if a face is present and then wake up a dedicated, power-efficient neural network accelerator, such as in the ARC EV processor IP, to complete the recognition steps.
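The division of labor between the two processors can be sketched as a simple control loop in Python. This is only an illustration of the gating idea under stated assumptions: `detect_face` stands in for the small always-on detector, `recognize_face` stands in for the accelerator, and both names and the string-based frames are hypothetical placeholders rather than any Synopsys API.

```python
def always_on_loop(frames, detect_face, recognize_face):
    """Run the cheap detector on every frame and invoke the expensive
    recognizer only on frames where a face is present.

    Returns the recognition results and the number of accelerator wake-ups.
    """
    results = []
    wakeups = 0
    for frame in frames:
        if not detect_face(frame):
            continue  # no face: the accelerator stays powered down
        wakeups += 1  # face present: wake the accelerator for this frame
        results.append(recognize_face(frame))
    return results, wakeups

# Hypothetical stand-ins: a frame is None when no face is in view.
frames = [None, None, "alice", None, "bob"]
results, wakeups = always_on_loop(
    frames,
    detect_face=lambda frame: frame is not None,
    recognize_face=lambda frame: frame.upper(),
)
print(results, wakeups)
```

In this toy run the detector examines all five frames while the recognizer runs only twice, which is the source of the power saving.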
Facial detection for access control, where the face in the input image is prominent and close, is a fairly simple AI task. It can be performed with a small input image and a neural network graph with minimal computational requirements. Keeping the graph complexity and the number of coefficients low is extremely important for minimizing power and for fitting on a small microcontroller.
The facial detection graph in Figure 3 was developed in collaboration with the Technical University of Eindhoven. Captured input images are reduced to 36x36 pixel grayscale images, and the graph needs only four layers and 1k model parameters. Facial detection applications like this can be run with minimal power consumption on the ARC EM9D processor. The ARC EM9D has a zero-latency XY architecture for multiply-accumulate (MAC)-intensive CNN algorithms and can run the facial detection graph while consuming only microwatts of power. To accelerate software development, the ARC EM9D processor is supported by the highly optimized embARC MLI library, which supports a wide range of low- to mid-end machine learning applications like always-on facial detection.
Figure 3: Small binary neural network (NN) classifier for 36x36 grayscale images outputs a positive decision for the images of a face and a negative decision on other images
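To show how a four-layer network can stay near 1k parameters, the Python sketch below counts the weights and biases of a small convolutional classifier. The layer shapes are hypothetical and were chosen only so that the total lands close to 1k; they are not the layers of the graph in Figure 3, whose exact structure is not given here.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases for a square 2D convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(inputs, outputs):
    """Weights plus biases for a fully connected layer."""
    return inputs * outputs + outputs

# Hypothetical four-layer network for a 36x36 single-channel input.
# The final layer assumes global pooling reduces each of the 8 feature
# maps to one value before the face / no-face decision.
layers = [
    ("conv1 3x3, 1 -> 4 channels", conv_params(3, 1, 4)),
    ("conv2 3x3, 4 -> 8 channels", conv_params(3, 4, 8)),
    ("conv3 3x3, 8 -> 8 channels", conv_params(3, 8, 8)),
    ("fc 8 -> 1 output", dense_params(8, 1)),
]

total = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name}: {count}")
print(f"total parameters: {total}")
```

Because convolution weights are shared across the image, the parameter count depends on kernel size and channel counts rather than on the 36x36 input resolution, which is why such a network can fit in a small microcontroller's memory.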
Once the presence of a face is detected, the EM9D processor can wake up another processor capable of executing the more computationally intensive task of facial recognition. Certainly, a CPU or GPU can do the job, but often not with the power efficiency a battery-operated device needs while still meeting frame rate targets. Instead, a dedicated neural network accelerator such as the DesignWare ARC EV7x vision processor IP is optimized for performance and power efficiency. It can be turned on just long enough to perform the facial recognition and then turned off again to conserve power.
The ARC EV7x processor contains a dedicated neural network engine with 880 to 3,520 MACs. The ARC EV7x processor family is optimized to minimize bandwidth. This supports power efficiency because the large set of coefficients, the input images, and the intermediate feature maps are all stored in external DRAM. Any effort to minimize the amount of data moved in and out of the DRAM pays off in lower power consumption.
In addition to power efficiency, neural network accelerators must offer flexibility to support the latest neural network graphs. Today there are neural networks that are more computationally efficient than DeepFace and FaceNet and better suited to low-power edge devices. MTCNN, or Multi-Task Cascaded Convolutional Neural Network, is a popular and accurate face detection and alignment network that uses an image-based approach. MobileNet, a classification graph often used as a building block for facial detection, gains its efficiency by using depthwise separable convolutions to reduce the computational complexity of standard convolutional layers. The ARC EV7x processors offer the flexibility to support any neural network graph.
An ARC EV71 Vision Processor with an 880 MAC deep neural network engine (EV71 DNN880) can perform facial recognition using only 5mW of power (using a proprietary facial recognition algorithm in a 12nm process node). The EV7x also incorporates fine-grained power and clock gating, which minimizes leakage power consumption. The EV7x processors are supported by the MetaWare EV Development Toolkit, which includes a neural network compiler that takes the trained neural network graph and automatically optimizes and compiles it to run in hardware (Figure 4).
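The saving from depthwise separable convolutions can be checked with a short Python calculation of multiply-accumulate (MAC) counts. The sketch assumes stride 1 and padding that keeps the output the same size as the input. The layer dimensions in the example are arbitrary illustrative values and are not taken from any specific MobileNet layer.

```python
def standard_conv_macs(height, width, kernel, in_ch, out_ch):
    """MACs for a standard convolution producing a height x width output."""
    return height * width * kernel * kernel * in_ch * out_ch

def depthwise_separable_macs(height, width, kernel, in_ch, out_ch):
    """MACs for a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = height * width * kernel * kernel * in_ch  # one filter per channel
    pointwise = height * width * in_ch * out_ch           # 1x1 channel mixing
    return depthwise + pointwise

# Illustrative layer: 56x56 feature map, 3x3 kernel, 64 -> 128 channels.
std = standard_conv_macs(56, 56, 3, 64, 128)
sep = depthwise_separable_macs(56, 56, 3, 64, 128)
ratio = sep / std

print(f"standard:  {std:,} MACs")
print(f"separable: {sep:,} MACs")
print(f"ratio:     {ratio:.4f}")
```

The ratio simplifies to 1/out_ch + 1/kernel², so for 3x3 kernels and a large number of output channels the separable form needs roughly one eighth to one ninth of the MACs of a standard convolution.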
Figure 4: DesignWare ARC EV7x Vision Processor Block Diagram
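The benefit of waking the accelerator only when needed can be estimated with a simple duty-cycle model, sketched below in Python. The 5 mW active figure is the one quoted above for the EV71 DNN880. The 0.1 mW always-on figure and the 1% duty cycle are hypothetical placeholders, since the text says only that the detector consumes microwatts. The model also assumes the accelerator draws no power while gated off and ignores wake-up energy.

```python
def average_power_mw(always_on_mw, accel_active_mw, duty_cycle):
    """Average power when the always-on detector runs continuously and
    the accelerator is active for a fraction `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return always_on_mw + accel_active_mw * duty_cycle

ALWAYS_ON_MW = 0.1     # hypothetical: 100 microwatts for face detection
ACCEL_ACTIVE_MW = 5.0  # quoted figure for facial recognition
DUTY_CYCLE = 0.01      # hypothetical: accelerator active 1% of the time

gated = average_power_mw(ALWAYS_ON_MW, ACCEL_ACTIVE_MW, DUTY_CYCLE)
ungated = average_power_mw(0.0, ACCEL_ACTIVE_MW, 1.0)
print(f"two-processor, gated: {gated:.3f} mW")
print(f"accelerator always on: {ungated:.3f} mW")
```

Under these assumptions the gated system averages well under a tenth of the power of running the accelerator continuously, and the always-on detector rather than the accelerator becomes the dominant term.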
Implementing facial recognition in edge device applications requires efficient, low power processors with flexible support for the latest AI algorithms and graphs. When combined, the DesignWare ARC EM9D Processor IP and the DesignWare ARC EV71 DNN880 Vision Processor IP provide an extremely power efficient solution for always-on facial detection and power efficient facial recognition.