Neural Processing Unit (NPU) IP

ENLIGHT

Discover a deep learning accelerator that accelerates inferencing computation with excellent efficiency and unmatched compute density.

OPENEDGES MEMORY SUBSYSTEM

Get to know the OPENEDGES Memory Subsystem IP, which consists of interconnect, memory controller, and PHY IPs that work in unison to create maximum system synergies.

ENLIGHT Classic

4/8-bit mixed-precision NPU IP

ENLIGHT features a highly optimized network model compiler that reduces DRAM traffic from intermediate activation data through grouped layer partitioning and scheduling. It is easy to customize to different core sizes and performance levels for customers' target market applications, and it achieves significant efficiencies in size, power, performance, and DRAM bandwidth, based on the industry's first adoption of 4-/8-bit mixed quantization.
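To make the DRAM-traffic argument concrete, here is a rough sketch. It is not the ENLIGHT compiler's actual algorithm. It assumes a hypothetical chain of six layers with invented activation sizes and a simplified cost model: an intermediate activation that crosses a group boundary is written to DRAM once and read back once, while activations inside a group stay on-chip. On-chip memory capacity, which a real scheduler must respect when partitioning layers, is ignored.

```python
# Illustrative model only: the layer sizes are invented and the cost model
# is an assumption, not ENLIGHT's real scheduler.

def intermediate_dram_bytes(activation_bytes, groups):
    """Estimate DRAM traffic caused by intermediate activations.

    activation_bytes[i] is the output size of layer i in bytes.
    groups is a list of lists of consecutive layer indices; layers in the
    same group are assumed to exchange data through on-chip memory.
    The final network output is not counted as intermediate data.
    """
    last_layer = len(activation_bytes) - 1
    traffic = 0
    for group in groups:
        boundary_layer = group[-1]
        if boundary_layer != last_layer:
            # One write by the producer group, one read by the consumer group.
            traffic += 2 * activation_bytes[boundary_layer]
    return traffic


# Hypothetical 6-layer network, output sizes in bytes.
sizes = [800_000, 800_000, 400_000, 400_000, 200_000, 1_000]

layer_by_layer = [[i] for i in range(len(sizes))]  # every layer is its own group
grouped = [[0, 1, 2], [3, 4, 5]]                   # two groups of three layers

baseline = intermediate_dram_bytes(sizes, layer_by_layer)
with_groups = intermediate_dram_bytes(sizes, grouped)
print(baseline, with_groups)
```

Under these assumptions, only the activation at the single group boundary travels through DRAM, so the grouped schedule moves far fewer bytes than the layer-by-layer schedule.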

ENLIGHT performs deep neural network operations such as convolution, pooling, and non-linear activation functions for edge computing environments. This NPU IP surpasses alternative solutions, delivering high compute density together with power, performance, and area (PPA) efficiency.


Figure: ENLIGHT hardware architecture diagram

Hardware Key Advantages

Mixed-Precision (4/8-bit) Computation

Higher PPA and DRAM bandwidth efficiency

Deep Neural Network (DNN)-optimized Vector Engine

Better adaptation to future DNN changes

Scale-out w/ Multi-core

Even higher performance by parallel processing of DNN layers

Modern DNN Algorithm Support

Depth-wise convolution, feature pyramid network (FPN), swish/mish activation, etc.

Software Key Advantages

High-level Inter-layer Optimization

Grouped layer partitioning and scheduling for reducing DRAM traffic from intermediate data

DNN-layers Parallelization

Efficiently utilize multi-core resources for higher performance & optimize data movements among cores

Aggressive Quantization

Maximize use of 4-bit computation capability

Toolkit Overview

NN Converter
- Converts a network file into internal network format (.enlight)
- Supports ONNX (PyTorch), TF-Lite, and CFG (Darknet)

NN Quantizer
- Generates quantized network: float to 4-/8-bit integer
- Supports per-layer quantization of activation and per-channel quantization of weight
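The difference between per-layer and per-channel quantization can be sketched as follows. This is a minimal illustration, assuming symmetric linear quantization; the function names are hypothetical and are not part of the ENLIGHT toolkit, whose actual quantization scheme is not described here.

```python
# Sketch only: symmetric linear quantization is an assumed scheme, and the
# function names are hypothetical, not part of the ENLIGHT toolkit.

def symmetric_scale(values, bits):
    """Pick a scale so the largest magnitude maps to the largest integer."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def quantize(values, scale, bits):
    """Round values to signed integers of the given width and clamp them."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def quantize_weights_per_channel(weights, bits):
    """weights[c] holds the weights of output channel c; one scale per channel."""
    scales = [symmetric_scale(channel, bits) for channel in weights]
    q = [quantize(channel, s, bits) for channel, s in zip(weights, scales)]
    return q, scales


def quantize_activation_per_layer(activation, bits):
    """One scale shared by the whole activation tensor of a layer."""
    scale = symmetric_scale(activation, bits)
    return quantize(activation, scale, bits), scale


# Two output channels with very different weight ranges (invented values).
weights = [[0.02, -0.012, 0.004], [1.5, -0.7, 0.3]]
q_weights, w_scales = quantize_weights_per_channel(weights, bits=4)
q_act, a_scale = quantize_activation_per_layer([0.0, 0.5, 2.0, 1.2], bits=8)
print(q_weights, w_scales)
print(q_act, a_scale)
```

With a single scale for both channels, the small weights of the first channel would all round to zero at 4 bits. A separate scale per channel keeps them distinguishable, which is one common motivation for per-channel weight quantization.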

NN Simulator
- Evaluates full precision network and quantized network
- Estimates accuracy loss due to quantization
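As a rough illustration of comparing a full-precision network with its quantized version, the sketch below measures the output error of a single dot product at 8 and 4 bits. It assumes symmetric quantization and invented values. A real simulator would compare task accuracy on a dataset, so the output error here is only a stand-in metric.

```python
# Sketch only: the output error of one dot product stands in for the
# accuracy comparison a real simulator would run on a dataset.

def fake_quantize(values, bits):
    """Quantize to signed integers and map back to floats (symmetric scale)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Invented weights and inputs for a single neuron.
weights = [0.31, -0.62, 0.05, 0.44, -0.17, 0.80]
inputs = [1.20, 0.35, -0.90, 0.65, 0.15, -0.45]

reference = dot(weights, inputs)  # full-precision result
errors = {}
for bits in (8, 4):
    approx = dot(fake_quantize(weights, bits), fake_quantize(inputs, bits))
    errors[bits] = abs(approx - reference)
    print(f"{bits}-bit: output {approx:.4f}, abs error {errors[bits]:.4f}")
```

For these values the 4-bit error is noticeably larger than the 8-bit error, which is the kind of trade-off a mixed 4-/8-bit flow has to weigh layer by layer.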

NN Compiler
- Generates NPU handling code for target architecture and network

Applications

Person, vehicle, bike, traffic sign detection
Parking lot vehicle location detection & recognition
License plate detection & recognition
Detection, tracking, and action recognition for surveillance

Deliverables

ENLIGHT Toolkit is available to all eligible companies with the following items:

RTL design for synthesis
SW toolkits and device driver
User guide
Integration guide

ENLIGHT Pro

Highly scalable inference NPU IP for next-gen AI applications

This state-of-the-art inference neural processing unit (NPU) IP is suitable for high-performance edge devices, including automotive systems, cameras, and more. ENLIGHT Pro is engineered to deliver greater flexibility, scalability, and configurability, improving overall efficiency in a compact footprint. ENLIGHT Pro supports the transformer model, a key requirement in modern AI applications, particularly Large Language Models (LLMs). LLMs are trained using deep learning techniques on extensive datasets and are instrumental in tasks such as text recognition and generation. The automotive industry is expected to adopt LLMs to offer instant, personalized, and accurate responses to customers' inquiries.

ENLIGHT Pro sets itself apart by achieving 4096 MACs/cycle for 8-bit integer operations, four times the throughput of ENLIGHT Classic, and by operating at up to 1.0 GHz on a 14 nm process node. It offers performance ranging from 8 TOPS (tera operations per second) to hundreds of TOPS, optimized for flexibility and scalability. ENLIGHT Pro supports tensor shape transformation operations, including slicing, splitting, and transposing, and a wide variety of data types (8-, 16-, and 32-bit integer, and 16- and 32-bit floating point) to ensure flexibility across computational tasks. The vector processor achieves 16 FP16 MACs/cycle and includes a 32x2 KB vector register file (VRF). Single-core, dual-core, and quad-core configurations are available, with scalable task mappings such as multiple models, data parallelism, and tensor parallelism.
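The 8 TOPS figure is consistent with 4096 MACs/cycle at 1.0 GHz if each MAC is counted as two operations (one multiply and one add). The short calculation below shows that arithmetic. Both the two-operations-per-MAC convention and the linear multi-core scaling are assumptions made for illustration, not statements from the datasheet.

```python
# Assumptions: one MAC counts as two operations (multiply + add), and
# multi-core throughput scales linearly. Neither is a datasheet fact.

def peak_tops(macs_per_cycle, clock_hz, cores=1, ops_per_mac=2):
    """Peak throughput in tera operations per second."""
    return macs_per_cycle * ops_per_mac * clock_hz * cores / 1e12


single = peak_tops(4096, 1.0e9)
quad = peak_tops(4096, 1.0e9, cores=4)
print(f"single core: {single:.3f} TOPS, quad core: {quad:.3f} TOPS")
```

These are peak numbers; sustained throughput on a real network depends on utilization and memory bandwidth.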

ENLIGHT Pro incorporates a RISC-V CPU vector extension with custom instructions, including support for Softmax and local storage access, which enhances its overall flexibility. It comes with a software toolkit that supports widely used network formats such as ONNX (PyTorch), TFLite (TensorFlow), and CFG (Darknet). The ENLIGHT SDK streamlines the conversion of floating-point networks to integer networks and generates NPU commands and network parameters via a network compiler.
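For readers unfamiliar with the operation, Softmax turns a vector of scores into probabilities and appears in every attention layer of a transformer, which is one plausible reason to give it dedicated instructions. The reference sketch below shows the standard numerically stable formulation in plain Python. It says nothing about how ENLIGHT Pro implements the operation in hardware.

```python
import math

# Reference formulation only; it does not describe ENLIGHT Pro's
# hardware implementation of Softmax.

def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
large = softmax([1000.0, 1000.0])  # would overflow without the max subtraction
print(probs, large)
```

The exponentials, the reduction, and the division are all element-wise or vector operations, which is why the function maps naturally onto a vector engine.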


Hardware Key Advantages

Mixed-Precision Computation (INT8, INT16, FP16)

Achieving accuracy while preserving power, performance, and area (PPA) efficiencies

Deep Neural Network (DNN)-optimized Vector Engine

Custom instructions for Softmax and local storage access & enhanced adaptability for future DNNs

Scale-out w/ Multi-core

Greater performance by parallel processing of DNN layers

Modern DNN Algorithm Support

Transformer architecture, depth-wise convolution, feature pyramid network (FPN), etc.

Software Key Advantages

High-level Inter-layer Optimization

Optimized layer grouping and scheduling to minimize DRAM traffic from intermediate data

DNN-layers Parallelization

Effective multi-core utilization for elevated performance & optimized core-to-core data transfer

Automated Quantization Flow

Minimization of quantization loss through mixed-precision computation

Toolkit Overview

NN Converter
- Converts a network file into internal network format (.enlight)
- Supports ONNX (PyTorch), TF-Lite, and CFG (Darknet)

NN Quantizer
- Generates quantized network: float to 4-/8-bit integer
- Supports per-layer quantization of activation and per-channel quantization of weight

NN Simulator
- Evaluates full precision network and quantized network
- Estimates accuracy loss due to quantization

NN Compiler
- Generates NPU handling code for target architecture and network

Applications

Automotive
Cameras
Person, vehicle, bike, traffic sign detection
Parking lot vehicle location detection & recognition
License plate detection & recognition
Detection, tracking, and action recognition for surveillance

Deliverables

ENLIGHT Pro Toolkit is available to all eligible companies with the following items:

RTL design for synthesis
SW toolkits and device driver
User guide
Integration guide
