The following post gives an overview of the Lattice ECP5 FPGA architecture. It was written in conjunction with the Summer of FPGAs -- OrangeCrab Dev Bd - Review I did for element14 (link to road-test).
Architecture overview
Each ECP5/ECP5-5G device contains an array of logic blocks surrounded by Programmable I/O Cells (PIC). Interspersed between the rows of logic blocks are rows of sysMEM Embedded Block RAM (EBR) and rows of sysDSP Digital Signal Processing slices.
The building blocks of the ECP5 are:
- Programmable Functional Unit (PFU): implementation of logic, arithmetic, RAM and ROM functions
- sysMEM: 18 Kb of memory for RAM/ROM implementation
- sysDSP: implementation of digital signal processing functions (multipliers and adder/accumulators)
- PIO: programmable input/output ports
- SERDES: 3.2 Gb/s (Note: The ECP5 that comes with the OrangeCrab does not have SERDES I/Os)
- DLL/PLL: clock management with Delay-Locked Loops (DLLs) and Phase-Locked Loops (PLL)
Lattice vision
"FPGAs should be used as a complement to ASICs and ASSPs" vs "FPGAs should be used to replace ASICs/ASSPs"
⇒ Low cost over increased capacity (up to 40% lower cost)
⇒ Low power over high performance (up to 30% lower power)
⇒ High Functional Density (up to 2x functional density)
The ECP5 is rated as a low power/midrange FPGA.
PFU (Programmable Functional Unit)
- Each PFU consists of four interconnected slices (0-3), each one with 2 x (LUT4+carry)
- 50 inputs / 23 outputs
- Slice modes:
  - Distributed RAM, ROM
  - Logic: LUT4, LUT5. By concatenating slices: LUT6, LUT7 and LUT8
  - Arithmetic: add/subtract, up/down counter, comparator (fast carry chain)
- 2 x registers per slice (8 FFs per PFU)
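To make the logic mode above more concrete, here is a minimal Python sketch of how a LUT4 can be modelled as a 16-entry truth table, and how two LUT4s plus a 2:1 multiplexer give a 5-input function. The function names (`lut4`, `lut5`, `truth_table`) and the bit ordering of the truth table are assumptions chosen for illustration; they are not the Lattice primitives or their exact parameter layout.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT: `init` is a 16-bit truth table, inputs are 0 or 1."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> index) & 1


def lut5(init_lo: int, init_hi: int, a: int, b: int, c: int, d: int, e: int) -> int:
    """Build a 5-input function from two LUT4s and a 2:1 mux selected by `e`."""
    lo = lut4(init_lo, a, b, c, d)
    hi = lut4(init_hi, a, b, c, d)
    return hi if e else lo


def truth_table(func, n_inputs: int) -> int:
    """Pack an n-input boolean function into an integer truth table."""
    table = 0
    for index in range(1 << n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        if func(*bits):
            table |= 1 << index
    return table


def parity5(a: int, b: int, c: int, d: int, e: int) -> int:
    """Example function: 5-input XOR."""
    return a ^ b ^ c ^ d ^ e


# Split the 32-entry table into the halves for e = 0 and e = 1.
table32 = truth_table(parity5, 5)
INIT_LO = table32 & 0xFFFF          # entries with e = 0
INIT_HI = (table32 >> 16) & 0xFFFF  # entries with e = 1
```

The same idea extends to LUT6, LUT7 and LUT8: each extra input doubles the truth table and adds one more multiplexer level on top of the LUT4 outputs.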
Clocking
- sysCLOCK PLL: synthesize clock frequencies
- Dynamic Clock Control: the clock can be disabled per quadrant
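As a rough illustration of what "synthesizing a clock frequency" means, the sketch below searches for an integer multiplier/divider pair so that the input frequency times the ratio lands as close as possible to a target. This is the generic PLL ratio idea only: the function name `find_pll_ratio`, the divider range and the example frequencies are assumptions, and a real sysCLOCK PLL has additional constraints (VCO range, output dividers) that are not modelled here.

```python
from fractions import Fraction


def find_pll_ratio(f_in_hz: int, f_target_hz: int, max_div: int = 128):
    """Search integer (mult, div) pairs so f_in * mult / div is closest to the target."""
    best = None
    for div in range(1, max_div + 1):
        for mult in range(1, max_div + 1):
            f_out = Fraction(f_in_hz * mult, div)
            error = abs(f_out - f_target_hz)
            # Keep the first pair with the smallest error (smallest divider wins ties).
            if best is None or error < best[0]:
                best = (error, mult, div, f_out)
    _, mult, div, f_out = best
    return mult, div, float(f_out)


# Hypothetical example: derive 100 MHz from a 48 MHz reference.
mult, div, f_out = find_pll_ratio(48_000_000, 100_000_000)
```

In practice the vendor tools pick these values for you; the sketch only shows why some target frequencies are hit exactly and others only approximately.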
sysMEM Memory
- Embedded Block RAM (EBR): 18Kb RAM
- True dual-port, pseudo dual-port, single-port RAM, ROM and FIFO (FIFO requires support logic from external PFUs)
- Available parity check
- Write behavior:
  - Normal (written data appears on the output only in a later read cycle, not during the write cycle)
  - Write through (output data is the write data during the write cycle)
  - Read-before-Write (output is the old content of the address; only for x9, x18 and x36 data widths)
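The three write behaviors listed above are easier to compare side by side in a small behavioural model. The following Python sketch is a simplified single-port RAM, not the EBR itself: the class name `SinglePortRam`, the mode strings, and the choice that the output holds its previous value during a write in normal mode are assumptions made for illustration.

```python
class SinglePortRam:
    """Behavioural single-port RAM with a registered output and three write modes."""

    MODES = ("normal", "write_through", "read_before_write")

    def __init__(self, depth: int, mode: str = "normal"):
        if mode not in self.MODES:
            raise ValueError(f"unknown write mode: {mode}")
        self.mem = [0] * depth
        self.mode = mode
        self.dout = 0  # value currently held on the output

    def clock(self, addr: int, din: int = 0, we: bool = False) -> int:
        """Simulate one clock edge and return the value on the output port."""
        if we:
            old = self.mem[addr]
            self.mem[addr] = din
            if self.mode == "write_through":
                self.dout = din      # new data appears immediately
            elif self.mode == "read_before_write":
                self.dout = old      # previous content appears
            # "normal": output is left unchanged during a write (assumption)
        else:
            self.dout = self.mem[addr]
        return self.dout


def demo(mode: str):
    """Write 0xAA, overwrite with 0x55, then read back the same address."""
    ram = SinglePortRam(16, mode)
    ram.clock(3, 0xAA, we=True)
    during_write = ram.clock(3, 0x55, we=True)
    after_read = ram.clock(3)
    return during_write, after_read
```

With this model, `demo("normal")` returns `(0, 0x55)`, `demo("write_through")` returns `(0x55, 0x55)` and `demo("read_before_write")` returns `(0xAA, 0x55)`: the modes differ only in what the output shows during the write cycle.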
sysDSP
- Variable data width configurations
- Pipeline registers
- Symmetry support (Odd/Even taps, 1D, 2D filters)
- Dual-multiplier accumulator
- Fully cascadable DSP blocks, support for symmetric, asymmetric and non-symmetric filters
- Per slice/block
  - one 18 x 36, two 18 x 18 or four 9 x 9 Multipliers
  - 36 x 36 by cascading across two sysDSP slices
  - MAC: one 18 x 36 or two 18 x 18, with a 52-bit accumulator
- ALU (Arithmetic Logic Unit)
  - Dynamically selectable ALU OPCODE
  - Ternary arithmetic (addition/subtraction of three inputs)
  - Bit-wise two-input logic operations (AND, OR, NAND, NOR, XOR and XNOR)
  - Programmable ALU flags (overflow, underflow, and convergent rounding)
- Saturation and rounding options
- Time Division Multiplexing (TDM)
Simplified sysDSP Slice Block Diagram
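The "36 x 36 by cascading across two sysDSP slices" item can be sketched arithmetically: one operand is split into two 18-bit halves, each half is multiplied by the full 36-bit operand (one 18 x 36 product per slice), and the upper partial product is shifted by 18 bits before the two are added. The Python below is a minimal sketch for unsigned operands only; the function names are hypothetical, and how the hardware actually routes and sign-extends the partial products is not modelled.

```python
MASK18 = (1 << 18) - 1
LIMIT36 = 1 << 36


def mult18x36(a18: int, b36: int) -> int:
    """One slice-sized product: 18-bit by 36-bit unsigned operands."""
    if not (0 <= a18 <= MASK18 and 0 <= b36 < LIMIT36):
        raise ValueError("operand out of range for an 18 x 36 multiply")
    return a18 * b36


def mult36x36(a36: int, b36: int) -> int:
    """36 x 36 unsigned product assembled from two 18 x 36 partial products."""
    if not (0 <= a36 < LIMIT36 and 0 <= b36 < LIMIT36):
        raise ValueError("operand out of range for a 36 x 36 multiply")
    a_lo = a36 & MASK18           # lower 18 bits, handled by the first slice
    a_hi = (a36 >> 18) & MASK18   # upper 18 bits, handled by the second slice
    low_part = mult18x36(a_lo, b36)
    high_part = mult18x36(a_hi, b36)
    # The upper partial product carries a weight of 2**18.
    return low_part + (high_part << 18)
```

The result is identical to a direct 36 x 36 multiplication; the decomposition only shows why two slices are enough.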
PIO (programmable I/O cells)
- Grouped in sets of four PIO cells
- Input/output and tristate register blocks
- Delay elements (high speed interface)
- Built-in FIFO logic on some cells
- DDR memory support
- sysI/O Buffer
  - LVDS, HSUL, BLVDS, SSTL Class I and II, LVCMOS, LVTTL, LVPECL, and MIPI
  - Single-ended termination: 50 Ω, 75 Ω, or 150 Ω
  - Differential termination: 100 Ω
  - Configurable drive strength, slew rate, bus maintenance (weak pull-up or weak pull-down) and open drain
- SERDES (PCIe, Ethernet, ...) (only selected devices, not available in the OrangeCrab board)
- IEEE 1149.1-Compliant Boundary Scan Testability (Test Access Port - TAP)
Group of Four Programmable I/O Cells
Device configuration
- JTAG
- SPI / SSPI
- Slave Parallel Configuration Mode (SPCM) over an x8 CPU interface
- Decryption
- Multi-boot (remote configuration with backup)
- TransFR (Transparent Field Reconfiguration) - logic RTL update on the fly
SEU (Single Event Upset)
SEU mitigation with supporting functions:
- Soft Error Detect - during normal operation with error signal generation
- Soft Error Correction
- Soft Error Injection
- Dedicated CRC (Cyclic Redundancy Check) logic for bitstream
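The detect and inject functions above can be illustrated with a small software analogy: compute a reference CRC over the configuration data, flip one bit, and compare again. This is a sketch under loud assumptions: it uses the standard CRC-32 from Python's `zlib`, which is not necessarily the polynomial used by the device, and the frame layout and function names are invented for the example.

```python
import zlib
from typing import List, Sequence


def config_crc(frames: Sequence[bytes]) -> int:
    """Running CRC-32 over all configuration frames (stand-in for the device CRC)."""
    crc = 0
    for frame in frames:
        crc = zlib.crc32(frame, crc)
    return crc


def inject_bit_flip(frames: Sequence[bytes], frame_index: int, bit_index: int) -> List[bytes]:
    """Return a copy of the frames with a single bit inverted (soft error injection)."""
    corrupted = list(frames)
    frame = bytearray(corrupted[frame_index])
    frame[bit_index // 8] ^= 1 << (bit_index % 8)
    corrupted[frame_index] = bytes(frame)
    return corrupted


def soft_error_detected(frames: Sequence[bytes], golden_crc: int) -> bool:
    """Soft error detect: recompute the CRC and compare it with the reference."""
    return config_crc(frames) != golden_crc


# Hypothetical configuration memory: two 16-byte frames.
frames = [bytes(range(16)), bytes(16)]
golden = config_crc(frames)
upset = inject_bit_flip(frames, frame_index=1, bit_index=5)
```

Here `soft_error_detected(frames, golden)` is false for the untouched data and true for `upset`, since a CRC always catches a single flipped bit. Correction would then amount to reloading the affected configuration data.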