RoadTest: Summer of FPGAs -- Lattice MACHXO3LF Starter Kit
Author: kvik86
Creation date:
Evaluation Type: Development Boards & Tools
Did you receive all parts the manufacturer stated would be included in the package?: True
What other parts do you consider comparable to this product?: MachXO2 breakout board is very similar, where the FPGA is in a TQFP package.
What were the biggest problems encountered?: The tools and IDEs sometimes misbehaved for reasons I could not identify. Propel did not pick up changes from sys_env.xml. The schematics in the official user's manual were low quality and unreadable (a workaround was found).
Detailed Review:
Lattice MachXO3LF starter kit
This roadtest deals with the Lattice MachXO3LF Starter Kit. The board is based on the Lattice MachXO3LF-6900C (LMXO3LF-6900C-6BG256I, 256 pin BGA) FPGA/CPLD. The starter kit contains only minimal additional components: an FTDI chip for programming and UART communication over USB, 8 LEDs, 4 DIP switches, an SPI flash and a small prototype area. Most of the pins are accessible over four standard 2x20 pin headers.
The kit itself can be used to get started with FPGA programming; however, connecting other peripherals requires some soldering on the prototype area or a custom PCB. This is not necessarily a problem: no unwanted or unused devices are hard-wired to the FPGA pins, so there are no on-board conflicts to work around.
In order to get the development up and running, only the attached mini USB cable needs to be plugged in and the free Lattice Diamond IDE needs to be installed with a free license.
Smaller programmable logic devices that offer fewer features are categorized as CPLDs (complex programmable logic devices), not to mention the even simpler PALs or PLDs, while more advanced devices are FPGAs (field-programmable gate arrays). MachXO devices can be considered to be somewhere in the middle: small FPGAs or advanced CPLDs. In this review I’m going to use the term FPGA for these devices.
The manufacturer has other devices in its portfolio, among others the even simpler, low-power, small iCE40 family, the larger high-performance ECP5 series, and the CrossLink family.
FPGAs offer a different architecture and programming model compared to microcontrollers. Microcontrollers consist of a processor (executing instructions), memory (storing program and data) and peripherals (dedicated to different purposes), all connected over buses. FPGAs contain far fewer application-specific blocks; instead they contain a large number of universal logic blocks, look-up tables (LUTs), that can be configured to provide specific outputs for given inputs. These blocks are connected through configurable switches. FPGAs don’t “run” program code; instead the hardware is described using an HDL (hardware description language) such as Verilog or VHDL. It is possible (and often done) to describe processors in HDL, although these usually run at lower speeds than dedicated silicon implementations. FPGAs usually also contain other dedicated elements such as configurable memory blocks (usable as RAMs, FIFOs etc.), DSP blocks, PLLs and IO circuits.
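To make the idea of a LUT more concrete, here is a minimal behavioral sketch in Python. It assumes a 4-input LUT whose 16 configuration bits form a truth table; the function name `lut4` and the two example masks are made up for illustration and do not come from any Lattice tool.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Model a 4-input LUT: 'init' is a 16-bit truth table, the inputs pick one bit."""
    index = (d << 3) | (c << 2) | (b << 1) | a  # the inputs form the table address
    return (init >> index) & 1


# Hypothetical configurations: the same block implements different logic
# purely by changing its 16 configuration bits.
AND4 = 0x8000  # only address 15 (all inputs high) holds a 1
XOR4 = 0x6996  # holds a 1 wherever an odd number of inputs is high

print(lut4(AND4, 1, 1, 1, 1), lut4(AND4, 1, 0, 1, 1))
print(lut4(XOR4, 1, 0, 0, 0), lut4(XOR4, 1, 1, 0, 0))
```

The point of the sketch is that "programming" the block means filling in a table, not executing instructions.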
There have been three generations of MachXO devices so far (MachXO, MachXO2 and MachXO3). The first generation consists of “smaller” devices containing fewer logic blocks, while the XO2 and XO3 families are similar in features. The XO3 family offers lower power consumption, and it also includes a device one step larger (MachXO3LF-9400) than the largest of the XO2 family (XO2-7000). However, all XO3 devices come in BGA packages. If you are looking for devices that can be soldered by hand, the XO2 family has you covered: the XO2-7000 is available in a 144-pin TQFP package.
The XO3 family consists of multiple lines of devices. XO3L FPGAs can store the configuration internally in multi-time programmable non-volatile configuration memory (NVCM), but this memory is not flash memory. Traditionally, FPGAs store their configuration in external memories. XO3LF devices store the configuration internally in flash memory, and part of this memory can be used as user flash memory (UFM), which is available as non-volatile storage at runtime as well. Both lines also allow external configuration memory instead of the internal one, and even fallback and dual-boot setups are possible. XO3D and NX devices contain embedded cryptography (security) modules (such as ECDSA public key signing, AES-ECB symmetric block encryption, SHA-256 hashing etc.).
Each family of devices consists of multiple chips that differ in the number of LUTs, memory blocks (EBR), number of IOs, packaging, speed grade, etc. For more information see the device comparison table in the device family datasheet.
MachXO3 provides configurable IOs for different logic standards and levels (CMOS at 3.3V, 2.5V, 1.8V, 1.5V and 1.2V, LVTTL, LVDS, etc.) with optional pull-ups and pull-downs, offering a wide range of connectivity to other devices (even CSI camera and DSI display connection compatibility). The datasheet contains information on input/output compatibility depending on the IO bank supply voltage. Input/output signals can be geared, and DDR inputs/outputs are also supported, which makes it possible to use fast signals outside the device while “slowing” them down and parallelizing them inside. This feature is widely used to handle fast serial signals.
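The gearing idea can be sketched in a few lines of Python. This is only a behavioral illustration, not how the hardware gearbox primitives are instantiated; the function names, the bit ordering (first received bit becomes the least significant bit) and the 400 Mbit/s figure are assumptions chosen for the example.

```python
def deserialize(bits, gear):
    """Group a serial bit stream into 'gear'-bit parallel words (first bit = LSB)."""
    words = []
    usable = len(bits) - len(bits) % gear  # drop an incomplete trailing word
    for start in range(0, usable, gear):
        word = 0
        for pos, bit in enumerate(bits[start:start + gear]):
            word |= (bit & 1) << pos
        words.append(word)
    return words


def internal_clock_mhz(serial_mbps, gear):
    """Clock rate needed on the parallel side to keep up with the serial line."""
    return serial_mbps / gear


print(deserialize([1, 0, 0, 0, 1, 1, 0, 0], gear=4))
print(internal_clock_mhz(400, gear=4))
```

With a gear ratio of 4, the internal logic only has to run at a quarter of the external bit rate, at the cost of a 4-bit wide data path.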
The device contains one or two programmable PLLs that can be used to create clock sources running at different frequencies. These PLL blocks might even have multiple outputs. There is one programmable RC oscillator in the FPGA (2.08-133MHz). Also, multiple clock inputs are available as custom clock sources.
Embedded memory in these devices is much scarcer than in microcontrollers, which may surprise those coming from microcontroller programming. The 6900-LUT device contains 240 kbit (~30 kB) of embedded block RAM (EBR), which can be configured in many ways (single or dual port RAM, FIFO etc.). You may lose capacity, however, because blocks cannot be “broken up”, although multiple blocks can be combined. The largest device of the XO3 family (9400 LUTs) has almost double the RAM. Fewer LUTs also come with less RAM and fewer IOs. Unfortunately, if you need more RAM, you have to go with a larger device (more LUTs, more IOs and a larger package) as well; there is no option to choose many IOs, many LUTs or a large RAM capacity independently. Besides EBRs, the other type of RAM is distributed RAM, built from the logic fabric rather than from dedicated blocks; naturally, its total capacity is significantly less than what EBRs offer.
MachXO FPGAs contain embedded “hardened” I2C and SPI IPs. As these interfaces are used quite often in designs, having them hardened means they do not take up additional logic capacity, and they also provide an interface for configuration (programming). The flash-based device also provides UFM (user flash memory), which can be accessed during configuration or at runtime. The open Wishbone bus can be used in custom designs to access these functions through registers, just as in microcontrollers. However, the interfacing is up to the user: we need to implement the programmable logic side of the Wishbone bus. These interfaces are also used to configure the FPGA: either external memories can be connected and read by the device, or programmer devices can issue commands to program the configuration memory. The programmer protocol is not extremely complicated (official documentation is available), and there are examples of how to update the configuration of the FPGA from a microcontroller.
None of these devices contain more advanced embedded functions such as DSPs, Ethernet MAC, PCI-E etc.; you need to step up in device complexity to get those features.
As MachXO2 and MachXO3 devices are not that different and the same tools can be used, it might be worth looking into MachXO2 examples and tutorials as well.
Useful links:
There are many reference designs available for the devices in the Reference Design section of the product page: https://www.latticesemi.com/Products/FPGAandCPLD/MachXO3
Even more IPs and examples can be found on the Lattice Solution page, which may apply to more products in general: https://www.latticesemi.com/solutionsearch?qprod=3cd7cb2039554a51b81a8864086eb1b7&qiptype=3614c818569f4eecb0602ba20a521a41,6da9534f318a4969a6b5e7dc9081bdba
The manufacturer provides soft processor cores to be used in its FPGAs: the Mico8 and Mico32. The former is extremely simple and takes up very few resources in the FPGA, but its capabilities are also limited. Mico32 is much more advanced, but it might not fit into these FPGAs in practical applications.
The Mico8 is an 8-bit processor with 16-32 general purpose registers. The code memory can contain 256-4k instructions, stored internally or connected over the Wishbone interface. Scratchpad memory (RAM) can also be internal or attached over Wishbone. It supports an 8-32 deep stack and up to 8 external interrupts. Common peripherals (UART, GPIO, DMA, SPI, and the MachXO EFBs: I2C, SPI, Timer) or custom peripherals implemented in the FPGA design can be connected over the Wishbone interface. Depending on the device, an operating frequency of 40-50MHz can be achieved according to the datasheet. The manufacturer provides an Eclipse-based IDE and a C/C++ toolchain to configure and generate HDL for the processor and code (drivers) for the peripherals. C code is transformed into memory image files that can be downloaded to the device or inserted into the HDL design.
Mico platform IDE, component instantiation and connection
Eclipse-based IDE for Mico software development in C/C++
The manufacturer also provides a RISC-V core through the Lattice Propel IDE. This IDE approaches development from the (soft) processor side and allows drag-and-drop instantiation and connection of available and custom IPs.
As mentioned before, the starter kit itself does not contain many peripherals, only the necessary ones, and most of the IOs are connected to the headers. The user’s guide of the starter kit is shared between multiple versions of the board, equipped with different but similar devices (XO2-4000, XO3L-6900, XO3LF-6900). There is also a similar kit with a TQFP XO2-7000 device.
The most important information in the manual is the expansion header pin table (which FPGA pin is connected to which standard 100 mil header pin) and the board schematics at the end of the manual. However, the schematics were exported as bitmaps and they are extremely difficult to read. Unacceptable. This is a mistake the manufacturer should fix. (The schematics of the XO2 breakout board are fine.) Fortunately, the OrCAD schematics are available for the board and a free OrCAD viewer can be downloaded from Cadence.
The first page of the schematics provides an overview of the system. The second page contains the FT2232H USB chip, which implements both the JTAG interface and the UART. Both can be disconnected by unsoldering 0R resistors. The JTAG interface is available on header pins as well, so the board can be used to download configuration to other target boards. The UART interface implements all signals (RTS, CTS etc.), not only TX and RX. There is also an option to configure port B of the USB chip to operate as I2C instead of UART. The FT_Prog utility is needed to reconfigure the FTDI chip ports. An external EEPROM contains the configuration for the FTDI chip. I have not tested this I2C function.
The third page contains the power supply and power configuration. LDOs are used to create 3.3V and 1.2V from the 5V of the mini USB connector. Each IO bank of the FPGA has its own IO power pins, which allows different voltage levels to be used on different banks. The datasheet must be checked for compatibility. In most cases a higher bank voltage still allows lower input voltage levels to be used, but the IO supply determines the output level. According to the schematics, a resistor can be resoldered to choose between the +3.3V and +1.2V options.
Page four contains the schematics related to Bank 0. Here you can find the I2C and UART connections, JTAG and the other programming pins. The J3 header carries the pins connected to Bank 0. Multiple GND pins are available, and the VCCIO0 bank supply voltage is present as well. Most pins are grouped in pairs (with names ending in …A and …B), denoting that the two pins can be used as a differential input/output. If differential signaling is not used, they can be used independently.
The next page contains Bank 1, J4 header is used to connect these signals. The 8 LEDs on the board are also connected to this bank.
Page six details Bank 2. Optional termination resistors are provided for the differential pins on this bank. By default, these are not populated, but the user can fit them if needed. The FPGA also contains selectable internal 100R resistors. The external SPI flash is shown on this page. One inconvenience of this layout is that, in order to bring more FPGA pins out to the headers, fewer GND connections are used. For comparison, the MachXO2 breakout board provides a GND between differential signals, which gives better signal quality at the price of fewer IOs.
Page seven contains the schematics for Banks 3, 4 and 5. The J6 header carries pins from these banks together with the bank IO supply pins. Looking at the board and the Gerber files, it looks like many of the signal traces connected to J6 are routed differentially, which is needed when connecting such signals to the board. This page also contains the external switches. Page eight shows the power signals of the FPGA and the LEDs.
By default, the UART and I2C pins are disconnected between the FT2232H chip and the FPGA. In order to use the UART, at least the R14 and R15 0R resistors should be populated. By default, port B of the FTDI chip was configured as “245 FIFO”; I changed this to “RS232 UART” and set the driver from D2XX to VCP. In Windows Device Manager the virtual COM port still did not show up, but the list of USB devices contained USB Serial Converter B. Double clicking on this, selecting the “Special” tab and checking “Load VCP Driver” finally made a new virtual COM port show up.
FT_Prog utility to configure the FT2232H USB chip port, port B is set to UART
Lattice Diamond is the basic software required for MachXO HDL development; it runs on both Windows and Linux. A free license must be requested after registration, and the license is tied to the MAC address of the local Ethernet port. Installation is quite straightforward.
The graphical interface of Diamond feels like time traveling to the end of the 90s or the early 2000s. Nothing fancy here. I encountered some problems that were perhaps related to text encoding, so my suggestion is to stick to plain old ASCII source files; UTF-8 (especially with BOM) might cause problems.
Besides the project tree and the source editor, I want to give details of some of the most important features of the IDE.
The Spreadsheet view provides configuration of the device. The Port Assignments tab shows all the input and output signals of the topmodule after running the synthesis step. Here you can configure the package pin number, IO type (LVCMOS, LVDS, different voltage levels etc.), pull-up/down, slew rate, open drain mode, differential resistor, hysteresis etc. for each signal. The Pin Assignments tab shows all available pins of the package and the associated nets.
Spreadsheet view, port assignments
The Global Preferences tab contains general configuration for the device, such as enabling hardened I2C or SPI ports, programming options, setting the UserCode, letting the system know the bank IO supply voltages etc.
Spreadsheet view, global preferences
The Timing Preferences tab allows setting frequency and hold constraints for the synthesis engine.
Spreadsheet view, timing preferences
The synthesis process is broken into multiple steps. Think twice before restarting it, as it might take some time to finish. There are two engines the user can choose from: LSE (Lattice Synthesis Engine) or Synplify Pro. It is possible to switch between the engines by right clicking on the implementation (impl1) and clicking Select Synthesis Tool. Global settings for the engine can be edited by double clicking on the bold (selected) strategy in the project view.
Running the synthesis provides lots of useful information, such as timing analysis. It is extremely important to check whether the design is able to satisfy the timing constraints, such as the clock frequencies that were set up in the spreadsheet view. If not, the report shows which signals suffer the most delay. In this case some reimplementation, typically pipelining of complex combinational logic, is required.
The end result of the synthesis is a bitstream or JEDEC file that can be downloaded to the device. The Tools-Programmer menu allows you to download programmer files. The tool checks validity (correct device code) and also shows the file time, which is very useful. The operation can be changed depending on whether we want to read back the flash contents, erase or verify only, download the configuration to RAM, program the external memory etc. The configuration can be secured in the device to make readback impossible. The programmer interface can also be selected (FTDI chip), and even the port, clock and special pin settings can be set.
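The relation between a clock constraint and the delay of the slowest path can be sketched as simple arithmetic. The Python snippet below is a deliberately simplified model that ignores setup time, clock skew and routing variation; the function names and the 50 MHz / 24 ns figures are illustrative assumptions, not values from a Diamond report.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz, path_delay_ns):
    """Positive slack: the path fits into one clock period. Negative: timing fails."""
    return period_ns(freq_mhz) - path_delay_ns


def max_freq_mhz(path_delay_ns):
    """Highest clock frequency the given path delay can support in this model."""
    return 1000.0 / path_delay_ns


# Hypothetical design: 50 MHz constraint, 24 ns of combinational delay.
print(slack_ns(50, 24.0))
print(round(max_freq_mhz(24.0), 2))
# Splitting the logic into two pipeline stages of about 12 ns each.
print(slack_ns(50, 12.0))
```

In this toy case the unpipelined path misses the constraint by 4 ns, while each of the two pipeline stages has 8 ns to spare, at the cost of one extra clock cycle of latency.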
Diamond programmer tool
The IPExpress tool (Tools->IPExpress) allows you to instantiate and configure embedded blocks, such as EFBs (I2C, SPI, Flash), PLLs, DDR input/outputs, software arithmetic modules (Adders, Multipliers, Comparators, …), EBRs as RAM, ROM or FIFOs.
SPI configuration of the EFB (embedded function block)
RAM modules can be configured as single port (RAM_DQ) or dual port (RAM_DP). Dual port RAMs allow read and write operations at the same time, even in different clock domains. This is very useful for passing data across clock domains. One EBR block provides 9 kbit of storage, meaning a 9-bit wide memory can contain 1024 words. Word width and address depth can be configured freely, even by combining multiple EBR modules. However, if a configured RAM does not fully utilize the EBR blocks it occupies, the unused capacity is lost. In order to waste fewer resources on low-capacity RAMs, distributed memory in the logic fabric can be used instead. RAM and ROM blocks can have initialized contents, described in a text file as a hex or binary image.
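The capacity loss described above can be estimated with a small Python sketch. It assumes each EBR can be organized as 8192x1, 4096x2, 2048x4 or 1024x9 and that a larger memory is tiled from identically configured blocks; this list of organizations and the tiling rule are my assumptions and should be checked against the Memory Usage Guide. The function names are made up for the example.

```python
from math import ceil

EBR_BITS = 9 * 1024  # 9 kbit per EBR block

# Assumed per-block organizations as (depth, width); verify against the
# Memory Usage Guide before relying on them.
ASPECTS = [(8192, 1), (4096, 2), (2048, 4), (1024, 9)]


def ebr_blocks(width, depth):
    """Fewest EBR blocks needed when tiling with one of the assumed organizations."""
    return min(ceil(depth / d) * ceil(width / w) for d, w in ASPECTS)


def wasted_bits(width, depth):
    """Block capacity that is occupied but not used by the requested memory."""
    return ebr_blocks(width, depth) * EBR_BITS - width * depth


for width, depth in [(9, 1024), (8, 1500), (32, 2048), (16, 16)]:
    print(width, depth, ebr_blocks(width, depth), wasted_bits(width, depth))
```

The last case (16 words of 16 bits) shows why tiny memories are better placed in distributed memory: under these assumptions it would occupy two full EBR blocks for just 256 bits of data.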
Dual port RAM instantiation
FIFO_DC blocks provide similar functionality for clock domain crossing, but the control logic is embedded, so manual addressing is not required. EBR resource usage is similar: each EBR block provides 9 kbit of data. The data width is the same on both the reader and the writer side. Besides the full and empty flags, configurable almost empty and almost full signals are also available. For more information on the memory blocks, see the Memory Usage Guide.
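The meaning of the four status flags can be illustrated with a small behavioral model in Python. This is only a sketch: it uses a single notional clock and ignores the flag synchronization between the read and write domains that the real FIFO_DC performs. The class name, the method names and the default thresholds are hypothetical.

```python
from collections import deque


class FifoModel:
    """Single-clock behavioral model of a FIFO with programmable threshold flags."""

    def __init__(self, depth, almost_empty=2, almost_full=None):
        self.depth = depth
        self.ae_level = almost_empty
        # Hypothetical default: flag "almost full" two entries before full.
        self.af_level = depth - 2 if almost_full is None else almost_full
        self.data = deque()

    def write(self, word):
        """Store a word; returns False (word dropped) if the FIFO is full."""
        if self.full:
            return False
        self.data.append(word)
        return True

    def read(self):
        """Return the oldest word, or None if the FIFO is empty."""
        return self.data.popleft() if self.data else None

    @property
    def empty(self):
        return len(self.data) == 0

    @property
    def full(self):
        return len(self.data) == self.depth

    @property
    def almost_empty(self):
        return len(self.data) <= self.ae_level

    @property
    def almost_full(self):
        return len(self.data) >= self.af_level


fifo = FifoModel(depth=8)
for value in range(6):
    fifo.write(value)
print(fifo.almost_full, fifo.full, fifo.read())
```

The almost full flag gives the writer time to stop before data is lost, and the almost empty flag lets the reader slow down before it runs dry.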
FIFO DC instantiation
For more embedded functions such as clock dividers, counters and all sorts of primitives, see the FPGA Libraries Reference Guide (the link was given previously).
When using the Synopsys Synplify synthesis engine, the Synplify Pro for Lattice tool can be used. This tool provides a detailed graphical view of the design, which can help a lot in identifying connection mistakes or complicated combinational logic networks.
Synopsys Synplify Pro module hierarchy view
Diamond contains a feature called Reveal Analyzer that makes it possible to embed a logic analyzer into the design and inspect specific signals in real operation. It can be accessed through the Tools->Reveal Inserter option. On the Trace Signal Setup panel, the sampled signals can be selected together with the buffer depth and sample clock. Free EBRs can be used to buffer the signal samples. The Trigger Signal Setup is used to specify the trigger conditions that start the sampling of the selected signals. Fortunately, manual triggering is also available, which can come in handy.
Reveal Inserter signal setup
Reveal Analyzer uses the same JTAG connection to communicate with the FPGA that is used for programming. The tool actually modifies the design and creates new HDL modules that encapsulate our topmodule and connect the sampled internal signals to the predefined Reveal modules responsible for sampling and communication. Make sure to turn off Reveal when simulating the project.
Reveal Inserter trigger setup
Reveal Analyzer trigger configuration
Reveal Analyzer captured sample view
Up to Diamond 3.11, the Aldec Active-HDL tool was used for HDL simulation. As of 3.12, Mentor ModelSim is included with the IDE.
The design needs a testbench HDL file for simulation. Put the testbench into the input files folder in the project view, right-click it and select “Include for…->Simulation”. This way the testbench will be the topmodule for simulation, but it will not be evaluated for real implementations. In order to simulate the design, select Tools->Simulation Wizard and follow the steps until ModelSim opens. Diamond should prepare the libraries needed for the specific device, and the simulation descriptor will be put into the Script files folder in the project view. Double clicking this simulation file allows some parameters to be modified and opens ModelSim again. Make sure the project compiles in ModelSim, set up the simulation parameters, select the signals you want to inspect and let it run.
The sample blinky project uses the LEDs to create different lighting effects depending on the state of the switches. The project loaded and “compiled” without any problems.
The topmodule (XO3L_SK_blink.v) selects between the internal and the external clock source depending on the state of switch SW3. The change is not easily visible, as the internal clock is 12.09MHz and the external one is 12MHz. However, after setting the internal clock frequency to 2.08MHz, the change is clearly visible.
Depending on the state of the other 3 switches, either different LED patterns alternate, driven by the slow clock signals of the heartbeat module, or the “Knight Rider” effect implemented in kitcar.v is shown. The heartbeat module contains a simple counter, and its output is flipped when the counter reaches half of the selected period. Kitcar.v is also based on a counter, but it only uses 14 different values. For each counter value a different pattern is output, providing signals for all 8 LEDs.
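The two mechanisms can be modelled behaviorally in Python. This is not the kit's Verilog, only my own sketch of the described behavior: the function names are hypothetical, and the assumption that the 14 counter values map to a single lit LED sweeping up (8 positions) and back down (6 positions) is mine; the real kitcar.v may use different patterns or active-low outputs.

```python
def heartbeat(period, ticks):
    """Yield an output bit that flips each time a counter reaches half the period."""
    half = period // 2
    count, out = 0, 0
    for _ in range(ticks):
        yield out
        count += 1
        if count == half:
            count = 0
            out ^= 1


def kitcar_pattern(step):
    """Map a 14-state counter onto an 8-LED back-and-forth pattern (assumed shape)."""
    step %= 14
    pos = step if step < 8 else 14 - step  # 0..7 going up, then 6..1 coming back
    return 1 << pos


print(list(heartbeat(period=4, ticks=8)))
for step in range(14):
    print(format(kitcar_pattern(step), "08b"))
```

Under this assumption the count of 14 falls out naturally: 8 positions on the way up plus the 6 inner positions on the way back, before the sequence repeats.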
By switching the synthesis engine to Synplify Pro, the GUI also managed to show the hierarchy and connection of the modules. Reveal analyzer also worked, however as the clock frequency is very high compared to the changes on the output, mostly the counters can be analyzed.
Simulation files were not present in the demo project, but after adding a simple Verilog testbench file, simulation started to work. Interestingly, ModelSim could not compile the demo project at first: the wire heartbeat was missing. I suppose it was implicitly defined, but there was also a module with the same name, leading to some confusion. Adding the wire definition for heartbeat solved the issue.
ModelSim simulation
The manufacturer provides lots of examples for these devices, and it is worth looking into them before starting on your own. Links to these resources were given earlier. There are also MIPI (CSI-2, DSI) examples that are personally very interesting for me.
The Propel system consists of Propel Builder and Propel SDK. I fired up Propel Builder (this also required a free license) and created a template RISC-V MC project. There is also a simpler RISC-V SM processor available. By selecting the IP Catalog, IP on Server allows to download more IPs, however altogether there is not a lot to choose from. I suppose more can be installed from file.
Project from template
Propel Builder main view
I changed the system clock frequency for the UART module from 50MHz to 38MHz, changed the size of the system memory to an address depth of 2048, and pressed Design-Generate. After selecting Design-Run Diamond, Lattice Diamond shows up with a preconfigured project. The topmodule contains all the global IOs: reset input, 8 bidirectional GPIO pins, UART RX and TX. By default, it also instantiates the internal oscillator, configured for 38MHz. I assigned the proper pins for the starter kit, also connecting UART RX and TX.
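The UART module has to know the system clock frequency because its baud rate is derived from it by division. The Python sketch below shows that arithmetic under loud assumptions: a plain clock/baud divider (some UARTs oversample by 16 instead), a baud rate of 115200 and 32-bit memory words. None of these values are taken from the Propel template, and the function names are made up.

```python
def uart_divisor(clock_hz, baud):
    """Nearest integer divider and the resulting baud rate error in percent."""
    divisor = round(clock_hz / baud)
    actual_baud = clock_hz / divisor
    error_pct = (actual_baud - baud) / baud * 100.0
    return divisor, error_pct


def memory_bytes(depth, word_bits=32):
    """Capacity of a memory with 'depth' words (32-bit words assumed)."""
    return depth * word_bits // 8


div, err = uart_divisor(38_000_000, 115_200)  # assumed baud rate
print(div, round(err, 3))
print(memory_bytes(2048))
```

If the UART IP were left at 50MHz while the oscillator runs at 38MHz, the divider would be computed for the wrong clock and the serial output would be garbled, which is why the two settings have to match.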
Project generated by Propel for Diamond
Selecting Design-Run Propel opens Lattice Propel, a modified Eclipse IDE environment for C/C++ programming. First a workspace has to be selected, then the system descriptor sys_env.xml. This should be automatically filled in by Propel Builder. Next a project name must be given and I selected Lattice C/C++ as Project type, using RISC_V Cross GCC, finally selected the default newlib-nano.
The sys_env.xml file shows up in the src folder of the project. Opening it shows an overview of the components and the memory layout. After updating settings in Propel Builder, right-clicking sys_env and selecting Update Lattice C/C++ project should update the project settings; however, this did not work well in my case, even though the contents of the xml did show the updated parameters.
Eclipse-based IDE for C/C++ development with board support packages
The sample code compiled without problems and generated a .mem file. I went back to Propel Builder, set this .mem file as the hex contents of the system memory and pressed Generate. Then I switched back to Diamond, synthesized the project and flashed it, and it worked right away. The chase light effect showed on the LEDs and the welcome message appeared on the serial console. Great! There is OpenOCD based debugging capability as well, but I have not tested it.
UART welcome message received from the FPGA, sent by the soft RISC-V processor
The board and all the free tools made a very good impression. I’ll consider using products from Lattice and these small FPGAs in my projects, and I’ll also recommend them to others. As mentioned before, the schematics in the starter kit user's guide are of such low quality that they are unreadable; installing an OrCAD viewer and opening and printing the OrCAD schematics fixed the issue. I haven’t designed BGA boards so far, so for low budget designs I can only consider the MachXO2 family instead of the XO3LF devices. As the number of LUTs is not very large, using embedded processors would leave few free resources for custom logic, and especially little RAM. Still, it is great to have a C/C++ compiler toolchain available for a simple embedded processor. It is nice to have options such as the Mico processors and RISC-V built in. Having RISC-V out of the box seems very up to date and forward looking. The Propel IDE was a very positive surprise; I didn’t expect such a tool to be available for a low end device.
While getting to know the device, I encountered problems that I first assumed were the fault of the software or the device, but as usual most of them turned out to be user errors, sometimes caused by not reading the manuals carefully enough (the UART RX and TX resistors were not populated between the FPGA and the FTDI chip). Sometimes, though, Diamond or the Propel IDE did misbehave: I couldn’t get the spreadsheet view working well, or the Propel configuration would not update. During this time I didn’t do any specific application development; I was only getting used to the tools. I also didn’t get to the point of creating custom Propel IP, which would be interesting to test.
I wish there were FPGAs with fewer IOs (in TQFP packaging) but with more memory and a larger number of LUTs or slices, to accommodate a somewhat powerful processor and more internal memory (I know there are far more powerful FPGAs, even with real embedded Cortex microprocessors). As there is so little embedded memory, external memory must be interfaced in most applications. External memories require lots of IOs, and DRAM controllers require control logic. Why not integrate some DRAM into the chip? Just wishing… I haven't dealt with more advanced FPGAs; it would be interesting to get to know them as well.