RoadTest: NXP I.MX RT1050 EV KIT and Display
Author: zjanosy
Creation date:
Evaluation Type: Development Boards & Tools
Did you receive all parts the manufacturer stated would be included in the package?: True
What other parts do you consider comparable to this product?: STMicro STM32F746 Discovery
What were the biggest problems encountered?: Lots of problems with the integrated debug interface; semihosting is hard to get working; no LCD demo app; SRAM configuration needs a lot of reading and tweaking
Detailed Review:
I work as an embedded software engineer in the musical instrument industry. I am primarily interested in audio signal processing and digital sound synthesis.
I often face the problem that we need high processing power (in particular for digital audio signal processing algorithms), but our market is very price sensitive. Therefore I was very excited to learn about this new microcontroller family from NXP, which has all the features I need for audio/DSP product development.
The main features I’m interested in:
The i.MXRT has plenty of other communication interfaces as well, including CSI, CAN, Ethernet, SPI and UART. You can find more information (including data sheet, manuals and application notes) starting from here:
The only drawback is that it does not have any internal flash memory. The reason for omitting on-chip flash is that NOR flash requires a different manufacturing process than the rest of the chip, so integrating it on the same die would have been more expensive. Moreover, since flash memory is rather slow, it would not be possible to run at 600 MHz while executing from flash. Therefore NXP decided to integrate a larger SRAM (which is more expensive, but uses the same process, so it can be shrunk more), and to let the processor execute from SRAM for maximum performance.
And there is plenty of SRAM – up to 512 kB in the RT1050. The i.MXRT has a very sophisticated “FlexRAM” controller, which allows configuring the on-chip SRAM in a very flexible way. Each 32 kB block of the memory can be dedicated to either ITC (Tightly Coupled Instruction) RAM, DTC (Tightly Coupled Data) RAM, or OC (On-Chip) RAM. For maximum performance code should execute from ITC memory. Frequently accessed data should be placed in DTC, while shared buffers should go into the OC. This separation of memories (and corresponding buses) makes it possible to run at 600 MHz (and more in the future) without wait-state penalties.
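To illustrate how the per-bank assignment can be expressed, here is a small host-side sketch that packs a list of bank types into a 32-bit word, two bits per bank. The encoding (0 = unused, 1 = OC, 2 = DTC, 3 = ITC, bank 0 in the lowest bits) is my reading of the FlexRAM bank configuration field in the i.MX RT1050 reference manual; treat it as an assumption and check it against the manual before using the value on real hardware. The enum, macro and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names. The 2-bit codes follow my reading of the FlexRAM
 * bank configuration field in the i.MX RT1050 reference manual; verify
 * them against the manual before using the value on real hardware. */
typedef enum {
    BANK_UNUSED = 0u, /* bank not used */
    BANK_OC     = 1u, /* On-Chip RAM */
    BANK_DTC    = 2u, /* Tightly Coupled Data RAM */
    BANK_ITC    = 3u  /* Tightly Coupled Instruction RAM */
} flexram_bank_t;

#define FLEXRAM_BANK_COUNT 16u /* 16 banks x 32 kB = 512 kB */

/* Pack one 2-bit code per bank into a 32-bit word, bank 0 in bits 1:0. */
uint32_t flexram_bank_cfg(const flexram_bank_t banks[FLEXRAM_BANK_COUNT])
{
    uint32_t cfg = 0u;
    for (uint32_t i = 0u; i < FLEXRAM_BANK_COUNT; i++) {
        cfg |= ((uint32_t)banks[i] & 0x3u) << (2u * i);
    }
    return cfg;
}

/* Example split: banks 0-3 ITC (128 kB), 4-11 DTC (256 kB), 12-15 OC (128 kB). */
uint32_t example_split_cfg(void)
{
    flexram_bank_t banks[FLEXRAM_BANK_COUNT];
    for (uint32_t i = 0u; i < FLEXRAM_BANK_COUNT; i++) {
        banks[i] = (i < 4u) ? BANK_ITC : (i < 12u) ? BANK_DTC : BANK_OC;
    }
    return flexram_bank_cfg(banks);
}
```

For the example split the packed value is 0x55AAAAFF. Keep in mind that reassigning a bank at run time while it holds live code, data or the stack would break the running program, so such a change has to happen very early or from memory that is not being reconfigured.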
The i.MXRT can actually execute code from external flash as well (this is called XIP, Execute In Place), but doing so causes a substantial performance hit, depending on the type of memory used. For maximum performance one should use either Quad SPI flash or the new and shiny, but more expensive, HyperFlash by Cypress. To speed up execution from external flash, the processor has a 32 kB instruction cache and a 32 kB data cache in addition to the general-purpose SRAM.
Even if the code fits entirely in the internal memory, the i.MXRT needs an external flash to boot from, which increases the total cost slightly. For designs that can run entirely from SRAM it is possible to boot from a cheap SPI flash costing only $0.20. If the code is to run from flash, a 1 MB Quad SPI flash adds about $1, while a 16 MB HyperFlash adds $4 to the total.
Although the i.MXRT also supports NAND flash, it does not have a hardware error-correction unit, therefore it is not really useful.
At the time of writing, only the full-featured MIMXRT1052 and the slightly reduced MIMXRT1051 (no LCD/CSI/PXP) variants were available. The NXP website already contains preliminary information about the RT1020 and RT1060 families.
The RT1060 is a high-performance version adding High-Speed GPIO, high-speed CAN-FD, a synchronous parallel NAND/NOR/PSRAM controller and an additional 10/100 Ethernet port. It will be available with even larger integrated SRAM – probably for a higher price.
The RT1020 is a further reduced variant with a slightly lower maximum clock (500 MHz), less SRAM, only a single USB controller, no graphics accelerator or camera interface, and reduced I/O. Still, this seems to be the more interesting one, because it will be very cheap and will be available in an LQFP package, which allows for cheaper PCB design. This would be a great choice for high-performance, low-cost devices.
The iMXRT1050EVK is a fairly nice board with lots of interfaces: there is a full Arduino-compatible interface (although not populated), a CSI compatible camera interface with a flat cable connector, an LCD/touchscreen interface with flat cable connector, two micro-USB connectors, a 100 Mbps Ethernet connector, a micro SD Card slot, an audio interface, S/PDIF (not populated on later revisions), CAN with on-board transceiver (only available on a header).
The board has 32 MB SDRAM, 64 MB HyperFlash, and 8 MB Quad SPI flash on board (although the latter cannot be used without a rather complicated hardware modification). (Update: the latest revision of the board makes it easier to switch to Quad SPI; it is no longer necessary to desolder the HyperFlash.) There is a User LED and a User Button available for a (very) simple UI. As an extra, a 3-axis linear accelerometer/3-axis magnetometer chip is included for simple IoT projects.
To me, however, one of the more interesting features is that the board includes a better-than-average stereo audio codec, a Wolfson (now Cirrus Logic) WM8960:
https://www.cirrus.com/products/wm8960/
The stereo output of the codec is available on a 3.5mm jack. One of the inputs is connected to an on-board microphone, while the other input is connected to the jack, too. The board is prepared for an S/PDIF interface as well, but the transformers and the connectors are not populated on later revisions of the eval board.
The EVK has an integrated CMSIS-DAP compatible OpenSDA debug port as well. This is rather slow, but OK for basic debugging. The board also has a JTAG/SWD connector for external debug probes, so it can be used with just about any ARM Cortex-M7 compatible debug probe, like the Segger J-Link, the Keil ULINK, or the cheap NXP LPC-Link2. (Segger also provides J-Link compatible firmware for both the LPC-Link2 and the on-board OpenSDA debug interface, though its use is restricted to the eval board only, i.e., it should not be used with custom boards.) One caveat, however, is that the UART console cannot be used with an external debug probe.
More information about the board can be found here:
My primary goal is to evaluate the DSP capabilities of the processor, comparing different audio filter implementations (fixed point, floating point, running from different memories). I am also planning to do some power consumption measurements, and I will check the performance of the on-board audio codec as well.
Time permitting, I would like to look into the USB stack implementation, and I plan to measure the SD controller performance as well. On the other hand, I do not plan to do any graphics-related or networking performance tests, although the microcontroller is very capable in these areas as well.
The board arrived in a well-stuffed box. I don’t want to go into the details of unboxing, hopefully others will do that. Actually I’ve found an excellent introduction to the board by Erich Styger at:
https://mcuoneclipse.com/2017/12/16/mcuxpresso-ide-v10-1-0-with-i-mx-rt1052-crossover-processor/
My board has two labels: SCH-29538 REV A2 and 700-29538 REV X4. Although this is not the most recent revision and uses an engineering sample of the i.MXRT processor, I was glad to find that it has the two S/PDIF RCA connectors and the transformers already mounted, which seem to be absent from later revisions. They may come in handy.
Update: meanwhile a newer version of the board has appeared, the IMXRT1050-EVKB, which has the A1 revision of the chip.
The board came with a couple of 0.1” stacking headers for the Arduino-compatible interface, and a micro-USB cable. Element14 was generous enough to include the optional RK043FN02H-CT LCD panel as well, which normally must be purchased separately.
After unboxing, my first surprise was that the included LCD panel did not have any means for mounting onto the board. It has two flex cables, one for the LCD interface, and another for the touch screen, but no frame or mounting holes.
It was also surprisingly difficult to attach the flex cables to the connectors. I finally managed it, and fixed the display on the board using two layers of double-sided mounting tape (a single layer wasn't enough because of the thickness of the connectors), hoping that the flex cables are seated properly. Otherwise I'll be in trouble if I ever need to remove the panel.
Since the LCD panel is mounted on the back of the board, I have also reversed the nylon standoffs to be able to use them as legs.
Note: If you need the Arduino headers, you should solder them to the board before you mount the LCD panel. However, the stacking headers cannot be mounted on the board together with the LCD, so you’d either need to cut the legs, use connectors with shorter legs, or mount the connectors so that the legs do not interfere with the LCD.
Btw., I’ve found the datasheet of the display here:
https://www.nxp.com/docs/en/supporting-information/RK043FN02H-CT.pdf?fsrch=1&sr=1&pageNum=1
I’m not sure I’ll need it, but anyway, it is good to have it.
I have used many different microcontrollers and development systems in the past, however, I’ve had no previous experience with either Freescale/NXP processors or MCUXpresso. Thus this is my first venture into the NXP ecosystem. Just for the record, I have installed the software on a Windows 7 Professional x64 system.
For the software installation I followed Erich’s tutorials at MCU on Eclipse:
Note: You’ll need to have an active account at NXP to be able to download the MCUXpresso IDE (interestingly I could download the SDK with my deactivated account as well). If the site says that “You have been redirected to this page because your ID is INACTIVE in Flexera. Please Activate your account to proceed.”, then you’ll need to contact the tech support to have your account re-activated.
I had a little trouble with the overly cautious Bitdefender virus protection, as it blocked some executables. After fixing this I was able to start the IDE. Windows Firewall also tried to block something, but after giving permissions finally everything seemed to start up.
The SDK can be custom-built, including only the features and toolchains that are needed. The SDK version at the time of writing was v2.3.0.
Update: Meanwhile v2.3.1 is available.
I added just about everything available to the SDK, just in case. The SDK can be installed simply by dragging and dropping the downloaded zip file onto the “Installed SDKs” area in the MCUXpresso IDE. Alternatively, it can be extracted anywhere on the hard drive, and then the folder can be dragged and dropped. This has the advantage that the SDK files will be referenced instead of being copied.
I've connected the board using the supplied USB cable. Windows started installing USB drivers. After completing the installation I noticed that the mbed Serial port driver was not found. Not a big problem, you can download the Windows driver from here:
https://os.mbed.com/docs/latest/tutorials/windows-serial-driver.html
Next I’ve imported all the demo apps by clicking on “Import SDK example(s)…” in the QuickStart Panel. Then I selected the “hello_world” application, and clicked on the Debug… QuickStart item. After giving some more permissions to Windows Firewall I got to the “Probes discovered” dialog. The DAPLink CMSIS-DAP probe was found, however, when I clicked on OK, all I got was a progress bar, doing nothing. After a while I got an error message “Error in final launch sequence”. Clicking on details showed a “Read timed out” error.
After a couple of retries (and reboots) I gave up, and started to look for a solution in the NXP forum.
I’ve found a document “Overview of using the MIMXRT1050-EVK with MCUXpresso IDE”:
This document has a number of essential tips for creating new projects and troubleshooting. I recommend reading it thoroughly.
Following the document I’ve updated the DAPLink firmware. This did not solve the problem. Then I tried to mass erase the HyperFlash. In the flash programmer I got another error saying “Could not connect to core”. Switched to external 5 V power. Same…
After a couple of hours of googling in vain I finally got a brilliant idea: I plugged the board into a different USB port. Voilà! It worked! So apparently the OpenSDA port (or the DAPLink firmware) has some incompatibility with my VIA VL805-based USB 3.0 card. Strange, because I have used this port with many other devices, and they worked flawlessly.
Anyway, after plugging the board into one of the motherboard’s USB ports everything seemed to work well. I was able to run the “Hello World” application, displaying the printf output right in an MCUXpresso window, using the built-in semihosting feature.
Update: Later, when I was experimenting with running from ITC SRAM I ran into a similar (or the same) problem of “Error in final launch sequence”. In this case the solution was to mass erase the flash. It is important to do it in the right order: power down the board, flip boot mode switch (SW7), power up the board, open the “LinkServer GUI Flash programmer” utility from the toolbar, select Erase flash memory/Mass erase, press OK, then power down, flip boot mode switch back, and power up. Any other sequence results in errors.
Next I tried the SAI (Synchronous Audio Interface) demo application, which features the on-board audio codec and the SD Card interface. For some reason semihosting did not work here correctly: I got lots of garbled text, and the keyboard input wasn’t usable either. Therefore I switched to the UART console by clicking on “Start here/Quick Settings/SDK Debug Console/UART Console”.
Update: I also had severe problems with semihosting when experimenting with running from ITC SRAM. Most of the time the debugger would not start, or did not stop at “main”. The solution was again to switch to the UART console.
You need a terminal program to connect to the virtual COM port installed by the mbed driver. On Windows, TeraTerm and PuTTY are both free and work well. I used PuTTY:
As described in the demo example’s “readme.txt” file, you need to configure the serial port for 115200 baud, 8 data bits, no parity, one stop bit, no flow control.
For the SAI demo you’ll also need a micro SD Card.
At first I had a problem with my high-performance SanDisk Extreme SDHC UHS-3 32GB card: the demo reported that creating the directory failed, even though the card was empty. Then I tried a cheap Apacer 16GB card, and it worked immediately. Finally, reformatting the SanDisk card with the SD Card Formatter utility solved the problem, so maybe the file system on the card was corrupted.
I also ran into a problem that the Debug session could not be restarted after terminating it without power cycling the board, or disconnecting the OpenSDA port. When I tried to start a new Debug session, I got an error:
…
Sending VECTRESET to run flash driver
target failed to halt after flash driver reset
core registers failed - rc Ee(FF). Redlink interface error 255.
chip initialization failed - Ee(FF). Redlink interface error 255.
failed to find a target memory area to use to test the Debug Access Port
required information about vendor NXP chip MIMXRT1052xxxxx not found
Failed on chip setup: Ec(01). Invalid part, XML, or configuration.
error closing down debug session - Ee(FF). Redlink interface error 255.
I think it may be a firmware bug, but I could not find a solution even though others seemed to have similar problems.
I also tried installing the Segger J-Link OpenSDA v2.1 firmware, which can be downloaded directly from Segger:
https://www.segger.com/products/debug-probes/j-link/models/other-j-links/opensda-sda-v2/
Unfortunately the i.MXRT processor is not yet officially supported in the latest non-beta (v6.30k) release of the J-Link software package, but replacing the vendor name “NXP” with “Freescale” in the file “JlinkDevices.xml” (found in the Segger installation directory) did the trick.
The Segger firmware worked quite well. It is much faster than the OpenSDA firmware. For a comparison see:
https://mcuoneclipse.com/2014/04/27/segger-j-link-firmware-for-opensdav2/
However, when I tried to Restart (Reset) the processor, the debugger lost the context, so I had to Terminate the session and start the Debug session again (though unlike the DAPLink, I could restart it any number of times).
Another problem is that this firmware does not yet support HyperFlash programming, which I need in order to measure the performance penalty of running from flash. So for now I have switched back to the official DAPLink v0244 firmware.
Update: since then v6.32 of the J-Link software package is available with support for HyperFlash programming on the i.MXRT1052. However, Segger still has no official release for the OpenSDA firmware for the i.MXRT1050-EVK, so this may only work with an external debug probe, like a J-Link.
Next I wanted to try the LCD. However, even though there is an “emwin_gui_demo” example in the current SDK version (v2.3.0), it cannot be imported into MCUXpresso. By searching in the NXP Community forum I found out that this is a known problem, and hopefully the example will be available in the next release of the SDK. I may get back to this later.
Update: SDK v2.3.1 has been meanwhile released with the emwin demo enabled, but I did not try it yet.
I purchased an LPC-Link2 in the hope that it would eliminate the problems with the on-board debug interface. Well, it did not… Although it is faster at downloading programs, it seems to suffer from the same problems as the on-board debug interface. Moreover, it cannot be used with the UART Console, so only semihosting can be used, which, unfortunately, did not work for me. So at this point there is not much advantage in having the LPC-Link2, unless these problems are fixed in later revisions of the board or the OpenSDA firmware.
My other hope was that Segger actually has a J-Link firmware for the LPC-Link2, converting it into a cheap J-Link (with restrictions though):
https://www.segger.com/downloads/jlink/#LPC-Link2
However, this firmware is NOT compatible with the i.MXRT. In fact it says it does not support Cortex-M7 at all. Bummer…
Note: if you plan to purchase the LPC-Link2, you’ll also need to purchase a 10-pin to 20-pin ARM Cortex adapter, e.g., the Embedded Artists EA-ACC-040 ($18):
https://www.embeddedartists.com/products/acc/acc_jtag_adapter_kit.php
or the Segger 9-pin Cortex-M Adapter ($31):
https://www.segger.com/products/debug-probes/j-link/accessories/adapters/9-pin-cortex-m-adapter/
Since my main goal was to compare the i.MXRT with other Cortex-M processors, I tried to find some existing DSP benchmark results. STMicroelectronics has a huge range of Cortex-M core processors, and fortunately they did benchmark their DSP performance. They have published an excellent Application Note, “AN4841: Digital signal processing for STM32 microcontrollers using CMSIS”:
ST made the source code also available for their STM32F746-DISCO (Cortex-M7) and STM32F429I-DISCO (Cortex-M4) eval boards. Although the code is restricted for use with ST microcontrollers, the benchmarks use the standard ARM CMSIS-DSP library to do the actual signal processing, thus I could implement the same tests on the i.MXRT.
The application note presents benchmark results for both the STM32F429 and the STM32F746 processors, measuring the execution time of various FFT and FIR implementations. The FFT variations include 64, 256 and 1024 point FFTs using Q15, Q31 (16 and 32-bit fixed-point) and F32 (single-precision floating point) input. The FIR filtering benchmarks measure the execution time of an FIR filter using F32, Q31 and Q15 input and coefficients. In addition to these results, the application note summarizes FFT benchmark results for a range of ST microcontrollers based on Cortex-M0, M0+, M3, M4 and M7 cores.
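To make the fixed-point formats concrete: a Q15 value q is a 16-bit signed integer representing q / 2^15, so it covers the range [-1, 1 - 2^-15]; Q31 is the same idea with 32 bits and 2^31. The sketch below converts between float and Q15 with rounding and saturation. It is my own illustration of the format, not the CMSIS arm_float_to_q15 code, whose rounding details may differ; NaN inputs are not handled.

```c
#include <stdint.h>

/* Convert a float to Q15 (value = q / 32768), rounding half away from zero
 * and saturating to the representable range [-32768, 32767]. */
int16_t float_to_q15(float x)
{
    float scaled = x * 32768.0f;
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;   /* rounding offset */
    if (scaled >= 32767.0f)  return INT16_MAX;    /* saturate, e.g. x = 1.0 */
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)scaled;                        /* cast truncates toward zero */
}

/* Convert a Q15 value back to float. */
float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}
```

Note that +1.0 is not representable in Q15 and saturates to 32767, which is one reason filter coefficients are usually scaled to stay below 1.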
In general, running from flash memory is expected to be slower because of the slower bus and the additional wait states needed to access the flash memory. The integrated cache may help, but only if the cache hit rate is close to 100%. So I expected that running from SRAM would greatly enhance the performance.
The i.MXRT includes a very sophisticated FlexRAM memory controller, which allows configuring each individual 32kB block of the internal SRAM either as Tightly Coupled Instruction (ITC) RAM, Tightly Coupled Data (DTC) RAM, or plain On-Chip (OC) RAM. Since the ITC and DTC memories are connected via dedicated, high bandwidth buses to the core, they can simultaneously access different parts of the memory. The DTC memory bus can even read data from two different blocks at once. This allows allocating the data buffers and coefficients optimally, so that the wait states are minimized.
The OC memory is much slower, therefore in general it is not recommended to run high-performance code from it. However, since it is also cached, the actual performance may not suffer too much – depending on the algorithm of course. On the other hand shared buffers (e.g., DMA) should be located in the OC RAM.
As we will see, the memory configuration did have a strong influence on the performance.
To start with, I have created a new empty project following the steps outlined here:
https://community.nxp.com/docs/DOC-334497
I added the driver modules gpio, flexram and xip (just in case), and named the project cmsis-dsp-benchmark. I let MCUXpresso copy the source files to the project. (Although linking to the SDK files should also be possible, for some reason this did not work for me.)
By default MCUXpresso creates a project executing from flash (XIP). It would be interesting to see how XIP and running from ITC SRAM compares, so I will need to figure out how to link to SRAM.
The CMSIS-DSP library contains a number of digital signal processing primitives optimized for various ARM Cortex cores. The library itself is available in source code format in the “CMSIS\DSP_lib” directory of the i.MXRT SDK. The actual version I have is v1.5.1.
For those who are interested in the evolution of the CMSIS library, the complete source code is also available at GitHub:
https://github.com/ARM-software/CMSIS_5
This repository has a slightly different folder structure than the one included in the SDK, but the files are more-or-less the same.
The documentation of the library is available at:
http://www.keil.com/pack/doc/CMSIS/DSP/html/index.html
To use the DSP library, we need to add the precompiled library to the project. I followed the steps described here:
https://mcuoneclipse.com/2013/02/14/tutorial-using-the-arm-cmsis-library/
First – for convenience – define a new build variable in C/C++ Build/Build Variables:
Name=CMSIS_LOC
Type=Directory
Value=c:\nxp\SDK_2.3.0_EVK-MIMXRT1050\CMSIS
Then add the CMSIS include path on the MCU C Compiler/Includes page:
"${CMSIS_LOC}\Include"
Add the appropriate CMSIS-DSP library and the library path on the MCU Linker/Libraries page:
Libraries=arm_cortexM7lfsp_math
Library search path="${CMSIS_LOC}\Lib\GCC"
The exact library depends on the floating-point configuration; the “lfsp” variant above is for a little-endian core with a single-precision FPU.
Note: You need to remove the “lib” and “.a” from the library file name, and use plain (not curly) quotes around the path names.
Finally, define ARM_MATH_CM7 in the MCU C Compiler/Preprocessor settings, and include the header “arm_math.h” in your source.
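Once the library links, it is worth checking its results against a straightforward reference before trusting any benchmark numbers. Below is a deliberately naive direct-form FIR in plain C that keeps its input history across blocks, which is the behavior expected from a block-based FIR such as arm_fir_f32. The ref_fir_* names and the REF_FIR_MAX_TAPS limit are hypothetical and not part of CMSIS. As I read the CMSIS documentation, the library expects the coefficient array in time-reversed order, which only matters for non-symmetric filters, so check that convention when comparing.

```c
#include <stddef.h>

#define REF_FIR_MAX_TAPS 64u /* hypothetical limit; enough for the 56-tap Q15 test filter */

/* Hypothetical reference FIR: coefficients in natural order b[0]..b[num_taps-1],
 * plus the previous inputs so that consecutive blocks are filtered seamlessly. */
typedef struct {
    const float *coeffs;
    size_t num_taps;
    float history[REF_FIR_MAX_TAPS]; /* history[k] holds x[n-1-k] */
} ref_fir_t;

/* Returns 0 on success, -1 if num_taps is out of range. */
int ref_fir_init(ref_fir_t *f, const float *coeffs, size_t num_taps)
{
    if (num_taps == 0u || num_taps > REF_FIR_MAX_TAPS) {
        return -1;
    }
    f->coeffs = coeffs;
    f->num_taps = num_taps;
    for (size_t k = 0; k < REF_FIR_MAX_TAPS; k++) {
        f->history[k] = 0.0f;
    }
    return 0;
}

/* Filter one block: y[n] = sum_{k=0}^{num_taps-1} b[k] * x[n-k]. */
void ref_fir_block(ref_fir_t *f, const float *in, float *out, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        float acc = f->coeffs[0] * in[n];
        for (size_t k = 1; k < f->num_taps; k++) {
            acc += f->coeffs[k] * f->history[k - 1];
        }
        /* shift the history by one sample and store the current input */
        for (size_t k = f->num_taps - 1u; k > 0u; k--) {
            f->history[k] = f->history[k - 1u];
        }
        f->history[0] = in[n];
        out[n] = acc;
    }
}
```

Since the library may accumulate in a different order, floating-point results should be compared with a small tolerance rather than for exact equality.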
NXP provides a similar tutorial, however, it did not work for me (the linker could not find the library):
https://community.nxp.com/docs/DOC-335465
Update: It turned out that the missing quotes were the culprit.
The ST application note uses the SysTick timer, a 24-bit counter running at the system clock rate, to measure execution clock cycles. SysTick is part of every Cortex-M7 core, so the i.MXRT has it as well, but the 32-bit cycle counter of the Data Watchpoint and Trace (DWT) unit is more convenient for this purpose:
https://mcuoneclipse.com/2017/01/30/cycle-counting-on-arm-cortex-m-with-dwt/
Define the following macros. The CoreDebug and DWT definitions come from the CMSIS core header, which is pulled in by the SDK device header:
#include "fsl_device_registers.h"
/* TRCENA: enable the trace and debug blocks in DEMCR (Debug Exception and Monitor Control Register).
 * Note: on some Cortex-M7 devices the DWT may also have to be unlocked through its Lock Access Register. */
#define InitCycleCounter() \
    (CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk)
/* Reset the cycle counter */
#define ResetCycleCounter() \
    (DWT->CYCCNT = 0)
/* Enable the cycle counter (CYCCNTENA bit of DWT_CTRL) */
#define EnableCycleCounter() \
    (DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk)
/* Disable the cycle counter */
#define DisableCycleCounter() \
    (DWT->CTRL &= ~DWT_CTRL_CYCCNTENA_Msk)
/* Read the cycle counter register */
#define GetCycleCounter() \
    (DWT->CYCCNT)
/* Start measurement: enable tracing, then reset and start the cycle counter */
#define TimerCount_Start() do { \
    InitCycleCounter(); \
    ResetCycleCounter(); \
    EnableCycleCounter(); \
} while (0)
/* Stop measurement and store the number of CPU clock cycles in Value */
#define TimerCount_Stop(Value) do { \
    DisableCycleCounter(); \
    (Value) = GetCycleCounter(); \
} while (0)
Then use TimerCount_Start() and TimerCount_Stop(cycles) to measure the actual clock cycles.
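Because DWT->CYCCNT is a 32-bit counter, at 600 MHz it wraps around after 2^32 / (600 × 10^6) ≈ 7.16 s. Resetting it before every measurement, as TimerCount_Start() does, avoids the issue for short benchmarks. An alternative, sketched below assuming the counter is left free-running, is to take the unsigned difference of two readings, which stays correct across a single wraparound. The helper names are hypothetical.

```c
#include <stdint.h>

/* Elapsed cycles between two readings of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so the result is correct as long
 * as less than one full wrap (2^32 cycles) has passed. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Longest interval that can be measured without ambiguity, in seconds. */
double cycle_counter_wrap_seconds(double core_clock_hz)
{
    return 4294967296.0 / core_clock_hz; /* 2^32 cycles */
}
```

On the target this would be used as uint32_t t0 = DWT->CYCCNT; followed by the code under test and cycles_elapsed(t0, DWT->CYCCNT).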
To measure the actual duration I used a GPIO pin and a logic analyzer (you can also use a scope, but I prefer a logic analyzer as it is more versatile). Since I had not soldered the Arduino headers before mounting the LCD, and there are no other pins available on the board, I decided to use the User LED. Fortunately (?) this shares a pin with the JTAG interface, so I could access the output on pin 5 of J21.
To make it work you need to configure the pin for output. MCUXpresso has a built-in tool for this, which makes this much easier. From the schematics I’ve found out that the User LED is connected to the F14 pin of the chip. The correct settings are:
Route to: GPIO_AD_B0_09
Direction: Output
Software Input On: Disabled
Hysteresis enable: Disable
Pull Up/Down Config: 100K Ohm Pull Down
Pull/Keeper select: Keeper
Pull/Keeper enable: Enabled
Open drain: Disabled
Speed: max(200MHz)
Drive strength: R0/6
Slew rate: Fast
At first I made a mistake and configured the pin as open drain, which is usual with LEDs. However, this did not work: the measured times were way off, and short pulses did not appear at all in the logic analyzer. Apparently the pin could not drive the line to the correct level fast enough, since an open-drain output can only actively pull the line low.
To run from SRAM we need additional steps. By default the program runs from external flash. To make it run from SRAM we need to modify the linker scripts. This is rather well explained in the MCUXpresso User Manual.
First, we need to disable linking to flash memory. This can be done either by removing the BOARD_FLASH segment in the memory map in “C/C++ Build/MCU settings”, or simply by checking the “Link application to RAM” checkbox on the “C/C++ Build/Settings/MCU Linker/Managed Linker Script” page. We also need to remove the XIP_EXTERNAL_FLASH define.
Second, we need to reorder the RAM sections. By default all the code and data is put into the first available memory segment. By reordering the segments we can direct all the output to either SRAM_OC, SRAM_DTC or SRAM_ITC.
Putting code in SRAM_ITC and data in SRAM_DTC is trickier. To do this we need to use the managed linker script feature of MCUXpresso. This is basically a template language for generating the linker control file. The default scripts can be extended or overridden by project settings. The easiest way to do this is to create a “linkscripts” directory under the project’s root and create .ldt scripts that specify link options, as explained in the MCUXpresso User Manual.
To put all code into SRAM_ITC, and all data into SRAM_DTC we need to add the following scripts:
data.ldt:
<#if memory.alias=="RAM2"> /* SRAM_DTC */
*(.data*)
*(.rodata .rodata.* .constdata .constdata.*)
. = ALIGN(${text_align});
</#if>
<#if memory.alias=="RAM"> /* SRAM_ITC */
*(.text*)
. = ALIGN(${text_align});
</#if>
*(.data.$${memory.alias}*)
*(.data.$${memory.name}*)
and create empty "main_text.ldt" and "main_rodata.ldt" files.
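Since a wrong memory configuration is easy to miss, it can help to check at run time where a function or buffer actually ended up, for example by classifying (uintptr_t)&some_function or (uintptr_t)some_buffer against the memory map. The sketch below is a hypothetical helper. The base addresses (ITC at 0x00000000, DTC at 0x20000000, OC at 0x20200000, FlexSPI/HyperFlash at 0x60000000, SEMC/SDRAM at 0x80000000) are my reading of the RT1050 memory map, and the upper bounds (at most 512 kB per FlexRAM region) are assumptions; verify both against the reference manual.

```c
#include <stdint.h>

typedef enum {
    REGION_ITC, REGION_DTC, REGION_OC, REGION_FLEXSPI, REGION_SEMC, REGION_OTHER
} mem_region_t;

/* Hypothetical helper: classify an address by memory region.
 * Address ranges are assumptions based on the RT1050 memory map
 * (upper bounds assume at most 512 kB per FlexRAM region). */
mem_region_t region_of(uintptr_t addr)
{
    if (addr <= 0x0007FFFFu)                        return REGION_ITC;
    if (addr >= 0x20000000u && addr <= 0x2007FFFFu) return REGION_DTC;
    if (addr >= 0x20200000u && addr <= 0x2027FFFFu) return REGION_OC;
    if (addr >= 0x60000000u && addr <= 0x6FFFFFFFu) return REGION_FLEXSPI;
    if (addr >= 0x80000000u && addr <= 0xDFFFFFFFu) return REGION_SEMC;
    return REGION_OTHER;
}
```

Function addresses on a Cortex-M have bit 0 set (Thumb state), which does not affect these range checks.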
I have run each benchmark 100 times in a for loop, and printed the average of the execution cycles on the console. At the same time I have measured the total time by toggling the GPIO pin. To verify the results I’ve converted the cycles to microseconds assuming a 600 MHz clock.
I could have simply calculated the execution time from the cycle count, but measuring the actual time with an external time reference is more accurate, not to mention that it can be used to verify the actual clock rate.
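The bookkeeping described above boils down to a little arithmetic: the average cycle count is the total divided by the number of runs, one cycle at 600 MHz lasts 1/600 µs, and conversely the actual core clock can be estimated by dividing the cycles counted during a GPIO pulse by the pulse width measured with the logic analyzer. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Average cycle count over a number of benchmark runs. */
double average_cycles(uint64_t total_cycles, uint32_t runs)
{
    return (double)total_cycles / (double)runs;
}

/* Convert a cycle count to microseconds for a given core clock. */
double cycles_to_us(double cycles, double core_clock_hz)
{
    return cycles * 1e6 / core_clock_hz;
}

/* Estimate the core clock from cycles counted over an externally measured interval. */
double estimate_clock_hz(double cycles, double measured_us)
{
    return cycles * 1e6 / measured_us;
}
```

For example, 60000 cycles correspond to 100 µs at 600 MHz; if the logic analyzer showed a noticeably different pulse width, the clock configuration would be the first thing to check.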
I measured the timings for 4 different memory configurations:
FFT benchmark
The original application note describes the setup of the various tests in great detail. I have only implemented the 1024 point FFT, because this is the most demanding one.
The results are somewhat surprising. Certainly the higher clock rate results in a much shorter execution time. The graphs also show that the number of cycles depends heavily on the memory configuration. The surprising result is that in most cases the cycle count of the i.MXRT is significantly lower than that of the STM32F7. I have not investigated the reason for this. One explanation may be that the CMSIS-DSP version I used is different (better optimized), or that the i.MXRT uses the memory bandwidth better than the STM32F7.
The most interesting result is that running from the external HyperFlash (with the caches enabled, of course) performs only slightly worse than running from SRAM. This is probably because the cache is large enough to hold all of the code and data at once. Optimizing for size allows more code to fit into the cache, so, also somewhat surprisingly, performance is better than when optimizing for speed.
It should be noted that running from flash gives much more variation in the cycle count than running from SRAM, so for deterministic operation (which is often a requirement for DSP tasks) running from SRAM is preferred.
Another observation is that running from ITC memory did not improve the performance (in some cases it is even slightly worse than running from DTC). This is unexpected, so I suspect an error in my configuration; it should be checked more carefully.
FIR benchmark
The ST FIR benchmark results were a bit strange at first, because the performance of the Q15 filter seemed to be much worse than that of the other two, higher-precision variants. I would expect that even if the processor has a hardware floating-point unit, a SIMD-optimized Q15 filter should perform much better.
It turned out that the floating-point and Q31 FIR filter benchmarks used the same 29-tap filter, but the Q15 benchmark used a much longer, 56-tap filter. When the results are normalized to the tap count, the performance of the Q15 filter is about the same as that of the Q31 and F32 filters. This is still worse than expected, which hints at a suboptimal memory allocation.
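The normalization is simply cycles / (taps × samples), i.e., the cost of one multiply-accumulate, assuming both benchmarks process the same number of samples. The numbers used in the check below are made up purely to show the arithmetic and are not measurement results: 18560 cycles for a 29-tap filter and 35840 cycles for a 56-tap filter over the same 320-sample input both come to 2 cycles per tap per sample. The function name is hypothetical.

```c
#include <stdint.h>

/* Cycles spent per tap per output sample, i.e. per multiply-accumulate. */
double cycles_per_tap_sample(uint32_t cycles, uint32_t num_taps, uint32_t num_samples)
{
    return (double)cycles / ((double)num_taps * (double)num_samples);
}
```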
As you can see, in this test there is almost no difference in the performance between running from HyperFlash, or running from the tightly-coupled instruction memory. Again, this is probably because the whole code can fit into the cache.
Somewhat surprisingly, the Q15 filter performance improved much less than expected. This needs to be checked.
The i.MXRT is a very capable processor. Currently it has the highest clock frequency of the ARM Cortex-M7 chips available on the market, beating other vendors by a factor of 1.5-2.5. The large and fast on-chip SRAM allows executing code at the maximum speed. Memory configuration is very flexible, allowing the SRAM to be tailored to different use cases. Execution from external flash is also possible with a performance penalty, although the integrated cache helps a lot in accelerating performance-critical parts of the code.
The i.MXRT also has a lot of high-performance peripherals integrated. Having two High-Speed USB controllers with two integrated HS PHYs is unique in this category. Although the Ethernet controller is only 100 Mbps, it supports IEEE-1588, which is a very useful feature for clock synchronization over networks (and may be used to implement AVB transport). The processor has two fast HS200 capable SD Card controllers as well, which are also compatible with eMMC v4.5.
The evaluation board gives access to all peripherals. Additionally, it has an Arduino-style expansion header with analog, GPIO, I2C, SPI and UART connections. The board includes a decent quality stereo audio codec as well. All in all, it is a very nice board for evaluating the i.MXRT processor.
The weak point of the board is the debug interface. Maybe this is only a software problem which will be fixed in later revisions, but currently debugging is a severe pain. Semihosting works sometimes, but if not, then it is hard to figure out what the problem is.
The cheap LPC-Link2 debug probe does not improve the debugging experience considerably. Currently it only works in CMSIS-DAP mode, since the Segger J-Link firmware for the LPC-Link2 does not support the Cortex-M7.
The included LCD panel was not supported in the v2.3.0 release of the SDK, so I could not even try it.
The i.MXRT is definitely worth evaluating, as it has a tremendous potential for low-cost, high-performance devices. The eval board in its current version suffers from problems with the debug interface. The MCUXpresso IDE and SDK are quite mature, high-quality products, available for free. This makes the processor even more interesting for low-budget projects, even though the chip is currently only available in a not very DIY-friendly BGA package. Future versions will be available in a low-cost LQFP package.
NXP’s i.MXRT starting page:
i.MXRT1050EVK documentation:
Good introduction to the board by Erich Styger:
https://mcuoneclipse.com/2017/12/16/mcuxpresso-ide-v10-1-0-with-i-mx-rt1052-crossover-processor/
Overview of using the MIMXRT1050-EVK with MCUXpresso IDE (highly recommended)
Hands-On Workshop: MCUXpresso Software and Tools
https://community.nxp.com/docs/DOC-334498
Hands-On Workshop: Using MCUXpresso SDK
https://community.nxp.com/docs/DOC-334497
I would like to thank Element14 for making the board available to me for RoadTesting.