I've been studying Error Correcting Code (ECC) capability lately, specifically as it relates to Xilinx.
What is ECC?
Definition: Error correction code (ECC) checks read or transmitted data for errors and corrects them as soon as they are found. ECC is similar to parity checking except that it corrects errors immediately upon detection. ECC is becoming more common in the field of data storage and network transmission hardware, especially with the increase of data rates and corresponding errors. (https://www.techopedia.com/definition/24161/error-correction-code--ecc)
If you're not familiar with ECC, you can get a very nice introduction here: https://youtu.be/X8jsijhllIA.
In the Xilinx arena, this applies to memory interfaces. Some Xilinx parts (like Zynq UltraScale+ MPSoC) have the built-in capability to detect and correct errors that happen during a memory readback. It turns out that some memory devices have this same capability built in.
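To make the detect-and-correct idea concrete, here is a minimal sketch using a textbook Hamming(7,4) code, which protects 4 data bits with 3 check bits and can repair any single flipped bit. This is only an illustration. I am not claiming it is the code that the Zynq UltraScale+ controller or any particular memory device uses, and the function names are my own.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (int 0..15) into 7 bits, listed for positions 1..7."""
    d = [(nibble >> i) & 1 for i in range(4)]      # d[0] is the least significant bit
    code = [0] * 8                                 # index 0 unused, positions 1..7
    code[3], code[5], code[6], code[7] = d         # data bits sit at non-power-of-2 positions
    code[1] = code[3] ^ code[5] ^ code[7]          # check bit covering positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]          # check bit covering positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]          # check bit covering positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome). A non-zero syndrome is the position that was repaired."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                        # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1                        # flip the bad bit back
    d = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(d))
    return nibble, syndrome
```

For example, encoding 0b1011, flipping the bit at position 5 of the stored word, and decoding gives back 0b1011 along with a syndrome of 5, the position that was corrected. A clean word decodes with a syndrome of 0.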
Four classifications of ECC:
- No ECC -- not sure this is really a classification, but it's basically the absence of ECC. That means the system has no ability to either detect or correct bit errors while reading the memory array. You also don't have any processing or memory capacity overhead.
- Side-band ECC -- this is the type of ECC that most people have likely heard about. The codes are stored in extra memory alongside the data, on a wider bus. In the common 64-bit case, that means 1 extra bit of overhead memory for every 8 bits of real data. For example, in the Zynq UltraScale+ MPSoC Technical Reference Manual (UG1085), you can read about ECC support for the PS DDR Memory Controller in Chapter 17. Table 17-2 shows that the ZU+ Memory Controller supports ECC with 4 different memory types: DDR3, DDR3L, DDR4, and LPDDR4. In each case, you must have an extra 8 bits of data storage for the codes that accompany the 32 or 64 bits of actual data.
For this scenario, you have memory overhead storing the error correcting codes, and you also have the memory controller with a bit of overhead performing the ECC calculations.
- In-line ECC -- This is something new that I learned about recently. Instead of increasing the memory bus width for the memory overhead, you allocate memory space somewhere else within the standard array to store the codes. Your controller obviously must be able to handle this, to extract both the data and the codes from different areas of the memory, and then perform the error detection and correction. Cadence gives a good overview of this ECC classification while also explaining sideband here: https://youtu.be/F2k6A1PgeHw. I'm not aware of a Xilinx device that can do this, although it certainly is possible if someone wanted to design the memory controller to handle it. Perhaps there is an IP company out there that has already taken this on. If you are aware of one, please comment below!
- On-chip ECC -- This is also something that I have learned about recently. Some LPDDR4 chips are now built with the ECC storage and processing unit directly in the LPDDR4 chip itself. For example, you can see ISSI's offering of LPDDR4 with ECC here: https://www.issi.com/US/product-dram-lpddr4.shtml#jump3. Micron also offers this same type of device, with a nice white paper that is linked from this page: https://www.micron.com/about/blog/2017/february/the-advantage-of-ecc-dram-in-smartphones. Micron asserts that the key advantages of on-chip ECC are power savings and temperature tolerance. The ECC allows you to refresh the memory far less often, which saves power. Since refresh requirements get tighter at higher temperatures, an on-chip ECC device allows you to operate at higher temperatures without as drastic a change in the refresh parameters.
You don't need an additional memory chip (as with sideband ECC), and you also don't burden your memory controller with the ECC calculations. Of course, this extra storage and processing are embedded inside the memory device. I'm researching now how this affects cost and performance compared to a non-ECC system.
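To put rough numbers on the side-band and in-line options above, here is a small sketch. The first function computes the minimum number of check bits a Hamming-style single-error-correct, double-error-detect (SECDED) code needs for a given data width. The other two model a purely hypothetical in-line layout in which the codes live in a region at the top of the array, with 1 code byte per 8 data bytes. The layout, the function names, and the 8:1 ratio are my assumptions for illustration, not how Cadence or any specific controller actually does it.

```python
def secded_check_bits(data_bits):
    """Minimum check bits for a Hamming SECDED code over data_bits of data."""
    r = 1
    while 2 ** r < data_bits + r + 1:   # Hamming bound for single-error correction
        r += 1
    return r + 1                        # plus one overall parity bit for double-error detection


def inline_ecc_layout(capacity_bytes, data_per_code=8):
    """Hypothetical in-line split: return (usable_data_bytes, ecc_base_address)."""
    blocks = capacity_bytes // (data_per_code + 1)
    data_bytes = blocks * data_per_code
    return data_bytes, data_bytes       # assumed: the code region starts right after the data


def inline_ecc_address(data_addr, ecc_base, data_per_code=8):
    """Hypothetical address of the code byte that covers the data byte at data_addr."""
    return ecc_base + data_addr // data_per_code
```

With these assumptions, secded_check_bits(64) comes out to 8 and secded_check_bits(32) comes out to 7, so either one fits in the extra 8-bit lane that side-band ECC adds. For the in-line case, a 9 KiB array would leave 8 KiB for data, and every read of a data block would also need a read of its code byte from the reserved region, which is the extra traffic the controller has to manage.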
Who within the E14 community is using ECC in your systems? Please comment below with your experience and what type you have used!