Hercules Safety MCU Demo with Educational BoosterPack

24 May 2018

Learn about Safety features, test interactively and step through with a debugger.

This project gives you ready-to-run hardware with a screen menu and a joystick to navigate through Safety processes. You can trigger Safety mishaps in real time and see how the microcontroller handles them.

In this blog, errors are A Good Thing.

The Hercules family has a rich set of hardware mechanisms for safe operation.
These controllers are designed to be used in safety critical operations, such as automotive, medical and industrial applications.
There are parity checks on the memory, and every peripheral is present at least twice. The controller even has two ARM cores that run the same instructions in lockstep and are compared continuously.

This design exercises some of those safety measures. The firmware performs integrity checks of CPU and memory. I also inject a few errors - such as deliberate memory parity mismatches and dual lock-stepped core errors - to see how you can handle such mishaps in firmware.

The Demo is all about errors. In most applications the goal is to prevent them. We will provoke them, and then look at the behaviour of our controller.

The blog comes with sample code, CCS project and HALCoGen config data. You can get all of that by checking the attachment at the end of this post.
I used an RM46 LaunchPad and an Educational BoosterPack MKII.

The Demo

This project is inspired by - and borrows a lot of code from - TI's Hercules Safety MCU Demos.
That program is a PC-based interface where you can select a test or error condition and let the Hercules execute it.
And it comes with full source code for the firmware. That was a big learning source for me.

Most of the test and error injection code I use in my project is directly ported from those sources.
I replace the Windows-only user interface by hardware: the Educational BoosterPack.
It has all the gizmos I need:

a display to show options and results
a joystick to navigate and make selections
an RGB LED to show status of tests and error injections.

The demo shows a list of things you can try out. The current version of the firmware covers integrity tests of MCU and memory.
It also injects inconsistencies in the Flash, SPI, CAN and Timer modules that the hardware has to catch.

To get started, mount the Educational BoosterPack on the LaunchPad,
put the BoosterPack's jumper J5 (just above the joystick) in the upper position,
load the firmware,

Go!

You can navigate the menu with the BoosterPack's joystick. Push up and down to scroll, move right to select an option.

Push any direction to return to the menu.

Menu items that have a green dot in front of them are integrity tests. Selecting them should show green on the BoosterPack's LED.
The ones with a red dot are deliberate errors. The firmware injects an inconsistency, and the controller should detect it.
The BoosterPack's LED will light red. But that's just a user interface thing.

The real proof is that the ERR LED (next to the 4 blue buttons) on the LaunchPad lights up.

You can't fool that LED. It's connected to the controller's ESM output. If that one lights up, the error is detected and signaled.

You'll learn the most when you use Code Composer Studio to step through the code.
You can see how the corruption is injected, and you can see the ERR LED lighting up on your LaunchPad.
In the detailed description below, I include the best place to put a breakpoint to see what's happening.
It's not always possible to step through the code though.

Some checks and errors (e.g. the dual-core lockstep error) are not detected when a debugger is attached.
That's not wrong, that's inherent to the type of incident.

Integrity Tests

The project performs two tests:

LBIST (CPU Logic Built in Self Test)
PBIST (Programmable Memory Built-in Self Test)

These tests are performed together during firmware startup, and the results are stored. When you select the menu options, the cached result is shown.

The approach to execute them at the beginning is in line with TI's advice:

PBIST is typically executed at start-up or shut-down because the tests are by nature destructive to memory contents.

If you want to change the project and run the tests on demand, check this forum post.

Navigation:
Menu option 1: CPU SelfTest (LBIST)
Menu option 2: Mem SelfTest (PBIST)

Code:

void pbist_test(void) { 
  pbistInit(0); 
  pbistStartSelfTest(); 
  pbistDisableSelfTest(); 
} 
 
void pbistInit(unsigned int RAM_GRP_SEL) { 
    /*Enable PBIST Memory Init MSI register*/ 
    systemREG1->MSINENA = 1U; 
     
    /*Enable PBIST self test*/ 
    systemREG1->MSTGCR = 0x0A; 
 
    /*Since ROM CLK = HCLK selected by default, MBIST will be reset for 16 VBUSclock cycles  */ 
    /*   N = 16 when HCLK:PBIST ROM clock is  1:1  -- N - VBUS Wait Cycles 
    N = 32 when HCLK:PBIST ROM clock is  1:2 
    N = 64 when HCLK:PBIST ROM clock is  1:4 
    N = 64 when HCLK:PBIST ROM clock is  1:8 
    */ 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
  asm(" nop"); 
 
    /* Enable Pbist Internal clocks and ROM I/F clocks*/ 
    pbistREG->PACT = 3U; 
 
    if(RAM_GRP_SEL == 0)
    {
        /* for 1,2 PORT and ROM MEMORIES */
        pbistREG->ALGO = 0xFFFF;

        /* RAM GROUP information */
        pbistREG->RINFOL = 0x3FFF;
        pbistREG->RINFOU = 0x00000000;

        /* selecting Run self test with RAM override */
        pbistREG->PBISTOVERRRIDE = 0x01;
    }
    else /* Individual RAM Test based on RAM group selected */
    {
        /* for 1,2 PORT and ROM MEMORIES */
        pbistREG->ALGO = RAM_GRP_ALGO_SEL[RAM_GRP_SEL];

        /* RAM GROUP information */
        pbistREG->RINFOL = (1 << (RAM_GRP_SEL -1));
        pbistREG->RINFOU = 0x00000000;

        /* selecting Run self test without RAM override */
        pbistREG->PBISTOVERRRIDE = 0x00;
    }
         
    /* Microcode loading from the OnChip ROM. Default is onchip ROM*/ 
    pbistREG->ROM =0x03; 
} 
 
void pbistStartSelfTest(void) { 
    /* DataLogger register with 0x14 to start the test*/ 
    pbistREG->DLR =0x14;  /*test will start after writing this value to the DLR register*/ 
 
    /* Wait till the MSTDONE Bit- PBIST Test Complete */ 
    while((systemREG1->MSTCGSTAT & 1) != 0x1); 
 
    _pBIST = true; 
} 
 
void pbistDisableSelfTest(void) { 
    /*Disable PBIST self test in Sys*/ 
    /* 0x0A is the enable key used in pbistInit(); 0x05 is assumed here to be the disable value */ 
    systemREG1->MSTGCR = 0x05; 
 
    /*Disable PBIST internal ROM and Clocks*/ 
    pbistREG->PACT=0x00; 
}

The results are stored in the _pBIST variable. If the test succeeded, the program shows a green LED when selecting the menu options.

Debug options:
Set a breakpoint in function pbistStartSelfTest() in pbist.c.
If you reach the line where _pBIST is set true, you know the test was ok.
Else your code will loop forever in the line before that one.
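
If you would rather not hang when the completion bit never shows up, the busy-wait can be given an upper bound. The helper below is a minimal sketch and not part of the uploaded project: the name wait_for_bits and the poll limit are my own assumptions, and it runs on a PC against an ordinary variable instead of a real register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: poll a status register until all bits in 'mask'
 * are set, giving up after 'max_polls' reads.
 * Returns true if the bits were seen, false if the limit was reached. */
bool wait_for_bits(const volatile uint32_t *reg, uint32_t mask,
                   uint32_t max_polls)
{
    while (max_polls-- > 0U) {
        if ((*reg & mask) == mask) {
            return true;
        }
    }
    return false;
}
```

In pbistStartSelfTest() the idea would be to assign the return value to _pBIST instead of setting it to true unconditionally, passing the address of systemREG1->MSTCGSTAT and the mask 1. A sensible poll limit depends on the clock setup, so treat any number you pick as something to measure.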

Error Injection

The menu items that have a red dot in front of them all inject an inconsistency in the controller.
It's then up to the controller's safety functionality to detect those errors and flag them to the ESM (error signaling module).
The Hercules controllers have a hardware pin that's toggled when an error is found.
On the LaunchPad, this pin is driving the ERR LED, next to the four blue buttons.

When this LED lights up, you can be assured that the error you injected has been caught.

What you do with that info in your projects is up to the safety requirements for your project.
In the demo program, I roll back the injected error immediately after it has been detected.
I reset the ESM register when you move the joystick in any direction.

In the event that a failure of any MCU component does occur, a system event is triggered through the MCU’s error signaling module (ESM).
This module provides a programmable error response based on error severity.
Potential responses include generation of a CPU interrupt, CPU non-maskable interrupt and/or error pin response.
The error pin response allows notification to an external observer that the MCU is in a faulty state.
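
To make the "programmable response based on severity" idea concrete, here is a small PC-side model. It is only a sketch: the severity levels, the policy table and all names are invented for illustration and do not mirror the real ESM registers or channel groups.

```c
#include <stdint.h>

/* Response flags, named after the three responses in the quote above. */
enum {
    RESP_INTERRUPT = 1 << 0,  /* maskable CPU interrupt           */
    RESP_NMI       = 1 << 1,  /* CPU non-maskable interrupt       */
    RESP_ERROR_PIN = 1 << 2   /* external error pin (the ERR LED) */
};

/* Hypothetical severity levels for this model. */
typedef enum { SEV_LOW, SEV_HIGH, SEV_FATAL, SEV_COUNT } severity_t;

/* Illustrative policy table, not the actual ESM configuration. */
static const uint8_t response_for[SEV_COUNT] = {
    [SEV_LOW]   = RESP_INTERRUPT,
    [SEV_HIGH]  = RESP_NMI | RESP_ERROR_PIN,
    [SEV_FATAL] = RESP_ERROR_PIN
};

typedef struct {
    unsigned interrupts;       /* number of maskable interrupts raised */
    unsigned nmis;             /* number of NMIs raised                */
    int      error_pin_active; /* 1 while the error pin is asserted    */
} esm_model_t;

/* Apply the configured response for one reported error. */
void esm_signal(esm_model_t *esm, severity_t sev)
{
    uint8_t resp = response_for[sev];
    if (resp & RESP_INTERRUPT) { esm->interrupts++; }
    if (resp & RESP_NMI)       { esm->nmis++; }
    if (resp & RESP_ERROR_PIN) { esm->error_pin_active = 1; }
}

/* Models what the demo does when you move the joystick. */
void esm_clear_error_pin(esm_model_t *esm)
{
    esm->error_pin_active = 0;
}
```

The point of the table is that the reaction is a configuration choice: the same detection hardware can quietly raise an interrupt for one class of fault and pull the error pin for another.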

Core Compare Error

This function makes the controller think that its two lock-stepped ARM cores are in a different state.

From the whitepaper Hercules Microcontrollers: Real-time MCUs for safety-critical products:

The lockstep CPU scheme implements a checker CPU, which is hardwired to be fed the same input as the functional CPU.
Two blocks of the same logic, fed the same input, should in theory produce the same output.
A core compare module monitors the outputs of the two CPU cores on a cycle-by-cycle basis and signals any errors to the system.
This near instant fault detection in the Hercules MCU’s comes with little penalty in power consumption and no impact to CPU performance.
Also, in comparison to other elements on the MCU, the size overhead of the lockstep mechanism is minimal.

Whenever logic is duplicated, there is always a concern of common mode failure.
To combat common mode failure, TI has implemented multiple best practices on the lockstep CPU subsystem.
Temporal diversity of the two CPU cores is implemented, such that the CPU cores operate 1.5 or 2 cycles out of phase in order to mitigate common mode failure in clocking.
A voltage guard ring is implemented around the CPU cores.
Physical design diversity is implemented by flipping and rotating the checker CPU with respect to the functional CPU.
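
The compare principle from the whitepaper can be tried out on a PC with a toy model. This is a sketch under heavy assumptions: the "cores" are just accumulators, the types and function names are made up, and the temporal delay between the cores is left out. The demo firmware itself does not flip CPU state; it puts the compare module in an error forcing mode, as the code below shows.

```c
#include <stdbool.h>
#include <stdint.h>

/* A toy "core": all of its state is one accumulator. */
typedef struct { uint32_t acc; } core_t;

/* One cycle: a deterministic operation on the shared input.
 * The return value stands in for the core's outputs in that cycle. */
uint32_t core_step(core_t *c, uint32_t input)
{
    c->acc = c->acc * 31U + input;
    return c->acc;
}

typedef struct {
    core_t main_core;
    core_t checker;
    bool   compare_error;  /* sticky: stays set once a mismatch was seen */
} lockstep_t;

/* Feed both cores the same input and compare their outputs. */
void lockstep_cycle(lockstep_t *ls, uint32_t input)
{
    uint32_t out_main = core_step(&ls->main_core, input);
    uint32_t out_chk  = core_step(&ls->checker, input);
    if (out_main != out_chk) {
        ls->compare_error = true;
    }
}

/* Fault injection for the model: flip one state bit in the checker only. */
void inject_fault(lockstep_t *ls)
{
    ls->checker.acc ^= 1U;
}
```

As long as both cores start from the same state, the flag stays clear. One flipped bit in either core shows up on the next compared cycle.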

Navigation:
Menu option 3: Core Compare Error

Code:

void CCM_R4_Compare(void) {

    // This only works if you don't have a debug session active. If you started this session via the debugger, the error status will not be triggered and the ERR LED on your LaunchPad will not light up.
    // The LED on the BoosterPack will light up, because that's driven by firmware, not by the error trap itself.
    // If you started this session via the debugger, press the PORRST button to detach it and restart the firmware.
    // You will lose your debug session.
    /* Setting the Error forcing mode   */
    CCMR4Reg->CCMKEYR = 0x00000009;
}

Debug options:
You can't debug the core compare error.
It's generated in function CCM_R4_Compare() in testCCMR4.c.
When you run the firmware with an active debug session, this compare error is not detected.

Flash ECC Error

The function detects a correctable single bit mismatch in flash memory. The controller detects and corrects the error and signals the error monitor.
TI has programmed a particular address in one-time-programmable memory with a single bit error specifically for this purpose, and we're reading that corrupt memory.
The hard-wired value on address 0xF00803F4 is 0x9ABCDEF1. That's a (deliberate!) corrupt value. The parity doesn't match the value.
When reading this address, the parity error is detected and reported, but at the same time, the corrected value 0x9ABCDEF0 is returned.
The program can continue with a correct value, but we know that there's an issue with memory and can flag this.
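
How can the hardware hand back a correct value and still know something was wrong? The toy below shows the principle with a Hamming(7,4) code: 4 data bits protected by 3 check bits. It is a swapped-in teaching example with function names of my own choosing, not the ECC scheme the flash actually uses, which protects much wider words.

```c
#include <stdint.h>

/* Codeword position 'pos' (1..7) lives in bit (pos - 1) of the byte. */
unsigned cw_bit(uint8_t cw, unsigned pos)
{
    return (cw >> (pos - 1U)) & 1U;
}

/* Encode a 4-bit value. Check bits sit at positions 1, 2 and 4,
 * data bits at positions 3, 5, 6 and 7. */
uint8_t hamming74_encode(uint8_t nibble)
{
    unsigned d0 = nibble & 1U;
    unsigned d1 = (nibble >> 1) & 1U;
    unsigned d2 = (nibble >> 2) & 1U;
    unsigned d3 = (nibble >> 3) & 1U;
    unsigned p1 = d0 ^ d1 ^ d3;  /* covers positions 3, 5, 7 */
    unsigned p2 = d0 ^ d2 ^ d3;  /* covers positions 3, 6, 7 */
    unsigned p4 = d1 ^ d2 ^ d3;  /* covers positions 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Decode a codeword. If exactly one bit was flipped, it is repaired and
 * *corrected is set to 1 so the caller can still report the event. */
uint8_t hamming74_decode(uint8_t cw, int *corrected)
{
    unsigned s1 = cw_bit(cw, 1) ^ cw_bit(cw, 3) ^ cw_bit(cw, 5) ^ cw_bit(cw, 7);
    unsigned s2 = cw_bit(cw, 2) ^ cw_bit(cw, 3) ^ cw_bit(cw, 6) ^ cw_bit(cw, 7);
    unsigned s4 = cw_bit(cw, 4) ^ cw_bit(cw, 5) ^ cw_bit(cw, 6) ^ cw_bit(cw, 7);
    unsigned syndrome = s1 | (s2 << 1) | (s4 << 2);  /* position of bad bit */

    *corrected = (syndrome != 0U);
    if (syndrome != 0U) {
        cw ^= (uint8_t)(1U << (syndrome - 1U));
    }
    return (uint8_t)(cw_bit(cw, 3) | (cw_bit(cw, 5) << 1) |
                     (cw_bit(cw, 6) << 2) | (cw_bit(cw, 7) << 3));
}
```

The corrected flag plays the role of the signal to the error monitor: the caller gets good data, and the system still learns that the memory needed a repair.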

From the whitepaper Hercules Microcontrollers: Real-time MCUs for safety-critical products:

When writing to memory, data is encoded before it is put on the memory bus.
When reading, data integrity is checked once it has been retrieved by the CPU. In this way, the integrity of data is checked for the entire time it is not being stored;
any random faults that have occurred while data is on the memory bus or in memory will be detected.
Due to the tight coupling of the ECC control logic in the CPU, there is no overhead incurred for this safety mechanism
(i.e. zero impact on CPU performance or latency).

Navigation:
Menu option 4: Flash ECC Error

Code:

void ATCM_Correctable_Error(void) {
    unsigned int Single_Bit_Err_Loc;


    /** - 0xF00803F4 is TI OTP location which has corrupted data "0x9ABCDEF1" */
    unsigned int * Single_Bit_Err_Loc_ptr = (unsigned int *)(0xF00803F4);
    
    /** - Setup Flash ECC  
    *     - Error Detection / Correction Enable
    *     - Errors from OTP memory regions unblocked
    *     - Correctable Interrupt Enabled
    */
    flashWREG->FEDACCTRL1 = 0x20D;


    /** Enable Flash ECC in Cortex R4*/
    _coreEnableFlashEcc_();


    /** - Read data from the Single Bit Error location 
     *    - ESM error will be triggered after this read
     *    - Data will be corrected and stored in the variable Single_Bit_Err_Loc
     */
    Single_Bit_Err_Loc = * Single_Bit_Err_Loc_ptr;


    /** - Verify that corrected data is stored in the variable
     *  - Transmit back 0x0000000 to PC incase of failure
     */
    if(Single_Bit_Err_Loc != 0x9ABCDEF0)    {
//    sciSend_32bitdata(scilinREG, FAIL); // todo: make this relevant to the Educational BoosterPack
    }


    /** - Clear the Flash Single Bit error Flag */
    flashWREG->FEDACSTATUS = 2;


    /** - Disable Flash Error Detection / Correction */
    flashWREG->FEDACCTRL1 = 0x005;


    /** Disable Flash ECC in Cortex R4*/
    _coreDisableFlashEcc_();


}

Debug options:

Set your breakpoint in function ATCM_Correctable_Error() in flash.c.
When you step through the code, you'll see that the red ERR light on the LaunchPad doesn't light up when you inject the single wrong bit.
It will light when you read from the corrupted location. You can also see that the memory single bit error is corrected, and that you get a correct value back.

NHET Parity Error

Hercules peripherals have error validation too. This function will introduce a parity error in the high-end timer memory area.
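
The mechanism is easy to model on a PC. In this sketch (my own simplification, with invented names and one parity bit per 32-bit word) a normal write keeps the parity in step with the data, and the check happens on read. That is why corrupting the stored parity goes unnoticed until the location is read back.

```c
#include <stdbool.h>
#include <stdint.h>

#define WORDS 4U

/* Model of a small RAM with a separate parity store. */
typedef struct {
    uint32_t data[WORDS];
    uint8_t  parity[WORDS];  /* one even-parity bit per word in this model */
    bool     error_flag;     /* set when a read finds a mismatch           */
} parity_ram_t;

/* Returns 1 if the word has an odd number of set bits, else 0. */
uint8_t even_parity(uint32_t w)
{
    uint8_t p = 0U;
    while (w != 0U) {
        p ^= (uint8_t)(w & 1U);
        w >>= 1;
    }
    return p;
}

/* Normal write: data and parity are updated together. */
void ram_write(parity_ram_t *r, uint32_t i, uint32_t v)
{
    r->data[i] = v;
    r->parity[i] = even_parity(v);
}

/* Read: recompute the parity and compare it with the stored bit. */
uint32_t ram_read(parity_ram_t *r, uint32_t i)
{
    if (even_parity(r->data[i]) != r->parity[i]) {
        r->error_flag = true;
    }
    return r->data[i];
}
```

Writing to the parity array directly stands in for what the firmware does once it has enabled access to the parity RAM.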

Navigation:
Menu option 5: NHET Parity Error

Code:

void NHET_Parity(void) {
    unsigned int temp;
    uint32 *RAMBASE = (uint32 *)hetRAM1;


    /** Enable NHET Parity */
    hetREG1->PCR = 0x0000000A;
    
    /** Fill NHET RAM with known Pattern
     *  The Parity Gets updated while filling the RAM */
    for (temp=0;temp<0x080;temp++)   {
        *RAMBASE = 0xA5A5A5A5;
        RAMBASE++;
    }


    /** Enable access to parity RAM */
    hetREG1->PCR |= 0x00000100;


    /** Corrupt the Parity RAM location to introduce a
     *  Single bit error */
//    RAMBASE = (uint32 *) (hetRAM1+0x2000) ;
    RAMBASE = (uint32 *)0xFF462000U; // todo check why the previous line doesn't compute.
    *RAMBASE =~(*RAMBASE);


    /** Disable access to parity RAM */
    hetREG1->PCR &= ~(0x00000100);


    /** Read NHET RAM location to create Parity Error */
    RAMBASE = (uint32 *)hetRAM1;
    temp = *RAMBASE;


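    /* Undo the corruption */


    /** Enable access to parity RAM */
    hetREG1->PCR |= 0x00000100;


    /** Flip the same parity RAM location back.
     *  RAMBASE was pointing at the data RAM after the read above. */
    RAMBASE = (uint32 *)0xFF462000U;
    *RAMBASE =~(*RAMBASE);


    /** Disable NHET RAM Parity */
    hetREG1->PCR = 0x00000005;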


    hetInit();
}
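
About the todo in the listing above: a likely reason the commented-out line misbehaves is pointer arithmetic. I'm assuming here, without having checked the generated header, that hetRAM1 is a pointer to a structure type. Adding 0x2000 to such a pointer advances it by 0x2000 structures, not by 0x2000 bytes. The PC-side sketch below shows the difference with a made-up structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a RAM layout structure (16 bytes). */
typedef struct { uint32_t field[4]; } fake_het_ram_t;

static fake_het_ram_t ram[8];

/* Distance in bytes covered by "base + n" on a typed pointer (n <= 8). */
ptrdiff_t offset_typed(size_t n)
{
    fake_het_ram_t *base = ram;
    return (const char *)(base + n) - (const char *)base;
}

/* Distance in bytes when the pointer is converted to a byte pointer first. */
ptrdiff_t offset_bytes(size_t n)
{
    fake_het_ram_t *base = ram;
    return ((const char *)base + n) - (const char *)base;
}
```

With this structure, offset_typed(2) is 32 bytes while offset_bytes(2) is 2. If the assumption about hetRAM1 holds, converting the base to an integer or byte pointer before adding 0x2000 should land on the same address as the hard-coded 0xFF462000.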

Debug options:

Set your breakpoint in function NHET_Parity() in nhet.c.
Similar to the previous tests, parity errors are detected during a memory read operation.

MibSPI Parity Error

This is the same as the NHET test, except that we flip a bit in the SPI memory. I'm using MibSPI module 1 because number 3 is used for the display.

Navigation:
Menu option 6: MibSPI Parity Error

Code:

void MIBSPI1_Parity(void) {
    unsigned short data[8]= {0x1234,0x2345,0x3456,0x4567,0x5678,0x6789,0x789A,0x89AB};
    unsigned char  *TXRamParity  = (unsigned char *)(mibspiRAM1) + 0x400;


    /** - Initialize MIBSPI Module */
    mibspiInit(); //todo: check how I can do this without losing contact with the display
    
    /** - Initialize Data Buffer */
    mibspiSetData(mibspiREG1, 0, data);
    
    /** - Memory Map parity bits */ 
    ((mibspiBASECustom_t *)(mibspiREG1))->PTESTEN = 1;
    
    /** - Disable parity error detection logic */
    ((mibspiBASECustom_t *)(mibspiREG1))->EDEN = 0x5;
    
    /** - Introduce Parity Error by flipping one bit in TXRAM parity */
    TXRamParity++;
    TXRamParity++;
    *TXRamParity = ~(*TXRamParity);
    
    /** - Enable parity error detection logic */
    ((mibspiBASECustom_t *)(mibspiREG1))->EDEN = 0xA;
    
    /** - Remove Memory Map of parity bits */
    ((mibspiBASECustom_t *)(mibspiREG1))->PTESTEN = 0;


    /** - Trigger the transfer group0, since Parity is corrupted Parity 
     * error will be triggered */
    mibspiTransfer(mibspiREG1, 0);


    asm("   nop");
    asm("   nop");
    asm("   nop");


    /** Reset SPI once Test is complete */
    mibspiREG1->GCR0 = 0U;       
}

Debug options:

Set your breakpoint in function MIBSPI1_Parity() in parity_functions.c.
Similar to the previous tests, parity errors are detected during a memory read operation.

DCAN Parity Error

Again toggling a bit. This time in the memory that's assigned to the CAN1 module.

Navigation:
Menu option 7: DCAN Parity Error

Code:

void DCAN1_Parity(void) {
    unsigned int *mailbox;


    /** - Fill MailBox data with 0 */
    canREG1->IF1DATx[0] = 0;
    canREG1->IF1DATx[1] = 0;
    canREG1->IF1DATx[2] = 0;
    canREG1->IF1DATx[3] = 0;
    canREG1->IF1DATx[4] = 0;
    canREG1->IF1DATx[5] = 0;
    canREG1->IF1DATx[6] = 0;
    canREG1->IF1DATx[7] = 0;


    /** - Initialize Command Registers and select Message Number 1 */
    canREG1->IF1CMD  = 0xFF;
    canREG1->IF1NO     = 1;


    /** - wait for Busy Flag to clear, IF[1] contents will be moved to Mailbox 1 */
    while((canREG1->IF1STAT & 0x80) == 0x80);


    /** - Disable Parity PMD = 0x5 */
    canREG1->CTL  |= 0x00001400;


    /** - Enable Test Mode */
    canREG1->CTL |= 0x80;


    /** - Enable Direct Access to DCAN RAM */
    canREG1->TEST |= 0x200;


    /** - Corrupt Mail Box1 data locations to generate Parity Error */
//  mailbox  = (unsigned int*)(canRAM1+ 0x20); // todo check why this doesn't calculate correctly
    mailbox  = (unsigned int*)0xFF1E0020U;
    *mailbox = *mailbox | 1;


    /** - Disable Direct access to DCAN RAM */
    canREG1->TEST &= 0xFFFFFDFF;


    /** - Enter Init Mode and Disable Test Mode and Enable Parity*/
    canREG1->CTL &= 0xFFFFEB7E;


    /** - Configure the Transfer direction to be from the
     *    message object 1 to the IF1 Register and start the read  */
    canREG1->IF1CMD  = 0x7F;
    canREG1->IF1NO = 1;


    /** - wait for Busy Flag to clear, Mailbox[1] contents will be moved to IF[1] */
    while((canREG1->IF1STAT & 0x80) == 0x80);


    /* Wait for the DCAN Parity Error Bit to get set */
    while((canREG1->ES & 0x100) != 0x100);
}

Debug options:

Set your breakpoint in function DCAN1_Parity() in parity_functions.c.
Similar to the previous tests, parity errors are detected during a memory read operation.
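
The hex masks in DCAN1_Parity() are easier to follow when they are built up from bit positions. The sketch below derives them on a PC. The bit positions are inferred from the constants and comments in the listing, and the macro and function names are invented for this example.

```c
#include <stdint.h>

/* Bit positions inferred from DCAN1_Parity(); names are hypothetical. */
#define CTL_INIT_BIT     0x00000001U  /* bit 0, the Init bit                  */
#define CTL_TEST_BIT     0x00000080U  /* bit 7, "Enable Test Mode"            */
#define CTL_PMD_SHIFT    10U          /* PMD field starts at bit 10           */
#define CTL_PMD_DISABLE  0x5U         /* "Disable Parity PMD = 0x5"           */
#define TEST_DIRECT_RAM  0x00000200U  /* bit 9, "Direct Access to DCAN RAM"   */

/* Value OR-ed into CTL to put 0x5 in the PMD field. */
uint32_t ctl_pmd_disable_bits(void)
{
    return (uint32_t)(CTL_PMD_DISABLE << CTL_PMD_SHIFT);
}

/* AND mask that clears the Init bit, the Test bit and the PMD bits above. */
uint32_t ctl_restore_mask(void)
{
    return (uint32_t)~(CTL_INIT_BIT | CTL_TEST_BIT | ctl_pmd_disable_bits());
}

/* AND mask that clears the direct-access bit in the TEST register. */
uint32_t test_direct_ram_clear_mask(void)
{
    return (uint32_t)~TEST_DIRECT_RAM;
}
```

The three functions return 0x00001400, 0xFFFFEB7E and 0xFFFFFDFF, which are the constants used in the listing.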

What's next?

For you, it's now time to download the sample code, import it in CCS and start debugging.
Then please help me to improve the project:

The code needs clean-up.
The HTU, ADC and VIM tests that are in the uploaded project don't work. I've disabled the related menu items.
The ERR LED lights up after initialisation - something I turned into a feature by leaving it on for a second and then resetting it.
Dim the BoosterPack's LED. It's too bright and over-shines the LCD.
Replace the safety check functions generated by HALCoGen (deprecated) by the ones of the Hercules SafeTI Diagnostic library.

I hope that this project will help you to get a grip on the safety features of the Hercules family.
Don't hesitate to message me if you have a question, an improvement or if you found something that's incorrect.

Thank you Anthony F. Seely for the advice on the PBIST test best practices, and Martin Valencia for testing the uploaded project.

disclaimers etc:
I received the boosterpack and launchpad from TI.
The LCD library used in this project is my port of Rei Vilo's Energia library for the Educational BoosterPack
Most code in this project's safetyfeatures folder is taken from TI's Hercules Safety MCU Demos.
Other code is scraped from the HALCoGen examples.
All artifacts made by me are free to grab. For the things I borrowed or got for free, please respect the licenses.
I'm free and independent

Related Blog
Hercules Safety MCU Demo with Educational BoosterPack
Hercules Safety MCU Demo with Educational BoosterPack in Action
Project14 | Clustered MCUs: Functional Safety with Lockstep CPUs

Attachments:

RM46_BOOSTXL_EDUMKII_SAFETY_DEMO_NORTOS.zip