(Complete list of all blog entries in this series)
If you have followed this series so far, you might remember my article about the head injury criterion. There I explained how the car industry determines how severe a head impact is. One important point in that calculation is that only the total acceleration is needed, not the values of the individual axes.
That means that, for each reading I take from the accelerometer, I need to combine the values of all three axes into the norm of the acceleration vector. The formula for that is:
$$|a| = \sqrt{a_x^2 + a_y^2 + a_z^2}$$
The problematic part is to calculate the square root - on a Cortex-M0 this is not implemented in hardware. And since I need to do that 1600 times per second, I wanted to find out how fast it actually is.
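Only the root itself is expensive; the part under the root needs nothing but integer multiplications and additions. A minimal sketch of that part, assuming signed 16-bit readings per axis (the function name `accel_norm_sq` and the `int16_t` sample type are assumptions for illustration, not the code running on the PSoC):

```c
#include <stdint.h>

/* Hypothetical helper: squared norm of one accelerometer reading.
   Assumes each axis is delivered as a signed 16-bit value. */
uint32_t accel_norm_sq(int16_t x, int16_t y, int16_t z)
{
    /* Each square fits in a signed 32-bit value (at most 2^30).
       The sum of three squares can reach 3 * 2^30, which only fits
       in an unsigned 32-bit value, so the squares are added as uint32_t. */
    uint32_t xx = (uint32_t)((int32_t)x * x);
    uint32_t yy = (uint32_t)((int32_t)y * y);
    uint32_t zz = (uint32_t)((int32_t)z * z);
    return xx + yy + zz;
}
```

For a reading of (3, 4, 0) this gives 25, so the norm would be 5 once the root has been taken.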
The setup
Since there is no code profiler available, I resorted to a tool I now have available: my oscilloscope. Instead of trying to measure the elapsed time in the PSoC4 itself, I wrote the code so that an output pin is toggled whenever 1600 values have been calculated. That way I can measure the elapsed time with the oscilloscope without any overhead (except for toggling the pin itself).
So my code looks like this:
// set P1_0 as output
CY_SYS_PINS_SET_DRIVE_MODE(CYREG_PRT1_PC, 0, CY_SYS_PINS_DM_STRONG);
int dir = 0;
uint16 i, j = 0, k = 0;
for (;;)
{
    uint32 sum = 0;
    for (i = 0; i < 1600; i++)
    {
        sum += sqrt(i*i + j*j + k*k);
    }
    j++;
    if (0 == (j % 1600))
        k++;
    // toggle the pin
    if (0 == dir)
        CY_SYS_PINS_SET_PIN(CYREG_PRT1_DR, 0);
    else
        CY_SYS_PINS_CLEAR_PIN(CYREG_PRT1_DR, 0);
    dir = 1 - dir;
}

i, j and k simulate my three acceleration axes.
Some results
My first test was without any calculation at all, so the loop was empty except for toggling the pin. The result was a 688kHz square wave. The pin toggles once per pass, so one pass takes half a period of the square wave: about 0.7µs.
The next step was to just add the values, without the square root. That ended up in a 288Hz square wave, so the time increased to about 1.7ms. That's quite significant, but still OK (it would mean a duty cycle of way below 1%). The last step was to really sum up the square roots. That resulted in a drop to 6.3Hz, with a loop execution time of about 80ms.
Now that's really slow: the 1600 calculations I need every second take 80ms, so just calculating the acceleration means a duty cycle of about 8%. That's not what I wanted...
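The conversion from the frequency shown on the oscilloscope to loop time and duty cycle can be sketched as follows. The function names are made up for illustration; the only assumptions are the ones from the test setup, namely that the pin toggles once per pass and that one pass covers 1600 calculations at a sample rate of 1600 readings per second.

```c
/* Hypothetical helper: time for one pass of the outer loop, in seconds.
   The pin toggles once per pass, so a full period of the square wave
   spans two passes. */
double loop_time_s(double square_wave_hz)
{
    return 1.0 / (2.0 * square_wave_hz);
}

/* Hypothetical helper: fraction of the time the CPU is busy calculating.
   loop_s is the time for samples_per_pass calculations, and
   sample_rate_hz readings arrive every second. */
double duty_cycle(double loop_s, unsigned samples_per_pass,
                  unsigned sample_rate_hz)
{
    return loop_s * (double)sample_rate_hz / (double)samples_per_pass;
}
```

With the 6.3Hz measurement, `loop_time_s(6.3)` comes out at roughly 0.079s, and `duty_cycle(loop_time_s(6.3), 1600, 1600)` at roughly 0.08, which is the 8% from above.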
Some speedup
So I set out to improve that. Since I don't need floating point precision, I looked for algorithms to calculate an integer square root. As usual, Google came to the rescue and gave me a little code snippet. I copied it into my test program and got a nice speedup. The measured frequency was now 22Hz, with an execution time of about 22ms. That's nearly four times better than the FP version. Nice.
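To give an idea of what such an integer square root can look like, here is a sketch of the well-known digit-by-digit (bitwise) method. This is not the snippet I found, and its timing on the PSoC4 has not been measured; the name `isqrt32` is just an assumption for illustration. It uses only shifts, additions and comparisons.

```c
#include <stdint.h>

/* Digit-by-digit integer square root: returns floor(sqrt(n)).
   Illustrative sketch, not the snippet measured in this article. */
uint32_t isqrt32(uint32_t n)
{
    uint32_t result = 0;
    uint32_t bit = (uint32_t)1 << 30;  /* highest power of four in 32 bits */

    /* Start with the highest power of four that is not larger than n. */
    while (bit > n)
        bit >>= 2;

    /* Decide one result bit per iteration, from the top down. */
    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}
```

The result is rounded down, so `isqrt32(24)` gives 4 and `isqrt32(25)` gives 5. For the acceleration norm that truncation is at most one count of the accelerometer.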
But then I noticed I might have another option for a speedup. So far I had been using the debug build of my program, and that usually disables most of the optimizations. So I switched over to the release build and tested again.
And lo and behold: now the loop runs in 8ms (with a frequency of 61Hz). That means I'm back at a duty cycle of about 1%.
(I also measured the other times: the FP version runs in 61ms, and the simple sum in 332µs)
Conclusions
Going by these results, I don't think I need to optimize my code any further. Since the isqrt snippet I found is not in the public domain (or under another permissive license), I will look for an implementation that I can use, or write my own (I also found a nice algorithm in a PSoC1 AppNote, so maybe I will use that one).
I still need to look into how much math I really need to do on the PSoC, and which parts I can move to the phone.
