element14 Community
Sudden Impact Wearables Design Challenge
Author: hlipka
Date created: 19 Feb 2015 11:45 PM
Tags: impact_skiers, sudden_impact

Skier impact monitor 09 - PSoC4 performance measurements

(Complete list of all blog entries in this series)

If you have followed what I have written so far, you might remember my article about the head injury criterion. There I explained how the car industry determines how severe a head impact is. One important point in that calculation is that only the total acceleration is needed, not the individual axes.

That means that for each reading I take from the accelerometer, I need to combine the values of all three axes into the norm of the acceleration vector. The formula for that is:

$$|\vec{a}| = \sqrt{a_x^2 + a_y^2 + a_z^2}$$

The problematic part is calculating the square root, which a Cortex-M0 does not implement in hardware. And since I need to do that 1600 times per second, I wanted to find out how fast it actually is.

The setup

Since there is no code profiler available, I resorted to a tool I now have available: my oscilloscope. Instead of trying to measure the elapsed time in the PSoC4 itself, I wrote the code so that an output pin is toggled whenever 1600 values have been calculated. Using that trick I can watch the pin with the oscilloscope and measure the elapsed time without any overhead (except for toggling the pin itself).

So my code looks like this:

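#include <math.h>     // sqrt()
#include <project.h>  // PSoC Creator generated header: uint16/uint32, CY_SYS_PINS_* macros

int main(void)
{
    // set P1_0 as output
    CY_SYS_PINS_SET_DRIVE_MODE(CYREG_PRT1_PC, 0, CY_SYS_PINS_DM_STRONG);
    int dir = 0;
    uint16 i, j = 0, k = 0;
    for (;;)
    {
        uint32 sum = 0;

        // one second's worth of samples
        for (i = 0; i < 1600; i++)
        {
            // widen to 32 bits before squaring: uint16*uint16 is promoted
            // to int and can overflow once j gets large
            sum += sqrt((uint32)i * i + (uint32)j * j + (uint32)k * k);
        }
        j++;
        if (0 == (j % 1600))
            k++;

        // toggle the pin once per pass
        if (0 == dir)
            CY_SYS_PINS_SET_PIN(CYREG_PRT1_DR, 0);
        else
            CY_SYS_PINS_CLEAR_PIN(CYREG_PRT1_DR, 0);
        dir = 1 - dir;
    }
}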

i, j and k simulate my three acceleration axes.

Some results

My first test was without any calculation at all, so the loop was empty except for toggling the pin. The result was a 688kHz square wave. Since each pass through the loop toggles the pin once, one pass takes half a period, so the time for executing the loop is about 0.7µs.

The next step was to just add the values, without the square root. That ended up in a 288Hz square wave, so the time increased to 1.5ms. That's quite significant, but still OK (it would mean a duty cycle of way below 1%). The last step was to really sum up the square roots. That resulted in a drop to 6.3Hz, with a loop execution time of about 80ms.

Now that's really slow - just doing the calculation for the acceleration means a duty cycle of nearly 10%. That's not what I wanted...
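The conversion from the frequency on the scope to a loop time and a duty cycle can be written down as a small sketch. It assumes what the test code does: the pin is toggled once per pass, and one pass handles exactly one second's worth of samples. The function names are made up for this illustration.

```c
/* Time for one pass of the measurement loop, in seconds.
 * The pin toggles once per pass, so a full square-wave period
 * covers two passes. */
double loop_time_s(double square_wave_hz)
{
    return 1.0 / (2.0 * square_wave_hz);
}

/* Share of each second spent calculating, in percent.
 * Assumption: one pass processes one second's worth of samples
 * (1600 in the test above), so the loop time in seconds is
 * directly the busy fraction of each second. */
double duty_cycle_percent(double square_wave_hz)
{
    return 100.0 * loop_time_s(square_wave_hz);
}
```

For the 6.3Hz measurement this gives a bit over 79ms per pass, or roughly 8% of each second, which matches the "about 80ms" and "nearly 10%" figures.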

Some speedup

So I set out to improve that. Since I don't need floating point precision, I looked for algorithms to calculate an integer square root. As usual, Google came to the rescue and gave me a little code snippet. I copied it into my test program - and had a nice speedup. The measured frequency was now 22Hz, with an execution time of about 22ms. That's nearly four times better than the FP version. Nice.

But then I noticed I might have another option for speedup. So far I had been using the debug build of my program - and that usually disables most of the optimizations. So I switched over to the release build and tested again.
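To give an idea of what such an integer square root looks like, here is a minimal sketch of the widely known bit-by-bit method, which needs only shifts, additions and comparisons. This is not the snippet mentioned above (that one is not reproduced here); the names isqrt32 and accel_norm_int are made up for this illustration.

```c
#include <stdint.h>

/* Integer square root: returns floor(sqrt(x)).
 * Works through the result one bit at a time, from the top down. */
uint32_t isqrt32(uint32_t x)
{
    uint32_t result = 0;
    uint32_t bit = (uint32_t)1 << 30;   /* highest power of four in 32 bits */

    /* skip the powers of four that are larger than x */
    while (bit > x)
        bit >>= 2;

    while (bit != 0) {
        if (x >= result + bit) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}

/* Norm of one acceleration sample in raw sensor counts.
 * Each axis is widened to 32 bits before squaring, so three
 * 16-bit inputs cannot overflow the sum. */
uint32_t accel_norm_int(int16_t ax, int16_t ay, int16_t az)
{
    uint32_t sum_sq = (uint32_t)((int32_t)ax * ax)
                    + (uint32_t)((int32_t)ay * ay)
                    + (uint32_t)((int32_t)az * az);
    return isqrt32(sum_sq);
}
```

The loop runs at most 16 times for a 32-bit input, and fewer when the input is known to be smaller, because the leading powers of four are skipped up front.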

And lo and behold - now the loop runs in 8ms (with a frequency of 61Hz). That means I'm back at a 1% power usage duty cycle.

(I also measured the other times: the FP version runs in 61ms, and the simple sum in 332µs)

Conclusions

Going by the results, I think I don't need to optimize my code any further. Since the isqrt snippet I found is not in the public domain (or under another permissive license), I will look for implementations that I can use, or create my own (I also found a nice algorithm in a PSoC1 AppNote, so maybe I will use that one).

I still need to look into how much math I really need to do on the PSoC, and which parts I can move to the phone.

Top Comments

  • DAB
    DAB over 10 years ago

    You might want to do a precalc check on the data to see if any one axis exceeds the limit.

    You can do a parametric analysis to see what combinations of axis data enter the critical area where you need to do the detailed calculation.

    Set up a matrix based upon an integer input level and use fuzzy logic to determine if the change is light, moderate, or severe.

    Then use a simple logic table that would require two of the three axes to be severe to be qualified as a potential problem.

    Just a thought,

    DAB
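    The pre-check described in this comment could be sketched roughly as follows. This is a simplification with crisp thresholds instead of real fuzzy logic, and the threshold values, type names and function names are placeholders invented for the illustration, not values from the project.

```c
#include <stdint.h>

typedef enum { LEVEL_LIGHT, LEVEL_MODERATE, LEVEL_SEVERE } level_t;

/* Hypothetical thresholds in raw sensor counts (assumptions). */
#define MODERATE_COUNTS 500
#define SEVERE_COUNTS   1500

/* Classify a single axis reading by its magnitude. */
static level_t classify_axis(int16_t v)
{
    int32_t mag = (v < 0) ? -(int32_t)v : (int32_t)v;

    if (mag >= SEVERE_COUNTS)
        return LEVEL_SEVERE;
    if (mag >= MODERATE_COUNTS)
        return LEVEL_MODERATE;
    return LEVEL_LIGHT;
}

/* Logic table: flag the sample for the detailed calculation
 * only when at least two of the three axes are severe. */
static int needs_detailed_calc(int16_t ax, int16_t ay, int16_t az)
{
    int severe = (classify_axis(ax) == LEVEL_SEVERE)
               + (classify_axis(ay) == LEVEL_SEVERE)
               + (classify_axis(az) == LEVEL_SEVERE);
    return severe >= 2;
}
```

    With these placeholder thresholds, a sample of (2000, -1600, 0) would be flagged, while (2000, 100, 100) would not.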

  • hlipka
    hlipka over 10 years ago in reply to clem57

    Thanks! This algorithm looks like the one I found in the Cypress AppNote. (But it also looks like the algorithm I'm already using, so there might not be such a large speedup).

    To put this in perspective: the loop calculating 1600 square roots (together with the squares and the sum) takes 22ms. That's 13.75µs per step, or 660 cycles. That's actually quite fast. I might get a speedup with the algorithm if I reduce the number of steps (since I know the input value cannot be more than 24 bits).

  • clem57
    clem57 over 10 years ago in reply to hlipka

    I see. I was figuring you had a smaller one, with a 10 or 12 bit ADC. If you need better precision, consider http://www4.wittenberg.edu/academics/mathcomp/bjsdir/ZuseZ3Talk.pdf In there is a good explanation leading to "The Binary Version of the 'Completing the Square' Square Root Algorithm". With simple shifting logic, you can get near the answer to the level of precision you require. On any processor shifting is very fast, and very little memory will be consumed.

    Clem

  • hlipka
    hlipka over 10 years ago in reply to clem57

    We are talking about a memory-constrained device here. My value range for the sum is 22 bits (the ADXL375 delivers values up to about 4000, so it's 10 bits. Square them and we are at 20 bits. Sum three of them, and we are at about 22 bits). My result range is then at 11 or 12 bits, so up to 16384. And I would need uint32 values in this table, to avoid any shifts (that make it slower again). The PSoC4 has 16KB flash (and the PSoC4BLE up to 128KB), so I would need a quite sparse table. Your example above has an error of 2%, which I think is too much for my needs.

    But I will keep this in mind in case I find out that the PSoC is using too much power and I need to reduce the duty cycle. Until now the exercise was to find out whether the calculations are fast enough to be feasible at all.

  • clem57
    clem57 over 10 years ago

    I have a simple suggestion: a table of precomputed squares like this:

    1. 9/81
    2. 20/400
    3. 30/900

    The table can be used to get an approximation like this: if you had 500, you know that 500 is closer to 20/400 and farther from 30/900, so the root is 22-ish under a linear assumption. (22 squared is actually 484!)

    Clem
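    The table idea from this comment might look like the sketch below. It is only an illustration: the table entries are chosen in steps of 10 rather than taken from the list above, and the type and function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t root;
    uint32_t square;
} sqrt_entry_t;

/* Hypothetical sparse table of precomputed squares. */
static const sqrt_entry_t sqrt_table[] = {
    { 0, 0 }, { 10, 100 }, { 20, 400 }, { 30, 900 }, { 40, 1600 }
};
#define SQRT_TABLE_LEN (sizeof sqrt_table / sizeof sqrt_table[0])

/* Approximate square root by linear interpolation between the two
 * neighbouring table entries. Inputs beyond the last entry are
 * clamped to the last root. */
uint32_t sqrt_approx(uint32_t x)
{
    size_t i;

    for (i = 1; i < SQRT_TABLE_LEN; i++) {
        if (x < sqrt_table[i].square) {
            const sqrt_entry_t *lo = &sqrt_table[i - 1];
            const sqrt_entry_t *hi = &sqrt_table[i];
            return lo->root
                 + (x - lo->square) * (hi->root - lo->root)
                   / (hi->square - lo->square);
        }
    }
    return sqrt_table[SQRT_TABLE_LEN - 1].root;
}
```

    For an input of 500 this returns 22, the value from the example above. The error grows where the table is sparse relative to the curve, most visibly near zero.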
