Although the boundary between DSP and MCU has all but vanished, I still want to hear from you: what is your first choice when working with digital signals? Is it a dedicated DSP like the TI 320F28xx, or a high-performance Cortex-M4 MCU?
As a Systems Engineer, I always like combining functional control with DSP capability.
I used this approach back in 1982, when I combined an Intel 8085 with a 9811 math coprocessor chip so that I could maintain a FLIR camera using a Doppler radar on board a helicopter to support search and rescue operations. I used the 8085 to control all of the sensors while keeping the 9811 chip fully occupied doing floating-point trigonometric calculations to derive the position angles for the servos.
Lots of fun and the pilots loved the result.
I haven't voted because I don't think the basic question has a simple answer, but it's certainly interesting to talk about.
I do lots of DSP on Cortex-Mx processors. I've done designs for clients based on DSP on the M0+, M3 and M4, and I'll do the M7 too once the dust has settled on the STM32H7 parts.
In the past I've used proper DSP chips (mainly AD), and I still support a customer project based on the 320F2808.
As ever you have to use the part for the job.
Most of my current DSP work is done on FPGA, which gives you a step up on both performance and flexibility, but with a terrible hit on development cost and time.
But I agree with Linas: DSP chips are optimised for DSP work and, clock for clock, massively outperform general-purpose processors at it. On the other hand, general-purpose processors (I'm thinking Cortex-Mx here) have got pretty good and can often take over the work of small DSPs; my 320F2808 design would go on a Cortex-M4 if I did it today.
I agree. A DSP is a fully independent processor (standalone, or one of several cores, with or without other processor cores alongside it). NEON just takes advantage of the fact that it is possible to (relatively) rapidly fill wide registers from contiguous memory locations, and then execute some operations in parallel using these registers. NEON doesn't have a separate program counter or control unit; it shares them with the main CPU. A DSP, by contrast, can run autonomously, reading in data, processing it and writing it out. NEON is 'fed' with wide data as part of the normal execution of the main CPU. Some notes on it here: BBB, NEON and making Tintin bigger
Basically it is an accelerator, like an FPU for example. (I'm just a beginner with NEON, so I may have made mistakes in my understanding.)
The only point I think I was trying to make at the time was regarding this:
compiler had option to use NEON core, and also, code did run faster with this option enabled, so my guess is that NEON did make some impact on code execution speed
However, one can't just turn on a flag and expect C code to use NEON: no C operator maps directly to NEON instructions, and in my experience gcc doesn't reliably pick up a 'hint' that some part of the code could be translated into NEON instructions. The dependable ways of invoking NEON functionality from C are (a) to explicitly write NEON 'intrinsic' functions into the code, or (b) to include code or link to libraries written to take advantage of NEON (and if that code is written in C, it will be using NEON intrinsics). Parts or all of the code could also be written in assembler (e.g. inline assembler) using the NEON instruction set.
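As an illustration of option (a), here is a minimal sketch of an element-wise float multiply written with NEON intrinsics. The function name mul_arrays is hypothetical. The intrinsics (vld1q_f32, vmulq_f32, vst1q_f32) come from the standard arm_neon.h header, and the #if defined(__ARM_NEON) guard assumes a build where the compiler defines that macro (on 32-bit ARM GCC this typically means passing something like -mfpu=neon). On any other target only the plain scalar loop is compiled, so the same source builds everywhere.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example: out[i] = a[i] * b[i].
   On a NEON build the main loop multiplies four floats per instruction;
   without NEON (or for the leftover elements) the scalar loop does the work. */
void mul_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 contiguous floats */
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vmulq_f32(va, vb));  /* 4 multiplies in one instruction */
    }
#endif
    for (; i < n; ++i)                          /* scalar tail / non-NEON path */
        out[i] = a[i] * b[i];
}
```

The point is that the NEON path only exists because the code asks for it explicitly; the scalar loop is ordinary C.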
Since ARM is popular nowadays, many libraries have been written to use NEON (i.e. they explicitly have #ifdefs and #includes with NEON-specific code when compiling for ARM+NEON), so NEON can end up being used without you being aware of it. But only if you are using such libraries (usually easy to tell: there will be some information on the project page, or you can grep the source code).
With a DSP you can code in normal C and it will execute on the DSP, since it has a control unit and program counter like any usual processor. DSPs also have more sophisticated compilers nowadays (I'm no expert) which may take hints from certain C code structures. To control the DSP intensively and make full use of its performance and features, the usual intrinsics or inline assembler are still available too. The only DSP I've used is the 56k architecture, which is quite ancient.
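To make "certain C code structures" a little more concrete, here is a minimal sketch of a Q15 dot product (the core of an FIR filter) written in plain C in the shape a DSP compiler is generally best placed to optimise: restrict-qualified pointers so the compiler knows the inputs don't alias, a single accumulator, and a loop count known before the loop starts. Whether a particular DSP compiler actually maps this onto its MAC units and hardware loops is an assumption; check that compiler's documentation. The name dot_q15 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum of x[k] * h[k] over n Q15 samples.
   restrict promises no aliasing; the 64-bit accumulator keeps this
   sketch free of overflow for any realistic n. */
int64_t dot_q15(const int16_t *restrict x, const int16_t *restrict h, size_t n)
{
    int64_t acc = 0;
    for (size_t k = 0; k < n; ++k)
        acc += (int32_t)x[k] * h[k];   /* one multiply-accumulate per tap */
    return acc;
}
```

Nothing here is DSP-specific; the same source runs on any processor, which is exactly the point about a DSP being a normal programmable CPU.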
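I gather the DSP instruction capability in the M4 and M7 (strictly speaking these cores have no NEON unit; they have the DSP extension's multiply-accumulate and packed 8/16-bit SIMD instructions) is not competing directly with a full DSP. It is there to let the M series handle "DSP" tasks a little better than previous ARM instruction sets. A modern DSP should still significantly outperform a Cortex-M7 at DSP tasks.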
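For a feel of what those M4/M7 DSP instructions do, here is a portable C model of one of them, SMLALD: two signed 16-bit samples are packed into each 32-bit register, both pairs are multiplied, and both products are added to a 64-bit accumulator in a single instruction (CMSIS-Core exposes it as the __SMLALD intrinsic). This is a sketch for illustration only: smlald_model and dot_q15_dual are hypothetical names, and on a real Cortex-M4 the packed multiply would be one instruction rather than the C below.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v (well-defined, no implementation-defined casts). */
static int16_t as_s16(uint32_t v)
{
    v &= 0xFFFFu;
    return (int16_t)(v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v);
}

/* Model of SMLALD: acc + lo(a)*lo(b) + hi(a)*hi(b), halves treated as signed 16-bit. */
static int64_t smlald_model(uint32_t a, uint32_t b, int64_t acc)
{
    int32_t lo = (int32_t)as_s16(a) * as_s16(b);
    int32_t hi = (int32_t)as_s16(a >> 16) * as_s16(b >> 16);
    return acc + lo + hi;
}

/* Hypothetical Q15 dot product that consumes two samples per step. */
int64_t dot_q15_dual(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    size_t k = 0;
    for (; k + 1 < n; k += 2) {
        /* pack x[k] into the low half and x[k+1] into the high half */
        uint32_t xa = (uint32_t)(uint16_t)x[k] | ((uint32_t)(uint16_t)x[k + 1] << 16);
        uint32_t hb = (uint32_t)(uint16_t)h[k] | ((uint32_t)(uint16_t)h[k + 1] << 16);
        acc = smlald_model(xa, hb, acc);
    }
    if (k < n)                           /* odd length: last sample on its own */
        acc += (int32_t)x[k] * h[k];
    return acc;
}
```

Two multiply-accumulates per instruction is a real gain over a plain Cortex-M3, but it is still a long way from a DSP with several MAC units, address generators and zero-overhead loops.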
You do need the flag, but GCC won't do anything with it unless your code (or any library code you're linking in) uses the special functions (intrinsics), i.e. the code needs to be specifically designed to use NEON.
So, perhaps your linked-in code was designed for NEON.
If you can explicitly arrange your data as (say) 8-bit values and can parallelize, then using the special functions should speed up basic operations 5-10 times, from what I understand (and I also saw this in limited experiments at the time). Perhaps it still won't compare with a DSP if the use case is fractal generation (I have no idea).
Software libraries (video/image libraries, I suppose) are available that are designed to take advantage of NEON if it exists.
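The reason 8-bit data helps so much is lane count: a 128-bit NEON register holds sixteen 8-bit values, so one instruction does sixteen operations. A minimal sketch, assuming a hypothetical image-brightening task: vqaddq_u8 (from arm_neon.h) adds sixteen bytes at once and saturates at 255, and vdupq_n_u8 copies the offset into every lane. The function name brighten_u8 is hypothetical, and the speed-up you actually get depends on memory bandwidth and the rest of the loop; the 5-10x figure above is the poster's observation, not something this sketch demonstrates.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example: out[i] = min(in[i] + delta, 255) for an 8-bit buffer. */
void brighten_u8(const uint8_t *in, uint8_t *out, size_t n, uint8_t delta)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    uint8x16_t vd = vdupq_n_u8(delta);          /* delta in all 16 lanes */
    for (; i + 16 <= n; i += 16)                /* 16 saturating adds per instruction */
        vst1q_u8(out + i, vqaddq_u8(vld1q_u8(in + i), vd));
#endif
    for (; i < n; ++i) {                        /* scalar tail / non-NEON path */
        unsigned s = (unsigned)in[i] + delta;
        out[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}
```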
Unless your code had special functions to support NEON (or used a library of code which uses such functions), then your code didn't use NEON capability. GCC doesn't make use of NEON automatically (even though SHARC's compiler may), so code needs to explicitly use NEON (there are functions that GCC understands) for SIMD acceleration to occur.
I investigated NEON a while back for the BBB (I'm no expert on NEON) and I had to explicitly call special functions that GCC understood and replaced with NEON instructions.
multiple problems for each processor.
This test involves complex math inside a VERY long loop. The STM32F407 is a low-clocked processor that isn't efficient at math, so it is very slow.
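The benchmark code itself isn't shown. Since fractal generation comes up earlier in the thread, here is a minimal sketch assuming a Mandelbrot-style escape-time loop, which is the kind of "complex math inside a very long loop" that separates these processors: every pixel may need up to max_iter rounds of complex multiply-adds. The function name mandel_iters and the use of float are my own assumptions, not the original test.

```c
/* Hypothetical example: iterate z <- z^2 + c for c = cr + i*ci,
   stopping when |z| > 2 (compared as |z|^2 > 4 to avoid a square root)
   or when max_iter is reached. Returns the iteration count. */
unsigned mandel_iters(float cr, float ci, unsigned max_iter)
{
    float zr = 0.0f, zi = 0.0f;
    unsigned n = 0;
    while (n < max_iter) {
        float zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0f)
            break;                  /* escaped: c is outside the set */
        zi = 2.0f * zr * zi + ci;   /* imaginary part of z^2 + c */
        zr = zr2 - zi2 + cr;        /* real part of z^2 + c */
        ++n;
    }
    return n;
}
```

Points inside the set run all max_iter iterations, so frame time is dominated by how fast the processor can execute this tight multiply-add loop.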
The BeagleBone does have a NEON unit, which is a small DSP for parallel computation. I don't know whether my code made use of it; the compiler was set up correctly, so NEON should be used, since SIMD can multiply several numbers at once.
The ADSP-21489 is very strong at for loops and has good compiler performance; that's why, even at half the clock, it gives much higher fps on a bigger screen.
This test simply means that if you want to do math, use a DSP, not a Cortex-M/A/R core.
A DSP at the same clock frequency will destroy an ARM Cortex-M4 MCU.
The ARM architecture is very flexible for complex tasks. A DSP, on the other hand, has hardware specially designed for doing a lot of calculations in a very short time.
My ADSP-21489 running at 450 MHz is much, much faster than the BeagleBone (ARM Cortex-A8 with a NEON unit, a small DSP) running at 1.2 GHz.
Take a look (note that the screen for the Cortex is 320x240, while the DSP's is much larger at 480x272, and it still achieves much higher fps). (I also have a video showing how fast the Nios II/f softcore processor is.)
Note that for the STM32F407 and the BeagleBone you can clearly see the calculation speed from the line that is updating the screen; for the ADSP it is invisible, since it is pushing a lot of fps to a much larger screen, around 2x larger, meaning 2x more calculations.
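As a quick check on the "around 2x larger" figure, assuming the DSP's display is 480x272 (a common LCD size) against 320x240 for the Cortex boards, the pixel ratio per frame is:

```latex
\frac{480 \times 272}{320 \times 240} = \frac{130\,560}{76\,800} = 1.7
```

So, if the average iteration count per pixel is similar on both screens, the DSP is doing roughly 1.7 times the work per frame.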