Hi,
I have a ZedBoard and I'm developing an application that must execute a lot of multiplications in parallel (I'm using almost all of the available DSP slices). Once those finish, the result must be passed through N mathematical functions, also in parallel. To give an idea with made-up numbers: the first stage executes 50 multiplications in parallel, and then each mathematical function requires at least 75 multiplications in parallel.
As you can see, this is a very resource-hungry application, and my problem lies entirely in the N mathematical functions. They are tricky to implement in hardware and require a lot of resources that I don't have.
This led me to the following question: is it possible to offload these multiplications to the NEON coprocessor? That is, can the programmable logic send instructions directly to the NEON unit, without going through the ARM processor?
With some creativity I think I could work around this by reusing the DSP slices that the first stage already used, but I don't think I can fit all the mathematical operations that way.
PS: These are floating-point operations, and there is a possibility that the current precision might not be enough. In that case I'd need more bits, which would lead to more multipliers per operation, which would reduce the number of calculations I can execute in parallel.
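
To put rough numbers on that precision trade-off, here is a back-of-the-envelope sketch in Python. It is only an estimate under assumptions that are not taken from my actual design: a DSP48E1 multiplier treated as roughly 24x17 bits for unsigned mantissas, a budget of 220 DSP slices (the Zynq-7020 on the ZedBoard), and a naive tiling of the mantissa product with no optimizations. The function names are made up for illustration, and a real floating-point core will likely use a different number of slices.

```python
import math

# Assumptions for illustration only (not measured from a real design):
# the DSP48E1 multiplier is 25x18 signed, so about 24x17 bits are usable
# for unsigned mantissas, and the Zynq-7020 provides 220 DSP slices.
DSP_A_BITS = 24
DSP_B_BITS = 17
TOTAL_DSP = 220


def dsp_per_multiply(mantissa_bits):
    """Naive tiling estimate of DSP slices for one mantissa x mantissa product."""
    tiles_a = math.ceil(mantissa_bits / DSP_A_BITS)
    tiles_b = math.ceil(mantissa_bits / DSP_B_BITS)
    return tiles_a * tiles_b


def parallel_multiplies(mantissa_bits, total_dsp=TOTAL_DSP):
    """How many such multiplications fit in the DSP budget at the same time."""
    return total_dsp // dsp_per_multiply(mantissa_bits)


# Mantissa widths include the implicit leading bit: 24 (single), 53 (double).
for name, bits in [("single", 24), ("double", 53)]:
    print(name, dsp_per_multiply(bits), parallel_multiplies(bits))
```

Under these assumptions a single-precision multiply takes 2 slices and a double-precision one takes 12, so going to double would cut the number of parallel multiplications from about 110 to about 18. That is the squeeze I'm worried about.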