Hello,
I am benchmarking a simple algorithm with two multiplies on a ZedBoard (7Z020), as shown below:
Array2[j] = 1000 * Array1[i] * Array1[i+1];
The timer's clock frequency is 111.111115 MHz, i.e. about 9 ns per clock cycle.
Does anyone know the ARM instruction timing (in clock cycles) for 64-bit integer multiply and divide?
I'm getting the following results:
==================================
C Algorithm Used:
volatile int Array1[2100];
int Array2[1050], i, j;
for (i = 0, j = 0; i < 2100; i += 2, j++)
    Array2[j] = (int)((1000 * (long long)Array1[i]) * ((long long)Array1[i+1]));
Counter Ticks = 32
Time Taken = Counter Ticks * 9ns = 288(ns)
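
For reference, the tick-to-time conversion can be written as a small C helper. This is only a sketch: it assumes the timer really runs at the 111.111115 MHz quoted in this post, and the names TIMER_HZ, ticks_to_ns and ns_per_iteration are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed timer clock in Hz (111.111115 MHz, as quoted in the post). */
#define TIMER_HZ 111111115.0

/* Convert a tick count to nanoseconds at the given clock rate. */
static double ticks_to_ns(uint32_t ticks, double timer_hz)
{
    assert(timer_hz > 0.0);
    return (double)ticks * 1e9 / timer_hz;
}

/* Average time per loop iteration, in nanoseconds. */
static double ns_per_iteration(uint32_t ticks, double timer_hz,
                               uint32_t iterations)
{
    assert(iterations > 0);
    return ticks_to_ns(ticks, timer_hz) / (double)iterations;
}
```

Calling ns_per_iteration(32, TIMER_HZ, 1050) gives the average cost of one loop iteration for the numbers above, which makes it easier to compare the result against a per-instruction cycle count.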
Software and Hardware Setup
===========================
My C program using the XScuTimer looks like this (pseudocode):
{
Counter_Register := 0xFFFFFFFF;
Counter_Snapshot1 := Counter_Register;
Start_XScuTimer();
Algorithm;
Stop_XScuTimer();
Counter_Snapshot2 := Counter_Register;
Counter_Ticks := Counter_Snapshot1 - Counter_Snapshot2;
Time_Taken := Counter_Ticks * 9ns;
}
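
The snapshot arithmetic in the sequence above can be sketched in plain C as follows. It assumes a 32-bit counter that counts down from the loaded value, so the first snapshot is the larger one. The helper names elapsed_ticks_down and net_ticks are hypothetical and not part of the XScuTimer driver; the overhead value would have to come from a separate measurement with an empty body between start and stop.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two snapshots of a 32-bit down-counter.
 * Unsigned subtraction wraps modulo 2^32, so a single wrap of the
 * counter between the two snapshots still gives the right count. */
static uint32_t elapsed_ticks_down(uint32_t snapshot_before,
                                   uint32_t snapshot_after)
{
    return snapshot_before - snapshot_after;
}

/* Subtract a separately measured start/stop overhead, clamping at 0. */
static uint32_t net_ticks(uint32_t measured, uint32_t overhead)
{
    return (measured > overhead) ? (measured - overhead) : 0u;
}
```

For example, snapshots of 0xFFFFFFFF and 0xFFFFFFDF correspond to 32 elapsed ticks. Subtracting the empty-measurement overhead matters here because the count being measured is itself only a few tens of ticks.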
Software Environment for ARM Cortex-A9 CPU0
--------------------------------------------
xilinx-gcc compiler settings: -mfpu=vfpv3 -mfloat-abi=softfp -mcpu=cortex-a9 -march=armv7-a
Optimization Level -O3
Below are some of my FPGA clock configurations:
Peripheral                 PLL source   Frequency (MHz)
---------------------------------------------------
CPU 6x Freq(MHz)           ARM PLL      666.666687
UART Freq(MHz)             IO PLL       50.000000
TTC0 CLK0 Freq(MHz)        CPU_1X       111.111115
FPGA0 Freq(MHz)            IO PLL       100.000000
FPGA1 Freq(MHz)            IO PLL       142.857132
FPGA2 Freq(MHz)            IO PLL       50.000000
FPGA3 Freq(MHz)            IO PLL       50.000000
DDR Operating Freq(MHz)                 533.333313
Thanks in advance for looking at my issue.
Regards,
Vishal.