I need to do cycle-accurate profiling of bare-metal applications on ARM, where I profile the whole program as the sum of its parts. I am getting weird results, so I am trying to validate my methodology. I am using the examples from here and there, and I created two test cases:
=============================
/* Case 1: a single measurement around all three loops */
int name[1000];
unsigned int j = 0, t = 0, clock = 0;
while (j < 100000) {
    t = get_cyclecount();
    for (int i = 0; i < 1000; ++i) {
        name[i] = 0;
    }
    for (int i = 0; i < 1000; ++i) {
        name[i] += i*i;
    }
    for (int i = 0; i < 1000; ++i) {
        name[i] += i*i;
    }
    clock += get_cyclecount() - t;
    j++;
}
=============================
/* Case 2: one measurement around each loop, summed into clock */
int name[1000];
unsigned int j = 0, t = 0, clock = 0;
while (j < 100000) {
    t = get_cyclecount();
    for (int i = 0; i < 1000; ++i) {
        name[i] = 0;
    }
    clock += get_cyclecount() - t;
    t = get_cyclecount();
    for (int i = 0; i < 1000; ++i) {
        name[i] += i*i;
    }
    clock += get_cyclecount() - t;
    t = get_cyclecount();
    for (int i = 0; i < 1000; ++i) {
        name[i] += i*i;
    }
    clock += get_cyclecount() - t;
    j++;
}
=============================
I compiled both versions without optimization (-O0) for several iteration counts (the loop bound on j) and got the following results. Case 2 is smaller than Case 1 only for a single iteration. I also computed (Case2 - Case1)/(3j), which I take to be the cost of one measurement. But when I measure the overhead as suggested in the post:
=============================
unsigned int overhead = get_cyclecount();
overhead = get_cyclecount() - overhead;
=============================
This always gives me 29 cycles.
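To check how stable that back-to-back figure is, the same two-read measurement can be repeated many times while tracking the smallest and largest delta. The following is only a minimal sketch: `measure_overhead`, `struct overhead_range` and `fake_cyclecount` are hypothetical names introduced here, and the fake counter (which advances by 29 on every read) is a host-side stand-in so the snippet can run without the hardware counter. On the target, `get_cyclecount` would be passed instead.

```c
#include <assert.h>
#include <limits.h>

/* Type of a cycle-counter read function such as get_cyclecount(). */
typedef unsigned int (*cycle_fn)(void);

/* Smallest and largest back-to-back read cost seen during calibration. */
struct overhead_range {
    unsigned int min;
    unsigned int max;
};

/* Hypothetical helper: read the counter twice in a row, `samples` times,
 * and record the range of deltas. Unsigned subtraction keeps each delta
 * correct across a single wrap of the counter. */
static struct overhead_range measure_overhead(cycle_fn read_cycles,
                                              unsigned int samples)
{
    assert(samples > 0);
    struct overhead_range r = { UINT_MAX, 0 };
    for (unsigned int k = 0; k < samples; ++k) {
        unsigned int start = read_cycles();
        unsigned int delta = read_cycles() - start;
        if (delta < r.min) r.min = delta;
        if (delta > r.max) r.max = delta;
    }
    return r;
}

/* Stand-in for the hardware counter (assumption, for illustration only):
 * every read advances a fake counter by 29 "cycles". */
static unsigned int fake_counter_state;

static unsigned int fake_cyclecount(void)
{
    fake_counter_state += 29;
    return fake_counter_state;
}
```

Calling `measure_overhead(get_cyclecount, 1000)` on the board and comparing `min` with `max` would show whether the 29 cycles are really constant there or only the value seen on a single read pair.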
Does anyone have an idea of what I'm doing wrong?
Why is there such a difference?
To summarize, I would like to make sure that my measurements are correct and precise by finding out how to compensate for the cost of measuring. In trying to do that I get the inconsistent values explained above: ~512 cycles where a stable 29 cycles should be found.
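The compensation I have in mind could be sketched as follows: time each part separately, subtract a previously calibrated overhead from every raw delta, and sum the results. This is a sketch under assumptions, not a validated method. `timed_section`, `profile_sections`, `fake_cyclecount` and `fake_section` are hypothetical names, the fake counter and fake workload only exist so the snippet runs on a host, and it assumes the overhead is a fixed number of cycles, which is exactly the part I am unsure about on real hardware.

```c
#include <assert.h>

typedef unsigned int (*cycle_fn)(void);   /* e.g. get_cyclecount */
typedef void (*section_fn)(void);         /* one part of the program */

/* Hypothetical helper: time one section and subtract the calibrated
 * overhead, clamping at zero so a noisy sample cannot underflow. */
static unsigned int timed_section(cycle_fn read_cycles, section_fn section,
                                  unsigned int overhead)
{
    assert(read_cycles != 0 && section != 0);
    unsigned int start = read_cycles();
    section();
    unsigned int raw = read_cycles() - start;
    return raw > overhead ? raw - overhead : 0;
}

/* Profile the whole program as the sum of its compensated parts. */
static unsigned int profile_sections(cycle_fn read_cycles,
                                     const section_fn *sections,
                                     unsigned int count,
                                     unsigned int overhead)
{
    unsigned int total = 0;
    for (unsigned int k = 0; k < count; ++k) {
        total += timed_section(read_cycles, sections[k], overhead);
    }
    return total;
}

/* Host-side stand-ins (assumptions, for illustration only): each counter
 * read costs 29 fake cycles, each section costs 1000 fake cycles. */
static unsigned int fake_counter_state;

static unsigned int fake_cyclecount(void)
{
    fake_counter_state += 29;
    return fake_counter_state;
}

static void fake_section(void)
{
    fake_counter_state += 1000;
}
```

With the three loops of Case 2 wrapped as three `section_fn` functions, `profile_sections(get_cyclecount, parts, 3, 29)` would play the role of one iteration of the Case 2 loop with the 29-cycle overhead removed from each measurement.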