I am running a simple rectify program on a TigerSHARC TS201S.
// C++ code
int *HalfWaveRectifyReleaseMode(int initial_array[], int final_array[], int N) {
    int *return_pt = final_array;
    if (N <= 0) return NULL;
    for (int count = 0; count < N; count++) {
        if (initial_array[count] > 0)
            final_array[count] = initial_array[count];
        else
            final_array[count] = 0;
    }
    return return_pt;
}
I am using the code to demonstrate the differences in speed between various modes of programming (debug C, release C and custom assembly code), with float and int versions of the rectify routine.
We then go in and try to identify where stalls might occur, to understand the behaviour of the architecture.
The board is an ADDS-TS201S-Exlite Rev 1.1 -- the back of the board says 1-D-1.2.
There is nearly a factor of two between the timings from the cycle counter on the board and those reported by the simulator. Any idea why?
The behaviour is consistent across all forms of the program (debug C, release C, asm).
Since I am counting cycles, the board speed should not matter: the number of cycles per us (for example in a power-save mode) is irrelevant, and I did not think that TigerSHARCs had a power-save mode anyway.
Board results
uS / point Integer Debug C 0.152225, Release C 0.022700, First ASM 0.016825
uS / point Float Debug C 0.157625, Release C 0.047850, First ASM 0.017125
us -- averageTime 0.003194, precision (maxTime - minTime) / 2 0.000088, acceptable 0.000512
Succesful link to test file CodeTimingComparison_Test_cpp.
Cycles / point Integer Debug C 75, Release C 11, First ASM 8
Total cycles Integer Debug C 12044, Release C 1816, First ASM 1358
Cycles / point Float Debug C 77, Release C 23, First ASM 8
Total Cycles Float Debug C 12474, Release C 3828, First ASM 1352
Cycles averageTime 0, precision (maxTime - minTime) / 2 0, acceptable 5
Succesful link to test file CycleCounter_CodeTimingComparison_Test_cpp.
Succesful link to test file ExploreTigerSHARCASM_Test_cpp.
Succesful link to test file Rectify_cpp.
Success: 17 blackbox tests passed.
Blackbox Assert statistics: 0 Failures, 0 Expected Failures, 28 Successes.
Whitebox Assert statistics: 0 Failures, 0 Expected Failures, 0 Successes. (Includes C Test statistics)
Test time: 0.00104717 seconds.
Simulator results
uS / point Integer Debug C 0.072787, Release C 0.010525, First ASM 0.010450
uS / point Float Debug C 0.086925, Release C 0.031625, First ASM 0.010525
us -- averageTime 0.001835, precision (maxTime - minTime) / 2 0.000044, acceptable 0.000512
Succesful link to test file CodeTimingComparison_Test_cpp.
Cycles / point Integer Debug C 36, Release C 5, First ASM 5
Total cycles Integer Debug C 5791, Release C 826, First ASM 831
.\CycleCounterCodeTimingComparison_Test.cpp(64): Error: Failure in CycleCounter_CodeTimingComparison_Int: integerRelease > firstIntegerAssembly
Cycles / point Float Debug C 43, Release C 15, First ASM 5
Total Cycles Float Debug C 6945, Release C 2531, First ASM 838
Cycles averageTime 0, precision (maxTime - minTime) / 2 0, acceptable 5
Succesful link to test file CycleCounter_CodeTimingComparison_Test_cpp.
Succesful link to test file ExploreTigerSHARCASM_Test_cpp.
Succesful link to test file Rectify_cpp.
FAILURE: 1 out of 17 blackbox tests failed.
Blackbox Assert statistics: 1 Failures, 0 Expected Failures, 27 Successes.
Whitebox Assert statistics: 0 Failures, 0 Expected Failures, 0 Successes. (Includes C Test statistics)
Test time: 0.00064387 seconds.
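As a cross-check on the "board speed does not count" reasoning, the reported figures can be paired up to see what clock rate each run implies. This is only a rough sketch: the helper names are hypothetical, it assumes the us / point test also used 160 points, and it assumes the time-based and cycle-based tests measured comparable runs.

```cpp
#include <cassert>

// Hypothetical helper: clock rate (cycles per microsecond) implied by pairing
// a total cycle count with a time-per-point figure from the timing test.
// ASSUMPTION: both tests processed the same number of points.
double impliedCyclesPerMicrosecond(long long totalCycles, double usPerPoint, int numPoints) {
    assert(usPerPoint > 0.0 && numPoints > 0);
    return static_cast<double>(totalCycles) / (usPerPoint * numPoints);
}

// Hypothetical helper: how many board cycles were counted per simulator cycle
// for the same routine.
double boardToSimulatorRatio(long long boardCycles, long long simulatorCycles) {
    assert(simulatorCycles > 0);
    return static_cast<double>(boardCycles) / simulatorCycles;
}
```

With the integer debug C figures (12044 cycles and 0.152225 us / point on the board, 5791 cycles and 0.072787 us / point in the simulator), both runs come out a little under 500 cycles per us, while the cycle ratio is about 2.08. Under the assumptions above, the difference sits in the cycle counts themselves, not in the conversion from cycles to time.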
In case I am doing something obviously wrong, an example test looks like this.
This is a TigerSHARC variant of the UnitTest++ testing framework found at SourceForge.
#define NUMPOINTS 160
TEST(CycleCounter_CodeTimingComparison_Int)
{
    int initialArray[NUMPOINTS] = {
        1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6,
        1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6,
        1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6,
        1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6,
        1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6,
        1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6,
        1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6,
        1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6, 1, 2, 3, -4, 6
    };
    int finalArray[NUMPOINTS] = {0, 0, 0, 0, 0};

    // Time stamps taken before and after each of the three routines
    __int64 measuredTimes[5];
    measuredTimes[0] = __ReadCycleCounter64( );
    HalfWaveRectifyDebugMode(initialArray, finalArray, NUMPOINTS);
    measuredTimes[1] = __ReadCycleCounter64( );
    HalfWaveRectifyReleaseMode(initialArray, finalArray, NUMPOINTS);
    measuredTimes[2] = __ReadCycleCounter64( );
    HalfWaveRectifyASM_Int(initialArray, finalArray, NUMPOINTS);
    measuredTimes[3] = __ReadCycleCounter64( );
    measuredTimes[4] = __ReadCycleCounter64( );

    // Two back-to-back reads give the cost of reading the counter itself
    __int64 timerOverHead = (measuredTimes[4] - measuredTimes[3]);

    // Cycles per point (integer division, so the fraction is dropped)
    __int64 integerDebug = (measuredTimes[1] - measuredTimes[0] - timerOverHead) / NUMPOINTS;
    __int64 integerRelease = (measuredTimes[2] - measuredTimes[1] - timerOverHead) / NUMPOINTS;
    __int64 firstIntegerAssembly =
        (measuredTimes[3] - measuredTimes[2] - timerOverHead) / NUMPOINTS;
    printf("Cycles / point Integer Debug C %d, Release C %d, First ASM %d\n",
        (int) integerDebug, (int) integerRelease, (int) firstIntegerAssembly);

    // Total cycles (timerOverHead is unchanged from above)
    integerDebug = (measuredTimes[1] - measuredTimes[0] - timerOverHead);
    integerRelease = (measuredTimes[2] - measuredTimes[1] - timerOverHead);
    firstIntegerAssembly = (measuredTimes[3] - measuredTimes[2] - timerOverHead);
    printf("Total cycles Integer Debug C %d, Release C %d, First ASM %d\n",
        (int) integerDebug, (int) integerRelease, (int) firstIntegerAssembly);

    // The checks below compare the total cycle counts
    CHECK( integerDebug > integerRelease); // RELEASE FASTER
    CHECK( integerDebug > firstIntegerAssembly); // OUR ASM FASTER
    CHECK( integerRelease > firstIntegerAssembly); // OUR ASM FASTER
}
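One detail that may help when reading the output: the "Cycles / point" figures come from integer division, so the simulator shows 5 for both release C and the first ASM version even though the totals are 826 and 831. The CHECK statements run after the variables have been reassigned to the totals, which is why integerRelease > firstIntegerAssembly fails in the simulator. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of the test code:

```cpp
#include <cassert>

// Hypothetical helper mirroring the test's arithmetic: integer division
// truncates, so the fractional part of the cycles-per-point figure is lost.
long long cyclesPerPointTruncated(long long totalCycles, int numPoints) {
    assert(numPoints > 0);
    return totalCycles / numPoints;
}

// Same quantity kept as a double, so small differences between two
// routines remain visible in the printed result.
double cyclesPerPointExact(long long totalCycles, int numPoints) {
    assert(numPoints > 0);
    return static_cast<double>(totalCycles) / numPoints;
}
```

Printing the per-point value as a floating-point number (or printing the totals alongside it, as the test already does) keeps a 5-cycle difference over 160 points from disappearing.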
Thanks
Mike