For some time we have been running our TS201 C code with the cache and BTB disabled. We would now like to enable the cache. I have included "cache_macros.h" in main.c, and at the start of my main() function I invoke asm("cache_enable(750);");
To measure the improvement from the cache, I read the cycle counter before and after a 1500-iteration loop with about 45 instructions per iteration.
I see no difference in performance with and without the cache enabled. In both cases, the loop takes about 260000 cycles, which works out to roughly 260000 / (1500*45) = 3.8 cycles per instruction.
I am running at 500 MHz, out of internal RAM, emulating with an ADZS-USB-ICE.
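For reference, the arithmetic behind those figures can be sketched in plain C. This is only a minimal sketch that runs on any host; the function names cycles_per_instruction and elapsed_us are my own (hypothetical) and are not part of any ADI header, and it assumes the cycle count has already been read into a 64-bit value.

```c
#include <stdint.h>

/* Hypothetical helper: average cycles per instruction for a loop,
   given the measured cycle delta and the loop's shape. */
static double cycles_per_instruction(uint64_t cycles,
                                     uint32_t iterations,
                                     uint32_t instr_per_iter)
{
    uint64_t instructions = (uint64_t)iterations * instr_per_iter;
    if (instructions == 0) {
        return 0.0; /* avoid dividing by zero for an empty loop */
    }
    return (double)cycles / (double)instructions;
}

/* Hypothetical helper: elapsed time in microseconds for a cycle delta,
   given the core clock in MHz (cycles / MHz = microseconds). */
static double elapsed_us(uint64_t cycles, double core_mhz)
{
    if (core_mhz <= 0.0) {
        return 0.0; /* no meaningful answer without a valid clock */
    }
    return (double)cycles / core_mhz;
}
```

With my numbers, cycles_per_instruction(260000, 1500, 45) comes out at about 3.85, and elapsed_us(260000, 500.0) is 520 microseconds for the whole loop.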
1. When I single-step through the loop, the cycle counter increments by about 17 for almost every instruction. But when I just put a breakpoint before and after the loop, the cycle counter increments by about 3.8 per instruction on average. Is it expected that single-stepping with the emulator results in significantly more cycles per instruction?
2. Should I expect to see some improvement with the cache enabled vs cache disabled when not single-stepping?
3. What is a "typical" cycles-per-instruction figure for a simple loop that mostly accesses general registers? I know it will never reach 1 cycle per instruction, but I expected it to drop well below almost 4 cycles per instruction with the cache enabled.
4. Is there possibly something else I need to do (other than invoking the cache_enable macro) to enable cache?
Note that at the start of our start-up code we explicitly disable the cache and BTB, because our boot loader manipulates program memory and we don't want the cache or BTB involved at that point. So is there possibly something else I need to do to re-enable the cache and BTB in my main program (which runs after the boot loader is complete)?