Background to question
As academics, we look at new techniques for developing analysis tools for embedded systems. We are publishing in the area of race condition analysis -- e.g. fights for data access between VDK threads.
In the IEEE Software special issue on testing of embedded systems (May / June 2009), we discussed hardware code instrumentation and showed that the Blackfin has many desirable features in this area that are not present in other processors.
Software instrumentation of code is always an option, but hardware instrumentation is faster. People are researching new architectural features to solve the problem, but we have shown that Blackfin already has these features. We showed how to use the Blackfin trace buffer to handle code coverage issues for testing and the data watch registers to handle data race conditions.
Basically, we set up the data watch registers to watch a data location. When a match occurs, we check whether the thread holds all the correct locks for accessing this memory location. We would like the data match to trigger an exception, but the hardware triggers an emulation event instead, and we have to live with that.
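The lock check performed on each data watch match can be sketched in C roughly as follows. This is only a host-side illustration under stated assumptions, not the code from our tool: the bitmask representation (at most 32 locks) and the names watch_entry, thread_state, lock_bit, access_is_protected and match_is_potential_race are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one bit per lock, so at most 32 locks can be tracked. */
typedef uint32_t lockset_t;

/* Hypothetical record for one watched memory location. */
typedef struct {
    uintptr_t address;        /* location programmed into the data watch unit */
    lockset_t required_locks; /* locks a thread must hold to access it */
} watch_entry;

/* Hypothetical per-thread state, kept up to date by lock/unlock wrappers. */
typedef struct {
    int       thread_id;
    lockset_t held_locks;
} thread_state;

/* Convert a lock index (0..31) into its bit in a lockset. */
lockset_t lock_bit(unsigned lock_index)
{
    assert(lock_index < 32);
    return (lockset_t)1u << lock_index;
}

/* True when the thread holds every lock the watched location requires. */
bool access_is_protected(const watch_entry *w, const thread_state *t)
{
    return (t->held_locks & w->required_locks) == w->required_locks;
}

/* Called on a data watch match: true means a potential data race,
 * i.e. the watched address was hit without all required locks held. */
bool match_is_potential_race(const watch_entry *w, const thread_state *t,
                             uintptr_t matched_address)
{
    if (matched_address != w->address)
        return false;         /* match belongs to some other watch entry */
    return !access_is_protected(w, t);
}
```

Holding extra locks is harmless in this sketch; only a missing required lock flags the access.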
Software instrumentation slows the code down by 200 times (IBM thread checker for example) so even jumping in and out of emulation mode could actually be faster. Even if not, we could simulate the performance of a "perfect" Blackfin.
We have a background GUI which recognizes when VDSP has switched into emulation mode after a data watch match. The GUI then changes the watch control, status and count registers, and restarts the processor. With the data watch counter modified, the instruction re-executes without causing a second data match.
The code seems to be working, but we are now trying to optimize how fast the GUI can restart the processor after doing the data race analysis.
When VDSP is thrown into a data watch emulation, it prints a message on the console screen of the form:
DATA WATCH 0 AT (PROGRAM) ADDRESS XXX AT (MEMORY) ADDRESS YYY
This means that each match costs the time for a print, 3 symbol table lookups, and 3 accesses to the Blackfin over the JTAG to read 3 registers.
This consumes 63 ms (excluding the time for getting in and out of VDSP).
So the question is: what do I need to do to make VDSP skip this print-to-console statement? I would prefer an answer that avoids an NDA, a patch for example.
When the data watch emulation is thrown, we need to do a lot of analysis to see if the data match could lead to data race conditions (let's not worry about the details).
Doing this analysis inside the GUI would need many accesses to Blackfin memory locations, which is too slow over the JTAG. Therefore I am looking for a way around this JTAG access.
Attempt 1 -- unsuccessful
Fix the watch control, status and count registers inside the GUI, copy RETE into RETX, and set RETE to point to the start of an exception handler (which saves and restores any registers). Inside the exception handler, do all the data analysis we want by accessing memory locations directly rather than over the JTAG, for speed reasons.
I prefer to use an exception rather than an interrupt, as data matches could be occurring inside an interrupt but are less likely inside an exception.
When the exception handler exits, we return to the address that was originally stored in the RETE and (in principle) the code continues.
It works the first time a data watch match occurs but not the second time. The error message is that I have accessed a system register while in user mode. The only system-level access I make during the exception handler is the RTX instruction itself.
If this approach crashed the first time, I would not be upset, as the crash would match expected behaviour. However, the fact that it crashes only the second time suggests I am missing some minor detail which, if fixed, would allow the approach to work.
I can put sysreg_write(reg_RETX, 0x20) in my main( ) without error, which says to me that I am starting out in supervisor mode -- so at what point did I switch to user mode, and why?
Attempt 2 -- apparently successful
I presumed that the error in the first attempt arose because I was getting into the exception routine via emulation mode without properly setting ILAT, IPEND, etc. (which you can't do inside VDSP).
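So I tried the following:
Set up some code to crash the processor with a data access error:
P0 = 1;                 // deliberately bad data address
[P0] = R0;              // this store raises the data access exception
Set up some code to access location "desired address", which is being monitored by the data watch unit:
P1 = desired address;   // pseudocode: load the watched address into P1
[P1] = R0;              // this store triggers the data watch match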
When CauseMemoryMatch is called and the memory match occurs, the processor switches into emulation mode. The GUI saves RETE to location SAVE_RETE on the processor, sets RETE to SetCrash, and restarts the processor.
The code at _SetCrash causes an exception.
The exception handler checks to see if RETX is SetCrash + 4 and, if so, does the data watch analysis before setting RETX to the value stored in SAVE_RETE.
I will fix the exception handler to recover P0 later -- not needed at the moment as the match is happening inside a subroutine so P0 is volatile.
When the exception handler exits, the processor returns and re-executes the instruction inside CauseMemoryMatch. This does not cause a new data match as the watch counter has been reset.
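The sequence above can be modelled on a host machine as in the C sketch below. It only illustrates the control flow and is not Blackfin code: the struct redirect_model and the functions gui_redirect, take_deliberate_fault and exception_handler are hypothetical stand-ins, the registers are plain variables, and the offset of 4 is simply the value I observe in RETX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of the faulting instruction from SetCrash, as observed in RETX. */
#define FAULT_OFFSET 4u

/* Host-side stand-in for the registers and memory involved (hypothetical). */
typedef struct {
    uint32_t rete;       /* models RETE */
    uint32_t retx;       /* models RETX */
    uint32_t save_rete;  /* models the SAVE_RETE memory location */
    uint32_t set_crash;  /* address of the SetCrash code */
    unsigned analyses;   /* how many times the race analysis has run */
} redirect_model;

/* Done by the GUI while the processor sits in emulation mode. */
void gui_redirect(redirect_model *m)
{
    m->save_rete = m->rete;  /* remember the instruction that matched */
    m->rete = m->set_crash;  /* resume at the deliberate crash instead */
}

/* Models the processor taking the deliberate fault after it resumes. */
void take_deliberate_fault(redirect_model *m)
{
    assert(m->rete == m->set_crash);
    m->retx = m->set_crash + FAULT_OFFSET;
}

/* Exception handler: returns true if this was the deliberate fault. */
bool exception_handler(redirect_model *m)
{
    if (m->retx != m->set_crash + FAULT_OFFSET)
        return false;        /* a genuine exception, handled elsewhere */
    m->analyses++;           /* the data race analysis would run here */
    m->retx = m->save_rete;  /* returning resumes at the matching instruction */
    return true;
}
```

Any exception whose RETX does not match SetCrash + 4 is left alone in this sketch, so genuine faults are not swallowed by the redirect.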
Seems to work -- so the questions are:
(A) Is it really working, or is some nasty hardware feature going to catch me out later?
(B) Is there a better (faster and reliable) way of getting out of the data watch emulation mode and into some code running on the processor, where I can perform data analysis before restarting the original code?