
Watchpoint register operation -- Processor hardware challenge

Question asked by MikeSmithCanada on Jul 20, 2009
Latest reply on Jul 29, 2009 by WassimB

Background to question

 

As academics, we develop new techniques for building analysis tools for embedded systems. We publish in the area of race-condition analysis, e.g. contention between VDK threads for access to shared data.

 

In the IEEE Software magazine special issue on testing embedded systems (May/June 2009), we discussed hardware code instrumentation and showed that the Blackfin has many desirable features in this area that are not present in other processors.

 

Software instrumentation of code is always an option, but hardware instrumentation is faster. People are researching new architectural features to solve this problem, but we have shown that the Blackfin already has them: we used the Blackfin trace buffer to handle code coverage during testing, and the data watch registers to detect data race conditions.
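To illustrate the coverage idea, here is a minimal sketch that turns trace-buffer discontinuities into executed address ranges. It assumes each entry is a (source, target) pair for one change of program flow, and that the entries have already been copied off the chip in chronological order (reverse them first if your reader returns the newest entry first). `trace_entry`, `addr_range` and `mark_covered_ranges` are hypothetical names; reading the trace buffer itself is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one program-flow discontinuity:
   the address of the branching instruction and where it went. */
typedef struct {
    uint32_t source;
    uint32_t target;
} trace_entry;

/* Hypothetical executed address range [start, end], inclusive. */
typedef struct {
    uint32_t start;
    uint32_t end;
} addr_range;

/* Between the target of one discontinuity and the source of the next,
   execution was straight-line, so that whole span was covered.
   Assumes entries[] is ordered oldest first. Returns the number of
   ranges written to out[] (at most n - 1 and at most max_out). */
size_t mark_covered_ranges(const trace_entry *entries, size_t n,
                           addr_range *out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i + 1 < n && count < max_out; i++) {
        uint32_t start = entries[i].target;
        uint32_t end = entries[i + 1].source;
        if (start <= end) {  /* skip pairs that cannot be sequential (e.g. lost entries) */
            out[count].start = start;
            out[count].end = end;
            count++;
        }
    }
    return count;
}
```

Because straight-line code between two discontinuities must have executed, each range can be marked as covered without single-stepping.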

 

Basically, we set up the data watch registers to watch a data location. When a match occurs, we check whether the thread holds all the locks required to access that memory location. We would prefer the match to trigger an exception, but the hardware triggers an emulation event instead, and we have to live with that.
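The lock check itself can be very cheap. A minimal sketch, assuming each thread's held locks and each watched location's required locks are kept as bitmasks (bit k means lock k); `watched_location`, `thread_state` and `access_is_protected` are hypothetical names and are not part of VDK:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping: bit k set means "lock k". */
typedef struct {
    uint32_t address;        /* location programmed into the data watch unit */
    uint32_t required_locks; /* locks that must be held to access it */
} watched_location;

typedef struct {
    int id;
    uint32_t held_locks;     /* locks the thread currently owns */
} thread_state;

/* Called when the data watch unit reports a match: the access is
   treated as safe only if every required lock is held. */
bool access_is_protected(const thread_state *t, const watched_location *w)
{
    return (t->held_locks & w->required_locks) == w->required_locks;
}
```

A real tool would also have to decide where the required-lock masks come from; here they are simply given.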

 

Software instrumentation slows the code down by a factor of about 200 (the IBM thread checker, for example), so even jumping in and out of emulation mode on every match could actually be faster. Even if it is not, we could simulate the performance of a "perfect" Blackfin.
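A back-of-the-envelope break-even, assuming the software-instrumented run takes `slowdown` times the native run time and the hardware approach adds a fixed cost per match: trapping wins while matches x per-match cost < (slowdown - 1) x native time. The function name is hypothetical.

```c
/* Largest number of watch matches for which trapping into emulation
   still beats software instrumentation.
     native_s    : run time of the uninstrumented code, in seconds
     slowdown    : software-instrumentation slowdown factor (e.g. 200)
     per_match_s : cost of one emulation round trip, in seconds
   Hardware time = native_s + matches * per_match_s
   Software time = slowdown * native_s */
double break_even_matches(double native_s, double slowdown, double per_match_s)
{
    return (slowdown - 1.0) * native_s / per_match_s;
}
```

With the 200x figure above and, purely for illustration, the 63 ms per-match cost reported under Question 1 (which excludes entering and leaving VDSP, so the real cost is higher), `break_even_matches(1.0, 200.0, 0.063)` is about 3158.7, so hardware trapping stays ahead for up to roughly 3,158 matches per second of native run time.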

 

We have a background GUI that recognizes when VisualDSP++ (VDSP) switches into emulation mode after a data watch match. The GUI then changes the watch control, status and count registers, and restarts the processor. With the data watch counter modified, the instruction re-executes without causing a second data match.
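To see why reloading the count lets the instruction re-execute cleanly, here is a deliberately simplified model in which the count register holds the number of matches to absorb before the next emulation event. This is an assumption for illustration, not the documented behaviour of the Blackfin count register; `watchpoint_model`, `watch_access` and `rearm_for_reexecution` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of one data watchpoint with a match
   counter. It only shows why reloading the count lets the matching
   access re-execute once without raising a second event. */
typedef struct {
    uint32_t address;  /* watched address */
    uint32_t skip;     /* matches to absorb before the next event */
} watchpoint_model;

/* Returns true if this access would throw the core into emulation. */
bool watch_access(watchpoint_model *wp, uint32_t addr)
{
    if (addr != wp->address)
        return false;
    if (wp->skip > 0) {
        wp->skip--;    /* match absorbed by the counter */
        return false;
    }
    return true;       /* match raises the emulation event */
}

/* What the host GUI does before restarting the core (model only):
   absorb exactly the one re-executed access. */
void rearm_for_reexecution(watchpoint_model *wp)
{
    wp->skip = 1;
}
```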

 

The code seems to be working, but we are now trying to minimize how long the GUI takes to restart the processor after doing the data race analysis.

 


QUESTION 1

 

When a data watch match throws the processor into emulation, VDSP prints a message of this form on the console:

 

DATA WATCH 0 AT (PROGRAM) ADDRESS XXX AT (MEMORY) ADDRESS YYY

 

This means the operation costs a console print, three symbol-table lookups, and three JTAG accesses from VDSP to read three Blackfin registers.

 

This takes 63 ms (excluding the time taken to enter and leave VDSP).

 

So the question is: what do I need to do to make VDSP skip this print-to-console step? I would prefer an answer that avoids an NDA, for example a patch.

 

 

QUESTION 2

 

When the data watch emulation event is thrown, we need to do a lot of analysis to see whether the data match could lead to a data race (let's not worry about the details).

 

This means the GUI needs many accesses to Blackfin memory locations, which is too slow over JTAG. Therefore I am looking for a way around this JTAG access.

 

Attempt 1 -- unsuccessful

 

Fix the watch control, status and count registers inside the GUI, copy RETE into RETX, and set RETE to point to the start of an exception handler (which saves and restores any registers it uses). Inside the exception handler, do all the data analysis we want by accessing memory locations directly rather than over JTAG, for speed.

 

I prefer to use an exception rather than an interrupt because data matches could occur inside an interrupt handler, but are less likely inside an exception handler.

 

When the exception handler exits, we return to the address originally stored in RETE and (in principle) the code continues.
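The register bookkeeping the GUI performs in this attempt can be written down as a small host-side model. It tracks only the two return addresses, not any event state; `return_regs` and `redirect_to_handler` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical host-side snapshot of the two return registers the
   GUI edits while the core is halted in emulation. */
typedef struct {
    uint32_t rete;  /* where RTE resumes when leaving emulation */
    uint32_t retx;  /* where RTX resumes when leaving the handler */
} return_regs;

/* Attempt 1 redirection: leave emulation into the handler, and make
   the handler's RTX return to the interrupted instruction. */
void redirect_to_handler(return_regs *r, uint32_t handler_addr)
{
    r->retx = r->rete;      /* remember the interrupted instruction */
    r->rete = handler_addr; /* RTE now lands in the handler */
}
```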

 

It works the first time a data watch match occurs, but not the second time. The error message says that I have accessed a system register while in user mode. The only system-register access in the exception handler is the RTX instruction itself (which uses RETX).

 

If this approach crashed the first time, I would not be upset, as the crash would match expected behaviour. However, the fact that it crashes only the second time suggests I am missing some minor detail which, if fixed, would allow the approach to work.

 

I can put sysreg_write(reg_RETX, 0x20) in my main() without an error, which tells me that I start out in supervisor mode. So at what point did I switch to user mode, and why?
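One way to narrow down where the mode changes is to capture IPEND at a few points (in main, on entry to the handler, just before RTX) and classify each value. A minimal sketch, assuming the IPEND layout as we understand it from the Blackfin Processor Programming Reference: bit 0 is the emulation event, bit 4 is the global interrupt disable flag rather than an event, and any other set bit means an event is being serviced, i.e. supervisor mode. Please verify these bit positions against the manual for your part; `mode_from_ipend` is a hypothetical helper and capturing IPEND is not shown.

```c
#include <stdint.h>

/* Bit positions as we read them in the Blackfin Processor Programming
   Reference (assumption: verify against the manual for your part). */
#define IPEND_EMU     (1u << 0)   /* emulation event active */
#define IPEND_IRPTEN  (1u << 4)   /* global interrupt disable, not an event */

typedef enum { MODE_USER, MODE_SUPERVISOR, MODE_EMULATION } core_mode;

/* Classify the operating mode from a captured IPEND value. */
core_mode mode_from_ipend(uint32_t ipend)
{
    uint32_t events = ipend & 0xFFFFu & ~IPEND_IRPTEN;
    if (events & IPEND_EMU)
        return MODE_EMULATION;
    if (events != 0)
        return MODE_SUPERVISOR;   /* some event (EVX, IVG15, ...) is active */
    return MODE_USER;             /* no active event */
}
```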

 

 

 


Attempt 2 -- apparently successful

 

I presumed that the error in the first attempt was caused by entering the exception routine via emulation mode without properly setting ILAT, IPEND, etc. (which you cannot do inside VDSP).

 

So I tried the following

 

Set up some code that crashes the processor with a data access error (a misaligned 32-bit store to address 1):

 

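_SetCrash:
     [--SP] = P0;     /* save P0 on the stack */
     P0 = 1;          /* odd address */
     [P0] = R0;       /* misaligned 32-bit store raises an exception */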

 

Set up some code that accesses the "desired address" location, which is being monitored by the data watch unit:

 

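_CauseMemoryMatch:
     P1.L = lo(desired_address);   /* placeholder: the address being watched */
     P1.H = hi(desired_address);
     [P1] = R0;                    /* this store triggers the data watch match */
     nop;
     nop;
     /* etc. */
     RTS;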

 

 

When _CauseMemoryMatch is called and the memory match occurs, the processor switches into emulation mode. The GUI saves RETE to location SAVE_RETE on the processor, sets RETE to _SetCrash, and restarts the processor.

 

The _SetCrash code then causes an exception.

 

The exception handler checks whether RETX equals _SetCrash + 4 (the faulting store) and, if so, does the data watch analysis before setting RETX to the value stored in SAVE_RETE.
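The handler's decision can be sketched as a pure function, which makes it easy to test off-target. This assumes, as the _SetCrash + 4 check implies, that saving P0 and P0 = 1 are both 16-bit instructions and that the misaligned-store exception leaves RETX pointing at the faulting store. `handle_exception`, `analysis_fn` and `count_analysis` are hypothetical names; on the real core the handler would write the returned value to RETX before RTX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed offset of the faulting store from _SetCrash:
   a 16-bit save of P0 plus the 16-bit P0 = 1. */
#define FAULT_OFFSET 4u

typedef void (*analysis_fn)(void *ctx);

/* Returns the value the handler should leave in RETX before RTX.
   *handled is set to true when the exception came from _SetCrash. */
uint32_t handle_exception(uint32_t retx, uint32_t setcrash_addr,
                          uint32_t save_rete, analysis_fn analyse,
                          void *ctx, bool *handled)
{
    if (retx == setcrash_addr + FAULT_OFFSET) {
        analyse(ctx);      /* data race analysis runs on-chip */
        *handled = true;
        return save_rete;  /* resume at the instruction that matched */
    }
    *handled = false;      /* a genuine exception: leave RETX alone */
    return retx;
}

/* Example analysis hook: just counts invocations. */
void count_analysis(void *ctx)
{
    ++*(int *)ctx;
}
```

Returning RETX unchanged in the unmatched case leaves genuine exceptions to the normal handling path.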

I will fix the exception handler to restore P0 later; it is not needed at the moment because the match happens inside a subroutine, so P0 is volatile.

 

When the exception handler exits, the processor returns to and re-executes the instruction inside _CauseMemoryMatch. This does not cause a new data match because the watch counter has been reset.

 

It seems to work, so my questions are:

      (A) Is it really working, or is some nasty hardware feature going to catch me out later?

      (B) Is there a better (faster and reliable) way of getting out of data watch emulation mode and into code running on the processor, where I can perform the data analysis before restarting the original code?
