2011-03-07 05:40:58     SIC_IWR, STI and IDLE -- strange code

Document created by Aaronwu Employee on Aug 26, 2013

2011-03-07 05:40:58     SIC_IWR, STI and IDLE -- strange code

Jeremie RAFIN (FRANCE)

Message: 98670   

 

Dear Ladies and Gentlemen,

 

According to the Blackfin datasheet (BF548), an STI instruction is not needed before IDLE as long as SIC_IWR is properly configured so that the wanted interrupts can wake the core up from IDLE. But in the macro "idle_with_irq_disabled" (file "arch/blackfin/include/asm/irq.h"), we can read:

 

#define idle_with_irq_disabled() \
    __asm__ __volatile__( \
        NOP_PAD_ANOMALY_05000244 \
        ".align 8;" \
        "sti %0;" \
        "idle;" \
        : \
        : "d" (bfin_irq_flags) \
    )

 

Questions:

 

1) Why is the STI instruction used before, and not after, the IDLE instruction? Is there a specific reason?

 

2) On my current board, debugfs shows that all SIC_IWR flags are cleared (0x00000000), while the default value has them all set (0xFFFFFFFF). Is there a reason why we need to clear them, and where is this done?

 

Thanks!


 

 

2011-03-07 06:42:56     Re: SIC_IWR, STI and IDLE -- strange code

Mike Frysinger (UNITED STATES)

Message: 98673   

 

the macro is called "idle with irq disabled".  it isnt called "idle".

 

the reason for the STI is in the comment -- it's trying to minimize missed wakeup events.  the exact history is on LKML:

  lkml.org/lkml/2006/9/22/372

  thread.gmane.org/gmane.linux.kernel/448325/focus=449052

 

wake up sources in Linux are "opt in" and not "opt out".  if a peripheral doesnt have a driver loaded, the kernel cannot allow spurious interrupts/wakeup events to screw with the system.

 

as for where it gets reset, a simple grep of arch/blackfin/ for SIC_IWR should show you the answer.
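The opt-in policy described above can be pictured with a small model. This is only a sketch, not the kernel's code: `wake_mask_t`, `wake_init`, `wake_opt_in`, `wake_opt_out` and `wake_allowed` are hypothetical names, and a plain 32-bit variable stands in for one SIC_IWR register. The mask starts fully cleared, and a bit is set only when a driver explicitly asks for that source to wake the core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one 32-bit word standing in for a SIC_IWR register. */
typedef struct {
    uint32_t iwr;
} wake_mask_t;

/* Opt-in default: no peripheral may wake the core until a driver asks. */
void wake_init(wake_mask_t *m)
{
    m->iwr = 0x00000000u;
}

/* A driver that owns the peripheral on this bit enables it as wake source. */
void wake_opt_in(wake_mask_t *m, unsigned bit)
{
    assert(bit < 32);
    m->iwr |= (uint32_t)1 << bit;
}

/* The driver is unloaded (or no longer wants wakeups): clear the bit again. */
void wake_opt_out(wake_mask_t *m, unsigned bit)
{
    assert(bit < 32);
    m->iwr &= ~((uint32_t)1 << bit);
}

/* Returns 1 if the given source is currently allowed to wake the core. */
int wake_allowed(const wake_mask_t *m, unsigned bit)
{
    assert(bit < 32);
    return (int)((m->iwr >> bit) & 1u);
}
```

With this model, a freshly initialized mask reads 0x00000000, which matches the value seen in debugfs on a board where no driver has requested a wake source.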


 

 

2011-03-07 07:29:02     Re: SIC_IWR, STI and IDLE -- strange code

Jeremie RAFIN (FRANCE)

Message: 98674   

 

Thanks for your quick answer and the useful information! But...

On LKML one can read:

 

> Here, according to design, it's not possible that interrupt occurs
> between "STI %0"(enable interrupt) and "IDLE".
>
> __asm__(".align 64; STI %0; IDLE;" : %0 (x):  :"cc");
>
> Robin can explain more details.

 

 

Unfortunately, I have not been able to find the details from "Robin".

 

On my board (BF548), I can clearly show that this assumption (no interrupt occurs between STI and IDLE) does not hold: an interrupt does occur between STI and IDLE (a hardware trace shows it).

 

Since I am on a tickless kernel, my program then never handles the interrupt in time: as described in the LKML discussion, the scheduler is never rerun in that race condition...
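To make the reported race easier to reason about, here is a toy model of one pass through an idle loop. It is only a sketch of the behaviour described in this post, not Blackfin or kernel code: `simulate_idle`, `enum irq_point` and `struct sim_result` are hypothetical names, and the model assumes a single interrupt whose handler sets need_resched.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the single simulated interrupt lands (hypothetical model). */
enum irq_point {
    IRQ_BEFORE_CHECK,      /* before need_resched is tested */
    IRQ_BETWEEN_STI_IDLE,  /* after STI but before IDLE */
    IRQ_DURING_IDLE        /* while the core is idling */
};

struct sim_result {
    bool entered_idle;  /* the core executed IDLE */
    bool running;       /* the core is executing code after the pass */
    bool task_ran;      /* the woken task got scheduled */
};

/*
 * Simulate one pass of: if (!need_resched) { STI; IDLE; } then reschedule.
 * With tickless == true there is no periodic tick to end the IDLE later.
 */
struct sim_result simulate_idle(enum irq_point when, bool tickless)
{
    struct sim_result r = { false, false, false };
    bool need_resched = (when == IRQ_BEFORE_CHECK);

    if (need_resched) {
        r.running = true;              /* the core never goes idle */
    } else {
        /* STI executes here: interrupts are enabled from now on. */
        if (when == IRQ_BETWEEN_STI_IDLE)
            need_resched = true;       /* handler runs, then returns to IDLE */

        r.entered_idle = true;         /* IDLE executes regardless */

        if (when == IRQ_DURING_IDLE) {
            need_resched = true;
            r.running = true;          /* the interrupt itself ends the IDLE */
        } else if (!tickless) {
            r.running = true;          /* a later timer tick ends the IDLE */
        }
    }

    r.task_ran = need_resched && r.running;
    return r;
}
```

In this model, `simulate_idle(IRQ_BETWEEN_STI_IDLE, true)` is the only case in which the task is woken but never scheduled; with a periodic tick the same case only delays the task until the next tick.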

 

Any suggestion?

 

 

 

For my second point, I think I get the answer. Thanks.


 

 

2011-03-07 08:02:36     Re: SIC_IWR, STI and IDLE -- strange code

Mike Frysinger (UNITED STATES)

Message: 98680   

 

i dont see how dropping the STI would fix your situation.  have you tested this and found an improvement ?


 

 

2011-03-07 08:27:30     Re: SIC_IWR, STI and IDLE -- strange code

Jeremie RAFIN (FRANCE)

Message: 98682   

 

Not drop, just reverse. Something like:

 

 

 

#define idle_with_irq_disabled() \
    __asm__ __volatile__( \
        NOP_PAD_ANOMALY_05000244 \
        ".align 8;" \
        "idle;" \
        "sti %0;" \
        : \
        : "d" (bfin_irq_flags) \
    )

 

 

 

The only problem is that I need SIC_IWR to be set correctly (the reason for my question 2). If I am correct, this is the case only if CONFIG_PM is enabled. Thus, if the code above is the right one, we'll need to change the code around PM so that SIC_IWR is properly managed even when PM is not enabled.

 

Wait and see: I'll try this afternoon and come back to you.

 

Meanwhile, do you have any idea how we can contact "Robin" about the "details"?

 

Thanks,

 

-Jeremie


 

 

2011-03-07 08:52:04     Re: SIC_IWR, STI and IDLE -- strange code

Mike Frysinger (UNITED STATES)

Message: 98683   

 

reversing it doesnt make much sense for the purpose of default_idle.  if you look at the code, it immediately follows up with STI.

 

Robin is no longer on the team.  i can contact him personally, but that's about it.


 

 

2011-03-07 12:19:06     Re: SIC_IWR, STI and IDLE -- strange code

Jeremie RAFIN (FRANCE)

Message: 98687   

 

My conclusions for today.

 

 

 

1) The good news is that removing the STI works (provided the next points are respected). NOTE: it is true that reversing is the same as removing, but I think we should loop on "need_resched" around IDLE + STI + CLI, just to avoid a schedule on every interrupt (this is only an optimization, so we don't care at this point).

 

2) CONFIG_PM is neither necessary (in my opinion) nor sufficient (for sure!); but it is enabled in my config (I have not had time to remove it and compile again).

 

3) I had to add a "write_SIC_IWR" in the same place as "write_SIC_IMASK" (with the same logic) in "ints-priority.c" so that SIC_IWR is properly set for every interrupt in use; I can send the details if needed.

 

4) The bad news is that the IDLE instruction cannot be exited by a core interrupt (unless STI is called before it): the hardware ends the IDLE only for "INTM"; thus the kernel cannot rely on core interrupts, e.g. for ticks. So my configuration works because the ticks are generated by a GP timer, not the core timer.
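Point 3 can be sketched as follows. This is not the actual patch to "ints-priority.c", only an illustration under stated assumptions: `struct sic_model`, `model_unmask_irq` and `model_mask_irq` are hypothetical names, and two plain variables stand in for one SIC_IMASK register and its matching SIC_IWR register. The idea is that every interrupt that gets unmasked is also allowed to wake the core, and loses that right again when it is masked.

```c
#include <assert.h>
#include <stdint.h>

/* Plain variables standing in for SIC_IMASK and SIC_IWR (hypothetical). */
struct sic_model {
    uint32_t imask;  /* which peripheral interrupts are unmasked */
    uint32_t iwr;    /* which peripheral interrupts may wake the core */
};

/* Unmask an interrupt and let it wake the core at the same time. */
void model_unmask_irq(struct sic_model *sic, unsigned bit)
{
    assert(bit < 32);
    sic->imask |= (uint32_t)1 << bit;
    sic->iwr   |= (uint32_t)1 << bit;
}

/* Mask an interrupt and remove it from the wake sources as well. */
void model_mask_irq(struct sic_model *sic, unsigned bit)
{
    assert(bit < 32);
    sic->imask &= ~((uint32_t)1 << bit);
    sic->iwr   &= ~((uint32_t)1 << bit);
}
```

Keeping both updates in the same functions means the two masks cannot drift apart, which is the property the IDLE-without-STI approach depends on.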

 

 

 

In a more general way:

 

a) I do think there is a bug in the current kernel (an STI instruction should not precede an IDLE one in a tickless or even a real-time application). If you don't mind, could you have a quick talk with Robin to get his feedback (if he remembers)?

 

b) So far I think there is no generic solution: either we remove the IDLE instruction (and waste roughly 100mA), or we have no core interrupts. This is a poor trade-off.

 

 

 

So the topic is not, to my mind, over.

 

Do you have any ideas? Any questions?

 

 

 

-Jeremie


 

 

2011-03-08 13:05:33     Re: SIC_IWR, STI and IDLE -- strange code

Mike Frysinger (UNITED STATES)

Message: 98734   

 

the code sequence we're using is basically what design said should work.  it isnt documented anywhere.

 

how exactly are you seeing this sequence of events ?  are you just watching core timer ticks ?  do you have some other event or device that should be signaling things ?

 

please run `make bugreport` and post the tarball.


 

 

2011-03-09 00:54:49     Re: SIC_IWR, STI and IDLE -- strange code

Mike Frysinger (UNITED STATES)

Message: 98744   

 

talking a bit more to design and noodling on things, you could try these things:

 

- turn on CONFIG_IDLE_L1 if it isnt already

 

- change idle_with_irq_disabled() to:

    asm volatile(
      "ssync;"
      ".align 8;"
      "sti %0;"
      "idle;"
      :
      : "d" (bfin_irq_flags)  /* operand for %0, as in the existing macro */
    );

 

- change idle_with_irq_disabled() to:

    asm volatile(
      ".align 8;"

      /* x - 8 */
      "mnop;"
      "jump 1f;"
      "2: ssync;"

      /* x */
      "sti %0;"
      "idle;"
      "mnop;"

      /* x + 8 */
      "mnop; mnop;"

      /* x + 16 */
      "mnop; mnop;"

      /* x + 24 */
      "mnop;"
      "jump 3f;"
      "1: nop;"

      /* x + 32 */
      "jump 2b;"
      "3: nop;"
      :
      : "d" (bfin_irq_flags)  /* operand for %0, as in the existing macro */
    );


 

 

2011-03-09 13:38:25     Re: SIC_IWR, STI and IDLE -- strange code

Jeremie RAFIN (FRANCE)

Message: 98777   

 

Hi Mike,

 

After a day of tests, I can say that your very strange code (I'd like to know how you dreamt up such stuff) unfortunately does not work. It improves the situation, since the real-time bug appears much less often (about once a minute instead of about once every 1ms), but the issue is still there...

 

I think the problem is less likely because the idle code is in L1, so the time window in which this code is racy is much smaller. But if you tell us why you are MNOP'ing and JUMP'ing several times before the STI/IDLE, perhaps we could understand why the race condition is less likely. Can you explain, please?

 

To answer your other points: yes, you guessed right, Isabelle and I are working together. The way we generate the events is described in her messages (a special module creating threads and interrupt collisions). There is no other expected event; all is quiet. At the end we generate a hardware trace to analyze why our kthread has not been scheduled. Each time we can see that an interrupt (the one that wakes up the kthread) is executed just after the STI instruction and before the IDLE (as you can see in Isabelle's last post).

 

Cheers,

 

-Jeremie


 

 

2011-03-10 03:56:35     Re: SIC_IWR, STI and IDLE -- strange code

Mike Frysinger (UNITED STATES)

Message: 98840   

 

you might have to force the ssync bit to be like:

.align 8;
mnop; nop; ssync;
.align 8;
sti %0;
idle;

 

the jumping code should only be necessary when running in external memory.  if running in L1 inst, the small variant should be sufficient.

 

the point is to quiet noise from the prefetcher since IDLE includes an implicit SSYNC which we cant tolerate.  the STI and IDLE must commit back to back so that the hardware can guarantee it going in without an interrupt hitting in between.
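The layout argument can be checked with a little arithmetic. The sketch below is an illustration only: the instruction sizes are assumptions inferred from the offset comments in the longer variant earlier in the thread (16-bit NOP, SSYNC, STI, IDLE and short JUMP, 32-bit MNOP), the 8-byte fetch block is inferred from ".align 8", and `same_fetch_block` and `group_size` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed instruction sizes in bytes (inferred, not taken from a manual). */
enum {
    SZ_NOP = 2, SZ_SSYNC = 2, SZ_STI = 2, SZ_IDLE = 2, SZ_JUMP = 2,
    SZ_MNOP = 4
};

/* Returns 1 if [addr, addr + len) stays inside one aligned 8-byte block. */
int same_fetch_block(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    return (addr / 8) == ((addr + len - 1) / 8);
}

/* Total size in bytes of a group of instructions. */
uint32_t group_size(const int *sizes, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint32_t)sizes[i];
    return total;
}
```

Under these assumptions, "mnop; jump; ssync" and "sti; idle; mnop" each fill exactly 8 bytes, and an STI placed on an 8-byte boundary always shares its block with the IDLE that follows, whereas an STI starting 6 bytes into a block would not.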
