2009-02-03 02:57:37     udelay() / mdelay() broken? (BF527 - 530 MHz)

Document created by Aaronwu Employee on Aug 9, 2013
Version 1Show Document
  • View in full screen mode

2009-02-03 02:57:37     udelay() / mdelay() broken? (BF527 - 530 MHz)

Marco Reppenhagen (GERMANY)

Message: 68700   

 

Hello!

 

We've to use mdelay() during the setup of some devices connected to our BF527, running @530MHz.

 

While we try to do this, we printed out some timestamps and it seemed like include/asm-blackfin/delay.h is not usable for fast CPUs:

 

The result of  32-bit multiplication (usecs * lpj) turns around when usecs get higher then 814 (oso.).

That's a pitty, because mdelay does a n*udelay(1000) for greater values... ooops! Even it should be possible to do a udelay(3000),.... but it's not!

 

Even on smaller BF526@300MHz some silly behaviours will occur during the use of mdelay(); If you take a look at the definition of mdelay(), you'll know what I mean (1 works, 2..5 may be and so on// Depend on MAX settings).

 

I just made a quickhack to fix it in ./delay.h:

 

static inline void udelay(unsigned long usecs)

{

    /* (mjr) quickhack */

    u64 biggy = (u64)usecs * (u64)loops_per_jiffy;

    do_div (biggy, (1000000 / HZ));

    __delay((unsigned long)biggy);

}

 

Original code was:

 

{

    extern unsigned long loops_per_jiffy;

    __delay(usecs * loops_per_jiffy / (1000000 / HZ));

}

 

Depends on 64-Bit operations my way is slower but works for me, because when a delay happen, mostly you have to wait a minimum of time and a bit more make it slow.If a wrap around happen, the time may go below the minimum and correct wait will fail. Shure: my quickhack may extend timeouts.

 

I'm very interested in exchanging  experiences about this....

 

 

TranslateQuoteReplyEditDelete

 

 

2009-02-03 04:57:44     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Michael Hennerich (GERMANY)

Message: 68716   

 

Hi Marco,

 

let me take a look.

 

udelay() is guaranteed to work up to 1000us.

If this is not the case here - we need to fix this immediately.

I think the correct approach would be to slow down our __dealay loop.

(by adding a NOP; into the loop body)

This will end up in a smaller lpj value, and thus fixing this overflow issue.

 

-Michael

 

QuoteReplyEditDelete

 

 

2009-02-03 07:26:04     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Marco Reppenhagen (GERMANY)

Message: 68755   

 

Hi, again!

 

If you like to turn HZ down to 100 (10ms) on f.e. BF527@530MHz udelay(1000)  __actually_do_not_work.

 

A colleague of mine fixed the udelay() in quite a smart way:

 

in blackfin/kernel/setup.c the cclk is given, so we can calculate a "cached_cclk_per_usec" and export it.

 

Down in include/asm-blackfin/delay.h we save the whole division-stuff and everything we have to do is:

 

__delay(usecs * cached_cclk_per_usec);

 

That makes udelay much more precise, because no time is lost in silly operations. (My do_div solution...well: it brakes a fly on a wheel).

 

 

 

I attached the code -->  May be someone will put it into the trunk?

 

 

 

(With this fix it's possible to take up to a few "seconds of usecs(x)" undepending to the HZ value or the very large loops_per_jiffie value. The only limitation is now the 32-bit register in the called "nop-loop" (...by LSETUP); For 1usec we have to loop 528 times on a bf527@530MHz....)

 

 

 

 

 

delay.h

setup.c

TranslateQuoteReplyEditDelete

 

 

2009-02-03 07:27:54     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Michael Hennerich (GERMANY)

Message: 68756   

 

>We've to use mdelay() during the setup of some devices connected to our

>BF527, running @530MHz.

>

>While we try to do this, we printed out some timestamps and it seemed like

>include/asm-blackfin/delay.h is not usable for fast CPUs:

>

>The result of  32-bit multiplication (usecs * lpj) turns around when usecs

>get higher then 814 (oso.).

 

I don't see this happen!

 

Processor Speed: 525 MHz core clock and 105 MHz System Clock

---snip--

Calibrating delay loop... 1044.48 BogoMIPS (lpj=2088960)

 

usecs * loops_per_jiffy overflows for usecs > 2000

 

Processor Speed: 600 MHz core clock and 100 MHz System Clock

---snip--

Calibrating delay loop... 1191.93 BogoMIPS (lpj=2383872)

 

usecs * loops_per_jiffy overflows for usecs > 1800

 

I wonder what your lpj equals?

 

>

>That's a pitty, because mdelay does a n*udelay(1000) for greater values...

>ooops! Even it should be possible to do a udelay(3000),.... but it's not!

>

>Even on smaller BF526@300MHz some silly behaviours will occur during the

>use of mdelay(); If you take a look at the definition of mdelay(), you'll

>know what I mean (1 works, 2..5 may be and so on// Depend on MAX settings).

 

In general for long mdelay, I would use schedule_timeout() instead.

 

You're absolutely right we should define

 

#define MAX_UDELAY_MS    1

 

- in our asm/delay.h to work round this.

 

Thanks!

 

>

>I just made a quickhack to fix it in ./delay.h:

>

>static inline void udelay(unsigned long usecs)

>

>{

>

>    /* (mjr) quickhack */

>

>    u64 biggy = (u64)usecs * (u64)loops_per_jiffy;

>

>    do_div (biggy, (1000000 / HZ));

>

>    __delay((unsigned long)biggy);

>

>}

>

>Original code was:

>

>{

>

>    extern unsigned long loops_per_jiffy;

>

>    __delay(usecs * loops_per_jiffy / (1000000 / HZ));

>

>}

>

>Depends on 64-Bit operations my way is slower but works for me, because

>when a delay happen, mostly you have to wait a minimum of time and a bit

>more make it slow.If a wrap around happen, the time may go below the

>minimum and correct wait will fail. Shure: my quickhack may extend

>timeouts.

>

>I'm very interested in exchanging  experiences about this....

 

Preferably I like to avoid this 64-bit math.

Maybe we should also add a __bad_delay() function.

 

-Michael

QuoteReplyEditDelete

 

 

2009-02-03 07:48:52     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Wolfgang Muees (GERMANY)

Message: 68758   

 

Michael,

 

if you have a HZ Value of 100, and a Blackfin CPU with 600 MHz, then lpj = 600.000.000 / 100 = 6.000.000.

 

A udelay(1000) will compute 1000 x 6.000.000 = 6.000.000.000.

 

Max(unsigned long) = 4.294.967.296 ==> BOOM.

 

Calculating the __delay value from the precomputed cclk_per_usec value is simple and efficient. No overflows any more..

 

regards

 

Wolfgang

 

 

TranslateQuoteReplyEditDelete

 

 

2009-02-03 08:02:22     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Michael Hennerich (GERMANY)

Message: 68760   

 

I see – you’re right.

However I don’t like the idea of not using the kernel managed lpj variable at all.

 

cached_cclk_per_usec is only approx = loops_per_jiffy / (1000000 / HZ)

 

The kernel time keeping infrastructure updates lpj, during CPU_FREQ transitions.

Using your introduced cached_cclk_per_usec would fail until someone calls get_cclk().

 

-Michael

 

QuoteReplyEditDelete

 

 

2009-02-03 08:34:29     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Marco Reppenhagen (GERMANY)

Message: 68764   

 

Hi!

 

We can call get_cclk() during the calibration so the new variable is called once and known to all others.

 

After that I don't think that the calculated Value is less precise than the use of lpj. Far from it: For small values of udelay(x), where x is 1.. we save an ugly division and go into the asm-loop soon... The division in the original code is more a problem than the "inaccuracy of cclk", if there's something like that....

 

I think THIS fix works for everybody, even if someone uses low HZ and/or fast CPUs.

(What about BF533 with 750MHz or nest Generations >1GHz....?)

TranslateQuoteReplyEditDelete

 

 

2009-02-03 08:44:00     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Michael Hennerich (GERMANY)

Message: 68765   

 

udelay(x) is guaranteed to wait at least x micro seconds.

If it takes longer no one should care!

 

Mathematically seen your precomputed cclk_per_usec value is similar to bracing the argument of __delay like that.

 

usecs * (loops_per_jiffy / (1000000 / HZ))

 

However this intentionally avoided since we’re losing accuracy.

This hurts especially when we don’t run CCLK at integers of MHz.

 

static inline void udelay(unsigned long usecs)

{

    extern unsigned long loops_per_jiffy;

    __delay(usecs * loops_per_jiffy / (1000000 / HZ));

}

 

-Michael

QuoteReplyEditDelete

 

 

2009-02-03 09:07:56     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Marco Reppenhagen (GERMANY)

Message: 68768   

 

AHH! Now I see you're right!

 

But we have to do something, because it's not unusual to use HZ=100 on fast CPUs and it's a fact that udelay() fails long time before the 1000usec guarantee. "mdelay()" does everything, but won't wait for at least n-msecs ;-(

 

Maybe it's better to use my 64-bit stuff as a __bad_delay  function, but all the mdelays in usb, sound, kernel and so on will fail as hitherto...

 

...a CONFIG_BAD_UDELAY , which switch to the "bad"solution ?

 

Shure, it's not as simple as that...

TranslateQuoteReplyEditDelete

 

 

2009-02-03 09:16:20     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Michael Hennerich (GERMANY)

Message: 68769   

 

You're absolutely right - what we have today is broken.

 

Thanks for pointing this out!

 

 

I will commit something into svn branch and head that addresses this issue later today.

 

 

Currently evaluating different approaches.

 

 

-Michael

QuoteReplyEditDelete

 

 

2009-02-03 11:04:53     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Michael Hennerich (GERMANY)

Message: 68777   

 

This seems to be a good tradeoff.

 

/*

* close approximation borrowed from m68knommu to avoid 64-bit math

*/

 

#define    HZSCALE        (268435456 / (1000000/HZ))

 

static inline void udelay(unsigned long usecs)

{

    extern unsigned long loops_per_jiffy;

    __delay((((usecs * HZSCALE) >> 11) * (loops_per_jiffy >> 11)) >> 6);

}

QuoteReplyEditDelete

 

 

2009-02-03 12:09:38     Re: udelay() / mdelay() broken? (BF527 - 530 MHz)

Mike Frysinger (UNITED STATES)

Message: 68786   

 

no code should use those cached clock related values.  only the internal clock functions where the values actually get cached.  the meaning/usage of those variables may change over time and that can only be accomplished if their usage doesnt proliferate.

Attachments

Outcomes