2009-02-03 02:57:37 udelay() / mdelay() broken? (BF527 - 530 MHz)
Marco Reppenhagen (GERMANY)
Message: 68700
Hello!
We've to use mdelay() during the setup of some devices connected to our BF527, running @530MHz.
While we try to do this, we printed out some timestamps and it seemed like include/asm-blackfin/delay.h is not usable for fast CPUs:
The result of 32-bit multiplication (usecs * lpj) turns around when usecs get higher then 814 (oso.).
That's a pitty, because mdelay does a n*udelay(1000) for greater values... ooops! Even it should be possible to do a udelay(3000),.... but it's not!
Even on smaller BF526@300MHz some silly behaviours will occur during the use of mdelay(); If you take a look at the definition of mdelay(), you'll know what I mean (1 works, 2..5 may be and so on// Depend on MAX settings).
I just made a quickhack to fix it in ./delay.h:
static inline void udelay(unsigned long usecs)
{
/* (mjr) quickhack */
u64 biggy = (u64)usecs * (u64)loops_per_jiffy;
do_div (biggy, (1000000 / HZ));
__delay((unsigned long)biggy);
}
Original code was:
{
extern unsigned long loops_per_jiffy;
__delay(usecs * loops_per_jiffy / (1000000 / HZ));
}
Depends on 64-Bit operations my way is slower but works for me, because when a delay happen, mostly you have to wait a minimum of time and a bit more make it slow.If a wrap around happen, the time may go below the minimum and correct wait will fail. Shure: my quickhack may extend timeouts.
I'm very interested in exchanging experiences about this....
TranslateQuoteReplyEditDelete
2009-02-03 04:57:44 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Michael Hennerich (GERMANY)
Message: 68716
Hi Marco,
let me take a look.
udelay() is guaranteed to work up to 1000us.
If this is not the case here - we need to fix this immediately.
I think the correct approach would be to slow down our __dealay loop.
(by adding a NOP; into the loop body)
This will end up in a smaller lpj value, and thus fixing this overflow issue.
-Michael
QuoteReplyEditDelete
2009-02-03 07:26:04 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Marco Reppenhagen (GERMANY)
Message: 68755
Hi, again!
If you like to turn HZ down to 100 (10ms) on f.e. BF527@530MHz udelay(1000) __actually_do_not_work.
A colleague of mine fixed the udelay() in quite a smart way:
in blackfin/kernel/setup.c the cclk is given, so we can calculate a "cached_cclk_per_usec" and export it.
Down in include/asm-blackfin/delay.h we save the whole division-stuff and everything we have to do is:
__delay(usecs * cached_cclk_per_usec);
That makes udelay much more precise, because no time is lost in silly operations. (My do_div solution...well: it brakes a fly on a wheel).
I attached the code --> May be someone will put it into the trunk?
(With this fix it's possible to take up to a few "seconds of usecs(x)" undepending to the HZ value or the very large loops_per_jiffie value. The only limitation is now the 32-bit register in the called "nop-loop" (...by LSETUP); For 1usec we have to loop 528 times on a bf527@530MHz....)
delay.h
setup.c
TranslateQuoteReplyEditDelete
2009-02-03 07:27:54 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Michael Hennerich (GERMANY)
Message: 68756
>We've to use mdelay() during the setup of some devices connected to our
>BF527, running @530MHz.
>
>While we try to do this, we printed out some timestamps and it seemed like
>include/asm-blackfin/delay.h is not usable for fast CPUs:
>
>The result of 32-bit multiplication (usecs * lpj) turns around when usecs
>get higher then 814 (oso.).
I don't see this happen!
Processor Speed: 525 MHz core clock and 105 MHz System Clock
---snip--
Calibrating delay loop... 1044.48 BogoMIPS (lpj=2088960)
usecs * loops_per_jiffy overflows for usecs > 2000
Processor Speed: 600 MHz core clock and 100 MHz System Clock
---snip--
Calibrating delay loop... 1191.93 BogoMIPS (lpj=2383872)
usecs * loops_per_jiffy overflows for usecs > 1800
I wonder what your lpj equals?
>
>That's a pitty, because mdelay does a n*udelay(1000) for greater values...
>ooops! Even it should be possible to do a udelay(3000),.... but it's not!
>
>Even on smaller BF526@300MHz some silly behaviours will occur during the
>use of mdelay(); If you take a look at the definition of mdelay(), you'll
>know what I mean (1 works, 2..5 may be and so on// Depend on MAX settings).
In general for long mdelay, I would use schedule_timeout() instead.
You're absolutely right we should define
#define MAX_UDELAY_MS 1
- in our asm/delay.h to work round this.
Thanks!
>
>I just made a quickhack to fix it in ./delay.h:
>
>static inline void udelay(unsigned long usecs)
>
>{
>
> /* (mjr) quickhack */
>
> u64 biggy = (u64)usecs * (u64)loops_per_jiffy;
>
> do_div (biggy, (1000000 / HZ));
>
> __delay((unsigned long)biggy);
>
>}
>
>Original code was:
>
>{
>
> extern unsigned long loops_per_jiffy;
>
> __delay(usecs * loops_per_jiffy / (1000000 / HZ));
>
>}
>
>Depends on 64-Bit operations my way is slower but works for me, because
>when a delay happen, mostly you have to wait a minimum of time and a bit
>more make it slow.If a wrap around happen, the time may go below the
>minimum and correct wait will fail. Shure: my quickhack may extend
>timeouts.
>
>I'm very interested in exchanging experiences about this....
Preferably I like to avoid this 64-bit math.
Maybe we should also add a __bad_delay() function.
-Michael
QuoteReplyEditDelete
2009-02-03 07:48:52 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Wolfgang Muees (GERMANY)
Message: 68758
Michael,
if you have a HZ Value of 100, and a Blackfin CPU with 600 MHz, then lpj = 600.000.000 / 100 = 6.000.000.
A udelay(1000) will compute 1000 x 6.000.000 = 6.000.000.000.
Max(unsigned long) = 4.294.967.296 ==> BOOM.
Calculating the __delay value from the precomputed cclk_per_usec value is simple and efficient. No overflows any more..
regards
Wolfgang
TranslateQuoteReplyEditDelete
2009-02-03 08:02:22 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Michael Hennerich (GERMANY)
Message: 68760
I see – you’re right.
However I don’t like the idea of not using the kernel managed lpj variable at all.
cached_cclk_per_usec is only approx = loops_per_jiffy / (1000000 / HZ)
The kernel time keeping infrastructure updates lpj, during CPU_FREQ transitions.
Using your introduced cached_cclk_per_usec would fail until someone calls get_cclk().
-Michael
QuoteReplyEditDelete
2009-02-03 08:34:29 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Marco Reppenhagen (GERMANY)
Message: 68764
Hi!
We can call get_cclk() during the calibration so the new variable is called once and known to all others.
After that I don't think that the calculated Value is less precise than the use of lpj. Far from it: For small values of udelay(x), where x is 1.. we save an ugly division and go into the asm-loop soon... The division in the original code is more a problem than the "inaccuracy of cclk", if there's something like that....
I think THIS fix works for everybody, even if someone uses low HZ and/or fast CPUs.
(What about BF533 with 750MHz or nest Generations >1GHz....?)
TranslateQuoteReplyEditDelete
2009-02-03 08:44:00 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Michael Hennerich (GERMANY)
Message: 68765
udelay(x) is guaranteed to wait at least x micro seconds.
If it takes longer no one should care!
Mathematically seen your precomputed cclk_per_usec value is similar to bracing the argument of __delay like that.
usecs * (loops_per_jiffy / (1000000 / HZ))
However this intentionally avoided since we’re losing accuracy.
This hurts especially when we don’t run CCLK at integers of MHz.
static inline void udelay(unsigned long usecs)
{
extern unsigned long loops_per_jiffy;
__delay(usecs * loops_per_jiffy / (1000000 / HZ));
}
-Michael
QuoteReplyEditDelete
2009-02-03 09:07:56 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Marco Reppenhagen (GERMANY)
Message: 68768
AHH! Now I see you're right!
But we have to do something, because it's not unusual to use HZ=100 on fast CPUs and it's a fact that udelay() fails long time before the 1000usec guarantee. "mdelay()" does everything, but won't wait for at least n-msecs ;-(
Maybe it's better to use my 64-bit stuff as a __bad_delay function, but all the mdelays in usb, sound, kernel and so on will fail as hitherto...
...a CONFIG_BAD_UDELAY , which switch to the "bad"solution ?
Shure, it's not as simple as that...
TranslateQuoteReplyEditDelete
2009-02-03 09:16:20 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Michael Hennerich (GERMANY)
Message: 68769
You're absolutely right - what we have today is broken.
Thanks for pointing this out!
I will commit something into svn branch and head that addresses this issue later today.
Currently evaluating different approaches.
-Michael
QuoteReplyEditDelete
2009-02-03 11:04:53 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Michael Hennerich (GERMANY)
Message: 68777
This seems to be a good tradeoff.
/*
* close approximation borrowed from m68knommu to avoid 64-bit math
*/
#define HZSCALE (268435456 / (1000000/HZ))
static inline void udelay(unsigned long usecs)
{
extern unsigned long loops_per_jiffy;
__delay((((usecs * HZSCALE) >> 11) * (loops_per_jiffy >> 11)) >> 6);
}
QuoteReplyEditDelete
2009-02-03 12:09:38 Re: udelay() / mdelay() broken? (BF527 - 530 MHz)
Mike Frysinger (UNITED STATES)
Message: 68786
no code should use those cached clock related values. only the internal clock functions where the values actually get cached. the meaning/usage of those variables may change over time and that can only be accomplished if their usage doesnt proliferate.