2008-11-28 09:51:27 CPLB handler improvements
Michael McTernan (UNITED KINGDOM)
Message: 65965
On our heavily loaded BF537 system with 128Meg SDRAM we get loads of DCPLB misses, so I tried to make the noMPU miss handler a bit more efficient to squeeze a more from the platform.
I ended up re-writing the handler in C, and changing some operations
- evicted CPLB entries are placed into a MRU list for faster searching
- CPLB entries are still evicted in a round-robin schedule, but entries are replaced without copying up the table
- locked CPLB entires are stored in the top of the table allowing the eviction to compare with 0 when wraping through the table
I waggled a GPIO line in _ex_icplb_miss/_ex_dcplb_miss and used a scope to check the execution duration of the assembly implementation against the one I did in C to see if it was any better under a typical workload we run on our plaform.
The min/mean/max execution durations as measured over about 15,000 CPLB misses are given in the following table in microseconds:
New C Original ASM
Min 559 1354
Average 665 2694
Max 1345 3820
I think the C implementation is therefore much faster, about x4 in the average case. The attached plot of these values and scope traces also show this, as well as /proc/cplbinfo output which shows the MRU list.
I'd quite like the changes included in the kernel and given wider inspection/testing, but I've made them against the 2008 SVN branch and not sure if there are other plans in this area in anycase.
Is it worth submitting this patch, and if so, should I work to integrate it to trunk or are the changes on 2008 good enough?
C-cplbinfo.txt
C_CPLB.png
ASM_CPLB.png
comparison.pdf
QuoteReplyEditDelete
2008-11-28 10:01:54 Re: CPLB handler improvements
Bernd Schmidt (GERMANY)
Message: 65966
That sounds fantastic. Please send the patch to our development mailinglist for review.
It's doubtful that we will put it into 2008R1, but this sounds like it would be nice to have in trunk.
QuoteReplyEditDelete
2008-11-28 11:05:55 Re: CPLB handler improvements
Michael McTernan (UNITED KINGDOM)
Message: 65970
Cool. I've added the patch to the tracker, I think that's what you mean:
I've also attached it here for completeness of this thread. It's implemented as a kconfig option along side the no-MPU and MPU handlers, so should have minmal impact on the 2008 branch, but if you have bug fix only policy that's fine.
Since the CPLB handler only really comes into play when theres a lot of memory and things going on, will you be able to test this or would ADI need a board?
c-cplb-08.patch
QuoteReplyEditDelete
2008-11-28 12:16:36 Re: CPLB handler improvements
Robin Getz (UNITED STATES)
Message: 65974
Michael:
I think we have the capability to test. Thanks for sending back.
Right now, it looks as if - you are protecting the entire function (cplb_miss) with caches turned off - I thought you should be able to do everything (except the CPLB writes) with things On - which is fewer instructions - maybe faster. As far as I remember/can tell - reading the CPLB_addr/data registers is OK to do while the MPU is active. It is only when updating things that the MPU must be turned off.
Also your enable/disable_[id]cplb needs to be aligned to the start of a cache line. (look in the existing code for the align 8). (Although - why we turn interrupts on/off makes no sense, since we are running at EVT3 already). I think the SSYNC() will insert some CLI/STI instructions as well. I don't know if it makes sense to try to do this without by writing some assembly (for those functions).
I will have a more detailed look on Monday.
QuoteReplyEditDelete
2008-11-28 12:45:11 Re: CPLB handler improvements
Michael McTernan (UNITED KINGDOM)
Message: 65977
> It is only when updating things that the MPU must be turned off.
The hardware ref manual says that too. I think I did it like that to keep cplb_miss() cleaner, but you make a good point about the cache being off - I always compile the CPLB miss handlers in L1 SRAM though, so guess I didn't see the downside of disabling cache. So it won't get faster, but would be slower if moved outside L1. A slight restructure there is in order.
> Also your enable/disable_[id]cplb needs to be aligned to the start of a cache line.
I took the enable/disable functions from the MPU version and saw no special alignment there. If this is a requirement, a shared assembly implementation of the functions is probably sensible. I'll take a look.
> I will have a more detailed look on Monday.
Cool - many thanks for your time.
QuoteReplyEditDelete
2008-11-28 14:49:14 Re: CPLB handler improvements
Michael Hennerich (GERMANY)
Message: 65983
Yeah - a BF533-STAMP might do the trick here (128MBytes SDRAM).
I think the align 8 around writes to IMEM_CONTROL and DMEM_CONTROL was an early Edinburgh rev 0.2 and below thing.
Our current Cache flush and MPU control code doesn’t do that anymore, it also disappeared from the HRMs.
-Michael
QuoteReplyEditDelete
2008-11-28 15:54:12 Re: CPLB handler improvements
Michael McTernan (UNITED KINGDOM)
Message: 65989
From the code I think the align is needed for anomaly 05000125. This is handled via differing definitions of bfin_write_DMEM_CONTROL() based on the anomaly switch, so both the MPU CPLB handler and my implementation are transparently okay whether or not this is present.
I did however notice that the anomaly version of the macro does the SSYNC's, so I've changed my implementation to avoid excessive SSYNCs in that case.
Attached is an updated version of the patch. It avoids the excessive SSYNCs, and also keeps the CPLBs enabled while handling faults to get benefit from the cache in the case that the handler is not in L1, as pointed out by Robin. I don't have a scope to hand, but doubt the execution speed is much changed, except in the case the code is outside L1 SRAM.
c-cplb-08-1.patch
QuoteReplyEditDelete
2008-11-28 15:59:21 Re: CPLB handler improvements
Mike Frysinger (UNITED STATES)
Message: 65990
anomaly 05000125 exists on processors/revisions we do not support. those revisions are known to crash due to other anomalies, so there is no point in continuing to handle it.
QuoteReplyEditDelete
2008-11-28 20:42:57 Re: CPLB handler improvements
Robin Getz (UNITED STATES)
Message: 65993
Michael:
I'm not sure what you mean?
rgetz@imhotep:~/blackfin/trunk/uClinux-dist/linux-2.6.x> grep bfin_write_DMEM_CONTROL arch/blackfin/* -R
arch/blackfin/include/asm/cdef_LPBlackfin.h:#define bfin_write_DMEM_CONTROL(val) bfin_write32(DMEM_CONTROL,val)
There isn't any anomaly handling there. (As Mike indicated, it was removed from trunk, since it was dead code), and when the anomaly isn't there (on the branch), there is no SSYNC applied.
QuoteReplyEditDelete
2008-11-30 12:24:47 Re: CPLB handler improvements
Michael McTernan (UNITED KINGDOM)
Message: 66042
Robin - the anomaly handling can be seen in older code e.g. the 2008 branch, arch/blackfin/mach-common/cacheinit.S. But I think this is now a distraction anyway - I was just answering your point about needing .align 8. From the older code, it's clear that this was only for anomaly 05000125, confirming Michael Hennerich's thoughts about it being an "early Edinburgh rev 0.2 and below thing".
Also don't forget that the patch is currently against the 2008 branch, where handling for 05000125 is still present.