2008-12-12 06:21:53     CPLB handler improvements (re-benchmarked)

Document created by Aaronwu Employee on Aug 9, 2013

Michael McTernan (UNITED KINGDOM)

Message: 66661   




I've reprofiled Bernd's latest patches and the results are awesome:





            cplb-c-5.diff     Original ASM     Difference (C as % of ASM)
Min                   434             1321     33%
Average               462             2960     16%
Max                  1045             4113     25%


cplb-c-5.diff - CPLB not in SRAM     


            cplb-c-5.diff     Original ASM     Difference (C as % of ASM)
Min                   425             1755     24%
Average               477            12252     4%
Max                  1823            18553     10%


(All timings in nanoseconds)
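For anyone wanting to check the last column: it appears to be the cplb-c-5.diff timing expressed as a percentage of the original ASM timing, rounded to a whole number. A minimal Python sketch under that assumption follows. The names `TIMINGS_NS`, `percent_of_asm` and `percent_table` are made up for illustration, and the label "first table" is only a placeholder, since the first table has no title in the post.

```python
# Timings in nanoseconds, copied from the two tables in the post,
# stored as (cplb-c-5.diff, original ASM) pairs.
# "first table" is a placeholder label; the post gives that table no title.
TIMINGS_NS = {
    "first table": {
        "min": (434, 1321),
        "average": (462, 2960),
        "max": (1045, 4113),
    },
    "CPLB not in SRAM": {
        "min": (425, 1755),
        "average": (477, 12252),
        "max": (1823, 18553),
    },
}


def percent_of_asm(c_ns, asm_ns):
    """Return the C timing as a whole-number percentage of the ASM timing."""
    return round(100 * c_ns / asm_ns)


def percent_table(timings):
    """Compute the percentage column for every run and statistic."""
    return {
        run: {stat: percent_of_asm(c_ns, asm_ns)
              for stat, (c_ns, asm_ns) in rows.items()}
        for run, rows in timings.items()
    }


if __name__ == "__main__":
    for run, rows in percent_table(TIMINGS_NS).items():
        for stat, pct in rows.items():
            print(f"{run:18s} {stat:8s} {pct}%")
```

Under this reading the computed values match the percentages quoted in both tables, so a smaller number means a faster C handler relative to the ASM one.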


Attached are some graphs of this, although I think it's pretty clear that the C implementation blows the socks off the ASM version.


Bernd - it's also pretty clear that you've improved on my implementation (both from the code and the stats), so I didn't do a run of my patch vs. yours. From my original data, my average case was 25% of the ASM implementation, whereas you get down to 16% in the comparable version.




What else needs to be done before the patch can get into trunk?  I'm going to run it on my 2008 branch over here for a while to ensure there's nothing unexpected, but so far it's flying - many thanks!










2008-12-12 08:12:35     Re: CPLB handler improvements (re-benchmarked)

Bernd Schmidt (GERMANY)

Message: 66663   


The improvement over the old version should in fact be even greater, since the register-saving entry/exit code in entry.S has been reduced.  That happens outside your profiling points (which is as it should be, to allow comparison with your earlier patch).


Since the results are good, I'll port it to trunk and check it in today or early next week.






2008-12-15 07:53:08     Re: CPLB handler improvements (re-benchmarked)

Michael McTernan (UNITED KINGDOM)

Message: 66713   


> Since the results are good


They are really awesome!  It must be rare for hand-coded assembly to be converted to C and show a 6x speedup.
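The "6x" figure can be read off the average-case timings in the first message by dividing the ASM time by the C time. A small Python sketch of that arithmetic, assuming the average rows are the ones meant (the function name `speedup` is only illustrative):

```python
def speedup(asm_ns, c_ns):
    """Return how many times faster the C handler is than the ASM one."""
    return asm_ns / c_ns


if __name__ == "__main__":
    # Average-case timings in nanoseconds, taken from the first message.
    print(f"first table average: {speedup(2960, 462):.1f}x")
    print(f"CPLB not in SRAM average: {speedup(12252, 477):.1f}x")
```

On those numbers the first table gives roughly 6.4x, which fits the "6x" quoted here, and the "CPLB not in SRAM" run gives a considerably larger factor.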


Many thanks for your work on this.