2009-04-27 10:49:33 Bus Loading
Gopal Karanam (INDIA)
Message: 73292
Hello,
I've to measure the bus loading due to running uClinux under cache. I wanted to get a second opinion here.
I understand the theoretical L3 throughput is affected by refresh time, pre-charge time, etc. We can only do a relative comparison.
I'm thinking of a standalone application (using elf or VDSP) that runs out of L1 and does a read or write to L3. We can then compare the cycle count with that of the same operation under uClinux. Would this be a good measure?
- Gopal
2009-04-27 11:02:49 Re: Bus Loading
Robin Getz (UNITED STATES)
Message: 73294
Gopal:
No.
Can you describe what you are actually trying to do - and we may be able to help out or point you in the right direction.
-Robin
2009-04-27 13:33:20 Re: Bus Loading
Gopal Karanam (INDIA)
Message: 73307
Robin,
I did DMA wait cycle measurements on VDSP and uClinux. The numbers on uClinux are around 40-50% higher. I have the scheduler, IRQ handling, etc. in L1. In order to estimate the performance of the modules on uClinux we want to have an idea of the bus loading. We have a clear picture in the case of standalone applications and can easily estimate MIPS even before running on silicon.
I know there are a lot of variables in the case of uClinux, but we should be able to get some initial estimate even if it's something like 10% off from the actual one.
- Gopal
2009-04-27 13:38:32 Re: Bus Loading
Mike Frysinger (UNITED STATES)
Message: 73308
you can move some pieces of the kernel into on-chip memory, but you'll never be able to fit it all. there will always be SDRAM activity due to the kernel and userspace.
2009-04-27 14:07:42 Re: Bus Loading
Gopal Karanam (INDIA)
Message: 73312
This is exactly what I want to quantify. We want to reserve some resources (like part of L1 Code) for uClinux and quantify the bus loading so we can estimate the performance before we actually do the exercise. We want to be able to say "for a system with so & so capabilities the performance will be x% of that on a standalone version".
I assume I'm not the only one looking for answers to such questions.
2009-04-27 14:08:36 Re: Bus Loading
Robin Getz (UNITED STATES)
Message: 73313
Gopal:
> I did DMA wait cycle measurement on VDSP and uClinux.
I have no idea what a "wait cycle measurement" is. Again - can you describe _what_ you are trying to do? not describe some meaningless tests you did.
-Robin
2009-04-27 14:28:33 Re: Bus Loading
Robin Getz (UNITED STATES)
Message: 73314
Gopal:
You are the only one.
Most people don't look at it like that since they are not taking things from a non-Linux environment into a Linux environment. Most people pick the environment/OS, and then the codebase.
If you want to look at processor load from a given application while the kernel is running - use getrusage().
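A minimal sketch of that suggestion, assuming a POSIX userspace where getrusage() is available. The workload() function and the measure_cpu() helper are hypothetical names used for illustration; workload() stands in for whatever library routine is being profiled. Note that this reports CPU time charged to the process, not bus loading.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
static double tv_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical stand-in for the routine under test. */
static unsigned long workload(unsigned long iterations)
{
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < iterations; i++)
        acc += i;
    return acc;
}

/*
 * Run workload() and report the user and system CPU seconds charged to
 * this process while it ran. Returns 0 on success, -1 if getrusage() fails.
 */
int measure_cpu(unsigned long iterations, double *user_s, double *sys_s)
{
    struct rusage before, after;

    assert(user_s != NULL && sys_s != NULL);

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    workload(iterations);
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    *user_s = tv_to_seconds(after.ru_utime) - tv_to_seconds(before.ru_utime);
    *sys_s  = tv_to_seconds(after.ru_stime) - tv_to_seconds(before.ru_stime);
    return 0;
}
```

Calling measure_cpu() from main() and printing both values gives a rough split between time spent in the application and time the kernel spent on its behalf. The resolution depends on how the kernel accounts CPU time, so short runs may report zero.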
-Robin
2009-04-27 14:43:23 Re: Bus Loading
Gopal Karanam (INDIA)
Message: 73316
Robin,
The people you describe are those who develop products and pick the OS. We develop basic building blocks (libraries) and we don't restrict them to any particular environment. We want to reuse our work and effort. When we provide these modules we want to provide rough performance metrics. We cannot just say "use this and go figure out the performance yourself".
Those tests may be meaningless to you, but they give us a great understanding of the system we are dealing with. DMA wait cycles come from timing a DMA operation:
1. Setup Descriptors
2. Start DMA
3. Wait for DMA to complete.
DMA wait cycles is the number of cycles the processor spends looping in step 3.
From this measurement we can arrive at the minimum MIPS achievable for a module.
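The three steps above can be sketched as a small timing harness. This is only an illustration under stated assumptions: the dma_ops hooks, the fake_dma simulation, and measure_dma_wait() are hypothetical names, not a real driver API, and clock_gettime() stands in for a hardware cycle counter. On a real target the hooks would program the DMA controller, which requires supervisor access.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical driver hooks, one per step of the DMA operation. */
struct dma_ops {
    void (*setup)(void *ctx);  /* step 1: set up descriptors */
    void (*start)(void *ctx);  /* step 2: start DMA */
    int  (*done)(void *ctx);   /* step 3: nonzero once the transfer is complete */
};

struct dma_timing {
    uint64_t wait_ns;  /* wall-clock time spent polling in step 3 */
    uint64_t polls;    /* number of polls that found the DMA still busy */
};

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Run one DMA operation and time only the wait loop (step 3). */
struct dma_timing measure_dma_wait(const struct dma_ops *ops, void *ctx)
{
    struct dma_timing t = {0, 0};

    assert(ops != NULL && ops->setup && ops->start && ops->done);

    ops->setup(ctx);
    ops->start(ctx);

    uint64_t t0 = now_ns();
    while (!ops->done(ctx))
        t.polls++;
    t.wait_ns = now_ns() - t0;
    return t;
}

/* Simulated transfer that completes after `reload` busy polls. */
struct fake_dma {
    unsigned remaining;
    unsigned reload;
};

static void fake_setup(void *ctx)
{
    struct fake_dma *d = ctx;
    d->remaining = 0;
}

static void fake_start(void *ctx)
{
    struct fake_dma *d = ctx;
    d->remaining = d->reload;
}

static int fake_done(void *ctx)
{
    struct fake_dma *d = ctx;
    if (d->remaining == 0)
        return 1;
    d->remaining--;
    return 0;
}

const struct dma_ops fake_dma_ops = { fake_setup, fake_start, fake_done };
```

Running the same harness standalone and under uClinux, with real hooks in place of the simulation, would give the relative comparison described here. Under a multitasking kernel the wait time also includes any preemption or interrupt handling that lands inside the loop, so repeated runs and a distribution are more informative than a single number.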
Gopal
2009-04-27 15:21:20 Re: Bus Loading
Mike Frysinger (UNITED STATES)
Message: 73317
your libraries won't work under Linux then, because the assumption on your part is that you have access to supervisor-only resources -- you don't
2009-04-27 15:34:32 Re: Bus Loading
Robin Getz (UNITED STATES)
Message: 73318
Gopal:
I'm not discounting the work that you do - we just need to better understand what you are trying to do/measure.
As I'm sure you are aware - saying that XXX algorithm will only consume YYY MIPS is a lie in any environment which is multithreaded and uses data/instruction cache. The performance is too dependent on what else is going on in the system. Whether it takes X cycles or 2 times X cycles for a DMA transaction to complete is a system level issue - which you can't determine in a general purpose OS environment, where the end user can run anything else they want to in the system, or where the system must respond to other real world events (like a ping flood).
We can do all sorts of things (and have done so) to reduce the overhead of running in a multi-threaded cache based environment - but it still depends on what you are really trying to do, and how it fits in with the rest of the Linux infrastructure.
Are you doing the DMA from userspace or kernel space? What else is running on the system? What is active externally? When you have been doing your measurements - which phase is taking longer? (setup, start, or transfer?)
Thanks