2009-04-27 10:49:33     Bus Loading

Document created by Aaronwu Employee on Aug 15, 2013

2009-04-27 10:49:33     Bus Loading

Gopal Karanam (INDIA)

Message: 73292   

 

Hello,

 

I have to measure the bus loading caused by running uClinux with caches enabled. I wanted to get a second opinion here.

 

I understand that the theoretical L3 throughput is affected by refresh time, precharge time, etc., so we can only do a relative comparison.

 

I'm thinking of a standalone application (using elf or VDSP) that runs out of L1 and does reads or writes from L3. We can then compare the cycles needed to do the same operations under uClinux. Would this be a good measure?
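A minimal sketch of the uClinux side of the comparison proposed above, assuming the measurement is done from userspace with the POSIX clock_gettime() call rather than a hardware cycle counter. The buffer size, the stride, and the helper names (alloc_filled_buffer, read_pass, time_read_pass_ns) are illustrative assumptions; the buffer has to be much larger than the data cache for the reads to actually reach L3.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: allocate `words` 32-bit words and fill them with 1.
 * Assumption: size it well above the data cache so reads reach L3. */
static uint32_t *alloc_filled_buffer(size_t words)
{
    uint32_t *buf = malloc(words * sizeof *buf);
    if (buf != NULL)
        for (size_t i = 0; i < words; i++)
            buf[i] = 1;
    return buf;
}

/* Read every `stride`-th word; volatile keeps the compiler from
 * optimizing the loads away. Returns the sum of the words read. */
static uint32_t read_pass(const volatile uint32_t *buf, size_t words, size_t stride)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < words; i += stride)
        sum += buf[i];
    return sum;
}

/* Time one read pass in nanoseconds using the monotonic clock. */
static long long time_read_pass_ns(const volatile uint32_t *buf, size_t words,
                                   size_t stride, uint32_t *sum_out)
{
    struct timespec t0, t1;
    assert(stride > 0 && sum_out != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sum_out = read_pass(buf, words, stride);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```

In the standalone build the timing calls would be replaced with whatever cycle counting that environment provides. Older C libraries may need -lrt for clock_gettime().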

 

- Gopal

 

 


 

 

2009-04-27 11:02:49     Re: Bus Loading

Robin Getz (UNITED STATES)

Message: 73294   

 

Gopal:

 

No.

 

Can you describe what you are actually trying to do? Then we may be able to help out or point you in the right direction.

 

-Robin


 

 

2009-04-27 13:33:20     Re: Bus Loading

Gopal Karanam (INDIA)

Message: 73307   

 

Robin,

 

I did a DMA wait cycle measurement on VDSP and on uClinux. The numbers on uClinux are around 40-50% higher. I have the scheduler, IRQ handling, etc. in L1. To estimate the performance of our modules on uClinux, we want an idea of the bus loading. We have a clear picture in the case of standalone applications and can easily estimate MIPS even before running on silicon.
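The "40-50% more" figure is a relative difference between two measurements. A small sketch of that arithmetic, and of applying such an overhead to a standalone cycle count to get a rough uClinux estimate. The helper names are hypothetical, and apply_overhead assumes the overhead measured for one workload carries over to another, which is exactly the assumption the replies in this thread question.

```c
#include <assert.h>

/* Percentage by which the uClinux measurement exceeds the standalone one:
 * 100 * (linux - standalone) / standalone. */
static double relative_overhead_pct(double standalone_cycles, double linux_cycles)
{
    assert(standalone_cycles > 0.0);
    return 100.0 * (linux_cycles - standalone_cycles) / standalone_cycles;
}

/* Rough uClinux estimate for another module, assuming (hypothetically)
 * the same percentage overhead applies to it. */
static double apply_overhead(double standalone_cycles, double overhead_pct)
{
    return standalone_cycles + standalone_cycles * overhead_pct / 100.0;
}
```

For example, 1000 cycles standalone and 1450 cycles under uClinux is a 45% overhead.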

 

I know there are a lot of variables in the case of uClinux, but we should be able to get some initial estimate, even if it's 10% off from the actual figure.

 

- Gopal


 

 

2009-04-27 13:38:32     Re: Bus Loading

Mike Frysinger (UNITED STATES)

Message: 73308   

 

you can move some pieces of the kernel into on-chip memory, but you'll never be able to fit it all.  there will always be SDRAM activity due to the kernel and userspace.


 

 

2009-04-27 14:07:42     Re: Bus Loading

Gopal Karanam (INDIA)

Message: 73312   

 

This is exactly what I want to quantify. We want to reserve some resources (like part of L1 code memory) for uClinux and quantify the bus loading, so we can estimate the performance before we actually do the exercise. We want to be able to say "for a system with such-and-such capabilities, the performance will be x% lower than on a standalone version".

 

I assume I'm not the only one looking for answers to such questions.


 

 

2009-04-27 14:08:36     Re: Bus Loading

Robin Getz (UNITED STATES)

Message: 73313   

 

Gopal:

 

> I did DMA wait cycle measurement on VDSP and uClinux.

 

I have no idea what a "wait cycle measurement" is. Again, can you describe _what_ you are trying to do? Not describe some meaningless tests you did.

 

-Robin


 

 

2009-04-27 14:28:33     Re: Bus Loading

Robin Getz (UNITED STATES)

Message: 73314   

 

Gopal:

 

You are the only one.

 

Most people don't look at it like that, since they are not taking things from a non-Linux environment into a Linux environment. Most people pick the environment/OS, and then the codebase.

 

If you want to look at processor load from a given application while the kernel is running - use getrusage().

 

-Robin


 

 

2009-04-27 14:43:23     Re: Bus Loading

Gopal Karanam (INDIA)

Message: 73316   

 

Robin,

 

The people you describe are those who develop products and pick the OS. We develop basic building blocks (libraries), and we don't restrict them to any particular environment. We want to reuse our work and effort. When we provide these modules, we want to provide rough performance metrics. We cannot just say "use this and go figure out the possible performance."

 

Those tests may be meaningless to you, but they give us a great understanding of the system we are dealing with. "DMA wait cycles" refers to the cycles spent on a DMA operation, which has three steps:

 

1. Setup Descriptors

 

2. Start DMA

 

3. Wait for DMA to complete.

 

DMA wait cycles are the number of cycles the processor spends looping in step 3.
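The three steps can be sketched as follows. Everything about the DMA channel here is hypothetical: sim_dma and its functions simulate a transfer that completes after a fixed number of status polls, standing in for real descriptor setup and status-register reads. The count is in loop iterations; turning it into processor cycles would need the cycles per iteration or a cycle-counter read, which this sketch does not do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated DMA channel standing in for real hardware. */
struct sim_dma {
    unsigned remaining_polls;   /* polls left before the transfer is "done" */
    bool running;
};

/* Step 1: set up descriptors (here: how long the fake transfer lasts). */
static void sim_dma_setup(struct sim_dma *ch, unsigned transfer_polls)
{
    ch->remaining_polls = transfer_polls;
    ch->running = false;
}

/* Step 2: start the transfer. */
static void sim_dma_start(struct sim_dma *ch)
{
    ch->running = true;
}

/* Status poll: true once the simulated transfer has finished. */
static bool sim_dma_done(struct sim_dma *ch)
{
    assert(ch->running);
    if (ch->remaining_polls == 0)
        return true;
    ch->remaining_polls--;
    return false;
}

/* Step 3: busy-wait for completion and return how many loop
 * iterations were spent waiting. */
static unsigned long dma_wait_iterations(struct sim_dma *ch)
{
    unsigned long iters = 0;
    while (!sim_dma_done(ch))
        iters++;
    ch->running = false;
    return iters;
}
```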

 

From this measurement we can arrive at the minimum MIPS achievable for a module.
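One possible reading of that estimate, as a sketch: add the measured wait cycles to the module's compute cycles per call and scale by the call rate, treating one cycle as one instruction slot (the loose DSP sense of "MIPS"). The function name, its parameters, and the assumption that the wait cycles are entirely spent busy-waiting are illustrative, not a description of the poster's actual method.

```c
#include <assert.h>

/* Hypothetical helper: core load in millions of cycles per second for a
 * module called calls_per_sec times per second, where each call spends
 * compute_cycles working and dma_wait_cycles spinning on DMA completion. */
static double estimate_mips(double compute_cycles, double dma_wait_cycles,
                            double calls_per_sec)
{
    assert(compute_cycles >= 0.0 && dma_wait_cycles >= 0.0);
    assert(calls_per_sec >= 0.0);
    return (compute_cycles + dma_wait_cycles) * calls_per_sec / 1e6;
}
```

For example, 200000 compute cycles plus 50000 wait cycles per call, called 100 times per second, comes to 25 MIPS.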

 

Gopal


 

 

2009-04-27 15:21:20     Re: Bus Loading

Mike Frysinger (UNITED STATES)

Message: 73317   

 

your libraries won't work under Linux then, because the assumption on your part is that you have access to supervisor-only resources -- you don't


 

 

2009-04-27 15:34:32     Re: Bus Loading

Robin Getz (UNITED STATES)

Message: 73318   

 

Gopal:

 

I'm not discounting the work that you do - we just need to better understand what you are trying to do/measure.

 

As I'm sure that you are aware - saying that XXX algorithm will only consume YYY MIPS is a lie in any environment which is multithreaded and uses data/instruction cache. The performance is too dependent on what else is going on in the system. Whether it takes X cycles or 2 times X cycles for a DMA transaction to complete is a system-level issue - which you can't determine in a general purpose OS environment, where the end user can run anything else they want to in the system, or where the system must respond to other real world events (like a ping flood).

 

We can do all sorts of things (and have done so) to reduce the overhead of running in a multi-threaded cache based environment - but it still depends on what you are really trying to do, and how it fits in with the rest of the Linux infrastructure.

 

Are you doing the DMA from userspace or kernel space? What else is running on the system? What is active externally? When you have been doing your measurements - which phase is taking longer? (setup, start, or transfer?)
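To answer the last question, each phase can be timed separately. A minimal userspace sketch, assuming the module's setup, start, and wait code can be wrapped as three callbacks (hypothetical; noop_phase is only a placeholder) and timed with clock_gettime(CLOCK_MONOTONIC, ...):

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Elapsed nanoseconds between two CLOCK_MONOTONIC readings. */
static long long elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (b->tv_nsec - a->tv_nsec);
}

struct phase_times { long long setup_ns, start_ns, transfer_ns; };

/* Placeholder callback; a real module would pass its own phase code. */
static void noop_phase(void) {}

/* Time setup, start and wait-for-completion separately. */
static struct phase_times time_dma_phases(void (*setup)(void),
                                          void (*start)(void),
                                          void (*wait)(void))
{
    struct timespec t0, t1, t2, t3;
    struct phase_times pt;
    assert(setup && start && wait);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    setup();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    start();
    clock_gettime(CLOCK_MONOTONIC, &t2);
    wait();
    clock_gettime(CLOCK_MONOTONIC, &t3);
    pt.setup_ns = elapsed_ns(&t0, &t1);
    pt.start_ns = elapsed_ns(&t1, &t2);
    pt.transfer_ns = elapsed_ns(&t2, &t3);
    return pt;
}
```

Comparing the three numbers between the standalone build and uClinux would show whether the extra time goes into setup, start, or the transfer itself.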

 

Thanks
