PGO Linker (BF516)

System Info:

Windows 7 Professional 64-bit

Service Pack 1

VDSP Product version: 5.0.10.0, IDDE Ver 8.0.7.15 (July 5, 2011).

Hello,

I read about the PGO linker in this article:

http://www.analog.com/static/imported-files/application_notes/EE_306_Rev_1.pdf

I have a project group consisting of several projects. One project is the start-up project and the other projects are library projects.

I have run profile guided optimization on one of the library projects in a simulator (a software audio codec). I then used the .pgo file to optimize that library project.

Now I want to make sure that code and data are placed optimally, which is how I came across the above-mentioned note about the PGO linker. How do I use the PGO linker to make sure that the code and data in the library project are placed optimally in memory?

The code in the project group normally runs on a hardware board. One of the projects in the project group is a proprietary RTOS. The project that I want to optimize is used by some of the other projects in the project group when audio has to be encoded/decoded.

I followed the instructions in the EE-note (quoted below with my comments in parentheses), but I couldn't get it to work. The instructions also seem to be outdated. I uploaded the DXE to the hardware, ran the code for a while, then manually halted the CPU and proceeded to step 5. I'm not sure whether the PGO linker can only be used for code running in a simulator.

1. Load the program in VisualDSP++.

2. Choose Tools->Profiler to open the Profiler window. (Not present in my version of VDSP. Why?)

3. Run the program with a sample input data set. (Does this mean that the DXE has to be executed in a simulator?)

4. Wait until the program halts or is halted manually. (I halted it manually)

5. Open a Windows Command Prompt window. (OK)

6. Execute the PGO Linker utility with the appropriate command-line arguments (Listing 1). The tool produces an .asm file. (Tool hangs)

----------------------------------------------------

7. Include the generated .asm file in the project.

8. Rebuild the project.

Here's the console output:

C:\Program Files (x86)\Analog Devices\VisualDSP 5.0>PGOLinker.exe "C:\code\program.dxe" "program.asm"

The command line options are configured as follows:-

DSP Executable --> C:\code\program.dxe

Linker directive Map File --> program.asm

Minimum L1 size selected --> 4

Maximum L1 size selected --> 80

L1 memory incremented in steps of --> 4

Algorithm Selected--> O2

Connecting to the IDDE and loading Program

This application has requested the Runtime to terminate it in an unusual way.

Please contact the application's support team for more information.

    •  Analog Employees 
    on Mar 1, 2013 12:16 PM

    Hi,

    I'm not particularly familiar with the PGOLinker, as it isn't part of the VisualDSP++ toolchain - it is a plugin developed by one of our application engineers as an automated assistant for optimal linking. I certainly can't answer all your questions, but I will ask the author of the EE-Note and plugin to take a look at this thread to offer his assistance.

    First, regarding the lack of the 'Tools: Profiler' menu option. I believe the author was trying to generalise to avoid having to say something like "'Tools: Statistical Profiler' for Hardware Sessions, and 'Tools: Linear Profiler' for Simulator Sessions". Select whichever profiler is available for your session.

    Similarly, the EE-Note does say "Used in all VisualDSP++ sessions (simulator, compiled simulator, and emulator)", under the 'Features' section of the EE-Note, so it is not limited to the Simulator. Therefore, running with a Sample Data Set simply means executing the application in a manner that is representative of a typical execution, so that the PGO Linker is using execution data from a typical run to determine which code gets executed most often.

    One other thing I would note is that, as you are trying to determine the placement for a library project, there are additional caveats. Firstly, a Library project does not invoke the Linker and does not produce a DXE file, so in order to profile the Library you would need to profile it as part of an application that demonstrates a typical execution. Depending on how many other libraries/modules are included in your application, the results from a single library risk being irrelevant by the time it is included in your final application. For example, the most-called function of a Library should, as far as optimal linking of that library alone is concerned, be placed in L1 memory. But if that library's functions only account for 2% of the total execution of the final application, it would likely be wasteful to place that function in L1, as there are more frequently called functions elsewhere in the application that would benefit more from placement in L1.

    It really only makes sense to try and determine the ideal linking requirements of a complete application.
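    As a rough illustration of the scaling argument above, a library-local profile figure can be converted to a whole-application figure by multiplying the two shares. This is only a sketch: the function name app_share and the numbers in the usage note are hypothetical and not taken from any profiler output.

```c
#include <assert.h>

/* Fraction of total application cycles spent in one library function.
 * lib_share_of_app:  fraction of application cycles spent in the library (0..1)
 * func_share_of_lib: fraction of the library's cycles spent in the function (0..1)
 * Both inputs are assumed to come from a profile of a typical run. */
double app_share(double lib_share_of_app, double func_share_of_lib)
{
    assert(lib_share_of_app >= 0.0 && lib_share_of_app <= 1.0);
    assert(func_share_of_lib >= 0.0 && func_share_of_lib <= 1.0);
    return lib_share_of_app * func_share_of_lib;
}
```

    For instance, a function that accounts for half of the library's cycles, in a library that accounts for 2% of the application, gives app_share(0.02, 0.5), i.e. about 1% of the application as a whole.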

    Regarding the PGO Linker itself, as I say I will ask the author to take a look at this thread. I would note that it has not been updated since 2006, and was written for/tested against VisualDSP++ 4.0. There may have been changes in the VisualDSP++ toolset over the years that affect the correct execution of the PGO Linker.

    Regards,

    Craig.

    •  Analog Employees 
    on Mar 12, 2013 12:05 PM

    Hi,

    Apologies for the delay on this one, I have been out of the office on business. I have spoken with the author of that application note, and he does not believe the PGOLinker has been kept current with VisualDSP++ releases.

    Investigating why it no longer works with recent VisualDSP++ releases may take some time, unfortunately, as would any updates to the utility.

    In the meantime, we have no alternative but to recommend that you do not use the utility, and instead rely on the results of the Statistical Profiler as guidance for manually controlling the placement of your code for improved performance.

    We can advise further on the methods for controlling code/data placement, if you require information on these techniques.

    Regards,

    Craig.

  • Hi Craig,

    Thank you for your reply.

    Please advise further on the methods for controlling code/data placement. As I understand it, there is internal (on-chip) memory and external memory. The internal memory can be configured (segmented) to hold one or more of the following: data, code, cache, stack and heap. Is that right?

    So what should I be looking at when I decide what goes where?


    •  Analog Employees 
    on Mar 14, 2013 12:16 PM

    Hi,

    The introduction to the EE-Note summarises the basic considerations for code placement, in that wherever possible you should place time-critical as well as most frequently executed code in L1 memory. Similarly, for any data required by these functions (whether it is a heap, or global variables such as data buffers) L1 placement is preferred to avoid the latencies associated with external memory access.

    Lower priority code/data can be placed in external memory. If a large amount of code/data is being placed in SDRAM, and the performance takes too big a hit, you should consider making use of the instruction and/or data caches.
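    One way to turn profiler results into a first-pass L1 candidate list is a simple greedy ranking by "share of execution per byte". The sketch below is only an illustration of that idea, not something the tools do for you: the func_info type, the choose_l1 function and any figures fed to it are hypothetical, and a greedy pass is a heuristic that will not always find the best possible fit. Time-critical code that must be in L1 regardless of how often it runs still has to be handled separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One function as seen in a profile; all fields are filled in by hand
 * (size from the linker map file, percent from the profiler window). */
typedef struct {
    const char *name;
    unsigned size_bytes;  /* code size of the function */
    double percent;       /* share of execution attributed to it */
    int in_l1;            /* output: set to 1 if chosen for L1 */
} func_info;

/* qsort comparator: highest percent-per-byte first. */
static int by_density_desc(const void *a, const void *b)
{
    const func_info *fa = a, *fb = b;
    double da = fa->percent / fa->size_bytes;
    double db = fb->percent / fb->size_bytes;
    return (da < db) - (da > db);
}

/* Greedy pass: sort by density, then mark each function for L1 if it
 * still fits in the remaining budget. Returns the bytes assigned to L1. */
unsigned choose_l1(func_info *f, size_t n, unsigned l1_budget_bytes)
{
    unsigned used = 0;
    for (size_t i = 0; i < n; i++)
        assert(f[i].size_bytes > 0);
    qsort(f, n, sizeof f[0], by_density_desc);
    for (size_t i = 0; i < n; i++) {
        f[i].in_l1 = (f[i].size_bytes <= l1_budget_bytes - used);
        if (f[i].in_l1)
            used += f[i].size_bytes;
    }
    return used;
}
```

    The functions marked in_l1 would then be the ones to tag for L1 placement; everything else stays in external memory, with or without cache.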

    Be aware that the caches use a portion of L1 memory for the storage of cached code/data. So, in enabling the instruction cache you reduce the amount of code that can permanently reside in L1 Code memory by a third (from 48K to 32K). The data cache is slightly more flexible in that you have two Data banks on the BF516, and each of them can be configured as either all-data, or a 50/50 split of data and cache. This allows configurations of:

    1. Two data banks of 32K each
    2. One 32K data bank, one 16K data bank and one 16K data cache
    3. Two 16K data banks and two 16K data caches
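    The arithmetic behind the three configurations above can be written down as a small helper, which may be handy when budgeting buffers. This is a sketch based only on the figures given in this post (two 32K banks, 16K of each usable as cache); the function names are made up for illustration.

```c
#include <assert.h>

enum { BANK_KB = 32, CACHE_KB = 16 };  /* per-bank figures quoted above */

/* L1 data memory (in KB) left as ordinary SRAM when 'cached_banks'
 * of the two data banks (0, 1 or 2) have their cache enabled. */
unsigned l1_data_sram_kb(unsigned cached_banks)
{
    assert(cached_banks <= 2);
    return 2 * BANK_KB - cached_banks * CACHE_KB;
}

/* L1 data memory (in KB) given over to data cache. */
unsigned l1_data_cache_kb(unsigned cached_banks)
{
    assert(cached_banks <= 2);
    return cached_banks * CACHE_KB;
}
```

    With no cache enabled this gives 64K of data SRAM, with one bank cached 48K of SRAM plus 16K of cache, and with both banks cached 32K of SRAM plus 32K of cache.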

    When a cache miss occurs, code is brought in from L3 memory to L1 memory. You take a hit in cycles while the cache line is filled, but subsequent accesses to that line run at L1 memory speeds, which improves the average execution time. That said, you should consider whether the trade-off is worth it, and whether having 48K of L1 code space is more beneficial to your application (in terms of placement of frequently-executed/time-critical code) than reducing that to 32K to add caching of the code in L3.
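    To get a feel for the trade-off, the average cost of a fetch from cached external memory can be estimated with the usual hit/miss weighting. The sketch below is generic arithmetic, not a model of the BF516 memory system: the function name is hypothetical, and the hit rate and cycle counts have to come from your own measurements or the hardware documentation.

```c
#include <assert.h>

/* Average cycles per fetch for code that lives in external memory
 * with the cache enabled.
 * hit_rate:            fraction of fetches served from the cache (0..1)
 * hit_cycles:          cost of a fetch that hits (L1 speed)
 * miss_penalty_cycles: extra cycles spent filling the cache line on a miss
 * All three values are assumptions supplied by the caller. */
double avg_fetch_cycles(double hit_rate, double hit_cycles,
                        double miss_penalty_cycles)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    assert(hit_cycles >= 0.0 && miss_penalty_cycles >= 0.0);
    return hit_rate * hit_cycles
         + (1.0 - hit_rate) * (hit_cycles + miss_penalty_cycles);
}
```

    With made-up figures of a 95% hit rate, 1 cycle per hit and a 40-cycle miss penalty, the average works out to about 3 cycles per fetch, compared with 1 cycle for code resident in L1. Whether that is acceptable depends on how much of your hot code fits in the remaining 32K.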

    As you develop a clear idea of the code/data that is best placed in L1 memory, you can use the 'section' pragmas (#pragma section() and #pragma default_section()) to control the placement of your code and global data. The LDF is littered with default section names in INPUT_SECTIONS commands, or you can add your own. The section names in an INPUT_SECTIONS command are the names in parentheses after the $OBJECTS and $LIBRARIES macros (L1_code in the example below).

    INPUT_SECTIONS($OBJECTS(L1_code) $LIBRARIES(L1_code))

    To place a global symbol (i.e. function, global variable) in a specific memory location, you can add the section pragma:

    #pragma section("L1_code")

    void myFunction(void){...}

    If you use a custom section name, you can gain greater control over the placement of your code and data by also adding your own custom 'output section commands' in the LDF. As the LDF is processed line-by-line, you can create a custom output section or two at the start of the SECTIONS{...} block of the LDF to ensure that your symbols get priority for placement by coming first in the LDF. So you could do something like:

    #pragma section("MyData_in_L1")

    int MyBuffer[1000];

    #pragma section("MyCode_in_L1")

    void MyFunction(void){...}

    Then in the LDF, find an appropriate place for adding some output section commands. If you use a System Builder generated LDF (one added during the New Project wizard, or via 'Project: Project Options: Add Startup Code/LDF') you should only make changes within the generated $VDSG blocks. Fortunately, there is one right at the start of the SECTIONS{...} block of the LDF that you can use. To map the code and data above, we would add two output sections:

    my_l1_data
    {
       INPUT_SECTION_ALIGN(4)
       INPUT_SECTIONS($OBJECTS(MyData_in_L1) $LIBRARIES(MyData_in_L1))
    } > MEM_L1_DATA_A

    my_l1_code
    {
       INPUT_SECTION_ALIGN(4)
       INPUT_SECTIONS($OBJECTS(MyCode_in_L1) $LIBRARIES(MyCode_in_L1))
    } > MEM_L1_CODE_A

    As you mentioned using Libraries in your project, it is important to make sure that the libraries are part of the $LIBRARIES macro if you use the commands above. The $OBJECTS macro picks up the command-line objects (those produced by building the source files that are part of the actual project). The $LIBRARIES macro is defined in the LDF, and can be augmented with additional libraries (again using the $VDSG sections in the appropriate location).

    You can also call out specific files individually. For example, if your code just uses the default section names (i.e. you have not used the section pragmas to 'tag' your code with a custom name), you could take a similar approach to the one above and add custom output sections in the LDF, but call out your specific libraries or objects:

    INPUT_SECTIONS(MyLibrary.dlb(program))

    Hope that helps. If you have any other questions, or want to know about any of these commands in more detail let me know. They are also described in detail in the Linker and Utilities Manual.

    Regards,

    Craig.