Cycle Count Difference in SHARC-FX When Function Definition and Call Are in Separate Files

Category: Hardware
Product Number: ADSP-21835

Dear Team,

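I am working with the ADSP-21834 processor (SHARC-FX). In simulation, I am observing a difference in cycle counts when running the same function in two scenarios: with the function definition and the call in the same file, and with them in separate files.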

Are there any specific compiler settings, optimization behaviors, or inlining considerations in SHARC-FX that could cause such differences?

Please share the root cause of this behavior.

Thanks, and regards

Franky45

Parents
  • Hi,

    We are also observing the same behavior. We will get back to you early next week.

    Regards,
    Santhakumari.V

  • Hi,

    We recommend using the XT_RSR_CCOUNT() built-in function to measure cycle counts on SHARC-FX.

    Below is the sample snippet for your reference.

    #include <stdio.h>
    #include <xtensa/tie/xt_timer.h>

    void myfunc(void);

    int main(void) {
        unsigned int cycles = XT_RSR_CCOUNT(); // start count
        myfunc();
        cycles = XT_RSR_CCOUNT() - cycles;     // end count - start count
        printf("myfunc took %u cycles\n", cycles);
        return 0;
    }

    void myfunc(void) {
        int age;
        int currentYear = 2025;
        int birthYear = 1995;
        age = currentYear - birthYear;
    }

    Hope this helps!

    Regards,
    Santhakumari.V
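
A common refinement of the snippet above, sketched here under stated assumptions rather than taken from the thread: read the counter twice back to back to estimate the measurement overhead, subtract it from the raw result, and store the computed value through a `volatile` so the optimizer cannot delete the body of `myfunc` as dead code. `READ_CYCLES`, `g_sink`, `measure_overhead`, `measure_myfunc`, and `net_cycles` are illustrative names. The `clock()` fallback is an assumption that only lets the sketch build off-target; it counts clock ticks, not cycles.

```c
#include <stdint.h>

#if defined(__XTENSA__)
#include <xtensa/tie/xt_timer.h>
#define READ_CYCLES() ((uint32_t)XT_RSR_CCOUNT())
#else
#include <time.h>
/* Assumption: host-only fallback so the sketch compiles off-target (ticks, not cycles). */
#define READ_CYCLES() ((uint32_t)clock())
#endif

/* Writing the result to a volatile keeps the computation from being optimized away. */
static volatile int g_sink;

void myfunc(void)
{
    int currentYear = 2025;
    int birthYear = 1995;
    g_sink = currentYear - birthYear;
}

/* Cost of two back-to-back counter reads with nothing in between. */
uint32_t measure_overhead(void)
{
    uint32_t start = READ_CYCLES();
    uint32_t end = READ_CYCLES();
    return end - start;
}

/* Raw count for one call to myfunc, including the measurement overhead. */
uint32_t measure_myfunc(void)
{
    uint32_t start = READ_CYCLES();
    myfunc();
    uint32_t end = READ_CYCLES();
    return end - start;
}

/* Raw count minus overhead, clamped at zero. */
uint32_t net_cycles(uint32_t raw, uint32_t overhead)
{
    return (raw > overhead) ? (raw - overhead) : 0u;
}
```

On the target, `net_cycles(measure_myfunc(), measure_overhead())` would give an overhead-corrected count that can be printed with `%u` as in the original snippet.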

  • Hi Santhakumari,

    I have three questions regarding cycle counts:

    • I calculated the cycle counts using the built-in function XT_RSR_CCOUNT(), but I am still observing inconsistencies. When the function call and function definition are in the same file, I measure 23 cycles, whereas when they are in different files, the cycle count increases to 49 cycles. Could you explain why this difference occurs?

    • In SHARC+, we used CYCLES_START() and CYCLES_STOP() to measure cycles, but for SHARC-FX you recommend using the XT_RSR_CCOUNT() built-in function. Could you explain the reason for preferring XT_RSR_CCOUNT() on SHARC-FX?

    • I am seeing higher cycle counts on SHARC-FX for scalar-based operations compared to SHARC+. On SHARC+, the operation takes 5 cycles, whereas on SHARC-FX it takes 22 cycles. Since SHARC-FX is known for optimization, what steps should I follow to optimize these types of operations?

    Please help clarify these points. I have also attached the project files for your reference:

    ADSP-21834_Scalar_based_operation.zip
    ADSP-21593_Scalar_based_operation.zip
    ADSP-21834_Function_and_Call_same_file.zip
    ADSP-21834_Function_and_Call_different_file.zip
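
One plausible contributor to the same-file versus different-file gap in the first question, not confirmed anywhere in this thread, is inlining: when the definition and the call share a source file, the optimizer can inline or simplify the call, whereas a call into another file is usually a real call unless link-time optimization is enabled. The sketch below is one way to test that assumption inside a single file. The names `work_inlinable`, `work_outlined`, `time_inlinable`, `time_outlined`, `READ_CYCLES`, and `g_sink` are illustrative; `__attribute__((noinline))` is a GCC/Clang attribute and is assumed here to be accepted by the SHARC-FX toolchain; the `clock()` fallback only lets the sketch build off-target.

```c
#include <stdint.h>

#if defined(__XTENSA__)
#include <xtensa/tie/xt_timer.h>
#define READ_CYCLES() ((uint32_t)XT_RSR_CCOUNT())
#else
#include <time.h>
/* Assumption: host-only fallback so the sketch compiles off-target (ticks, not cycles). */
#define READ_CYCLES() ((uint32_t)clock())
#endif

/* Volatile sink so neither variant can be removed as dead code. */
static volatile int g_sink;

/* Definition visible at the call site: the compiler may inline it. */
static inline void work_inlinable(void)
{
    g_sink = 2025 - 1995;
}

/* Forced out of line: approximates calling a function defined in another file. */
__attribute__((noinline)) void work_outlined(void)
{
    g_sink = 2025 - 1995;
}

/* Times a direct call that the optimizer is free to inline. */
uint32_t time_inlinable(void)
{
    uint32_t start = READ_CYCLES();
    work_inlinable();
    uint32_t end = READ_CYCLES();
    return end - start;
}

/* Times a direct call that must go through call and return. */
uint32_t time_outlined(void)
{
    uint32_t start = READ_CYCLES();
    work_outlined();
    uint32_t end = READ_CYCLES();
    return end - start;
}
```

If `time_outlined()` lands close to the different-file figure and `time_inlinable()` close to the same-file figure, call overhead and inlining would be a likely explanation; if not, the cause lies elsewhere.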

  • Hi Santhakumari,

    Could you please share an update on this? We need the information urgently. Thank you for your understanding.

    Regards,

    Franky45

  • Hi Franky,

    We have reproduced this issue on our end and are working on it internally.

    We will get back to you as soon as possible.

    Regards,
    Santhakumari.V

  • Hi Santhakumari,

    Could you please share an update on this?
    Thank you for your understanding.

    Regards,

    Franky45

  • Hi,

    Apologies for the delay in response.

    Questions 1 and 3: We are checking these with our internal team and will get back to you as soon as we have a response from them.

    Question 2: The cycle count macros take longer on SHARC-FX for two reasons: the processor architecture is different, and SHARC-FX support was added as an extension to the original SHARC-compatible implementation. SHARC-FX can, however, read the cycle count directly through XT_RSR_CCOUNT(), which reads a register with an assembly instruction and is more efficient.

    The overhead of the cycle count macros is much larger than it needs to be because they were designed to mimic the setup process and stay similar to the SHARC implementation, as part of the migration path for SHARC+ projects.

    Regards,
    Santhakumari.V

  • Hi,

    Please find below a holding response from our internal team.

    We suspect that MMR writes on EHP stall until the register returns an acknowledgement, whereas SHARC+ writes are fire-and-forget. We also found that a read of the same register takes 43 cycles.

    Could you please let us know how important it is that the write is fast? Does the timing matter for you? Configuring the DMA takes a little longer, but surely that isn't done often.

    The internal team is working on this now, and we will reply once we get a response from them.

    Thanks for your understanding.

    Regards,
    Santhakumari.V

Children
  • Hi Santhakumari,

    1) Regarding MMR writes on EHP stalling until acknowledgement versus SHARC+ fire-and-forget behavior:

    Why do MMR writes on EHP stall until the register returns an acknowledgement, whereas SHARC+ writes are fire-and-forget and do not wait for any acknowledgement? Specifically, what mechanism allows SHARC+ to proceed without waiting, and why is the acknowledgement mandatory on SHARC-FX? Where can we find details about this architectural difference? Are there any specific documents we should refer to?

    2) Regarding the reference to DMA configuration:

    Our earlier question was not specific to DMA configuration. The main concern we raised in the previous discussion was about higher cycle counts observed on SHARC-FX when the function call and function definition are placed in different files, compared to when both are in the same file.

    The mention of DMA configuration seems unrelated to this particular observation.

    Our intent is to understand:

        Why the cycle count increases when the function call and function definition are placed in different files, compared to when both are in the same file, on SHARC-FX.

    Clarification on this aspect would help us correctly attribute the root cause of the increased cycle counts we are measuring.

    Looking forward to your insights.

    Regards,

    Franky45