cfftf() and ifftf() fast complex packing and unpacking

Thread Summary

The user is facing issues with the radix-2 cfftf() and ifftf() functions in the CCES / SHARC library, which use separate vectors for real and imaginary components. The final answer suggests writing custom functions to handle complex operations with separate real and imaginary arrays, using compiler pragmas for optimization. The accompanying answer mentions that the query is being checked internally for further responses.
AI Generated Content
Category: Software
Product Number: ADSP-21479

I'm using the optimized radix-2 cfftf() and ifftf() functions in the CCES / SHARC library. These functions are leaving me in the lurch because they use separate vectors for the real and imaginary components, but there are no corresponding optimized vector functions that use this format. Rather, the vector functions take the complex_float format. So in order to apply a basic filter in the frequency domain and convert back, something like this is needed:

  cfftf()       // Convert to frequency-domain representation

  [...]         // Pack from float real, float imag to complex_float

  cvecmltf()    // Apply filter

  [...]         // Unpack from complex_float to float real, float imag

  ifftf()       // Convert to time domain

  [...]         // Pack from float real, float imag to complex_float

What is needed are complex functions (like cvecmltf) that take separate real and imaginary arrays instead of complex_float. Alternatively, a highly optimized vector pack and unpack to and from complex_float would do.
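
For reference, the pack and unpack steps described above can be sketched in plain C as follows. This is only a minimal, unoptimized illustration: cplx_pair is a hypothetical stand-in for the library's complex_float (assumed here to be an interleaved re/im pair), and the function names pack_split_to_interleaved and unpack_interleaved_to_split are made up for this sketch, not library calls.

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's complex_float, assumed to be an
   interleaved (re, im) pair. On the target, use the library's own type. */
typedef struct {
    float re;
    float im;
} cplx_pair;

/* Pack split arrays into interleaved form: out[i] = (re[i], im[i]). */
void pack_split_to_interleaved(const float *re, const float *im,
                               cplx_pair *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].re = re[i];
        out[i].im = im[i];
    }
}

/* Unpack interleaved pairs back into separate real and imaginary arrays. */
void unpack_interleaved_to_split(const cplx_pair *in,
                                 float *re, float *im, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        re[i] = in[i].re;
        im[i] = in[i].im;
    }
}
```

Each call makes one full pass over the data, so a filter step built this way costs two extra passes per block on top of the multiply itself, which is the overhead I would like to avoid.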

I can write the SHARC assembly to do this, but was hoping to avoid it, or to find a library function I have overlooked.

Any suggestions?   

  • Hi,

    Thank you for your inquiry.

    We are checking this query internally and will get back to you once we have a response.

    Best Regards,
    Santhakumari.V

  • Hi,

    Chris is correct - there are no complex functions that operate on inputs where the real and imaginary parts are in separate arrays. As he says, one solution is to write functions to pack and unpack to/from complex_float. However, another solution might be to write versions of the complex functions that take the inputs as separate arrays - with the appropriate pragmas and optimization enabled, these functions might give better overall performance than using the library functions with code to pack/unpack. For example:

    // Perform a complex float multiply where the real and imaginary parts of the inputs and
    // outputs are in separate arrays
    void alt_cvecvmlt(float *x_r, float *x_i, float *y_r, float *y_i, float *out_r, float *out_i, int size)
    {
      #pragma no_alias
      #pragma loop_count(2, 10000, 2)
      for (int i = 0; i < size; i++) {
        out_r[i] = x_r[i] * y_r[i] - x_i[i] * y_i[i];
        out_i[i] = x_r[i] * y_i[i] + x_i[i] * y_r[i];
      }
    }


    This function will execute 2 iterations of the loop in 6 cycles.


    It's worth explaining the pragmas that are used, as these have a significant impact on the performance.

    • "no_alias" tells the compiler that the output arrays do not overlap with any other arrays
    • "loop_count(2, 10000, 2)" tells the compiler that 'size' will be at least 2, at most 10000, and always a multiple of 2


    You would need to confirm that these pragmas can be used in your code.
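
    One way to confirm this at run time, at least in debug builds, is a small precondition check before calling the function. The sketch below is only an illustration: alt_cvecvmlt_args_ok and ranges_overlap are hypothetical helper names, and the overlap test assumes a flat address space where pointers can be compared after conversion to uintptr_t.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the n-float ranges starting at a and b share any element.
   Assumes a flat address space (pointer values compared as integers). */
bool ranges_overlap(const float *a, const float *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return pa < pb + bytes && pb < pa + bytes;
}

/* Hypothetical check that the arguments satisfy what the two pragmas promise:
   loop_count(2, 10000, 2) -> 2 <= size <= 10000 and size is a multiple of 2
   no_alias                -> the outputs overlap neither each other nor any input */
bool alt_cvecvmlt_args_ok(const float *x_r, const float *x_i,
                          const float *y_r, const float *y_i,
                          const float *out_r, const float *out_i, int size)
{
    if (size < 2 || size > 10000 || size % 2 != 0) {
        return false;
    }

    size_t n = (size_t)size;
    const float *inputs[4] = { x_r, x_i, y_r, y_i };

    if (ranges_overlap(out_r, out_i, n)) {
        return false;
    }
    for (int k = 0; k < 4; k++) {
        if (ranges_overlap(out_r, inputs[k], n) ||
            ranges_overlap(out_i, inputs[k], n)) {
            return false;
        }
    }
    return true;
}
```

    If the check fails (for example, an odd 'size' or an in-place call where an output array is also an input), the pragmas would be making promises that do not hold, so a version of the loop without them would be the safer choice for that call.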

    I haven't looked at the overall performance of this approach vs Chris's approach - it's just something that might be worth investigating. Also, if Chris is using several complex functions, then more work would be required to write versions of them, and it might not be worthwhile.

    Thanks,
    Kenny
