I'm using the optimized radix-2 cfftf() and ifftf() functions in the CCES / SHARC library. These functions leave me in a lurch because they use separate vectors for the real and imaginary components, but there are no corresponding optimized vector functions that use this format; the vector functions take complex_float instead. So to apply a basic filter in the frequency domain and convert back, something like this is needed:
cfftf() // Convert to freq domain representation
[...] // Pack from float real, float imag to complex_float
cvecmltf() // Apply filter
[...] // Unpack from complex_float to float real, float imag
ifftf() // Convert to time domain
[...] // Pack from float real, float imag to complex_float
What is needed are complex functions (like cvecmltf) that take separate component arrays instead of complex_float. Alternatively, a highly optimized vector pack and unpack to and from complex_float would do.
I can write the SHARC assembly to do this, but was hoping to avoid it, or to find something in the library that I have overlooked.
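For reference, the pack and unpack steps amount to the following, shown here as a plain C sketch rather than optimized SHARC code. The type cplx_f32 is a hypothetical stand-in for the library's complex_float so the example compiles on its own, and the function names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's complex_float type, defined here
   only so this sketch is self-contained. */
typedef struct {
    float re;
    float im;
} cplx_f32;

/* Pack: interleave separate real and imaginary arrays into one complex array. */
void pack_complex(const float *re, const float *im, cplx_f32 *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].re = re[i];
        out[i].im = im[i];
    }
}

/* Unpack: split one complex array back into separate real and imaginary arrays. */
void unpack_complex(const cplx_f32 *in, float *re, float *im, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        re[i] = in[i].re;
        im[i] = in[i].im;
    }
}
```

Each call is a full pass over the data with no arithmetic, which is why doing it around every vector operation is unattractive.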
Any suggestions?
Divya.P - Moved from SHARC Processors to CrossCore Embedded Studio and Add-ins. Post date updated from Monday, June 30, 2025 5:58 PM UTC to Tuesday, July 1, 2025 4:40 AM UTC to reflect the move.
cconrad - Moved from CrossCore Embedded Studio and Add-ins to SHARC Processors. Post date updated from Tuesday, July 1, 2025 4:40 AM UTC to Tuesday, July 1, 2025 2:09 PM UTC to reflect the move.
Hi,
Thank you for your inquiry.
We are checking this query internally and will get back to you once we have a response.
Best Regards,
Santhakumari.V
Hi,
Chris is correct - there are no complex functions that operate on inputs where the real and imaginary parts are in separate arrays. As he says, one solution is to write functions to pack and unpack to/from complex_float. However, another solution might be to write versions of the complex functions that take the inputs as separate arrays - with the appropriate pragmas and optimization enabled, these functions might give better overall performance than using the library functions with code to pack/unpack. For example:
// Perform a complex float multiply where the real and imaginary parts of the inputs and
// outputs are in separate arrays
void alt_cvecvmlt(float *x_r, float *x_i, float *y_r, float *y_i, float *out_r, float *out_i, int size)
{
    #pragma no_alias
    #pragma loop_count(2, 10000, 2)
    for (int i = 0; i < size; i++) {
        out_r[i] = x_r[i] * y_r[i] - x_i[i] * y_i[i];
        out_i[i] = x_r[i] * y_i[i] + x_i[i] * y_r[i];
    }
}
This function will execute 2 iterations of the loop in 6 cycles.
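It's worth explaining the pragmas that are used, as these have a significant impact on the performance. #pragma no_alias tells the compiler that the loads and stores in the loop do not alias one another, so the input and output arrays must not overlap. #pragma loop_count(2, 10000, 2) gives the minimum trip count, the maximum trip count, and a modulo: the loop runs at least 2 and at most 10000 times, and the count is always a multiple of 2.
You would need to confirm that these assumptions hold in your code before using the pragmas.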
I haven't looked at the overall performance of this approach vs Chris's approach - it's just something that might be worth investigating. Also, if Chris is using several complex functions, then more work would be required to write versions of them, and it might not be worthwhile.
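If the filtered spectrum can overwrite the FFT output, an in-place variant avoids the extra output buffers. The following is a minimal portable C sketch, not a tuned SHARC routine; the name filter_split_inplace and its parameter names are hypothetical, and no pragmas are applied, so its cycle count on the target is unknown.

```c
#include <stddef.h>

/* Hypothetical helper (not a library function): multiply a spectrum held as
   separate real/imaginary arrays by a filter response held the same way,
   writing the result back over the spectrum. */
void filter_split_inplace(float *spec_r, float *spec_i,
                          const float *h_r, const float *h_i, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Copy both parts first so that storing the real result does not
           clobber the value still needed for the imaginary result. */
        const float a = spec_r[i];
        const float b = spec_i[i];
        spec_r[i] = a * h_r[i] - b * h_i[i];
        spec_i[i] = a * h_i[i] + b * h_r[i];
    }
}
```

The local copies matter: calling the two-output version above with out_r equal to x_r would overwrite x_r[i] before the imaginary part is computed, and would also break the no-overlap assumption behind #pragma no_alias.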
Thanks,
Kenny