Hi,

We use __builtin_conv_FtoR() to convert floating-point values to fixed-point values. If we input -1.0F we get 0x80000000, which seems to be correct. However, +1.0F gives 0xffffffff, which is wrong. The largest value that gives a correct output is 1.0 – 129*10^-32, which gives 0x7fffff80. What is the reason for that? Why can't we go up to 0x7fffffff?

Regards,

Krishna.

Hi Krishna,

Fractional arithmetic is based on two's complement fixed-point arithmetic. It typically handles overflow by a technique known as "saturation": a positive result that would overflow is set to the largest positive value, and a negative result that would overflow is set to the most negative value. In the C standard, an operation whose result cannot be represented in the space provided is undefined behavior, so if you want operations that saturate on overflow, we recommend using the compiler built-in functions.

This built-in function converts floating-point data to the 1.31 fixed-point format, which covers the range -1.0 up to 1.0 - 2^-31, so +1.0 itself cannot be represented. A single-precision float has a 24-bit significand, so the largest float below 1.0 is 1.0 - 2^-24 (about 0.99999994), which is exactly 0x7fffff80 in 1.31 format. That is why an input of 0.99999997 gives 0x7fffff80; any value above that overflows and gives 0xffffffff. The value 0x7fffffff corresponds to 1.0 - 2^-31, which a single-precision float cannot hold.
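To illustrate the numbers above, here is a minimal portable C sketch of a float to 1.31 conversion that clamps out-of-range inputs. The function name float_to_q31_sat is hypothetical and is not part of the toolchain; it is not the implementation of __builtin_conv_FtoR(), only an assumption about how a saturating conversion could be written by hand.

```c
#include <stdint.h>

/* Hypothetical helper (not a compiler built-in): converts a float to
   1.31 fixed point, clamping inputs outside [-1.0, 1.0). */
static int32_t float_to_q31_sat(float x)
{
    if (x != x) {
        return 0;               /* NaN: return 0 (an arbitrary choice) */
    }
    if (x >= 1.0f) {
        return INT32_MAX;       /* clamp to 0x7fffffff */
    }
    if (x <= -1.0f) {
        return INT32_MIN;       /* clamp to 0x80000000 */
    }
    /* Scaling by 2^31 is exact for a float in (-1.0, 1.0); the cast
       then truncates any remaining fraction toward zero. */
    return (int32_t)(x * 2147483648.0f);
}
```

With this sketch, 0.99999994F (the largest float below 1.0) converts to 0x7fffff80 and +1.0F is clamped to 0x7fffffff. No float input can produce a value between those two, because a float carries only 24 significant bits.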

The compiler provides built-in functions that perform fractional arithmetic. These built-ins are described in the section "Fractional Value Built-In Functions in C" of the C/C++ Compiler and Library manual. The manual is available via the links given below.

For VisualDSP++:

http://www.analog.com/static/imported-files/software_manuals/50_bf_cc_rtl_mn_rev_5.4.pdf

For CCES:

http://www.analog.com/static/imported-files/software_manuals/cces_1-0-2_comp_lib_bf_man_rev.1.2.pdf

Thanks,

Jithul