Hi,

I have written assembly code that performs 40-bit floating-point computation on stereo (left and right) input. The code runs in SIMD mode, and optimization is enabled in the project properties. Because the input is stereo, the left (L) channel computations are done in the R registers and the right (R) channel computations in the S registers. 40-bit operation is enabled by clearing the RND32 bit in the MODE1 register.

Test environment:
I am using CCES version 2.9.1. I tested the assembly code for the ADSP-21569 processor in both the cycle-accurate simulator and the functional simulator.

Issue:
The cycle-accurate simulator and the functional simulator do not give the same results. When I checked the differences, it is the right (R) channel results from the functional simulator that differ.

Observations:
The right channel computations are done in the S (shadow) registers. In the functional simulator, 40-bit computation in the S registers gives inconsistent results. For example, when one register's value is assigned to another register, the value in the destination register is not the same (the precision changes). These incorrect values lead to the differences in the results.
Questions:
My understanding is that the cycle-accurate simulator and the functional simulator should give the same results. Why does the functional simulator deviate?
In SIMD mode, why does 40-bit floating-point computation in the S registers give inconsistent results in the functional simulator?
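To give a sense of scale for the "precision changed" observation, here is a minimal Python sketch comparing the same value rounded to two significand widths. It assumes the 40-bit extended format stores 31 fraction bits (1 sign, 8 exponent, 31 fraction) and that the degraded copy behaves like a 32-bit float with 23 fraction bits. Both are assumptions for illustration. The helper `round_to_fraction_bits` is hypothetical, and the sketch does not model either simulator or explain why the functional simulator behaves this way.

```python
import math

def round_to_fraction_bits(x: float, fraction_bits: int) -> float:
    """Round x to a significand with the given number of stored fraction bits."""
    if x == 0.0:
        return 0.0
    # x = mantissa * 2**exponent, with 0.5 <= |mantissa| < 1
    mantissa, exponent = math.frexp(x)
    # +1 accounts for the implicit leading bit of the significand
    scale = 2 ** (fraction_bits + 1)
    return math.ldexp(round(mantissa * scale) / scale, exponent)

value = 1.0 / 3.0
as_40bit = round_to_fraction_bits(value, 31)  # assumed 40-bit layout: 31 fraction bits
as_32bit = round_to_fraction_bits(value, 23)  # 32-bit single precision: 23 fraction bits

# The two roundings disagree in the low-order bits of the significand.
print(as_40bit == as_32bit)
print(abs(as_40bit - as_32bit))
```

If a register-to-register copy silently dropped the extra low-order bits, an error of this order would enter the right channel and then propagate through later operations.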
Hi Guruprasath,

Yes, your understanding is correct. The cycle-accurate simulator and the functional simulator are expected to give the same results.

We tried to reproduce your issue with our example project, but were not able to. Could you please share a sample project that replicates the issue, along with a screenshot? This will help us assist you further.

Regards,
Nishanthi.V
Hi Nishanthi,

Thanks for the reply. I have attached a sample project that replicates the issue.

The project reads 256 samples from the input binary file, performs some 40-bit operations using the assembly code, and writes the 256 output samples to the output binary file. I ran the project in both the cycle-accurate and the functional simulator for the ADSP-21569.

When we compare the output binary files, differences are observed. I also took memory dumps of the output buffer in both cases, and they show differences as well. I have attached the output binary files and the memory dumps.

Thanks and Regards,
Guruprasath.
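For anyone repeating this comparison, a word-by-word diff of the two output buffers shows which sample positions differ. The sketch below is only an illustration: the function name `diff_words` is hypothetical, the 4-byte little-endian word size is an assumption about the file layout (the thread does not state it), and the byte strings are synthetic stand-ins for the attached output files.

```python
import struct

WORD_BYTES = 4  # assumption: one sample per 32-bit little-endian word

def diff_words(a: bytes, b: bytes, word_bytes: int = WORD_BYTES):
    """Return (index, word_a, word_b) for each word position where a and b differ."""
    if len(a) != len(b):
        raise ValueError("buffers have different lengths")
    if len(a) % word_bytes != 0:
        raise ValueError("buffer length is not a multiple of the word size")
    diffs = []
    for offset in range(0, len(a), word_bytes):
        word_a = int.from_bytes(a[offset:offset + word_bytes], "little")
        word_b = int.from_bytes(b[offset:offset + word_bytes], "little")
        if word_a != word_b:
            diffs.append((offset // word_bytes, word_a, word_b))
    return diffs

# Synthetic stand-ins for the two simulator outputs (not data from the thread).
cycle_accurate = struct.pack("<4I", 0x3F000000, 0x3E800000, 0x3F800000, 0x40000000)
functional = struct.pack("<4I", 0x3F000000, 0x3E800001, 0x3F800000, 0x40000000)

for index, word_a, word_b in diff_words(cycle_accurate, functional):
    print(f"word {index}: {word_a:#010x} vs {word_b:#010x}")
```

With real dumps, read each file with `open(path, "rb").read()` and pass the two byte strings to `diff_words`. If the differing indices fall only on the positions that hold right-channel samples, that is consistent with the S-register observation above.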
Hi,

We were also able to reproduce the behavior with your attached project. We also ran it on real hardware, and the output binary files and memory dumps match only those from the cycle-accurate simulator.

So, the typical approach is to use the functional simulator when speed is essential and the cycle-accurate simulator when specific cycle-accurate details are essential.

The cycle-accurate simulator is the only simulator suitable for determining the real-world performance of your application. It models latencies, such as multi-cycle instructions, L1 data memory stalls and instruction latencies, allowing you to obtain real-world cycle counts and performance figures.

For speeding up simulation, the best solution we can offer is to use the functional simulator instead of the cycle-accurate simulator. The functional simulator is a separate simulation product that runs far faster than the cycle-accurate simulator, and the larger the application, the more significant the improvement in simulation speed. It suits cases where the speed of simulation does not have to be faithful to the real processor: the functional simulator is deliberately not cycle accurate, as it is intended purely as a very fast functional simulator.

Regards,
Nishanthi.V