[#6546] The simulator does not handle implicit DISALGNEXCPT instruction.

Document created by Aaronwu Employee on Oct 17, 2013
Version 1Show Document
  • View in full screen mode

[#6546] The simulator does not handle implicit DISALGNEXCPT instruction.

Submitted By: Anton Shokurov

Open Date

2011-04-14 14:54:29     Close Date

2011-04-16 13:28:51

Priority:

Medium     Assignee:

Mike Frysinger

Board:

N/A     Silicon Revision:

Resolution:

Fixed     Fixed In Release:

N/A

Processor:

ALL     

Host Operating System:

toolchain rev.:

    kernel rev.:

State:

Closed     Found In Release:

N/A

Is this bug repeatable?:

N/A     

Summary: The simulator does not handle implicit DISALGNEXCPT instruction.

Details:

 

This is the continuation of "i get an Unhandled exception and Illegal instruction error when run in simulator"

 

It turns out that i was using the simulator correctly in the first place. The problem was that the simulator does not generate an implicit DISALGNEXCPT instruction for the Quad 8 bit instructions:

 

BYTEPACK, BYTEUNPACK,

SAA,

BYTEOP2P, BYTEOP3P, BYTEP1P,

BYTEOP16P, BYTEOP16M.

 

I have attached a fixed file for this.

 

I have also noted that the simulator does not handle correctly (very long) instructions such as:

 

R6 = BYTEOP3P(R1:0, R3:2) (LO)    || I3+=M3          || [I2++] = R6;

 

The instruction "[I2++] = R6" should write the old R6, not the new that was evaluated at "R6 = BYTEOP3P(R1:0, R3:2) (LO)". This should be fixed probably by doing it the way the real processor does: you do the reads as usual and queue the writes. They get executed after the very long instruction has been completed. BTW, the attached file also has a patch for the instruction:

"R%i = R%i + R%i, R%i = R%i - R%i%s;"

that was implemented as:

 

STORE (DREG (dst1), add32 (cpu, DREG (src0), DREG (src1), 1, s));

STORE (DREG (dst0), sub32 (cpu, DREG (src0), DREG (src1), 1, s, 1));.

This implementation (considering to what i have said before) does not work correctly, since the dst1 can match the src. Therefore will execute incorrectly. The correct code (for the time being) is:

 

bu32 val0 = add32 (cpu, DREG (src0), DREG (src1), 1, s);

      bu32 val1 = sub32 (cpu, DREG (src0), DREG (src1), 1, s, 1);

     

      STORE (DREG (dst1), val0);

      STORE (DREG (dst0), val1);.

 

 

Follow-ups

 

--- Mike Frysinger                                           2011-04-14 15:16:01

please post patches (in the unified format), not entire files.  things change

quickly and we have a hard time figuring out what version of the file you're

using.

 

--- Anton Shokurov                                           2011-04-14 15:39:55

I've posted the patch.

 

--- Mike Frysinger                                           2011-04-14 18:27:09

thanks, that makes things clear.  ive committed the implicit DISALGNEXCPT

support as well as a test case for it.  for future note though, please file 1

tracker item per issue.  you've got three here ...

 

as for your proposed change to the multi-issue add/sub insn, i dont see how

your patch makes any difference.  the code is going through the store buffer

when means the actual move to dst1/dst0 doesnt occur until the store buffer is

processed -- at the end of interp_insn_bfin.  if you run with -t --trace-core,

you can clearly see this behavior:

...

reg:      0x0000ae #5    __start         -wrote R0 = 0x87654321

...

reg:      0x0000b6 #6    __start         -wrote R1 = 0x12345678

...

insn:     0x0000ba #7    __start         -R0 = R0 + R1, R1 = R0 - R1 (NS);

reg:      0x0000ba #7    __start         -queuing write R0 = 0x99999999

reg:      0x0000ba #7    __start         -queuing write R1 = 0x7530eca9

reg:      0x0000ba #7    __start         -dequeuing write R0 = 0x99999999

reg:      0x0000ba #7    __start         -dequeuing write R1 = 0x7530eca9

...

 

and indeed, that's the answer the hardware gives too.

 

--- Mike Frysinger                                           2011-04-14 20:39:15

ive committed a fix for a bunch of 32bit insns to use the store buffer so that

output regs dont get clobbered early when used in parallel insns.  this should

address the insn you quoted:

    R6 = BYTEOP3P(R1:0, R3:2) (LO)    || I3+=M3          || [I2++] = R6;

 

--- Anton Shokurov                                           2011-04-15 11:07:36

>>as for your proposed change to the multi-issue add/sub insn, i dont see

how

your patch makes any difference.

 

First of all i did not know that queue is indeed implemented, second of all i

desperately needed the code running, so I guess I got carried away patching the

code. Nevertheless the code does look strange if one does not know that the

writes are being queued.

 

Thanks for the fix.

 

--- Mike Frysinger                                           2011-04-15 11:50:00

so current trunk seems to be working for you with all your needs ?

 

--- Anton Shokurov                                           2011-04-16 02:35:58

Yes. It works like a charm, that is mpeg video gets properly coded. Although i

do still have to figure out why the output stream does not match (bit for bit)

the one generated on the x86. They are very close in size, but they do not match

bit for bit.

 

 

 

    Files

    Changes

    Commits

    Dependencies

    Duplicates

    Associations

    Tags

 

File Name     File Type     File Size     Posted By

bfin-sim.c    text/x-csrc    174365    Anton Shokurov

enabl_impl_disbl_algn.diff    text/x-patch    3213    Anton Shokurov

Outcomes