I am a firmware engineer for a biotech company in Texas. I was working on a legacy product refresh project when we ran into some issues.
The board design is about 20 years old, and a lot of the components had reached end of life. So we decided to keep the main architecture and components the same and replace the obsolete components with an FPGA and new USB controller chips.
We built prototypes and tested them; all were working great with the stock firmware. We wrote new firmware only for the FPGA, which mimics the glue logic and the old USB transceiver, and for the new Cypress USB controller.
We ordered 9 boards with the new design, confident because we had tested it on the prototypes.
Only 4 boot up; 5 are not responding.
We currently do not have any debugging tools. What we are doing is forcing the microcontroller (ST10R167) firmware to write to an unused USB register address, and monitoring the address bus using SignalTap on the FPGA.
With the DSP, we are injecting dummy I/O reads to various addresses at various locations in the firmware (which is written in assembly), programming it, and then observing the bus for that particular access.
So the only information we have is whether a certain part of the code gets hit.
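To make the breadcrumb idea above concrete, here is a minimal host-side sketch in C. It is only an illustration, not our actual firmware (which is assembly): `trace_port`, `trace_mark`, `trace_log` and the marker values are hypothetical names, and the "port" is an ordinary variable standing in for the unused address that the FPGA decodes.

```c
#include <stdint.h>

/* Hypothetical stand-in for the unused I/O address decoded by the FPGA.
   On real hardware this would be a fixed bus address; here it is a plain
   variable so the sketch can run on a host PC. */
static volatile uint16_t trace_port;

/* Host-side log standing in for what the logic analyzer would capture. */
#define TRACE_LOG_SIZE 16u
static uint16_t trace_log[TRACE_LOG_SIZE];
static unsigned trace_count;

/* Emit a breadcrumb: one access to the trace port carrying a marker ID.
   Each call site in the firmware would use a different marker_id. */
void trace_mark(uint16_t marker_id)
{
    trace_port = marker_id;               /* the access seen on the bus */
    if (trace_count < TRACE_LOG_SIZE) {
        trace_log[trace_count++] = marker_id;
    }
}

/* Return the last marker captured, or -1 if nothing was captured yet. */
int trace_last(void)
{
    return trace_count ? (int)trace_log[trace_count - 1u] : -1;
}
```

With one marker per code location, the last marker captured before the bus goes quiet bounds where execution stopped, which is all the visibility this method gives.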
What we are seeing is that the DSP stops running its main loop.
This routine is supposed to just keep calling itself at the end of the routine, like an infinite loop. All other major functions are interrupt driven.
One of the continuous ADC read operations is done by feeding SCLK (the serial clock from SPORT1) back into the IRQ2 pin. This ISR fires correctly, and we see that the routine executes the 5 read operations correctly and on time. But there is no evidence of the main loop running after a few iterations.
During startup the main loop executes a few times. We expect it to keep running, since it's the one that handles some command parsing and the handshake that tells the micro that the DSP is active, responding, and running the correct firmware.
But the main loop dies for some reason, and we don't have a clue why.
We checked signal integrity on the data bus, and there seems to be no apparent noise (although we are not fully sure how to confirm that).
We checked the clock going into CLKIN. It looks fine. CLKOUT is 2x CLKIN, at 32.168 MHz.
The DSP and the micro communicate using a shared memory architecture. Both devices write to and read from the shared memory to exchange data and commands.
Different commands are signalled by different interrupt lines pulled by the micro.
The job of the main routine is to check for "CR" in the command word (a shared memory variable) and jump to a label called Begin, which computes the CRC and writes "SB" (standby) into the command word. The micro writes "CR" into that word and waits indefinitely for the DSP to write "SB". But the DSP never sees the "CR", because the main loop that's supposed to catch it has stopped executing!
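The handshake described above can be modelled on a host PC with a short C sketch. This is a simplified model under stated assumptions, not the real code: the struct layout, the ASCII packing of "CR" and "SB", `compute_crc_placeholder` and its return value are all hypothetical, and the micro side polls a bounded number of times (the real micro waits indefinitely) so that the simulation terminates.

```c
#include <stdint.h>

/* Assumed encoding: two ASCII characters packed into one 16-bit word. */
#define CMD_CR 0x4352u   /* 'C' = 0x43, 'R' = 0x52 */
#define CMD_SB 0x5342u   /* 'S' = 0x53, 'B' = 0x42 */

/* Hypothetical layout of the shared memory used by both chips. */
typedef struct {
    volatile uint16_t command_word;
    volatile uint16_t crc;
} shared_mem_t;

/* Placeholder only: the real routine computes a CRC over program memory. */
uint16_t compute_crc_placeholder(void)
{
    return 0x1234u;
}

/* One pass of the DSP main loop. Returns 1 if a "CR" request was handled. */
int dsp_main_pass(shared_mem_t *shm)
{
    if (shm->command_word != CMD_CR) {
        return 0;                          /* nothing to do this pass */
    }
    shm->crc = compute_crc_placeholder();  /* the "Begin" step */
    shm->command_word = CMD_SB;            /* report standby */
    return 1;
}

/* Micro side: write "CR", then poll for "SB". dsp_alive simulates whether
   the DSP main loop is still running. Returns 1 on success, 0 on timeout. */
int micro_request_crc(shared_mem_t *shm, unsigned max_polls, int dsp_alive)
{
    shm->command_word = CMD_CR;
    for (unsigned i = 0; i < max_polls; i++) {
        if (dsp_alive) {
            dsp_main_pass(shm);
        }
        if (shm->command_word == CMD_SB) {
            return 1;
        }
    }
    return 0;
}
```

Calling `micro_request_crc` with `dsp_alive` set to 0 leaves the command word stuck at "CR", which is the state our failing boards end up in.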
Here is the sequence of events that happens on power-up.
1. The DSP loads initial firmware from a flash chip into program memory (PM). Part of the program memory is the "softload" that handles firmware download from the micro.
2. The micro sets the PWD pin to 0 to force the DSP into powerdown mode, where the softload resides. It then downloads a new firmware image to the DSP, while the DSP overwrites its own PM with the opcodes that it's receiving from the micro. The micro's firmware has an image of the DSP firmware built into it.
3. After the firmware download completes, the PWD pin is set to 1 so execution resumes. A CRC of the PM is computed and saved. The main loop is now running fine along with IRQ2.
4. After about 360753 iterations, the main loop stops running. So when the micro writes "CR" to the command word, the DSP never recognizes it and never writes "SB" to it, so both chips hang and wait.
I am puzzled about what could cause the main loop to die.
I have stripped the one complicated function call from the loop. Now there are only a few memory reads, writes, and ENA INT commands in that "mainbody". It still dies. I was hoping to find an IDLE, but there is none.
We are completely stumped and are running out of ideas.