
LTC4332 FAULT does not go away even after clearing EVENT register and resetting ON pin for 180ms

Category: Hardware
Product Number: LTC4332

The LTC4332 works flawlessly on the majority of our systems, but on a couple of systems SPI communication suddenly stops after a couple of hours of normal operation. I investigated and found that the FAULT bit is set in the EVENT register (the EVENT register reads 0x05). On further investigation, I saw that the FAULT register reads 0x11, which means the RX_BUF_UNDERFLOW and SPI_WRITE_FAULT bits are set, pointing to some SPI communication error. Power cycling the system fixes the issue immediately, and the system then runs normally for the next few hours until it happens again. We are still unsure what causes the communication error, and the investigation is ongoing. The wires and connectors between the local and remote sides look good, and the issue does not go away if we replace them. So far the issue is always seen on the remote side, because the problem travels with the faulty remote boards. However, we fear that this could just as well happen on the local board, so we cannot be certain that only the remote side is affected.
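For anyone cross-checking the two register values against the datasheet bit tables, here is a minimal sketch that lists which bit positions are set in each value. The helper name `set_bits` is made up for illustration, and the sketch deliberately does not assign names to bit positions, since that mapping has to come from the datasheet.

```python
def set_bits(value, width=8):
    """Return the positions (0 = LSB) of the bits that are set in `value`."""
    return [bit for bit in range(width) if value & (1 << bit)]


# Register values as reported in this post.
EVENT_VALUE = 0x05
FAULT_VALUE = 0x11

print("EVENT bits set:", set_bits(EVENT_VALUE))  # bit positions 0 and 2
print("FAULT bits set:", set_bits(FAULT_VALUE))  # bit positions 0 and 4
```

Logging the decoded positions together with a timestamp each time the fault occurs may help show whether the same two FAULT bits are set on every occurrence.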

Question:

While we figure out the root cause, we would still like to be able to restore communication with a firmware/hardware workaround. So, can I restore communication without power cycling the system or the LTC4332? I'm clearing the EVENT register by writing 0 to it, but this does not restore communication. I'm also pulling the ON pin low for >180ms (~200ms) to trigger a remote reset, but communication is still not restored, and the EVENT register continues to show a communication fault even after these steps. I do reinitialize the LTC4332 registers after the remote reset, but the fault comes back as soon as any attempt is made to communicate over the LTC4332. What else can I try? The circuit on the local side is shown below:

The remote connections are shown below:

Any pointers/ideas on this would be very helpful.
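The recovery sequence described above can be sketched roughly as follows, written as a hardware-independent model so that each step can be instrumented. This is only an illustration of the attempted steps, not a known fix. Every name here (`try_recover`, `set_on_pin`, `reinit_registers`, `clear_event`, `link_ok`) is a hypothetical placeholder for board-specific code, and the order of the steps after the ON pulse is an assumption.

```python
import time

ON_LOW_HOLD_S = 0.200  # ON held low for ~200 ms (> 180 ms), as described above


def try_recover(set_on_pin, reinit_registers, clear_event, link_ok,
                hold_s=ON_LOW_HOLD_S, attempts=3, sleep=time.sleep):
    """Run the ON-pin reset sequence up to `attempts` times.

    All four callables are hypothetical hooks supplied by the caller.
    Returns the 1-based attempt number that succeeded, or 0 if none did.
    """
    for attempt in range(1, attempts + 1):
        set_on_pin(False)    # drive ON low to request a remote reset
        sleep(hold_s)        # hold it low for longer than 180 ms
        set_on_pin(True)     # release ON
        reinit_registers()   # rewrite the LTC4332 configuration registers
        clear_event()        # write 0 to the EVENT register
        if link_ok():        # caller decides what "no fault" means
            return attempt
    return 0


# Dry run with stub hooks (no hardware access, no real delay).
log = []
result = try_recover(
    set_on_pin=lambda level: log.append(("ON", level)),
    reinit_registers=lambda: log.append("reinit"),
    clear_event=lambda: log.append("clear_event"),
    link_ok=lambda: True,
    sleep=lambda seconds: log.append(("sleep", seconds)),
)
print(result, log)
```

Passing the hardware access in as callables keeps the sequence itself testable on a desk and makes it easy to log exactly which step precedes the fault reappearing.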

Thanks

  • What is your SPI clock speed? 

    The RX_BUF_UNDERFLOW fault can occur when the receive buffer overflows, which can happen when the local SPI frequency exceeds the configured SPI clock speed.

    Eric

  • Thanks for your response. We have grounded both the SPEED1 and SPEED2 pins, so the configured SPI frequency is 2MHz. Also, the cable connecting the two boards is less than 2 meters long. As mentioned, this issue is seen only on certain boards and not on all of them, so the configured SPI frequency is otherwise working for us. There's a team trying to root-cause the issue, but in the meantime we are trying to figure out how to recover SPI communication via firmware, since the problem goes away as soon as we reset the remote board by unplugging and re-plugging the cable. There's a document which explains that pulling the ON pin low for >180ms should reset the remote board and restore the link, but I do not see this happen. Any pointers on the recovery would be very helpful.

    Thanks