We have an issue with the LTC4331 device.
During our debugging we implemented a test that reads a single byte over and over again from a device on the remote side of the LTC4331.
During the test we gradually increase the time between two reads, starting from around 30 ms.
In our case the address of the remote device is 0x20. We read register 0x05 and expect the content to be 0xA2. We have seen the exact same failure with different I2C slaves as well, so the exact numbers don't matter.
Everything looks normal until the time between two reads reaches around 31.2 ms (the exact timing seems to depend on the device and on temperature). At this point we start seeing failures on the I2C transactions. Mostly these failures have no major impact, as the link simply recovers on the next transaction. In rare cases, however, the LTC4331 locks up completely, and from then on communication with any device on the bus fails.
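A test of this kind can be sketched in C roughly as follows. This is an illustration only, not the actual test firmware: the hook types `i2c_read_reg_fn` and `delay_us_fn`, the function `sweep_read_gap`, and the `sim_*` stand-ins are hypothetical names. The simulated bus simply fails once the gap reaches 31.2 ms so that the sketch can be run on a host; it does not model the real LTC4331.

```c
#include <stdbool.h>
#include <stdint.h>

#define REMOTE_ADDR  0x20u  /* 7-bit address of the remote device (from the post) */
#define TEST_REG     0x05u  /* register that is read repeatedly */
#define EXPECTED_VAL 0xA2u  /* expected register content */

/* Hypothetical platform hooks: replace with the MCU's I2C driver and timer. */
typedef bool (*i2c_read_reg_fn)(uint8_t addr, uint8_t reg, uint8_t *value);
typedef void (*delay_us_fn)(uint32_t us);

/*
 * Sweeps the idle gap between reads from start_us to end_us in step_us
 * increments. Returns the first gap (in microseconds) at which a read
 * fails or returns unexpected data, or 0 if every read succeeded.
 */
uint32_t sweep_read_gap(i2c_read_reg_fn read_reg, delay_us_fn delay_us,
                        uint32_t start_us, uint32_t end_us, uint32_t step_us)
{
    if (step_us == 0u) {
        return 0u; /* avoid an endless loop on a zero step */
    }
    for (uint32_t gap = start_us; gap <= end_us; gap += step_us) {
        uint8_t value = 0u;
        delay_us(gap); /* idle time since the previous read finished */
        if (!read_reg(REMOTE_ADDR, TEST_REG, &value) || value != EXPECTED_VAL) {
            return gap;
        }
    }
    return 0u;
}

/* Host-side stand-ins (assumption): fail once the gap reaches 31.2 ms. */
static uint32_t sim_last_gap_us;

static void sim_delay_us(uint32_t us)
{
    sim_last_gap_us = us; /* remember the gap instead of actually waiting */
}

static bool sim_read_reg(uint8_t addr, uint8_t reg, uint8_t *value)
{
    (void)addr;
    (void)reg;
    if (sim_last_gap_us >= 31200u) {
        return false; /* simulated NAK */
    }
    *value = EXPECTED_VAL;
    return true;
}
```

On real hardware the two `sim_*` functions would be replaced by the actual I2C read and delay routines, and the returned gap could be logged together with a GPIO toggle like the one on channel 4.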
Below you can see some logic analyzer screenshots. The meanings of the signals are:
Channel 2 (red): SCL as seen at the local LTC4331.
Channel 3 (orange): SDA as seen at the local LTC4331.
Channel 4 (yellow): toggled by our MCU when a failure was detected by the test application.
Channel 5 (green): SCL as seen at the remote LTC4331.
Channel 6 (blue): SDA as seen at the remote LTC4331.
Zoom successful transaction:
Zoom first (recovered) failure:
Zoom second (recovered) failure:
Zoom third (permanent) failure (begin and end):
As you can see everything is normal in the first few transactions. Then failures start happening:
In the first failing transaction everything initially looks normal on the remote side. The start condition is output correctly. But then there is a glitch on both the SDA and the SCL lines, followed by a stop condition. The local side receives a NAK. Because the test detected a failure, it starts the next transaction after some additional delay. The subsequent transactions are fine again.
The second failure looks identical to the first one on the local side, but slightly different on the remote side. There is no activity at all on the remote SCL line, only a short pulse on the SDA line. This is still recovered after a longer pause in the test.
The third failure results in a permanent failure of the LTC4331. After this failure, any communication with the LTC4331 or with any of the slaves on the remote side fails. So far we have not been able to recover from this state except by removing power from the LTC4331s. In this third instance there is again only a short pulse on the SDA line on the remote side, so the waveforms on the remote side look similar to the second failure. On the local side, however, the NAK is only received after ~31.5 ms.
We saw the exact same behavior on two completely different HW designs with different speed settings.
Would you have any explanation for this behavior?
As these failures start happening around the specified timeout of the LTC4331 (31.5 ms), we think that they are somehow related to the timeout feature of the chip. Are you aware of any issue regarding this?
As a workaround we will simply avoid starting a new transaction in the window from 28 ms to 35 ms after the last transaction finished. Do you think that would help?
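The workaround could look roughly like the following C sketch. It assumes the firmware tracks how many milliseconds have passed since the last transaction finished; the function name `guard_delay_ms` and the two window macros are hypothetical, and the window limits are simply the 28 ms and 35 ms mentioned above.

```c
#include <stdint.h>

/* Window after the end of the last transaction in which no new one is started. */
#define AVOID_WINDOW_START_MS 28u
#define AVOID_WINDOW_END_MS   35u

/*
 * Returns the extra time in milliseconds to wait before starting a new
 * transaction, given the idle time since the previous one finished.
 * Outside the window the result is 0 (start immediately); inside the
 * window the caller waits until just past the end of the window.
 */
uint32_t guard_delay_ms(uint32_t idle_ms)
{
    if (idle_ms >= AVOID_WINDOW_START_MS && idle_ms <= AVOID_WINDOW_END_MS) {
        return (AVOID_WINDOW_END_MS - idle_ms) + 1u;
    }
    return 0u;
}
```

A caller would compute `idle_ms` from a millisecond tick counter, wait for the returned time, and then issue the start condition. The sketch does not account for the latency between this check and the start condition actually appearing on the bus, so the window may need some margin on the lower side.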
Unfortunately I was not able to attach the full Saleae logic analyzer capture to this post, but I am happy to provide it through other channels if necessary.
Thanks for your reply,
[edited by: cbumae at 2:11 PM (GMT -4) on 9 Apr 2021]