LTC4331: repetitive slave side faults

Hello there,

We are using the LTC4331 in our equipment, but at random intervals our master receives a NACK after sending a slave address. The same slave addresses were correctly acknowledged in the preceding transactions, and the problem is independent of manufacturer and model.

Our I2C bus is clocked at 93 kHz, but similar issues happen at 90 kHz. Our master MCU cannot provide exactly 100 kHz due to silicon issues.

To look for faulty transactions, we flooded the I2C bus with transactions. Our test code performs the following I2C transactions in a loop, with 326 µs between transactions:

for i in 0 to 7 loop
    I2C_Write(real_slave_addr(i), register_addr); // Register address is the same for all I2C peripherals
    data = I2C_Read(read_slave_addr(i), 1); // Reads only one byte.
end loop;
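
For anyone who wants to reproduce this kind of flood test on a bench, here is a minimal Python sketch of the loop with per-slave NACK counting. `FakeBus`, `NackError` and the 0x20 to 0x27 addresses are placeholders for illustration only, not our actual driver or devices. Placing the 326 µs pause after each write/read pair is also an assumption about where the gap sits.

```python
import time


class NackError(Exception):
    """Raised by the (hypothetical) bus driver when a slave does not acknowledge."""


class FakeBus:
    """Stand-in for a real I2C driver; NACKs every `fail_every`-th transaction."""

    def __init__(self, fail_every=0):
        self.fail_every = fail_every
        self.count = 0

    def _maybe_nack(self, addr):
        self.count += 1
        if self.fail_every and self.count % self.fail_every == 0:
            raise NackError(f"NACK from 0x{addr:02X}")

    def write(self, addr, data):
        self._maybe_nack(addr)

    def read(self, addr, length):
        self._maybe_nack(addr)
        return bytes(length)


def stress_once(bus, slave_addrs, register_addr, gap_s=326e-6):
    """Run one pass over all slaves and return {address: nack_count}."""
    nacks = {addr: 0 for addr in slave_addrs}
    for addr in slave_addrs:
        try:
            bus.write(addr, bytes([register_addr]))  # set the register pointer
            bus.read(addr, 1)                        # read one byte back
        except NackError:
            nacks[addr] += 1
        time.sleep(gap_s)  # assumed position of the inter-transaction gap
    return nacks
```

Swapping `FakeBus` for a real driver object with the same `write`/`read` methods and logging the returned dictionary over many passes shows whether the NACKs cluster on particular addresses.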

When probing both SCL and SDA signals on both sides, we observe the following behaviours:

Case 1: incomplete I2C transaction.

Left: Master side of the I2C bus. In blue SDA, in yellow SCL. Please note the NACK at the end of the transaction.

Right: Slave side of the I2C bus. In yellow SDA, in blue SCL. After removing the Manchester-like encoding, we obtain the same beginning of the data packet. However, only 6 SCL periods are propagated here.

Case 2: the master side gets a NACK when sending the peripheral address, while the slave side continues to read data.

Left: Master side of the I2C bus. In blue SDA, in yellow SCL. Please note the NACK at the end of the transaction.

Right: Slave side of the I2C bus. In yellow SDA, in blue SCL. Unlike on the master side, we see here a complete read transaction spanning more than 8 bits.

Investigation for both cases

When such a NACK appears, we immediately read the fault register of the master-side LTC4331. Both I2C_WRITE_FAULT and EXT_I2C_FAULT are set.

What we mainly observe is:

- Sometimes the slave-side LTC4331 continues to emit clock and data on its side, while the master-side LTC4331 has already answered with a NACK.

- Sometimes the slave-side LTC4331 emits only 6 clock periods and then stops the transmission. The corresponding NACK follows on the master side.

- Sometimes the NACK appears on its own, i.e. more than 400 µs later.

Our hardware implementation involves transmission over a 3-meter Ethernet cable, with bias resistors only on the emitter side. The receiver bias resistors have not been populated because we have a common ground on both sides of the Ethernet cable.

Could you please give us some clues as to what could cause such failures?

Thank you for your assistance.

Best regards,

--

Niels

  • Hi Niels,

    Here's what you can do first to see if it's a setup problem: increase the time between each transaction. In your loop, there's no wait statement. Did you write the code for the master yourself, or are you using a library for I2C? Make sure it supports clock stretching, as that would give you the best performance, or you can reduce the SCL frequency. Also, what configuration have you chosen on speed 1 and 2?

    Since you're getting I2C_WRITE_FAULT = 1 and EXT_I2C_FAULT = 1, the master device is sensing that SDA is low when it should be high, so it considers SDA stuck by a slave device. When EXT_I2C_FAULT = 1, you could have either an SDA or an SCL fault condition; the remote LTC4331 sends a FAULT response to the local LTC4331, which sets the bit.

    From your loop, I can see that those are read transactions. You're reading the same register from 8 different slaves? It seems like you have a stuck bus there. You can see this in the picture on the bottom right. The LTC4331 enters a bus recovery routine which drives 16 SCL clocks onto the bus and then issues a STOP event. So I think what's happening is that the master device is sensing that SDA is low when it should be high.

    Regards,

    Naveen

  • Good morning Naveen,

    Thank you for your assistance.

    Please find our answers below:

    Here's what you can do first to see if it's a setup problem: increase the time between each transaction. In your loop, there's no wait statement.

    The problem happens whether we flood the I2C bus with data without any pause or with a 1 ms pause between device transactions. Please note that this code was designed only to investigate why the problem happens on our product, with a 10 ms delay between each transaction burst.

    Did you write the code for the master yourself or are you using some library for i2c?

    We use libraries. We have investigated with two different MCUs from two different manufacturers, hence with two different I2C libraries.

    Make sure it supports clock stretching as that would give you the best performance or you can reduce the SCL frequency.

    This has been investigated: to our knowledge, all our I2C devices support clock stretching.

    Also, what is the configuration you've chosen on speed 1 and 2?

    Selected speed index is 4 (i.e. 100kHz link).

    Seems like you've a stuck bus there. You can see this in the picture on the bottom right. The LTC4331 enters a bus recovery routine which drives 16 SCL clocks onto the bus and then issues a STOP event. So I think what's happening is that the master device is sensing that SDA is low when it should be high. 

    Thank you for your explanation. However, we observe only eight clock periods on SCL before the recovery process starts, and this low state might be an acknowledge of the address byte. In your opinion, what would cause the LTC4331 not to send the clock period for the acknowledge bit?

    If we understood correctly, p.17 of the LTC4331 datasheet states "On the remote side, the LTC4331 master device can detect and attempt recovery from I2C bus faults. If the master device senses that SDA is low when it should be high, it considers SDA stuck by a slave device". Now, let's suppose one of our slaves holds both SCL and SDA low on this acknowledge bit. What would be the expected behaviour of the LTC4331?

    Again, thank you for your assistance.

    Best regards,

    --

    Niels

  • Hi Niels,

    Thanks for the explanation. The acknowledge bit is read on the edge of SCL. Normally, if the master is writing/reading data to/from the slave, it sends the address of the slave in the first 7 clock cycles, then holds the SCL line low (SDA will be high) until the N/ACK is received from the slave. I think the N/ACK itself is read on the edge of SCL when it arrives.

    The first two pictures are captured at the same time? We should be able to see the NACK being sent from the slave. I'm not sure why you're seeing just 6 clock cycles on the slave side. There should be 8 as it is the slave sending the NACK, correct?

    The SCL frequency on the slave seems to be a little different, or have you zoomed out? In the second two pictures, the master again receives a NACK from the slave. The slave seems to have sent the NACK correctly (7 clock cycles; the 8th is for the NACK), but after a NACK from the slave, the master should send a STOP as soon as it sees the NACK. It perhaps took too long, so it had to enter a timeout and then issue a forced STOP.

    This is normal. According to the datasheet: "If a local LTC4331 detects that all remote slaves have NACK’d, it ceases transmitting further I2C data to the remote side until a STOP or REPEATED START condition is detected. This feature prevents the LTC4331 from stalling the bus unnecessarily. Note that in this scenario, the remote side SCL is held low until a STOP or START condition is detected. Unusually long cycle times could activate the tTIMEOUT condition (your case). When in local mode and SCL is held low for a minimum of tTIMEOUT:SLAVE:MAX, the I2C Interface is reset and SCL and SDA are released if held low."

    I think that, at the same time, if the remote master (the LTC4331 on the slave side) senses that SDA is low for too long, it considers SDA stuck by a slave device. The LTC4331 enters a bus recovery routine which drives 16 SCL clocks onto the bus and then issues a STOP event. So this has probably happened in parallel in your case.

    SCL if low for too long could trigger timeout. SDA if low for too long could trigger the recovery process.

    Answer to your question: slaves are not meant to hold both SDA and SCL low on the ACK bit. If the slave is sending the N/ACK bit, it has to be on the edge of SCL. While waiting for the ACK/NACK, the master holds only SCL low (SDA is normally high) until the N/ACK bit is received, which is read on the edge of the clock. While waiting for the master to receive its N/ACK bit, the slave does hold both SCL and SDA low, but the master is still responsible for sending the STOP condition. If it takes too long, it triggers the timeout.

    Regards,

    Naveen

  • Good afternoon Naveen,

    Thank you for your prompt answer.

    The acknowledge bit is read on the edge of SCL. Normally, if the master is writing/reading data to/from the slave, it sends the address of the slave in the first 7 clock cycles, then holds the SCL line low (SDA will be high) until the N/ACK is received from the slave. I think the N/ACK itself is read on the edge of SCL when it arrives.

    The first two pictures are captured at the same time?

    That is correct, both are triggered at the same time via a GPIO set to High when a NACK is detected by our respective I2C drivers (channel 3 on all pictures).

    We should be able to see the NACK being sent from slave. I'm not sure why you're just seeing 6 clock cycles on the slave side. There should be 8 as it is the slave sending the NACK, correct?

    9 (8-bit data + 1-bit acknowledge) is what we expected too. Hence our questions :-)
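
    To make the count explicit, here is a small Python sketch of the standard framing we have in mind: the address byte carries the 7-bit address plus the R/W bit, and every byte is followed by one acknowledge clock. The helper names and the 0x50 address are ours, for illustration only.

```python
def address_byte(addr7, read):
    """Return the 8-bit value clocked out after START: 7-bit address + R/W bit."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("7-bit address expected")
    return (addr7 << 1) | (1 if read else 0)


def clocks_for_bytes(n_bytes):
    """Each byte on the bus takes 8 data clocks plus 1 acknowledge clock."""
    return 9 * n_bytes


# Example: a read addressed to the (hypothetical) slave 0x50
first_byte = address_byte(0x50, read=True)   # 0xA1
scl_pulses = clocks_for_bytes(1)             # 9 SCL pulses for the address phase
```

    With this framing, seeing only 6 SCL periods on the slave side means the address byte itself was cut short, before the acknowledge slot was ever reached.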

    The SCL frequency on the slave seems to be a little different, or have you zoomed out? In the second two pictures, the master again receives a NACK from the slave. The slave seems to have sent the NACK correctly (7 clock cycles; the 8th is for the NACK), but after a NACK from the slave, the master should send a STOP as soon as it sees the NACK. It perhaps took too long, so it had to enter a timeout and then issue a forced STOP.

    Normally, 9 (8-bit data + 1-bit acknowledge) clock cycles are expected, aren't they? In this second set of measurements, it looks as if SDA is held low because the remote slave wants to acknowledge, but is waiting for a new SCL rising edge from the remote master.

    This is normal. According to the datasheet: "If a local LTC4331 detects that all remote slaves have NACK’d, it ceases transmitting further I2C data to the remote side until a STOP or REPEATED START condition is detected. This feature prevents the LTC4331 from stalling the bus unnecessarily. Note that in this scenario, the remote side SCL is held low until a STOP or START condition is detected. Unusually long cycle times could activate the tTIMEOUT condition (your case). When in local mode and SCL is held low for a minimum of tTIMEOUT:SLAVE:MAX, the I2C Interface is reset and SCL and SDA are released if held low."

    I think that, at the same time, if the remote master (the LTC4331 on the slave side) senses that SDA is low for too long, it considers SDA stuck by a slave device. The LTC4331 enters a bus recovery routine which drives 16 SCL clocks onto the bus and then issues a STOP event. So this has probably happened in parallel in your case.

    SCL if low for too long could trigger timeout. SDA if low for too long could trigger the recovery process.

    Question: still considering that SDA is kept low to signal an ACK on the 9th SCL period, should we expect to see the same timeout as for SCL between the last transaction and the reset process? If so, why is there only around 60 µs here between the last packet and the recovery process?
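
    To put the 60 µs in perspective, the SCL period follows directly from the bus frequency. A short Python sketch of the arithmetic (the helper names are ours, for illustration only):

```python
def scl_period_us(f_scl_hz):
    """Duration of one SCL period in microseconds."""
    return 1e6 / f_scl_hz


def periods_in(duration_us, f_scl_hz):
    """How many SCL periods fit into a given duration."""
    return duration_us / scl_period_us(f_scl_hz)


period = scl_period_us(93_000)       # about 10.75 us per SCL period at 93 kHz
gap = periods_in(60, 93_000)         # about 5.6 SCL periods in 60 us
byte_time = 9 * period               # about 96.8 us for 8 data bits + acknowledge
```

    In other words, the recovery starts less than one byte time after the last packet, which is why we doubt that an SCL-style timeout is involved.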

    Again, thank you very much for your support.

    Kind regards,

    --

    Niels

  • Good morning,

    Is there any update regarding our request?

    Thank you for your assistance.

    Best regards,

    --

    Niels

  • Hi Niels, any success with your tests? I'm evaluating the same device and had some issues as well. I found this topic: https://ez.analog.com/interface-isolation/f/q-a/543449/unexpected-behavior-of-ltc4331

    After leaving at least 35 ms between each I2C transaction, everything went smoothly. I just wondered whether you had the same issue.
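
    In case it helps, this is roughly how such a minimum gap can be enforced in software. It is only a minimal Python sketch: `Pacer` is a made-up helper, not part of any I2C library, and 35 ms is simply the value that worked in my setup.

```python
import time


class Pacer:
    """Enforces a minimum idle time between consecutive calls."""

    def __init__(self, min_gap_s=0.035, clock=time.monotonic, sleep=time.sleep):
        self._min_gap_s = min_gap_s
        self._clock = clock
        self._sleep = sleep
        self._last_end = None  # end time of the previous transaction

    def call(self, func, *args):
        """Wait until the gap has elapsed, then run func(*args)."""
        if self._last_end is not None:
            remaining = self._min_gap_s - (self._clock() - self._last_end)
            if remaining > 0:
                self._sleep(remaining)
        try:
            return func(*args)
        finally:
            # Record the end time even if the transaction raised (e.g. on a NACK)
            self._last_end = self._clock()
```

    Every I2C transaction then goes through `pacer.call(...)`, so the gap is measured from the end of the previous transaction, whether it succeeded or not.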

    _Wim_