2011-07-15 06:24:48     mac driver, duplicate rx and lost rx packets

Document created by Aaronwu Employee on Aug 27, 2013

John Rees (UNITED KINGDOM)

Message: 102435   

 

Hi

 

We have been investigating a problem where the bfin mac driver appears to get into a mode in which it either starts receiving duplicate packets or drops packets.

 

We think this can occur when an error occurs on the Ethernet link; that is how we are provoking the problem.

 

I have started adding debug trace to determine whether we are indeed posting duplicates up to the stack (netif_rx), and I can detect that this is happening.  We are not seeing duplicates on the wire, so it must be occurring from the MAC level upwards.
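
A minimal user-space sketch of this kind of duplicate check is shown below. It is not the trace that was actually added to the driver; the names dup_tracker, frame_hash and dup_check and the history depth are hypothetical, and the idea is simply to remember a hash of the last few frames handed to the stack and flag a frame whose hash and length were seen recently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DUP_HISTORY 8   /* hypothetical: number of recent frames remembered */

struct dup_tracker {
    uint32_t hash[DUP_HISTORY];   /* hash of each remembered frame */
    size_t   len[DUP_HISTORY];    /* length of each remembered frame */
    unsigned next;                /* slot to overwrite next */
    unsigned used;                /* number of valid slots */
    unsigned long duplicates;     /* frames flagged as repeats */
};

/* 32-bit FNV-1a hash over the frame bytes. */
static uint32_t frame_hash(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns 1 if a frame with the same hash and length was seen recently,
 * 0 otherwise. The frame is recorded in the history either way. */
static int dup_check(struct dup_tracker *t, const uint8_t *data, size_t len)
{
    assert(t != NULL && data != NULL);

    uint32_t h = frame_hash(data, len);
    int seen = 0;

    for (unsigned i = 0; i < t->used; i++) {
        if (t->hash[i] == h && t->len[i] == len) {
            seen = 1;
            break;
        }
    }
    if (seen)
        t->duplicates++;

    t->hash[t->next] = h;
    t->len[t->next] = len;
    t->next = (t->next + 1) % DUP_HISTORY;
    if (t->used < DUP_HISTORY)
        t->used++;
    return seen;
}
```

In a driver the equivalent check would sit just before the call that passes the frame up the stack. A hash match is only a hint: genuinely identical frames on the wire would be flagged too, so the result needs comparing against a wire capture.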

 

Once the driver enters the error state it does appear to recover after a few minutes.

 

We are seeing this problem on the 0602 kernel on the bf537. Has a similar error been seen before?

 

It looks like the code in the new kernel that deals with the rx DMA and descriptors works in much the same way. Should the driver not take account of errors occurring on the Ethernet link within the interrupt handler? How do we know that the received frame did not have an error? It looks like it is just passed into the stack regardless.

 

Sorry that the posting is a little vague. We are still investigating and will report our findings as we go along, but any suggestions would be welcome.

 

Thanks

 

John


 

 

2011-07-15 06:41:07     Re: mac driver, duplicate rx and lost rx packets

Sonic Zhang (CHINA)

Message: 102436   

 

In which release do you run into this problem? 2010R1-RC5 or trunk?


 

 

2011-07-15 08:17:16     Re: mac driver, duplicate rx and lost rx packets

John Rees (UNITED KINGDOM)

Message: 102438   

 

Hi Sonic

 

 

 

We see this on an old kernel release, 2006R2-RC2. I compared the driver in the 2006 kernel to recent trunk, and although the driver has changed considerably since then, the DMA handling looks the same, so we suspect it may exhibit the same behaviour we are seeing.  I am not currently in a position to try out trunk.

 

We are starting to think that the IRQ handler is not going through enough DMA buffer descriptors when the IRQ occurs, leaving buffers unserviced.  We will make changes to the driver to try to improve this and report back.

 

Thanks

 

John


 

 

2011-07-18 04:50:29     Re: mac driver, duplicate rx and lost rx packets

John Rees (UNITED KINGDOM)

Message: 102462   

 

Hi

 

After reading the post from Peter Gombos, we think we have been investigating the same problem.

 

I think that the code in the IRQ handler does not cope with error conditions. As far as I can tell from the Blackfin HRM, we should not accept a received packet until RX_OK is set. The IRQ handler was looking at the status_word, and as soon as this was non-zero we sent the packet into the IP stack, even if errors were present.
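
The distinction being drawn here can be sketched as a small classification of the descriptor status word. This is an illustration only: the bit values below are assumptions for the sketch and should be taken from the processor headers and the HRM, and classify_rx, rx_frame_len and the rx_verdict enum are hypothetical names, not driver functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define RX_FRLEN 0x000007FFu   /* received frame length field */
#define RX_COMP  0x00001000u   /* reception of this frame has completed */
#define RX_OK    0x00002000u   /* frame was received without errors */

struct rx_desc {
    uint32_t status_word;      /* written by the MAC/DMA when a frame lands */
};

enum rx_verdict {
    RX_PENDING,   /* status still zero: nothing received into this buffer */
    RX_DELIVER,   /* RX_OK set: safe to hand to the stack */
    RX_DROP       /* non-zero status without RX_OK: errored frame */
};

/* The old behaviour treated any non-zero status as deliverable.
 * This version only delivers when RX_OK is set. */
static enum rx_verdict classify_rx(const struct rx_desc *d)
{
    assert(d != NULL);

    if (d->status_word == 0)
        return RX_PENDING;
    if (d->status_word & RX_OK)
        return RX_DELIVER;
    return RX_DROP;
}

/* Frame length as reported in the status word. */
static uint32_t rx_frame_len(const struct rx_desc *d)
{
    assert(d != NULL);
    return d->status_word & RX_FRLEN;
}
```

A descriptor classified as RX_DROP still needs its status cleared so the buffer can be reused; it just must not be passed up the stack.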

 

It also looks like the IRQ handler could be exited too early. We added some code to check whether there were further buffers pending on IRQ exit, and we could see that it was exiting while buffers were pending.
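
A check of this kind could look roughly like the sketch below, which walks a circular descriptor list once and counts entries whose status word is still non-zero. It is a user-space model, not the code used in the driver; the rx_desc layout is reduced to the two fields the check needs and count_pending is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rx_desc {
    uint32_t status_word;      /* non-zero once a frame has been written */
    struct rx_desc *next;      /* circular list of receive descriptors */
};

/* Walk the ring once, starting at `start`, and count descriptors that
 * still hold an unserviced frame (non-zero status). */
static unsigned count_pending(const struct rx_desc *start)
{
    assert(start != NULL);

    unsigned pending = 0;
    const struct rx_desc *d = start;

    do {
        if (d->status_word != 0)
            pending++;
        d = d->next;
    } while (d != start);

    return pending;
}
```

Called just before the handler returns, a non-zero result indicates that the handler is leaving with frames still waiting in the ring.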

 

We have changed our IRQ and bf537mac_rx routine to check for valid receive frames.

 

Please look at the attached excerpt.

 

We have tested this change, and so far it copes quite well when subjected to heavy network traffic. Note that we do not run with watchdogs, and we may now spend longer inside the IRQ, so this might require further refinement to take account of that.

 

Please could you confirm if our change looks reasonable.

 

Many Thanks

 

John Rees

 

bfin_mac_excerp.c


 

 

2011-07-18 05:40:12     Re: mac driver, duplicate rx and lost rx packets

Peter Gombos (HUNGARY)

Message: 102464   

 

Hi John,

 

The problem seems similar, but we use the newest driver. Unfortunately this driver is more complicated than yours. Moreover, I don't see the change you made. I have to dig out the original driver of your release to compare.

 

In 2010R1 I see some attempt to keep calling the receive routine as long as there are packets.

 

My problem is that the device is not able to recover from this state. Something gets confused.

 

I will try to apply your change and get back to you soon...


 

 

2011-07-18 15:53:09     Re: mac driver, duplicate rx and lost rx packets

James Kosin (UNITED STATES)

Message: 102473   

 

Hi John & Peter,

 

Could it also be a PHY problem?  I'm not experiencing the issue here, and I am using the 2010R1 kernel.  I'm using a Marvell switch chip connected to the MAC interface in my setup.

 

James


 

 

2011-07-19 04:56:56     Re: mac driver, duplicate rx and lost rx packets

John Rees (UNITED KINGDOM)

Message: 102521   

 

Hi Peter,

 

When our system experienced 'the strange state' it also did not recover. The ping stats were virtually identical to yours.

 

I found that ifconfig eth0 down followed by ifconfig eth0 up was able to recover the interface.

 

With the new IRQ changes we find it very tolerant of intense traffic in your bad switch setup. We have two units side by side, and the unit with the new changes has recovered every time, while the unchanged unit is broken.

 

We are confident that our change is the solution.  The IRQ handler seems to be the cause of our problems. The IRQ handler in the latest trunk kernel is also virtually identical. There are some minor changes in the rx routine, but these look like the addition of error checking, which we are now effectively doing in the IRQ, so it should be easy to port.

 

 

 

I have modified the IRQ routine slightly from our original posting, as I felt there might be a bit of a race with the DMA controller. Here is our final IRQ handler.

 

/* interrupt routine to handle rx and error signal */
static irqreturn_t bf537mac_interrupt(int irq, void *dev_id,
                                      struct pt_regs *regs)
{
    struct net_device *dev = dev_id;

    do
    {
        if (current_rx_ptr->status.status_word)
        {
            if (current_rx_ptr->status.status_word & RX_OK)
            {
                bf537mac_rx(dev);
            }
            current_rx_ptr->status.status_word = 0x00000000;
        }
        current_rx_ptr = current_rx_ptr->next;
    } while (current_rx_ptr->status.status_word);

    bfin_write_DMA1_IRQ_STATUS(DMA_DONE | DMA_ERR);
    return IRQ_HANDLED;
}

 

John


 

 

2011-07-19 11:22:09     Re: mac driver, duplicate rx and lost rx packets

Peter Gombos (HUNGARY)

Message: 102537   

 

Hi John,

 

I did a lot of tests and I have very good results. I applied your solution with minor changes and it seems OK. Here is my ISR routine for 2010R1RC5:

 

/* interrupt routine to handle rx and error signal */
static irqreturn_t bfin_mac_interrupt(int irq, void *dev_id)
{
    struct net_device *dev = dev_id;

    do {
        if (current_rx_ptr->status.status_word) {
            if (current_rx_ptr->status.status_word & RX_OK)
                bfin_mac_rx(dev);
            current_rx_ptr->status.status_word = 0x00000000;
        }
        current_rx_ptr = current_rx_ptr->next;
    } while (current_rx_ptr->status.status_word);

    bfin_write_DMA1_IRQ_STATUS(DMA_DONE | DMA_ERR);
    return IRQ_HANDLED;
}

 

I had to remove the last lines from the bfin_mac_rx() function (after the out: label) because of the double pointer step. Further testing is needed, but I'm very positive now.
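
To make the double pointer step concrete, here is a small user-space model, offered as one reading of the problem rather than as driver code. It assumes that the tail of the rx routine clears the status word and advances the current descriptor, and that the handler loop then does the same again. The struct and function names are hypothetical and the RX_OK value is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING  4
#define RX_OK 0x00002000u   /* assumed bit value, for illustration */

struct rx_desc {
    uint32_t status_word;
    struct rx_desc *next;
};

struct model {
    struct rx_desc ring[RING];
    struct rx_desc *cur;       /* plays the role of current_rx_ptr */
    unsigned delivered;        /* frames handed to the stack */
};

/* Build a full ring: every descriptor holds a good frame. */
static void model_init(struct model *m)
{
    assert(m != NULL);
    for (int i = 0; i < RING; i++) {
        m->ring[i].status_word = RX_OK | 64u;
        m->ring[i].next = &m->ring[(i + 1) % RING];
    }
    m->cur = &m->ring[0];
    m->delivered = 0;
}

/* Model rx routine. With rx_steps set it also clears the status and
 * advances the pointer, as assumed for the code after the out: label. */
static void model_rx(struct model *m, int rx_steps)
{
    m->delivered++;
    if (rx_steps) {
        m->cur->status_word = 0;
        m->cur = m->cur->next;
    }
}

/* Handler loop in the shape posted above: it clears and advances too. */
static void model_irq(struct model *m, int rx_steps)
{
    do {
        if (m->cur->status_word) {
            if (m->cur->status_word & RX_OK)
                model_rx(m, rx_steps);
            m->cur->status_word = 0;
        }
        m->cur = m->cur->next;
    } while (m->cur->status_word);
}
```

With a full ring of four good frames, the model delivers all four when only the handler steps the pointer. When both step it, every second descriptor is wiped without being delivered, so only two frames reach the stack.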

 

Unfortunately I still have a minor problem with the TCP/IP protocol. Sometimes I see duplicated ACKs and fast frame retransmission while getting images from the board via the boa server. TCP recovers from this, but the problem exists. I suspect the Ethernet driver is losing frames.


 

 

2011-07-20 11:05:03     Re: mac driver, duplicate rx and lost rx packets

James Kosin (UNITED STATES)

Message: 102548   

 

Hi Peter & John,

 

bfin_mac_rx() should already clear the status in its code, so you could simplify even further like this:

 

do {
    bfin_mac_rx(dev);
} while (current_rx_ptr->status.status_word != 0);

bfin_write_DMA1_IRQ_STATUS(bfin_read_DMA1_IRQ_STATUS() |
                           DMA_DONE | DMA_ERR);
return IRQ_HANDLED;

 

bfin_mac_rx() already clears the status word in both the error and success cases. It already handles the error case by discarding the frame and incrementing a counter, and it already moves current_rx_ptr to the next descriptor.
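
The division of labour described here can be modelled in user space as follows. This is a sketch under stated assumptions, not the real bfin_mac_rx(): the mac_model structure, the counters and the RX_OK value are hypothetical, and the early return on a zero status is a guard added for the sketch so that the do/while form is safe when the handler is entered with nothing pending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING  4
#define RX_OK 0x00002000u   /* assumed bit value, for illustration */

struct rx_desc {
    uint32_t status_word;
    struct rx_desc *next;
};

struct mac_model {
    struct rx_desc ring[RING];
    struct rx_desc *cur;          /* plays the role of current_rx_ptr */
    unsigned long rx_packets;     /* good frames passed up */
    unsigned long rx_dropped;     /* errored frames discarded */
};

/* Empty ring, pointer at the first descriptor, counters zeroed. */
static void mac_model_init(struct mac_model *m)
{
    assert(m != NULL);
    for (int i = 0; i < RING; i++) {
        m->ring[i].status_word = 0;
        m->ring[i].next = &m->ring[(i + 1) % RING];
    }
    m->cur = &m->ring[0];
    m->rx_packets = 0;
    m->rx_dropped = 0;
}

/* Model rx routine: deliver or drop, then clear status and advance. */
static void model_rx(struct mac_model *m)
{
    if (m->cur->status_word == 0)
        return;                   /* sketch-only guard: nothing received */

    if (m->cur->status_word & RX_OK)
        m->rx_packets++;          /* stands in for handing the frame up */
    else
        m->rx_dropped++;          /* error case: count and discard */

    m->cur->status_word = 0;
    m->cur = m->cur->next;
}

/* Handler body in the simplified shape: the loop only decides whether
 * to call the rx routine again. */
static void model_irq(struct mac_model *m)
{
    do {
        model_rx(m);
    } while (m->cur->status_word != 0);
}
```

Because the rx routine owns the clear and the pointer step, the loop never touches the descriptor itself, which avoids stepping the pointer twice.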

 

James


 

 

2011-07-21 05:24:02     Re: mac driver, duplicate rx and lost rx packets

John Rees (UNITED KINGDOM)

Message: 102589   

 

Hi,

 

You could probably change the IRQ as you suggest on the new kernel. My fix was originally for the old kernel, where there was no checking of status bits in the rx routine.

 

I would think it would be better not to perform the logic within a branch inside an IRQ, as it may push the locals in the rx routine onto the stack unnecessarily if there was an error, causing a bit of a performance penalty for no reason. That cost is fair enough when a legitimate packet is received.

 

John


 

 

2011-07-22 08:10:24     Re: mac driver, duplicate rx and lost rx packets

James Kosin (UNITED STATES)

Message: 102616   

 

John,

 

I'm using an older kernel too.

 

uClinux 2010R1

 

root:/> uname -r

2.6.34.7-ADI-2010R1

 

 

James
