2011-05-04 12:40:13     bfin_mac: <unknown>: hw csum failure

Document created by Aaronwu Employee on Aug 26, 2013

2011-05-04 12:40:13     bfin_mac: <unknown>: hw csum failure

Stefan Wanja (GERMANY)

Message: 100424   

 

Hello,

 

I can see an easy-to-reproduce problem on the bf537-ezkit lite (running the 2010 release) and on a custom bf527 board (running the 2009 release). When the network interface is activated after boot, or after a cable is plugged in, while packets are arriving at the same time, I get an error message like this:

 

PHY: 0:01 - Link is Up - 100/Full

<unknown>: hw csum failure.

Hardware Trace:

   0 Target : <0x00115e78> { _dump_stack + 0x0 }

     Source : <0x000d1926> { _netdev_rx_csum_fault + 0x36 } JUMP.L

   1 Target : <0x000d1920> { _netdev_rx_csum_fault + 0x30 }

     Source : <0x00115f7c> { _printk + 0x14 } RTS

   2 Target : <0x00115f78> { _printk + 0x10 }

     Source : <0x00011eea> { _vprintk + 0x16a } RTS

   3 Target : <0x00011ede> { _vprintk + 0x15e }

     Source : <0xffa00cf8> { __common_int_entry + 0xcc } RTI

   4 Target : <0xffa00c96> { __common_int_entry + 0x6a }

     Source : <0xffa00ae0> { _return_from_int + 0x58 } RTS

   5 Target : <0xffa00ae0> { _return_from_int + 0x58 }

     Source : <0xffa00ab6> { _return_from_int + 0x2e } IF !CC JUMP pcrel

   6 Target : <0xffa00a88> { _return_from_int + 0x0 }

     Source : <0xffa00c92> { __common_int_entry + 0x66 } JUMP.L

   7 Target : <0xffa00c90> { __common_int_entry + 0x64 }

     Source : <0xffa0038e> { _asm_do_IRQ + 0x6a } RTS

   8 Target : <0xffa00386> { _asm_do_IRQ + 0x62 }

     Source : <0x00015536> { __local_bh_enable + 0x3a } RTS

   9 Target : <0x000154fc> { __local_bh_enable + 0x0 }

     Source : <0x00015b7c> { ___do_softirq + 0xa4 } JUMP.L

  10 Target : <0x00015b74> { ___do_softirq + 0x9c }

     Source : <0x00015b68> { ___do_softirq + 0x90 } IF CC JUMP pcrel

  11 Target : <0x00015b5a> { ___do_softirq + 0x82 }

     Source : <0x000340e4> { _rcu_bh_qs + 0x14 } RTS

  12 Target : <0x000340d0> { _rcu_bh_qs + 0x0 }

     Source : <0x00015b56> { ___do_softirq + 0x7e } JUMP.L

  13 Target : <0x00015b4e> { ___do_softirq + 0x76 }

     Source : <0x000d1388> { _net_rx_action + 0x7c } RTS

  14 Target : <0x000d136e> { _net_rx_action + 0x62 }

     Source : <0x000d1396> { _net_rx_action + 0x8a } IF CC JUMP pcrel (BP)

  15 Target : <0x000d1392> { _net_rx_action + 0x86 }

     Source : <0x000d1216> { _process_backlog + 0x8e } RTS

 

 

One way to reproduce this is start the board, start "iperf -s -u" on another computer, then start "iperf -c <server-ip> -u -b 100M -i 30 -d -t 99999" and then unplug the cable and plug it in again or just reset the board and wait for reactivation of the network interface.

 

Most of the time the error messages keep flooding the console as long as packets are coming in, but sometimes the system even seems to hang. Please see the complete logs in the attached files.

 

I could also observe that U-Boot fails to fetch a DHCP-assigned IP address, gets a completely nonsensical IP address, or gets one but the TFTP transfer then fails or times out, when packets from the iperf server are sent to the device at the same time.

 

Should a bug be filed in the tracker?

 

Kind Regards,

 

Stefan

 

csum_error.txt

csum_error_with_hang.txt


 

 

2011-05-06 02:09:25     Re: bfin_mac: <unknown>: hw csum failure

Sonic Zhang (CHINA)

Message: 100451   

 

Can't replicate.

 

2010R1-RC5 Linux on bf537-stamp IP 10.100.4.50

Host Linux PC IP 10.100.4.174

 

As client:

 

root:/> iperf -c 10.100.4.174 -u -b 100M -i 30 -d -t 99999

------------------------------------------------------------

Server listening on UDP port 5001

Receiving 1470 byte datagrams

UDP buffer size:   104 KByte (default)

------------------------------------------------------------

------------------------------------------------------------

Client connecting to 10.100.4.174, UDP port 5001

Sending 1470 byte datagrams

UDP buffer size:   104 KByte (default)

------------------------------------------------------------

[  5] local 10.100.4.50 port 38320 connected with 10.100.4.174 port 5001

[  6] local 10.100.4.50 port 5001 connected with 10.100.4.174 port 42851

PHY: 0:01 - Link is Down

PHY: 0:01 - Link is Up - 100/Full

PHY: 0:01 - Link is Down

[ ID] Interval       Transfer     Bandwidth

[  5]  0.0-30.0 sec    151 MBytes  42.3 Mbits/sec

PHY: 0:01 - Link is Up - 100/Full

[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams

[  6]  0.0-30.0 sec    201 MBytes  56.1 Mbits/sec  0.146 ms 86196/229327 (38%)

 

 

 

PHY: 0:01 - Link is Down

PHY: 0:01 - Link is Up - 100/Full

[ ID] Interval       Transfer     Bandwidth

[  5] 30.0-60.0 sec    187 MBytes  52.2 Mbits/sec

[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams

[  6] 30.0-60.0 sec    165 MBytes  46.2 Mbits/sec  0.106 ms 150291/268115 (56%)

^C[ ID] Interval       Transfer     Bandwidth

[  5]  0.0-87.9 sec    414 MBytes  39.5 Mbits/sec

[  5] Sent 295119 datagrams

[  5] Server Report:

[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams

[  5]  0.0-87.9 sec    185 MBytes  17.7 Mbits/sec  0.024 ms 162981/295118 (55%)

[  5]  0.0-87.9 sec  1 datagrams received out-of-order

 

 

As server:

 

root:/> iperf -s -u

------------------------------------------------------------

Server listening on UDP port 5001

Receiving 1470 byte datagrams

UDP buffer size:   104 KByte (default)

------------------------------------------------------------

[  5] local 10.100.4.50 port 5001 connected with 10.100.4.174 port 42851

[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams

[  5]  0.0- 4.6 sec  52.0 MBytes  95.0 Mbits/sec  0.029 ms 992590/1029651 (96%)

[  5]  0.0- 4.6 sec  1 datagrams received out-of-order

[  6] local 10.100.4.50 port 5001 connected with 10.100.4.174 port 43382

------------------------------------------------------------

Client connecting to 10.100.4.174, UDP port 5001

Sending 1470 byte datagrams

UDP buffer size:   104 KByte (default)

------------------------------------------------------------

[  7] local 10.100.4.50 port 53282 connected with 10.100.4.174 port 5001

PHY: 0:01 - Link is Down

PHY: 0:01 - Link is Up - 100/Full

 

 

PHY: 0:01 - Link is Down

PHY: 0:01 - Link is Up - 100/Full

 

 

 

PHY: 0:01 - Link is Down

PHY: 0:01 - Link is Up - 100/Full

 

 

 

[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams

[  6]  0.0-47.5 sec    276 MBytes  48.7 Mbits/sec  0.168 ms 196873/393680 (50%)

[  6]  0.0-47.5 sec  1 datagrams received out-of-order

write2 failed: Connection refused

[ ID] Interval       Transfer     Bandwidth

[  7]  0.0-48.4 sec    266 MBytes  46.0 Mbits/sec

[  7] Sent 189454 datagrams

read failed: Connection refused

[  7] WARNING: did not receive ack of last datagram after 2 tries.


 

 

2011-05-06 08:34:22     Re: bfin_mac: <unknown>: hw csum failure

Stefan Wanja (GERMANY)

Message: 100486   

 

Hello Sonic,

 

Maybe it makes a difference whether the device is connected directly to the PC or via a switch. I think that when directly connected, the PC itself doesn't send data until the link is completely established; maybe a switch is "faster" at outputting the first packets... In my previous post the connection was made via a Gigabit switch. When I connect a Gigabit laptop, it also occurs, but not as often.

 

Maybe the cache mode makes a difference. We are using write-through cache. The kernel config is attached. I'll also check whether I can reproduce it with write-back.

 

Also attached is the binary kernel with the filesystem included. Maybe you can check with this?

 

Kind Regards,

 

Stefan

 

uImage.gz.initramfs

config


 

 

2011-05-09 02:20:28     Re: bfin_mac: <unknown>: hw csum failure

Sonic Zhang (CHINA)

Message: 100503   

 

We have no Gigabit switch to run your test. Please report your result with the default 2010R1-RC5 kernel config (WB enabled).


 

 

2011-05-09 12:28:15     Re: bfin_mac: <unknown>: hw csum failure

Stefan Wanja (GERMANY)

Message: 100520   

 

Hey Sonic,

 

With the default kernel config (WRITE BACK) the problem disappears. It seems to depend on WRITE THROUGH.

 

So, could you please try the uploaded kernel+attached system (WRITE THROUGH) to reproduce the problem on your side?

 

Kind Regards,

 

Stefan


 

 

2011-05-10 00:18:38     Re: bfin_mac: <unknown>: hw csum failure

Sonic Zhang (CHINA)

Message: 100531   

 

The hardware checksum feature is only enabled when WB cache is disabled. On bf537 v0.2, hardware checksum works as expected and no crash is observed. But on bf537 v0.3, hardware checksum generates wrong results, so netdev_rx_csum_fault() dumps error information.

 

The current workaround is to disable hardware checksum in bfin_mac.h for bf537 v0.3.
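
For illustration only, here is a minimal sketch in C of the 16-bit one's-complement Internet checksum (RFC 1071) that a network stack can compute in software when hardware offload is disabled, together with a comparison against a value reported by hardware. A mismatch of this kind is what leads to the "hw csum failure" report from netdev_rx_csum_fault(). The names inet_checksum and hw_csum_matches are hypothetical and are not taken from bfin_mac or the kernel, whose own checksum handling is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: 16-bit one's-complement checksum (RFC 1071) over a
 * packet-sized buffer, treating the data as big-endian 16-bit words.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    /* Add up all complete 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low byte. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: returns 1 if the checksum reported by hardware
 * agrees with the one computed in software, 0 otherwise.
 */
static int hw_csum_matches(const uint8_t *data, size_t len, uint16_t hw_csum)
{
    return inet_checksum(data, len) == hw_csum;
}
```

With hardware checksum disabled, only the software path is used, so a wrong hardware result can no longer trigger the fault report.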


 

 

2011-05-11 05:30:10     Re: bfin_mac: <unknown>: hw csum failure

Stefan Wanja (GERMANY)

Message: 100571   

 

Hello again,

 

Please also consider that it happens on our bf527 0.2 board as well...


 

 

2011-05-11 05:56:18     Re: bfin_mac: <unknown>: hw csum failure

Sonic Zhang (CHINA)

Message: 100572   

 

It seems this issue only occurs on newer versions of the chips.
