
Endianness of EMAC DMA transfers

Category: Software
Software Version: CCES 2.11

Hi,

I'm writing some glue code between the ADI EMAC driver for the ADSP-SC598W and an IP stack, but I have run into a strange problem with endianness. The IP stack sees the Ethernet frame as an array of sequential bytes in memory, so I need the DMA to treat the data as big endian. However, when I started testing the code, it seems the DMA doesn't do this; it transfers the data as little endian. I found the endian setting in the PADS0->PCFG0 register, but here is where it gets strange: it was already set to big endian, and changing the setting does not switch the endianness of the transmitted data. Am I misunderstanding something here? I've attached screenshots of CCES and Wireshark.
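To make the symptom concrete, here is a minimal host-side sketch (not driver code) of what a word-endianness mismatch does to a frame that software treats as sequential bytes. The helper name `swap_bytes_in_words32` and the 32-bit word width are assumptions for illustration; the thread does not state the width the EMAC DMA actually operates on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order inside every 32-bit word of buf, in place.
 * This models how a capture could look when the transfer engine and the
 * software disagree about word endianness. Hypothetical helper. */
static void swap_bytes_in_words32(uint8_t *buf, size_t len)
{
    assert(len % 4u == 0u); /* the model only handles whole words */
    for (size_t i = 0; i < len; i += 4u) {
        uint8_t b0 = buf[i];
        uint8_t b1 = buf[i + 1u];
        buf[i]      = buf[i + 3u];
        buf[i + 1u] = buf[i + 2u];
        buf[i + 2u] = b1;
        buf[i + 3u] = b0;
    }
}
```

Under this model, a frame starting with the bytes 01 02 03 04 05 06 07 08 would show up in a capture as 04 03 02 01 08 07 06 05, and applying the swap a second time restores the original order.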

Best regards,

Axel Lindholm

  • So I got this working, but it's still strange why it does (unless the HRM is wrong). I now set the PADS register before I do anything with the EMAC driver, and when I do, I can change the endianness with the specified bit. However, for the data to be transferred as big endian the bit needs to be 0, not 1 as stated in CCES and the HRM. Worth noting is that the SC598W EMAC example code included with CCES, in accordance with the HRM, clears this bit and comments the line with "Configure both EMAC0 and EAMC1 blocks as little endian". So unless the example code, the IDE and the HRM are all wrong, I'm still lost as to why it is working now...

    [ EDIT ]

    Now that I look at it, it would seem that adi_pads_Config() is poorly written. This may actually be the reason my original code didn't work (I now clear and set the register bits directly). It would seem that you cannot safely call this function twice, or you risk clearing settings you made previously, not to mention that the logic inside the function is obscure and barely readable. E.g., first calling it to select a PHY interface and then calling it to clear any bit would reset the PHY interface selection to its default value.
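The "clear and set the register bits directly" approach can be sketched as a plain read-modify-write that only touches the bits named in a mask, so an earlier setting in the same register survives a later call. This is a sketch under assumptions: `demo_reg_update`, `DEMO_PHY_SEL_MASK` and `DEMO_ENDIAN_MASK` are hypothetical names with made-up bit positions, and the real PADS0_PCFG0 layout must be taken from the HRM or the CCES headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. These are NOT the
 * real PADS0_PCFG0 bit positions. */
#define DEMO_PHY_SEL_MASK  (0x3u << 0)  /* pretend PHY interface select */
#define DEMO_ENDIAN_MASK   (0x1u << 8)  /* pretend endianness bit */

/* Update only the bits selected by mask and leave all others untouched. */
static void demo_reg_update(volatile uint32_t *reg, uint32_t mask, uint32_t value)
{
    assert((value & ~mask) == 0u); /* value must lie inside the mask */
    uint32_t v = *reg;             /* read the current contents */
    v &= ~mask;                    /* clear only the targeted field */
    v |= value;                    /* set the new field value */
    *reg = v;                      /* write back in one store */
}
```

With a helper like this, selecting a PHY interface first and changing the endianness bit afterwards leaves the PHY selection as it was, which is the property the two-call sequence described above appears to lack.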

    I'm sorry to say that this is not the first bug I've found in your drivers. They are littered with them if you take a closer look. Two others that I've discovered the past week or so:

    1. Calling adi_emac_SubmitDescriptorListTxTwait with DMA channel 4 will not wait for a trigger, instead it will cause a trigger.

    2. Configuring the CAN driver to not use the extended bit-timing register will cause the bit-timing settings to be zeroed out; you overwrite the non-extended bit-timing settings in your driver.

    Are there any good official channels I should go through to report things like this?

    [/EDIT]

  • Hi Axel Lindholm,


    So I got this working, but it's still strange why it does (unless the HRM is wrong). I now set the PADS register before I do anything with the EMAC driver, and when I do, I can change the endianness with the specified bit. However, for the data to be transferred as big endian the bit needs to be 0, not 1 as stated in CCES and the HRM. Worth noting is that the SC598W EMAC example code included with CCES, in accordance with the HRM, clears this bit and comments the line with "Configure both EMAC0 and EAMC1 blocks as little endian". So unless the example code, the IDE and the HRM are all wrong, I'm still lost as to why it is working now...


    >> We tested the attached code (which sends continuous packets from the SOM+CRR board to a PC via EMAC0 at 1000 Mbps) and could see that the value of the PADS0_PCFG0.EMAC0_ENDIANNESS bit and the byte order of the packets received in Wireshark match as expected (as per the HRM) for both little-endian and big-endian modes; see the attached screenshots. Can you test the attached code and confirm whether you see the same behavior at your end?

    Calling adi_emac_SubmitDescriptorListTxTwait with DMA channel 4 will not wait for a trigger, instead it will cause a trigger.
    >> We assume you are talking about the following highlighted line; please confirm. If so, this seems to be a typo. Thanks for pointing it out; we’ll fix it in the next revision.


    Thanks for the comments on the PADS service and CANFD; we have passed them on to the developers.

    Are there any good official channels I should go through to report things like this?
    >> You can report through this same support channel.

    565044.zip


    Regards,
    Divya.P

  • Yes, that is the line.

    I will test the code as soon as I have a few moments to spare. I also found a problem in your DMA RX interrupt. You assume the DMA descriptor has not been brought into the cache after it was flushed in one of the SubmitDescriptor functions, and you access it without invalidating the cache. This can and will cause serious problems. Also, the EMAC driver doesn't seem to correctly handle the case where multiple DMA buffers are filled during a single interrupt, but then again, I'm not entirely sure about the intended interface.

  • Hi Axel,

    I also found a problem in your DMA RX interrupt. You assume the DMA descriptor has not been brought into cache after it was flushed in one of the SubmitDescriptor-functions and access it without invalidating the cache, this can and will cause serious problems.
    >> Inside the ISR, the descriptors are read only to check which one of them generated the last interrupt. The ISR code doesn’t depend on any of the bits that the DMA engine writes to the descriptors, so as per our understanding it is safe to read the descriptor chain without invalidating the cache. The APIs adi_emac_IsDescriptorBusyTx and adi_emac_IsDescriptorBusyRx, which depend on the OWN bit written by the DMA engine, do perform a flush and invalidation of the cache. Apart from these APIs, any other user code that depends on descriptor bits modified by the DMA engine is expected to invalidate the cache before reading the descriptor. Having said this, could you please elaborate a little more on the exact use case where not invalidating the descriptors inside the ISR can cause problems?

    Also, the EMAC driver doesn't seem to handle the case where multiple DMA buffers are filled during a single interrupt correctly but then again, I'm not entirely sure about the intended interface.
    >> Could you please elaborate on this point a little more, so that we can better understand the exact use case?

    Regards,
    Divya.P

  • Hi Divya,

    Thank you for explaining the interface in detail. The problem I experienced could be observed during very high RX loads (in my case about 50,000 frames per second with all descriptors being IOC) and was kind of hard to debug, like many DMA-related problems are. However, my theory as to what happens and why is this:

    More than one IOC-tagged descriptor finishes before the ISR executes. The ISR only processes the first IOC descriptor it finds, and this glitch causes my descriptor ring to become exhausted after a while: the application has not been made aware that the descriptors completed (hence they are not resubmitted), and the DMA cannot use them because the OWN bit is cleared. In my opinion it would make more sense to check the OWN bit in the ISR, perhaps in combination with the IOC bit if the ISR is supposed to make the callback for IOC descriptors only. So now I have to invalidate the descriptor, but you're correct: you shouldn't need to do that when inspecting bits that cannot be updated by the DMA.
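The idea of checking the OWN bit and reporting every finished descriptor, rather than only the first one found, can be sketched as a small host-side model. Everything here is hypothetical (`demo_ring_t`, `demo_drain_completed`, the `DEMO_OWN_BIT` position, the ring length); it is not the ADI driver's ISR, and on real hardware each descriptor would have to be invalidated in the data cache before its status word is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_LEN 4u
#define DEMO_OWN_BIT  (1u << 31) /* assumed: set while the DMA owns it */

typedef struct { uint32_t status; } demo_desc_t;

typedef struct {
    demo_desc_t desc[DEMO_RING_LEN];
    unsigned next; /* oldest descriptor not yet reported */
} demo_ring_t;

typedef void (*demo_callback_t)(unsigned index, void *ctx);

/* Hand every descriptor to the DMA and start at index 0. */
static void demo_ring_init(demo_ring_t *ring)
{
    for (unsigned i = 0; i < DEMO_RING_LEN; i++) {
        ring->desc[i].status = DEMO_OWN_BIT;
    }
    ring->next = 0u;
}

/* Example callback: counts completions through ctx. */
static void demo_count_cb(unsigned index, void *ctx)
{
    (void)index;
    (*(unsigned *)ctx)++;
}

/* Report all descriptors the DMA has handed back, in ring order.
 * Returns how many callbacks were made. */
static unsigned demo_drain_completed(demo_ring_t *ring, demo_callback_t cb, void *ctx)
{
    assert(ring != NULL && cb != NULL);
    unsigned reported = 0u;
    while (reported < DEMO_RING_LEN &&
           (ring->desc[ring->next].status & DEMO_OWN_BIT) == 0u) {
        cb(ring->next, ctx);
        /* The model resubmits immediately; a real application would
         * refill the buffer before giving the descriptor back. */
        ring->desc[ring->next].status |= DEMO_OWN_BIT;
        ring->next = (ring->next + 1u) % DEMO_RING_LEN;
        reported++;
    }
    return reported;
}
```

If three descriptors complete before the handler runs, one call to `demo_drain_completed` makes three callbacks, so no completed descriptor is left unreported in the ring.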

    Another thing, when you say "flush and invalidate", you mean invalidate only (i.e. DC IVAC instruction) right? Not what ARM calls a cache clean? Cleaning the cache here could be risky business as updates the DMA made to the descriptor would be overwritten should the cache line be dirty...

  • Hi Axel,

    Thanks for elaborating on your use case. We completely understand your point that there’ll be problems if the descriptors are processed by the DMA faster than the interrupt can be serviced. In such cases, we think it’ll probably be better to have a mechanism in the application to detect such conditions. The problem with adding cache flush/invalidate logic inside the ISR is that it’ll consume extra cycles and increase the ISR service time, which can cause further problems. To check the status of the OWN bit, we have the APIs adi_emac_IsDescriptorBusyTx/Rx(); can they be used inside the application to address your use case?
     
    We generally perform flush+invalidate together, and you are right that the updates made by the DMA will be lost if the cache line is dirty. So the recommendation is to ensure that the core doesn’t update the descriptor while it is owned by the DMA. As long as this is taken care of, a flush+invalidate operation is effectively equivalent to invalidate only. Do let us know if you have further questions on this topic.

    Regards,
    Divya.P

  • I don't think adi_emac_IsDescriptorBusyTx/Rx() can be used to address the problem in my case. The problem at hand being that I don't receive callbacks for all descriptors during high load. I'm not sure how you would fix that without modifying the ISR before the callbacks are made.

    That flush+invalidate = invalidate only, given that the descriptor isn't updated, is true only if the cache line contains nothing but the descriptor. I could probably circumvent the problem in my design, as my descriptors are in an array, so I'd "just" have to make sure I don't access any of the neighbouring descriptors and cause a dirty cache line that way. But this is a tricky problem: if, e.g., I had put the descriptors in a struct with other data in adjacent memory, I would have exposed myself to potentially overwriting the DMA data not by updating the descriptor itself but simply by writing to other fields in that struct...
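The shared-cache-line hazard described above can be illustrated with a toy model of a single write-back cache line. This is a deliberately simplified sketch, not a model of the actual ARM cache on the SC598; all names (`demo_line_t`, `demo_clean_invalidate`, and so on) and the byte offsets are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LINE_SIZE 64u

typedef struct {
    uint8_t memory[DEMO_LINE_SIZE]; /* what the DMA reads and writes */
    uint8_t cached[DEMO_LINE_SIZE]; /* what the core sees while valid */
    int valid;
    int dirty;
} demo_line_t;

/* Load the line from memory on a miss. */
static void demo_core_fill(demo_line_t *l)
{
    if (!l->valid) {
        memcpy(l->cached, l->memory, DEMO_LINE_SIZE);
        l->valid = 1;
        l->dirty = 0;
    }
}

static uint8_t demo_core_read(demo_line_t *l, unsigned off)
{
    assert(off < DEMO_LINE_SIZE);
    demo_core_fill(l);
    return l->cached[off];
}

/* A core write lands in the cache only and marks the line dirty. */
static void demo_core_write(demo_line_t *l, unsigned off, uint8_t v)
{
    assert(off < DEMO_LINE_SIZE);
    demo_core_fill(l);
    l->cached[off] = v;
    l->dirty = 1;
}

/* The DMA bypasses the cache and writes straight to memory. */
static void demo_dma_write(demo_line_t *l, unsigned off, uint8_t v)
{
    assert(off < DEMO_LINE_SIZE);
    l->memory[off] = v;
}

/* Invalidate only: drop the cached copy, dirty or not. */
static void demo_invalidate(demo_line_t *l)
{
    l->valid = 0;
    l->dirty = 0;
}

/* Clean + invalidate: write a dirty line back first, then drop it. */
static void demo_clean_invalidate(demo_line_t *l)
{
    if (l->valid && l->dirty) {
        memcpy(l->memory, l->cached, DEMO_LINE_SIZE);
    }
    demo_invalidate(l);
}
```

In this model, if the core writes a neighbouring field at offset 16 and the DMA then updates the descriptor at offset 0, a clean+invalidate writes the whole stale line back and the DMA's update is lost. An invalidate-only keeps the DMA's update but discards the core's write to the neighbour, which is why the descriptor ideally shares its line with nothing else.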

  • Hi Axel,

    I don't think adi_emac_IsDescriptorBusyTx/Rx() can be used to address the problem in my case. The problem at hand being that I don't receive callbacks for all descriptors during high load. I'm not sure how you would fix that without modifying the ISR before the callbacks are made.
    >> The attached section of the HRM discusses this problem and possible workarounds. Could you please check whether any of the mentioned methods help to mitigate the problem?



    That flush+invalidate = invalidate only given that the descriptor isn't updated is true only if the cache line contains nothing but the descriptor. I could probably circumvent the problem in my design as my descriptors are in an array so I'd "just" have to make sure I don't access any of the neighbouring descriptors to cause a dirty cache line either. But this is a tricky problem, if e.g. I would have had the descriptors in a struct with other data in adjacent memory I would have exposed myself to potentially overwriting the DMA data not by updating the descriptor itself but by simply writing to other fields in that struct...
    >> We completely understand your concern. As per the current driver design, it is recommended to align both the start and the end of a descriptor list to the cache line size (64 bytes). It is also recommended not to update individual descriptors from the core until all the descriptors have been processed, if the driver code performs any flush/invalidate on the descriptors. Having said this, we’ll check internally whether there is an API available to perform range-based cache invalidation for the ARM core and get back to you.
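One way to follow the alignment recommendation in C11 is to wrap the descriptor array in a type whose alignment is the cache line size, so that the compiler also pads its size to a whole number of lines. This is a sketch under assumptions: `demo_desc_t` is a hypothetical 16-byte descriptor and `demo_desc_list_t`, `demo_rx_list` and `demo_is_line_aligned` are illustrative names; the real descriptor type and size come from the EMAC driver headers.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64u /* line size stated in the thread */
#define DEMO_NUM_DESC   6u

/* Hypothetical 16-byte descriptor, for illustration only. */
typedef struct { uint32_t word[4]; } demo_desc_t;

/* Aligning the member to the line size makes the whole struct
 * line-aligned, and sizeof is rounded up to a multiple of that
 * alignment, so no unrelated data can share the first or last line. */
typedef struct {
    alignas(DEMO_CACHE_LINE) demo_desc_t desc[DEMO_NUM_DESC];
} demo_desc_list_t;

static_assert(sizeof(demo_desc_list_t) % DEMO_CACHE_LINE == 0u,
              "descriptor list must end on a cache line boundary");

static demo_desc_list_t demo_rx_list;

/* Returns 1 if p sits on a cache line boundary. */
static int demo_is_line_aligned(const void *p)
{
    return ((uintptr_t)p % DEMO_CACHE_LINE) == 0u;
}
```

With six 16-byte descriptors the list occupies 96 bytes of payload and is padded to 128 bytes, so both its start and its end fall on 64-byte boundaries. This does not address the separate concern about neighbouring descriptors inside the list sharing a line with each other.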

    Regards,
    Divya.P

  • Hi Axel,

    As of now, we don’t have an API that supports range-based invalidation. We’ll discuss this feedback and requirement internally and take the necessary actions in the future. In the meantime, we suggest that you follow the recommended alternative options if possible.

    Regards,
    Divya.P