
BF537 hang on PPI / Ethernet producer / consumer app

We have a producer / consumer application that uses a BF537 on a custom board.

The producer is a multi-channel data-acquisition subsystem, controlled by an FPGA, that feeds the BF537 over the PPI port.
The consumer (the BF537) streams the data to a Windows host over Ethernet.

The app used to work, but the hardware had to be updated due to parts obsolescence. On the new hardware, the app hangs. The app uses semaphores to manage consumer / producer synchronization.
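
For context, the producer / consumer hand-off is structured roughly like the sketch below (VDK C API; the semaphore ID, ISR name, and thread name are placeholders rather than our actual identifiers, and ISR registration with VDK is omitted):

    #include "VDK.h"              /* kernel API; semaphore IDs come from the .vdk project */
    #include <sys/exception.h>    /* EX_INTERRUPT_HANDLER */

    /* Producer side: PPI DMA-complete ISR signals that a filled buffer is ready. */
    EX_INTERRUPT_HANDLER(PPI_DMA_ISR)
    {
        /* ...acknowledge the DMA interrupt, queue the filled buffer... */
        VDK_C_ISR_PostSemaphore(kDataReady);
    }

    /* Consumer side: thread run function pends on data-ready, then streams to Ethernet. */
    void Consumer_RunFunction(void **inPtr)
    {
        while (1)
        {
            VDK_PendSemaphore(kDataReady, 0);   /* timeout 0 = wait indefinitely */
            /* ...dequeue buffer, send over the TCP/IP stack, recycle buffer... */
        }
    }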

Differences between the old and new revisions, as far as the BF537 program is concerned, are:
--updated the PHY (original part obsolete), per http://www.analog.com/media/en/technical-documentation/application-notes/EE_315_Rev_1_06_07.pdf
--updated SDRAM
--slightly higher data throughput rate: PPI and Ethernet

The hang occurs only when PPI and Ethernet I/O run concurrently.

When the app hangs, the code sits in the idle thread and PPI interrupts continue to trigger. The handler posts new data-ready semaphores, but the thread that pends on data-ready never wakes up (or has disappeared). Unhit breakpoints at the end of each thread imply the threads should still be running. After more time, a PPI FIFO overflow is generated and the call stack shows a lock-up around __wab7 + xxxx.

We are looking for any suggestions on possible causes and things to try to isolate the issue.

FYI, we checked stack usage on all threads; there is plenty of room prior to the hang. We have also checked the SDRAM / EBIU settings and run an SDRAM sweep, which completed without error.
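
For reference, the EBIU / SDRAM configuration can be dumped at run time and diffed against the old board with something like this (register pointers from the standard cdefBF537.h header):

    #include <cdefBF537.h>
    #include <stdio.h>

    /* Dump the EBIU / SDRAM controller settings so the new board's values
       can be compared directly against the old revision. */
    void dump_ebiu_settings(void)
    {
        printf("EBIU_AMGCTL = 0x%04x\n",  (unsigned)*pEBIU_AMGCTL);
        printf("EBIU_SDGCTL = 0x%08lx\n", (unsigned long)*pEBIU_SDGCTL);
        printf("EBIU_SDBCTL = 0x%04x\n",  (unsigned)*pEBIU_SDBCTL);
        printf("EBIU_SDRRC  = 0x%04x\n",  (unsigned)*pEBIU_SDRRC);
        printf("EBIU_SDSTAT = 0x%04x\n",  (unsigned)*pEBIU_SDSTAT);
    }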

  • Thanks for the summary of the issue. With the details provided, it seems like you are running into a system bandwidth issue due to the "slightly higher throughput rate" on both the PPI and Ethernet, but it is tough to discern given some of the other information provided. You mention that "unhit breakpoints at the end of each thread imply the threads should still be running"; if a breakpoint is NOT hit, doesn't that imply that the thread is NOT running? Please confirm. Eliminating the possibility of thread stack overflows is very helpful, but if the threads are not running, it doesn't mean much. Have you used the VDK debug windows to verify that the expected threads do run (even if only temporarily, until the lock-up occurs)?

    Apart from that, are the semaphores that you are pending on located in internal or external memory? During debug, can you confirm that these semaphores are being set? If external, is the memory cacheable? And if cache is enabled, can you provide your cache settings? I am asking because cache-line fills/write-backs and external core read/write operations can be held off by the increased PPI/Ethernet throughput, as the DMAs associated with those peripherals may be holding off the core accesses required to fetch the cache line from external memory. To that end, what is the setting of your EBIU_AMGCTL register (specifically the CDPRIO bit, bit 8)?
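    As a concrete starting point, something along these lines (using the standard cdefBF537.h names; treat it as a sketch, not a recommendation) will capture the current cache/arbitration configuration and let you experiment with giving the core priority over DMA on the external bus:

        #include <cdefBF537.h>
        #include <ccblkfn.h>     /* ssync() */

        unsigned long imem_ctl, dmem_ctl, amgctl;

        /* Snapshot the cache and EBIU arbitration settings for the post-mortem. */
        void snapshot_mem_config(void)
        {
            imem_ctl = *pIMEM_CONTROL;   /* instruction cache / SRAM configuration */
            dmem_ctl = *pDMEM_CONTROL;   /* data cache / SRAM configuration */
            amgctl   = *pEBIU_AMGCTL;    /* CDPRIO is bit 8 */
        }

        /* Experiment: elevate core accesses above DMA on the external bus. */
        void raise_core_priority(void)
        {
            *pEBIU_AMGCTL |= CDPRIO;
            ssync();                     /* ensure the MMR write completes */
        }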

    Further along that debug path, can you provide more detail on the "slightly higher" throughput rates? Did your CLKIN design also change, or are you simply increasing the traffic on both the parallel port and the EMAC while also introducing the hardware changes for the PHY/SDRAM? Have you tried running the NEW hardware at the original rates, or (if the CLKIN design changed and you "rounded up" the clocking multipliers) slightly below them, to determine whether your hardware changes should be investigated further?
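    If it helps with that comparison, the effective VCO/CCLK/SCLK can be read back from the PLL registers at run time and checked against the old design, along these lines (a sketch; CLKIN_HZ below is an assumption you would replace with the board's actual oscillator frequency):

        #include <cdefBF537.h>
        #include <stdio.h>

        #define CLKIN_HZ 25000000UL   /* placeholder: use the board's real CLKIN */

        /* Recompute VCO/CCLK/SCLK from PLL_CTL and PLL_DIV (field positions per
           the BF537 Hardware Reference) so old and new boards can be compared. */
        void report_clocks(void)
        {
            unsigned short pll_ctl = *pPLL_CTL;
            unsigned short pll_div = *pPLL_DIV;

            unsigned long msel = (pll_ctl >> 9) & 0x3F;   /* VCO multiplier; 0 means 64 */
            unsigned long df   = pll_ctl & 0x1;           /* DF: CLKIN divided by 2     */
            unsigned long vco  = (CLKIN_HZ >> df) * (msel ? msel : 64);

            unsigned long csel = (pll_div >> 4) & 0x3;    /* CCLK = VCO / 2^CSEL */
            unsigned long ssel = pll_div & 0xF;           /* SCLK = VCO / SSEL   */

            printf("VCO  = %lu Hz\n", vco);
            printf("CCLK = %lu Hz\n", vco >> csel);
            printf("SCLK = %lu Hz\n", ssel ? vco / ssel : 0);
        }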

    When you say that the problem occurs only when both EMAC and PPI are enabled together, does that mean you have separately verified that the PPI and the EMAC each work fine on their own (i.e., the "normal" PPI operations work fine if the EMAC isn't running, and the Ethernet stack runs well as long as the PPI isn't enabled)? Since the SDRAM sweep passes, the only other hardware change is the PHY; so, assuming the Ethernet comms work well without the PPI, the focus should be on the software and on EBIU bandwidth utilization, given the increased throughput on both peripherals.

    I'd think that the semaphore being set/cleared would eventually lead to the expected application flow, regardless of any held-off accesses, but I am not sure what "__wab7 + xxx" signifies, as I am not a tools developer. I tried simulating a TCP/IP application to see whether the hang is in the LwIP stack or in VDK, and I don't see a __wab7 in the symbol map (though there are __wab0/2/3/5/8). I am using VisualDSP++ 5.1.2; which version are you using? If you could also provide some context around this __wab7 label, it might help me poke around on my side to understand where you're hanging.

    Finally, regarding the hang itself, I just want to make sure I understand it fully. When it happens, you sit in the idle thread but continue to service PPI interrupts correctly (which SHOULD be setting these semaphores) until the overflow occurs (indicating that the PPI DMA did not move the received data to the destination buffer in time). Do the DMA registers agree with that conclusion? Is the PPI buffer in internal or external memory?
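    For example, once the hang is detected, a snapshot along these lines (the PPI is served by DMA0 on the BF537 by default; names from cdefBF537.h) would show how far the PPI DMA got and whether the overflow flag is set; sampling it twice a moment apart shows whether the channel is still advancing:

        #include <cdefBF537.h>
        #include <stdio.h>

        /* Snapshot the PPI status and its DMA channel once the hang is detected,
           to confirm where the data stopped moving. */
        void snapshot_ppi_dma(void)
        {
            printf("PPI_STATUS        = 0x%04x\n", (unsigned)*pPPI_STATUS);   /* check the OVR bit per the HRM */
            printf("DMA0_IRQ_STATUS   = 0x%04x\n", (unsigned)*pDMA0_IRQ_STATUS);
            printf("DMA0_CURR_ADDR    = 0x%08lx\n", (unsigned long)*pDMA0_CURR_ADDR);
            printf("DMA0_CURR_X_COUNT = %u\n",      (unsigned)*pDMA0_CURR_X_COUNT);
            printf("DMA0_CURR_Y_COUNT = %u\n",      (unsigned)*pDMA0_CURR_Y_COUNT);
        }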

    -Joe

  • This question has been assumed answered, either offline via email or with a multi-part answer, and has now been closed out. If you have an inquiry related to this topic, please post a new question in the applicable product forum.

    Thank you,
    EZ Admin