FreeRTOS port for BF70x, ADI OSAL with Thread Local Storage (TLS) and multi-threading C standard library (libcmt)

Hi all,

we've recently updated to CrossCore Embedded Studio (CCES) v2.10.0. In that context we also set up our CCES project from scratch and switched to the most recent FreeRTOS port from Analog Devices (adi-freertos-1.5.0 at that time). We've had issues with the system services and device drivers (SSLDD) before, but these have more or less been fixed by not using the prebuilt ones and instead compiling and linking them from their sources (as documented in the User Guide for the FreeRTOS port).

When updating the FreeRTOS port we also stumbled across the following lines in the head of adi_osal_ThreadSlotAcquire() in the old version of %RTOS_DIR%\FreeRTOS\portable\CCES\osal\adi_osal_freertos_tls.c:

ADI_OSAL_STATUS
adi_osal_ThreadSlotAcquire(ADI_OSAL_TLS_SLOT_KEY     *pnThreadSlotKey,
                           ADI_OSAL_TLS_CALLBACK_PTR pTerminateCallbackFunc)
{
	/* FreeRTOS can't currently support destroy callbacks */
//    if (NULL != pTerminateCallbackFunc)
//	{
//		return ADI_OSAL_OS_ERROR;
//	}
...

So we had commented out the NULL-pointer check of pTerminateCallbackFunc for some reason - it must have been a workaround back in April/May 2019 which seemed to do the trick for us. As this seemed a bit hacky to us, we've removed the // so that we have the original code for adi_osal_freertos_tls.c as in the most recent version of the FreeRTOS port. However, with these few lines active again, our application crashes.

The following happens:

A red error message appears in the Console view of CCES. This presumably happens because adi_fatal_error is called and there's an automatic breakpoint in CCES which lets the toolchain fetch the content of some registers, interpret it and print out the message. The execution flow then jumps to fatal_error, where the processor is kept in an endless idle loop.

  

Why is adi_fatal_error called and who calls it? The call stack that I've manually identified and verified is: osal_check (defined as an inline function in osal_sync.h) --> _osal_sync_error (defined in osal_error.c) --> adi_fatal_error. _osal_sync_error is only called by osal_check when the argument status does not equal ADI_OSAL_SUCCESS (i.e. an error must have occurred).

In _osal_sync_error, status is passed directly to adi_fatal_error along with _AFE_G_LibraryError and _AFE_S_OSALBindingError (as seen in the red error message).
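The chain described above can be sketched as a small host-side model. This is only an illustration of the control flow, not the ADI sources: every name prefixed with model_ or MODEL_ is hypothetical, the numeric codes are placeholders, and the stand-in for adi_fatal_error records its arguments instead of idling forever.

```c
#include <stdio.h>

/* Host-side model only: all model_/MODEL_ names are hypothetical stand-ins. */
typedef enum {
    MODEL_OSAL_SUCCESS  = 0,
    MODEL_OSAL_OS_ERROR = 1
} model_status_t;

/* Placeholder codes standing in for _AFE_G_LibraryError and
 * _AFE_S_OSALBindingError; the real values are not given in this thread. */
enum { MODEL_G_LIBRARY_ERROR = 1, MODEL_S_OSAL_BINDING_ERROR = 2 };

static int model_fatal_general;
static int model_fatal_specific;
static int model_fatal_value;
static int model_fatal_calls;

/* Stand-in for adi_fatal_error: records what it was given and returns. */
static void model_fatal_error(int general, int specific, int value)
{
    model_fatal_general  = general;
    model_fatal_specific = specific;
    model_fatal_value    = value;
    model_fatal_calls++;
    printf("fatal: general=%d specific=%d value=%d\n", general, specific, value);
}

/* Stand-in for _osal_sync_error: forwards the status as the third argument. */
static void model_osal_sync_error(model_status_t status)
{
    model_fatal_error(MODEL_G_LIBRARY_ERROR, MODEL_S_OSAL_BINDING_ERROR, (int)status);
}

/* Stand-in for osal_check: only escalates when the status is not success. */
static inline void model_osal_check(model_status_t status)
{
    if (status != MODEL_OSAL_SUCCESS) {
        model_osal_sync_error(status);
    }
}
```

In this model, calling model_osal_check() with a success status does nothing, while any other status ends up as the third argument of the fatal-error stand-in.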

Where do we come from? To narrow it down further, we've set a breakpoint and checked the RETS register. This led us to a function call inside of _adi_rtl_get_tls_ptr.

According to the linker output file, _adi_rtl_get_tls_ptr is part of libcmt. I've found the sources in %CCES_HOME%\Blackfin\lib\src\libc\osal_tls.c. However, modifying the source code and rebuilding the project led me to the conclusion that the prebuilt C standard library is linked. This was confirmed by renaming libcmt.dlb to _libcmt.dlb, which resulted in a linker error.

Is the prebuilt library compatible with FreeRTOS and the FreeRTOS port? If not, what steps are necessary to compile and link libcmt from sources?

Did we miss some compiler or linker settings? Did we fail to configure FreeRTOS properly? Are we missing something else, or is this a bug, so that commenting out the NULL-pointer check in adi_osal_ThreadSlotAcquire is still a valid workaround and should be included in the next release of the FreeRTOS port / ADI OSAL? Has anybody else faced similar issues?

  

Edit: Is there a way to replace adi_fatal_error by a custom error handling function similar to an exception handler?

Edit: Which role does the Add-In "SSL/DD Add-in (Build ID: 1.0.0)" play here (if any)? It is not installed by default - is it of relevance to this problem?

Best regards,
Matthias



Corrected source code location of adi_osal_ThreadSlotAcquire(): adi_osal_freertos_tls.c
[edited by: matthiaswe at 7:53 AM (GMT -4) on 13 Sep 2021]
  • Hi Matthias,

    So firstly, the prebuilt libc (libcmt.dlb) is compatible with FreeRTOS; the mt version calls through the OSAL abstraction layer to use the OS functionality. The only library which isn't compatible out of the box is the SSL/DD, which I'll cover below.

    You can replace the adi_fatal_error by providing a new definition of it. The existing source can be found in ${CrossCore}/Blackfin/lib/src/libc/adi_fatal_error.asm.

    You could presumably replace __osal_sync_error instead if you only wanted to catch the sync error.
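A replacement handler along these lines might look like the following sketch. It assumes a three-int signature (general code, specific code, value), which matches the description of the value as the third parameter later in this thread but should be checked against the library documentation. The names app_fatal_error, app_fatal_record_t, g_app_fatal and g_app_fatal_hook are hypothetical; on target the function would have to carry the library's symbol name to take the place of the default definition.

```c
#include <stddef.h>

/* Hypothetical record of the most recent fatal error; not part of any ADI API. */
typedef struct {
    int general_code;
    int specific_code;
    int value;   /* for OSAL binding errors this would carry the OSAL status */
    int count;   /* number of times the handler has been entered */
} app_fatal_record_t;

static volatile app_fatal_record_t g_app_fatal;

/* Hypothetical hook so the application decides what happens next
 * (log, reset, spin). It is NULL here, so the handler simply returns. */
static void (*g_app_fatal_hook)(void) = NULL;

/* Assumed signature: three ints, with the value as the third parameter. */
void app_fatal_error(int general_code, int specific_code, int value)
{
    g_app_fatal.general_code  = general_code;
    g_app_fatal.specific_code = specific_code;
    g_app_fatal.value         = value;
    g_app_fatal.count++;

    if (g_app_fatal_hook != NULL) {
        g_app_fatal_hook();
    }
}
```

Whether the library's callers tolerate a handler that returns is not established in this thread, so a real replacement may still need to end in a reset or an idle loop.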

    The SSL/DD Add-In is important because the SSL/DD libraries need some structures which are appropriately sized for FreeRTOS (this is not currently the default size but we'll address that in future) and so need to be re-built to use with FreeRTOS. You can either do that manually (as it sounds like you are doing) or via the Add-In.

    osal_check() is called regularly so I can't suggest where it'll have called adi_fatal_error. Can you place a breakpoint on __osal_sync_error() to see if the call stack indicates where the error has come from?

    Regards,
    Murray

  • Hi Murray,

    thank you for your answer.

    So the call stack looks like:

    • some function (probably adi_rtl_get_tls_ptr) is calling adi_osal_ThreadSlotAcquire()
    • the same function calls osal_check() to evaluate the return value
    • inside of adi_osal_ThreadSlotAcquire() the pointer pTerminateCallbackFunc might be non-NULL (the case the check rejects) or another error occurs,
      • therefore it returns ADI_OSAL_OS_ERROR (or another value != ADI_OSAL_SUCCESS)
      • the caller then calls osal_check() which in turn detects the error, _osal_sync_error() is called and finally adi_fatal_error()

    We've already put a breakpoint in _osal_sync_error() before, please see the initial post. There is no valid call stack visible in the CCES IDE (see screenshot above). But instead I've evaluated the content of the RETS register leading me to adi_rtl_get_tls_ptr().

    I think it might be useful to see the value of the third argument of adi_fatal_error(), the status value, as it should give the return value of adi_osal_ThreadSlotAcquire(). The documentation of adi_fatal_error() states that the value is written to the global variable adi_fatal_error_value. This value does not seem to be printed on the CCES error console when the automatic breakpoint is hit (to check if it really was ADI_OSAL_OS_ERROR or some other value).

    It's good to know that this is not related to the SSL/DD libraries.

    Best regards,
    Matthias

  • Hi Matthias,

    You are correct, __osal_sync_error will pass the status as the value (3rd param) to adi_fatal_error() which will be written to adi_fatal_error_value.

    This value should be shown on the console, but isn't for this error so I'll log an issue to add the support in a future release.

    Regards,
    Murray

  • Hi Murray,

    thank you for your reply. We've replaced the default implementation of __osal_sync_error() with a custom handler function. adi_osal_ThreadSlotAcquire() returns ADI_OSAL_OS_ERROR.

    We've been able to track the error down a little more (with a lot of C library source code and assembly code research). The error occurs when calling rand() in a multi-threading environment.

    It was not too easy to get a valid call stack, as you need to step through the assembly code until the right point (e.g. after an RTS instruction).

    We've run out of ideas and have experimentally changed the FreeRTOS configuration value configNUM_THREAD_LOCAL_STORAGE_POINTERS from 1 to 16, which didn't fix it. We did this because srand() (which might be an alias of rand()) uses two TLV locations, _TLV(__randstate).init and _TLV(__randstate).seed.
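One reading that would be consistent with these observations is that the early check on the callback pointer returns before any slot is even looked at, in which case the number of slots would not matter. The host-side model below only illustrates that reasoning. It is not the OSAL source, all model_/MODEL_ names are hypothetical, and the assumption that the C runtime passes a non-NULL terminate callback is not confirmed in this thread.

```c
#include <stddef.h>

/* Plays the role of configNUM_THREAD_LOCAL_STORAGE_POINTERS in this model. */
#define MODEL_NUM_SLOTS 16

typedef enum { MODEL_SUCCESS, MODEL_OS_ERROR, MODEL_NO_SLOT } model_tls_status_t;
typedef void (*model_tls_callback_t)(void *slot_value);

static unsigned char model_slot_used[MODEL_NUM_SLOTS];

/* Hypothetical slot allocator. keep_check mimics whether the
 * "callbacks are not supported" test at the top is left active. */
static model_tls_status_t model_slot_acquire(unsigned *key,
                                             model_tls_callback_t on_terminate,
                                             int keep_check)
{
    /* Early exit: a non-NULL callback is rejected before any slot is examined. */
    if (keep_check && on_terminate != NULL) {
        return MODEL_OS_ERROR;
    }

    for (unsigned i = 0; i < MODEL_NUM_SLOTS; i++) {
        if (!model_slot_used[i]) {
            model_slot_used[i] = 1;
            *key = i;
            return MODEL_SUCCESS;
        }
    }
    return MODEL_NO_SLOT;
}

/* Example terminate callback (assumption: the caller supplies one). */
static void model_cleanup(void *slot_value)
{
    (void)slot_value;
}
```

In this model the request with a callback fails with the check active even though all 16 slots are free, and succeeds once the check is skipped.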

    How can this error be tracked down further?

    Is the NULL-pointer check in adi_osal_ThreadSlotAcquire() valid?

    Is it allowed or forbidden to call rand() in a MT FreeRTOS environment?

    Regards,
    Matthias

  • Hi all,

    any thoughts on this?

    Best regards,
    Matthias

  • Hi Matthias,

    Sorry for the delay. I'm looking at this now and hope to have a proper answer for you soon.
    In the meantime, it looks like it's safe to remove the NULL pointer check at the start of adi_osal_ThreadSlotAcquire() as a workaround.

    Regards,
    Murray