Memory management for a FIFO Queue

In embedded systems, microcontrollers frequently communicate with other devices via various serial communication protocols, such as SPI, I2C, USB, UART, or even proprietary protocols tailored for specific systems. Should the data transfer bus encounter a stall, such as bus contention, the device must efficiently buffer the data in a FIFO queue while the system resolves the issue through arbitration. If arbitration takes too long, buffering can consume significant memory, underscoring the importance of efficient memory management in such scenarios.

Problem Statement:

A legacy system has two microcontrollers taking care of the entire operation of system.

A microcontroller (MCU A) governed by FreeRTOS has one task to transmit data & one task to receive data from another microcontroller (MCU B) governed by FreeRTOS. These two MCUs communicate with each other over UART with a proprietary handshaking mechanism & checksum for every data packet. In case of bus contention, MCU B gets precedence over MCU A for data transfer after stalling the bus for 2 seconds. The implementation of code on MCU B is unknown, but it is guaranteed that MCU B will strictly follow the rules of handshake and checksum for communication.

There are 10 other tasks running on MCU A, and any task can invoke transmit (TxTask) task to send data to MCU B. Length of data packet will vary based on which task is invoking TxTask. Any task can invoke TxTask at any point of time to send data to MCU B. During critical computations phase, a data packet of length 60 bytes is sent to MCU B every 50 milliseconds. During other times, the TxTask is not very busy.

The length of largest data packet being sent to MCU B is 1500 bytes (including handshake and checksum). The length of largest data packet received from MCU B is 2000 bytes (including handshake and checksum). The existing structure that holds the data packet is as follows:

Each task that invokes the TxTask will add an element of txdatapacket_t type to a FreeRTOS message queue of depth 2000. Then, the TxTask will take each element out one by one and send it to MCU B over UART.

Currently, the data is being buffered in an external RAM of 4MB, but we want to get rid of this external RAM and use the internal 32kB RAM for buffering. Implement an efficient memory management scheme for buffering the data.

Proposed solution:

Observations based on problem statement:

The depth of FreeRTOS message queue holding elements of txdatapacket_t type is 2000. One txdatapacket_t element occupies 2058 bytes (assuming no byte padding). Hence, 2000 elements will occupy 4116kB (3.925MB).
During critical computations phase, a message of length 60 bytes is sent every 50ms. Assuming that bus contention occurs during this phase, considering a time of 2 seconds for arbitration, we will need to buffer 40 messages. Given the current implementation, it will take approximately 80kB of memory to buffer the critical computation data during arbitration.
The available memory for new buffering is only 32kB, which is not enough to buffer critical computation data during arbitration phase. However, a major design flaw in legacy system is that it is wasting a huge chunk of memory for every data packet, even though the length of payload is very very small. Currently, each data packet has ability to hold a payload of 2048 bytes, however, the length of largest data packet being transmitted including handshake and checksum is only 1500.

Solution:

Change the message queue to accept (txdatapacket_t *) elements (pointers to the data packet, and not the entire data packet itself) and change the depth to 100.
Calculate the length of payload at runtime, and only reserve required amount memory instead of reserving 2048 bytes for a smaller payload.
Use the 32kB memory in a circular buffer fashion.
Change the txdatapacket_t structure to have a maximum payload of 1550 bytes.
To avoid splitting packet in case of rollover, we will skip the last memory chunk and jump to the beginning of the buffer.

Implementation:

New txdatapacket_t structure:

Changing the FreeRTOS message queue in main.c:

QueueHandle_t txQueue;
//create a queue capable of containing 100 pointers
//to txdatapacket_t structures
txQueue = xQueueCreate(100, (txdatapacket_t *));

if (txQueue == NULL)
{
    //failed to create queue, handle error
    ErrorHandler();
}

Implementation of the 32kb memory management buffer:

#include <stdio.h>
#include <stdint.h>

#define RAM32_KB_BASE_ADDR      (0x50000000)
#define RAM32_KB_END_ADDR       (0x50007FFF)

#define TX_BUFFER_BASE_ADDR     (RAM32_KB_BASE_ADDR)
#define TX_BUFFER_END_ADDR      (RAM32_KB_END_ADDR)
#define TX_BUFFER_SIZE          (TX_BUFFER_END_ADDR-TX_BUFFER_BASE_ADDR+1)

//Global variables
char * const BufferLoc = (char *)TX_BUFFER_BASE_ADDR;
char * LastReadLocation = (char *)TX_BUFFER_BASE_ADDR;
char * NextFreeLocation = (char *)TX_BUFFER_BASE_ADDR;
int32_t BufferLength = 0;

/****************************************************/
//called from CreateTxDataPacket()
uint8_t *GetNextFreeLocation(uint16_t length)
{
    //check if the buffer is full
    if (BufferLength > TX_BUFFER_SIZE)
    {
        return NULL;
    }

    //check if length is greater than
    //available memory in buffer
    if (length >= (TX_BUFFER_SIZE - BufferLength))
    {
        return NULL;
    }

    //packet seems to fit in the buffer
    //get current free location
    char *CurrentFreeLocation = NextFreeLocation;

    //check how close we are to the end of the buffer
    if (
        ((((uint32_t)(TX_BUFFER_END_ADDR)) -
         ((uint32_t)(CurrentFreeLocation))) >= length)
       )
    {
        //we are far enough from the end of the buffer
        NextFreeLocation = CurrentFreeLocation + length;
    }
    else //packet will not fit at the end, we do not want to
        //split packet in half, so check if we are far enough from
        //the beginning of the buffer
    if (
        ((((uint32_t)(LastReadLocation)) -
        ((uint32_t)(TX_BUFFER_END_ADDR))) >= length)
       )
    {
        //we are far enough from the beginning of the buffer
        //skip the last chunk and start at the beginning
        CurrentFreeLocation = BufferLoc;
        NextFreeLocation = CurrentFreeLocation + length;
    }
    else
    {
        //even though there is enough space to fit the
        //packet in buffer, we don't want to split it in two pieces
        //since there is no linear space at bottom or top to fit the
        //packet without splitting it, we return NULL
        return NULL;
    }

    //everything went well, update buffer length
    BufferLength += length;

    return CurrentFreeLocation;
}

/****************************************************/
//called from TxTask after one packet is
//sent out successfully
void UpdateLastReadLocation(uint16_t length)
{
    LastReadLocation += length;

    if (
        ((uint32_t)(LastReadLocation)) >=
        ((uint32_t)(TX_BUFFER_END_ADDR))
       )
    {
        //length exceeded the buffer size
        //according to design, if the packet does not fit
        //at the bottom, we start the packet at the top
        //since packet started at the top, read pointer will
        //be at length-1 bytes
        LastReadLocation = BufferLoc + length - 1;
    }

    //update buffer length
    BufferLength -= length;
    if (BufferLength < 0)
    {
        //negative buffer length is
        //not possible
        BufferLength = 0;
    }
}

Implementation of CreateTxDataPacket():

CreateTxDataPacket() can be called by any task with the length of payload, a specific message ID, and payload. If creation of data packet is successful and TxTask is invoked successfully, this function returns true. False otherwise.

bool CreateTxDataPacket(uint8_t MSG_ID, uint16_t PayloadLength, char *payload)
{
    //we already have payload length, calculate
    //length of data packet
    uint16_t packetLen = PayloadLength + 10; //handshake+checksum

    txdatapacket_t *dataPacket = (txdatapacket_t*) GetNextFreeLocation(packetLen);

    if (datapacket == NULL)
    {
        //error handling mechanism unknown
        ErrorHandler();
        return false;
    }

    //populate handshake data, assuming it depends on message id
    datapacket->handshakeData = (handshake_t) COMM_HANDSHAKE_DATA[MSG_ID];

    char *ptr = datapacket->data;

    //populate payload in the data packet
    while(PayloadLength--)
    {
        *ptr++ = *payload++;
    }

    //method to calculate checksum is unknown
    datapacket->checksum = GetChecksum(payload);

    //invoke TxTask here
    if (xQueueSend(txQueue, datapacket, 0) != pdTRUE)
    {
        //error handling mechanism unknown
        ErrorHandler();
        return false;
    }

    return true;
}

This new implementation efficiently uses only 32kB of memory as opposed to 4MB in the legacy system.

If bus contention occurs during critical computations phase, the new buffer will only end up using 2800 bytes during arbitration as opposed to 82320 in the legacy system. If the bus is stalled for more than 2 seconds, the message queue will overflow, indicating something else is wrong with the bus.

Although the structure that holds the data itself is 1560 bytes, it will only use packetLen number of bytes from the buffer, preventing memory corruption.

Conclusion

By reducing the depth of queue and changing the queue to accept the pointers instead of the elements can reduce the memory footprint on RAM significantly. Furthermore, by efficiently calculating packet length at runtime and reserving memory in buffer instead of reserving a huge chunk of data and wasting most of it will help us get rid of the external 4MB RAM.