
Memory Profiling and Optimization

Introduction

Cortex-M Program Image and Memory

In this codelab, we will first explore the concept of a Cortex-M program image. We will then examine different memory profiling techniques for Zephyr RTOS programs, illustrating some classic issues associated with static or dynamic memory usage.

What you’ll build

  • You will modify your BikeComputer program to improve your understanding of its program image and memory map.
  • You will instrument your BikeComputer program to perform dynamic memory analysis.
  • You will modify the BikeComputer program to create memory issues on purpose.

What you’ll learn

  • You will understand how a Cortex-M program image is made and how it is used for starting your program on a Cortex-M device.
  • You will understand the boot sequence of a Zephyr RTOS program.
  • You will understand how a Zephyr RTOS program memory is organized in RAM and how to trace dynamic memory allocations.

What you’ll need

  • Zephyr Development Environment for developing and debugging your program in C++.
  • The BikeComputer and multi-tasking codelabs are prerequisites for this codelab.

The Program Image

A Cortex-M program image or executable file (e.g. the .elf file on your computer) refers to a piece of code that is ready to execute. The image can occupy up to 512 MiB of memory space, ranging from address 0x00000000 to address 0x1FFFFFFF, as shown in Figure 1 for the nRF5340 MCU.

Figure 1: nRF5340 system memory map

The program image is usually stored in non-volatile memory, such as on-chip Flash memory. It is normally kept separate from the program data, which is allocated in the SRAM region of the memory map.

To build the program image, the toolchain relies on the devicetree. The SoC-level .dtsi file defines the different memory regions and partitions. The board .dts file includes it and selects which of them are used, for example through the zephyr,flash and zephyr,sram properties of the chosen node. These files are self-documenting, so you can easily recognise the ROM and RAM region definitions.

Based on this information, the toolchain produces a program image that is placed in the Code region of the memory map depicted in Figure 1.

Analysing the ELF file produced by the linker helps you understand this program image. Among other tools, the standard GNU binutils and some Zephyr-specific tools are useful:

  • arm-zephyr-eabi-size prints the sizes of the text, data and bss sections, for example:
    console
       text    data     bss     dec     hex filename
       0x2396c   0x1e4 0x34fce  363294   58b1e build\zephyr\zephyr.elf
    

It can also print a more detailed per-section view in System V format (options -A -x), such as:

arm-zephyr-eabi-size output
console
build\zephyr\zephyr.elf  :
section                             size         addr
rom_start                          0x154          0x0
text                             0x1479c        0x154
.ARM.extab                         0x134      0x148f0
.ARM.exidx                         0x250      0x14a24
initlevel                           0x78      0x14c74
device_area                        0x120      0x14cec
sw_isr_table                       0x228      0x14e0c
gpio_driver_api_area                0x24      0x15034
i2c_driver_api_area                 0x18      0x15058
sensor_driver_api_area              0x1c      0x15070
spi_driver_api_area                  0x8      0x1508c
clock_control_driver_api_area       0x1c      0x15094
display_driver_api_area             0x2c      0x150b0
mipi_dbi_driver_api_area            0x18      0x150dc
uart_driver_api_area                 0xc      0x150f4
init_array                          0x14      0x15100
log_const_area                      0xa8      0x15114
log_backend_area                    0x10      0x151bc
tbss                                 0x8      0x151cc
rodata                            0xe7b4      0x151d0
.ramfunc                             0x0   0x20000000
datas                              0x15c   0x20000000
device_states                       0x14   0x2000015c
log_mpsc_pbuf_area                  0x40   0x20000170
log_msg_ptr_area                     0x4   0x200001b0
k_heap_area                         0x18   0x200001b4
.comment                            0x20          0x0
.debug_aranges                    0x3a20          0x0
.debug_info                     0x1008c6          0x0
.debug_abbrev                    0x1eed9          0x0
.debug_line                      0x4836a          0x0
.debug_frame                      0xe2e0          0x0
.debug_str                       0x5ab14          0x0
.debug_loc                       0x54c2d          0x0
.debug_ranges                     0x5e28          0x0
.ARM.attributes                     0x38          0x0
.last_section                        0x4      0x23b50
bss                                0xe86   0x200001d0
noinit                           0x34140   0x20001058
Total                           0x2878e8
  • arm-zephyr-eabi-readelf (option -S) lists the section headers with their types, addresses, file offsets, sizes and flags:
arm-zephyr-eabi-readelf output
console
There are 43 section headers, starting at offset 0x27b454:

Section Headers:
[Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[ 0]                   NULL            00000000 000000 000000 00      0   0  0
[ 1] rom_start         PROGBITS        00000000 000100 000154 00  AX  0   0  4
[ 2] text              PROGBITS        00000154 000254 01479c 00  AX  0   0  4
[ 3] .ARM.extab        PROGBITS        000148f0 0149f0 000134 00   A  0   0  4
[ 4] .ARM.exidx        ARM_EXIDX       00014a24 014b24 000250 00  AL  2   0  4
[ 5] initlevel         PROGBITS        00014c74 014d74 000078 00   A  0   0  4
[ 6] device_area       PROGBITS        00014cec 014dec 000120 00   A  0   0  4
[ 7] sw_isr_table      PROGBITS        00014e0c 014f0c 000228 00   A  0   0  4
[ 8] gpio_driver_[...] PROGBITS        00015034 015134 000024 00   A  0   0  4
[ 9] i2c_driver_a[...] PROGBITS        00015058 015158 000018 00   A  0   0  4
[10] sensor_drive[...] PROGBITS        00015070 015170 00001c 00   A  0   0  4
[11] spi_driver_a[...] PROGBITS        0001508c 01518c 000008 00   A  0   0  4
[12] clock_contro[...] PROGBITS        00015094 015194 00001c 00   A  0   0  4
[13] display_driv[...] PROGBITS        000150b0 0151b0 00002c 00   A  0   0  4
[14] mipi_dbi_dri[...] PROGBITS        000150dc 0151dc 000018 00   A  0   0  4
[15] uart_driver_[...] PROGBITS        000150f4 0151f4 00000c 00   A  0   0  4
[16] init_array        INIT_ARRAY      00015100 015200 000014 04  WA  0   0  4
[17] log_const_area    PROGBITS        00015114 015214 0000a8 00   A  0   0  4
[18] log_backend_area  PROGBITS        000151bc 0152bc 000010 00   A  0   0  4
[19] tbss              NOBITS          000151cc 0152cc 000008 00 WAT  0   0  4
[20] rodata            PROGBITS        000151d0 0152d0 00e7b4 00   A  0   0 16
[21] .ramfunc          PROGBITS        20000000 023c54 000000 00   W  0   0  1
[22] datas             PROGBITS        20000000 023a84 00015c 00  WA  0   0  4
[23] device_states     PROGBITS        2000015c 023be0 000014 00  WA  0   0  1
[24] log_mpsc_pbu[...] PROGBITS        20000170 023bf4 000040 00  WA  0   0  4
[25] log_msg_ptr_area  PROGBITS        200001b0 023c34 000004 00  WA  0   0  4
[26] k_heap_area       PROGBITS        200001b4 023c38 000018 00  WA  0   0  4
[27] .comment          PROGBITS        00000000 023c54 000020 01  MS  0   0  1
[28] .debug_aranges    PROGBITS        00000000 023c78 003a20 00      0   0  8
[29] .debug_info       PROGBITS        00000000 027698 1008c6 00      0   0  1
[30] .debug_abbrev     PROGBITS        00000000 127f5e 01eed9 00      0   0  1
[31] .debug_line       PROGBITS        00000000 146e37 04836a 00      0   0  1
[32] .debug_frame      PROGBITS        00000000 18f1a4 00e2e0 00      0   0  4
[33] .debug_str        PROGBITS        00000000 19d484 05ab14 01  MS  0   0  1
[34] .debug_loc        PROGBITS        00000000 1f7f98 054c2d 00      0   0  1
[35] .debug_ranges     PROGBITS        00000000 24cbc8 005e28 00      0   0  8
[36] .ARM.attributes   ARM_ATTRIBUTES  00000000 2529f0 000038 00      0   0  1
[37] .last_section     PROGBITS        00023b50 023c50 000004 00  WA  0   0  4
[38] bss               NOBITS          200001d0 023c58 000e86 00  WA  0   0  8
[39] noinit            NOBITS          20001058 023c58 034140 00  WA  0   0  8
[40] .symtab           SYMTAB          00000000 252a28 013e50 10     41 3216  4
[41] .strtab           STRTAB          00000000 266878 0149a9 00      0   0  1
[42] .shstrtab         STRTAB          00000000 27b221 000233 00      0   0  1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), y (purecode), p (processor specific)
  • After a build, use west to generate ROM and RAM reports with west build -t rom_report > rom_report.txt and west build -t ram_report > ram_report.txt.

  • Use the full memory map file generated by the linker (“build/zephyr/zephyr.map”).
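The numbers printed by arm-zephyr-eabi-size can be turned into a rough Flash and RAM budget. The minimal sketch below uses a common rule of thumb, under which Flash holds text plus data (the initial values of initialized variables are stored there) and RAM holds data plus bss. The type and helper names are hypothetical. The sketch also checks that the dec column is simply the sum of the three other columns.

```cpp
#include <cassert>
#include <cstdint>

// Section sizes as printed by arm-zephyr-eabi-size (Berkeley format).
struct SizeReport {
    std::uint32_t text;  // code, vector table and read-only data
    std::uint32_t data;  // initialized writable data
    std::uint32_t bss;   // zero-initialized and noinit data
};

// The "dec" column is the sum of the three columns.
constexpr std::uint32_t totalSize(const SizeReport& r) { return r.text + r.data + r.bss; }

// Rule of thumb: Flash holds the code plus the initial values of the data section.
constexpr std::uint32_t flashFootprint(const SizeReport& r) { return r.text + r.data; }

// Rule of thumb: RAM holds the data section (filled at startup) plus bss.
constexpr std::uint32_t ramFootprint(const SizeReport& r) { return r.data + r.bss; }

// Values of the BikeComputer build shown above.
constexpr SizeReport kBikeComputer{0x2396c, 0x1e4, 0x34fce};

// Consistency check against the dec and hex columns of the size output.
static_assert(totalSize(kBikeComputer) == 363294, "dec column mismatch");
static_assert(totalSize(kBikeComputer) == 0x58b1e, "hex column mismatch");
```

With these values, the sketch estimates 0x23b50 (146256) bytes of Flash and 0x351b2 (217522) bytes of RAM. Both figures are approximations. The data column of this build also counts init_array and .last_section, which are flagged as writable but located at Flash addresses. Most of the bss figure comes from the noinit section (0x34140 bytes), which holds, among other things, thread stacks such as sys_work_q_stack.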

The Boot Sequence and Memory Initialization

Understand the Early Boot Sequence

Upon reset, the Cortex-M processor executes startup code. This code is specific to each platform and toolchain, but it usually consists of

  1. defining the initial SP value,
  2. defining the initial PC value, i.e. the address of the reset handler function,
  3. filling the vector table with the addresses of the exception handlers (ISRs), and
  4. branching to the initialization of the C runtime, which eventually switches to the main thread that executes the main() function of your program.

When we use the command arm-zephyr-eabi-objdump.exe -D -G on the elf file, we get an output similar to the following:

Elf disassembled
   Disassembly of section rom_start:
   00000000 <_vector_table>:
     0:   20034d98        mulcs   r3, r8, sp
     4:   00005b45        andeq   r5, r0, r5, asr #22

From this output and from the output of the other tools described above, we can observe that:

  • The code region begins with the vector table, whose first entry is the initial value of the main stack pointer (MSP). Note that objdump has attempted to disassemble the stack pointer and program counter (PC) values as instructions, since it cannot distinguish between code and data. When starting up, the Cortex-M processor loads the word stored at address 0x0000_0000 into the stack pointer. In our case, this value is 0x20034d98, which points to the end of the initial stack. Recall that the stack grows downwards, so the initial SP is placed just past the highest address of the stack. As the dump file shows, the symbol located at this exact address is sys_work_q_stack, the system workqueue stack of Zephyr RTOS. This is the object that follows the initial stack in RAM, not the initial stack itself.
Elf disassembled
   ...
   20034d98 <sys_work_q_stack>:
  • The next entry in the vector table holds the address of the reset handler, i.e. the location the processor jumps to upon reset. This is the start of the system’s early boot sequence. Looking at the code reveals that the SystemInit function is called first, followed by other functions such as z_early_memset and z_prep_c. These functions are provided by Zephyr RTOS and its vendor HAL modules. SystemInit, for instance, comes from the Nordic MDK and follows the CMSIS naming convention.
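The two words at the start of the vector table can be interpreted with a few lines of host-side C++. The minimal sketch below uses hypothetical helper names. It checks that the initial stack pointer lies in the SRAM region of the default Cortex-M memory map and that the reset vector lies in the Code region. It also checks that bit 0 of the reset vector is set: Cortex-M processors only execute Thumb code, so every exception vector must have its least significant bit set. The reset handler therefore actually starts at 0x00005b44.

```cpp
#include <cassert>
#include <cstdint>

// First two words of the vector table, as shown in the objdump output above.
constexpr std::uint32_t kInitialSp = 0x20034d98;    // entry 0: initial main stack pointer
constexpr std::uint32_t kResetVector = 0x00005b45;  // entry 1: reset handler

// Default Cortex-M memory map: Code region 0x00000000-0x1FFFFFFF,
// SRAM region 0x20000000-0x3FFFFFFF.
constexpr bool inCodeRegion(std::uint32_t address) { return address <= 0x1FFFFFFFu; }
constexpr bool inSramRegion(std::uint32_t address) {
    return address >= 0x20000000u && address <= 0x3FFFFFFFu;
}

// Cortex-M processors only execute Thumb code: bit 0 of every vector must be set.
constexpr bool isThumbVector(std::uint32_t vector) { return (vector & 1u) != 0; }

// Clearing bit 0 gives the address of the first instruction of the handler.
constexpr std::uint32_t handlerAddress(std::uint32_t vector) { return vector & ~1u; }

// The initial stack pointer must at least be word aligned.
constexpr bool isWordAligned(std::uint32_t address) { return (address & 3u) == 0; }

static_assert(inSramRegion(kInitialSp), "initial SP should point into SRAM");
static_assert(inCodeRegion(kResetVector) && isThumbVector(kResetVector),
              "reset vector should be a Thumb address in the Code region");
```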

Kernel Initialization

The bootup sequence is illustrated in the diagram below:

Bootup sequence

(source: https://academy.nordicsemi.com/courses/nrf-connect-sdk-intermediate/lessons/lesson-1-zephyr-rtos-advanced/topic/boot-up-sequence-execution-context/)

It is important to note the following points:

  • The early bootup sequence starts with the reset handler. This initialization phase makes the system ready for kernel initialization.
  • The initialization levels (such as PRE_KERNEL_1, PRE_KERNEL_2 and POST_KERNEL) specify in which order initialization code is executed. What is executed at each level depends on the platform itself and on the application configuration (as described in the prj.conf file).
  • The kernel initialization makes the kernel ready for the application. Static devices and board-level subsystems (e.g. clock, basic peripherals, console/UART/RTT) are initialized during this phase.
  • After kernel initialization, the scheduler and essential threads (e.g. the main and idle threads) are started; after that Zephyr prints its boot banner (if enabled).
  • After the kernel and post-kernel setup, board- and application-specific modules, drivers and threads (as defined in “prj.conf” and DTS overlays) are initialized and started.
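In Zephyr RTOS, code is attached to these levels with macros such as SYS_INIT() and DEVICE_DT_DEFINE(). Each entry also carries a priority within its level. The ordering rule can be modelled on the host as below: level first, then ascending priority. This is only a hypothetical model of the rule. Zephyr itself implements the ordering with linker-sorted sections, and the entry names used here are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical host-side model of Zephyr init levels (not Zephyr code).
enum class InitLevel { Early, PreKernel1, PreKernel2, PostKernel, Application };

struct InitEntry {
    std::string name;  // hypothetical name of the init function
    InitLevel level;   // phase in which the function runs
    int priority;      // order within the level: lower values run first
};

// Returns the entry names in the order in which they would run:
// first by level, then by priority within a level.
std::vector<std::string> executionOrder(std::vector<InitEntry> entries) {
    std::stable_sort(entries.begin(), entries.end(),
                     [](const InitEntry& a, const InitEntry& b) {
                         if (a.level != b.level) {
                             return a.level < b.level;
                         }
                         return a.priority < b.priority;
                     });
    std::vector<std::string> order;
    order.reserve(entries.size());
    for (const auto& entry : entries) {
        order.push_back(entry.name);
    }
    return order;
}
```

For example, a clock driver registered at PRE_KERNEL_1 runs before a display driver registered at POST_KERNEL, whatever their priorities.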

Static Memory Analysis Using the ROM and RAM Reports

As explained above, Zephyr RTOS provides a number of utility tools that help you understand how the program image is structured and how memory space is used by any Zephyr RTOS application. Running the command west build -t rom_report > rom_report.txt writes detailed information on how the program image is structured to the file “rom_report.txt”. Searching this file for “bike_system.cpp” or other application-specific files produces output similar to the following:

rom_report output

│   │       ├── main.cpp                            172   0.12%  - 
│   │       │   ├── log_const_bike_computer           8   0.01%  0x00015134 log_const_area
│   │       │   └── main                            164   0.11%  0x000022f5 text
│   │       ├── multi_tasking                      2468   1.69%  - 
│   │       │   ├── bike_system.cpp                1842   1.26%  - 

In this report, text refers to application code, rodata to read-only constants, and datas to the initial values of variables initialized with non-zero values. All of these are stored in Flash/ROM. The datas values are copied at startup into RAM, where the corresponding variables are located.
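The way datas values reach RAM can be illustrated with a small host-side model. In Zephyr RTOS, this job is done early in the boot sequence by functions such as z_data_copy() and z_bss_zero(). The sketch below is not that code. It is a minimal model with hypothetical names and toy sizes, in which two byte arrays stand in for Flash and RAM.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy sizes for the two simulated memories (hypothetical values).
constexpr std::size_t kFlashSize = 32;
constexpr std::size_t kRamSize = 32;

using Flash = std::array<std::uint8_t, kFlashSize>;
using Ram = std::array<std::uint8_t, kRamSize>;

// Hypothetical description of where the sections are located.
struct Layout {
    std::size_t dataLoad;   // offset of the initial values ("datas") in Flash
    std::size_t dataStart;  // offset of the initialized variables in RAM
    std::size_t dataSize;   // size of the data section in bytes
    std::size_t bssStart;   // offset of the zero-initialized variables in RAM
    std::size_t bssSize;    // size of the bss section in bytes
};

// Model of what the startup code does before main() can rely on global variables:
// copy the initial values from Flash to RAM, then clear the bss section.
void initializeRam(const Flash& flash, Ram& ram, const Layout& layout) {
    assert(layout.dataLoad + layout.dataSize <= flash.size());
    assert(layout.dataStart + layout.dataSize <= ram.size());
    assert(layout.bssStart + layout.bssSize <= ram.size());
    std::memcpy(ram.data() + layout.dataStart, flash.data() + layout.dataLoad,
                layout.dataSize);
    std::memset(ram.data() + layout.bssStart, 0, layout.bssSize);
}
```

RAM content is undefined after reset. Only the bytes covered by the data and bss sections are given well-defined values by this step.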

The RAM report displays how RAM is allocated:

ram_report output

│   ├── _ZGVZN13bike_computer9getFont16EvE7pFont16    4   0.00%  0x20000a20 bss
│   ├── _ZGVZN13bike_computer9getFont18EvE7pFont1     4   0.00%  0x20000a28 bss

Searching for “bike_computer” in this file shows that few global variables are allocated by the application itself in the BikeComputer program. Reducing the use of global variables is usually good practice.
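Names such as _ZGVZN13bike_computer9getFont16EvE7pFont16 are mangled C++ symbols. The _ZGV prefix denotes the guard variable of a function-local static object, here pFont16 inside bike_computer::getFont16(). Demangling the name, for example with arm-zephyr-eabi-c++filt, confirms this interpretation. The sketch below shows the kind of code that produces such a symbol. The namespace and the Font type are hypothetical stand-ins, not the codelab's actual code.

```cpp
#include <cassert>

namespace example {  // hypothetical namespace

struct Font {  // hypothetical stand-in for the real font type
    int height;
};

// A function-local static is initialized on the first call only. Because its
// initializer is not a constant expression, the compiler emits a hidden guard
// variable (mangled with the _ZGV prefix) that records whether the
// initialization has already happened. The Font object is intentionally never freed.
const Font* getFont16() {
    static const Font* pFont16 = new Font{16};
    return pFont16;
}

}  // namespace example
```

Both the pointer and its guard end up in bss, which is why each guard shows up in the RAM report (4 bytes here).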

Understand What Memory Goes Where

To better understand what memory goes where, add the following code to your “main.cpp” file:

main_modified.cpp
#include <cinttypes>  // PRIu32
#include <cstdlib>    // rand()
...
const char szMsg[] = "This is a test message";
static constexpr uint8_t size = 10;
uint32_t randomArray[size] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
uint32_t randomNumber = 0;
...

int main() {
  ...

  tr_info("%s", szMsg);
  for (uint8_t i = 0; i < size; i++) {
    randomArray[i] = rand();
    // PRIu32 matches uint32_t whatever its underlying type; %d expects a signed int
    tr_info("This is a random number %" PRIu32, randomArray[i]);
  }
  randomNumber = rand();
  tr_info("This is a random number %" PRIu32, randomNumber);

  ...
}

Exercise Memory Profiling and Optimization/1

Apply the changes above one at a time and observe how the ROM and RAM reports evolve after each of them. Look at how the text, data and bss sections are modified by each change and explain why.

Reducing Memory Usage by Tuning the Zephyr RTOS Configuration

Both Flash memory and RAM are limited on most microcontrollers. Reducing the memory footprint of an application can help you squeeze in more features or reduce cost. This can be done by tuning a number of configuration options, as documented in the Zephyr RTOS documentation. One simple way of reducing the RAM/ROM footprint is to optimize runtime parameters such as stack sizes or the use of GPIOs. It is also possible to reduce the footprint by configuring the kernel so that only the required kernel features are included.

Choosing the C library with the smallest footprint is another optimization option.

Exercise Memory Profiling and Optimization/2

Compile your application by toggling the following configuration parameters and compare both ROM and RAM footprints:

  • CONFIG_SIZE_OPTIMIZATIONS vs CONFIG_DEBUG_OPTIMIZATIONS
  • Disable/enable all logging configuration parameters, including CONFIG_BOOT_BANNER.
  • Disable/enable FP support in printf with the CONFIG_CBPRINTF_FP_SUPPORT option.
  • Disable/enable nano implementation of the printf functions, with the CONFIG_CBPRINTF_NANO option.

Runtime Memory Tracing

Static memory analysis is essential for understanding how program memory is organized at compile time. However, it is also very useful to analyze how embedded software handles dynamic memory, both on the heap and on the stacks. A program that behaves poorly with respect to dynamic memory allocation becomes unstable and may crash.

With Zephyr RTOS, the developer can use memory statistics functions to capture heap use, cumulative stack use or stack use for each thread at runtime. To enable memory use monitoring, you must enable the following Zephyr RTOS configuration options:

bike_computer/prj.conf
CONFIG_THREAD_STACK_INFO=y
CONFIG_THREAD_ANALYZER=y
CONFIG_SYS_HEAP_RUNTIME_STATS=y

Once memory statistics are enabled, you may instrument the code and perform memory checks at regular intervals or on request. This can be implemented with the zpp_lib::Utils class.

By performing a detailed dynamic memory analysis, it is then possible to optimize some parameters, such as reducing the allocated stack size for a given thread or optimizing the use of the heap.

Configuring the use of the heap on Zephyr RTOS

Zephyr RTOS supports several heaps and, depending on the application configuration, calls to malloc and new may use different heaps. In the zpp_lib::Utils implementation, the new and delete operators are overridden when CONFIG_SYS_HEAP_RUNTIME_STATS is enabled. This ensures that k_malloc and k_free are used, so that allocations are accounted for in the heap statistics. This mechanism should, of course, not be used when heap statistics are not needed.
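The principle of routing new and delete through an allocator that keeps statistics can be sketched on a host. The code below is not the zpp_lib::Utils implementation. It is a minimal, single-threaded sketch in which std::malloc() and std::free() stand in for k_malloc() and k_free(), and the heap_stats counters are hypothetical. Each block is prefixed with a small header that remembers its size, so that delete can update the statistics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical statistics, updated by the replacement operators below.
namespace heap_stats {
std::size_t currentBytes = 0;      // bytes currently allocated
std::size_t maxBytes = 0;          // peak of currentBytes
std::size_t totalAllocations = 0;  // number of successful allocations
}  // namespace heap_stats

// Size header stored in front of every block; using the fundamental alignment
// keeps the pointer returned to the caller suitably aligned.
constexpr std::size_t kHeaderSize = alignof(std::max_align_t);
static_assert(kHeaderSize >= sizeof(std::size_t), "header too small");

// Replacing the global operator new also affects new[], whose default
// implementation calls operator new.
void* operator new(std::size_t size) {
    void* raw = std::malloc(size + kHeaderSize);  // k_malloc() on the target
    if (raw == nullptr) {
        throw std::bad_alloc();
    }
    *static_cast<std::size_t*>(raw) = size;
    heap_stats::currentBytes += size;
    heap_stats::totalAllocations++;
    if (heap_stats::currentBytes > heap_stats::maxBytes) {
        heap_stats::maxBytes = heap_stats::currentBytes;
    }
    return static_cast<unsigned char*>(raw) + kHeaderSize;
}

// Replacing operator delete also covers delete[] and the sized variants,
// whose default implementations forward to this function.
void operator delete(void* ptr) noexcept {
    if (ptr == nullptr) {
        return;
    }
    unsigned char* raw = static_cast<unsigned char*>(ptr) - kHeaderSize;
    heap_stats::currentBytes -= *reinterpret_cast<std::size_t*>(raw);
    std::free(raw);  // k_free() on the target
}
```

The sketch also shows the cost of the mechanism: every allocation pays for the header and the bookkeeping. On a multi-threaded target, the counters would additionally need protection, for example with atomics or a lock.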

Exercise Memory Profiling and Optimization/3

Instrument the dynamic memory usage of your BikeComputer program using the zpp_lib::Utils class. Call both the zpp_lib::Utils::logThreadsStackInfo and zpp_lib::Utils::logHeapSummary methods at regular intervals. After startup, you should observe that your program no longer allocates memory on the heap and that stack usage no longer grows.

By observing the statistics on the console, you should be able to optimize your application configuration and to reduce its RAM usage.

WORK IN PROGRESS

Hunting For Memory Bugs

Detecting a Heap Allocation Error (Memory Leak)

A practical way to illustrate heap memory analysis is to introduce a memory leak in the code. A memory leak occurs when memory that is NO longer needed is NOT released. For this purpose, you may allocate memory without releasing it in a method that is called at regular intervals. Be aware that allocating memory without using it is not enough. The compiler optimizes your code and removes unused statements, such as allocating an array and only assigning values to its elements.

Create a memory leak by instantiating the MemoryLeak class below in one of the task methods of your BikeComputer program and calling its use() method. If you let your program run, you should observe that the memory allocated on the heap grows constantly until the program ultimately crashes, as illustrated in the log below:

MemoryLeak class
multi-tasking/memory_leak.hpp
#pragma once

#include <cstdint>

namespace multi_tasking {

class MemoryLeak {
   public:
    static constexpr uint16_t kArraySize = 1024;

    // create a memory leak in the constructor itself
    MemoryLeak() { _ptr = new int[kArraySize]; }

    void use() {
        for (uint16_t i = 0; i < kArraySize; i++) {
            _ptr[i] = i;
        }
    }

   private:
    int* _ptr;
};

}  // namespace multi_tasking
Console
++ MbedOS Error Info ++
Error Status: 0x8001011F Code: 287 Module: 1
Error Message: Operator new[] out of memory

Location: 0x800F025
File: mbed_retarget.cpp+1848
Error Value: 0x5000
Current Thread: main Id: 0x240035B0 Entry: 0x8013581 StackSize: 0x2000 StackMem: 0x24000C20 SP: 0x240022E4 
Next:
main  State: 0x2 Entry: 0x08013581 Stack Size: 0x00002000 Mem: 0x24000C20 SP: 0x240022C8
Ready:
rtx_idle  State: 0x1 Entry: 0x080143A9 Stack Size: 0x00000380 Mem: 0x240035F8 SP: 0x24003928
Wait:
rtx_timer  State: 0x83 Entry: 0x08015081 Stack Size: 0x00000300 Mem: 0x24003978 SP: 0x24003C18
Delay:
For more info, visit: https://mbed.com/s/error?error=0x8001011F&osver=61700&core=0x411FC271&comp=1&ver=6160001&tgt=DISCO_H747I

Note that to get additional error information, you need to modify the Mbed OS configuration as illustrated below:

mbed_app.json
"target_overrides": {
  "*": {
    "platform.error-all-threads-info":  1,
    "platform.error-filename-capture-enabled": 1
  }
}

From the error log above, we can see that operator new[] fails to allocate an object in the main thread. We also know that the error is raised at line 1848 of the “mbed_retarget.cpp” file.
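When a leaking method runs periodically, the heap usage logged at each interval grows steadily. Each MemoryLeak instance above allocates 1024 int values, i.e. 4 KiB on a 32-bit target. A simple way to flag this automatically is sketched below. The function name and the window length are hypothetical tuning choices. A legitimate warm-up phase can also show growth, so samples should only be evaluated once startup is over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heuristic: reports a suspected leak when the number of bytes
// allocated on the heap has increased at each of the last `window` samples.
bool suspectLeak(const std::vector<std::size_t>& heapSamples, std::size_t window) {
    assert(window >= 1);
    if (heapSamples.size() < window + 1) {
        return false;  // not enough history yet
    }
    const std::size_t first = heapSamples.size() - window - 1;
    for (std::size_t i = first; i + 1 < heapSamples.size(); ++i) {
        if (heapSamples[i + 1] <= heapSamples[i]) {
            return false;  // usage stayed flat or decreased at least once
        }
    }
    return true;
}
```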

Heap Fragmentation

Heap fragmentation is an even harder problem to detect. Fragmentation splits the free heap space into small fragments, so that the largest available block of memory becomes smaller and smaller compared to the total available memory. The fragmentation level can be computed from the ratio between the largest available block of memory and the total available memory:

\(fragmentation = 1 - \frac{largest\ available\ block}{total\ available\ memory}\)
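The formula translates directly into code. The helper below is a hypothetical sketch. It returns 0 when all free memory forms a single contiguous block and approaches 1 as the free memory is split into many small blocks.

```cpp
#include <cassert>
#include <cstddef>

// Fragmentation level as defined above:
// 1 - (largest available block / total available memory).
double fragmentation(std::size_t largestFreeBlock, std::size_t totalFree) {
    assert(totalFree > 0);
    assert(largestFreeBlock <= totalFree);
    return 1.0 - static_cast<double>(largestFreeBlock) / static_cast<double>(totalFree);
}
```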

If the fragmentation is \(50\%\) and the available memory is 1 KiB, then the largest available block is 512 bytes. Fragmentation tends to increase over the lifetime of a program. On embedded systems running C++ programs, there is no way of defragmenting the heap, because allocated blocks are referenced by raw pointers and cannot be moved. Over time, heap fragmentation tends to

  • make programs unreliable: if your program needs a bigger block than the largest available one, the allocation fails and the program stops working,
  • and degrade program performance: a highly fragmented heap is slower, because the memory allocator takes more time to find a suitable block.

These are very good reasons for using heap memory with care on embedded systems.

To illustrate heap fragmentation, you may use the following MemoryFragmenter class in your BikeComputer program:

MemoryFragmenter class
multi-tasking/memory_fragmenter.hpp
#pragma once

#include "mbed.h"
#include "memory_logger.hpp"

namespace multi_tasking {

class MemoryFragmenter {
   public:
    // the constructor does nothing: fragmentation is created by fragmentMemory()
    MemoryFragmenter() {}

    void fragmentMemory() {
        // create a memory logger
        MemoryLogger memoryLogger;

        // get heap info
        mbed_stats_heap_t heapInfo = {0};
        mbed_stats_heap_get(&heapInfo);
        uint32_t availableSize =
            heapInfo.reserved_size - heapInfo.current_size - heapInfo.overhead_size;
        tr_debug("Available heap size is %" PRIu32 " (reserved %" PRIu32 ")",
                 availableSize,
                 heapInfo.reserved_size);

        // divide the available size (minus a margin) into kNbrOfBlocks blocks
        uint32_t blockSize = (availableSize - kMarginSpace) / kNbrOfBlocks;
        tr_debug("Allocating blocks of size %" PRIu32 "", blockSize);
        char* pBlockArray[kNbrOfBlocks] = {NULL};
        for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex++) {
            pBlockArray[blockIndex] = new char[blockSize];
            // note: with Mbed OS, a failing new[] raises a fatal error
            // (see the log below) instead of returning NULL
            if (pBlockArray[blockIndex] == NULL) {
                tr_error("Cannot allocate block memory for index %" PRIu32 "",
                         blockIndex);
            }
            tr_debug("Allocated block index  %" PRIu32 " of size  %" PRIu32
                     " at address 0x%08" PRIx32 "",
                     blockIndex,
                     blockSize,
                     (uint32_t)pBlockArray[blockIndex]);
            // copy to member variable to prevent them from being optimized away
            for (uint32_t index = 0; index < kArraySize; index++) {
                _doubleArray[index] += (double)pBlockArray[blockIndex][index];
            }
        }
        // the full heap (or almost) should be allocated
        tr_debug("Heap statistics after full allocation:");
        memoryLogger.getAndPrintHeapStatistics();
        // delete only the even blocks
        for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex += 2) {
            delete[] pBlockArray[blockIndex];
            pBlockArray[blockIndex] = NULL;
        }
        // we should have half of the heap space free
        tr_debug("Heap statistics after half deallocation:");
        memoryLogger.getAndPrintHeapStatistics();

        // try to allocate one block that is slightly bigger
        // without fragmentation, this allocation should succeed
        heapInfo = {0};
        mbed_stats_heap_get(&heapInfo);
        availableSize =
            heapInfo.reserved_size - heapInfo.current_size - heapInfo.overhead_size;
        tr_debug("Available heap size is  %" PRIu32 " (reserved  %" PRIu32 ")",
                 availableSize,
                 heapInfo.reserved_size);
        blockSize += 8;
        // because of fragmentation, this allocation fails although enough memory is free
        tr_debug("Allocating 1 block of size %" PRIu32 " should succeed !", blockSize);
        pBlockArray[0] = new char[blockSize];
        // copy to member variable to prevent them from being optimized away
        for (uint32_t index = 0; index < kArraySize; index++) {
            _doubleArray[index] += (double)pBlockArray[0][index];
        }
    }

   private:
    static constexpr uint8_t kNbrOfBlocks  = 8;
    static constexpr uint16_t kMarginSpace = 1024;
    static constexpr uint8_t kArraySize    = 100;
    double _doubleArray[kArraySize]        = {0};
};

}  // namespace multi_tasking

If you create an instance of this class in your BikeComputer program and call the MemoryFragmenter::fragmentMemory() method, you will observe an error on the console similar to the one shown below:

Console
[DBG ][MemoryFragmenter]: Available heap size is 501308 (reserved 506044)
[DBG ][MemoryFragmenter]: Allocating blocks of size 62535
[DBG ][MemoryFragmenter]: Allocated block index  0 of size  62535 at address 0x240055f0
[DBG ][MemoryFragmenter]: Allocated block index  1 of size  62535 at address 0x24014a48
[DBG ][MemoryFragmenter]: Allocated block index  2 of size  62535 at address 0x24023ea0
[DBG ][MemoryFragmenter]: Allocated block index  3 of size  62535 at address 0x240332f8
[DBG ][MemoryFragmenter]: Allocated block index  4 of size  62535 at address 0x24042750
[DBG ][MemoryFragmenter]: Allocated block index  5 of size  62535 at address 0x24051ba8
[DBG ][MemoryFragmenter]: Allocated block index  6 of size  62535 at address 0x24061000
[DBG ][MemoryFragmenter]: Allocated block index  7 of size  62535 at address 0x24070458
[DBG ][MemoryFragmenter]: Heap statistics after full allocation:
[DBG ][MemoryLogger]: MemoryStats (Heap):
[DBG ][MemoryLogger]:   Bytes allocated currently: 504892
[DBG ][MemoryLogger]:   Max bytes allocated at a given time: 504892
[DBG ][MemoryLogger]:   Cumulative sum of bytes ever allocated: 504892
[DBG ][MemoryLogger]:   Current number of bytes allocated for the heap: 506044
[DBG ][MemoryLogger]:   Current number of allocations: 16
[DBG ][MemoryLogger]:   Number of failed allocations: 0
[DBG ][MemoryFragmenter]: Heap statistics after half deallocation:
[DBG ][MemoryLogger]: MemoryStats (Heap):
[DBG ][MemoryLogger]:   Bytes allocated currently: 254752
[DBG ][MemoryLogger]:   Max bytes allocated at a given time: 504892
[DBG ][MemoryLogger]:   Cumulative sum of bytes ever allocated: 504892
[DBG ][MemoryLogger]:   Current number of bytes allocated for the heap: 506044
[DBG ][MemoryLogger]:   Current number of allocations: 12
[DBG ][MemoryLogger]:   Number of failed allocations: 0
[DBG ][MemoryFragmenter]: Available heap size is  251100 (reserved  506044)
[DBG ][MemoryFragmenter]: Allocating 1 block of size 62543 should succeed !

++ MbedOS Error Info ++
Error Status: 0x8001011F Code: 287 Module: 1
Error Message: Operator new[] out of memory

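As you can see, although the available heap size is 251100 bytes, an allocation of 62543 bytes fails with an out of memory error. Each freed block is surrounded by blocks that are still allocated, so the free space cannot be merged into a block large enough for the request.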
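The same failure can be reproduced with a much simpler model. The sketch below is a hypothetical first-fit allocator over an abstract arena. It ignores per-block overhead and is not how the Mbed OS or Zephyr RTOS allocators are implemented. Allocating eight equal blocks, releasing every other one and then requesting a slightly larger block reproduces the situation above: half of the arena is free, but no free block is large enough.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical first-fit heap model that only tracks block boundaries.
class FirstFitHeap {
   public:
    static constexpr std::size_t kFailed = static_cast<std::size_t>(-1);

    explicit FirstFitHeap(std::size_t size) { _blocks.push_back(Block{0, size, true}); }

    // Returns the offset of the new block, or kFailed if no free block is large enough.
    std::size_t allocate(std::size_t size) {
        for (auto it = _blocks.begin(); it != _blocks.end(); ++it) {
            if (it->free && it->size >= size) {
                if (it->size > size) {
                    // split: the remainder stays free right after the new block
                    _blocks.insert(std::next(it), Block{it->offset + size, it->size - size, true});
                    it->size = size;
                }
                it->free = false;
                return it->offset;
            }
        }
        return kFailed;
    }

    // Frees the block starting at `offset` and merges it with free neighbours.
    void release(std::size_t offset) {
        for (auto it = _blocks.begin(); it != _blocks.end(); ++it) {
            if (it->offset == offset && !it->free) {
                it->free = true;
                auto next = std::next(it);
                if (next != _blocks.end() && next->free) {
                    it->size += next->size;
                    _blocks.erase(next);
                }
                if (it != _blocks.begin()) {
                    auto prev = std::prev(it);
                    if (prev->free) {
                        prev->size += it->size;
                        _blocks.erase(it);
                    }
                }
                return;
            }
        }
        assert(false && "unknown block");
    }

    std::size_t totalFree() const {
        std::size_t total = 0;
        for (const auto& block : _blocks) {
            if (block.free) total += block.size;
        }
        return total;
    }

    std::size_t largestFree() const {
        std::size_t largest = 0;
        for (const auto& block : _blocks) {
            if (block.free && block.size > largest) largest = block.size;
        }
        return largest;
    }

   private:
    struct Block {
        std::size_t offset;
        std::size_t size;
        bool free;
    };
    std::list<Block> _blocks;
};
```

In this model, with eight blocks of 100 units in an 800-unit arena, the fragmentation after releasing the even blocks is 1 - 100/400 = 0.75. Releasing one odd block lets it merge with its free neighbours into a single larger block.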

To minimize the problems illustrated above, the following guidelines are often recommended on embedded systems:

  • Prefer static allocation over dynamic allocation whenever possible.
  • Prefer automatic allocation (on the stack) when feasible: allocation on the stack is almost free, but care must be taken to avoid stack overflow errors.
  • Use private, application-specific memory pools to provide fixed-size buffers to an application (see Mbed OS Memory Pool). This avoids repeatedly allocating buffers from the heap. The mechanism is used, for instance, by the Mbed OS Mail API, which implements a queuing mechanism for exchanging messages and provides a memory pool for allocating them.
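The idea of a private memory pool can be sketched in a few lines. The class below is a hypothetical, single-threaded illustration, not the Mbed OS MemoryPool API. On Zephyr RTOS, the kernel object that plays this role is the memory slab (k_mem_slab). All blocks have the same size and their storage is reserved up front, so allocation and release take constant time and the pool cannot fragment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size block pool: storage is reserved as a member array
// and free blocks are tracked by index in a stack.
template <std::size_t BlockSize, std::size_t BlockCount>
class BlockPool {
    static_assert(BlockSize > 0 && BlockCount > 0, "empty pool");

   public:
    BlockPool() {
        // push indices in reverse order so that block 0 is handed out first
        for (std::size_t i = 0; i < BlockCount; ++i) {
            _freeIndices[i] = BlockCount - 1 - i;
        }
    }

    // Returns a free block, or nullptr when the pool is exhausted.
    void* allocate() {
        if (_freeCount == 0) {
            return nullptr;
        }
        const std::size_t index = _freeIndices[--_freeCount];
        return &_storage[index * kStride];
    }

    // Returns a block previously obtained from allocate() to the pool.
    void release(void* block) {
        const auto* bytes = static_cast<const unsigned char*>(block);
        const auto offset = static_cast<std::size_t>(bytes - _storage);
        assert(offset % kStride == 0 && offset / kStride < BlockCount);
        assert(_freeCount < BlockCount);
        _freeIndices[_freeCount++] = offset / kStride;
    }

    std::size_t availableBlocks() const { return _freeCount; }

   private:
    // each block starts on a boundary suitable for any fundamental type
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    static constexpr std::size_t kStride = (BlockSize + kAlign - 1) / kAlign * kAlign;

    alignas(std::max_align_t) unsigned char _storage[kStride * BlockCount];
    std::array<std::size_t, BlockCount> _freeIndices{};
    std::size_t _freeCount = BlockCount;
};
```

Declaring a pool such as BlockPool<24, 4> as a static object keeps its whole footprint visible in the RAM report, instead of hiding it in the heap.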

Detecting a Stack Overflow Error

Using the memory tracing features demonstrated above, we can find out which threads are running and how much memory each of them uses. This information is very useful for optimizing the memory usage of each thread, and also for debugging stack overflow errors.
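Per-thread stack usage figures are commonly obtained by stack painting. The stack is filled with a known byte pattern before the thread runs. The bytes that still hold the pattern later show how much of the stack has never been used, and the complement is the high-water mark. Zephyr RTOS applies the same idea when CONFIG_INIT_STACKS is enabled. The host-side sketch below, with hypothetical function names and pattern value, only illustrates the principle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fill pattern written into the stack area before the thread starts.
constexpr std::uint8_t kStackPattern = 0xaa;

// Paint the whole stack area with the pattern.
void paintStack(std::uint8_t* stack, std::size_t size) {
    std::memset(stack, kStackPattern, size);
}

// The stack grows downwards, from stack + size towards stack, so the bytes that
// were never touched remain at the lowest addresses. Counting them from the
// bottom gives the unused space; size minus that value is the high-water mark.
std::size_t unusedStackBytes(const std::uint8_t* stack, std::size_t size) {
    std::size_t unused = 0;
    while (unused < size && stack[unused] == kStackPattern) {
        ++unused;
    }
    return unused;
}
```

The measurement can only underestimate usage if the thread happens to write the pattern value itself. It also only reports the deepest usage observed so far, not the worst case the thread could reach.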

Stack overflows can happen in many different situations. To understand how to detect them, it is easiest to provoke one on purpose. For this purpose, you may add code that allocates more and more memory on the stack in a thread that runs a loop. An example of such code is given below:

MemoryStackOverflow class
multi-tasking/memory_stack_overflow.hpp
#pragma once

#include <cstdint>

#include <cstddef>  // size_t

namespace multi_tasking {

class MemoryStackOverflow {
   public:
    void allocateOnStack() {
        // allocate an array with growing size until it does not fit on the stack anymore
        size_t allocSize = kArraySize * _multiplier;
        // Create a variable-size object on the stack
        // (variable-length arrays are a GCC extension, not standard C++)
        double anotherArray[allocSize];
        for (size_t i = 0; i < allocSize; i++) {
            anotherArray[i] = i;
        }
        // copy to member variable to prevent them from being optimized away
        for (size_t i = 0; i < kArraySize; i++) {
            _doubleArray[i] += anotherArray[i];
        }
        _multiplier++;
    }

   private:
    static constexpr size_t kArraySize = 40;
    double _doubleArray[kArraySize]    = {0};
    size_t _multiplier                 = 1;
};

}  // namespace multi_tasking

If you call the MemoryLogger::printDiffs() method at regular intervals, you will observe that the maximum number of bytes used on the stack of the thread that uses MemoryStackOverflow increases continuously. Once the stack overflows, you may experience different types of errors, including an application crash or an application behaving erratically. The reason is that the overflow corrupts the memory located below the stack and that no stack corruption protection is enabled in the application.

To improve stack corruption checks, you may modify the Mbed OS configuration in the “mbed_app.json” file as follows:

mbed_app.json
"macros": [
    ...
    "RTX_STACK_CHECK=1"
],

If you recompile your application and run it with RTX_STACK_CHECK=1, you should get the following error on the console:

Error log
++ MbedOS Error Info ++
Error Status: 0x80020125 Code: 293 Module: 2
Error Message: CMSIS-RTOS error: Stack overflow
Location: 0x8014291
File: mbed_rtx_handlers.c+60
Error Value: 0x1
Current Thread: rtx_idle Id: 0x24003670 Entry: 0x8014411 StackSize: 0x380 StackMem: 0x24003740 SP: 0x2407FF1C 
Next:
rtx_idle  State: 0x2 Entry: 0x08014411 Stack Size: 0x00000380 Mem: 0x24003740 SP: 0x24003A70
Ready:
Wait:
rtx_timer  State: 0x83 Entry: 0x080150E9 Stack Size: 0x00000300 Mem: 0x24003AC0 SP: 0x24003D60
Delay:
main  State: 0x43 Entry: 0x080135E9 Stack Size: 0x00002000 Mem: 0x24000D68 SP: 0x24002410
For more info, visit: https://mbed.com/s/error?error=0x80020125&osver=61700&core=0x411FC271&comp=1&ver=6160001&tgt=DISCO_H747I
-- MbedOS Error Info --

Unfortunately, the error log does not always indicate a stack overflow. In some situations, the RTX stack check mechanism cannot detect the stack corruption, and the application ultimately crashes with a generic fault exception.

Exercise Memory Profiling and Optimization/4

Try to figure out how and where stack overflow detection is implemented in the RTX kernel.

Exercise Memory Profiling and Optimization/5

Find and implement another very common way of creating a stack overflow in your BikeComputer program.