Memory Profiling and Optimization

Introduction

Cortex-M Program Image and Memory

In this codelab, we will first explore the concept of a Cortex-M program image. We will then examine different memory profiling techniques for Zephyr RTOS programs, illustrating some classic issues associated with static or dynamic memory usage.

What you’ll build

You will modify your BikeComputer program to improve your understanding of its program image and memory map.
You will instrument your BikeComputer program to perform dynamic memory analysis.
You will modify the BikeComputer program to create memory issues on purpose.

What you’ll learn

You will understand how a Cortex-M program image is made and how it is used for starting your program on a Cortex-M device.
You will understand the boot sequence of a Zephyr RTOS program.
You will understand how a Zephyr RTOS program memory is organized in RAM and how to trace dynamic memory allocations.

What you’ll need

Zephyr Development Environment for developing and debugging your program in C++.
All BikeComputer and the multi-tasking codelabs are prerequisites for this codelab.

The Program Image

A Cortex-M program image or executable file (e.g. the .elf file on your computer) refers to a piece of code that is ready to execute. The image can occupy up to 512 MiB of memory space, ranging from address 0x00000000 to address 0x1FFFFFFF, as shown in Figure 1 for the nRF5340 MCU.

nRF5340 system memory map — Figure 1: nRF5340 MemoryMap

The program image is usually stored in non-volatile memory, such as on-chip Flash memory, and is normally kept separate from the program data, which is allocated in SRAM or in the data region of the memory space.

To build the program image, the toolchain uses the dtsi file that defines the different memory regions/partitions. These partitions are then used and defined in the dts file. These files are self-documenting, so you can easily recognise the ROM and RAM region definitions.

Based on this information, the toolchain produces a program image (CODE region) that corresponds to the map depicted in the figure above.
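To make the link between an address and its memory region concrete, the following minimal sketch classifies an address according to the architectural Cortex-M memory map (CODE from 0x00000000 to 0x1FFFFFFF, SRAM from 0x20000000 to 0x3FFFFFFF). The names Region and classify are hypothetical and chosen for illustration only; the sample addresses are taken from the section listings shown in this codelab.

```cpp
#include <cstdint>

// Regions of the Cortex-M memory map that matter for the program image.
enum class Region { Code, Sram, Other };

// Maps an address to its architectural region (each region spans 512 MiB).
constexpr Region classify(std::uint32_t address) {
  if (address <= 0x1FFFFFFFu) return Region::Code;  // 0x00000000-0x1FFFFFFF
  if (address <= 0x3FFFFFFFu) return Region::Sram;  // 0x20000000-0x3FFFFFFF
  return Region::Other;  // peripherals, external memory, system space
}

static_assert(classify(0x00000154u) == Region::Code, "text section starts in CODE");
static_assert(classify(0x20000000u) == Region::Sram, "datas section starts in SRAM");
```

The same check can be applied to any address found in the ELF section headers to tell whether a section lives in Flash or in RAM.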

Analysing the ELF file produced by the linker can help to better understand this program image. Among other tools, the standard GNU binutils and other specific Zephyr RTOS tools are useful:

arm-zephyr-eabi-size prints information about text, data and bss sections such as

   text    data     bss     dec     hex filename
   0x2396c   0x1e4 0x34fce  363294   58b1e build\zephyr\zephyr.elf

or more detailed views such as

arm-zephyr-eabi-size output

build\zephyr\zephyr.elf  :
section                             size         addr
rom_start                          0x154          0x0
text                             0x1479c        0x154
.ARM.extab                         0x134      0x148f0
.ARM.exidx                         0x250      0x14a24
initlevel                           0x78      0x14c74
device_area                        0x120      0x14cec
sw_isr_table                       0x228      0x14e0c
gpio_driver_api_area                0x24      0x15034
i2c_driver_api_area                 0x18      0x15058
sensor_driver_api_area              0x1c      0x15070
spi_driver_api_area                  0x8      0x1508c
clock_control_driver_api_area       0x1c      0x15094
display_driver_api_area             0x2c      0x150b0
mipi_dbi_driver_api_area            0x18      0x150dc
uart_driver_api_area                 0xc      0x150f4
init_array                          0x14      0x15100
log_const_area                      0xa8      0x15114
log_backend_area                    0x10      0x151bc
tbss                                 0x8      0x151cc
rodata                            0xe7b4      0x151d0
.ramfunc                             0x0   0x20000000
datas                              0x15c   0x20000000
device_states                       0x14   0x2000015c
log_mpsc_pbuf_area                  0x40   0x20000170
log_msg_ptr_area                     0x4   0x200001b0
k_heap_area                         0x18   0x200001b4
.comment                            0x20          0x0
.debug_aranges                    0x3a20          0x0
.debug_info                     0x1008c6          0x0
.debug_abbrev                    0x1eed9          0x0
.debug_line                      0x4836a          0x0
.debug_frame                      0xe2e0          0x0
.debug_str                       0x5ab14          0x0
.debug_loc                       0x54c2d          0x0
.debug_ranges                     0x5e28          0x0
.ARM.attributes                     0x38          0x0
.last_section                        0x4      0x23b50
bss                                0xe86   0x200001d0
noinit                           0x34140   0x20001058
Total                           0x2878e8
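The columns of the summary output can be cross-checked with a few lines of arithmetic. The sketch below is an illustration only: the helper names are hypothetical, and the Flash/RAM split is the usual approximation (Flash holds text plus the initial values of data, RAM holds data plus bss), which ignores alignment padding and the exact section placement chosen by the linker.

```cpp
#include <cstdint>

// Values of the three columns printed by the size tool.
struct SizeSummary {
  std::uint32_t text;
  std::uint32_t data;
  std::uint32_t bss;
};

// Sum of the three columns, printed by the tool in the dec and hex columns.
constexpr std::uint32_t total(const SizeSummary& s) { return s.text + s.data + s.bss; }
// Approximate Flash usage: code and constants plus initial values of data.
constexpr std::uint32_t flashBytes(const SizeSummary& s) { return s.text + s.data; }
// Approximate RAM usage: initialized data plus zero-initialized data.
constexpr std::uint32_t ramBytes(const SizeSummary& s) { return s.data + s.bss; }

// Values copied from the summary output shown in this section.
constexpr SizeSummary kImage{0x2396c, 0x1e4, 0x34fce};
static_assert(total(kImage) == 363294, "matches the dec column");
static_assert(total(kImage) == 0x58b1e, "matches the hex column");
```

With these values, flashBytes(kImage) is 146256 bytes and ramBytes(kImage) is 217522 bytes, which is close to the end addresses of the last Flash and RAM sections in the detailed listing.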

arm-zephyr-eabi-readelf provides a more detailed view of the different memory sections:

arm-zephyr-eabi-readelf output

There are 43 section headers, starting at offset 0x27b454:

Section Headers:
[Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[ 0]                   NULL            00000000 000000 000000 00      0   0  0
[ 1] rom_start         PROGBITS        00000000 000100 000154 00  AX  0   0  4
[ 2] text              PROGBITS        00000154 000254 01479c 00  AX  0   0  4
[ 3] .ARM.extab        PROGBITS        000148f0 0149f0 000134 00   A  0   0  4
[ 4] .ARM.exidx        ARM_EXIDX       00014a24 014b24 000250 00  AL  2   0  4
[ 5] initlevel         PROGBITS        00014c74 014d74 000078 00   A  0   0  4
[ 6] device_area       PROGBITS        00014cec 014dec 000120 00   A  0   0  4
[ 7] sw_isr_table      PROGBITS        00014e0c 014f0c 000228 00   A  0   0  4
[ 8] gpio_driver_[...] PROGBITS        00015034 015134 000024 00   A  0   0  4
[ 9] i2c_driver_a[...] PROGBITS        00015058 015158 000018 00   A  0   0  4
[10] sensor_drive[...] PROGBITS        00015070 015170 00001c 00   A  0   0  4
[11] spi_driver_a[...] PROGBITS        0001508c 01518c 000008 00   A  0   0  4
[12] clock_contro[...] PROGBITS        00015094 015194 00001c 00   A  0   0  4
[13] display_driv[...] PROGBITS        000150b0 0151b0 00002c 00   A  0   0  4
[14] mipi_dbi_dri[...] PROGBITS        000150dc 0151dc 000018 00   A  0   0  4
[15] uart_driver_[...] PROGBITS        000150f4 0151f4 00000c 00   A  0   0  4
[16] init_array        INIT_ARRAY      00015100 015200 000014 04  WA  0   0  4
[17] log_const_area    PROGBITS        00015114 015214 0000a8 00   A  0   0  4
[18] log_backend_area  PROGBITS        000151bc 0152bc 000010 00   A  0   0  4
[19] tbss              NOBITS          000151cc 0152cc 000008 00 WAT  0   0  4
[20] rodata            PROGBITS        000151d0 0152d0 00e7b4 00   A  0   0 16
[21] .ramfunc          PROGBITS        20000000 023c54 000000 00   W  0   0  1
[22] datas             PROGBITS        20000000 023a84 00015c 00  WA  0   0  4
[23] device_states     PROGBITS        2000015c 023be0 000014 00  WA  0   0  1
[24] log_mpsc_pbu[...] PROGBITS        20000170 023bf4 000040 00  WA  0   0  4
[25] log_msg_ptr_area  PROGBITS        200001b0 023c34 000004 00  WA  0   0  4
[26] k_heap_area       PROGBITS        200001b4 023c38 000018 00  WA  0   0  4
[27] .comment          PROGBITS        00000000 023c54 000020 01  MS  0   0  1
[28] .debug_aranges    PROGBITS        00000000 023c78 003a20 00      0   0  8
[29] .debug_info       PROGBITS        00000000 027698 1008c6 00      0   0  1
[30] .debug_abbrev     PROGBITS        00000000 127f5e 01eed9 00      0   0  1
[31] .debug_line       PROGBITS        00000000 146e37 04836a 00      0   0  1
[32] .debug_frame      PROGBITS        00000000 18f1a4 00e2e0 00      0   0  4
[33] .debug_str        PROGBITS        00000000 19d484 05ab14 01  MS  0   0  1
[34] .debug_loc        PROGBITS        00000000 1f7f98 054c2d 00      0   0  1
[35] .debug_ranges     PROGBITS        00000000 24cbc8 005e28 00      0   0  8
[36] .ARM.attributes   ARM_ATTRIBUTES  00000000 2529f0 000038 00      0   0  1
[37] .last_section     PROGBITS        00023b50 023c50 000004 00  WA  0   0  4
[38] bss               NOBITS          200001d0 023c58 000e86 00  WA  0   0  8
[39] noinit            NOBITS          20001058 023c58 034140 00  WA  0   0  8
[40] .symtab           SYMTAB          00000000 252a28 013e50 10     41 3216  4
[41] .strtab           STRTAB          00000000 266878 0149a9 00      0   0  1
[42] .shstrtab         STRTAB          00000000 27b221 000233 00      0   0  1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), y (purecode), p (processor specific)

Use west to generate a ROM or a RAM report with west build -t rom_report > rom_report.txt or west build -t ram_report > ram_report.txt after a build.
Use the full memory map file generated by the linker (“build/zephyr/zephyr.map”).

The Boot Sequence and Memory Initialization

Understand the Early Boot Sequence

Upon reset, startup code is executed by the Cortex-M processor. The startup code is specific to each platform and toolchain, but it usually consists of

setting the initial SP,
setting the initial PC to the reset handler function,
setting the vector table entries with the exceptions ISR addresses, and
branching to initialization of the C library, which eventually switches to the main thread that executes the main() function of your program.

When we use the command arm-zephyr-eabi-objdump.exe -D -G on the elf file, we get an output similar to the following:

Elf disassembled

   Disassembly of section rom_start:
   00000000 <_vector_table>:
     0:   20034d98        mulcs   r3, r8, sp
     4:   00005b45        andeq   r5, r0, r5, asr #22

From this output and from the output of the other tools described above, we can observe that:

The code region begins with the vector table, whose first entry is the initial value of the main stack pointer. Note that objdump attempts to disassemble the stack pointer and program counter (PC) values as instructions, because it cannot distinguish between code and data. On startup, the ARM processor fetches the word at address 0x00000000 and uses it as the stack pointer value. In our case, this is the address 0x20034d98, which points to the end of the stack (recall that the stack grows downwards). As can be seen in the dump file, this address corresponds to the system workqueue stack of Zephyr RTOS.

Elf disassembled

   ...
   20034d98 <sys_work_q_stack>:

The next entry in the vector table specifies the jump location for starting the program upon reset. This corresponds to the start of the system’s early boot sequence. Looking at the code reveals that the SystemInit function is called first, followed by other functions such as z_early_memset and z_prep_c. All of these functions are defined in the Zephyr RTOS library.
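The two vector table words shown in the disassembly can be decoded as follows. This is a minimal sketch with hypothetical names (VectorTableHead, isThumb, handlerAddress). It relies on the fact that, on Cortex-M, bit 0 of an exception vector is the Thumb state bit and is not part of the handler address.

```cpp
#include <cstdint>

// First two words of the vector table.
struct VectorTableHead {
  std::uint32_t initialMsp;   // word at offset 0x0: initial main stack pointer
  std::uint32_t resetVector;  // word at offset 0x4: reset handler vector
};

// Bit 0 of a vector selects Thumb state; it must be set on Cortex-M.
constexpr bool isThumb(std::uint32_t vector) { return (vector & 1u) != 0; }
// The handler itself starts at the vector value with bit 0 cleared.
constexpr std::uint32_t handlerAddress(std::uint32_t vector) { return vector & ~1u; }

// Values copied from the disassembly of rom_start shown above.
constexpr VectorTableHead kHead{0x20034d98u, 0x00005b45u};
static_assert(isThumb(kHead.resetVector), "reset vector has the Thumb bit set");
static_assert(handlerAddress(kHead.resetVector) == 0x00005b44u,
              "reset handler code starts at 0x5b44");
```

When looking for the reset handler in the disassembly, search for address 0x5b44 rather than 0x5b45.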

Kernel Initialization

The bootup sequence is illustrated in the diagram below:

It is important to note the following points:

The early bootup sequence starts with the reset handler. This initialization phase makes the system ready for kernel initialization.
The initialization levels (PRE_KERNEL_1, PRE_KERNEL_2, and POST_KERNEL) allow you to specify the order in which initialization code is executed. What is executed in each phase depends on the platform itself and on the application configuration (as described in the prj.conf file).
The kernel initialization makes the kernel ready for the application. Static devices and board-level subsystems (e.g. clock, basic peripherals, console/UART/RTT) are initialized during this phase.
After kernel initialization, the scheduler and essential threads (e.g. the main and idle threads) are started; after that Zephyr prints its boot banner (if enabled).
After kernel and post-kernel setup, board- and application-specific modules, drivers and threads (as defined in “prj.conf” and DTS overlays) are initialized and started.
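The ordering rule described above (levels first, then priority within a level, with lower priority values running earlier) can be modelled in a few lines. The sketch below is a plain C++ model and not Zephyr's implementation: the InitEntry type, the initOrder function and the entry names are hypothetical, and the Application level is added as an assumption to represent application-level initialization.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Initialization levels, declared in the order in which they run.
enum class Level { PreKernel1, PreKernel2, PostKernel, Application };

struct InitEntry {
  std::string name;  // hypothetical name, for illustration only
  Level level;
  int priority;      // lower value runs earlier within a level
};

// Returns the entry names in execution order: by level, then by priority.
std::vector<std::string> initOrder(std::vector<InitEntry> entries) {
  std::stable_sort(entries.begin(), entries.end(),
                   [](const InitEntry& a, const InitEntry& b) {
                     if (a.level != b.level) return a.level < b.level;
                     return a.priority < b.priority;
                   });
  std::vector<std::string> names;
  for (const auto& entry : entries) names.push_back(entry.name);
  return names;
}
```

For instance, an entry registered at PreKernel1 with priority 30 runs before one registered at PreKernel1 with priority 60, and both run before any PostKernel entry, whatever the registration order.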

Static Memory Analysis Using the ROM and RAM Reports

As explained in the previous section, Zephyr RTOS provides a number of utility tools that help you to understand how the program image is structured and how memory space is used by any Zephyr RTOS application. Running the command west build -t rom_report > rom_report.txt will cause the file “rom_report.txt” to contain detailed information on how the program image is structured. Searching for “bike_system.cpp” or other application-specific files will produce an output similar to the following:

rom_report output

│   │       ├── main.cpp                            172   0.12%  - 
│   │       │   ├── log_const_bike_computer           8   0.01%  0x00015134 log_const_area
│   │       │   └── main                            164   0.11%  0x000022f5 text
│   │       ├── multi_tasking                      2468   1.69%  - 
│   │       │   ├── bike_system.cpp                1842   1.26%  -

In this report, text refers to application code, rodata refers to read-only constants and datas refers to data initialized with non-zero values. All of these are stored in Flash/ROM. The datas values stored in ROM serve to initialize the corresponding variables in RAM.

The RAM report displays how RAM is allocated:
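As a minimal illustration of these rules, the sketch below declares three objects and annotates where each one is typically placed. The variable names are hypothetical, and the placement comments assume that the objects are actually referenced by the program, since unreferenced objects may be discarded by the toolchain.

```cpp
#include <cstddef>
#include <cstdint>

const char kGreeting[] = "hello";  // read-only constant: rodata (Flash only)
std::uint32_t gCounter = 42;       // non-zero initial value: datas (Flash copy + RAM)
std::uint32_t gBuffer[16];         // zero-initialized: bss (RAM only)

// Bytes contributed to Flash and RAM under the rules above.
constexpr std::size_t kFlashBytes = sizeof(kGreeting) + sizeof(gCounter);
constexpr std::size_t kRamBytes   = sizeof(gCounter) + sizeof(gBuffer);
```

Here kFlashBytes is 10 bytes and kRamBytes is 68 bytes; note that gCounter counts twice, once for its initial value in Flash and once for the variable itself in RAM.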

ram_report output

│   ├── _ZGVZN13bike_computer9getFont16EvE7pFont16    4   0.00%  0x20000a20 bss
│   ├── _ZGVZN13bike_computer9getFont18EvE7pFont1     4   0.00%  0x20000a28 bss

Searching for “bike_computer” in this file shows that few global variables are allocated by the application itself in the BikeComputer program. Reducing the use of global variables is usually good practice.

Understand What Memory Goes Where

To better understand what memory goes where, add the following code to your “main.cpp” file:

main_modified.cpp

...
const char szMsg[] = "This is a test message";
static constexpr uint8_t size = 10;
uint32_t randomArray[size] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
uint32_t randomNumber = 0;
...

int main() {
  ...

  tr_info(szMsg);
  for (uint8_t i = 0; i < size; i++) {
    randomArray[i] = rand();
    tr_info("This is a random number %d", randomArray[i]);
  }
  randomNumber = rand();
  tr_info("This is a random number %d", randomNumber);

  ...
}

Exercise Memory Profiling and Optimization/1

Observe the change in the rom and ram reports for each individual change documented above. Look at how each text, data and bss section is modified for each change and give an explanation.

Reducing Memory Usage by Tuning the Zephyr RTOS Configuration

Both Flash memory and RAM are limited on most microcontrollers. Reducing the memory footprint of an application can help you squeeze in more features or reduce cost. This can be done by applying a number of application configuration options, as documented in the Zephyr RTOS documentation. One simple way of reducing the RAM/ROM footprint is to optimize a number of runtime parameters such as the stack size or the use of GPIOs. It is also possible to optimize the footprint by configuring the kernel so that only the required kernel features are used.

Using the C library variant with the smallest footprint is another optimization option.

Exercise Memory Profiling and Optimization/2

Compile your application by toggling the following configuration parameters and compare both ROM and RAM footprints:

CONFIG_SIZE_OPTIMIZATIONS vs CONFIG_DEBUG_OPTIMIZATIONS
Disable/enable all logging configuration parameters, including CONFIG_BOOT_BANNER.
Disable/enable FP support in printf with the CONFIG_CBPRINTF_FP_SUPPORT option.
Disable/enable nano implementation of the printf functions, with the CONFIG_CBPRINTF_NANO option.

Runtime Memory Tracing

Static memory analysis is a powerful and necessary tool for understanding how program memory is organized at compile time. However, it is also very useful to analyze how embedded software handles dynamic memory allocations, both on the heap and on the stack. A program that handles dynamic memory poorly will become unstable and may eventually crash.

With Zephyr RTOS, the developer can use memory statistics functions to capture heap use, cumulative stack use or stack use for each thread at runtime. To enable memory use monitoring, you must enable the following Zephyr RTOS configuration options:

bike_computer/prj.conf

CONFIG_THREAD_STACK_INFO=y
CONFIG_THREAD_ANALYZER=y
CONFIG_SYS_HEAP_RUNTIME_STATS=y

Once memory statistics are enabled, you may instrument the code and perform memory checks at regular intervals or on request. This can be implemented with the help of the zpp_lib::Utils class.

By performing a detailed dynamic memory analysis, it is then possible to optimize some parameters, such as reducing the allocated stack size for a given thread or optimizing the use of the heap.
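A common technique for measuring the maximum stack usage of a thread is to fill the stack with a known pattern at creation time and to later count how many bytes still hold that pattern. The sketch below illustrates the idea in plain C++. The pattern value 0xAA, the helper name unusedStackBytes and the buffer layout are assumptions made for this illustration, and the actual Zephyr RTOS implementation may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Pattern written to every stack byte before the thread starts (assumed value).
constexpr std::uint8_t kFillPattern = 0xAA;

// Counts the bytes at the low end of the stack buffer that still hold the
// fill pattern. The stack grows downwards, so these bytes were never used.
std::size_t unusedStackBytes(const std::uint8_t* stack, std::size_t size) {
  std::size_t unused = 0;
  while (unused < size && stack[unused] == kFillPattern) {
    ++unused;
  }
  return unused;
}
```

The maximum stack usage is then the stack size minus the returned value. Comparing this number with the configured stack size indicates by how much a thread stack could be reduced, keeping a safety margin.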

Configuring the use of the heap on Zephyr RTOS

Zephyr RTOS supports different heaps and, depending on the application configuration, calls to malloc and new may use different heaps. In the zpp_lib::Utils implementation, the new and delete operators are overridden when CONFIG_SYS_HEAP_RUNTIME_STATS is enabled. This ensures that k_malloc and k_free are used, which enables heap statistics. This mechanism should, of course, not be used when heap statistics are not needed.
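The general mechanism of replacing the global new and delete operators can be sketched as follows. This is not the zpp_lib implementation: std::malloc and std::free stand in for k_malloc and k_free so that the example also builds on a host, the heap_trace counters are hypothetical, and the counters are not protected against concurrent access.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

namespace heap_trace {
std::size_t allocCount = 0;  // number of successful allocations
std::size_t freeCount  = 0;  // number of releases of non-null pointers
}  // namespace heap_trace

// Replacement of the global allocation function (std::malloc stands in for k_malloc).
void* operator new(std::size_t size) {
  void* ptr = std::malloc(size == 0 ? 1 : size);
  if (ptr == nullptr) throw std::bad_alloc();
  ++heap_trace::allocCount;
  return ptr;
}

// Replacement of the global deallocation function (std::free stands in for k_free).
void operator delete(void* ptr) noexcept {
  if (ptr != nullptr) {
    ++heap_trace::freeCount;
    std::free(ptr);
  }
}

// Sized variant, forwarded to the unsized one.
void operator delete(void* ptr, std::size_t) noexcept { operator delete(ptr); }
```

Because the array forms of new and delete forward to these functions by default, every new and new[] expression in the program is routed through the replacement, which is what makes the statistics complete.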

Exercise Memory Profiling and Optimization/3

Instrument the dynamic memory usage of your BikeComputer program using the zpp_lib::Utils class. Call both the zpp_lib::Utils::logThreadsStackInfo and zpp_lib::Utils::logHeapSummary methods at regular intervals. After startup, you should observe that your program no longer allocates memory on the heap and that stack usage no longer grows.

By observing the statistics on the console, you should be able to optimize your application configuration and to reduce its RAM usage.

Hunting For Memory Bugs

Detecting a Heap Allocation Error (Memory Leak)

A practical way to illustrate heap memory analysis is to introduce a memory leak in the code. A memory leak occurs when memory that is NO longer needed is NOT released. To create one, you may add code that allocates memory without releasing it to a method that is called at regular intervals. Be aware that allocating memory without using it is not enough, since the compiler will optimize your code and remove unused statements (like allocating an array and only assigning values to the array elements).

If you create a memory leak by creating an instance of the class MemoryLeak as demonstrated in the main() function and let your program run, you should observe that the allocated memory on the heap grows constantly until the program crashes, as illustrated in the log below. Note that CONFIG_ASSERT must be enabled when building the application.

Main program for the memory demo

memory/src/main.cpp

// zephyr
#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>

// zpp_lib
#include "zpp_include/this_thread.hpp"
#include "zpp_include/utils.hpp"
#include "zpp_include/interrupt_in.hpp"

// local
#include "memory_fragmenter.hpp"
#include "memory_leak.hpp"

LOG_MODULE_REGISTER(memory_demo, CONFIG_APP_LOG_LEVEL);

int main(void) {
  using namespace std::literals;

  LOG_DBG("Memory demo program started");

  // check which button is pressed
  zpp_lib::InterruptIn<zpp_lib::PinName::BUTTON1> button1;
  zpp_lib::InterruptIn<zpp_lib::PinName::BUTTON2> button2;
  zpp_lib::InterruptIn<zpp_lib::PinName::BUTTON3> button3;
  if (button1.read() == zpp_lib::kPolarityPressed) {
    LOG_DBG("Starting MemoryLeak demo");

    static constexpr uint8_t kNbrOfIterations = 10;
    for (int i = 0; i < kNbrOfIterations; i++) {
      memory_demo::MemoryLeak memoryLeak;
      memoryLeak.use();

      zpp_lib::Utils::logHeapSummary();

      zpp_lib::ThisThread::sleep_for(1s);
    }    
  } else if (button2.read() == zpp_lib::kPolarityPressed) {
    memory_demo::MemoryFragmenter memoryFragmenter;
    memoryFragmenter.fragmentMemory();
  }

  return 0;
}

MemoryLeak class

memory/src/memory_leak.hpp

// Copyright 2025 Haute école d'ingénierie et d'architecture de Fribourg
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

/****************************************************************************
 * @file memory_leak.hpp
 * @author Serge Ayer <serge.ayer@hefr.ch>
 *
 * @brief Declaration/Implementation of the MemoryLeak class
 *
 * @date 2025-07-01
 * @version 1.0.0
 ***************************************************************************/

#pragma once

// zephyr
#include <zephyr/kernel.h>

namespace memory_demo {

class MemoryLeak {
 public:
  static constexpr uint16_t kArraySize = 1024;

  // create a memory leak in the constructor itself
  MemoryLeak() {
    _ptr = new uint8_t[kArraySize];
    __ASSERT(_ptr != nullptr, "Cannot allocate memory");
  }
  ~MemoryLeak() {}

  MemoryLeak(const MemoryLeak&)            = delete;
  MemoryLeak& operator=(const MemoryLeak&) = delete;

  void use() {
    for (uint16_t i = 0; i < kArraySize; i++) {
      _ptr[i] = i;
    }
  }

 private:
  uint8_t* _ptr;
};

}  // namespace memory_demo

Console

ASSERTION FAIL [_ptr != nullptr] @ WEST_TOPDIR/memory/src/memory_leak.hpp:16
        Cannot allocate memory
[00:00:15.667,297] <err> os: r0/a1:  0x00000004  r1/a2:  0x00000010  r2/a3:  0x00000000
[00:00:15.675,994] <err> os: r3/a4:  0x00000004 r12/ip:  0x00000004 r14/lr:  0x00000465
[00:00:15.684,661] <err> os:  xpsr:  0x09000000
[00:00:15.689,880] <err> os: Faulting instruction address (r15/pc): 0x0000aa92
[00:00:15.697,784] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:00:15.705,505] <err> os: Current thread: 0x20000240 (main)
[00:00:15.712,005] <err> os: Halting system

From the error log above, we can observe that the allocation performed by operator new in the constructor of the MemoryLeak class fails: the system can no longer allocate the requested memory.
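For comparison, a leak-free variant releases the buffer when the object is destroyed. The sketch below is one possible fix under the assumption that the C++ standard library (std::unique_ptr) is available in your configuration; otherwise a delete[] in the destructor achieves the same result. The class name NoLeak and the checksum() method are hypothetical and only serve the illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

namespace memory_demo {

// Same behaviour as MemoryLeak, but the buffer is owned by a std::unique_ptr
// and is therefore released when the object goes out of scope.
class NoLeak {
 public:
  static constexpr std::uint16_t kArraySize = 1024;

  NoLeak() : _ptr(new std::uint8_t[kArraySize]) {}

  void use() {
    for (std::uint16_t i = 0; i < kArraySize; i++) {
      _ptr[i] = static_cast<std::uint8_t>(i);
    }
  }

  // Sum of all elements, so that the writes have an observable effect.
  std::uint32_t checksum() const {
    std::uint32_t sum = 0;
    for (std::uint16_t i = 0; i < kArraySize; i++) {
      sum += _ptr[i];
    }
    return sum;
  }

 private:
  std::unique_ptr<std::uint8_t[]> _ptr;
};

}  // namespace memory_demo
```

If this class replaces MemoryLeak in the loop of the main() function, the allocated heap size reported at each iteration should stay constant instead of growing.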

Heap Fragmentation

A problem that is even more complex to detect is the problem of heap fragmentation. Heap fragmentation is a phenomenon that creates small fragments of memory in the heap space in a way that makes the largest available block of memory smaller and smaller as compared to the total available memory. The fragmentation level can be computed as a ratio between the largest available block of memory and the total available memory:

\(fragmentation = 1 - \frac{largest\ available\ block}{total\ available\ memory}\)

If the fragmentation is \(50\%\) and the available memory is 1 KiB, then the largest available block is 512 bytes. Fragmentation tends to increase over the lifetime of a program, and on embedded systems running C++ programs, there is no way of defragmenting the heap. Over time, heap fragmentation tends to

make programs unreliable: if your program needs a block bigger than the largest available one, the allocation fails and the program stops working, and
degrade program performance: a highly fragmented heap is slower because the memory allocator takes more time to find a block to allocate.

These are very good reasons for using heap memory with care on embedded systems.
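The fragmentation formula given above translates directly into code. In this minimal sketch, the function name and the convention of returning 0 when no memory is available are assumptions made for the illustration.

```cpp
#include <cstddef>

// fragmentation = 1 - largest available block / total available memory
constexpr double fragmentation(std::size_t largestFreeBlock, std::size_t totalFree) {
  return totalFree == 0
             ? 0.0  // convention chosen here: an empty free list is not fragmented
             : 1.0 - static_cast<double>(largestFreeBlock) /
                         static_cast<double>(totalFree);
}

static_assert(fragmentation(512, 1024) == 0.5, "example from the text: 50% of 1 KiB");
```

A value of 0 means that all the available memory forms one contiguous block, while a value close to 1 means that the available memory is scattered in many small pieces.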

To illustrate the heap fragmentation phenomenon, you may use the following MemoryFragmenter class in your BikeComputer program:

MemoryFragmenter class

memory/src/memory_fragmenter.hpp

// Copyright 2025 Haute école d'ingénierie et d'architecture de Fribourg
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

/****************************************************************************
 * @file memory_fragmenter.hpp
 * @author Serge Ayer <serge.ayer@hefr.ch>
 *
 * @brief Declaration/Implementation of the MemoryFragmenter class
 *
 * @date 2025-07-01
 * @version 1.0.0
 ***************************************************************************/

#pragma once

// zephyr
#include <zephyr/kernel.h>

// zpp_lib
#include "zpp_include/utils.hpp"

extern "C" {
// Zephyr defines this symbol globally
// To access it you need to define CONFIG_HEAP_MEM_POOL_SIZE=...
extern struct sys_heap _system_heap;
}

namespace memory_demo {

class MemoryFragmenter {
 public:
  // nothing is allocated in the constructor; see fragmentMemory()
  MemoryFragmenter() {}

  void fragmentMemory() {
    // log heap info
    zpp_lib::Utils::logHeapSummary();

    // get heap available size
    struct sys_memory_stats stats;
    sys_heap_runtime_stats_get(&_system_heap, &stats);

    // divide the available size by 8 blocks that we allocate
    uint32_t blockSize = (stats.free_bytes - kMarginSpace) / kNbrOfBlocks;
    printk("Allocating blocks of size %" PRIu32 "\n", blockSize);
    char* pBlockArray[kNbrOfBlocks] = {NULL};
    for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex++) {
      pBlockArray[blockIndex] = new char[blockSize];
      __ASSERT(pBlockArray[blockIndex] != nullptr,
               "Allocation of block %d of size %d failed",
               blockIndex,
               blockSize);
      printk("Allocated block index  %" PRIu32 " of size  %" PRIu32
             " at address 0x%08" PRIx32 "\n",
             blockIndex,
             blockSize,
             // print the address of the block, not the value of its first byte
             static_cast<uint32_t>(reinterpret_cast<uintptr_t>(pBlockArray[blockIndex])));
      // copy to member variable to prevent them from being optimized away
      for (uint32_t index = 0; index < kArraySize; index++) {
        _doubleArray[index] += static_cast<double>(pBlockArray[blockIndex][index]);
      }
    }

    // the full heap (or almost) should be allocated
    printk("Heap statistics after full allocation:\n");
    zpp_lib::Utils::logHeapSummary();

    // delete only the even blocks
    for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex += 2) {
      delete[] pBlockArray[blockIndex];
      pBlockArray[blockIndex] = NULL;
    }
    // we should have half of the heap space free
    printk("Heap statistics after half deallocation:\n");
    zpp_lib::Utils::logHeapSummary();

    // trying to allocate one block of initial size
    // it will succeed
    printk("Allocating 1 block of size %" PRIu32 " succeeds !\n", blockSize);
    pBlockArray[0] = new char[blockSize];
    __ASSERT(
        pBlockArray[0] != nullptr, "Allocation of block of size %d failed", blockSize);

    printk("Heap statistics after allocating one more block of size %d:\n", blockSize);
    zpp_lib::Utils::logHeapSummary();
    // trying to allocate one block that is slightly bigger
    // without fragmentation, this allocation should succeed
    // but it will fail...
    blockSize += 8;
    // this allocation will fail
    printk("Allocating 1 block of size %" PRIu32 " should succeed !\n", blockSize);
    pBlockArray[1] = new char[blockSize];
    __ASSERT(
        pBlockArray[1] != nullptr, "Allocation of block of size %d failed", blockSize);

    // copy to member variable to prevent them from being optimized away
    for (uint32_t index = 0; index < kArraySize; index++) {
      _doubleArray[index] +=
          static_cast<double>(pBlockArray[0][index] + pBlockArray[1][index]);
    }
  }

 private:
  static constexpr uint8_t kNbrOfBlocks  = 8;
  static constexpr uint16_t kMarginSpace = 1024;
  static constexpr uint8_t kArraySize    = 100;
  double _doubleArray[kArraySize]        = {0};
};

}  // namespace memory_demo

This class is used in the main() function given above. If you run the program by pressing Button 2, you will see an error on the console similar to the one shown below:

Console

Allocating 1 block of size 1910 succeeds !
Heap statistics after allocating one more block of size 1910:
[00:00:00.417,236] <inf> zpp_rtos: === Heap Summary ===
[00:00:00.422,882] <inf> zpp_rtos:      Allocated: 9600 bytes
[00:00:00.428,710] <inf> zpp_rtos:      Free:      6704 bytes
[00:00:00.434,509] <inf> zpp_rtos:      Max Alloc: 15360 bytes

Allocating 1 block of size 1918 should succeed !
ASSERTION FAIL [pBlockArray[1] != nullptr] @ WEST_TOPDIR/memory/src/memory_fragmenter.hpp:99
        Allocation of block of size 1918 failed
[00:00:00.456,665] <err> os: r0/a1:  0x00000004  r1/a2:  0x00000063  r2/a3:  0x00000000
[00:00:00.465,332] <err> os: r3/a4:  0x00000004 r12/ip:  0x00000000 r14/lr:  0x0000094d
[00:00:00.473,999] <err> os:  xpsr:  0x09000000
[00:00:00.479,217] <err> os: Faulting instruction address (r15/pc): 0x0000b002
[00:00:00.487,121] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:00:00.494,842] <err> os: Current thread: 0x20000240 (main)
[00:00:00.501,342] <err> os: Halting system

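As you can see in the log, although 6704 bytes of heap are free, an allocation of 1918 bytes fails: the free memory is split into non-contiguous fragments, none of which is large enough to hold the requested block.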

To minimize the type of problems illustrated above, it is often recommended to apply the following guidelines on embedded systems:

Prefer static allocation over dynamic allocation whenever possible.
Prefer automatic allocation (on the stack) when feasible: allocation on the stack is almost free, but care must be taken to avoid stack overflow errors.
Use memory pools to provide fixed-size buffers to an application (see Zephyr RTOS Memory Slabs, for instance). This prevents fragmentation.
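The last guideline can be illustrated with a minimal fixed-size block pool. The sketch below is a plain C++ illustration of the principle and not the Zephyr RTOS Memory Slabs API: the class name FixedBlockPool and its methods are hypothetical, and the pool is not protected against concurrent access.

```cpp
#include <cstddef>

// Fixed-size block pool: all blocks have the same size, so a released block
// always leaves a hole that fits the next request exactly.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedBlockPool {
 public:
  FixedBlockPool() {
    // Chain all blocks into the free list.
    for (std::size_t i = 0; i < BlockCount; i++) {
      _blocks[i].next = _freeList;
      _freeList       = &_blocks[i];
    }
  }

  // Returns a block of BlockSize bytes, or nullptr when the pool is exhausted.
  void* allocate() {
    if (_freeList == nullptr) return nullptr;
    Block* block = _freeList;
    _freeList    = block->next;
    return block->storage;
  }

  // Returns a block previously obtained from allocate() to the pool.
  void release(void* ptr) {
    Block* block = static_cast<Block*>(ptr);
    block->next  = _freeList;
    _freeList    = block;
  }

 private:
  union Block {
    Block* next;  // valid while the block is free
    alignas(std::max_align_t) unsigned char storage[BlockSize];  // valid while in use
  };
  Block _blocks[BlockCount];
  Block* _freeList = nullptr;
};
```

Since every request is served with a block of the same size, the pool can be exhausted but it cannot become fragmented: as long as one block is free, the next allocation succeeds.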

Detecting a Stack Overflow Error

By using the memory tracing functionalities demonstrated above, we may know which threads are running and the memory space that they are using. This is very useful information for optimizing memory usage for each thread. This is also useful for debugging stack overflow errors.

Stack overflows can occur in a variety of situations. To understand how to detect such errors, it is easiest to simulate one. To this end, you can write code that allocates more and more memory on the stack in a thread that runs a loop. An example of such code is given below:

StackOverflow class

memory/src/stack_overflow.hpp

// Copyright 2025 Haute école d'ingénierie et d'architecture de Fribourg
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

/****************************************************************************
 * @file stack_overflow.hpp
 * @author Serge Ayer <serge.ayer@hefr.ch>
 *
 * @brief Declaration/Implementation of the StackOverflow class
 *
 * @date 2025-07-01
 * @version 1.0.0
 ***************************************************************************/

#pragma once

// zephyr
#include <zephyr/kernel.h>

// std
#include <cstdint>

namespace memory_demo {

class StackOverflow {
 public:
  void allocateOnStack() {
    // allocate an array with growing size until it does not fit on the stack anymore
    size_t allocSize = kArraySize * _multiplier;
    // Create a variable-size object on the stack
    double anotherArray[allocSize];  // NOLINT(runtime/arrays)
    for (size_t i = 0; i < allocSize; i++) {
      anotherArray[i] = i;
    }
    // copy to member variable to prevent them from being optimized away
    for (size_t i = 0; i < kArraySize; i++) {
      _doubleArray[i] += anotherArray[i];
    }
    _multiplier++;
  }

 private:
  static constexpr size_t kArraySize = 40;
  double _doubleArray[kArraySize]    = {0};
  size_t _multiplier                 = 1;
};

}  // namespace memory_demo

In the main() function, allocateOnStack() is called in a loop and memory usage information is displayed at each iteration. On the console, you can see that the maximum number of bytes used on the stack of the main thread increases continuously. Once the stack overflows, you will see a stack overflow error.

Exercise Memory Profiling and Optimization/4
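The growth of the stack usage can be estimated with simple arithmetic: each call allocates kArraySize * _multiplier elements of type double. The sketch below computes the first call whose array alone no longer fits into a stack of a given size. The helper names are hypothetical and the estimate assumes that sizeof(double) is 8 bytes. It ignores the stack space used by the other frames of the thread, so the real overflow happens earlier.

```cpp
#include <cstddef>

constexpr std::size_t kArraySize   = 40;              // as in the StackOverflow class
constexpr std::size_t kElementSize = sizeof(double);  // assumed to be 8 bytes

// Bytes requested for anotherArray during the given call (multiplier starts at 1).
constexpr std::size_t arrayBytes(std::size_t multiplier) {
  return kArraySize * multiplier * kElementSize;
}

// First call whose array alone no longer fits into a stack of stackBytes.
constexpr std::size_t firstOverflowingCall(std::size_t stackBytes) {
  return stackBytes / arrayBytes(1) + 1;
}
```

For a hypothetical stack size of 4096 bytes, the array grows by 320 bytes per call and firstOverflowingCall(4096) returns 13, since 12 * 320 = 3840 bytes still fit while 13 * 320 = 4160 bytes do not.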

With the standard configuration using a Cortex-M CPU with MPU, hardware-based protection against stack overflows is enabled (when the system is running in privileged mode). Try to disable hardware protection by adding CONFIG_HW_STACK_PROTECTION=n in the prj.conf file and observe what happens when running the same program.

Exercise Memory Profiling and Optimization/5

Find and implement another very common way of creating a stack overflow in any program.