Memory Profiling and Optimization
Introduction
Cortex-M Program Image and Memory
In this codelab, we will first explore the concept of a Cortex-M program image. We will then examine different memory profiling techniques for Zephyr RTOS programs, illustrating some classic issues associated with static or dynamic memory usage.
What you’ll build
- You will modify your BikeComputer program to improve your understanding of its program image and memory map.
- You will instrument your BikeComputer program to perform dynamic memory analysis.
- You will modify the BikeComputer program to create memory issues on purpose.
What you’ll learn
- You will understand how a Cortex-M program image is made and how it is used for starting your program on a Cortex-M device.
- You will understand the boot sequence of a Zephyr RTOS program.
- You will understand how a Zephyr RTOS program memory is organized in RAM and how to trace dynamic memory allocations.
What you’ll need
- Zephyr Development Environment for developing and debugging your program in C++.
- All BikeComputer and multi-tasking codelabs are prerequisites for this codelab.
The Program Image
A Cortex-M program image or executable file (e.g. the .elf file on your computer) refers to a piece of code that is ready to execute. The image can occupy up to 512 MiB of memory space, ranging from address 0x00000000 to address 0x1FFFFFFF, as shown in Figure 1 for the nRF5340 MCU.
The program image is usually stored in non-volatile memory such as on-chip Flash memory. It is normally separate from the program data, which is allocated in the SRAM region of the memory map.
To build the program image, the toolchain uses the dtsi file that defines the different memory regions/partitions. These partitions are then used and defined in the dts file. These files are self-documenting, so you can easily recognise the ROM and RAM region definitions.
Based on this information, the toolchain produces a program image (CODE region) that corresponds to the map depicted in the figure above.
Analysing the ELF file produced by the linker can help to better understand this program image. Among other tools, the standard GNU binutils and other specific Zephyr RTOS tools are useful:
- arm-zephyr-eabi-size prints information about the text, data and bss sections, such as
text data bss dec hex filename
0x2396c 0x1e4 0x34fce 363294 58b1e build\zephyr\zephyr.elf
or more detailed views such as
arm-zephyr-eabi-size output
build\zephyr\zephyr.elf :
section size addr
rom_start 0x154 0x0
text 0x1479c 0x154
.ARM.extab 0x134 0x148f0
.ARM.exidx 0x250 0x14a24
initlevel 0x78 0x14c74
device_area 0x120 0x14cec
sw_isr_table 0x228 0x14e0c
gpio_driver_api_area 0x24 0x15034
i2c_driver_api_area 0x18 0x15058
sensor_driver_api_area 0x1c 0x15070
spi_driver_api_area 0x8 0x1508c
clock_control_driver_api_area 0x1c 0x15094
display_driver_api_area 0x2c 0x150b0
mipi_dbi_driver_api_area 0x18 0x150dc
uart_driver_api_area 0xc 0x150f4
init_array 0x14 0x15100
log_const_area 0xa8 0x15114
log_backend_area 0x10 0x151bc
tbss 0x8 0x151cc
rodata 0xe7b4 0x151d0
.ramfunc 0x0 0x20000000
datas 0x15c 0x20000000
device_states 0x14 0x2000015c
log_mpsc_pbuf_area 0x40 0x20000170
log_msg_ptr_area 0x4 0x200001b0
k_heap_area 0x18 0x200001b4
.comment 0x20 0x0
.debug_aranges 0x3a20 0x0
.debug_info 0x1008c6 0x0
.debug_abbrev 0x1eed9 0x0
.debug_line 0x4836a 0x0
.debug_frame 0xe2e0 0x0
.debug_str 0x5ab14 0x0
.debug_loc 0x54c2d 0x0
.debug_ranges 0x5e28 0x0
.ARM.attributes 0x38 0x0
.last_section 0x4 0x23b50
bss 0xe86 0x200001d0
noinit 0x34140 0x20001058
Total 0x2878e8
- arm-zephyr-eabi-readelf provides a more detailed view of the different memory sections:
arm-zephyr-eabi-readelf output
There are 43 section headers, starting at offset 0x27b454:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] rom_start PROGBITS 00000000 000100 000154 00 AX 0 0 4
[ 2] text PROGBITS 00000154 000254 01479c 00 AX 0 0 4
[ 3] .ARM.extab PROGBITS 000148f0 0149f0 000134 00 A 0 0 4
[ 4] .ARM.exidx ARM_EXIDX 00014a24 014b24 000250 00 AL 2 0 4
[ 5] initlevel PROGBITS 00014c74 014d74 000078 00 A 0 0 4
[ 6] device_area PROGBITS 00014cec 014dec 000120 00 A 0 0 4
[ 7] sw_isr_table PROGBITS 00014e0c 014f0c 000228 00 A 0 0 4
[ 8] gpio_driver_[...] PROGBITS 00015034 015134 000024 00 A 0 0 4
[ 9] i2c_driver_a[...] PROGBITS 00015058 015158 000018 00 A 0 0 4
[10] sensor_drive[...] PROGBITS 00015070 015170 00001c 00 A 0 0 4
[11] spi_driver_a[...] PROGBITS 0001508c 01518c 000008 00 A 0 0 4
[12] clock_contro[...] PROGBITS 00015094 015194 00001c 00 A 0 0 4
[13] display_driv[...] PROGBITS 000150b0 0151b0 00002c 00 A 0 0 4
[14] mipi_dbi_dri[...] PROGBITS 000150dc 0151dc 000018 00 A 0 0 4
[15] uart_driver_[...] PROGBITS 000150f4 0151f4 00000c 00 A 0 0 4
[16] init_array INIT_ARRAY 00015100 015200 000014 04 WA 0 0 4
[17] log_const_area PROGBITS 00015114 015214 0000a8 00 A 0 0 4
[18] log_backend_area PROGBITS 000151bc 0152bc 000010 00 A 0 0 4
[19] tbss NOBITS 000151cc 0152cc 000008 00 WAT 0 0 4
[20] rodata PROGBITS 000151d0 0152d0 00e7b4 00 A 0 0 16
[21] .ramfunc PROGBITS 20000000 023c54 000000 00 W 0 0 1
[22] datas PROGBITS 20000000 023a84 00015c 00 WA 0 0 4
[23] device_states PROGBITS 2000015c 023be0 000014 00 WA 0 0 1
[24] log_mpsc_pbu[...] PROGBITS 20000170 023bf4 000040 00 WA 0 0 4
[25] log_msg_ptr_area PROGBITS 200001b0 023c34 000004 00 WA 0 0 4
[26] k_heap_area PROGBITS 200001b4 023c38 000018 00 WA 0 0 4
[27] .comment PROGBITS 00000000 023c54 000020 01 MS 0 0 1
[28] .debug_aranges PROGBITS 00000000 023c78 003a20 00 0 0 8
[29] .debug_info PROGBITS 00000000 027698 1008c6 00 0 0 1
[30] .debug_abbrev PROGBITS 00000000 127f5e 01eed9 00 0 0 1
[31] .debug_line PROGBITS 00000000 146e37 04836a 00 0 0 1
[32] .debug_frame PROGBITS 00000000 18f1a4 00e2e0 00 0 0 4
[33] .debug_str PROGBITS 00000000 19d484 05ab14 01 MS 0 0 1
[34] .debug_loc PROGBITS 00000000 1f7f98 054c2d 00 0 0 1
[35] .debug_ranges PROGBITS 00000000 24cbc8 005e28 00 0 0 8
[36] .ARM.attributes ARM_ATTRIBUTES 00000000 2529f0 000038 00 0 0 1
[37] .last_section PROGBITS 00023b50 023c50 000004 00 WA 0 0 4
[38] bss NOBITS 200001d0 023c58 000e86 00 WA 0 0 8
[39] noinit NOBITS 20001058 023c58 034140 00 WA 0 0 8
[40] .symtab SYMTAB 00000000 252a28 013e50 10 41 3216 4
[41] .strtab STRTAB 00000000 266878 0149a9 00 0 0 1
[42] .shstrtab STRTAB 00000000 27b221 000233 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), y (purecode), p (processor specific)
- Use west to generate a ROM and a RAM report using west build -t rom_report > rom_report.txt or west build -t ram_report > ram_report after a build.
- Use the full memory map file generated by the linker (“build/zephyr/zephyr.map”).
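As a quick sanity check of the arm-zephyr-eabi-size summary shown earlier, the following sketch recomputes the dec and hex columns from the text, data and bss values. It also derives approximate Flash and RAM footprints, using the usual convention that data counts in both because its initial values are stored in Flash and copied to RAM. The struct and function names are illustrative and do not belong to any tool.

```cpp
#include <cstdint>

// Values copied from the arm-zephyr-eabi-size summary of zephyr.elf.
struct SizeSummary {
  std::uint32_t text;  // code and read-only data
  std::uint32_t data;  // initialized writable data
  std::uint32_t bss;   // zero-initialized and noinit data
};

constexpr SizeSummary kBikeComputer{0x2396c, 0x1e4, 0x34fce};

// Sum reported in the dec and hex columns.
constexpr std::uint32_t total(const SizeSummary& s) {
  return s.text + s.data + s.bss;
}

// Approximate Flash usage: code, constants and initial values of data.
constexpr std::uint32_t flashFootprint(const SizeSummary& s) {
  return s.text + s.data;
}

// Approximate RAM usage: initialized data plus zero-initialized data.
constexpr std::uint32_t ramFootprint(const SizeSummary& s) {
  return s.data + s.bss;
}

static_assert(total(kBikeComputer) == 363294, "matches the dec column");
static_assert(total(kBikeComputer) == 0x58b1e, "matches the hex column");
```

With these values, the approximate Flash footprint is 146256 bytes and the approximate RAM footprint is 217522 bytes. Most of the RAM figure comes from the noinit section visible in the detailed output.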
The Boot Sequence and Memory Initialization
Understand the Early Boot Sequence
Upon reset, the Cortex-M processor executes startup code. This code is specific to each platform and toolchain, but it usually consists of
- setting the initial SP,
- setting the initial PC to the reset handler function,
- setting the vector table entries with the exceptions ISR addresses, and
- branching to initialization of the C library, which eventually switches to the main thread that executes the main() function of your program.
When we use the command arm-zephyr-eabi-objdump.exe -D -G on the elf file, we
get an output similar to the following:
Disassembly of section rom_start:
00000000 <_vector_table>:
0: 20034d98 mulcs r3, r8, sp
4: 00005b45 andeq r5, r0, r5, asr #22
From this output and from the output of the other tools described above, we can observe that:
- The code region begins with the vector table, with the first entry corresponding to the main stack pointer. Note that objdump has attempted to disassemble the stack pointer and the program counter (PC) values as instructions, as it cannot distinguish between code and data. When starting up, the ARM processor fetches whatever is at the address 0x0000_0000 and assumes that it is the stack pointer value. In our case, this corresponds to the address 0x20034d98, which is defined as the pointer to the stack end (recall that the stack grows downwards). In the dump file, this address carries the label sys_work_q_stack: the initial stack ends exactly where the next object in RAM, the system workqueue stack of Zephyr RTOS, begins.
...
20034d98 <sys_work_q_stack>:
- The next entry in the vector table specifies the jump location for starting the program upon reset. This corresponds to the start of the system’s early boot sequence. Looking at the code reveals that the SystemInit function is called first, followed by other functions such as z_early_memset and z_prep_c. All of these functions are defined in the Zephyr RTOS library.
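The two vector table words shown in the objdump output can be decoded by hand. The sketch below is a host-side illustration with hypothetical helper names. It checks that the first word points into the SRAM region (0x20000000 to 0x3FFFFFFF on Cortex-M) and that the second word, once its least significant bit is cleared, points into the CODE region. On Cortex-M, bit 0 of an exception vector is set to indicate Thumb state and is not part of the instruction address.

```cpp
#include <cassert>
#include <cstdint>

// First two words of the vector table, copied from the objdump output.
constexpr std::uint32_t kInitialSpWord = 0x20034d98;
constexpr std::uint32_t kResetVectorWord = 0x00005b45;

// CODE region of the Cortex-M memory map: 0x00000000 to 0x1FFFFFFF.
bool isInCodeRegion(std::uint32_t address) {
  return address <= 0x1FFFFFFFu;
}

// SRAM region of the Cortex-M memory map: 0x20000000 to 0x3FFFFFFF.
bool isInSramRegion(std::uint32_t address) {
  return address >= 0x20000000u && address <= 0x3FFFFFFFu;
}

// Address of the first instruction of the reset handler.
// Bit 0 of the vector is the Thumb bit and must be set on Cortex-M.
std::uint32_t resetHandlerAddress(std::uint32_t resetVectorWord) {
  assert((resetVectorWord & 1u) == 1u);
  return resetVectorWord & ~1u;
}
```

Applied to the values above, the reset handler starts at 0x00005b44, which is the address to search for in the disassembly of the text section.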
Kernel Initialization
The bootup sequence is illustrated in the diagram below:
It is important to note the following points:
- The early bootup sequence starts with the reset handler. This initialization phase makes the system ready for kernel initialization.
- The initialization levels (PRE_KERNEL_1, PRE_KERNEL_2, and POST_KERNEL) make it possible to specify the order in which initialization code is executed. What is executed in each phase depends on the platform itself and on the application configuration (as described in the prj.conf file).
- The kernel initialization makes the kernel ready for the application. Static devices and board-level subsystems (e.g. clock, basic peripherals, console/UART/RTT) are initialized during this phase.
- After kernel initialization, the scheduler and essential threads (e.g. the main and idle threads) are started; after that Zephyr prints its boot banner (if enabled).
- After kernel+post-kernel setup, board- and application-specific modules, drivers and threads (as defined in “prj.conf” and DTS overlays) are initialized and started.
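The ordering rule behind the initialization levels can be modeled on a host machine. The sketch below is only a model: Zephyr itself establishes the order at link time, not by sorting at runtime, and all names except the level names are hypothetical. It assumes that levels run in the order PRE_KERNEL_1, PRE_KERNEL_2, POST_KERNEL, APPLICATION, and that within a level a lower priority value (0 to 99) runs first.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Level names mirror Zephyr's initialization levels.
enum class InitLevel { PreKernel1, PreKernel2, PostKernel, Application };

struct InitEntry {
  std::string name;  // hypothetical name of the init function
  InitLevel level;
  int priority;      // 0..99, lower values run first within a level
};

// Returns the names in the order in which they would be initialized.
std::vector<std::string> initOrder(std::vector<InitEntry> entries) {
  std::stable_sort(entries.begin(), entries.end(),
                   [](const InitEntry& a, const InitEntry& b) {
                     if (a.level != b.level) {
                       return a.level < b.level;
                     }
                     return a.priority < b.priority;
                   });
  std::vector<std::string> order;
  for (const auto& entry : entries) {
    assert(entry.priority >= 0 && entry.priority <= 99);
    order.push_back(entry.name);
  }
  return order;
}
```

For example, a clock driver registered at PRE_KERNEL_1 with priority 30 comes before a console at PRE_KERNEL_1 with priority 60, and both come before anything registered at POST_KERNEL.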
Static Memory Analysis Using the ROM and RAM Reports
As explained in the previous section, Zephyr RTOS provides a number of utility
tools that help you to understand how the program image is structured and how
memory space is used by any Zephyr RTOS application. Running the command west
build -t rom_report > rom_report.txt will cause the file “rom_report.txt” to
contain detailed information on how the program image is structured. Searching
for “bike_system.cpp” or other application-specific files will produce an output
similar to the following:
rom_report output
│ │ ├── main.cpp 172 0.12% -
│ │ │ ├── log_const_bike_computer 8 0.01% 0x00015134 log_const_area
│ │ │ └── main 164 0.11% 0x000022f5 text
│ │ ├── multi_tasking 2468 1.69% -
│ │ │ ├── bike_system.cpp 1842 1.26% -
In this report, text refers to application code, rodata refers to read-only
constants and datas refers to data with non-zero initial values. All of these are stored in
Flash/ROM. The datas entries are values stored in ROM that serve to initialize
variables stored in RAM.
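The role of the initial values stored in ROM can be illustrated with a small host-side model of what startup code does before main() runs. The function names below are hypothetical and the "ROM" and "RAM" areas are ordinary arrays supplied by the caller. The point is only to show the two operations: copying initial values to their run-time location and clearing zero-initialized variables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy the initial values of initialized variables from their load
// address (ROM) to their run address (RAM).
void copyInitializedData(const std::uint8_t* loadAddr,
                         std::uint8_t* runAddr,
                         std::size_t size) {
  assert(loadAddr != nullptr && runAddr != nullptr);
  std::memcpy(runAddr, loadAddr, size);
}

// Clear the variables that have no initial value or a zero initial value.
void zeroBss(std::uint8_t* bssStart, std::size_t size) {
  assert(bssStart != nullptr);
  std::memset(bssStart, 0, size);
}
```

This is why an initialized global variable costs both ROM (for its initial value) and RAM (for the variable itself), whereas a zero-initialized one costs RAM only.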
The RAM report displays how RAM is allocated:
ram_report output
│ ├── _ZGVZN13bike_computer9getFont16EvE7pFont16 4 0.00% 0x20000a20 bss
│ ├── _ZGVZN13bike_computer9getFont18EvE7pFont1 4 0.00% 0x20000a28 bss
Searching for “bike_computer” in this file shows that few global variables are
allocated by the application itself in the BikeComputer program. Reducing the use of
global variables is usually good practice.
Understand What Memory Goes Where
To better understand what memory goes where, add the following code in your “main.cpp” file
...
const char szMsg[] = "This is a test message";
static constexpr uint8_t size = 10;
uint32_t randomArray[size] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
uint32_t randomNumber = 0;
...
int main() {
...
tr_info(szMsg);
for (uint8_t i = 0; i < size; i++) {
randomArray[i] = rand();
tr_info("This is a random number %d", randomArray[i]);
}
randomNumber = rand();
tr_info("This is a random number %d", randomNumber);
...
}
Exercise Memory Profiling and Optimization/1
Observe the change in the rom and ram reports for each individual change documented
above. Look at how each text, data and bss
section is modified for each change and give an explanation.
Reducing Memory Usage by Tuning the Zephyr RTOS Configuration
Both flash memory and RAM sizes are limited on most microcontrollers. Reducing the memory footprint of an application can help you squeeze in more features or reduce cost. This can be done by applying a number of application configuration options, as documented in the Zephyr RTOS documentation. One simple way of reducing the RAM/ROM footprint is to optimize a number of runtime parameters such as the stack size or the use of GPIOs. It is also possible to optimize the footprint by configuring the kernel so that only the required kernel features are used.
The use of the C library with the minimal footprint size is also another optimization parameter.
Exercise Memory Profiling and Optimization/2
Compile your application by toggling the following configuration parameters and compare both ROM and RAM footprints:
- CONFIG_SIZE_OPTIMIZATIONS vs CONFIG_DEBUG_OPTIMIZATIONS
- Disable/enable all logging configuration parameters, including CONFIG_BOOT_BANNER.
- Disable/enable FP support in printf with the CONFIG_CBPRINTF_FP_SUPPORT option.
- Disable/enable nano implementation of the printf functions, with the CONFIG_CBPRINTF_NANO option.
Runtime Memory Tracing
Static memory analysis is required and powerful for analyzing how the program memory is organized at compile time. However, it is also very useful to analyze how an embedded software deals with dynamic memory allocations, both for the heap and stack memory. A program that behaves poorly in terms of dynamic memory allocations will become unstable and will potentially crash.
With Zephyr RTOS, the developer can use memory statistics functions to capture heap use, cumulative stack use or stack use for each thread at runtime. To enable memory use monitoring, you must enable the following Zephyr RTOS configuration options:
CONFIG_THREAD_STACK_INFO=y
CONFIG_THREAD_ANALYZER=y
CONFIG_SYS_HEAP_RUNTIME_STATS=y
Once you enable memory statistics, you may instrument the code and do memory
checks at regular intervals or upon requests. This can be implemented with the help of the zpp_lib::Utils class.
By performing a detailed dynamic memory analysis, it is then possible to optimize some parameters, such as reducing the allocated stack size for a given thread or optimizing the use of the heap.
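Stack usage figures of this kind are typically obtained with a high-water mark technique: the stack is filled with a known pattern before the thread starts, and the bytes that still hold the pattern are counted later. The host-side model below sketches the idea. The class name is hypothetical, and the 0xAA pattern is assumed to match what Zephyr uses when stack initialization is enabled; this is a model, not Zephyr's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kFillPattern = 0xAA;

// Model of a thread stack: index 0 is the lowest address and the stack
// grows downwards from the end of the buffer.
class ModelStack {
 public:
  explicit ModelStack(std::size_t size) : _bytes(size, kFillPattern) {}

  // Simulate the thread using `depth` bytes, counted from the stack top.
  void touch(std::size_t depth) {
    assert(depth <= _bytes.size());
    for (std::size_t i = _bytes.size() - depth; i < _bytes.size(); i++) {
      _bytes[i] = 0x00;
    }
  }

  // Count the bytes at the low end that still hold the fill pattern.
  std::size_t unusedBytes() const {
    std::size_t unused = 0;
    while (unused < _bytes.size() && _bytes[unused] == kFillPattern) {
      unused++;
    }
    return unused;
  }

 private:
  std::vector<std::uint8_t> _bytes;
};
```

The reported value is a maximum over the lifetime of the thread: once a byte has been overwritten, it stays counted as used even if the stack shrinks again. A thread whose unused space stays large over a long run is a candidate for a smaller stack size.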
Configuring the use of the heap on Zephyr RTOS
Zephyr RTOS supports different heaps and depending on the application
configuration, calls to malloc and new may use different heaps. In the
zpp_lib::Utils implementation, the new and delete operators are overridden when CONFIG_SYS_HEAP_RUNTIME_STATS is enabled. This ensures
that k_malloc and k_free are used and enables heap statistics. This mechanism should,
of course, not be used when heap statistics are not needed.
Exercise Memory Profiling and Optimization/3
Instrument the dynamic memory usage of your BikeComputer program with the
use of the zpp_lib::Utils class. Use both the zpp_lib::Utils::logThreadsStackInfo
and zpp_lib::Utils::logHeapSummary methods at regular intervals. After startup, you should observe that your program does not allocate any memory on the heap and that the stack use is no longer growing.
By observing the statistics on the console, you should be able to optimize your application configuration and to reduce its RAM usage.
Hunting For Memory Bugs
Detecting a Heap Allocation Error (Memory Leak)
For illustrating analysis of the heap memory, one practical example is the introduction of a memory leak in the code. A memory leak is created when memory allocations are managed in such a way that memory which is NO longer needed is NOT released. For this purpose, you may add a call for allocating memory and not releasing it in a method called at regular intervals. Be aware that allocating memory without using it is not enough, since the compiler will optimize your code and remove unused statements (like allocating an array and only assigning values to the array elements).
If you create a memory leak by creating an instance of the class MemoryLeak as
demonstrated in the main() function and let your program run, you should
observe that the allocated memory on the heap grows constantly and ultimately
you should observe a crash as illustrated in the log below. Note that
CONFIG_ASSERT must be enabled to build the application.
Main program for the memory demo
// zephyr
#include <zephyr/kernel.h>
#include <zephyr/logging/log.h>
// zpp_lib
#include "zpp_include/this_thread.hpp"
#include "zpp_include/utils.hpp"
#include "zpp_include/interrupt_in.hpp"
// local
#include "memory_fragmenter.hpp"
#include "memory_leak.hpp"
LOG_MODULE_REGISTER(memory_demo, CONFIG_APP_LOG_LEVEL);
int main(void) {
using namespace std::literals;
LOG_DBG("Memory demo program started");
// check which button is pressed
zpp_lib::InterruptIn<zpp_lib::PinName::BUTTON1> button1;
zpp_lib::InterruptIn<zpp_lib::PinName::BUTTON2> button2;
zpp_lib::InterruptIn<zpp_lib::PinName::BUTTON3> button3;
if (button1.read() == zpp_lib::kPolarityPressed) {
LOG_DBG("Starting MemoryLeak demo");
static constexpr uint8_t kNbrOfIterations = 10;
for (int i = 0; i < kNbrOfIterations; i++) {
memory_demo::MemoryLeak memoryLeak;
memoryLeak.use();
zpp_lib::Utils::logHeapSummary();
zpp_lib::ThisThread::sleep_for(1s);
}
} else if (button2.read() == zpp_lib::kPolarityPressed) {
memory_demo::MemoryFragmenter memoryFragmenter;
memoryFragmenter.fragmentMemory();
}
return 0;
}
MemoryLeak class
// Copyright 2025 Haute école d'ingénierie et d'architecture de Fribourg
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/****************************************************************************
* @file memory_leak.hpp
* @author Serge Ayer <serge.ayer@hefr.ch>
*
* @brief Declaration/Implementation of the MemoryLeak class
*
* @date 2025-07-01
* @version 1.0.0
***************************************************************************/
#pragma once
// zephyr
#include <zephyr/kernel.h>
namespace memory_demo {
class MemoryLeak {
public:
static constexpr uint16_t kArraySize = 1024;
// create a memory leak in the constructor itself
MemoryLeak() {
_ptr = new uint8_t[kArraySize];
__ASSERT(_ptr != nullptr, "Cannot allocate memory");
}
~MemoryLeak() {}
MemoryLeak(const MemoryLeak&) = delete;
MemoryLeak& operator=(const MemoryLeak&) = delete;
void use() {
for (uint16_t i = 0; i < kArraySize; i++) {
_ptr[i] = i;
}
}
private:
uint8_t* _ptr;
};
} // namespace memory_demo
Console
ASSERTION FAIL [_ptr != nullptr] @ WEST_TOPDIR/memory/src/memory_leak.hpp:16
Cannot allocate memory
[00:00:15.667,297] <err> os: r0/a1: 0x00000004 r1/a2: 0x00000010 r2/a3: 0x00000000
[00:00:15.675,994] <err> os: r3/a4: 0x00000004 r12/ip: 0x00000004 r14/lr: 0x00000465
[00:00:15.684,661] <err> os: xpsr: 0x09000000
[00:00:15.689,880] <err> os: Faulting instruction address (r15/pc): 0x0000aa92
[00:00:15.697,784] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:00:15.705,505] <err> os: Current thread: 0x20000240 (main)
[00:00:15.712,005] <err> os: Halting system
From the error log above, we can observe that the system cannot allocate a
specific object from the operator new() called in the constructor of the MemoryLeak class.
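For comparison, the leak disappears as soon as the allocation is owned by an object that releases it in its destructor. The sketch below is a host-side variant of the class above, under the hypothetical name NoLeak, that uses std::unique_ptr for this purpose. It assumes that the standard library is available in the build; the at() accessor is added only to make the content observable.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

class NoLeak {
 public:
  static constexpr std::uint16_t kArraySize = 1024;

  // The array is owned by _ptr and released when the object is destroyed.
  NoLeak() : _ptr(new std::uint8_t[kArraySize]) {}

  void use() {
    for (std::uint16_t i = 0; i < kArraySize; i++) {
      // Only the low 8 bits of the index fit in one array element.
      _ptr[i] = static_cast<std::uint8_t>(i);
    }
  }

  std::uint8_t at(std::uint16_t index) const {
    assert(index < kArraySize);
    return _ptr[index];
  }

 private:
  std::unique_ptr<std::uint8_t[]> _ptr;  // delete[] is called automatically
};
```

With such a class in the loop of the main() function, each iteration would return its 1024 bytes to the heap at the end of the loop body, so the allocated size reported by the heap summary should stay constant.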
Heap Fragmentation
A problem that is even more complex to detect is heap fragmentation. Heap fragmentation is a phenomenon that creates small fragments of memory in the heap space, in a way that makes the largest available block of memory smaller and smaller compared to the total available memory. The fragmentation level can be computed from the ratio between the largest available block of memory and the total available memory:
\(fragmentation = 1 - \frac{largest\ available\ block}{total\ available\ memory}\)
If the fragmentation is \(50\%\) and the available memory is 1 KiB, then the largest available block is 512 bytes. Fragmentation tends to increase over the lifetime of a program and on embedded systems running C++ programs, there is no way of defragmenting the heap. Over time, heap fragmentation tends to
- create unreliable programs: if your program needs a bigger block than the largest available one, it will not get it and will stop working
- and to degrade program performance: a highly fragmented heap is slower because the memory allocator takes more time to deliver a new allocated block.
These are very good reasons for using heap memory with care on embedded systems.
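The fragmentation formula given above translates directly into a small helper. The function name is illustrative. Note that, as far as we can tell, the heap statistics used in this codelab report the number of free bytes but not the size of the largest free block, so that value would have to be obtained by other means (for instance by trial allocations).

```cpp
#include <cassert>
#include <cstddef>

// Returns a value between 0 and 1.
// 0 means that all the free memory forms one contiguous block.
double fragmentation(std::size_t largestFreeBlock, std::size_t totalFree) {
  assert(largestFreeBlock <= totalFree);
  if (totalFree == 0) {
    return 0.0;  // nothing is free, so there is nothing to fragment
  }
  return 1.0 - static_cast<double>(largestFreeBlock) /
                   static_cast<double>(totalFree);
}
```

With the numbers used in the text, fragmentation(512, 1024) evaluates to 0.5, that is \(50\%\).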
For illustrating the heap fragmentation phenomenon, you may use the following
MemoryFragmenter class in your BikeComputer program:
MemoryFragmenter class
// Copyright 2025 Haute école d'ingénierie et d'architecture de Fribourg
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/****************************************************************************
* @file memory_fragmenter.hpp
* @author Serge Ayer <serge.ayer@hefr.ch>
*
* @brief Declaration/Implementation of the MemoryFragmenter class
*
* @date 2025-07-01
* @version 1.0.0
***************************************************************************/
#pragma once
// zephyr
#include <zephyr/kernel.h>
// zpp_lib
#include "zpp_include/utils.hpp"
extern "C" {
// Zephyr defines this symbol globally
// To access it you need to define CONFIG_HEAP_MEM_POOL_SIZE=...
extern struct sys_heap _system_heap;
}
namespace memory_demo {
class MemoryFragmenter {
public:
// nothing to do in the constructor: the heap is fragmented in fragmentMemory()
MemoryFragmenter() {}
void fragmentMemory() {
// log heap info
zpp_lib::Utils::logHeapSummary();
// get heap available size
struct sys_memory_stats stats;
sys_heap_runtime_stats_get(&_system_heap, &stats);
// divide the available size by 8 blocks that we allocate
uint32_t blockSize = (stats.free_bytes - kMarginSpace) / kNbrOfBlocks;
printk("Allocating blocks of size %" PRIu32 "\n", blockSize);
char* pBlockArray[kNbrOfBlocks] = {NULL};
for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex++) {
pBlockArray[blockIndex] = new char[blockSize];
__ASSERT(pBlockArray[blockIndex] != nullptr,
"Allocation of block %d of size %d failed",
blockIndex,
blockSize);
printk("Allocated block index %" PRIu32 " of size %" PRIu32
" at address 0x%08" PRIx32 "\n",
blockIndex,
blockSize,
// print the address of the block, not the first byte stored in it
static_cast<uint32_t>(reinterpret_cast<uintptr_t>(pBlockArray[blockIndex])));
// copy to member variable to prevent them from being optimized away
for (uint32_t index = 0; index < kArraySize; index++) {
_doubleArray[index] += static_cast<double>(pBlockArray[blockIndex][index]);
}
}
// the full heap (or almost) should be allocated
printk("Heap statistics after full allocation:\n");
zpp_lib::Utils::logHeapSummary();
// delete only the even blocks
for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex += 2) {
delete[] pBlockArray[blockIndex];
pBlockArray[blockIndex] = NULL;
}
// we should have half of the heap space free
printk("Heap statistics after half deallocation:\n");
zpp_lib::Utils::logHeapSummary();
// trying to allocate one block of initial size
// it will succeed
printk("Allocating 1 block of size %" PRIu32 " succeeds !\n", blockSize);
pBlockArray[0] = new char[blockSize];
__ASSERT(
pBlockArray[0] != nullptr, "Allocation of block of size %d failed", blockSize);
printk("Heap statistics after allocating one more block of size %d:\n", blockSize);
zpp_lib::Utils::logHeapSummary();
// trying to allocate one block that is slightly bigger
// without fragmentation, this allocation should succeed
// but it will fail...
blockSize += 8;
// this allocation will fail
printk("Allocating 1 block of size %" PRIu32 " should succeed !\n", blockSize);
pBlockArray[1] = new char[blockSize];
__ASSERT(
pBlockArray[1] != nullptr, "Allocation of block of size %d failed", blockSize);
// copy to member variable to prevent them from being optimized away
for (uint32_t index = 0; index < kArraySize; index++) {
_doubleArray[index] +=
static_cast<double>(pBlockArray[0][index] + pBlockArray[1][index]);
}
}
private:
static constexpr uint8_t kNbrOfBlocks = 8;
static constexpr uint16_t kMarginSpace = 1024;
static constexpr uint8_t kArraySize = 100;
double _doubleArray[kArraySize] = {0};
};
} // namespace memory_demo
This class is used in the main() function given above. If you run the program by pressing Button 2, you will see an error on the console similar to the one shown below:
Console
Allocating 1 block of size 1910 succeeds !
Heap statistics after allocating one more block of size 1910:
[00:00:00.417,236] <inf> zpp_rtos: === Heap Summary ===
[00:00:00.422,882] <inf> zpp_rtos: Allocated: 9600 bytes
[00:00:00.428,710] <inf> zpp_rtos: Free: 6704 bytes
[00:00:00.434,509] <inf> zpp_rtos: Max Alloc: 15360 bytes
Allocating 1 block of size 1918 should succeed !
ASSERTION FAIL [pBlockArray[1] != nullptr] @ WEST_TOPDIR/memory/src/memory_fragmenter.hpp:99
Allocation of block of size 1918 failed
[00:00:00.456,665] <err> os: r0/a1: 0x00000004 r1/a2: 0x00000063 r2/a3: 0x00000000
[00:00:00.465,332] <err> os: r3/a4: 0x00000004 r12/ip: 0x00000000 r14/lr: 0x0000094d
[00:00:00.473,999] <err> os: xpsr: 0x09000000
[00:00:00.479,217] <err> os: Faulting instruction address (r15/pc): 0x0000b002
[00:00:00.487,121] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:00:00.494,842] <err> os: Current thread: 0x20000240 (main)
[00:00:00.501,342] <err> os: Halting system
As you can see in the log file, although the available heap size is 6704 bytes, an
allocation of 1918 bytes fails.
To minimize the type of problems illustrated above, it is often recommended to apply the following guidelines on embedded systems:
- Prefer static allocation to dynamic allocation whenever possible.
- Prefer automatic allocation (stack) when feasible: allocation on the stack is almost free, but in this case, care must be taken to avoid stack overflow errors.
- Use memory pools to provide buffers of fixed size to an application (see Zephyr RTOS Memory Slabs, for instance). This prevents fragmentation.
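To see why fixed-size blocks prevent fragmentation, consider the following minimal pool. It is a host-side sketch with hypothetical names, not the Zephyr memory slab API. It is not thread-safe, and blocks are suitably aligned only if BlockSize is a multiple of the required alignment. Because every block has the same size, any free block can satisfy any request, whatever the order of allocations and releases.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t BlockSize, std::size_t BlockCount>
class FixedBlockPool {
 public:
  // Returns a free block, or nullptr when the pool is exhausted.
  void* allocate() {
    for (std::size_t i = 0; i < BlockCount; i++) {
      if (!_inUse[i]) {
        _inUse[i] = true;
        return &_storage[i * BlockSize];
      }
    }
    return nullptr;
  }

  // Gives a block obtained from allocate() back to the pool.
  void release(void* block) {
    auto* p = static_cast<std::uint8_t*>(block);
    assert(p >= _storage.data() && p < _storage.data() + _storage.size());
    const auto offset = static_cast<std::size_t>(p - _storage.data());
    assert(offset % BlockSize == 0);
    _inUse[offset / BlockSize] = false;
  }

  std::size_t freeBlocks() const {
    std::size_t count = 0;
    for (bool used : _inUse) {
      if (!used) {
        count++;
      }
    }
    return count;
  }

 private:
  alignas(std::max_align_t)
      std::array<std::uint8_t, BlockSize * BlockCount> _storage{};
  std::array<bool, BlockCount> _inUse{};
};
```

Replaying the scenario of the MemoryFragmenter class with such a pool (allocate eight blocks, release every second one, allocate again) succeeds for all four freed blocks. The price is that requests larger than BlockSize cannot be served at all and smaller ones waste part of a block.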
Detecting a Stack Overflow Error
By using the memory tracing functionalities demonstrated above, we may know which threads are running and the memory space that they are using. This is very useful information for optimizing memory usage for each thread. This is also useful for debugging stack overflow errors.
Stack overflow can occur in a variety of situations. To understand how to detect such errors, it is easier to simulate one. To this end, you can write code that allocates more and more memory to the stack in a thread that runs a loop. An example of such code is given below:
StackOverflow class
// Copyright 2025 Haute école d'ingénierie et d'architecture de Fribourg
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/****************************************************************************
* @file stack_overflow.hpp
* @author Serge Ayer <serge.ayer@hefr.ch>
*
* @brief Declaration/Implementation of the StackOverflow class
*
* @date 2025-07-01
* @version 1.0.0
***************************************************************************/
#pragma once
// zephyr
#include <zephyr/kernel.h>
// std
#include <cstdint>
namespace memory_demo {
class StackOverflow {
public:
void allocateOnStack() {
// allocate an array with growing size until it does not fit on the stack anymore
size_t allocSize = kArraySize * _multiplier;
// Create a variable-size object on the stack
double anotherArray[allocSize]; // NOLINT(runtime/arrays)
for (size_t i = 0; i < allocSize; i++) {
anotherArray[i] = i;
}
// copy to member variable to prevent them from being optimized away
for (size_t i = 0; i < kArraySize; i++) {
_doubleArray[i] += anotherArray[i];
}
_multiplier++;
}
private:
static constexpr size_t kArraySize = 40;
double _doubleArray[kArraySize] = {0};
size_t _multiplier = 1;
};
} // namespace memory_demo
In the main() function, allocateOnStack() is called in a loop and heap information is displayed at each iteration. In the console, you see that the maximum number of bytes used on the stack of the main thread continuously increases. Once the stack overflow
happens, you will see a stack overflow error.
Exercise Memory Profiling and Optimization/4
With the standard configuration using a Cortex-M CPU with MPU, hardware-based protection against stack overflows is enabled (when the system is running in privileged mode). Try to disable hardware protection by adding CONFIG_HW_STACK_PROTECTION=n in the prj.conf file and observe what happens when running the same program.
Exercise Memory Profiling and Optimization/5
Find and implement another very common way of creating a stack overflow in any program.