Memory Profiling and Optimization
Introduction
Cortex-M Program Image and Memory
In this codelab, we will first explore the concept of a Cortex-M program image. We will then examine different memory profiling techniques for Zephyr RTOS programs, illustrating some classic issues associated with static or dynamic memory usage.
What you’ll build
- You will modify your BikeComputer program to improve your understanding of its program image and memory map.
- You will instrument your BikeComputer program to perform dynamic memory analysis.
- You will modify the BikeComputer program to create memory issues on purpose.
What you’ll learn
- You will understand how a Cortex-M program image is made and how it is used for starting your program on a Cortex-M device.
- You will understand the boot sequence of a Zephyr RTOS program.
- You will understand how a Zephyr RTOS program memory is organized in RAM and how to trace dynamic memory allocations.
What you’ll need
- Zephyr Development Environment for developing and debugging your program in C++.
- All BikeComputer and multi-tasking codelabs are prerequisites for this codelab.
The Program Image
A Cortex-M program image or executable file (e.g. the .elf file on your computer) refers to a piece of code that is ready to execute. The image can occupy up to 512 MiB of memory space, ranging from address 0x00000000 to address 0x1FFFFFFF, as shown in Figure 1 for the nRF5340 MCU.
The program image is usually stored in non-volatile memory such as on-chip Flash memory. It is normally kept separate from the program data, which is allocated in the SRAM (data) region of the memory space.
To build the program image, the toolchain uses the dtsi file that defines the different memory regions/partitions. These partitions are then used and defined in the dts file. These files are self-documenting, so you can easily recognise the ROM and RAM region definitions.
Based on this information, the toolchain produces a program image (CODE region) that corresponds to the map depicted in the figure above.
Analysing the ELF file produced by the linker can help to better understand this program image. Among other tools, the standard GNU binutils and other specific Zephyr RTOS tools are useful:
- arm-zephyr-eabi-size prints information about the text, data and bss sections, such as
text data bss dec hex filename
0x2396c 0x1e4 0x34fce 363294 58b1e build\zephyr\zephyr.elf
or more detailed views such as
arm-zephyr-eabi-size output
build\zephyr\zephyr.elf :
section size addr
rom_start 0x154 0x0
text 0x1479c 0x154
.ARM.extab 0x134 0x148f0
.ARM.exidx 0x250 0x14a24
initlevel 0x78 0x14c74
device_area 0x120 0x14cec
sw_isr_table 0x228 0x14e0c
gpio_driver_api_area 0x24 0x15034
i2c_driver_api_area 0x18 0x15058
sensor_driver_api_area 0x1c 0x15070
spi_driver_api_area 0x8 0x1508c
clock_control_driver_api_area 0x1c 0x15094
display_driver_api_area 0x2c 0x150b0
mipi_dbi_driver_api_area 0x18 0x150dc
uart_driver_api_area 0xc 0x150f4
init_array 0x14 0x15100
log_const_area 0xa8 0x15114
log_backend_area 0x10 0x151bc
tbss 0x8 0x151cc
rodata 0xe7b4 0x151d0
.ramfunc 0x0 0x20000000
datas 0x15c 0x20000000
device_states 0x14 0x2000015c
log_mpsc_pbuf_area 0x40 0x20000170
log_msg_ptr_area 0x4 0x200001b0
k_heap_area 0x18 0x200001b4
.comment 0x20 0x0
.debug_aranges 0x3a20 0x0
.debug_info 0x1008c6 0x0
.debug_abbrev 0x1eed9 0x0
.debug_line 0x4836a 0x0
.debug_frame 0xe2e0 0x0
.debug_str 0x5ab14 0x0
.debug_loc 0x54c2d 0x0
.debug_ranges 0x5e28 0x0
.ARM.attributes 0x38 0x0
.last_section 0x4 0x23b50
bss 0xe86 0x200001d0
noinit 0x34140 0x20001058
Total 0x2878e8
- arm-zephyr-eabi-readelf provides a more detailed view of the different memory sections:
arm-zephyr-eabi-readelf output
There are 43 section headers, starting at offset 0x27b454:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] rom_start PROGBITS 00000000 000100 000154 00 AX 0 0 4
[ 2] text PROGBITS 00000154 000254 01479c 00 AX 0 0 4
[ 3] .ARM.extab PROGBITS 000148f0 0149f0 000134 00 A 0 0 4
[ 4] .ARM.exidx ARM_EXIDX 00014a24 014b24 000250 00 AL 2 0 4
[ 5] initlevel PROGBITS 00014c74 014d74 000078 00 A 0 0 4
[ 6] device_area PROGBITS 00014cec 014dec 000120 00 A 0 0 4
[ 7] sw_isr_table PROGBITS 00014e0c 014f0c 000228 00 A 0 0 4
[ 8] gpio_driver_[...] PROGBITS 00015034 015134 000024 00 A 0 0 4
[ 9] i2c_driver_a[...] PROGBITS 00015058 015158 000018 00 A 0 0 4
[10] sensor_drive[...] PROGBITS 00015070 015170 00001c 00 A 0 0 4
[11] spi_driver_a[...] PROGBITS 0001508c 01518c 000008 00 A 0 0 4
[12] clock_contro[...] PROGBITS 00015094 015194 00001c 00 A 0 0 4
[13] display_driv[...] PROGBITS 000150b0 0151b0 00002c 00 A 0 0 4
[14] mipi_dbi_dri[...] PROGBITS 000150dc 0151dc 000018 00 A 0 0 4
[15] uart_driver_[...] PROGBITS 000150f4 0151f4 00000c 00 A 0 0 4
[16] init_array INIT_ARRAY 00015100 015200 000014 04 WA 0 0 4
[17] log_const_area PROGBITS 00015114 015214 0000a8 00 A 0 0 4
[18] log_backend_area PROGBITS 000151bc 0152bc 000010 00 A 0 0 4
[19] tbss NOBITS 000151cc 0152cc 000008 00 WAT 0 0 4
[20] rodata PROGBITS 000151d0 0152d0 00e7b4 00 A 0 0 16
[21] .ramfunc PROGBITS 20000000 023c54 000000 00 W 0 0 1
[22] datas PROGBITS 20000000 023a84 00015c 00 WA 0 0 4
[23] device_states PROGBITS 2000015c 023be0 000014 00 WA 0 0 1
[24] log_mpsc_pbu[...] PROGBITS 20000170 023bf4 000040 00 WA 0 0 4
[25] log_msg_ptr_area PROGBITS 200001b0 023c34 000004 00 WA 0 0 4
[26] k_heap_area PROGBITS 200001b4 023c38 000018 00 WA 0 0 4
[27] .comment PROGBITS 00000000 023c54 000020 01 MS 0 0 1
[28] .debug_aranges PROGBITS 00000000 023c78 003a20 00 0 0 8
[29] .debug_info PROGBITS 00000000 027698 1008c6 00 0 0 1
[30] .debug_abbrev PROGBITS 00000000 127f5e 01eed9 00 0 0 1
[31] .debug_line PROGBITS 00000000 146e37 04836a 00 0 0 1
[32] .debug_frame PROGBITS 00000000 18f1a4 00e2e0 00 0 0 4
[33] .debug_str PROGBITS 00000000 19d484 05ab14 01 MS 0 0 1
[34] .debug_loc PROGBITS 00000000 1f7f98 054c2d 00 0 0 1
[35] .debug_ranges PROGBITS 00000000 24cbc8 005e28 00 0 0 8
[36] .ARM.attributes ARM_ATTRIBUTES 00000000 2529f0 000038 00 0 0 1
[37] .last_section PROGBITS 00023b50 023c50 000004 00 WA 0 0 4
[38] bss NOBITS 200001d0 023c58 000e86 00 WA 0 0 8
[39] noinit NOBITS 20001058 023c58 034140 00 WA 0 0 8
[40] .symtab SYMTAB 00000000 252a28 013e50 10 41 3216 4
[41] .strtab STRTAB 00000000 266878 0149a9 00 0 0 1
[42] .shstrtab STRTAB 00000000 27b221 000233 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), y (purecode), p (processor specific)
- Use west to generate a ROM and a RAM report using west build -t rom_report > rom_report.txt or west build -t ram_report > ram_report after a build.
- Use the full memory map file generated by the linker (“build/zephyr/zephyr.map”).
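As a worked example of reading the arm-zephyr-eabi-size summary shown earlier in this section: to a first approximation, text and data are stored in Flash (data holds the initial values that are copied to RAM at boot), whereas data and bss occupy RAM. The sketch below applies this usual GNU size convention to the numbers printed above. The struct and function names are illustrative assumptions, not part of any Zephyr tool.

```cpp
#include <cstdint>

// Summary line of arm-zephyr-eabi-size, all values in bytes.
struct SizeSummary {
  std::uint32_t text;  // code and read-only data
  std::uint32_t data;  // initialized writable data (initial values kept in Flash)
  std::uint32_t bss;   // zero-initialized or uninitialized data (no Flash content)
};

// Flash holds the code plus the initial values of the data section.
constexpr std::uint32_t flashBytes(const SizeSummary& s) { return s.text + s.data; }

// RAM holds the data section (after the boot-time copy) plus bss.
constexpr std::uint32_t ramBytes(const SizeSummary& s) { return s.data + s.bss; }

// Values taken from the size output of the BikeComputer build shown above.
constexpr SizeSummary kBikeComputer{0x2396c, 0x1e4, 0x34fce};
```

With these numbers, flashBytes(kBikeComputer) is 146256 bytes (about 143 KiB) and ramBytes(kBikeComputer) is 217522 bytes (about 212 KiB), most of which belongs to the noinit section listed in the detailed view.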
The Boot Sequence and Memory Initialization
Understand the Early Boot Sequence
Upon reset, a startup code is executed by the Cortex-M processor. The startup code is specific to each platform and toolchain, but it usually consists of
- setting the initial SP,
- setting the initial PC to the reset handler function,
- setting the vector table entries with the exceptions ISR addresses, and
- branching to initialization of the C library, which eventually switches to the main thread that executes the main() function of your program.
When we use the command arm-zephyr-eabi-objdump.exe -D -G on the elf file, we
get an output similar to the following:
Disassembly of section rom_start:
00000000 <_vector_table>:
0: 20034d98 mulcs r3, r8, sp
4: 00005b45 andeq r5, r0, r5, asr #22
From this output and from the output of the other tools described above, we can observe that:
- The code region begins with the vector table, whose first entry is the initial value of the main stack pointer. Note that objdump has attempted to disassemble the stack pointer and the program counter (PC) as instructions, as it cannot distinguish between code and data. When starting up, the ARM processor fetches whatever is at address 0x0000_0000 and assumes that it is the stack pointer value. In our case, this is the address 0x20034d98, which is the end of the initial stack (recall that the stack grows downwards). In the dump file, this address carries the label of the next object in RAM, the system workqueue stack of Zephyr RTOS:
...
20034d98 <sys_work_q_stack>:
- The next entry in the vector table specifies the jump location for starting the program upon reset. This corresponds to the start of the system’s early boot sequence. Looking at the code reveals that the SystemInit function is called first, followed by other functions such as z_early_memset and z_prep_c. All of these functions are defined in the Zephyr RTOS library.
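The two vector table words in the dump above can be decoded by hand. The sketch below does so under the standard Cortex-M conventions: bit 0 of a vector entry is the Thumb state bit rather than part of the address, and the SRAM region spans 0x20000000 to 0x3FFFFFFF. The constant and function names are illustrative assumptions.

```cpp
#include <cstdint>

// First two words of the vector table, as read from the dump above.
constexpr std::uint32_t kInitialSp = 0x20034d98;
constexpr std::uint32_t kResetVector = 0x00005b45;

// Bit 0 of a vector entry flags Thumb state; it is not part of the address.
constexpr bool isThumb(std::uint32_t vector) { return (vector & 1u) != 0; }

// Address of the first instruction of the handler.
constexpr std::uint32_t handlerAddress(std::uint32_t vector) { return vector & ~1u; }

// SRAM region of the Cortex-M memory map.
constexpr bool isInSram(std::uint32_t addr) {
  return addr >= 0x20000000u && addr <= 0x3FFFFFFFu;
}
```

The reset handler code therefore starts at 0x00005b44, which is the address to look for in the disassembly, while the initial stack pointer lies in SRAM as expected.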
Kernel Initialization
The bootup sequence is illustrated in the diagram below:
It is important to note the following points:
- The early bootup sequence starts with the reset handler. This initialization phase makes the system ready for kernel initialization.
- The initialization levels (PRE_KERNEL_1, PRE_KERNEL_2, and POST_KERNEL) let you specify the order in which initialization code is executed. What is executed in each phase depends on the platform itself and on the application configuration (as described in the prj.conf file).
- The kernel initialization makes the kernel ready for the application. Static devices and board-level subsystems (e.g. clock, basic peripherals, console/UART/RTT) are initialized during this phase.
- After kernel initialization, the scheduler and essential threads (e.g. the main and idle threads) are started; after that Zephyr prints its boot banner (if enabled).
- After kernel+post-kernel setup, board- and application-specific modules, drivers and threads (as defined in “prj.conf” and DTS overlays) are initialized and started.
Static Memory Analysis Using the ROM and RAM Reports
As explained in the previous section, Zephyr RTOS provides a number of utility
tools that help you to understand how the program image is structured and how
memory space is used by any Zephyr RTOS application. Running the command west
build -t rom_report > rom_report.txt will cause the file “rom_report.txt” to
contain detailed information on how the program image is structured. Searching
for “bike_system.cpp” or other application-specific files will produce an output
similar to the following:
rom_report output
│ │ ├── main.cpp 172 0.12% -
│ │ │ ├── log_const_bike_computer 8 0.01% 0x00015134 log_const_area
│ │ │ └── main 164 0.11% 0x000022f5 text
│ │ ├── multi_tasking 2468 1.69% -
│ │ │ ├── bike_system.cpp 1842 1.26% -
In this report, text refers to the application code, rodata refers to read-only constants and datas refers to non-zero initialization data. All of these are stored in Flash/ROM. The datas values stored in ROM serve to initialize variables stored in RAM.
The RAM report displays how RAM is allocated:
ram_report output
│ ├── _ZGVZN13bike_computer9getFont16EvE7pFont16 4 0.00% 0x20000a20 bss
│ ├── _ZGVZN13bike_computer9getFont18EvE7pFont1 4 0.00% 0x20000a28 bss
Searching for “bike_computer” in this file shows that few global variables are
allocated by the application itself in the BikeComputer program. Reducing the use of
global variables is usually good practice.
Understand What Memory Goes Where
To better understand what memory goes where, add the following code in your “main.cpp” file
...
const char szMsg[] = "This is a test message";
static constexpr uint8_t size = 10;
uint32_t randomArray[size] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
uint32_t randomNumber = 0;
...
int main() {
  ...
  // print the message through a format string rather than as the format itself
  tr_info("%s", szMsg);
  for (uint8_t i = 0; i < size; i++) {
    randomArray[i] = rand();
    // PRIu32 matches the uint32_t argument
    tr_info("This is a random number %" PRIu32, randomArray[i]);
  }
  randomNumber = rand();
  tr_info("This is a random number %" PRIu32, randomNumber);
  ...
}
Exercise Memory Profiling and Optimization/1
Observe the change in the rom and ram reports for each individual change documented
above. Look at how each text, data and bss
section is modified for each change and give an explanation.
Reducing Memory Usage by Tuning the Zephyr RTOS Configuration
Both flash memory and RAM sizes are limited on most microcontrollers. Reducing the memory footprint of an application can help you squeeze in more features or reduce cost. This can be done by applying a number of application configuration options, as documented in the Zephyr RTOS documentation. One simple way of reducing the RAM/ROM footprint is to optimize a number of runtime parameters such as the stack size or the use of GPIOs. It is also possible to optimize the footprint by configuring the kernel so that only the required kernel features are used.
The use of the C library with the minimal footprint size is also another optimization parameter.
Exercise Memory Profiling and Optimization/2
Compile your application by toggling the following configuration parameters and compare both ROM and RAM footprints:
- CONFIG_SIZE_OPTIMIZATIONS vs CONFIG_DEBUG_OPTIMIZATIONS
- Disable/enable all logging configuration parameters, including CONFIG_BOOT_BANNER.
- Disable/enable FP support in printf with the CONFIG_CBPRINTF_FP_SUPPORT option.
- Disable/enable the nano implementation of the printf functions with the CONFIG_CBPRINTF_NANO option.
Runtime Memory Tracing
Static memory analysis is a powerful and necessary tool for analyzing how the program memory is organized at compile time. However, it is also very useful to analyze how embedded software deals with dynamic memory allocations, for both heap and stack memory. A program that handles dynamic memory allocations poorly will become unstable and may crash.
With Zephyr RTOS, the developer can use memory statistics functions to capture heap use, cumulative stack use or stack use for each thread at runtime. To enable memory use monitoring, you must enable the following Zephyr RTOS configuration options:
CONFIG_THREAD_STACK_INFO=y
CONFIG_THREAD_ANALYZER=y
CONFIG_SYS_HEAP_RUNTIME_STATS=y
Once you enable memory statistics, you may instrument the code and do memory
checks at regular intervals or upon requests. This can be implemented with the help of the zpp_lib::Utils class.
By performing a detailed dynamic memory analysis, it is then possible to optimize some parameters, such as reducing the allocated stack size for a given thread or optimizing the use of the heap.
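One common way of measuring stack use at runtime is stack painting: the stack buffer is filled with a known pattern before the thread starts, and the number of bytes still holding the pattern at the low end of the buffer gives the unused space. The host-side sketch below illustrates the idea on a plain byte array. The 0xAA pattern and all function names are assumptions made for illustration; this is not the zpp_lib or Zephyr API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint8_t kPaint = 0xAA;  // assumed fill pattern

// Fill the whole stack buffer with the pattern before the thread starts.
inline void paintStack(std::uint8_t* stack, std::size_t size) {
  std::memset(stack, kPaint, size);
}

// The stack grows downwards, so untouched bytes sit at the lowest addresses.
// Count them from the bottom up to the first overwritten byte.
inline std::size_t unusedStackBytes(const std::uint8_t* stack, std::size_t size) {
  std::size_t unused = 0;
  while (unused < size && stack[unused] == kPaint) {
    unused++;
  }
  return unused;
}

// Simulate a thread that has used `used` bytes from the top of its stack.
inline void simulateUse(std::uint8_t* stack, std::size_t size, std::size_t used) {
  assert(used <= size);
  std::memset(stack + (size - used), 0x00, used);
}
```

The value obtained this way is a high-water mark: it reflects the deepest stack use since the buffer was painted, not the current use, which is what matters when choosing a stack size.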
Configuring the use of the heap on Zephyr RTOS
Zephyr RTOS supports different heaps and, depending on the application configuration, calls to malloc and new may use different heaps. In the zpp_lib::Utils implementation, the new and delete operators are overridden when CONFIG_SYS_HEAP_RUNTIME_STATS is enabled. This ensures that k_malloc and k_free are used and enables heap statistics. This mechanism should, of course, not be used when heap statistics are not needed.
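The general technique of replacing the global new and delete operators can be sketched on a host machine as follows. This is not the zpp_lib::Utils implementation: std::malloc and std::free stand in for k_malloc and k_free, and a simple counter (a hypothetical name, not thread-safe) stands in for the heap statistics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of blocks currently allocated through new or new[] (illustrative).
static std::size_t gLiveAllocations = 0;

std::size_t liveAllocations() { return gLiveAllocations; }

// Replacement for the global operator new. The default operator new[]
// forwards to this function, so array allocations are counted as well.
void* operator new(std::size_t size) {
  void* ptr = std::malloc(size == 0 ? 1 : size);  // k_malloc on the target
  if (ptr == nullptr) {
    throw std::bad_alloc();
  }
  gLiveAllocations++;
  return ptr;
}

// Replacement for the global operator delete (also reached by delete[]).
void operator delete(void* ptr) noexcept {
  if (ptr != nullptr) {
    assert(gLiveAllocations > 0);
    gLiveAllocations--;
    std::free(ptr);  // k_free on the target
  }
}
```

A counter that keeps growing while the program is in steady state is a first hint that some allocations are never released.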
Exercise Memory Profiling and Optimization/3
Instrument the dynamic memory usage of your BikeComputer program using the zpp_lib::Utils class. Call both the zpp_lib::Utils::logThreadsStackInfo and zpp_lib::Utils::logHeapSummary methods at regular intervals. After startup, you should observe that your program does not allocate any memory on the heap and that the stack use no longer grows.
By observing the statistics on the console, you should be able to optimize your application configuration and to reduce its RAM usage.
WORK IN PROGRESS
Hunting For Memory Bugs
Detecting a Heap Allocation Error (Memory Leak)
For illustrating analysis of the heap memory, one practical example is the introduction of a memory leak in the code. A memory leak is created when memory allocations are managed in such a way that memory which is NO longer needed is NOT released. For this purpose, you may add a call for allocating memory and not releasing it in a method called at regular intervals. Be aware that allocating memory without using it is not enough, since the compiler will optimize your code and remove unused statements (like allocating an array and only assigning values to the array elements).
If you create a memory leak by creating an instance of the MemoryLeak class below in one of the task methods of your BikeComputer program and let your program run, you should observe that the allocated memory on the heap grows constantly and that the program ultimately crashes, as illustrated in the log below:
MemoryLeak class
#pragma once

#include <cstdint>

#include "mbed.h"

namespace multi_tasking {

class MemoryLeak {
 public:
  static constexpr uint16_t kArraySize = 1024;

  // Allocate in the constructor. There is no destructor releasing the
  // memory, so every instance leaks kArraySize * sizeof(int) bytes.
  MemoryLeak() { _ptr = new int[kArraySize]; }

  // Write to the array so that the allocation is not optimized away.
  void use() {
    for (uint16_t i = 0; i < kArraySize; i++) {
      _ptr[i] = i;
    }
  }

 private:
  int* _ptr;
};

}  // namespace multi_tasking
Console
++ MbedOS Error Info ++
Error Status: 0x8001011F Code: 287 Module: 1
Error Message: Operator new[] out of memory
Location: 0x800F025
File: mbed_retarget.cpp+1848
Error Value: 0x5000
Current Thread: main Id: 0x240035B0 Entry: 0x8013581 StackSize: 0x2000 StackMem: 0x24000C20 SP: 0x240022E4
Next:
main State: 0x2 Entry: 0x08013581 Stack Size: 0x00002000 Mem: 0x24000C20 SP: 0x240022C8
Ready:
rtx_idle State: 0x1 Entry: 0x080143A9 Stack Size: 0x00000380 Mem: 0x240035F8 SP: 0x24003928
Wait:
rtx_timer State: 0x83 Entry: 0x08015081 Stack Size: 0x00000300 Mem: 0x24003978 SP: 0x24003C18
Delay:
For more info, visit: https://mbed.com/s/error?error=0x8001011F&osver=61700&core=0x411FC271&comp=1&ver=6160001&tgt=DISCO_H747I
Note that for getting additional error information, you need to modify the Mbed OS configuration as illustrated below:
"target_overrides": {
"*": {
"platform.error-all-threads-info": 1,
"platform.error-filename-capture-enabled": 1
}
}
From the error log above, we can observe that the system cannot allocate the requested memory with operator new[], called from the main thread. We also know that the error is raised at line 1848 of the “mbed_retarget.cpp” file.
Heap Fragmentation
A problem that is even more complex to detect is heap fragmentation. Heap fragmentation is a phenomenon that creates small fragments of memory in the heap space, in a way that makes the largest available block of memory smaller and smaller compared to the total available memory. The fragmentation level can be computed from the ratio between the largest available block of memory and the total available memory:
\(fragmentation = 1 - \frac{largest\ available\ block}{total\ available\ memory}\)
If the fragmentation is \(50\%\) and the available memory is 1 KiB, then the largest available block is 512 bytes. Fragmentation tends to increase over the lifetime of a program and on embedded systems running C++ programs, there is no way of defragmenting the heap. Over time, heap fragmentation tends to
- create unreliable programs: if your program needs a bigger block than the largest available one, it will not get it and will stop working
- and to degrade program performance: a highly fragmented heap is slower because the memory allocator takes more time to deliver a new allocated block.
These are very good reasons for using heap memory with care on embedded systems.
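The fragmentation formula above translates directly into code. The following sketch reproduces the 1 KiB example from the text; the function names are illustrative assumptions.

```cpp
#include <cstdint>

// fragmentation = 1 - largest available block / total available memory
constexpr double fragmentation(std::uint32_t largestFreeBlock, std::uint32_t totalFree) {
  return totalFree == 0
             ? 0.0
             : 1.0 - static_cast<double>(largestFreeBlock) / totalFree;
}

// Largest block that can still be allocated for a given fragmentation level.
constexpr double largestBlock(double fragmentationLevel, std::uint32_t totalFree) {
  return (1.0 - fragmentationLevel) * totalFree;
}
```

For instance, fragmentation(512, 1024) evaluates to 0.5 and largestBlock(0.5, 1024) to 512 bytes, while a heap whose free memory forms one single block has a fragmentation of 0.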
To illustrate the heap fragmentation phenomenon, you may use the following MemoryFragmenter class in your BikeComputer program:
MemoryFragmenter class
#pragma once

#include <cinttypes>

#include "mbed.h"
#include "memory_logger.hpp"

namespace multi_tasking {

class MemoryFragmenter {
 public:
  MemoryFragmenter() {}

  void fragmentMemory() {
    // create a memory logger
    MemoryLogger memoryLogger;

    // get heap info
    mbed_stats_heap_t heapInfo = {0};
    mbed_stats_heap_get(&heapInfo);
    uint32_t availableSize =
        heapInfo.reserved_size - heapInfo.current_size - heapInfo.overhead_size;
    tr_debug("Available heap size is %" PRIu32 " (reserved %" PRIu32 ")",
             availableSize,
             heapInfo.reserved_size);

    // split the available size (minus a margin) into kNbrOfBlocks blocks
    uint32_t blockSize = (availableSize - kMarginSpace) / kNbrOfBlocks;
    tr_debug("Allocating blocks of size %" PRIu32 "", blockSize);
    char* pBlockArray[kNbrOfBlocks] = {NULL};
    for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex++) {
      pBlockArray[blockIndex] = new char[blockSize];
      if (pBlockArray[blockIndex] == NULL) {
        tr_error("Cannot allocate block memory for index %" PRIu32 "",
                 blockIndex);
        // do not touch a block that could not be allocated
        continue;
      }
      tr_debug("Allocated block index %" PRIu32 " of size %" PRIu32
               " at address 0x%08" PRIx32 "",
               blockIndex,
               blockSize,
               (uint32_t)pBlockArray[blockIndex]);
      // copy to member variable to prevent them from being optimized away
      for (uint32_t index = 0; index < kArraySize; index++) {
        _doubleArray[index] += (double)pBlockArray[blockIndex][index];
      }
    }

    // the full heap (or almost) should be allocated
    tr_debug("Heap statistics after full allocation:");
    memoryLogger.getAndPrintHeapStatistics();

    // delete only the even blocks
    for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex += 2) {
      delete[] pBlockArray[blockIndex];
      pBlockArray[blockIndex] = NULL;
    }

    // we should have half of the heap space free
    tr_debug("Heap statistics after half deallocation:");
    memoryLogger.getAndPrintHeapStatistics();

    // try to allocate one block that is slightly bigger than the freed ones
    // without fragmentation, this allocation should succeed
    heapInfo = {0};
    mbed_stats_heap_get(&heapInfo);
    availableSize =
        heapInfo.reserved_size - heapInfo.current_size - heapInfo.overhead_size;
    tr_debug("Available heap size is %" PRIu32 " (reserved %" PRIu32 ")",
             availableSize,
             heapInfo.reserved_size);
    blockSize += 8;

    // on the fragmented heap, this allocation fails although enough
    // memory is free in total
    tr_debug("Allocating 1 block of size %" PRIu32 " should succeed !", blockSize);
    pBlockArray[0] = new char[blockSize];
    // copy to member variable to prevent them from being optimized away
    for (uint32_t index = 0; index < kArraySize; index++) {
      _doubleArray[index] += (double)pBlockArray[0][index];
    }
  }

 private:
  static constexpr uint8_t kNbrOfBlocks = 8;
  static constexpr uint16_t kMarginSpace = 1024;
  static constexpr uint8_t kArraySize = 100;
  double _doubleArray[kArraySize] = {0};
};

}  // namespace multi_tasking
If you create an instance of this class in your BikeComputer program and call
the MemoryFragmenter::fragmentMemory() method, you will observe an error on the
console similar to the one shown below:
Console
[DBG ][MemoryFragmenter]: Available heap size is 501308 (reserved 506044)
[DBG ][MemoryFragmenter]: Allocating blocks of size 62535
[DBG ][MemoryFragmenter]: Allocated block index 0 of size 62535 at address 0x240055f0
[DBG ][MemoryFragmenter]: Allocated block index 1 of size 62535 at address 0x24014a48
[DBG ][MemoryFragmenter]: Allocated block index 2 of size 62535 at address 0x24023ea0
[DBG ][MemoryFragmenter]: Allocated block index 3 of size 62535 at address 0x240332f8
[DBG ][MemoryFragmenter]: Allocated block index 4 of size 62535 at address 0x24042750
[DBG ][MemoryFragmenter]: Allocated block index 5 of size 62535 at address 0x24051ba8
[DBG ][MemoryFragmenter]: Allocated block index 6 of size 62535 at address 0x24061000
[DBG ][MemoryFragmenter]: Allocated block index 7 of size 62535 at address 0x24070458
[DBG ][MemoryFragmenter]: Heap statistics after full allocation:
[DBG ][MemoryLogger]: MemoryStats (Heap):
[DBG ][MemoryLogger]: Bytes allocated currently: 504892
[DBG ][MemoryLogger]: Max bytes allocated at a given time: 504892
[DBG ][MemoryLogger]: Cumulative sum of bytes ever allocated: 504892
[DBG ][MemoryLogger]: Current number of bytes allocated for the heap: 506044
[DBG ][MemoryLogger]: Current number of allocations: 16
[DBG ][MemoryLogger]: Number of failed allocations: 0
[DBG ][MemoryFragmenter]: Heap statistics after half deallocation:
[DBG ][MemoryLogger]: MemoryStats (Heap):
[DBG ][MemoryLogger]: Bytes allocated currently: 254752
[DBG ][MemoryLogger]: Max bytes allocated at a given time: 504892
[DBG ][MemoryLogger]: Cumulative sum of bytes ever allocated: 504892
[DBG ][MemoryLogger]: Current number of bytes allocated for the heap: 506044
[DBG ][MemoryLogger]: Current number of allocations: 12
[DBG ][MemoryLogger]: Number of failed allocations: 0
[DBG ][MemoryFragmenter]: Available heap size is 251100 (reserved 506044)
[DBG ][MemoryFragmenter]: Allocating 1 block of size 62543 should succeed !
++ MbedOS Error Info ++
Error Status: 0x8001011F Code: 287 Module: 1
Error Message: Operator new[] out of memory
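As you can observe, while the available heap size is 251100 bytes, an allocation of 62543 bytes fails with an out-of-memory error. The free memory consists mostly of non-adjacent blocks of 62535 bytes each, separated by the blocks that are still allocated, so no single free block is large enough for the request.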
For minimizing the type of problems illustrated above, it is often recommended to apply the following guidelines on embedded systems:
- Prefer static allocation over dynamic allocation whenever possible.
- Prefer automatic allocation (on the stack) when feasible: allocation on the stack is almost free, but care must be taken to avoid stack overflow errors.
- Use private, application-specific memory pools to provide fixed-size buffers to an application (see Mbed OS Memory Pool). This avoids repeated allocation of buffers from the heap. This mechanism is used, for instance, by the Mbed OS Mail API, which implements a message queue backed by a memory pool from which the messages are allocated.
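The memory pool guideline can be illustrated with a minimal fixed-size block pool. The sketch below is a host-side illustration and not the Mbed OS MemoryPool API; the class name FixedPool and its methods are assumptions, and the code is not thread-safe. On Zephyr RTOS, memory slabs provide a comparable kernel facility.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Pool of N blocks of type T, carved out of static storage (no heap use).
template <typename T, std::size_t N>
class FixedPool {
 public:
  FixedPool() {
    // initially, every block is free
    for (std::size_t i = 0; i < N; i++) {
      _free[i] = &_blocks[i];
    }
  }

  // Returns a free block, or nullptr when the pool is exhausted.
  T* allocate() { return _freeCount > 0 ? _free[--_freeCount] : nullptr; }

  // Gives a block obtained from allocate() back to the pool.
  void release(T* block) {
    assert(block != nullptr && _freeCount < N);
    _free[_freeCount++] = block;
  }

  // Number of blocks that can still be allocated.
  std::size_t available() const { return _freeCount; }

 private:
  std::array<T, N> _blocks{};
  std::array<T*, N> _free{};
  std::size_t _freeCount = N;
};
```

Since all blocks have the same size, any released block can serve any later request: the pool cannot fragment, allocation takes constant time, and exhaustion is reported by a null pointer that the caller can handle.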
Detecting a Stack Overflow Error
By using the memory tracing functionalities demonstrated above, we may know which threads are running and the memory space that they are using. This is very useful information for optimizing memory usage for each thread. This is also useful for debugging stack overflow errors.
Stack overflow may happen in very different situations. To understand how to detect such errors, it is easiest to provoke one. For this purpose, you may add code that allocates more and more memory on the stack in a thread running a loop. An example of such code is given below:
MemoryStackOverflow class
#pragma once

#include <cstddef>
#include <cstdint>

#include "mbed.h"

namespace multi_tasking {

class MemoryStackOverflow {
 public:
  void allocateOnStack() {
    // allocate an array with growing size until it does not fit on the stack anymore
    size_t allocSize = kArraySize * _multiplier;
    // Create a variable-size object on the stack
    // (variable-length array: a GCC extension, not standard C++)
    double anotherArray[allocSize];
    for (size_t i = 0; i < allocSize; i++) {
      anotherArray[i] = i;
    }
    // copy to member variable to prevent them from being optimized away
    for (size_t i = 0; i < kArraySize; i++) {
      _doubleArray[i] += anotherArray[i];
    }
    // the next call allocates kArraySize more elements
    _multiplier++;
  }

 private:
  static constexpr size_t kArraySize = 40;
  double _doubleArray[kArraySize] = {0};
  size_t _multiplier = 1;
};

}  // namespace multi_tasking
If you call the MemoryLogger::printDiffs() method at regular intervals, you will observe that the maximum number of bytes used on the stack of the thread using the MemoryStackOverflow instance increases continuously. Once the stack overflow happens, you may experience different types of errors, including an application crash or erratic application behavior. The reason is that the stack gets corrupted and that no stack corruption protection is implemented in the application.
For improving stack corruption check, you may modify the Mbed OS configuration in the “mbed_app.json” file as follows:
"macros": [
...
"RTX_STACK_CHECK=1"
],
If you recompile your application and run with RTX_STACK_CHECK=1, then you
should get the following error on the console:
Error log
++ MbedOS Error Info ++
Error Status: 0x80020125 Code: 293 Module: 2
Error Message: CMSIS-RTOS error: Stack overflow
Location: 0x8014291
File: mbed_rtx_handlers.c+60
Error Value: 0x1
Current Thread: rtx_idle Id: 0x24003670 Entry: 0x8014411 StackSize: 0x380 StackMem: 0x24003740 SP: 0x2407FF1C
Next:
rtx_idle State: 0x2 Entry: 0x08014411 Stack Size: 0x00000380 Mem: 0x24003740 SP: 0x24003A70
Ready:
Wait:
rtx_timer State: 0x83 Entry: 0x080150E9 Stack Size: 0x00000300 Mem: 0x24003AC0 SP: 0x24003D60
Delay:
main State: 0x43 Entry: 0x080135E9 Stack Size: 0x00002000 Mem: 0x24000D68 SP: 0x24002410
For more info, visit: https://mbed.com/s/error?error=0x80020125&osver=61700&core=0x411FC271&comp=1&ver=6160001&tgt=DISCO_H747I
-- MbedOS Error Info --
Unfortunately, the error log does not always indicate a stack overflow. There are situations where the RTX stack check mechanism is not able to detect stack corruption, in which case the application ultimately crashes with a generic fault exception.
Exercise Memory Profiling and Optimization/4
Try to figure out how and where the stack overflow detection is implemented in the RTX OS implementation.
Exercise Memory Profiling and Optimization/5
Find and implement another very common way of creating a stack overflow in your BikeComputer program.