Memory Profiling and Optimization
Introduction
Cortex-M Program Image and Memory
In this codelab, we first examine the structure of an Mbed OS Cortex-M program image and the Mbed OS memory model. We then investigate different memory profiling techniques for Mbed OS programs and illustrate some classical issues that arise with static or dynamic memory usage.
What you’ll build
- You will make modifications in your BikeComputer program for a better understanding of its program image and memory map.
- You will instrument your BikeComputer program for performing dynamic memory analysis.
- You will modify the BikeComputer program for creating memory issues on purpose.
What you’ll learn
- You will understand how a Cortex-M program image is made and how it is used for starting your program on a Cortex-M device.
- You will understand the boot sequence of an Mbed OS program and how the program memory is configured and initialized.
- You will understand how the memory of an Mbed OS program is organized in RAM and how to trace dynamic memory allocations.
What you’ll need
- Mbed Studio for developing and debugging your program in C++.
- All BikeComputer codelabs and the multi-tasking codelabs are prerequisites for this codelab.
The Program Image
A Cortex-M program image or executable file (e.g. the .elf file on your computer) refers to a piece of code that is ready to execute. The image can occupy up to 512 MiB of memory space, ranging from address 0x00000000 to address 0x1FFFFFFF, as shown in Figure 1 for the Cortex-M7 architecture. The code memory map of the STM32H747 MCU is shown in Figure 2.
The program image is usually stored in non-volatile memory such as on-chip Flash memory and it is normally separated from the program data, which is allocated in the SRAM or data region of the code memory space.
For building the program image, the linker uses a scatter file that defines the different memory regions. The scatter file of your target device (the “stm32h747xI_CM7.sct” file) is shown below. The file is fairly readable: you can, for instance, recognize the definitions of the ROM and RAM regions:
STM32H747I scatter file
#! armclang -E --target=arm-arm-none-eabi -x c -mcpu=cortex-m7
; Scatter-Loading Description File
;
; SPDX-License-Identifier: BSD-3-Clause
;******************************************************************************
;* @attention
;*
;* Copyright (c) 2016-2020 STMicroelectronics.
;* All rights reserved.
;*
;* This software component is licensed by ST under BSD 3-Clause license,
;* the "License"; You may not use this file except in compliance with the
;* License. You may obtain a copy of the License at:
;* opensource.org/licenses/BSD-3-Clause
;*
;******************************************************************************
#include "../cmsis_nvic.h"
#if !defined(MBED_APP_START)
#define MBED_APP_START MBED_ROM_START
#endif
#if !defined(MBED_APP_SIZE)
#define MBED_APP_SIZE MBED_ROM_SIZE
#endif
#if !defined(MBED_CONF_TARGET_BOOT_STACK_SIZE)
/* This value is normally defined by the tools to 0x1000 for bare metal and 0x400 for RTOS */
#if defined(MBED_BOOT_STACK_SIZE)
#define MBED_CONF_TARGET_BOOT_STACK_SIZE MBED_BOOT_STACK_SIZE
#else
#define MBED_CONF_TARGET_BOOT_STACK_SIZE 0x400
#endif
#endif
/* Round up VECTORS_SIZE to 8 bytes */
#define VECTORS_SIZE (((NVIC_NUM_VECTORS * 4) + 7) AND ~7)
LR_IROM1 MBED_APP_START MBED_APP_SIZE {
  ER_IROM1 MBED_APP_START MBED_APP_SIZE {
    *.o (RESET, +First)
    *(InRoot$$Sections)
    .ANY (+RO)
  }
  RW_IRAM1 (MBED_RAM_START) { ; RW data
    .ANY (+RW +ZI)
  }
  ARM_LIB_HEAP AlignExpr(+0, 16) EMPTY (MBED_RAM_START + MBED_RAM_SIZE - MBED_CONF_TARGET_BOOT_STACK_SIZE - AlignExpr(ImageLimit(RW_IRAM1), 16)) { ; Heap growing up
  }
  ARM_LIB_STACK (MBED_RAM_START + MBED_RAM_SIZE) EMPTY -MBED_CONF_TARGET_BOOT_STACK_SIZE { ; Stack region growing down
  }
  RW_DMARxDscrTab 0x30040000 0x60 {
    *(.RxDecripSection)
  }
  RW_DMATxDscrTab 0x30040100 0x140 {
    *(.TxDecripSection)
  }
  RW_Rx_Buffb 0x30040400 0x1800 {
    *(.RxArraySection)
  }
  RW_Eth_Ram 0x30044000 0x4000 {
    *(.ethusbram)
  }
}
Based on this information, the linker produces a program image (CODE region) that corresponds to the map depicted in the figure above. This program image can be better understood by analyzing the ELF file produced by the linker. A number of tools exist for analyzing ELF files, and using them is beyond the scope of this codelab. It is however useful to give more details about the structure of the program image using the output of one of these tools (more specifically the Keil fromelf program, as documented on Keil FromElf). The full file showing an example of program image analysis for the BikeComputer program is shown here.
From this file, we can observe that:
- The Flash memory section containing the program code (denoted “ER_IROM1”) starts at address 0x0800_0000 and has a size of 338400 bytes. The “SHF_EXECINSTR” attribute means that this section contains executable machine instructions. We can check that this corresponds to the Flash memory bank 1 section (part of the Code section) of the target device as shown in Figure 2.
** Section #1 'ER_IROM1' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 338400 bytes (alignment 8)
Address: 0x08000000
- The code region starts with the Vector table and the first entry in the Vector table corresponds to the main stack pointer. As explained in full detail below, the first thing that the ARM processor does upon starting is to fetch whatever is at the address 0x0000_0000 (or 0x0800_0000 for this target) and use it as the initial Stack Pointer value. In our case, it corresponds to the address 0x2408_0000, which is defined as the pointer to the stack end (recall that the stack grows downwards, with a size of 1024 bytes in this case):
** Section #1 'ER_IROM1' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 338400 bytes (alignment 8)
Address: 0x08000000
$d.realdata
RESET
__Vectors
0x08000000: 24080000 ...$ DCD 604504064
...
** Section #5 'ARM_LIB_STACK' (SHT_NOBITS) [SHF_ALLOC + SHF_WRITE]
Size : 1024 bytes (alignment 4)
Address: 0x2407fc00
...
3091 Image$$ARM_LIB_STACK$$ZI$$Base
0x2407fc00 Gb Abs -- Hi
3092 Image$$ARM_LIB_STACK$$ZI$$Limit
0x24080000 Gb Abs -- Hi
- The next entry in the Vector table is the address of the Reset_Handler, which is the jump location for starting the program upon reset. For our BikeComputer program, the Reset_Handler resides at 0x0800_04a0. However, the entry at address 0x0800_0004 contains 0x0800_04a1 (0x08000004: 080004a1) instead of 0x0800_04a0. Bit 0 of a vector entry is not part of the address: a value of 1 at the LSB indicates that the target code uses the Thumb instruction set, which is the only one supported by Cortex-M processors. The processor thus treats the LSB as 0 for addressing, and 0x0800_04a1 causes a jump to 0x0800_04a0 (the address of Reset_Handler).
$d.realdata
RESET
__Vectors
0x08000000: 24080000 ...$ DCD 604504064
0x08000004: 080004a1 .... DCD 134218913
...
.text
$v0
Reset_Handler
0x080004a0: 4806 .H LDR r0,[pc,#24] ; [0x80004bc] = 0x8008841
0x080004a2: 4780 .G BLX r0
0x080004a4: 4806 .H LDR r0,[pc,#24] ; [0x80004c0] = 0x8000299
0x080004a6: 4700 .G BX r0
- As you can see from the ELF file analysis, the image contains a lot of code in addition to your ‘main’ code. This additional code includes the startup code, which the ARM architecture requires in order to execute the image.
The startup code and boot sequence are explained in more detail in the next section.
Exercise Memory Profiling and Optimization/1
For this exercise, you need to:
- Understand the memory map for the target device available in the reference manual, on pages 136-137.
- Open the image analysis document.
- Map the AXI SRAM region described in the reference manual to the sections described in the analysis document.
Solution
- Region “AXI SRAM” is used by different sections.
** Section #2 'RW_IRAM1' (SHT_PROGBITS) [SHF_ALLOC + SHF_WRITE]
Size : 92 bytes (alignment 4)
Address: 0x24000000
...
** Section #3 'RW_IRAM1' (SHT_NOBITS) [SHF_ALLOC + SHF_WRITE]
Size : 16860 bytes (alignment 8)
Address: 0x24000168
...
** Section #4 'ARM_LIB_HEAP' (SHT_NOBITS) [SHF_ALLOC + SHF_WRITE]
Size : 506032 bytes (alignment 4)
Address: 0x24004350
...
** Section #5 'ARM_LIB_STACK' (SHT_NOBITS) [SHF_ALLOC + SHF_WRITE]
Size : 1024 bytes (alignment 4)
Address: 0x2407fc00
The Boot Sequence and Memory Initialization
Upon reset, startup code is executed by the Cortex-M processor. The startup code is specific to each platform and toolchain, but it usually consists of
- setting the initial SP,
- setting the initial PC to the Reset_Handler value,
- setting the vector table entries with the exception ISR addresses, and
- branching to __main in the C library, which eventually calls the main() function of your program.
Note that after reset, the Cortex-M processor is in “Thread” mode, the privilege level is “Privileged”, and the stack in use is the Main stack.
Before the user main() function is executed, the __main startup function is executed at the start of the binary executable. This function calls other functions and is the real entry point of the user’s program. This __main function is pre-defined (though programmers can write their own __main) and it is different from the main() function in the user’s C program.
The __main startup function calls the __rt_entry function, which is defined in the “mbed_boot_arm_std.c” file (located in the cmsis/device/rtos/TOOLCHAIN_ARM_STD folder). This function initializes the stack and heap addresses, then initializes and starts Mbed OS, which ultimately calls your main() function.
Load Address vs Execution Address
The BikeComputer program written above contains application code and data constants. When the compiled version of application code and data is put into the memory of a microcontroller, we may differentiate between regions for which the load address is the same as the execution address, and those for which the addresses are different. The regions for which the addresses are different require relocation.
In a typical embedded system, all the program and data is stored in some non-volatile memory when the system is powered off. However, when the system is powered-on, some of the data or code may be moved into system SRAM (volatile memory), before it is executed (if code) or before it is used (if data).
As explained above, at link time, an image of the program is produced. This is the binary executable file which the system can execute. The binary image is typically divided into different segments that are either read-only (containing code and read-only data) or read-write regions (containing data, which can be initialized or zero initialized or uninitialized).
Usually the read-only segment is placed in non-volatile memory and does not have a requirement to be moved from where it is in the memory. We may say that it is executed from where it is, i.e. it is executed in place. To the contrary, the read-write segments must often be moved into the system’s fast read-write memory (e.g. SRAM) before the execution begins.
Hence for certain parts of the image, the memory location where that part resides when the system is powered off is the same when the system is powered on. But for certain parts of the image, the memory location where that part resides when the system is powered off is different to the memory location where that part is moved to when the system is powered on. So this code must be moved and relocated at startup. And in this case, we say that the load address and the execution address are different.
The linker will add the code into the program which the processor will execute for moving those parts of the code, which are required to be moved into the system’s SRAM at power-up. This relocation code is executed at startup.
In summary, the full sequence of the execution of the program may depend on the specific platform and toolchain, but it is always something like this:
- The Stack Pointer SP is loaded from whatever the contents of the memory are at 0x0000_0000 (0x0800_0000 for our target).
- The Program Counter of the processor is loaded with the location of Reset_Handler, which is stored at the memory location 0x0000_0004 (0x0800_0004 for our target).
- Reset_Handler is platform specific but it is mainly a jump to __main. On our target, the Reset_Handler function calls the SystemInit function (at address 0x08008840) and then the __main function (at address 0x08000298).
- __main first calls __scatterload. The role of the __scatterload function is the initialization of memory (__scatterload_null), the initialization of ZI (Zero Initialization) regions to 0 (__scatterload_zero_init) and the load of regions requiring relocation to execution addresses.
- __main then calls __rt_entry, as explained above.
- The main() function from the user application is then called. On Mbed OS, the main() function is executed in a thread called the main thread.
- __rt_lib_shutdown is called when the main() function exits, which usually never happens.
The startup and initialization steps are explained in full detail in the following document. Note that these steps refer to Cortex-M processors using the ARM toolchain and that they may be different when using another MCU or another toolchain.
Exercise Memory Profiling and Optimization/2
For this exercise, you need to:
- Read the document explaining the C library startup and understand where the __main function is called from the bootup sequence in the ELF file.
- Find the definition of the __rt_entry() function in the Mbed OS library.
- From the __rt_entry() code, understand the subsequent initialization steps, for instance how the heap is initialized.
- Understand where and how the call to your main() program function is made and how the stack for the main thread is set.
Solution
- The __rt_entry() function is defined in the “mbed-os\cmsis\device\rtos\TOOLCHAIN_ARM_STD\mbed_boot_arm_std.c” file.
- The __rt_entry() function initializes the stack and heap start pointers and sizes. It then calls mbed_init() and ultimately mbed_rtos_start().
- The mbed_rtos_start() function creates a thread named "main" that is launched by executing the mbed_start function. In the mbed_start function, the user main() function is ultimately called.
- The mbed_rtos_start() function ultimately calls the osKernelStart function that launches the scheduler.
Static Memory Analysis Using memap
For understanding how the program image is structured and how the memory space is used, Mbed OS provides a simple utility tool called memap that displays static memory information required by any Mbed OS application. This information is produced by analyzing the memory map file previously generated by your toolchain. Memap is automatically run at the end of each build operation and you can read the result in the Output window as shown below:
Memap output
| Module | .text | .data | .bss |
|----------------------------------------|-------------|---------|-----------|
...
| advdembsof_library\display | 165494(+0) | 0(+0) | 528(+0) |
| advdembsof_library\sensors | 735(+0) | 0(+0) | 0(+0) |
| advdembsof_library\utils | 1077(+0) | 24(+0) | 0(+0) |
...
| common\sensor_device.o | 86(+0) | 0(+0) | 0(+0) |
| common\speedometer.o | 698(+0) | 0(+0) | 0(+0) |
| disco_h747i\CM7 | 52(+0) | 0(+0) | 0(+0) |
| disco_h747i\Drivers | 4801(+0) | 0(+0) | 585(+0) |
| disco_h747i\Wrappers | 77320(+0) | 0(+0) | 517(+0) |
| main.o | 111(+0) | 0(+0) | 0(+0) |
...
| multi_tasking\bike_system.o | 2371(+0) | 0(+0) | 0(+0) |
| multi_tasking\gear_device.o | 774(+0) | 0(+0) | 0(+0) |
| multi_tasking\pedal_device.o | 776(+0) | 0(+0) | 0(+0) |
| multi_tasking\reset_device.o | 46(+0) | 0(+0) | 0(+0) |
| Subtotals | 334292(-54) | 357(+0) | 17836(+0) |
Total Static RAM memory (data + bss): 18193(+0) bytes
Total Flash memory (text + data): 334649(-54) bytes
In the output above, the meaning of the .text, .data and .bss sections is the following:
- .text: where the application code and constants are located in Flash.
- .data: nonzero initialized variables; allocated in both RAM and Flash memory (variables are copied from Flash to RAM at runtime).
- .bss: uninitialized data allocated in RAM, or variables initialized to zero.
Note that in this view, the numbers in parentheses (e.g. (-54)) indicate the changes in sizes (number of bytes) since the last build. This is very useful for understanding the impact of the code changes introduced since the last build.
For a better understanding of what memory goes where, add the following code to your “main.cpp” file:
...
const char szMsg[] = "This is a test message";
static constexpr uint8_t size = 10;
uint32_t randomArray[size] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
uint32_t randomNumber = 0;
...
int main() {
    ...
    tr_info(szMsg);
    for (uint8_t i = 0; i < size; i++) {
        randomArray[i] = rand();
        tr_info("This is a random number %d", randomArray[i]);
    }
    randomNumber = rand();
    tr_info("This is a random number %d", randomNumber);
    ...
}
Exercise Memory Profiling and Optimization/3
Observe the change in the memory map for each individual change documented above, for the “main.o” object file. Look at how each .text, .data and .bss section is modified for each change and give an explanation.
Solution
- szMsg: the size of the .text section grows by 44 bytes. Both the additional call to tr_info and szMsg are allocated in the code section.
- randomArray: the sizes of the .text/.data sections grow by 81/40 bytes. The additional code goes into the .text region, while randomArray goes into the .data section (10 x uint32_t = 40 bytes) (nonzero initialized variables).
- randomNumber: the sizes of the .text/.bss sections grow by 61/4 bytes. The additional code goes into the .text region, while randomNumber goes into the .bss section (1 x uint32_t = 4 bytes) (zero initialized variables).
For observing the changes in the program image, we may perform an analysis of the “bike_computer.elf” file. If we do so, we can observe the following:
- The constant string szMsg is stored in the constant data section of the program image:
address     size  variable name  type
0x0803fee4  0x17  szMsg          array[23] of const char

_ZL5szMsg
0x0803fee4: 73696854 This DCD 1936287828
0x0803fee8: 20736920  is  DCD 544434464
0x0803feec: 65742061 a te DCD 1702109281
0x0803fef0: 6d207473 st m DCD 1830843507
0x0803fef4: 61737365 essa DCD 1634956133
0x0803fef8: 6567     ge   DCW 25959
0x0803fefa: 00       .    DCB 0
- The random integer array randomArray is allocated in the initialized static data section (RAM section):
address     size  variable name  type
0x24000160  0x28  randomArray    array[10] of uint32_t
The variable randomArray is initialized at startup with the values defined in the “main.cpp” file. This is done in the __scatterload function that goes through the region table and initializes the various execution-time regions. As already mentioned, this function initializes the Zero Initialized (ZI) regions to zero and copies or decompresses the non-root code and data regions from their load-time locations to the execute-time regions.
- The global variable randomNumber is allocated in the .bss section:
address     size  variable name  type
0x24003d88  0x4   randomNumber   uint32_t
Run the memory map analyzer
You can also run the memory map analyzer at any time by running the command “python <MbedStudioInstallDir>\library-pipeline\mbed-os\tools\memap.py” with the appropriate set of parameters, where <MbedStudioInstallDir> is the installation directory of Mbed Studio. When running the analyzer separately, you may choose the directory depth level for displaying the memory analysis report (by default 2).
Another more interactive way of displaying the memory map information is available through Linker-Report. This tool allows you to display the memory map information in a visual and interactive way as demonstrated on interactive memory map. Install the utility as documented on Linker-Report and build an interactive memory map of your BikeComputer program. It should look like the BikeComputer Interactive Map.
Reducing Memory Usage by Tuning the Mbed OS Configuration
Both flash memory and RAM sizes are limited on most microcontrollers. Reducing the memory footprint of an application can help you squeeze in more features or reduce cost. This can be done by replacing standard I/O calls with a smaller implementation.
For the printf function, and in particular if you are using a tracing library with precompiler options, the easiest way of reducing the size of the binary is to exclude all printf calls in a release build. But while debugging an application, logging is an essential feature. In this case, switching to versions of the stdio libraries with a reduced footprint is a good alternative. You can do this by changing the printf library used by your application in the “mbed_app.json” file.
For selecting the standard printf library:
"target_overrides": {
    "*": {
        "target.printf_lib": "std"
    }
}
For selecting the minimal-printf library:
"target_overrides": {
    "*": {
        "target.printf_lib": "minimal-printf"
    }
}
With the minimal-printf library, you can further reduce the footprint by enabling floating-point support in printf only when necessary:
"target_overrides": {
    "*": {
        "platform.minimal-printf-enable-floating-point": true,
        "platform.minimal-printf-set-floating-point-max-decimals": 6
    }
}
The minimal-printf library supports both printf and sprintf in 1252 bytes of flash. An interesting comparison of the size of the blinky program compiled with different options is available on minimal-printf.
The memory usage of an application can be further optimized by tuning the Mbed OS configuration to the needs of a specific application. If an application doesn’t need all the features of Mbed OS, the memory usage can be reduced by reducing the number of tasks, by decreasing the thread stack sizes or by disabling user timers. The Mbed OS configuration parameters can be modified in the “mbed_app.json” file. The parameters available for configuration can be listed with the “mbed compile --config -t ARMC6” command. Note that you can also run the command for the “GCC_ARM” toolchain. If you run this command, you should get an output similar to
Available configuration parameters
rtos-api.present = 1 (macro name: "MBED_CONF_RTOS_API_PRESENT")
rtos.evflags-num = 0 (macro name: "MBED_CONF_RTOS_EVFLAGS_NUM")
rtos.idle-thread-stack-size = 512 (macro name: "MBED_CONF_RTOS_IDLE_THREAD_STACK_SIZE")
rtos.idle-thread-stack-size-debug-extra = 128 (macro name: "MBED_CONF_RTOS_IDLE_THREAD_STACK_SIZE_DEBUG_EXTRA")
rtos.idle-thread-stack-size-tickless-extra = 256 (macro name: "MBED_CONF_RTOS_IDLE_THREAD_STACK_SIZE_TICKLESS_EXTRA")
rtos.main-thread-stack-size = 4096 (macro name: "MBED_CONF_RTOS_MAIN_THREAD_STACK_SIZE")
rtos.msgqueue-data-size = 0 (macro name: "MBED_CONF_RTOS_MSGQUEUE_DATA_SIZE")
rtos.msgqueue-num = 0 (macro name: "MBED_CONF_RTOS_MSGQUEUE_NUM")
rtos.mutex-num = 0 (macro name: "MBED_CONF_RTOS_MUTEX_NUM")
rtos.present = 1 (macro name: "MBED_CONF_RTOS_PRESENT")
rtos.semaphore-num = 0 (macro name: "MBED_CONF_RTOS_SEMAPHORE_NUM")
rtos.thread-num = 0 (macro name: "MBED_CONF_RTOS_THREAD_NUM")
rtos.thread-stack-size = 4096 (macro name: "MBED_CONF_RTOS_THREAD_STACK_SIZE")
rtos.thread-user-stack-size = 0 (macro name: "MBED_CONF_RTOS_THREAD_USER_STACK_SIZE")
rtos.timer-num = 0 (macro name: "MBED_CONF_RTOS_TIMER_NUM")
rtos.timer-thread-stack-size = 768 (macro name: "MBED_CONF_RTOS_TIMER_THREAD_STACK_SIZE")
All macros displayed above can be modified in the “mbed_app.json” for an optimized use of the Mbed OS configuration for a specific application. One may for instance reduce the user or main stack size.
Exercise Memory Profiling and Optimization/4
Compile your application with the standard printf library and with the minimal-printf library, and compare the sizes of the resulting applications.
Solution
You should observe a reduction of the .text region of approximately 3000-4000 bytes when compiling with minimal-printf rather than std. Other sections are not impacted.
Runtime Memory Tracing
Static memory analysis is a powerful and necessary tool for analyzing how the program memory is organized at compile time. However, it is also very useful to analyze how embedded software deals with dynamic memory allocations, both for the heap and stack memory. A program that behaves poorly in terms of dynamic memory allocations will become unstable and will potentially crash.
With Mbed OS, the developer can use memory statistics functions to capture heap use, cumulative stack use or stack use for each thread at runtime. To enable memory use monitoring, you must enable the following Mbed OS configuration options:
{
    "target_overrides": {
        "*": {
            "platform.heap-stats-enabled": true,
            "platform.stack-stats-enabled": true
        }
    }
}
Alternatively, you may also enable all Mbed OS stats at once:
{
    "target_overrides": {
        "*": {
            "platform.all-stats-enabled": true
        }
    }
}
Once you enable memory statistics, you may instrument the code and do memory checks at regular intervals or upon request. This can be implemented with the help of the MemoryLogger class provided with the “advembsof” library.
MemoryLogger declaration
// Copyright 2022 Haute école d'ingénierie et d'architecture de Fribourg
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/****************************************************************************
* @file memory_logger.hpp
* @author Serge Ayer <serge.ayer@hefr.ch>
*
* @brief Memory logger header file
*
* @date 2023-08-20
* @version 1.0.0
***************************************************************************/
#pragma once
#include "mbed.h"
namespace advembsof {
#if defined(MBED_ALL_STATS_ENABLED)
class MemoryLogger {
   public:
    // methods used by owners
    void getAndPrintStatistics();
    void printDiffs();
    void printRuntimeMemoryMap();
    void getAndPrintHeapStatistics();
    void getAndPrintStackStatistics();
    void getAndPrintThreadStatistics();

   private:
    // data members
    static constexpr uint8_t kMaxThreadInfo = 10;
    mbed_stats_heap_t _heapInfo = {0};
    mbed_stats_stack_t _stackInfo[kMaxThreadInfo] = {0};
    mbed_stats_stack_t _globalStackInfo = {0};
    mbed_stats_thread_t _threadInfo[kMaxThreadInfo] = {0};
};
#endif // MBED_ALL_STATS_ENABLED
} // namespace advembsof
MemoryLogger implementation
// Copyright 2022 Haute école d'ingénierie et d'architecture de Fribourg
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/****************************************************************************
* @file memory_logger.cpp
* @author Serge Ayer <serge.ayer@hefr.ch>
*
* @brief Memory logger implementation
*
* @date 2023-08-20
* @version 1.0.0
***************************************************************************/
#include "memory_logger.hpp"
#include "mbed_trace.h"
#if MBED_CONF_MBED_TRACE_ENABLE
#define TRACE_GROUP "MemoryLogger"
#endif // MBED_CONF_MBED_TRACE_ENABLE
#if defined(MBED_ALL_STATS_ENABLED)
extern unsigned char* mbed_stack_isr_start;
extern uint32_t mbed_stack_isr_size;
extern unsigned char* mbed_heap_start;
extern uint32_t mbed_heap_size;
#endif // MBED_ALL_STATS_ENABLED
namespace advembsof {
#if defined(MBED_ALL_STATS_ENABLED)
void MemoryLogger::printDiffs() {
    // All counters are unsigned: compare before subtracting, so that a decrease
    // (e.g. after freeing memory) is not reported as a huge increase.
    {
        tr_debug("MemoryStats (Heap):");
        mbed_stats_heap_t heapInfo = {0};
        mbed_stats_heap_get(&heapInfo);
        if (heapInfo.current_size > _heapInfo.current_size) {
            uint32_t currentSizeDiff = heapInfo.current_size - _heapInfo.current_size;
            tr_debug("\tBytes allocated increased by %" PRIu32 " to %" PRIu32 " bytes",
                     currentSizeDiff,
                     heapInfo.current_size);
        }
        if (heapInfo.max_size > _heapInfo.max_size) {
            uint32_t maxSizeDiff = heapInfo.max_size - _heapInfo.max_size;
            tr_debug("\tMax bytes allocated at a given time increased by %" PRIu32
                     " to %" PRIu32 " bytes (max heap size is %" PRIu32 " bytes)",
                     maxSizeDiff,
                     heapInfo.max_size,
                     heapInfo.reserved_size);
        }
        _heapInfo = heapInfo;
    }
    {
        mbed_stats_stack_t globalStackInfo = {0};
        mbed_stats_stack_get(&globalStackInfo);
        tr_debug("Cumulative Stack Info:");
        if (globalStackInfo.max_size > _globalStackInfo.max_size) {
            uint32_t maxSizeDiff = globalStackInfo.max_size - _globalStackInfo.max_size;
            tr_debug("\tMaximum number of bytes used on the stack increased by %" PRIu32
                     " to %" PRIu32 " bytes (stack size is %" PRIu32 " bytes)",
                     maxSizeDiff,
                     globalStackInfo.max_size,
                     globalStackInfo.reserved_size);
        }
        if (globalStackInfo.stack_cnt > _globalStackInfo.stack_cnt) {
            uint32_t stackCntDiff = globalStackInfo.stack_cnt - _globalStackInfo.stack_cnt;
            tr_debug("\tNumber of stacks stats accumulated increased by %" PRIu32
                     " to %" PRIu32 "",
                     stackCntDiff,
                     globalStackInfo.stack_cnt);
        }
        _globalStackInfo = globalStackInfo;

        mbed_stats_stack_t stackInfo[kMaxThreadInfo] = {0};
        mbed_stats_stack_get_each(stackInfo, kMaxThreadInfo);
        tr_debug("Thread Stack Info:");
        for (uint32_t i = 0; i < kMaxThreadInfo; i++) {
            if (stackInfo[i].thread_id != 0) {
                // look for the previous statistics of the same thread
                for (uint32_t j = 0; j < kMaxThreadInfo; j++) {
                    if (stackInfo[i].thread_id == _stackInfo[j].thread_id) {
                        if (stackInfo[i].max_size > _stackInfo[j].max_size) {
                            uint32_t maxSizeDiff =
                                stackInfo[i].max_size - _stackInfo[j].max_size;
                            tr_debug("\tThread: %" PRIu32 "", j);
                            tr_debug(
                                "\t\tThread Id: 0x%08" PRIx32 " with name %s",
                                _stackInfo[j].thread_id,
                                osThreadGetName((osThreadId_t)_stackInfo[j].thread_id));
                            tr_debug(
                                "\t\tMaximum number of bytes used on the stack increased "
                                "by %" PRIu32 " to %" PRIu32
                                " bytes (stack size is %" PRIu32 " bytes)",
                                maxSizeDiff,
                                stackInfo[i].max_size,
                                stackInfo[i].reserved_size);
                        }
                        _stackInfo[j] = stackInfo[i];
                    }
                }
            }
        }
    }
}
void MemoryLogger::getAndPrintHeapStatistics() {
    tr_debug("MemoryStats (Heap):");
    mbed_stats_heap_get(&_heapInfo);
    tr_debug("\tBytes allocated currently: %" PRIu32 "", _heapInfo.current_size);
    tr_debug("\tMax bytes allocated at a given time: %" PRIu32 "", _heapInfo.max_size);
    tr_debug("\tCumulative sum of bytes ever allocated: %" PRIu32 "",
             _heapInfo.total_size);
    tr_debug("\tCurrent number of bytes allocated for the heap: %" PRIu32 "",
             _heapInfo.reserved_size);
    tr_debug("\tCurrent number of allocations: %" PRIu32 "", _heapInfo.alloc_cnt);
    tr_debug("\tNumber of failed allocations: %" PRIu32 "", _heapInfo.alloc_fail_cnt);
}
void MemoryLogger::getAndPrintStackStatistics() {
    mbed_stats_stack_get(&_globalStackInfo);
    tr_debug("Cumulative Stack Info:");
    tr_debug("\tMaximum number of bytes used on the stack: %" PRIu32 "",
             _globalStackInfo.max_size);
    tr_debug("\tCurrent number of bytes allocated for the stack: %" PRIu32 "",
             _globalStackInfo.reserved_size);
    tr_debug("\tNumber of stacks stats accumulated in the structure: %" PRIu32 "",
             _globalStackInfo.stack_cnt);
    mbed_stats_stack_get_each(_stackInfo, kMaxThreadInfo);
    tr_debug("Thread Stack Info:");
    for (uint32_t i = 0; i < kMaxThreadInfo; i++) {
        if (_stackInfo[i].thread_id != 0) {
            tr_debug("\tThread: %" PRIu32 "", i);
            tr_debug("\t\tThread Id: 0x%08" PRIx32 " with name %s",
                     _stackInfo[i].thread_id,
                     osThreadGetName((osThreadId_t)_stackInfo[i].thread_id));
            tr_debug("\t\tMaximum number of bytes used on the stack: %" PRIu32 "",
                     _stackInfo[i].max_size);
            tr_debug("\t\tCurrent number of bytes allocated for the stack: %" PRIu32 "",
                     _stackInfo[i].reserved_size);
            tr_debug("\t\tNumber of stacks stats accumulated in the structure: %" PRIu32
                     "",
                     _stackInfo[i].stack_cnt);
        }
    }
}
void MemoryLogger::getAndPrintThreadStatistics() {
    // names indexed by the base thread state (0: inactive ... 4: terminated)
    static const char* state[] = {"Inactive", "Ready", "Running", "Waiting", "Terminated"};
    static constexpr uint32_t kNbrOfStates = sizeof(state) / sizeof(state[0]);
    mbed_stats_thread_get_each(_threadInfo, kMaxThreadInfo);
    tr_debug("Thread Info:");
    for (uint32_t i = 0; i < kMaxThreadInfo; i++) {
        if (_threadInfo[i].id != 0) {
            // keep only the base state and never index outside of the state array
            uint32_t stateIndex = _threadInfo[i].state & osRtxThreadStateMask;
            const char* szState = (stateIndex < kNbrOfStates) ? state[stateIndex] : "Unknown";
            tr_debug("\tThread: %" PRIu32 "", i);
            tr_debug("\t\tThread Id: 0x%08" PRIx32 " with name %s, state %s, priority %" PRIu32 "",
                     _threadInfo[i].id,
                     _threadInfo[i].name,
                     szState,
                     _threadInfo[i].priority);
            tr_debug("\t\tStack size %" PRIu32 " (free bytes remaining %" PRIu32 ")",
                     _threadInfo[i].stack_size,
                     _threadInfo[i].stack_space);
        }
    }
}
void MemoryLogger::getAndPrintStatistics() {
    getAndPrintHeapStatistics();
    getAndPrintStackStatistics();
    getAndPrintThreadStatistics();
}
void MemoryLogger::printRuntimeMemoryMap() {
// defined in rtx_thread.c
// uint32_t osThreadEnumerate (osThreadId_t *thread_array, uint32_t array_items)
tr_debug("Runtime Memory Map:");
osThreadId_t threadIdArray[kMaxThreadInfo] = {0};
uint32_t nbrOfThreads = osThreadEnumerate(threadIdArray, kMaxThreadInfo);
for (uint32_t threadIndex = 0; threadIndex < nbrOfThreads; threadIndex++) {
osRtxThread_t* pThreadCB =
// cppcheck-suppress cstyleCast
(osRtxThread_t*)threadIdArray[threadIndex]; // NOLINT(readability/casting)
uint8_t state = pThreadCB->state & osRtxThreadStateMask;
// thread states are enumerated values (not bit flags), so compare with ==
const char* szThreadState = (state == osThreadInactive) ? "Inactive"
: (state == osThreadReady) ? "Ready"
: (state == osThreadRunning) ? "Running"
: (state == osThreadBlocked) ? "Blocked"
: (state == osThreadTerminated) ? "Terminated"
: "Unknown";
tr_debug("\t thread with name %s, stack_start: %p, stack_end: %p, size: %" PRIu32
", priority: %" PRIu8 ", state: %s",
pThreadCB->name,
pThreadCB->stack_mem,
// cppcheck-suppress cstyleCast
(char*)pThreadCB->stack_mem + // NOLINT(readability/casting)
pThreadCB->stack_size,
pThreadCB->stack_size,
pThreadCB->priority,
szThreadState);
}
tr_debug("\t mbed_heap_start: %p, mbed_heap_end: %p, size: %" PRIu32 "",
mbed_heap_start,
(mbed_heap_start + mbed_heap_size),
mbed_heap_size);
tr_debug("\t mbed_stack_isr_start: %p, mbed_stack_isr_end: %p, size: %" PRIu32 "",
mbed_stack_isr_start,
(mbed_stack_isr_start + mbed_stack_isr_size),
mbed_stack_isr_size);
}
#endif // MBED_ALL_STATS_ENABLED
} // namespace advembsof
The BikeComputer class can create a MemoryLogger attribute for logging the memory state of the program. It can call the MemoryLogger::getAndPrintStatistics() method in the BikeSystem::start() method and then call the MemoryLogger::printDiffs() method at regular intervals. By doing so, one should get the following output from the memory logger at startup:
Memory logger: getAndPrintStatistics
[DBG ][MemoryLogger]: MemoryStats (Heap):
[DBG ][MemoryLogger]: Bytes allocated currently: 8724
[DBG ][MemoryLogger]: Max bytes allocated at a given time: 8724
[DBG ][MemoryLogger]: Cumulative sum of bytes ever allocated: 8724
[DBG ][MemoryLogger]: Current number of bytes allocated for the heap: 505772
[DBG ][MemoryLogger]: Current number of allocations: 11
[DBG ][MemoryLogger]: Number of failed allocations: 0
[DBG ][MemoryLogger]: Cumulative Stack Info:
[DBG ][MemoryLogger]: Maximum number of bytes used on the stack: 3328
[DBG ][MemoryLogger]: Current number of bytes allocated for the stack: 13952
[DBG ][MemoryLogger]: Number of stacks stats accumulated in the structure: 4
[DBG ][MemoryLogger]: Thread Stack Info:
[DBG ][MemoryLogger]: Thread: 0
[DBG ][MemoryLogger]: Thread Id: 0x240036b8 with name main
[DBG ][MemoryLogger]: Maximum number of bytes used on the stack: 2648
[DBG ][MemoryLogger]: Current number of bytes allocated for the stack: 8192
[DBG ][MemoryLogger]: Number of stacks stats accumulated in the structure: 1
[DBG ][MemoryLogger]: Thread: 1
[DBG ][MemoryLogger]: Thread Id: 0x24003630 with name rtx_idle
[DBG ][MemoryLogger]: Maximum number of bytes used on the stack: 320
[DBG ][MemoryLogger]: Current number of bytes allocated for the stack: 896
[DBG ][MemoryLogger]: Number of stacks stats accumulated in the structure: 1
[DBG ][MemoryLogger]: Thread: 2
[DBG ][MemoryLogger]: Thread Id: 0x24003674 with name rtx_timer
[DBG ][MemoryLogger]: Maximum number of bytes used on the stack: 96
[DBG ][MemoryLogger]: Current number of bytes allocated for the stack: 768
[DBG ][MemoryLogger]: Number of stacks stats accumulated in the structure: 1
[DBG ][MemoryLogger]: Thread: 3
[DBG ][MemoryLogger]: Thread Id: 0x240026c0 with name deferredISRThread
[DBG ][MemoryLogger]: Maximum number of bytes used on the stack: 264
[DBG ][MemoryLogger]: Current number of bytes allocated for the stack: 4096
[DBG ][MemoryLogger]: Number of stacks stats accumulated in the structure: 1
...
and the following output when printing memory changes:
MemoryLogger: printDiffs
[DBG ][MemoryLogger]: MemoryStats (Heap):
[DBG ][MemoryLogger]: Bytes allocated increased by 16 to 8740 bytes
[DBG ][MemoryLogger]: Max bytes allocated at a given time increased by 40 to 8764 bytes (max heap size is 505772 bytes)
[DBG ][MemoryLogger]: Cumulative Stack Info:
[DBG ][MemoryLogger]: Maximum number of bytes used on the stack increased by 280 to 3608 bytes (stack size is 13952 bytes)
[DBG ][MemoryLogger]: Thread Stack Info:
[DBG ][MemoryLogger]: Thread: 0
[DBG ][MemoryLogger]: Thread Id: 0x240036b8 with name main
[DBG ][MemoryLogger]: Maximum number of bytes used on the stack increased by 280 to 2928 bytes (stack size is 8192 bytes)
By performing a detailed dynamic memory analysis, it is then possible to optimize some parameters, such as reducing the allocated stack size for a given thread or optimizing the use of the heap.
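As an illustration of how these statistics can guide such an optimization, the sketch below computes the peak stack usage ratio and a candidate stack size from the logged values. The StackUsage structure, the function names and the choice of safety margin are assumptions made for this example and are not part of Mbed OS; on the target, the inputs would come from the max_size and reserved_size fields reported by the memory logger.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not part of Mbed OS): holds the two values
// reported per thread by the memory logger.
struct StackUsage {
    uint32_t maxSize;       // maximum number of bytes used on the stack
    uint32_t reservedSize;  // number of bytes allocated for the stack
};

// Returns the peak stack usage in percent of the reserved size (rounded down).
inline uint32_t stackUsagePercent(const StackUsage& usage) {
    assert(usage.reservedSize > 0);
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(usage.maxSize) * 100U) / usage.reservedSize);
}

// Suggests a stack size: observed peak plus a safety margin given in percent,
// rounded up to a multiple of 8 bytes.
inline uint32_t suggestedStackSize(const StackUsage& usage, uint32_t marginPercent) {
    const uint64_t withMargin =
        static_cast<uint64_t>(usage.maxSize) * (100U + marginPercent) / 100U;
    return static_cast<uint32_t>((withMargin + 7U) & ~static_cast<uint64_t>(7U));
}
```

With the values logged above for the main thread (2928 bytes used out of 8192), stackUsagePercent() returns 35 and suggestedStackSize() with a 50% margin returns 4392. Such a number is only a starting point: the observed maximum depends on the code paths exercised so far.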
One further possibility for getting runtime memory information and logging the memory location of the heap and of the stack of each thread is to use the RTX API directly. The code in the MemoryLogger::printRuntimeMemoryMap() method shows how to log some additional runtime memory information. This static method uses the Thread Control Block (or TCB) structure osRtxThread_t defined in “rtx_os.h”. The TCB structure stores all information about a thread that is used by the OS for switching the context from one thread to another. If you execute this method from your BikeComputer program, you should observe the following output on the console:
MemoryLogger: printRuntimeMemoryMap
[DBG ][MemoryLogger]: Runtime Memory Map:
[DBG ][MemoryLogger]: thread with name main, stack_start: 0x24000D28, stack_end: 0x24002D28, size: 8192, priority: 24, state: Running
[DBG ][MemoryLogger]: thread with name rtx_idle, stack_start: 0x24003700, stack_end: 0x24003A80, size: 896, priority: 1, state: Ready
[DBG ][MemoryLogger]: thread with name rtx_timer, stack_start: 0x24003A80, stack_end: 0x24003D80, size: 768, priority: 40, state: Ready
[DBG ][MemoryLogger]: thread with name deferredISRThread, stack_start: 0x24005730, stack_end: 0x24006730, size: 4096, priority: 24, state: Ready
[DBG ][MemoryLogger]: mbed_heap_start: 0x24004454, mbed_heap_end: 0x2407FC00, size: 505772
[DBG ][MemoryLogger]: mbed_stack_isr_start: 0x2407FC00, mbed_stack_isr_end: 0x24080000, size: 1024
Note that the method also prints the priority and state of each thread. You can observe that the logging is executed from the main thread, which is the running thread.
Exercise Memory Profiling and Optimization/5
Instrument the dynamic memory usage of your BikeComputer program with the use of the MemoryLogger class. Use both the MemoryLogger::getAndPrintStatistics() method at startup and the MemoryLogger::printDiffs() method at regular intervals.
After startup, you should observe that your program does not allocate any more memory on the heap and that the stack usage no longer grows.
By observing the statistics on the console, you should understand how the displayed values match your BikeComputer implementation (including the Mbed OS configuration such as the stack size of the different threads).
Hunting For Memory Bugs
Detecting a Heap Allocation Error (Memory Leak)
To illustrate the analysis of heap memory, a practical example is to introduce a memory leak in the code. A memory leak occurs when memory that is no longer needed is not released. For this purpose, you may add a call that allocates memory without releasing it in a method called at regular intervals. Be aware that allocating memory without using it is not enough, since the compiler will optimize your code and remove unused statements (like allocating an array and only assigning values to the array elements).
If you create a memory leak by creating an instance of the class MemoryLeak below in one of the task methods of your BikeComputer program and let your program run, you should observe that the allocated memory on the heap grows constantly. Ultimately, you should observe a crash as illustrated in the log below:
MemoryLeak class
#pragma once
#include "mbed.h"
namespace multi_tasking {
class MemoryLeak {
public:
static constexpr uint16_t kArraySize = 1024;
// create a memory leak in the constructor itself
MemoryLeak() { _ptr = new int[kArraySize]; }
void use() {
for (uint16_t i = 0; i < kArraySize; i++) {
_ptr[i] = i;
}
}
private:
int* _ptr;
};
} // namespace multi_tasking
Console
++ MbedOS Error Info ++
Error Status: 0x8001011F Code: 287 Module: 1
Error Message: Operator new[] out of memory
Location: 0x800F025
File: mbed_retarget.cpp+1848
Error Value: 0x5000
Current Thread: main Id: 0x240035B0 Entry: 0x8013581 StackSize: 0x2000 StackMem: 0x24000C20 SP: 0x240022E4
Next:
main State: 0x2 Entry: 0x08013581 Stack Size: 0x00002000 Mem: 0x24000C20 SP: 0x240022C8
Ready:
rtx_idle State: 0x1 Entry: 0x080143A9 Stack Size: 0x00000380 Mem: 0x240035F8 SP: 0x24003928
Wait:
rtx_timer State: 0x83 Entry: 0x08015081 Stack Size: 0x00000300 Mem: 0x24003978 SP: 0x24003C18
Delay:
For more info, visit: https://mbed.com/s/error?error=0x8001011F&osver=61700&core=0x411FC271&comp=1&ver=6160001&tgt=DISCO_H747I
Note that for getting additional error information, you need to modify the Mbed OS configuration as illustrated below:
"target_overrides": {
"*": {
"platform.error-all-threads-info": 1,
"platform.error-filename-capture-enabled": 1
}
}
From the error log above, we can observe that the system cannot allocate the requested memory with the operator new[] called from the main thread. We also know that the error is raised at line 1848 of the “mbed_retarget.cpp” file.
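For comparison, a leak-free variant of the class can tie the lifetime of the array to the lifetime of the object. The sketch below is an assumption for illustration (the class name NoMemoryLeak and the sum() method are hypothetical and not part of the codelab code); it relies on std::unique_ptr to release the array when the instance goes out of scope.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical leak-free counterpart of MemoryLeak: the array is owned by a
// std::unique_ptr, so it is released automatically by the destructor.
class NoMemoryLeak {
   public:
    static constexpr uint16_t kArraySize = 1024;

    NoMemoryLeak() : _ptr(new int[kArraySize]) {}

    // write to the array
    void use() {
        for (uint16_t i = 0; i < kArraySize; i++) {
            _ptr[i] = i;
        }
    }

    // read the array back so that the writes have an observable effect
    int32_t sum() const {
        int32_t total = 0;
        for (uint16_t i = 0; i < kArraySize; i++) {
            total += _ptr[i];
        }
        return total;
    }

   private:
    std::unique_ptr<int[]> _ptr;
};
```

If an instance of such a class is created in a task method instead of MemoryLeak, the number of bytes currently allocated on the heap should return to its previous value once the instance is destroyed.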
Heap Fragmentation
A problem that is even harder to detect is heap fragmentation. Heap fragmentation is a phenomenon that creates small fragments of free memory in the heap space, so that the largest available block of memory becomes smaller and smaller compared to the total available memory. The fragmentation level can be computed from the ratio between the largest available block of memory and the total available memory:
\(fragmentation = 1 - \frac{largest\ available\ block}{total\ available\ memory}\)
If the fragmentation is \(50\%\) and the available memory is 1 KiB, then the largest available block is 512 bytes. Fragmentation tends to increase over the lifetime of a program and on embedded systems running C++ programs, there is no way of defragmenting the heap. Over time, heap fragmentation tends to
- create unreliable programs: if your program needs a bigger block than the largest available one, it will not get it and will stop working
- and to degrade program performance: a highly fragmented heap is slower because the memory allocator takes more time to deliver a new allocated block.
These are very good reasons for using heap memory with care on embedded systems.
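The fragmentation formula above can be sketched as a small function. The list of free block sizes is an assumed input for this example: the heap statistics used in this codelab do not report the individual free blocks, so on a real target this information would have to come from the allocator itself.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Computes the fragmentation level (between 0.0 and 1.0) from the sizes of
// the free blocks of a heap: 1 - largest free block / total free memory.
inline double fragmentationLevel(const std::vector<uint32_t>& freeBlockSizes) {
    const uint64_t total =
        std::accumulate(freeBlockSizes.begin(), freeBlockSizes.end(), uint64_t{0});
    if (total == 0) {
        return 0.0;  // no free memory at all: nothing to fragment
    }
    const uint32_t largest =
        *std::max_element(freeBlockSizes.begin(), freeBlockSizes.end());
    return 1.0 - static_cast<double>(largest) / static_cast<double>(total);
}
```

For the example given above, free blocks of 512, 256 and 256 bytes (1 KiB in total) give a fragmentation level of 0.5, while a single free block of 1 KiB gives 0.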
To illustrate the heap fragmentation phenomenon, you may use the following MemoryFragmenter class in your BikeComputer program:
MemoryFragmenter class
#pragma once
#include "mbed.h"
#include "memory_logger.hpp"
namespace multi_tasking {
class MemoryFragmenter {
public:
// nothing to allocate in the constructor
MemoryFragmenter() {}
void fragmentMemory() {
// create a memory logger
MemoryLogger memorLogger;
// get heap info
mbed_stats_heap_t heapInfo = {0};
mbed_stats_heap_get(&heapInfo);
uint32_t availableSize =
heapInfo.reserved_size - heapInfo.current_size - heapInfo.overhead_size;
tr_debug("Available heap size is %" PRIu32 " (reserved %" PRIu32 ")",
availableSize,
heapInfo.reserved_size);
// divide the available size by 8 blocks that we allocate
uint32_t blockSize = (availableSize - kMarginSpace) / kNbrOfBlocks;
tr_debug("Allocating blocks of size %" PRIu32 "", blockSize);
char* pBlockArray[kNbrOfBlocks] = {NULL};
for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex++) {
pBlockArray[blockIndex] = new char[blockSize];
if (pBlockArray[blockIndex] == NULL) {
tr_error("Cannot allocate block memory for index %" PRIu32 "",
blockIndex);
}
tr_debug("Allocated block index %" PRIu32 " of size %" PRIu32
" at address 0x%08" PRIx32 "",
blockIndex,
blockSize,
(uint32_t)pBlockArray[blockIndex]);
// copy to member variable to prevent them from being optimized away
for (uint32_t index = 0; index < kArraySize; index++) {
_doubleArray[index] += (double)pBlockArray[blockIndex][index];
}
}
// the full heap (or almost) should be allocated
tr_debug("Heap statistics after full allocation:");
memorLogger.getAndPrintHeapStatistics();
// delete only the even blocks
for (uint32_t blockIndex = 0; blockIndex < kNbrOfBlocks; blockIndex += 2) {
delete[] pBlockArray[blockIndex];
pBlockArray[blockIndex] = NULL;
}
// we should have half of the heap space free
tr_debug("Heap statistics after half deallocation:");
memorLogger.getAndPrintHeapStatistics();
// try to allocate one block that is slightly bigger
// without fragmentation, this allocation should succeed
heapInfo = {0};
mbed_stats_heap_get(&heapInfo);
availableSize =
heapInfo.reserved_size - heapInfo.current_size - heapInfo.overhead_size;
tr_debug("Available heap size is %" PRIu32 " (reserved %" PRIu32 ")",
availableSize,
heapInfo.reserved_size);
blockSize += 8;
// this allocation will fail
tr_debug("Allocating 1 block of size %" PRIu32 " should succeed !", blockSize);
pBlockArray[0] = new char[blockSize];
// copy to member variable to prevent them from being optimized away
for (uint32_t index = 0; index < kArraySize; index++) {
_doubleArray[index] += (double)pBlockArray[0][index];
}
}
private:
static constexpr uint8_t kNbrOfBlocks = 8;
static constexpr uint16_t kMarginSpace = 1024;
static constexpr uint8_t kArraySize = 100;
double _doubleArray[kArraySize] = {0};
};
} // namespace multi_tasking
If you create an instance of this class in your BikeComputer program and call the MemoryFragmenter::fragmentMemory() method, you will observe an error on the console similar to the one shown below:
Console
[DBG ][MemoryFragmenter]: Available heap size is 501308 (reserved 506044)
[DBG ][MemoryFragmenter]: Allocating blocks of size 62535
[DBG ][MemoryFragmenter]: Allocated block index 0 of size 62535 at address 0x240055f0
[DBG ][MemoryFragmenter]: Allocated block index 1 of size 62535 at address 0x24014a48
[DBG ][MemoryFragmenter]: Allocated block index 2 of size 62535 at address 0x24023ea0
[DBG ][MemoryFragmenter]: Allocated block index 3 of size 62535 at address 0x240332f8
[DBG ][MemoryFragmenter]: Allocated block index 4 of size 62535 at address 0x24042750
[DBG ][MemoryFragmenter]: Allocated block index 5 of size 62535 at address 0x24051ba8
[DBG ][MemoryFragmenter]: Allocated block index 6 of size 62535 at address 0x24061000
[DBG ][MemoryFragmenter]: Allocated block index 7 of size 62535 at address 0x24070458
[DBG ][MemoryFragmenter]: Heap statistics after full allocation:
[DBG ][MemoryLogger]: MemoryStats (Heap):
[DBG ][MemoryLogger]: Bytes allocated currently: 504892
[DBG ][MemoryLogger]: Max bytes allocated at a given time: 504892
[DBG ][MemoryLogger]: Cumulative sum of bytes ever allocated: 504892
[DBG ][MemoryLogger]: Current number of bytes allocated for the heap: 506044
[DBG ][MemoryLogger]: Current number of allocations: 16
[DBG ][MemoryLogger]: Number of failed allocations: 0
[DBG ][MemoryFragmenter]: Heap statistics after half deallocation:
[DBG ][MemoryLogger]: MemoryStats (Heap):
[DBG ][MemoryLogger]: Bytes allocated currently: 254752
[DBG ][MemoryLogger]: Max bytes allocated at a given time: 504892
[DBG ][MemoryLogger]: Cumulative sum of bytes ever allocated: 504892
[DBG ][MemoryLogger]: Current number of bytes allocated for the heap: 506044
[DBG ][MemoryLogger]: Current number of allocations: 12
[DBG ][MemoryLogger]: Number of failed allocations: 0
[DBG ][MemoryFragmenter]: Available heap size is 251100 (reserved 506044)
[DBG ][MemoryFragmenter]: Allocating 1 block of size 62543 should succeed !
++ MbedOS Error Info ++
Error Status: 0x8001011F Code: 287 Module: 1
Error Message: Operator new[] out of memory
As you can observe, while the available heap size is 251100 bytes, an allocation of 62543 bytes fails with an out of memory error.
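One way to read this result: the logged addresses show that the blocks are laid out one after the other, so each of the four freed blocks (indices 0, 2, 4 and 6) is still surrounded by allocated blocks and cannot be merged with a neighbor. Assuming that the allocator overhead is negligible, the largest available block is then about 62535 bytes, which gives the following fragmentation level:
\(fragmentation \approx 1 - \frac{62535}{251100} \approx 0.75\)
The request for 62543 bytes is larger than every free block, so it fails although roughly four times that amount is free in total.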
To minimize the kinds of problems illustrated above, it is often recommended to apply the following guidelines on embedded systems:
- Prefer static allocation over dynamic allocation whenever possible.
- Prefer automatic allocation (on the stack) when feasible: allocation on the stack is almost free, but in this case, care must be taken to avoid stack overflow errors.
- Use private, application-specific memory pools for providing buffers of fixed size to an application (see Mbed OS Memory Pool). This avoids repeated allocation of buffers from the heap. Note that this mechanism is used, for instance, by the Mbed OS Mail API, which implements a queuing mechanism for exchanging messages and provides a memory pool for allocating the messages.
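The memory pool guideline can be sketched with a minimal fixed-size block pool. This is a host-side illustration under stated assumptions: the FixedPool class and its allocate()/release() methods are hypothetical, this is not the Mbed OS MemoryPool implementation, and the sketch is not thread-safe. Its purpose is only to show why a pool of equally sized blocks cannot fragment: any free slot fits any request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size block pool (illustration only, not thread-safe).
// The storage is statically sized: N slots, each large enough for one T.
template <typename T, std::size_t N>
class FixedPool {
    static_assert(N > 0, "the pool needs at least one slot");

   public:
    FixedPool() {
        // chain all slots into a singly linked free list
        for (std::size_t i = 0; i < N; i++) {
            _slots[i].next = (i + 1 < N) ? &_slots[i + 1] : nullptr;
        }
        _freeList = &_slots[0];
    }

    // Returns storage for one T, or nullptr when the pool is exhausted.
    void* allocate() {
        if (_freeList == nullptr) {
            return nullptr;
        }
        Slot* slot = _freeList;
        _freeList = slot->next;
        return slot->storage;
    }

    // Gives a block previously obtained from allocate() back to the pool.
    void release(void* block) {
        assert(block != nullptr);
        Slot* slot = static_cast<Slot*>(block);
        slot->next = _freeList;
        _freeList = slot;
    }

   private:
    // a free slot stores the link to the next free slot in its own storage
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };
    Slot _slots[N];
    Slot* _freeList;
};
```

A pool declared as FixedPool<uint32_t, 2> hands out two blocks, returns nullptr on the third request and reuses a block as soon as it has been released. Exhaustion is reported to the caller instead of ending in an out of memory error, which lets the application decide how to react.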
Detecting a Stack Overflow Error
By using the memory tracing functionalities demonstrated above, we may know which threads are running and the memory space that they are using. This is very useful information for optimizing memory usage for each thread. This is also useful for debugging stack overflow errors.
A stack overflow may happen in very different situations. To understand how to detect such errors, it is easier to provoke one on purpose. For this purpose, you may add code that allocates more and more memory on the stack in a thread running a loop. An example of such code is given below:
MemoryStackOverflow class
#pragma once
#include <cstdint>
#include "mbed.h"
namespace multi_tasking {
class MemoryStackOverflow {
public:
void allocateOnStack() {
// allocate an array with growing size until it does not fit on the stack anymore
size_t allocSize = kArraySize * _multiplier;
// Create a variable-size object on the stack (variable-length arrays are a
// compiler extension in C++, not part of the standard)
double anotherArray[allocSize];
for (size_t i = 0; i < allocSize; i++) {
anotherArray[i] = i;
}
// copy to member variable to prevent them from being optimized away
for (size_t i = 0; i < kArraySize; i++) {
_doubleArray[i] += anotherArray[i];
}
_multiplier++;
}
private:
static constexpr size_t kArraySize = 40;
double _doubleArray[kArraySize] = {0};
size_t _multiplier = 1;
};
} // namespace multi_tasking
If you call the MemoryLogger::printDiffs() method at regular intervals, you will observe that the maximum number of bytes used on the stack of the thread using the MemoryStackOverflow class increases continuously. Once the stack overflow happens, you may experience different types of errors, including an application crash or an application behaving erratically. The reason is that the stack gets corrupted and that no stack corruption protection is implemented in the application.
To improve stack corruption detection, you may modify the Mbed OS configuration in the “mbed_app.json” file as follows:
"macros": [
...
"RTX_STACK_CHECK=1"
],
If you recompile your application and run it with RTX_STACK_CHECK=1, then you should get the following error on the console:
Error log
++ MbedOS Error Info ++
Error Status: 0x80020125 Code: 293 Module: 2
Error Message: CMSIS-RTOS error: Stack overflow
Location: 0x8014291
File: mbed_rtx_handlers.c+60
Error Value: 0x1
Current Thread: rtx_idle Id: 0x24003670 Entry: 0x8014411 StackSize: 0x380 StackMem: 0x24003740 SP: 0x2407FF1C
Next:
rtx_idle State: 0x2 Entry: 0x08014411 Stack Size: 0x00000380 Mem: 0x24003740 SP: 0x24003A70
Ready:
Wait:
rtx_timer State: 0x83 Entry: 0x080150E9 Stack Size: 0x00000300 Mem: 0x24003AC0 SP: 0x24003D60
Delay:
main State: 0x43 Entry: 0x080135E9 Stack Size: 0x00002000 Mem: 0x24000D68 SP: 0x24002410
For more info, visit: https://mbed.com/s/error?error=0x80020125&osver=61700&core=0x411FC271&comp=1&ver=6160001&tgt=DISCO_H747I
-- MbedOS Error Info --
Unfortunately, the error log does not always indicate a stack overflow. There are situations where the RTX stack check mechanism is not able to detect stack corruption, in which case the application ultimately crashes with a generic fault exception.
Exercise Memory Profiling and Optimization/6
Try to figure out how and where the stack overflow detection is implemented in the RTX OS implementation.
Solution
The check is implemented in the “mbed-os/cmsis/CMSIS_5/RTOS2/RTX/Source/rtx_thread.c” file, in the osRtxThreadStackCheck() function. The function checks whether the current stack pointer is beyond the stack memory or whether the stack magic word (written at thread creation at the lowest address of the stack memory) has been overwritten. The function is called from the SVC_ContextSaveSP assembly function (when RTX_STACK_CHECK != 0).
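The principle of this check can be sketched on the host as follows. This is a simplified model and not the RTX source code: the ThreadStackModel structure, the function names and the marker constant are assumptions made for this example, and the marker is assumed to be stored at the lowest address of the stack memory (the stack grows downwards).

```cpp
#include <cstdint>

// Hypothetical, simplified model of the stack bookkeeping of a thread.
struct ThreadStackModel {
    uint32_t* stackMem;  // lowest address of the stack memory
    uint32_t stackSize;  // stack size in bytes
    uint32_t* sp;        // saved stack pointer (the stack grows downwards)
};

// Assumed marker value, used here only for illustration.
constexpr uint32_t kStackMagicWord = 0xE25A2EA5U;

// Writes the marker at the stack limit and sets the stack pointer to the
// highest address of the stack memory (empty stack).
inline void initStack(ThreadStackModel& thread) {
    thread.stackMem[0] = kStackMagicWord;
    thread.sp = thread.stackMem + thread.stackSize / sizeof(uint32_t);
}

// Reports an overflow when the saved stack pointer has reached the stack
// limit or when the marker has been overwritten.
inline bool isStackOverflowed(const ThreadStackModel& thread) {
    return (thread.sp <= thread.stackMem) ||
           (thread.stackMem[0] != kStackMagicWord);
}
```

The model also hints at why some overflows go unnoticed: the test only looks at the saved stack pointer and at one word of memory at the time of the check. A thread that temporarily writes below its stack limit without touching the marker, and whose stack pointer is back within bounds when the check runs, would not be flagged.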
Exercise Memory Profiling and Optimization/6
Find and implement another very common way of creating a stack overflow in your BikeComputer program.
Solution
It is as simple as creating an infinite recursive call on a given thread. If you do so, for instance, in the ProcessingThread thread (with a call to ThisThread::sleep_for() between recursive calls), then you will get a StackOverflow error after a few seconds.