Embedded Systems Project

STM32F411RE Bootloader

A bare-metal UART bootloader built entirely from scratch — no HAL, no RTOS, no external libraries.

Size: 48 KB
Protocol: UART 115200
Integrity: CRC-32
Target: Cortex-M4
HAL: None

Why a Bootloader?

In production embedded systems, updating firmware without physical hardware access is a critical requirement. A bootloader makes that possible.

✗ Without a Bootloader

  • Physical debugger required (ST-Link / SWD)
  • Hardware access needed for every firmware update
  • Impossible to update deployed or remote devices
  • Development workflow requires re-flashing through the debugger on every change

✓ With This Bootloader

  • Update firmware over a simple UART serial cable
  • No debugger hardware required at all
  • Fully field-updatable — works remotely
  • CRC32 integrity check protects against corrupt images
  • Atomic update: power failure during flash is safe

Flash Memory Layout

The STM32F411RE has 512 KB of flash split into 8 sectors of unequal size: four 16 KB sectors, one 64 KB sector, and three 128 KB sectors. The bootloader occupies the first 48 KB (sectors 0–2); the rest belongs to the application.

Sectors 4–7   448 KB (64 + 3 × 128)   Application
Sector 3      16 KB                   App start, 0x0800C000
Sector 2      16 KB                   Boot flag, 0x08008000
Sector 1      16 KB                   Bootloader, 0x08004000
Sector 0      16 KB                   Reset vector, 0x08000000
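
The layout can also be expressed as a small lookup table. The sketch below assumes the sector sizes from the STM32F411 reference manual (four 16 KB, one 64 KB, three 128 KB); `FLASH_BASE_ADDR`, `flash_sector_for_address` and `flash_sector_start` are illustrative names, not taken from the project.

```c
#include <stdint.h>

#define FLASH_BASE_ADDR  0x08000000UL
#define FLASH_TOTAL_SIZE (512UL * 1024UL)
#define APP_START_ADDR   0x0800C000UL   /* first byte after the 48 KB bootloader region */

/* Compile-time check that the bootloader region is exactly 48 KB (C11). */
_Static_assert(APP_START_ADDR - FLASH_BASE_ADDR == 48UL * 1024UL,
               "bootloader region must be exactly 48 KB");

/* Sector sizes in KB: 4 x 16, 1 x 64, 3 x 128 = 512 KB. */
static const uint16_t sector_kb[8] = { 16, 16, 16, 16, 64, 128, 128, 128 };

/* Returns the sector index (0-7) containing addr, or -1 if addr is outside flash. */
int flash_sector_for_address(uint32_t addr)
{
    if (addr < FLASH_BASE_ADDR || addr >= FLASH_BASE_ADDR + FLASH_TOTAL_SIZE)
        return -1;

    uint32_t start = (uint32_t)FLASH_BASE_ADDR;
    for (int i = 0; i < 8; i++) {
        uint32_t size = (uint32_t)sector_kb[i] * 1024u;
        if (addr - start < size)
            return i;
        start += size;
    }
    return -1;
}

/* Returns the start address of a sector, or 0 if the index is out of range. */
uint32_t flash_sector_start(int sector)
{
    if (sector < 0 || sector > 7)
        return 0;

    uint32_t start = (uint32_t)FLASH_BASE_ADDR;
    for (int i = 0; i < sector; i++)
        start += (uint32_t)sector_kb[i] * 1024u;
    return start;
}
```

A lookup like this is mainly useful for deciding which sectors to erase for an image of a given size.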

Bootloader (Sectors 0–2)

Permanent code that runs first on every power-on. Confined to 48 KB. The linker script enforces this boundary — overflow is a build error, not a runtime bug.

Boot Flag (Sector 2)

A single 4-byte word at 0x08008000. When set to 0xDEADBEEF, the bootloader receives and flashes a new image on next power-on. Erased only after a successful update.

Application (Sectors 3–7)

Everything from 0x0800C000 onwards. After a successful flash, the bootloader jumps here by loading the app's stack pointer and calling its reset handler.

Boot Sequence

On every power-on, the CPU hardware reads the first two words of flash and jumps to the reset handler. From that point, the bootloader takes control.

1. Power-On Reset

CPU reads 0x08000000 → loads stack pointer. Reads 0x08000004 → jumps to Reset_Handler. This is a hardware-level rule on all Cortex-M processors.

2. startup.c, before main()

Copies initialized globals from Flash to RAM (.data section). Zeroes uninitialized globals (.bss section). Then calls main().

3. UART + SysTick Init

Configures USART2 on PA2/PA3 at 115200 baud via direct register access. Initialises SysTick at 1 kHz to provide millisecond timeouts for all UART receives.

4. Check Boot Flag

Reads the word at 0x08008000. If it equals 0xDEADBEEF, an update has been requested.

If the flag is 0xDEADBEEF → 5a, then 5b. Otherwise → 5b.

5a. Receive & Flash

Runs the UART update protocol. Receives image, verifies CRC, erases sectors 3–7, writes and verifies, clears boot flag, sends ACK.

5b. Jump to App

Validates app stack pointer. Sets VTOR to redirect interrupts. Loads MSP register and branches to the app's reset handler via inline assembly.
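
The validation in step 5b can be made a little stricter than a stack-pointer check alone. The following is a host-testable sketch, not the project's code: it assumes 128 KB of SRAM at 0x20000000 and 512 KB of flash, and the function name and the `SRAM_*` / `FLASH_END` constants are hypothetical. It also checks that the reset vector has the Thumb bit set and points into the application region.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRAM_START     0x20000000UL
#define SRAM_END       0x20020000UL  /* assumes 128 KB of SRAM */
#define APP_START_ADDR 0x0800C000UL
#define FLASH_END      0x08080000UL  /* assumes 512 KB of flash */

/* initial_sp and reset_vector are the first two words of the app's vector table. */
bool app_vectors_look_valid(uint32_t initial_sp, uint32_t reset_vector)
{
    /* The stack grows downwards, so the initial SP may equal the end of SRAM
       but must lie above its start. */
    if (initial_sp <= SRAM_START || initial_sp > SRAM_END)
        return false;
    if ((initial_sp & 0x3u) != 0)          /* must be word aligned */
        return false;

    /* Cortex-M executes Thumb code only, so bit 0 of the vector must be 1. */
    if ((reset_vector & 0x1u) == 0)
        return false;

    uint32_t target = reset_vector & 0xFFFFFFFEu;
    return target >= APP_START_ADDR && target < FLASH_END;
}
```

Erased flash reads back as 0xFFFFFFFF for both words, which fails the stack-pointer range check.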

UART Update Protocol

A simple, robust wire protocol. The entire image is buffered in RAM and CRC-verified before a single flash sector is touched.

Host PC → STM32: 0xA5 (Start of Frame)
Host PC → STM32: 4 bytes: image size
STM32 → Host PC: "RDY\r\n"
Host PC → STM32: N bytes: firmware image
Host PC → STM32: 4 bytes: CRC32
STM32: Compute CRC32 · Compare · Unlock → Erase → Write → Verify · Clear boot flag
STM32 → Host PC: ACK 0x06  /  NAK 0x15
5 s timeout

Every UART receive has a 5-second timeout via SysTick. The bootloader never hangs if the host disconnects.

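RAM buffer (128 KB SRAM total)

The full image is buffered in RAM, so the largest accepted image is bounded by the STM32F411RE's 128 KB of SRAM, less the bootloader's own stack and globals. CRC is verified before flash is touched, so a corrupt transfer never reaches flash.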

Atomic update

Boot flag erased only after successful write + readback verify. Power failure during flash → retry on next boot.
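
The receive side of the protocol can be sketched as a function over an abstract byte source, which makes it testable on a host. Several things here are assumptions rather than facts from the project: the size and CRC fields are taken to be little-endian, and `RecvByteFn`, `receive_image`, `RxStatus` and the `ByteFeed` test double are hypothetical names. In the real bootloader the byte source would wrap the UART receive with its 5-second timeout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SOF_BYTE 0xA5u

/* Returns true and stores one byte, or false on timeout. */
typedef bool (*RecvByteFn)(void *ctx, uint8_t *out);

typedef enum { RX_OK, RX_TIMEOUT, RX_BAD_SOF, RX_BAD_SIZE } RxStatus;

/* Reads a 32-bit value, least significant byte first (assumed byte order). */
static bool recv_u32_le(RecvByteFn recv, void *ctx, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t b;
        if (!recv(ctx, &b))
            return false;
        v |= (uint32_t)b << (8 * i);
    }
    *out = v;
    return true;
}

/* Receives SOF, size, payload and CRC into RAM. Nothing touches flash here. */
RxStatus receive_image(RecvByteFn recv, void *ctx,
                       uint8_t *buf, uint32_t buf_size,
                       uint32_t *image_size, uint32_t *expected_crc)
{
    uint8_t b;
    if (!recv(ctx, &b))
        return RX_TIMEOUT;
    if (b != SOF_BYTE)
        return RX_BAD_SOF;

    if (!recv_u32_le(recv, ctx, image_size))
        return RX_TIMEOUT;
    if (*image_size == 0 || *image_size > buf_size)
        return RX_BAD_SIZE;

    /* The bootloader would send "RDY\r\n" to the host at this point. */

    for (uint32_t i = 0; i < *image_size; i++) {
        if (!recv(ctx, &buf[i]))
            return RX_TIMEOUT;
    }
    if (!recv_u32_le(recv, ctx, expected_crc))
        return RX_TIMEOUT;
    return RX_OK;
}

/* Host-side test double: feeds bytes from an array, then reports a timeout. */
typedef struct { const uint8_t *data; size_t len; size_t pos; } ByteFeed;

bool feed_recv(void *ctx, uint8_t *out)
{
    ByteFeed *f = ctx;
    if (f->pos >= f->len)
        return false;
    *out = f->data[f->pos++];
    return true;
}
```

Rejecting an oversized length before the payload arrives keeps a bad header from overrunning the RAM buffer.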

Key Code

Every driver written by hand — no HAL, no abstraction layers. Pure register-level C.

The very first code that runs — before main(). Copies globals from Flash to RAM, zeroes BSS, defines the interrupt vector table.
#include <stdint.h>
#include <string.h>  // memcpy, memset

// Section boundaries exported by the linker script (only their addresses matter)
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
int main(void);

void Reset_Handler(void) {
    // 1. Copy initialized globals: Flash → RAM
    //    e.g. int x = 5;  ← initial value stored in Flash, variable lives in writable RAM
    uint32_t len = (uint32_t)&_edata - (uint32_t)&_sdata;
    memcpy(&_sdata, &_sidata, len);

    // 2. Zero uninitialized globals (C guarantees this)
    //    RAM contents are undefined at power-on, so zero them manually
    uint32_t bss = (uint32_t)&_ebss - (uint32_t)&_sbss;
    memset(&_sbss, 0, bss);

    main();
    for (;;) {}  // trap here if main() ever returns
}

// The vector table must sit at 0x08000000 (flash origin)
// Entry 0 = initial stack pointer, Entry 1 = Reset_Handler
__attribute__((section(".isr_vector"), used))
static void *const vector_table[] = {
    (void *)&_estack,   // initial SP — top of 2 KB bootloader stack
    Reset_Handler,      // ← CPU jumps here at power-on
    Default_Handler,    // NMI
    Default_Handler,    // HardFault
    // ... remaining entries ...
    SysTick_Handler,    // 1 ms tick for UART timeouts
};
Hands control to the application. Sets VTOR so interrupts resolve against the app's vector table, then uses inline assembly to set MSP and branch.
static void jump_to_app(void) {
    // Read the first word of the app — its initial stack pointer
    uint32_t sp = flash_read_word(APP_START_ADDR);  // 0x0800C000

    // Sanity check: must point into SRAM (0x200xxxxx); the mask covers the top bits too,
    // so values such as 0xA0000000 are rejected as well
    // If no app is flashed, flash reads 0xFFFFFFFF → reject
    if ((sp & 0xFFF00000UL) != 0x20000000UL) {
        uart_send_str("No valid app — staying in bootloader\r\n");
        return;
    }

    uart_send_str("Jumping to app...\r\n");

    // Redirect the CPU's interrupt vector table to the app
    // Without this, app interrupts would call bootloader handlers
    SCB_VTOR = APP_START_ADDR;

    // Read app's reset handler (second word of its vector table)
    const VectorTable *v = (const VectorTable *)APP_START_ADDR;
    FuncPtr app_reset = v->reset_handler;

    // Must use assembly — C cannot set MSP without corrupting its own stack
    __asm volatile(
        "MSR MSP, %0  \n"   // load app's initial stack pointer
        "BX  %1       \n"   // branch to app's Reset_Handler (no return)
        : : "r"(sp), "r"(app_reset) :
    );
}
Direct register control of the flash controller. Flash has strict rules: unlock with magic keys → erase sector → write words → lock. No shortcutting.
FlashStatus flash_unlock(void) {
    if (!(FLASH_CR & FLASH_CR_LOCK))
        return FLASH_OK;  // already unlocked
    // Two magic keys must be written in sequence — hardware protection
    FLASH_KEYR = 0x45670123UL;
    FLASH_KEYR = 0xCDEF89ABUL;
    return (FLASH_CR & FLASH_CR_LOCK) ? FLASH_ERROR : FLASH_OK;
}

FlashStatus flash_erase_sector(uint8_t sector) {
    FlashStatus s = wait_for_not_busy();
    if (s != FLASH_OK) return s;

    // Flash bits can only go 1→0 by writing.
    // To reset back to 1 you must erase an entire sector (→ all 0xFF)
    FLASH_CR &= ~(0xFUL << FLASH_CR_SNB_Pos);
    FLASH_CR |= FLASH_CR_SER | ((uint32_t)sector << FLASH_CR_SNB_Pos) | PSIZE_WORD;
    FLASH_CR |= FLASH_CR_STRT;   // start erase
    s = wait_for_not_busy();
    FLASH_CR &= ~FLASH_CR_SER;   // leave sector-erase mode before any later write sets PG
    return s;
}

FlashStatus flash_write_word(uint32_t addr, uint32_t data) {
    FlashStatus s = wait_for_not_busy();
    if (s != FLASH_OK) return s;

    FLASH_CR |= FLASH_CR_PG | PSIZE_WORD;  // enable programming
    *(volatile uint32_t *)addr = data;      // write 4 bytes
    s = wait_for_not_busy();
    FLASH_CR &= ~FLASH_CR_PG;

    // Verify immediately — confirm the write actually stuck
    if (*(volatile uint32_t *)addr != data)
        return FLASH_ERROR;
    return s;
}
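
The "bits can only go 1→0" rule is easy to demonstrate with a small host-side model. This is a deliberately simplified simulation, not the flash controller's exact behaviour; `SimSector`, `sim_erase` and `sim_program` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIM_SECTOR_WORDS 16u

typedef struct { uint32_t words[SIM_SECTOR_WORDS]; } SimSector;

/* Erase sets every bit in the sector back to 1. */
void sim_erase(SimSector *s)
{
    memset(s->words, 0xFF, sizeof s->words);
}

/* Programming can only clear bits, modelled as old AND new.
   Returns true when the readback matches the requested data. */
bool sim_program(SimSector *s, unsigned idx, uint32_t data)
{
    if (idx >= SIM_SECTOR_WORDS)
        return false;
    s->words[idx] &= data;
    return s->words[idx] == data;
}
```

In this model a second write to the same word fails the readback unless the sector is erased first, which is why a readback check after programming is useful.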
USART2 driver on PA2 (TX) and PA3 (RX). Configured entirely through memory-mapped registers. SysTick provides millisecond timeouts so that a polled receive can never block forever.
void uart_init(uint32_t baud) {
    systick_init();  // 1 kHz tick for receive timeouts

    // Enable peripheral clocks (GPIOA + USART2)
    RCC_AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    RCC_APB1ENR |= RCC_APB1ENR_USART2EN;

    // PA2, PA3 → Alternate Function 7 (USART2)
    GPIOA_MODER &= ~((3UL << 4) | (3UL << 6));
    GPIOA_MODER |=  ((2UL << 4) | (2UL << 6));   // AF mode
    GPIOA_AFRL  |=  ((7UL << 8) | (7UL << 12));  // AF7 = USART2

    // Baud rate: BRR = F_CLK / baud (16 MHz HSI, no PLL)
    USART2_BRR = 16000000UL / baud;

    USART2_CR1 = USART_CR1_TE | USART_CR1_RE | USART_CR1_UE;
}

// Receive with timeout — never blocks forever
bool uart_recv_timeout(uint8_t *out, uint32_t timeout_ms) {
    uint32_t start = systick_ms;
    // Poll RXNE (Receive Not Empty) bit in status register
    while (!(USART2_SR & USART_SR_RXNE)) {
        if ((systick_ms - start) >= timeout_ms)
            return false;  // timed out — host disconnected?
    }
    *out = (uint8_t)USART2_DR;
    return true;
}
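
The divider arithmetic behind the baud rate and the 1 kHz tick can be checked on a host. The sketch below assumes 16x oversampling and a SysTick clocked directly from the 16 MHz core clock. The helper names are illustrative, and the rounded divider is a suggested variant of the truncating division used in the driver.

```c
#include <stdint.h>

/* BRR for 16x oversampling, rounded to the nearest integer. */
uint32_t usart_brr_rounded(uint32_t fclk_hz, uint32_t baud)
{
    return (fclk_hz + baud / 2u) / baud;
}

/* Baud rate actually produced by a given BRR value (integer part). */
uint32_t usart_actual_baud(uint32_t fclk_hz, uint32_t brr)
{
    return fclk_hz / brr;
}

/* SysTick reload value: the counter counts reload + 1 cycles per tick. */
uint32_t systick_reload(uint32_t fclk_hz, uint32_t tick_hz)
{
    return fclk_hz / tick_hz - 1u;
}
```

At 16 MHz and 115200 baud, truncation gives a divider of 138 (about 115942 baud) while rounding gives 139 (about 115107 baud), so the rounded value is closer to the target.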

Key Design Decisions

Each decision was made deliberately — not defaults, not framework choices.

🔩

No HAL

Every peripheral is controlled via direct register access. The HAL would add 10–50 KB of overhead and hide what is actually happening. A bootloader must be small, fast, and dependency-free.

Atomic Update

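The boot flag in sector 2 is erased only after a successful write and readback verify. If power is cut during flashing, the flag survives and the bootloader retries on the next boot. Because the update only erases application sectors, the bootloader itself stays intact and the device remains recoverable over UART.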
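
The ordering argument can be illustrated with a toy state machine that runs on a host. It models only the flag and whether the application region holds a verified image; `SimDevice`, `sim_boot` and the result names are hypothetical, and the model skips the UART transfer.

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOT_FLAG_MAGIC  0xDEADBEEFu
#define BOOT_FLAG_ERASED 0xFFFFFFFFu

typedef struct {
    uint32_t boot_flag;  /* word stored at the flag address */
    bool     app_valid;  /* true when the app region holds a verified image */
} SimDevice;

typedef enum {
    BOOT_RAN_APP,
    BOOT_UPDATED,
    BOOT_UPDATE_INTERRUPTED,
    BOOT_NO_APP
} BootResult;

/* Simulates one power cycle. power_fails_mid_flash models losing power
   after the erase has started but before the readback verify finished. */
BootResult sim_boot(SimDevice *dev, bool power_fails_mid_flash)
{
    if (dev->boot_flag == BOOT_FLAG_MAGIC) {
        dev->app_valid = false;              /* erase + partial write */
        if (power_fails_mid_flash)
            return BOOT_UPDATE_INTERRUPTED;  /* flag is still set */

        dev->app_valid = true;               /* write + readback verify passed */
        dev->boot_flag = BOOT_FLAG_ERASED;   /* cleared as the very last step */
        return BOOT_UPDATED;
    }
    return dev->app_valid ? BOOT_RAN_APP : BOOT_NO_APP;
}
```

Since the flag is cleared last, an interrupted update always leaves the device in a state where the next boot enters update mode again.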

📦

Buffer First, Flash Second

The entire firmware image is received into RAM and CRC-verified before a single flash sector is erased. Corrupt transfer → nothing is written. Prevents partial or corrupted firmware.
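
The page does not say which CRC-32 variant the project uses. As an illustration, here is a bitwise implementation of the common reflected CRC-32 (the zlib/Ethernet variant, polynomial 0xEDB88320); the function name `crc32_ieee` is an assumption. The host tool and the bootloader must agree on the variant, and a different one would produce different values.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32: init 0xFFFFFFFF, polynomial 0xEDB88320, final XOR 0xFFFFFFFF. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the polynomial; else just shift. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The bitwise form needs no lookup table, which suits a size-constrained bootloader at the cost of speed. The standard check value for the ASCII string "123456789" is 0xCBF43926.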

⏱️

Timeout on Every Byte

Every UART receive call has a 5-second timeout via SysTick. If the host crashes or disconnects mid-transfer, the bootloader recovers gracefully rather than hanging indefinitely.
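
A detail worth spelling out is why a timeout test written as an unsigned subtraction stays correct when the millisecond counter wraps, which a 32-bit counter does after roughly 49.7 days. A minimal sketch, with `timeout_expired` as a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned subtraction is modulo 2^32, so now - start is the true elapsed
   time even if the counter wrapped between the two readings. */
bool timeout_expired(uint32_t start_ms, uint32_t now_ms, uint32_t timeout_ms)
{
    return (uint32_t)(now_ms - start_ms) >= timeout_ms;
}
```

Comparing `now_ms >= start_ms + timeout_ms` instead would misfire near the wrap point, because the sum itself can overflow.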

🔁

Readback Verify

After writing the full image, every byte is read back and compared against the RAM buffer. Flash write failures are caught before the boot flag is cleared — the source of truth is the hardware, not assumptions.
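
A readback comparison of this kind might look like the sketch below. On the target, `flash` would point at the application start address; here it is an ordinary pointer so the code also runs on a host. The function names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first differing byte, or len if all bytes match. */
size_t first_mismatch(const volatile uint8_t *flash, const uint8_t *expected, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (flash[i] != expected[i])
            return i;
    }
    return len;
}

/* True when the flashed bytes match the RAM buffer exactly. */
bool image_matches(const volatile uint8_t *flash, const uint8_t *expected, size_t len)
{
    return first_mismatch(flash, expected, len) == len;
}
```

Returning the mismatch offset instead of a plain boolean makes a failed write easier to report over UART.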

📐

Linker-Enforced Boundaries

The custom linker script confines the bootloader to 48 KB (sectors 0–2). If the code ever grows beyond that, the linker errors at build time. The boundary is enforced mechanically, not by discipline.