A bare-metal UART bootloader built entirely from scratch — no HAL, no RTOS, no external libraries.
In production embedded systems, updating firmware without physical hardware access is a critical requirement. A bootloader makes that possible.
The STM32F411RE has 512 KB of flash split into 8 sectors of varying sizes: four 16 KB sectors, one 64 KB sector, and three 128 KB sectors. The bootloader occupies the first 48 KB (sectors 0 to 2); the rest belongs to the application.
The bootloader is permanent code that runs first on every power-on. It is confined to 48 KB, and the linker script enforces this boundary: overflow is a build error, not a runtime bug.
The boot flag is a single 4-byte word at 0x08008000. When it is set to 0xDEADBEEF, the bootloader receives and flashes a new image on the next power-on. The flag is erased only after a successful update.
The application occupies everything from 0x0800C000 onwards. After a successful flash, the bootloader jumps here by loading the app's stack pointer and branching to its reset handler.
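The layout described above can be captured as a small lookup table. This is a minimal sketch, assuming the standard STM32F411xE sector sizes (four 16 KB, one 64 KB, three 128 KB). The names `FlashSector`, `flash_sectors`, `flash_sector_for_address`, and `BOOT_FLAG_ADDR` are illustrative and not taken from the project's source.

```c
#include <stddef.h>
#include <stdint.h>

#define BOOT_FLAG_ADDR 0x08008000UL /* boot flag word (start of sector 2) */
#define APP_START_ADDR 0x0800C000UL /* application image (start of sector 3) */

typedef struct {
    uint32_t start; /* first byte of the sector */
    uint32_t size;  /* sector size in bytes */
} FlashSector;

/* Assumed STM32F411xE main-flash layout: 4 x 16 KB, 1 x 64 KB, 3 x 128 KB. */
static const FlashSector flash_sectors[8] = {
    {0x08000000UL,  16UL * 1024UL},
    {0x08004000UL,  16UL * 1024UL},
    {0x08008000UL,  16UL * 1024UL},
    {0x0800C000UL,  16UL * 1024UL},
    {0x08010000UL,  64UL * 1024UL},
    {0x08020000UL, 128UL * 1024UL},
    {0x08040000UL, 128UL * 1024UL},
    {0x08060000UL, 128UL * 1024UL},
};

/* Returns the sector index (0..7) containing addr, or -1 if addr is outside main flash. */
int flash_sector_for_address(uint32_t addr) {
    for (size_t i = 0; i < 8; i++) {
        if (addr >= flash_sectors[i].start &&
            addr - flash_sectors[i].start < flash_sectors[i].size) {
            return (int)i;
        }
    }
    return -1;
}
```

With this table, `flash_sector_for_address(BOOT_FLAG_ADDR)` yields sector 2 and `flash_sector_for_address(APP_START_ADDR)` yields sector 3, matching the boundaries in the text.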
On every power-on, the CPU hardware reads the first two words of flash and jumps to the reset handler. From that point, the bootloader takes control.
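The CPU reads 0x08000000 and loads it into the stack pointer, then reads 0x08000004 and jumps to Reset_Handler. Fetching these two words from the start of the vector table is a hardware-level rule on all Cortex-M processors; on the STM32, main flash at 0x08000000 is aliased to address 0 when booting from flash.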
Copies initialized globals from Flash to RAM (.data section). Zeroes uninitialized globals (.bss section). Then calls main().
Configures USART2 on PA2/PA3 at 115200 baud via direct register access. Initialises SysTick at 1 kHz to provide millisecond timeouts for all UART receives.
Reads the word at 0x08008000. If it equals 0xDEADBEEF, an update has been requested.
Runs the UART update protocol. Receives image, verifies CRC, erases sectors 3–7, writes and verifies, clears boot flag, sends ACK.
Validates app stack pointer. Sets VTOR to redirect interrupts. Loads MSP register and branches to the app's reset handler via inline assembly.
A simple, robust wire protocol. The entire image is buffered in RAM and CRC-verified before a single flash sector is touched.
Every UART receive has a 5-second timeout via SysTick. The bootloader never hangs if the host disconnects.
The full image is buffered in RAM and its CRC is verified before flash is touched, so a corrupt transfer can never cause a partial write.
Boot flag erased only after successful write + readback verify. Power failure during flash → retry on next boot.
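The text does not say which CRC the protocol uses. As an illustration of the verify-before-flash step, here is a minimal sketch assuming the common CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF). The function names `crc32_compute` and `image_is_intact` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, assumed variant: reflected poly 0xEDB88320, init/final XOR 0xFFFFFFFF. */
uint32_t crc32_compute(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* True only if the buffered image matches the CRC sent by the host.
 * The caller would erase and program flash only when this returns true. */
bool image_is_intact(const uint8_t *image, size_t len, uint32_t expected_crc) {
    return crc32_compute(image, len) == expected_crc;
}
```

A bitwise loop is slower than a table-driven CRC but needs no lookup table, which may suit a size-constrained bootloader.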
Every driver written by hand — no HAL, no abstraction layers. Pure register-level C.
void Reset_Handler(void) {
    // 1. Copy initialized globals: Flash → RAM
    //    e.g. int x = 5; ← the initial value is stored in Flash, but x must live in writable RAM
    uint32_t len = (uint32_t)&_edata - (uint32_t)&_sdata;
    memcpy(&_sdata, &_sidata, len);

    // 2. Zero uninitialized globals (the C standard requires them to start at zero)
    //    RAM holds arbitrary values at power-on, so the startup code clears .bss itself
    uint32_t bss = (uint32_t)&_ebss - (uint32_t)&_sbss;
    memset(&_sbss, 0, bss);

    main();
    for (;;) {} // main() should never return in embedded code; trap here if it does
}
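The two startup steps can also be written as plain word loops, which avoids an explicit call to `memcpy` and `memset`. This is a minimal sketch with the hypothetical helper names `copy_words` and `zero_words`. On the target, the pointers would come from the linker symbols (`_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`), which are assumed here to be word-aligned.

```c
#include <stdint.h>

/* Copy words from src into [dst, dst_end). Mirrors the .data copy from Flash to RAM. */
void copy_words(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src) {
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Zero every word in [dst, dst_end). Mirrors the .bss clear. */
void zero_words(uint32_t *dst, const uint32_t *dst_end) {
    while (dst < dst_end) {
        *dst++ = 0;
    }
}
```

In a startup file this would be used as `copy_words(&_sdata, &_edata, &_sidata)` followed by `zero_words(&_sbss, &_ebss)`, assuming the symbols are declared as `uint32_t`.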
// The vector table must sit at 0x08000000 (flash origin)
// Entry 0 = initial stack pointer, Entry 1 = Reset_Handler
__attribute__((section(".isr_vector"), used))
static void *const vector_table[] = {
    (void *)&_estack, // initial SP: top of the 2 KB bootloader stack
    Reset_Handler,    // ← CPU jumps here at power-on
    Default_Handler,  // NMI
    Default_Handler,  // HardFault
    // ... remaining entries ...
    SysTick_Handler,  // 1 ms tick for UART timeouts
};
static void jump_to_app(void) {
    // Read the first word of the app: its initial stack pointer
    uint32_t sp = flash_read_word(APP_START_ADDR); // 0x0800C000

    // Sanity check: must point into SRAM (0x200xxxxx).
    // Masking all of the top 12 bits rejects erased flash (0xFFFFFFFF) as well as
    // values such as 0xF0000000, which a looser mask like 0x2FF00000 would accept.
    if ((sp & 0xFFF00000UL) != 0x20000000UL) {
        uart_send_str("No valid app - staying in bootloader\r\n");
        return;
    }
    uart_send_str("Jumping to app...\r\n");

    // Redirect the CPU's interrupt vector table to the app
    // Without this, app interrupts would call bootloader handlers
    SCB_VTOR = APP_START_ADDR;

    // Read app's reset handler (second word of its vector table)
    const VectorTable *v = (const VectorTable *)APP_START_ADDR;
    FuncPtr app_reset = v->reset_handler;

    // Must use assembly: C cannot set MSP without corrupting its own stack
    __asm volatile(
        "MSR MSP, %0 \n" // load app's initial stack pointer
        "BX %1 \n"       // branch to app's Reset_Handler (no return)
        : : "r"(sp), "r"(app_reset) : "memory"
    );
}
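A mask test on the stack pointer accepts a wider range than the RAM actually present, so an explicit range check is a stricter alternative. This is a minimal sketch, assuming the STM32F411xE's 128 KB of SRAM starting at 0x20000000. The name `app_sp_is_valid` and the `SRAM_START`/`SRAM_SIZE` constants are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRAM_START 0x20000000UL
#define SRAM_SIZE  (128UL * 1024UL) /* assumed: STM32F411xE has 128 KB of SRAM */

/* Returns true if sp is a plausible initial main stack pointer for the app. */
bool app_sp_is_valid(uint32_t sp) {
    /* The stack grows downwards, so the initial SP may equal the end of SRAM
     * (one past the last byte) but must lie above the start. */
    if (sp <= SRAM_START || sp > SRAM_START + SRAM_SIZE) {
        return false;
    }
    /* The stack pointer must be at least word-aligned. */
    return (sp & 0x3UL) == 0;
}
```

Erased flash (0xFFFFFFFF) and out-of-range values such as 0xF0000000 are both rejected. A fuller jump routine would typically also stop SysTick and clear pending interrupts before handing over to the application; that is outside this sketch.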
FlashStatus flash_unlock(void) {
    if (!(FLASH_CR & FLASH_CR_LOCK))
        return FLASH_OK; // already unlocked

    // Two magic keys must be written in sequence (hardware protection)
    FLASH_KEYR = 0x45670123UL;
    FLASH_KEYR = 0xCDEF89ABUL;
    return (FLASH_CR & FLASH_CR_LOCK) ? FLASH_ERROR : FLASH_OK;
}
FlashStatus flash_erase_sector(uint8_t sector) {
    FlashStatus s = wait_for_not_busy();
    if (s != FLASH_OK) return s;

    // Flash bits can only go 1→0 by writing.
    // To reset them back to 1 you must erase an entire sector (→ all 0xFF)
    FLASH_CR &= ~(0xFUL << FLASH_CR_SNB_Pos);
    FLASH_CR |= FLASH_CR_SER | ((uint32_t)sector << FLASH_CR_SNB_Pos) | PSIZE_WORD;
    FLASH_CR |= FLASH_CR_STRT; // start erase

    s = wait_for_not_busy();
    FLASH_CR &= ~FLASH_CR_SER; // leave sector-erase mode before any later programming
    return s;
}
FlashStatus flash_write_word(uint32_t addr, uint32_t data) {
    FlashStatus s = wait_for_not_busy();
    if (s != FLASH_OK) return s;

    FLASH_CR |= FLASH_CR_PG | PSIZE_WORD; // enable programming
    *(volatile uint32_t *)addr = data;    // write 4 bytes
    s = wait_for_not_busy();
    FLASH_CR &= ~FLASH_CR_PG;

    // Verify immediately: confirm the write actually stuck
    if (*(volatile uint32_t *)addr != data)
        return FLASH_ERROR;
    return s;
}
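The erase-then-program rule and the readback check can be tried on a host machine with a small simulation. This is a simplified model, not the real flash controller: programming is modelled as a bitwise AND (bits can only be cleared), and the names `sim_flash`, `sim_erase`, and `sim_write_word` are hypothetical. Real hardware may additionally raise error flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_WORDS 4

/* Host-side stand-in for one flash sector (assumption: not real hardware). */
static uint32_t sim_flash[SIM_WORDS];

/* Erasing sets every bit in the sector to 1. */
void sim_erase(void) {
    for (size_t i = 0; i < SIM_WORDS; i++) {
        sim_flash[i] = 0xFFFFFFFFu;
    }
}

/* Programming can only clear bits, modelled here with AND.
 * The readback comparison fails if any bit would have needed to go 0→1. */
bool sim_write_word(size_t index, uint32_t data) {
    if (index >= SIM_WORDS) {
        return false;
    }
    sim_flash[index] &= data;
    return sim_flash[index] == data;
}
```

Writing 0xDEADBEEF to an erased word passes the readback, while writing 0xFFFFFFFF over it afterwards fails, which is the kind of failure the immediate verify in `flash_write_word` is meant to catch.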
void uart_init(uint32_t baud) {
    systick_init(); // 1 kHz tick for receive timeouts

    // Enable peripheral clocks (GPIOA + USART2)
    RCC_AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    RCC_APB1ENR |= RCC_APB1ENR_USART2EN;

    // PA2, PA3 → Alternate Function 7 (USART2)
    GPIOA_MODER &= ~((3UL << 4) | (3UL << 6));
    GPIOA_MODER |= ((2UL << 4) | (2UL << 6));      // AF mode
    GPIOA_AFRL &= ~((0xFUL << 8) | (0xFUL << 12)); // clear any previous AF selection
    GPIOA_AFRL |= ((7UL << 8) | (7UL << 12));      // AF7 = USART2

    // Baud rate: BRR = F_CLK / baud, rounded to nearest (16 MHz HSI, no PLL, 16x oversampling)
    USART2_BRR = (16000000UL + baud / 2UL) / baud;
    USART2_CR1 = USART_CR1_TE | USART_CR1_RE | USART_CR1_UE;
}
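Integer division truncates, so at 16 MHz and 115200 baud a plain `F_CLK / baud` gives 138 while the exact ratio is about 138.9. The following minimal sketch shows the rounding arithmetic, assuming 16x oversampling (OVER8 = 0), where BRR is simply the clock divided by the baud rate. The function names `usart_brr_rounded` and `usart_actual_baud` are hypothetical.

```c
#include <stdint.h>

/* BRR for 16x oversampling: f_clk / baud, rounded to the nearest integer. */
uint32_t usart_brr_rounded(uint32_t f_clk_hz, uint32_t baud) {
    return (f_clk_hz + baud / 2u) / baud;
}

/* Baud rate actually produced by a given BRR value (truncated to whole baud). */
uint32_t usart_actual_baud(uint32_t f_clk_hz, uint32_t brr) {
    return f_clk_hz / brr;
}
```

For 16 MHz and 115200 baud, rounding gives BRR = 139 and an actual rate of about 115107 baud (roughly 0.08% low), whereas the truncated value 138 gives about 115942 baud (roughly 0.64% high). Both are usually within UART tolerance, but rounding leaves more margin.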
// Receive with timeout: never blocks forever
bool uart_recv_timeout(uint8_t *out, uint32_t timeout_ms) {
    uint32_t start = systick_ms;

    // Poll RXNE (Receive Not Empty) bit in status register
    while (!(USART2_SR & USART_SR_RXNE)) {
        if ((systick_ms - start) >= timeout_ms)
            return false; // timed out (host disconnected?)
    }
    *out = (uint8_t)USART2_DR;
    return true;
}
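The elapsed-time test `(systick_ms - start) >= timeout_ms` stays correct even when the millisecond counter wraps around, because unsigned subtraction is performed modulo 2^32. Here is a minimal host-testable sketch of that arithmetic; the helper name `timeout_elapsed` is hypothetical, and on the target `systick_ms` is assumed to be a `volatile uint32_t` incremented in `SysTick_Handler`.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once at least timeout_ms milliseconds have passed since start_ms.
 * Unsigned subtraction wraps modulo 2^32, so the result is still correct
 * when now_ms has wrapped past zero and start_ms has not. */
bool timeout_elapsed(uint32_t start_ms, uint32_t now_ms, uint32_t timeout_ms) {
    return (uint32_t)(now_ms - start_ms) >= timeout_ms;
}
```

At a 1 kHz tick the 32-bit counter wraps after roughly 49.7 days, so the wrap case is rare in a bootloader, but the subtraction form costs nothing and removes the edge case entirely.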
Each decision was made deliberately — not defaults, not framework choices.
Every peripheral is controlled via direct register access. The HAL can add roughly 10–50 KB of overhead and hides what is actually happening. A bootloader must be small, fast, and dependency-free.
The boot flag in sector 2 is erased only after a successful write and readback verify. If power is cut during flashing, the flag survives and the bootloader retries on the next boot, so an interrupted update cannot leave the device without a way to recover.
The entire firmware image is received into RAM and CRC-verified before a single flash sector is erased, so a corrupt transfer writes nothing and cannot leave partial or corrupted firmware in flash. The trade-off is that the image must fit in the STM32F411RE's 128 KB of SRAM.
Every UART receive call has a 5-second timeout via SysTick. If the host crashes or disconnects mid-transfer, the bootloader recovers gracefully rather than hanging indefinitely.
After writing the full image, every byte is read back and compared against the RAM buffer. Flash write failures are caught before the boot flag is cleared — the source of truth is the hardware, not assumptions.
The custom linker script confines the bootloader to 48 KB (sectors 0–2). If the code ever grows beyond that, the linker errors at build time. The boundary is enforced mechanically, not by discipline.