ARM Cortex-M Interview Questions | Embedded Systems

Interview questions & answers

Q1. What is the ARM Cortex-M architecture and how does it differ from the Cortex-A series?

ARM Cortex-M is a family of 32-bit RISC processor cores optimized for microcontroller applications (deterministic interrupt latency, low power, and a small memory footprint), while Cortex-A is designed for application processors running a rich OS such as Linux, with an MMU, caches, deep pipelines and, in the larger cores, out-of-order execution. An STM32H743 uses a Cortex-M7 core at 480 MHz with 16 KB each of instruction and data cache, achieving 1027 DMIPS in an MCU package, while a Cortex-A55 in a phone SoC runs at around 1.8–2 GHz but needs an OS, external DDR memory, and far more power at the system level. The Cortex-M3/M4's 12-cycle interrupt entry latency (with zero-wait-state memory) gives a bounded, predictable response that Cortex-A systems cannot easily guarantee because of caches, branch prediction, and OS scheduling.

Follow-up: What is the Cortex-M hierarchy from M0 to M7 and what features distinguish each variant?

Q2. What is the NVIC in ARM Cortex-M and how does interrupt priority work?

The NVIC (Nested Vectored Interrupt Controller) is an integral part of the Cortex-M core that handles interrupt reception, priority comparison, preemption, and tail-chaining without software overhead, using a vector table of handler addresses stored in flash at 0x00000000 or relocated via VTOR. On the STM32F407 the NVIC serves 82 maskable interrupt channels with 4 implemented priority bits (16 levels), and a lower numeric priority means higher urgency: priority 0 preempts priority 15. Priority grouping (the PRIGROUP field in SCB->AIRCR) splits these bits into a preemption (group) priority and a sub-priority. Only the preemption priority decides whether one interrupt can nest inside another; the sub-priority merely orders pending interrupts that share the same preemption level. For this reason FreeRTOS recommends assigning all bits to preemption priority on STM32 (NVIC_PRIORITYGROUP_4).

Follow-up: What is the difference between preemption priority and sub-priority in the NVIC?

Q3. What registers are automatically saved to the stack by Cortex-M when an interrupt occurs?

The Cortex-M hardware automatically pushes R0, R1, R2, R3, R12, LR (link register), PC, and xPSR onto the active stack when an exception is accepted. These are the registers a C function is allowed to clobber under the AAPCS, plus the return state, so the ISR can be an ordinary C function, and the core restores them automatically on exception return. On a Cortex-M4F, if the interrupted code has an active floating-point context (CONTROL.FPCA = 1), the frame is extended with S0–S15, FPSCR and a reserved word (26 words in total); with lazy stacking enabled (FPCCR.LSPEN = 1) the core only reserves the space and writes those registers if the ISR actually executes a floating-point instruction. On a Cortex-M3/M4 with zero-wait-state memory, stacking the 8-word frame overlaps with the vector fetch, giving the 12-cycle interrupt entry latency that makes Cortex-M timing deterministic enough for hard real-time systems.
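
The frame layout can be written down as a C struct. This is a sketch assuming the basic ARMv7-M frame with no alignment padding word; the type and helper names are hypothetical. On target, a fault handler would obtain the frame pointer from the active stack (for example via the CMSIS __get_PSP() or __get_MSP() intrinsics) and read the stacked PC at offset 24.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame, lowest address first: after stacking, the stack
   pointer points at r0. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* return address: the instruction to resume */
    uint32_t xpsr;
} exception_frame_t;

_Static_assert(sizeof(exception_frame_t) == 8u * 4u, "basic frame is 8 words");
_Static_assert(offsetof(exception_frame_t, pc) == 24u, "stacked PC at SP+24");

/* Words pushed: 8 for the basic frame, 26 for the extended frame
   (adds S0-S15, FPSCR and one reserved word). The optional alignment
   padding word is not counted. */
static unsigned frame_words(int fp_context_active)
{
    return fp_context_active ? 26u : 8u;
}

/* Reads the stacked return address, as a fault handler would. */
static uint32_t stacked_pc(const exception_frame_t *frame)
{
    assert(frame != NULL);
    return frame->pc;
}
```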

Follow-up: What is lazy stacking for floating-point registers and how does it reduce interrupt latency?

Q4. What is the difference between MSP and PSP in ARM Cortex-M?

MSP (Main Stack Pointer) is always used in Handler mode (interrupts and exceptions) and is also used in Thread mode by default after reset, while PSP (Process Stack Pointer) is used in Thread mode once CONTROL.SPSEL is set, which is how an RTOS gives each task its own stack separate from the kernel and interrupt stack. FreeRTOS on STM32 initializes each task's stack with a prepared exception frame; on the first task start an SVC exception restores that frame and returns to Thread mode on the PSP, and from then on the PendSV and SysTick handlers that perform context switching run on the MSP. Separating stacks means interrupt nesting never eats into a task's stack budget, and combined with the MPU it lets a runaway task be caught before it corrupts the kernel stack, which makes it a prerequisite for any MPU-protected RTOS.
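
Which stack an exception returns to is encoded in the EXC_RETURN value the core writes to LR on entry. A minimal decoder, assuming the ARMv7-M encoding (bit 2 selects PSP, bit 3 selects Thread mode, bit 4 clear means an extended FP frame); the type and function names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of an ARMv7-M EXC_RETURN value (placed in LR on exception entry). */
typedef struct {
    bool return_to_thread; /* bit 3: 1 = Thread mode, 0 = Handler mode */
    bool uses_psp;         /* bit 2: 1 = frame is on PSP, 0 = on MSP */
    bool extended_frame;   /* bit 4: 0 = frame includes FP registers */
} exc_return_info_t;

static exc_return_info_t decode_exc_return(uint32_t exc_return)
{
    /* Valid ARMv7-M EXC_RETURN values have bits [31:5] all set. */
    assert((exc_return & 0xFFFFFFE0u) == 0xFFFFFFE0u);
    exc_return_info_t info = {
        .return_to_thread = (exc_return & (1u << 3)) != 0,
        .uses_psp         = (exc_return & (1u << 2)) != 0,
        .extended_frame   = (exc_return & (1u << 4)) == 0,
    };
    return info;
}
```

0xFFFFFFFD, the value used to start or resume a task, decodes as Thread mode on the PSP with a basic frame; 0xFFFFFFF1 is a nested return to Handler mode on the MSP.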

Follow-up: How does an RTOS use the PendSV exception for context switching on Cortex-M?

Q5. What is the Cortex-M memory map and what is at each major address range?

The Cortex-M architecture defines a fixed 4 GB memory map: 0x00000000–0x1FFFFFFF (Code region, typically flash), 0x20000000–0x3FFFFFFF (SRAM), 0x40000000–0x5FFFFFFF (peripheral registers), 0x60000000–0x9FFFFFFF (external RAM), 0xA0000000–0xDFFFFFFF (external devices), and 0xE0000000–0xFFFFFFFF (system region, including NVIC, SysTick, ITM, and core debug). Vendors place their memories within these regions: on an STM32F407, flash is at 0x08000000 (aliased at 0x00000000 when the BOOT pins select boot from main flash), SRAM1 is at 0x20000000, and the peripheral buses start at 0x40000000 (APB1), 0x40010000 (APB2), 0x40020000 (AHB1) and 0x50000000 (AHB2). Each peripheral occupies its own block, typically 0x400 bytes, at a base address listed in the reference manual. Understanding this map is essential for writing bare-metal startup code and memory-mapped peripheral drivers.
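
A small lookup table captures the architectural regions. This host-side sketch assumes the ARMv7-M default map listed above and ignores vendor-specific holes inside each region; the table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t first;    /* inclusive */
    uint32_t last;     /* inclusive */
    const char *name;
} mem_region_t;

static const mem_region_t cortex_m_regions[] = {
    {0x00000000u, 0x1FFFFFFFu, "Code"},
    {0x20000000u, 0x3FFFFFFFu, "SRAM"},
    {0x40000000u, 0x5FFFFFFFu, "Peripheral"},
    {0x60000000u, 0x9FFFFFFFu, "External RAM"},
    {0xA0000000u, 0xDFFFFFFFu, "External device"},
    {0xE0000000u, 0xFFFFFFFFu, "System"},
};

/* Returns the architectural region containing addr. */
static const mem_region_t *cortex_m_region(uint32_t addr)
{
    for (size_t i = 0; i < sizeof cortex_m_regions / sizeof cortex_m_regions[0]; ++i) {
        if (addr >= cortex_m_regions[i].first && addr <= cortex_m_regions[i].last)
            return &cortex_m_regions[i];
    }
    assert(0 && "the six regions cover the whole 4 GB space");
    return NULL;
}
```

For example, GPIOA's base on the STM32F407 (0x40020000) lands in the Peripheral region and the NVIC's ISER0 register (0xE000E100) in the System region.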

Follow-up: What is the bit-band region in Cortex-M3/M4 and how does it enable atomic bit manipulation?

Q6. What are the low power modes available in Cortex-M and how are they used?

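The Cortex-M architecture defines two low-power states: Sleep, where the CPU clock stops while peripherals keep running and any enabled interrupt wakes the core, and Deep Sleep (selected by SCR.SLEEPDEEP), where the device gates most clocks while preserving state and only specific wakeup sources can wake it. Vendors build their own modes on top of these: on STM32, Stop and Standby are deep-sleep variants with progressively more clock and regulator gating, and in Standby only a few wakeup sources such as the wakeup pins and the RTC remain active. An STM32L476 in Stop 2 mode draws on the order of 1 µA with the RTC running from the LSE oscillator and wakes on an RTC alarm or EXTI line within a few microseconds (see the device datasheet for exact figures). The WFI and WFE instructions (exposed in CMSIS as __WFI() and __WFE()) enter the selected sleep state, and setting the SLEEPONEXIT bit in SCR makes the core go back to sleep as soon as an ISR returns to Thread mode, implementing a purely interrupt-driven low-power design without a busy main loop.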
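
Choosing between Sleep and Deep Sleep comes down to two SCR bits. The sketch below assumes the ARMv7-M SCB->SCR layout (SLEEPONEXIT is bit 1, SLEEPDEEP is bit 2) and computes the value to write before WFI; the enum and helper are hypothetical, and the STM32 PWR register settings that choose between Stop, Standby and other vendor modes are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCB->SCR bit positions (ARMv7-M). */
#define SCR_SLEEPONEXIT (1u << 1)
#define SCR_SLEEPDEEP   (1u << 2)

typedef enum { CORE_SLEEP, CORE_DEEP_SLEEP } core_sleep_t;

/* Returns the SCR value to program before executing WFI; other bits in scr
   are preserved. */
static uint32_t scr_for_sleep(uint32_t scr, core_sleep_t mode, bool sleep_on_exit)
{
    switch (mode) {
    case CORE_SLEEP:      scr &= ~SCR_SLEEPDEEP; break;
    case CORE_DEEP_SLEEP: scr |=  SCR_SLEEPDEEP; break;
    default:              assert(0 && "unknown sleep mode"); break;
    }
    if (sleep_on_exit)
        scr |= SCR_SLEEPONEXIT;
    else
        scr &= ~SCR_SLEEPONEXIT;
    return scr;
}
```

On target the result would be written back with SCB->SCR = scr_for_sleep(SCB->SCR, CORE_DEEP_SLEEP, true); followed by __WFI().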

Follow-up: What is the difference between WFI (Wait For Interrupt) and WFE (Wait For Event) in Cortex-M?

Q7. What is the SysTick timer in ARM Cortex-M and how is it used by an RTOS?

SysTick is a 24-bit down-counter built into the Cortex-M core (mandatory on ARMv7-M, optional but usually present on ARMv6-M parts such as Cortex-M0/M0+). It counts down from its reload value on each tick of its clock source (the processor clock or an implementation-defined reference clock), reloads on reaching zero and, when enabled, raises the SysTick exception, giving an OS tick that does not consume a vendor peripheral timer. FreeRTOS on STM32 typically configures SysTick for a 1 kHz tick (every 1 ms); its SysTick handler calls xTaskIncrementTick(), which manages time delays, and pends the PendSV exception to perform a context switch if a higher-priority task has been unblocked. Because SysTick is part of the core, it is available regardless of which STM32, LPC, or EFM32 variant is used, making FreeRTOS port files portable across Cortex-M devices.

Follow-up: What happens if SysTick is also used by the HAL library and FreeRTOS simultaneously?

Q8. What is the Memory Protection Unit (MPU) in Cortex-M and how does it protect tasks?

The MPU is an optional hardware unit that vendors can include on Cortex-M3/M4/M7/M33 cores. It divides the address space into up to 8 or 16 configurable regions, each with independent access permissions (read/write, execute-never, privileged-only or unprivileged access), allowing the RTOS to prevent tasks from accessing other tasks' stacks, kernel data, or hardware registers not assigned to them. FreeRTOS-MPU on STM32F4 gives each restricted task access to its own stack plus a small number of task-defined regions (for example a shared buffer or a peripheral), while TCBs and other kernel data stay privileged-only; a stack overflow or wild pointer in one task then generates a MemManage fault instead of silently corrupting kernel state. Cortex-M33 (used in STM32U5) adds TrustZone and supports up to 16 MPU regions per security state, with the exact count chosen by the silicon vendor, for secure/non-secure partitioning.

Follow-up: What exception is triggered when an MPU access violation occurs and how is it handled?

Q9. What is the difference between privileged and unprivileged modes in Cortex-M?

In privileged mode (always in Handler mode, and in Thread mode when CONTROL.nPRIV = 0), the CPU has full access to all system registers, including CONTROL, PRIMASK, and the MPU configuration. In unprivileged Thread mode, writes to these special registers are ignored, accesses to the System Control Space fault, and the MPU's unprivileged permissions apply. The RTOS kernel runs privileged while user tasks run unprivileged; a task requests a privileged service by executing an SVC instruction instead of manipulating hardware directly. Together with a configured MPU, this separation is the hardware foundation of the OS protection model: even a buggy or malicious task cannot disable interrupts, modify another task's stack, or access unauthorized peripherals.
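
The CONTROL register holds the privilege and stack-selection bits described above. A sketch decoding its ARMv7-M layout (nPRIV bit 0, SPSEL bit 1, FPCA bit 2 on FPU cores); on target the value would come from the CMSIS __get_CONTROL() intrinsic, and the decoder names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CONTROL register bits (ARMv7-M). */
#define CONTROL_nPRIV (1u << 0) /* 1 = Thread mode is unprivileged */
#define CONTROL_SPSEL (1u << 1) /* 1 = Thread mode uses PSP */
#define CONTROL_FPCA  (1u << 2) /* 1 = FP context active (FPU cores only) */

typedef struct {
    bool thread_unprivileged;
    bool thread_uses_psp;
    bool fp_context_active;
} control_info_t;

static control_info_t decode_control(uint32_t control)
{
    /* Bits [31:3] are reserved and read as zero on ARMv7-M. */
    assert((control & ~0x7u) == 0);
    control_info_t info = {
        .thread_unprivileged = (control & CONTROL_nPRIV) != 0,
        .thread_uses_psp     = (control & CONTROL_SPSEL) != 0,
        .fp_context_active   = (control & CONTROL_FPCA) != 0,
    };
    return info;
}
```

CONTROL = 0x3 is the usual state of a restricted RTOS task: unprivileged Thread mode running on the PSP.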

Follow-up: What is the CONTROL register in Cortex-M and what bits does it contain?

Q10. What is tail-chaining in Cortex-M interrupt handling?

Tail-chaining is the Cortex-M hardware optimization where, if another exception is pending when an ISR completes, the core skips unstacking and restacking the 8-register frame and goes straight to fetching the next vector, so the next ISR starts after only 6 cycles on a Cortex-M3/M4. On an STM32F4, where the CAN transmit and receive-FIFO interrupts have separate vectors, if both are pending together the first handler completes and tail-chains into the second. Without tail-chaining the core would unstack the frame (roughly 10 cycles) and immediately stack it again (12 cycles); tail-chaining replaces those roughly 22 cycles with 6, saving about 16 cycles, or roughly 95 ns at 168 MHz. This is one reason Cortex-M achieves lower worst-case interrupt latency than the classic ARM7TDMI, which had no equivalent hardware mechanism.
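
The saving can be checked with simple arithmetic. The cycle counts below are assumptions (12 cycles entry, about 10 cycles exit, 6 cycles tail-chain on a Cortex-M3/M4 with zero-wait-state memory); real figures depend on memory wait states and bus contention. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Cortex-M3/M4 figures with zero-wait-state memory. */
#define ENTRY_CYCLES      12u  /* stack frame + vector fetch */
#define EXIT_CYCLES       10u  /* approximate unstacking on return */
#define TAIL_CHAIN_CYCLES  6u  /* ISR-to-ISR when another exception is pending */

static double cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    assert(cpu_hz > 0);
    return (double)cycles * 1e9 / (double)cpu_hz;
}

/* Cycles saved between two back-to-back ISRs by tail-chaining. */
static uint32_t tail_chain_saving_cycles(void)
{
    return (EXIT_CYCLES + ENTRY_CYCLES) - TAIL_CHAIN_CYCLES;
}
```

With these assumptions the saving is 16 cycles, about 95 ns at 168 MHz.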

Follow-up: What is late arrival preemption in Cortex-M and how does it interact with tail-chaining?

Q11. What are the fault exceptions in Cortex-M and what triggers each?

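The main fault exceptions are HardFault (always enabled; taken for faults whose configurable handler is disabled or cannot run), MemManage (MPU violation, or an instruction fetch from an execute-never region), BusFault (an error response from the bus on an instruction fetch, data access, or exception stacking, for example an access to unmapped memory), UsageFault (undefined instruction, branching to an address with the Thumb bit clear, unaligned LDM/STM/LDRD or any unaligned access when CCR.UNALIGN_TRP is set, and divide by zero when CCR.DIV_0_TRP is set), and, on TrustZone-enabled Cortex-M33 devices, SecureFault. MemManage, BusFault and UsageFault must be enabled in SCB->SHCSR, otherwise they escalate to HardFault; Cortex-M0/M0+ has no configurable fault handlers at all, so every fault there is a HardFault. On an STM32F4, a load from 0x00000000 is a valid access because flash is aliased there, but a load from an unmapped address such as 0xCCCCCCCC raises a precise BusFault: the BFARVALID bit in CFSR is set and BFAR holds the faulting address, which the handler can report during debugging (a buffered store may instead cause an imprecise BusFault with no valid BFAR). Implementing fault handlers that print LR (the EXC_RETURN value), the stacked PC (read from the MSP or PSP frame according to bit 2 of EXC_RETURN), and CFSR to a UART or RTT buffer is the first debugging strategy every firmware engineer must know.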
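
A fault handler usually starts by decoding CFSR (SCB->CFSR at 0xE000ED28), which packs the MemManage, BusFault and UsageFault status registers into one word. A host-side sketch, assuming the ARMv7-M bit assignments; the table and function names are hypothetical, and MLSPERR/LSPERR exist only on cores with an FPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint32_t mask; const char *name; } cfsr_bit_t;

static const cfsr_bit_t cfsr_bits[] = {
    {1u << 0,  "IACCVIOL"},    {1u << 1,  "DACCVIOL"},   /* MMFSR */
    {1u << 3,  "MUNSTKERR"},   {1u << 4,  "MSTKERR"},
    {1u << 5,  "MLSPERR"},     {1u << 7,  "MMARVALID"},
    {1u << 8,  "IBUSERR"},     {1u << 9,  "PRECISERR"},  /* BFSR */
    {1u << 10, "IMPRECISERR"}, {1u << 11, "UNSTKERR"},
    {1u << 12, "STKERR"},      {1u << 13, "LSPERR"},
    {1u << 15, "BFARVALID"},
    {1u << 16, "UNDEFINSTR"},  {1u << 17, "INVSTATE"},   /* UFSR */
    {1u << 18, "INVPC"},       {1u << 19, "NOCP"},
    {1u << 24, "UNALIGNED"},   {1u << 25, "DIVBYZERO"},
};

/* Writes the names of all set CFSR bits, space-separated, into buf. */
static void cfsr_describe(uint32_t cfsr, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; ++i) {
        if (cfsr & cfsr_bits[i].mask) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", cfsr_bits[i].name);
        }
    }
}
```

For the unmapped load described above, the expected decode is "PRECISERR BFARVALID", with the faulting address in BFAR.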

Follow-up: How do you identify the faulting instruction address from inside a HardFault handler?

Q12. What is the difference between ARM Thumb-2 and ARM32 instruction sets?

ARM32 instructions are all 32 bits wide, while the original Thumb instructions are 16 bits; Thumb-2 mixes 16-bit and 32-bit instructions in the same stream, and ARM has quoted roughly 26% smaller code than ARM32 with only a small performance cost. Cortex-M cores execute only Thumb code and have no ARM32 state (Cortex-M0/M0+ implement the 16-bit Thumb set plus a few 32-bit instructions, while ARMv7-M and later cores implement full Thumb-2). This is why bit 0 of every address loaded into the PC, including function pointers and vector table entries, must be 1 (the Thumb bit); clearing it causes an INVSTATE UsageFault (a HardFault on Cortex-M0), a common bug source when porting code from ARM7TDMI. Compiling for Cortex-M4 with arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb generates Thumb-2 code including the DSP instructions; FPU instructions additionally require -mfpu=fpv4-sp-d16 with -mfloat-abi=softfp or -mfloat-abi=hard.
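
The Thumb-bit rule is easy to check mechanically, for example when validating addresses taken from a map file or a vector table. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Converts the address of a Thumb function's first instruction (always
   halfword aligned) into the value that must be loaded into PC or a vector
   table entry: bit 0 set marks Thumb state. */
static uint32_t thumb_branch_target(uint32_t code_addr)
{
    assert((code_addr & 1u) == 0);   /* instruction addresses are even */
    return code_addr | 1u;
}

/* A target with bit 0 clear would request ARM state, which Cortex-M cannot
   execute (INVSTATE UsageFault, or HardFault on Cortex-M0/M0+). */
static bool is_valid_cortex_m_target(uint32_t target)
{
    return (target & 1u) != 0;
}
```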

Follow-up: What is the IT (If-Then) instruction in Thumb-2 and how does it enable conditional execution?

Q13. How does the Cortex-M4 FPU work and what are its configuration requirements?

The Cortex-M4 has a single-precision FPU (FPv4-SP) with 32 single-precision registers S0–S31 (aliased as 16 double-word registers D0–D15), supporting IEEE 754 arithmetic including hardware square root and fused multiply-accumulate (VFMA). It is disabled after reset and must be enabled by setting the CP10 and CP11 fields of SCB->CPACR to 0b11 (full access) before any floating-point instruction executes; ARM recommends following the write with DSB and ISB barriers. In STM32 CMSIS startup code this is done in SystemInit() when the FPU is in use; omitting it makes the first floating-point instruction raise a NOCP UsageFault (escalated to HardFault if UsageFault is not enabled). The FPU makes DSP algorithms such as IIR filters on 48 kHz audio practical: illustratively, a load of around 10–20% on a 168 MHz Cortex-M4 could become 60–80% on a Cortex-M3 using software floating point, depending on filter order and implementation.
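
The CPACR update and the DSP cycle budget can both be expressed in a few lines. This sketch assumes the ARMv7-M CPACR layout (CP10 in bits [21:20], CP11 in bits [23:22]); the helpers are hypothetical. On target the equivalent is SCB->CPACR |= (0xFu << 20); __DSB(); __ISB();. The budget helper shows that a 168 MHz core has 3500 cycles per 48 kHz sample, so a 20% load corresponds to about 700 cycles per sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPACR: CP10 is bits [21:20], CP11 is bits [23:22]; 0b11 = full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

static uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

static bool cpacr_fpu_full_access(uint32_t cpacr)
{
    return (cpacr & CPACR_CP10_CP11_FULL) == CPACR_CP10_CP11_FULL;
}

/* CPU cycles available per audio sample: the budget a DSP loop must fit in. */
static uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_hz)
{
    assert(sample_hz > 0);
    return cpu_hz / sample_hz;
}
```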

Follow-up: What is the ABI difference between hard-float and soft-float compilation for Cortex-M4?

Q14. What is the vector table and how does it get relocated for bootloaders?

The vector table is an array of 32-bit words starting at 0x00000000 (or at the VTOR address): entry 0 holds the initial MSP value, entry 1 the Reset_Handler address, followed by NMI, HardFault, the other system exceptions, and then the peripheral interrupt vectors from entry 16 onward. In a two-stage bootloader for STM32F4, the bootloader occupies flash sector 0 (0x08000000–0x08003FFF); on detecting a valid application it writes SCB->VTOR = 0x08004000, loads the MSP from the application's entry 0, and jumps to the application's Reset_Handler, after which the NVIC fetches handlers from the application's own vector table. Forgetting to set VTOR before enabling application interrupts is one of the most common bootloader bugs: peripheral interrupts then vector into the bootloader's handlers, typically ending in a HardFault or a hang.

Follow-up: What are the alignment requirements for VTOR in Cortex-M3/M4?

Q15. How do you implement a critical section in Cortex-M without an RTOS?

A critical section on Cortex-M can be implemented by disabling all maskable interrupts with __disable_irq() (the CPSID i instruction, which sets PRIMASK) before the critical block and re-enabling them with __enable_irq() (CPSIE i) afterwards, but saving and restoring PRIMASK is preferable because it supports nesting. In STM32 bare-metal code, the pattern uint32_t primask = __get_PRIMASK(); __disable_irq(); /* critical */ __set_PRIMASK(primask); handles nesting correctly: if interrupts were already disabled by an outer section, they remain disabled after the inner section exits. On Cortex-M3/M4/M7, the BASEPRI register can be used instead to mask only interrupts whose priority value is numerically greater than or equal to a threshold, leaving more urgent interrupts active during the critical section; this is the method the FreeRTOS ports for these cores use for taskENTER_CRITICAL(). Cortex-M0/M0+ lacks BASEPRI, so their ports fall back to PRIMASK.
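
The nesting behaviour of the PRIMASK pattern, and the BASEPRI masking rule, can be modelled on a host. In this sketch the variable sim_primask and the functions crit_enter/crit_exit/basepri_masks are hypothetical stand-ins for the CMSIS __get_PRIMASK(), __disable_irq() and __set_PRIMASK() calls and for the hardware comparison; priority values are raw 8-bit register values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side model of PRIMASK (0 = interrupts enabled, 1 = masked). */
static uint32_t sim_primask = 0;

static uint32_t crit_enter(void)
{
    uint32_t saved = sim_primask;   /* __get_PRIMASK() */
    sim_primask = 1u;               /* __disable_irq() */
    return saved;
}

static void crit_exit(uint32_t saved)
{
    assert(saved <= 1u);            /* PRIMASK is a 1-bit register */
    sim_primask = saved;            /* __set_PRIMASK(saved) */
}

/* BASEPRI masks an interrupt whose priority byte is numerically >= BASEPRI;
   BASEPRI == 0 masks nothing. */
static bool basepri_masks(uint8_t irq_priority, uint8_t basepri)
{
    return basepri != 0 && irq_priority >= basepri;
}
```

Calling crit_enter() twice and then crit_exit() for the inner section leaves interrupts masked; only the outer crit_exit() re-enables them, which is exactly the nesting guarantee of the save-and-restore pattern.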

Follow-up: Why is using BASEPRI instead of PRIMASK preferable in an RTOS critical section?

Common misconceptions

Misconception: ARM Cortex-M and ARM Cortex-A processors are programmed the same way.

Correct: Cortex-A is normally programmed under an OS that manages the MMU and caches; Cortex-M is programmed bare-metal or with a lightweight RTOS without an MMU, and uses a completely different startup model, vector table, and exception handling mechanism.

Misconception: Lower interrupt priority number means the interrupt is less important.

Correct: In Cortex-M NVIC, a lower numerical priority value means higher urgency — priority 0 is the highest possible and preempts priority 15; this is opposite to some other architectures.

Misconception: Disabling interrupts with __disable_irq() is always the correct way to implement a critical section.

Correct: Disabling all interrupts also blocks high-priority, time-critical ISRs; on Cortex-M3/M4/M7, using BASEPRI to mask only interrupts at or below a priority threshold lets the most urgent interrupts still fire, which is how FreeRTOS implements taskENTER_CRITICAL() on those cores without blocking real-time responses.

Misconception: The SysTick timer is a peripheral timer like TIM1 or TIM2 on STM32.

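Correct: SysTick is a 24-bit counter built into the Cortex-M core itself (optional on Cortex-M0/M0+, though most devices implement it), independent of any vendor peripheral timer and available regardless of MCU vendor; its exception has a fixed position in the vector table (exception 15), but its priority is configurable.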

Quick one-liners

What is the interrupt entry latency of Cortex-M3/M4? 12 clock cycles from interrupt assertion to the first instruction of the ISR, with zero-wait-state flash.

What registers are automatically stacked on Cortex-M interrupt entry? R0, R1, R2, R3, R12, LR, PC, and xPSR: 8 registers pushed automatically to the stack.

What is the NVIC? Nested Vectored Interrupt Controller, a hardware unit inside every Cortex-M core that manages interrupt priority, preemption, and tail-chaining.

What instruction enters sleep mode on Cortex-M? WFI (Wait For Interrupt, __WFI() in CMSIS), which halts the CPU until an interrupt or other wakeup event occurs.

What is the purpose of the VTOR register? The Vector Table Offset Register relocates the interrupt vector table base address, which is essential for separating a bootloader from its application.

What fault is triggered by an MPU access violation? A MemManage fault (configurable; it escalates to HardFault if MemManage is not enabled).

What is tail-chaining in Cortex-M? Skipping the stack pop and push between two back-to-back ISRs, which cuts the ISR-to-ISR latency to 6 cycles on Cortex-M3/M4.

What bits must be set in CPACR to enable the Cortex-M4 FPU? The CP10 and CP11 fields must both be set to 0b11 (full access) in SCB->CPACR during startup.

What is the difference between MSP and PSP? MSP (Main Stack Pointer) is always used in Handler mode and by default in Thread mode; PSP (Process Stack Pointer) is used in Thread mode when CONTROL.SPSEL = 1, which is how RTOS tasks run on their own stacks.

What is the Thumb-2 instruction set? A mixed 16-bit and 32-bit instruction encoding; Cortex-M cores execute only Thumb code (full Thumb-2 on ARMv7-M and later, a subset on Cortex-M0/M0+), with code roughly 26% smaller than ARM32 at close to ARM32 performance.
