Side-by-side comparison
| Parameter | Interrupt Data Transfer | DMA |
|---|---|---|
| CPU Involvement per Transfer | Full — ISR fires per byte or block; CPU saves context, runs ISR | Zero during transfer — DMA copies autonomously |
| CPU Load at 1 Mbps UART | ~30–50% CPU in ISRs (100,000 interrupts/s at 8N1, 3–5 µs each) | ~2% CPU (setup + completion ISR only) |
| Latency to First Byte | Low — ISR fires within 12 cycles (ARM Cortex-M4) | Slightly higher — DMA request, arbitration, bus grant: ~3–5 extra cycles |
| Setup Complexity | Configure NVIC priority, ISR function, volatile flag | Configure DMA stream, direction, burst size, M/P increment, enable DMAEN |
| Transfer Size | Efficient for 1–8 bytes per event | Efficient for 32 bytes to MB blocks |
| Memory-to-Memory | Not applicable | Possible — DMA can copy between two SRAM regions without CPU |
| Circular Mode | Not native — must re-enable in ISR | DMA circular mode — auto-restarts; perfect for ADC sampling buffers |
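| Bus Bandwidth Impact | Every byte moves through the CPU over AHB/APB, so transfers compete with the CPU's own instruction and data accesses | DMA is a separate master on the AHB bus matrix; CPU and DMA run in parallel when they target different slaves and are arbitrated when they collide (STM32 AHB matrix) |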
| Typical Use Case | Low-rate GPIO events, single-byte SPI commands, keypad scans | ADC streaming (12-bit, 1 MSPS), UART at > 115200, I2S audio, SD card block writes |
| Example MCU | STM32F4: UART RXNE interrupt, 1 byte per ISR at 9600 baud | STM32F4: DMA2 Stream0 Ch0 → ADC1 → 1024-sample SRAM buffer, half-transfer + complete ISR |
Key differences
An interrupt fires once per transfer event. For a 256-byte packet received over UART at 115200 baud, that means 256 ISR entries, 256 hardware stackings of 8 registers, 256 branches to the ISR, and 256 returns: roughly 6000 clock cycles of pure overhead on a Cortex-M4 (about 24 per interrupt) before any useful ISR code runs. DMA raises two interrupts (half-transfer and transfer-complete) for the same 256 bytes, which is about 50 overhead cycles. On the STM32F4 the DMA controllers are separate masters on the AHB bus matrix, so the CPU and DMA can transfer at the same time as long as they target different slaves; when both contend for the same memory or peripheral bridge, the matrix arbitrates and the CPU may stall for a few cycles. DMA does not replace interrupts: the DMA completion ISR is still needed to process the received buffer. What DMA removes is the per-byte interrupt overhead, while keeping the end-of-block notification.
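The cycle counts above can be checked with a rough model. The sketch below assumes about 24 cycles of entry-plus-exit overhead per interrupt; that figure and the function name are illustrative assumptions, and a real ISR body adds its own cycles on top.

```python
# Rough model of interrupt entry/exit overhead for one received packet.
# ASSUMPTION: ~24 cycles per interrupt (entry + exit); the ISR body is not counted.
CYCLES_PER_INTERRUPT = 24


def overhead_cycles(num_interrupts: int,
                    cycles_per_interrupt: int = CYCLES_PER_INTERRUPT) -> int:
    """Return the total cycles spent entering and leaving ISRs."""
    return num_interrupts * cycles_per_interrupt


packet_bytes = 256
per_byte = overhead_cycles(packet_bytes)  # one ISR per received byte
dma = overhead_cycles(2)                  # half-transfer + transfer-complete ISRs

print(f"per-byte interrupts: {per_byte} cycles")
print(f"DMA with two ISRs:   {dma} cycles")
print(f"ratio: {per_byte // dma}x")
```

The ratio depends only on the number of interrupts, so it holds for any per-interrupt cost you plug in.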
When to use Interrupt Data Transfer
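Use interrupt-driven transfer for low-rate events (< 10 kHz), short bursts (up to about 8 bytes), or when the data must be processed byte by byte as it arrives. Example: an STM32L4 I2C slave at 400 kHz takes one receive interrupt per byte. Each byte occupies 9 bit times on the bus (8 data bits plus ACK), so the ISR rate inside a frame peaks at 400 kHz / 9 ≈ 44 kHz. That is above the 10 kHz guideline, but with only 8 bytes per frame the burst is short and manageable without DMA.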
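The ISR rate for a byte-wise I2C slave can be estimated from the bit timing. This sketch assumes 9 bit times per byte (8 data bits plus ACK), one address byte per frame, and one bit time each for START and STOP. These framing assumptions are illustrative, and the exact figure shifts with how frame overhead and bus idle time are counted.

```python
def i2c_isr_rates(scl_hz: float, data_bytes: int) -> tuple:
    """Return (peak, average) data-byte ISR rates in Hz for back-to-back frames."""
    bits_per_byte = 9                        # 8 data bits + ACK/NACK
    peak = scl_hz / bits_per_byte            # rate while bytes are streaming
    # ASSUMPTION: frame = START + address byte + data bytes + STOP
    frame_bits = 1 + bits_per_byte * (1 + data_bytes) + 1
    frame_time = frame_bits / scl_hz         # seconds per frame
    average = data_bytes / frame_time        # data-byte interrupts per second
    return peak, average


peak, avg = i2c_isr_rates(400_000, 8)
print(f"peak {peak / 1e3:.1f} kHz, average {avg / 1e3:.1f} kHz")
```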
When to use DMA
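Use DMA for any peripheral that streams more than 32 bytes continuously or generates transfer events faster than about 100 kHz. Example: an STM32F407 samples its 12-bit ADC at 1 MSPS with DMA2 in circular mode into a 1024-sample buffer treated as two 512-sample halves. The CPU receives an interrupt every 512 samples (512 µs), alternating between half-transfer and transfer-complete, and processes one half while DMA fills the other. No samples are lost as long as each half is processed within 512 µs, and the interrupt overhead itself stays below 1% of CPU time.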
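The timing budget in this example can be worked out directly. The sketch below assumes a 168 MHz core clock and 100 cycles per DMA interrupt (entry, flag clear, exit); both numbers are assumptions for illustration, not measured values.

```python
def half_buffer_budget(sample_rate_hz, buffer_samples, cpu_hz, isr_cycles):
    """Timing for circular DMA with half-transfer and transfer-complete interrupts."""
    half = buffer_samples // 2
    period_s = half / sample_rate_hz                   # time between interrupts
    cycles_available = int(round(period_s * cpu_hz))   # CPU cycles per half buffer
    isr_load = isr_cycles / cycles_available           # fraction spent in the DMA ISR
    return period_s, cycles_available, isr_load


# ASSUMPTIONS: 168 MHz core clock, 100 cycles per DMA ISR
period, cycles, load = half_buffer_budget(1_000_000, 1024, 168_000_000, 100)
print(f"interrupt every {period * 1e6:.0f} us")
print(f"{cycles} CPU cycles per half buffer ({cycles // 512} per sample)")
print(f"ISR overhead {load:.4%}")
```

The per-sample cycle count is the real constraint: whatever filtering or copying the CPU does on each half must fit inside that budget.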
Recommendation
For any data rate above 100 kbps or block size above 32 bytes, configure DMA: the reduction in CPU load is decisive. Use interrupts for low-rate, byte-by-byte events where DMA setup overhead outweighs the savings. On STM32F4, pair DMA with double-buffering (circular mode with half-transfer interrupts, or the hardware double-buffer mode) for audio and ADC streams so the CPU processes one half while DMA fills the other.
Exam tip: Examiners often ask for the CPU overhead of interrupt-driven UART receive at 115200 baud, 8N1, assuming each ISR takes 1 µs. With 10 bits per frame, the ISR fires 115200 / 10 = 11520 times per second, which is 11520 µs of ISR time per second, or 1.15% CPU. At 1 Mbps the same ISR fires 100000 times per second and costs 10% CPU, which is where DMA starts to pay off.
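The exam calculation is a one-line formula: bytes per second times ISR duration. A minimal sketch, with a hypothetical function name, using the 8N1 framing and 1 µs ISR time stated in the tip:

```python
def uart_isr_load(baud: int, bits_per_frame: int, isr_time_s: float) -> float:
    """Fraction of CPU time spent in a per-byte receive ISR."""
    # 8N1 framing = 1 start + 8 data + 1 stop = 10 bits per byte
    bytes_per_second = baud / bits_per_frame
    return bytes_per_second * isr_time_s


for baud in (115_200, 1_000_000):
    load = uart_isr_load(baud, bits_per_frame=10, isr_time_s=1e-6)
    print(f"{baud} baud: {load:.2%} CPU")
```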
Interview tip: An embedded systems interviewer may ask you to describe double-buffered DMA for ADC streaming. Explain that the buffer is split into two halves (or two separate SRAM buffers in hardware double-buffer mode); DMA fills one while the CPU processes the other, and the roles swap on each half-transfer and transfer-complete interrupt. This avoids overruns as long as the CPU finishes processing each half before DMA wraps back to it.
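The ping-pong scheme and its failure condition can be illustrated with a small host-side simulation. This is a simplified model, not driver code: one tick equals one sample written by DMA, the CPU is assumed to start on a half the moment its interrupt fires, and the function name and parameters are hypothetical.

```python
def simulate_ping_pong(buffer_samples: int, total_samples: int, process_ticks: int) -> int:
    """Count overruns when the CPU processes one half of a circular DMA buffer
    while DMA fills the other. One tick = one sample written by DMA."""
    half = buffer_samples // 2
    busy_until = [0, 0]   # tick at which the CPU finishes with each half
    overruns = 0
    for tick in range(total_samples):
        pos = tick % buffer_samples
        which = pos // half                  # half DMA is writing into now
        if pos % half == 0 and tick < busy_until[which]:
            overruns += 1                    # DMA re-entered a half still in use
        if pos % half == half - 1:
            # Last sample of this half: half-transfer (which == 0) or
            # transfer-complete (which == 1) interrupt; CPU starts processing.
            busy_until[which] = tick + 1 + process_ticks
    return overruns


print(simulate_ping_pong(1024, 10240, 400))  # processing fits inside one half period
print(simulate_ping_pong(1024, 10240, 600))  # processing too slow: DMA laps the CPU
```

In this model overruns appear exactly when processing one half takes longer than DMA needs to fill the other, which is the condition worth stating in an interview answer.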