DMA vs Interrupt Data Transfer | Embedded ECE

Side-by-side comparison

Parameter	Interrupt Data Transfer	DMA
CPU Involvement per Transfer	Full — ISR fires per byte or block; CPU saves context, runs ISR	Zero during transfer — DMA copies autonomously
CPU Load at 1 Mbps UART	~30–50% CPU occupied in ISR overhead	~2% CPU (setup + completion ISR only)
Latency to First Byte	Low — ISR fires within 12 cycles (ARM Cortex-M4)	Slightly higher — DMA request, arbitration, bus grant: ~3–5 extra cycles
Setup Complexity	Configure NVIC priority, ISR function, volatile flag	Configure DMA stream, direction, burst size, M/P increment, enable DMAEN
Transfer Size	Efficient for 1–8 bytes per event	Efficient for 32 bytes to MB blocks
Memory-to-Memory	Not applicable	Possible — DMA can copy between two SRAM regions without CPU
Circular Mode	Not native — must re-enable in ISR	DMA circular mode — auto-restarts; perfect for ADC sampling buffers
Bus Bandwidth Impact	CPU performs every peripheral read and memory write itself over AHB/APB	DMA is a separate master on the AHB bus matrix; CPU and DMA can work in parallel when they access different slaves (STM32 AHB matrix)
Typical Use Case	Low-rate GPIO events, single-byte SPI commands, keypad scans	ADC streaming (12-bit, 1 MSPS), UART above 115200 baud, I2S audio, SD card block writes
Example MCU	STM32F4: UART RXNE interrupt, 1 byte per ISR at 9600 baud	STM32F4: DMA2 Stream0 Ch0 → ADC1 → 1024-sample SRAM buffer, half-transfer + complete ISR

Key differences

An interrupt fires per transaction event. For a UART at 115200 baud receiving a 256-byte packet, that means 256 ISR entries and 256 returns. On a Cortex-M4 each entry costs about 12 cycles (the hardware stacks 8 registers and fetches the vector) and each return costs a similar amount, so roughly 256 × 24 ≈ 6000 clock cycles go to entry and exit overhead alone. DMA fires two ISRs (half-transfer and transfer-complete) for the same 256 bytes, about 50 cycles of overhead. The STM32F4 DMA controller is a separate master on the AHB bus matrix, so the CPU and DMA can transfer simultaneously when they access different slaves; when both target the same memory or bus, the matrix arbitrates and one briefly waits, so the CPU cost is close to zero rather than exactly zero. DMA does not replace interrupts: the DMA completion ISR is still needed to process the received buffer. DMA removes the per-byte interrupt overhead while keeping the end-of-block notification.
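The overhead comparison above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming 12 cycles to enter and 12 cycles to leave each ISR on a Cortex-M4 with zero-wait-state memory; the constants and function names are illustrative and not taken from any vendor library.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cost of one exception on a Cortex-M4 (illustrative figures):
 * about 12 cycles to enter (hardware stacking + vector fetch) and
 * about 12 cycles to return (unstacking). Handler body is NOT included. */
#define ISR_ENTRY_CYCLES 12u
#define ISR_EXIT_CYCLES  12u

/* Cycles spent purely on entering and leaving isr_count interrupts. */
uint32_t isr_overhead_cycles(uint32_t isr_count)
{
    const uint32_t per_isr = ISR_ENTRY_CYCLES + ISR_EXIT_CYCLES;
    assert(isr_count <= UINT32_MAX / per_isr); /* guard against overflow */
    return isr_count * per_isr;
}

/* Per-byte interrupt reception: one ISR for every byte in the packet. */
uint32_t overhead_per_byte_irq(uint32_t packet_bytes)
{
    return isr_overhead_cycles(packet_bytes);
}

/* DMA reception: half-transfer + transfer-complete = 2 ISRs per buffer,
 * independent of the buffer length. */
uint32_t overhead_dma(void)
{
    return isr_overhead_cycles(2u);
}
```

With these assumptions, `overhead_per_byte_irq(256)` gives 6144 cycles and `overhead_dma()` gives 48, matching the "roughly 6000" and "about 50" figures quoted above.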

When to use Interrupt Data Transfer

Use interrupt-driven transfer for low-rate events (< 10 kHz), short bursts (< 8 bytes), or when the data must be processed byte by byte as it arrives. Example: an STM32L4 I2C slave receiving an 8-byte frame at 400 kHz takes one receive interrupt per byte. Each byte occupies 9 SCL clocks (8 data bits plus ACK), so the peak ISR rate within a frame is 400 kHz / 9 ≈ 44 kHz. That is above the 10 kHz guideline, but the burst lasts only 8 bytes, so it is manageable without DMA.
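Byte-by-byte handling of this kind is commonly built around a small ring buffer that the ISR fills and the main loop drains. The sketch below is hardware-independent: `rx_isr_on_byte` is a hypothetical handler name, and on a real MCU the byte would come from the peripheral's receive data register rather than a function argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 16u  /* power of two, so indices wrap with a mask */

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;  /* written only by the ISR */
static volatile uint32_t rx_tail;  /* written only by the main loop */

/* Hypothetical per-byte receive handler, called once per interrupt.
 * Returns false if the buffer is full and the byte had to be dropped. */
bool rx_isr_on_byte(uint8_t byte)
{
    if (rx_head - rx_tail == RX_BUF_SIZE) {
        return false;                       /* overrun: consumer too slow */
    }
    rx_buf[rx_head & (RX_BUF_SIZE - 1u)] = byte;
    rx_head = rx_head + 1u;
    return true;
}

/* Main-loop side: fetch the oldest byte, if any. */
bool rx_read(uint8_t *out)
{
    assert(out != NULL);
    if (rx_head == rx_tail) {
        return false;                       /* buffer empty */
    }
    *out = rx_buf[rx_tail & (RX_BUF_SIZE - 1u)];
    rx_tail = rx_tail + 1u;
    return true;
}
```

Because the ISR only writes `rx_head` and the main loop only writes `rx_tail`, no lock is needed on a single-core MCU; the cost is one interrupt per byte, which is exactly the overhead DMA removes for larger transfers.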

When to use DMA

Use DMA for any peripheral streaming more than 32 bytes continuously or at rates above 100 kHz. Example: an STM32F407 samples its 12-bit ADC at 1 MSPS using DMA2 in circular mode into a 1024-sample buffer treated as two 512-sample halves. The CPU receives a half-transfer or transfer-complete interrupt every 512 samples (512 µs) and processes one half while DMA fills the other. No samples are lost provided each half is processed within 512 µs, and the interrupt overhead itself stays under 1% of CPU time.
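The half-buffer scheme in this example can be sketched without any hardware access. The two handler names below are hypothetical; on an STM32F4 they would be called from the DMA stream ISR after checking the half-transfer and transfer-complete flags. Averaging stands in for whatever processing the application actually performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADC_BUF_LEN 1024u
#define HALF_LEN    (ADC_BUF_LEN / 2u)

/* Destination that DMA would fill in circular mode (written by hardware). */
static volatile uint16_t adc_buf[ADC_BUF_LEN];

/* Set by the ISR to the half that is complete; cleared by the main loop. */
static const volatile uint16_t * volatile ready_half;

/* Hypothetical handlers invoked from the DMA stream ISR. */
void on_dma_half_transfer(void)     { ready_half = &adc_buf[0]; }
void on_dma_transfer_complete(void) { ready_half = &adc_buf[HALF_LEN]; }

/* Main-loop side: average the half DMA just finished.
 * Returns -1 when no half is waiting to be processed. */
int32_t process_ready_half(void)
{
    const volatile uint16_t *src = ready_half;
    if (src == NULL) {
        return -1;
    }
    ready_half = NULL;                 /* mark as consumed */

    uint32_t sum = 0;                  /* 512 x 4095 fits easily in 32 bits */
    for (size_t i = 0; i < HALF_LEN; i++) {
        sum += src[i];
    }
    return (int32_t)(sum / HALF_LEN);
}
```

The processing must finish before DMA wraps around to the same half, which at 1 MSPS means within 512 µs; if it does not, the half is overwritten while it is still being read.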

Recommendation

For any data rate above 100 kbps or block sizes above 32 bytes, configure DMA — CPU load reduction and bus throughput are decisive. Use interrupts for low-rate, byte-by-byte events where DMA setup overhead outweighs savings. On STM32F4, always pair DMA with double-buffering for audio and ADC to process one half while DMA fills the other.

Exam tip: Examiners ask students to calculate the CPU overhead of interrupt-driven UART receive at 115200 baud, 8N1, assuming each ISR takes 1 µs. Each 8N1 frame is 10 bits, so the ISR fires 115200 / 10 = 11520 times per second. That is 11520 µs of ISR time in every second, or 1.15% CPU overhead. At 1 Mbps the ISR fires 100000 times per second and the overhead becomes 10%, justifying DMA.
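The exam calculation can be checked with a short helper. This is a sketch using integer arithmetic only; the function names are illustrative, and the load is returned in hundredths of a percent to avoid floating point.

```c
#include <assert.h>
#include <stdint.h>

/* ISRs per second for per-byte reception.
 * bits_per_frame is 10 for 8N1 (start + 8 data + stop). */
uint32_t isr_rate_hz(uint32_t baud, uint32_t bits_per_frame)
{
    assert(bits_per_frame > 0u);
    return baud / bits_per_frame;
}

/* CPU load in hundredths of a percent (115 means 1.15%).
 * busy time per second = rate x isr_time_ns; one second is 1e9 ns,
 * so load% x 100 = busy_ns / 1e9 x 1e4 = busy_ns / 1e5. */
uint32_t isr_load_centipercent(uint32_t baud, uint32_t bits_per_frame,
                               uint32_t isr_time_ns)
{
    uint64_t busy_ns_per_s =
        (uint64_t)isr_rate_hz(baud, bits_per_frame) * isr_time_ns;
    return (uint32_t)(busy_ns_per_s / 100000u);
}
```

For example, `isr_load_centipercent(115200, 10, 1000)` evaluates to 115 (1.15%) and `isr_load_centipercent(1000000, 10, 1000)` to 1000 (10%), the two figures in the tip.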

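Interview tip: An embedded systems interviewer at a hardware company will ask you to describe double-buffered DMA for ADC streaming. Explain that the SRAM destination is organized as two buffers (or one circular buffer split into two halves); DMA fills buffer A while the CPU processes buffer B, and on each half-transfer or transfer-complete interrupt the two roles swap. This prevents overrun as long as the CPU finishes processing each buffer before DMA wraps around to it and the sample rate stays within SRAM bandwidth.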