
DMA Interview Questions

DMA questions appear in embedded hardware and firmware interviews at Texas Instruments, Bosch, Qualcomm, and L&T Technology Services. IT companies like TCS and Infosys rarely go this deep unless hiring for embedded tracks. DMA topics come up in the second technical round, usually after peripheral interface questions, and often involve explaining how DMA reduces CPU load for ADC or UART transfers.

ECE, EI

Interview questions & answers

Q1. What is DMA and what problem does it solve in embedded systems?

DMA (Direct Memory Access) is a hardware subsystem that transfers data between memory and peripherals without CPU involvement, freeing the processor to execute application code during the transfer. Without DMA, streaming 1024 ADC samples over UART on an STM32F4 requires the CPU to execute a read-then-write loop 1024 times; with DMA, the CPU programs one DMA transaction and is notified only on completion. At a 1 Msps ADC sample rate, a 168 MHz core has only 168 cycles per sample, so a CPU-polled transfer can consume most of the processor's time, while DMA brings that overhead to near zero.

Follow-up: What is the trade-off of using DMA — what resource does it contend for?

Q2. What are the main DMA transfer modes — what is the difference between normal and circular mode?

In normal mode the DMA stops after transferring the configured number of data items and generates a transfer-complete interrupt; in circular mode it reloads the counter and restarts automatically, so a single buffer is refilled continuously. Circular mode is used for continuous ADC sampling: on STM32F4, DMA2_Stream0 (the stream ADC1 is mapped to) configured in circular mode keeps filling a 256-sample buffer and interrupts at half-transfer and full-transfer, so the CPU processes the first half while the second half is being filled. Normal mode is used for one-shot transfers like sending a fixed-length SPI frame.
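The half-transfer and full-transfer behaviour of circular mode can be modelled on a host machine. The sketch below is not STM32 register code; `circ_dma_t` and its functions are hypothetical names used only to show when each event would fire and when the write position wraps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of a circular DMA stream (illustrative, not HAL code). */
typedef struct {
    uint16_t *buf;         /* destination buffer */
    size_t    len;         /* number of items, must be even */
    size_t    pos;         /* index of the next item to be written */
    unsigned  half_events; /* times the half-transfer point was reached */
    unsigned  full_events; /* times the end of the buffer was reached */
} circ_dma_t;

static void circ_dma_init(circ_dma_t *d, uint16_t *buf, size_t len)
{
    assert(len >= 2 && len % 2 == 0);
    d->buf = buf;
    d->len = len;
    d->pos = 0;
    d->half_events = 0;
    d->full_events = 0;
}

/* Model one peripheral request: store a sample, then raise events. */
static void circ_dma_on_request(circ_dma_t *d, uint16_t sample)
{
    d->buf[d->pos++] = sample;
    if (d->pos == d->len / 2) {
        d->half_events++;   /* first half is ready for the CPU */
    } else if (d->pos == d->len) {
        d->full_events++;   /* second half is ready for the CPU */
        d->pos = 0;         /* counter reload: wrap to the start */
    }
}
```

Feeding 512 samples into a 256-item buffer produces two half events and two full events, and the second pass overwrites the first, which is why each half has to be consumed before the DMA comes back around to it.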

Follow-up: What is double-buffering mode in STM32 DMA and how does it differ from circular mode?

Q3. What is a DMA channel and a DMA stream — how does STM32 DMA arbitration work?

On STM32F4, each DMA controller has 8 streams and each stream has 8 channel selections (muxed via the channel bits in SxCR), where a channel selects the peripheral trigger source for that stream. When multiple streams are ready to transfer simultaneously, the DMA hardware arbitrates based on stream priority (very high, high, medium, low configured in SxCR:PL) and for equal priorities uses the lower stream number. Incorrectly assigning two peripherals to the same stream without awareness of their simultaneous activity is a common cause of DMA data loss that is hard to reproduce.
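The arbitration rule described above (highest software priority wins, ties go to the lower stream number) can be sketched as a small host-side function. The type and function names are hypothetical, and the real arbiter is hardware; this only illustrates the decision rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_STREAMS 8

/* Priority levels in increasing order, mirroring the four levels above. */
typedef enum { PRIO_LOW = 0, PRIO_MEDIUM, PRIO_HIGH, PRIO_VERY_HIGH } dma_prio_t;

typedef struct {
    bool       request_pending; /* stream has a transfer ready */
    dma_prio_t priority;        /* software priority level */
} stream_state_t;

/* Returns the index of the stream that wins arbitration, or -1 if idle. */
static int arbitrate(const stream_state_t s[NUM_STREAMS])
{
    assert(s != NULL);
    int winner = -1;
    for (int i = 0; i < NUM_STREAMS; i++) {
        if (!s[i].request_pending) {
            continue;
        }
        /* Strictly greater: on a tie the lower-numbered stream is kept. */
        if (winner < 0 || s[i].priority > s[winner].priority) {
            winner = i;
        }
    }
    return winner;
}
```

With streams 2 and 5 both pending at high priority, stream 2 wins; a pending very-high-priority stream 7 beats both.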

Follow-up: Can two DMA streams from the same controller access the same memory simultaneously?

Q4. What is a DMA burst transfer and when is it used?

A burst transfer groups multiple data beats into a single bus transaction, reducing arbitration overhead and improving memory bus efficiency for large transfers. On STM32F4, the MBURST and PBURST fields in DMA_SxCR configure 4-beat, 8-beat, or 16-beat bursts; for transferring audio data to an I2S peripheral at 48 kHz with 32-bit stereo samples, 4-beat bursts cut the number of AHB arbitration events to roughly a quarter of what single-beat transfers need. Bursts are only available with FIFO mode enabled: in direct mode the hardware forces the burst fields back to single transfers, and in FIFO mode a threshold that does not hold a whole number of memory bursts sets the FIFO error flag (FEIF) and disables the stream.
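A quick way to reason about which burst settings fit the FIFO is to check that the chosen threshold holds a whole number of memory bursts. The helper below is a sketch under that assumption, with a 16-byte (4-word) FIFO; the function name and parameter encoding are hypothetical, and the reference manual's threshold table for the specific part remains the authority.

```c
#include <assert.h>
#include <stdbool.h>

#define FIFO_BYTES 16u /* assumed 4-word FIFO per stream */

/*
 * threshold_quarters: 1..4 for 1/4, 1/2, 3/4, full
 * msize_bytes:        memory data width, 1, 2 or 4
 * burst_beats:        1 (single), 4, 8 or 16
 */
static bool fifo_burst_config_ok(unsigned threshold_quarters,
                                 unsigned msize_bytes,
                                 unsigned burst_beats)
{
    assert(threshold_quarters >= 1 && threshold_quarters <= 4);
    assert(msize_bytes == 1 || msize_bytes == 2 || msize_bytes == 4);
    assert(burst_beats == 1 || burst_beats == 4 ||
           burst_beats == 8 || burst_beats == 16);

    unsigned threshold_bytes = threshold_quarters * (FIFO_BYTES / 4u);
    unsigned burst_bytes = msize_bytes * burst_beats;

    if (burst_bytes > FIFO_BYTES) {
        return false; /* one burst would not fit in the FIFO at all */
    }
    /* The threshold must hold a whole number of bursts. */
    return threshold_bytes % burst_bytes == 0;
}
```

For example, word-wide 4-beat bursts (16 bytes) only pass with a full threshold, while byte-wide 8-beat bursts pass at 1/2 and full but not at 3/4.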

Follow-up: What is the DMA FIFO and why must it be enabled for burst transfers?

Q5. What is the role of the DMA FIFO on STM32 and what is direct mode?

The DMA FIFO on STM32 is a 4-word buffer between the DMA and the bus matrix that allows width adaptation between the memory and peripheral data widths, and enables burst grouping. In direct mode (FIFO disabled), each peripheral request immediately triggers one data transfer at the configured width with no buffering; in FIFO mode, the peripheral fills the FIFO before a burst-write to memory occurs. Direct mode is simpler but limits you to single-beat non-burst transfers; FIFO mode is required for mismatched widths (e.g., reading bytes from SPI and writing words to memory) or for burst transactions.

Follow-up: What does a FIFO error (FEIF) flag indicate in STM32 DMA?

Q6. How do you configure DMA for ADC continuous conversion on STM32?

Enable the ADC in continuous conversion mode (with scan mode if several channels are sampled), configure DMA2_Stream0 Channel 0 (the ADC1 request on STM32F4) for DMA_PERIPH_TO_MEMORY, set the peripheral address to &ADC1->DR, the memory address to the sample buffer, and the data length to the number of samples, then enable circular mode and the half-transfer interrupt. In STM32CubeMX you set ADC1 to continuous conversion and add a DMA request for it; the generated code initializes both, and the application then calls HAL_ADC_Start_DMA(&hadc1, (uint32_t *)buffer, 256). The critical step beginners miss is enabling DMA continuous requests on the ADC (DMAContinuousRequests in the HAL init structure) so that a request is issued at the end of every conversion; without it the ADC stops issuing requests after the first pass through the buffer and the stream no longer advances.

Follow-up: What happens to the ADC DMA buffer if the CPU is too slow to process the half-transfer callback?

Q7. What is cache coherency and why is it a DMA problem on Cortex-M7?

Cache coherency means the cache and main memory hold the same data; it becomes a DMA problem on Cortex-M7 (STM32H7, STM32F7) because the D-Cache may hold stale copies of RAM that the DMA has already updated, or DMA may read stale pre-modified data that the CPU has cached but not yet flushed to RAM. The solution is to either use non-cacheable memory regions for DMA buffers (defined in the MPU) or manually call SCB_InvalidateDCache_by_Addr() after a DMA receive and SCB_CleanDCache_by_Addr() before a DMA transmit. Ignoring cache coherency on STM32H7 causes DMA buffers to silently contain stale data, a bug that only appears when cache is enabled and is extremely difficult to diagnose.
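Because cache maintenance works on whole 32-byte lines, it helps to compute the line-aligned range that covers a DMA buffer before calling the CMSIS functions named above. The helpers below are a host-runnable sketch; `cache_range_for` and `shares_lines_with_neighbours` are hypothetical names, not CMSIS APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u /* Cortex-M7 D-Cache line size in bytes */

typedef struct {
    uintptr_t start; /* address rounded down to a cache line */
    size_t    size;  /* length rounded up to whole cache lines */
} cache_range_t;

/* Smallest line-aligned range that covers [addr, addr + len). */
static cache_range_t cache_range_for(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t mask  = ~(uintptr_t)(CACHE_LINE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + len + CACHE_LINE - 1u) & mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

/* Non-zero if the buffer's first or last line also holds other data. */
static int shares_lines_with_neighbours(uintptr_t addr, size_t len)
{
    return (addr % CACHE_LINE != 0) || (len % CACHE_LINE != 0);
}
```

If `shares_lines_with_neighbours` reports a shared line, invalidating the covering range after a receive can discard unrelated variables' cached writes, which is the usual argument for making DMA buffers 32-byte aligned and a multiple of 32 bytes long.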

Follow-up: What is the difference between cache clean, invalidate, and clean-and-invalidate operations?

Q8. What is a DMA transfer error and how do you handle it?

A DMA transfer error (TEIF flag) occurs when the DMA encounters a bus error during a read or write, typically because of an invalid address or a memory region that the DMA controller cannot reach. In the error handler (the XferErrorCallback registered on the DMA handle in STM32 HAL, or the peripheral-level callback such as HAL_UART_ErrorCallback), you should read DMAx->LISR or DMAx->HISR to identify which stream faulted, log the error, stop the peripheral, and attempt re-initialization rather than silently ignoring the flag. Transfer errors in production systems are usually caused by incorrectly configured peripheral addresses (off-by-4 byte errors on 32-bit-aligned registers) or by buffers placed in memory the DMA has no bus path to, such as CCM RAM on STM32F4; note that the Cortex-M MPU only checks CPU accesses and does not block DMA traffic.

Follow-up: What is the difference between a DMA transfer error and a FIFO error?

Q9. What is memory-to-memory DMA transfer and how is it faster than memcpy for large buffers?

Memory-to-memory DMA transfers data between two RAM regions using DMA burst cycles on the AHB bus, allowing the CPU to perform other work during the copy; on STM32F4 it can approach the bandwidth of the 168 MHz AHB bus. For copying a 4 KB frame buffer at 168 MHz with 4-beat bursts, DMA completes in under 20 µs and the CPU is free during that time, whereas a software memcpy of comparable duration blocks the CPU for the whole copy. The gain is therefore mainly in freed CPU time rather than raw copy speed. However, for small copies under 32 bytes, the DMA setup overhead exceeds the benefit and a software copy is faster; DMA is generally only worthwhile above a few hundred bytes.
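The "under 20 µs" figure can be sanity-checked with a lower-bound estimate. The sketch below assumes word-wide transfers, one bus cycle for each read and one for each write, and no wait states or arbitration losses; the function name is hypothetical and real transfers will be somewhat slower.

```c
#include <assert.h>
#include <stddef.h>

/* Ideal memory-to-memory copy time in microseconds (lower bound). */
static double ideal_copy_time_us(size_t bytes, double ahb_mhz)
{
    assert(bytes % 4 == 0 && ahb_mhz > 0.0);
    double words  = (double)bytes / 4.0; /* 32-bit transfers */
    double cycles = words * 2.0;         /* one read plus one write per word */
    return cycles / ahb_mhz;             /* MHz equals cycles per microsecond */
}
```

For 4096 bytes at 168 MHz this gives 2048 cycles, or about 12.2 µs, which leaves some headroom below 20 µs for bus contention.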

Follow-up: Can DMA2 on STM32F4 perform memory-to-memory transfers — why not DMA1?

Q10. What is the difference between DMA request, trigger, and acknowledge in a peripheral handshake?

The peripheral asserts a DMA request signal when it is ready for a data transfer (e.g., UART TX register empty), the DMA controller performs the transfer and returns an acknowledge, and the peripheral then de-asserts its request. For SPI DMA transmit on STM32, the TXE flag in SPI1->SR (with TXDMAEN set in SPI1->CR2) raises the SPI1 TX DMA request, the DMA writes one byte to SPI1->DR, and TXE clears until that byte moves from the transmit buffer into the shift register. Misunderstanding this handshake leads to double-write bugs where software writes SPI->DR directly while DMA is also configured, overwriting data in the transmit register.

Follow-up: What happens if a peripheral's DMA request stays asserted after the DMA transfer is complete?

Q11. How does DMA relate to interrupt-driven and polling-based transfers — which is most CPU-efficient?

Polling burns 100% CPU in a wait loop, interrupt-driven transfer frees the CPU between bytes but incurs per-byte ISR overhead, and DMA frees the CPU for the entire block transfer with only a single completion interrupt. For a 1 KB SPI flash read at 10 MHz: polling takes ~820 µs of pure CPU time (8192 bits at 10 Mbit/s); interrupt-driven transfer takes 1024 interrupts, so the total overhead is 1024 times the per-interrupt cost, which comes to a few hundred microseconds if entry, handler, and exit cost a few tenths of a microsecond each; DMA takes close to 0 µs of CPU time apart from setup and one completion interrupt. DMA is usually the right choice for any transfer above 8–16 bytes in a system where CPU utilization matters.
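The comparison can be reproduced with two small formulas: the time the bytes occupy the SPI bus (which is the CPU cost of polling) and the accumulated ISR cost for per-byte interrupts. This is a sketch with hypothetical function names; the per-interrupt cost is an input you would measure on your own target, not a known constant.

```c
#include <assert.h>
#include <stddef.h>

/* Time the transfer occupies the SPI bus, in microseconds. */
static double spi_wire_time_us(size_t bytes, double sck_mhz)
{
    assert(sck_mhz > 0.0);
    return (double)bytes * 8.0 / sck_mhz; /* 8 clock cycles per byte */
}

/* CPU time spent in ISRs when every byte raises one interrupt. */
static double irq_overhead_us(size_t bytes, double per_irq_us)
{
    assert(per_irq_us >= 0.0);
    return (double)bytes * per_irq_us;
}
```

With 1024 bytes at 10 MHz the wire time is 819.2 µs. An assumed 0.5 µs per interrupt gives 512 µs of ISR time, so the interrupt-driven approach saves less than it first appears unless the handler is very short.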

Follow-up: What is the limitation of DMA for very short transfers of 1–4 bytes?

Q12. What is the DMA double-buffer mode on STM32 and how does it enable zero-copy audio streaming?

Double-buffer mode configures two memory addresses (M0AR and M1AR), and the DMA hardware alternates between them automatically at each transfer-complete event, raising an interrupt at each switch so software can refill the idle buffer while the other is being transferred. For I2S audio at 48 kHz stereo 16-bit, two buffers of 480 stereo frames (1920 bytes each) make the DMA switch every 10 ms; the CPU mixes and generates the next 480 frames in the idle buffer before the hardware switches. With normal (one-shot) mode instead, there is a race between re-arming the DMA and the next hardware trigger that causes audio glitches.
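The buffer sizing and the ping-pong hand-off can be checked with a short host-side model. It assumes each of the 480 entries is a stereo frame of two 16-bit samples, which gives 1920 bytes per buffer and a 10 ms period; 960 bytes would correspond to 480 individual 16-bit samples, or 5 ms of stereo audio. The `dbuf_t` type and function names are hypothetical and do not mirror the STM32 registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for one buffer of interleaved audio frames. */
static size_t audio_buffer_bytes(size_t frames, unsigned channels,
                                 unsigned bytes_per_sample)
{
    return frames * channels * bytes_per_sample;
}

/* Time one buffer lasts at the given sample rate, in milliseconds. */
static double audio_buffer_period_ms(size_t frames, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return 1000.0 * (double)frames / sample_rate_hz;
}

typedef struct {
    int16_t *mem[2];  /* the two memory targets */
    unsigned current; /* buffer the DMA is using now: 0 or 1 */
} dbuf_t;

/* Model a transfer-complete event: swap targets and return the buffer
 * the DMA just released, which software may now refill. */
static int16_t *dbuf_on_transfer_complete(dbuf_t *d)
{
    int16_t *released = d->mem[d->current];
    d->current ^= 1u;
    return released;
}
```

The refill of the released buffer has to finish within one buffer period (10 ms here), otherwise the DMA switches back to it while it still holds old or partially written audio.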

Follow-up: What happens if the CPU fails to fill the idle buffer before the DMA switches to it?

Q13. What is MDMA on STM32H7 and how is it different from regular DMA?

MDMA (Master DMA) on STM32H7 is a higher-level DMA controller that can access all memory regions including TCM (Tightly Coupled Memory), AXI SRAM, and external SDRAM, while regular DMA1/DMA2 can only access AHB-bus-accessible memory and cannot transfer to or from ITCM/DTCM. MDMA supports linked-list DMA descriptors which chain multiple transfers without CPU intervention, enabling complex scatter-gather operations used in Ethernet and DCMI camera frame transfers. Regular DMA is sufficient for most UART, SPI, and ADC use cases, but display framebuffer or JPEG codec transfers on STM32H7 typically use MDMA.

Follow-up: What is TCM memory on Cortex-M7 and why can't DMA1/DMA2 access it?

Q14. What is a DMA linked-list or scatter-gather descriptor and when is it used?

A DMA linked-list descriptor is a data structure in memory that describes a single DMA transfer (source, destination, length, next descriptor pointer); the DMA hardware follows the chain automatically without CPU involvement between segments. It is used for Ethernet frame transmission where a frame spans multiple non-contiguous memory buffers — the DMA transmits each buffer in sequence by following the descriptor chain, which is how the LwIP network stack on STM32 avoids copying frames into a contiguous buffer before sending. Without scatter-gather, sending a 1500-byte Ethernet frame stored in three separate buffers requires either memcpy to a contiguous buffer (wasting memory and time) or three separate DMA requests (adding complexity).
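The descriptor chain can be illustrated with a host-side model in which a function plays the role of the DMA engine and walks the list. The struct layout and names are hypothetical and do not match the STM32 Ethernet or MDMA descriptor formats; the copy into one output array is only there to make the walk observable, since real hardware streams each segment straight to the peripheral.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One segment of a scatter-gather transfer (illustrative layout). */
typedef struct dma_desc {
    const uint8_t         *src;  /* start of this segment */
    size_t                 len;  /* segment length in bytes */
    const struct dma_desc *next; /* next descriptor, NULL ends the chain */
} dma_desc_t;

/*
 * Walk the chain as the hardware would, emitting each segment in order.
 * Returns the total byte count, or 0 if dst is too small (in which case
 * dst may already hold the segments that did fit).
 */
static size_t gather(const dma_desc_t *d, uint8_t *dst, size_t dst_cap)
{
    assert(dst != NULL);
    size_t total = 0;
    for (; d != NULL; d = d->next) {
        if (d->len > dst_cap - total) {
            return 0;
        }
        memcpy(dst + total, d->src, d->len);
        total += d->len;
    }
    return total;
}
```

A frame made of a header, payload, and trailer in three separate buffers is described by three linked descriptors, and the engine emits them back to back without the CPU stitching them together first.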

Follow-up: What is the difference between hardware scatter-gather and software scatter-gather?

Q15. How do you verify that a DMA configuration is correct before running it in a real system?

Verify by checking peripheral data register address (should be exactly the hardware register address, not an offset), confirming transfer direction matches PERIPH_TO_MEMORY or MEMORY_TO_PERIPH, checking data width alignment, enabling the DMA stream only after the peripheral is initialized, and using a logic analyzer to confirm data integrity on the first transfer. On STM32, reading DMA_SxNDTR after starting confirms the decrement — if it stays at the initial value the stream is not being triggered by the peripheral. Always test edge cases like buffer sizes that are not multiples of the burst size, as these produce FIFO errors that only appear under specific count values.

Follow-up: How do you use CRC verification to confirm DMA data integrity on STM32?

Common misconceptions

Misconception: DMA transfers happen completely in the background and the CPU has no involvement at all.

Correct: DMA uses the AHB bus matrix, which is shared with the CPU; when both target the same memory or peripheral, the CPU may stall for a few cycles waiting for bus access (longer during bursts), so DMA reduces but does not eliminate the cost to the CPU.

Misconception: Circular DMA mode and double-buffer mode are the same thing on STM32.

Correct: Circular mode reloads one buffer address continuously; double-buffer mode switches between two separate memory addresses M0AR and M1AR and provides a callback for each switch, enabling true zero-copy ping-pong processing.

Misconception: You can use the DMA buffer variable immediately after calling HAL_UART_Transmit_DMA.

Correct: The DMA transfer happens asynchronously; modifying the buffer before the transfer-complete callback fires corrupts the data being sent by the DMA.
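One common way to respect this is a busy flag that is set when the transfer starts and cleared only in the transfer-complete callback. The sketch below models that ownership rule on a host; `tx_channel_t`, `tx_try_send`, and `tx_on_complete` are hypothetical names, and the point where real firmware would start the DMA is marked in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t       buf[64]; /* buffer the DMA reads from */
    volatile bool busy;    /* true while the DMA owns buf */
} tx_channel_t;

/* Returns false if the previous transfer still owns the buffer. */
static bool tx_try_send(tx_channel_t *ch, const uint8_t *data, size_t len)
{
    assert(ch != NULL && data != NULL);
    if (ch->busy || len > sizeof ch->buf) {
        return false;
    }
    memcpy(ch->buf, data, len);
    ch->busy = true; /* real firmware would start the DMA transfer here */
    return true;
}

/* Call from the transfer-complete callback to release the buffer. */
static void tx_on_complete(tx_channel_t *ch)
{
    ch->busy = false;
}
```

A second `tx_try_send` before `tx_on_complete` is rejected rather than overwriting data in flight. This simple flag assumes one sender in thread context and one clearing ISR; several senders would need an atomic test-and-set.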

Misconception: DMA on Cortex-M7 works the same as on Cortex-M4 without any additional configuration.

Correct: Cortex-M7 has a D-Cache that requires explicit cache maintenance (clean/invalidate) before and after DMA transfers to maintain cache coherency, which is not needed on cache-less Cortex-M4 devices.

Quick one-liners

What does NDTR stand for in STM32 DMA? Number of Data items to Transfer Register; it decrements with each transfer and reaches 0 on completion.
Which bus does STM32 DMA use to access memory? The AHB (Advanced High-performance Bus) bus matrix, which it shares with the CPU.
What flag indicates a DMA transfer is complete on STM32? TCIF (Transfer Complete Interrupt Flag) in the DMA_LISR or DMA_HISR register.
Why must DMA buffers be aligned on Cortex-M7 for cache safety? Cache lines are 32 bytes; misaligned DMA buffers share cache lines with non-DMA data, causing false sharing and coherency bugs.
Can DMA1 on STM32F4 do memory-to-memory transfers? No, only DMA2 supports memory-to-memory mode on STM32F4.
What is a DMA request multiplexer (DMAMUX)? A hardware block on newer STM32 (G0, G4, H7) that maps any peripheral DMA request to any DMA channel, replacing the fixed channel assignments of older STM32 series.
What is the half-transfer interrupt in DMA circular mode used for? It fires when the first half of the buffer is filled, allowing the CPU to process the first half while DMA fills the second half, enabling continuous double-buffered processing.
What happens to the DMA stream when a transfer error occurs? The DMA disables the stream automatically and sets the TEIF flag; firmware must clear the flag and re-initialize the stream before the next transfer.
What is flow control in DMA and what does peripheral flow control mean? Flow control determines when a transfer ends; peripheral flow control allows the peripheral (not the NDTR register) to signal when the transfer is complete, used for peripherals with variable-length data.
What is the minimum DMA data width on STM32? Byte (8-bit), selectable as byte, half-word (16-bit), or word (32-bit) independently for memory and peripheral sides.
