Registers Don't Lie

2026-01-19 #zephyr #embedded

Introduction

Those of us who work in embedded systems know the ritual. Something breaks, and the finger-pointing begins. The hardware engineer says it’s your firmware. The firmware engineer says it’s your hardware. Round and round it goes.

This is painfully common, and I think it’s because embedded systems are genuinely complex. We’re working with abstracted physical systems—integrated circuits—which define behaviours based on inputs and outputs. For anything beyond moderate complexity (32-bit micros, Cortex-M and up), it’s literally impossible to know what’s going on inside the chip from a silicon design perspective. Instead, we work one layer up: the specification of behaviours. Write 0x0A to address 0x40 and expect a defined result.

Manufacturers provide datasheets that define this behaviour. But these documents are massive, aren’t designed to be read cover to cover, and can be genuinely difficult to parse. Key details are sometimes buried, ambiguously worded, or missing entirely. A deep, penetrating knowledge of any given chip’s internals is an increasingly uncommon skill. The same can be said for firmware: modern development practices (strong abstraction layers, third-party frameworks and libraries) put more layers between you and the hardware, which intensifies the problem.

When something goes wrong at this level, it’s tempting to guess. Toggle a Kconfig option, re-measure, try something else. But guessing doesn’t scale, and it doesn’t give you certainty. The way out is to go below the abstraction. Read the actual register values. Trace the actual interrupts. These don’t lie (silicon bugs notwithstanding); they tell you exactly what the hardware is doing, regardless of what your firmware thinks it asked for.

This article describes a sleep current investigation on an nRF52833 product where I had to do exactly that.

The Problem

I owned the firmware for this device end-to-end, so figuring out what kind of sleep current we could achieve was my responsibility. I started by reviewing the Nordic datasheet and the board schematic, and estimated a best-case target somewhere around 5-10uA.

The nRF52833 has two low-power modes: System Off and System On (there’s more nuance here—see the datasheet for the full picture). System Off is the lowest power state, but you can’t use a timer to wake from it (the newer nRF54 series chips actually have a GRTC that overcomes this limitation). Our application needed periodic RTC-driven wakeups (mainly for battery monitoring), so System Off wasn’t an option. That left System On with idle sleep. The datasheet puts this at roughly 1-2uA above System Off, which isn’t much compared to our baseline leakage anyway.

In practice, I ended up defining two modes in the firmware: SLEEP (System On with periodic RTC wakeup) and SHUTDOWN (supervisor-triggered full power off).

The measurement setup was a 3.7V LiPo with a Nordic PPK2 in ampere meter mode, inline between the battery and the board.

To establish a baseline, I wrote a minimal Zephyr application called lp-test. The idea was simple: disable everything we can, put the peripherals into their lowest power states, and measure what we get. The Kconfig pulled in only what was needed:

CONFIG_GPIO=y
CONFIG_FLASH=y
CONFIG_SPI=y
CONFIG_SPI_NOR=y
CONFIG_SPI_NOR_SFDP_DEVICETREE=y
CONFIG_SPI_NOR_FLASH_LAYOUT_PAGE_SIZE=4096
CONFIG_NORDIC_QSPI_NOR=n
CONFIG_BT=y
CONFIG_PM_DEVICE=y
CONFIG_PM_DEVICE_RUNTIME=y

The application itself was straightforward: configure GPIOs as inactive, put external peripherals into their lowest power states, and sleep. The Zephyr SPI NOR driver handles the flash’s deep power-down automatically, which is convenient; without it, the flash would idle in its higher-current standby state. A devicetree overlay disabled everything else: ADC, PWM, QDEC, UART, I2C, and various board-specific peripherals like the battery monitor and supervisor GPIOs.

int main(void)
{
    /* ... gpio and peripheral init ... */

    while (1) {
        /* Check power consumption during this sleep call */
        k_sleep(K_MSEC(2000));

        /* Brief flash to show loop restarting */
        gpio_pin_set_dt(&blue_gpio, 1);
        k_sleep(K_MSEC(100));
        gpio_pin_set_dt(&blue_gpio, 0);
    }

    return 0;
}

With this I got down to around 30uA. Higher than I’d hoped, but there were a few reasons for that.

First, the board had a roughly 12uA base leakage through a resistor divider that couldn’t be remedied in firmware. This required a hardware fix.

Second, there was no external low-frequency clock source on the board (no 32.768kHz crystal or external RTC), so the nRF ran its 32kHz clock from the internal RC oscillator (LFRC) and had to wake every 4 seconds to calibrate it to meet the 500ppm accuracy required for BLE. The real firmware uses BLE, so disabling these calibrations would have misrepresented the actual power profile.

Third, there were intermittent current spikes upsetting the average. At 100ksps, I was seeing one or two samples hit 1.2mA before dropping back down. The datasheet for the regulator we were using actually notes that low-load operation can result in oscillations, so I guessed this was likely a hardware-level issue. Removing the regulator entirely and feeding 3V3 directly from a bench supply dropped the baseline to around 18uA, which confirmed the suspicion.

So 30uA was our practical floor with the current hardware. Good enough to move on to the real application and attempt to reach parity.

The Real Application

The real firmware was a different story. The device used BLE, a state machine with multiple power modes, a charger IC, PWM’d LEDs, and several other peripherals. When I first measured sleep current on the full application, it was way off the baseline.

I worked through it module by module, adding proper suspend and shutdown calls to each peripheral. The charger module needed its GPIO pullups disabled to stop leakage current. The LED scheduler needed to suspend its PWM device. BLE needed a clean shutdown and re-init path for sleep/wake cycles.

This got us down to around 230uA. Better, but still well above the 30uA floor. Something was still drawing power, and I needed to figure out what.
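The module-by-module suspend calls described above can be organised as a table of suspend and resume hooks, which keeps sleep entry and wake symmetric and easy to audit. This is a minimal sketch under stated assumptions: the struct, the functions, and the stub hooks are hypothetical, not the product’s code. In a Zephyr build, each real hook would wrap calls such as pm_device_action_run() or the module’s own shutdown and re-init code. Modules are suspended in table order and resumed in reverse, with a rollback if one fails partway through.

```c
#include <stddef.h>

/* Hypothetical per-module sleep hooks; each returns 0 on success. */
struct power_hooks {
    const char *name;
    int (*suspend)(void);
    int (*resume)(void);
};

/* Suspend modules in table order. If one fails, resume the modules that
 * were already suspended (newest first) and return the failing index.
 * Returns -1 when every module suspended successfully. */
int suspend_all(const struct power_hooks *mods, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (mods[i].suspend() != 0) {
            for (size_t j = i; j-- > 0;) {
                mods[j].resume();
            }
            return (int)i;
        }
    }
    return -1;
}

/* Resume modules in reverse table order: last suspended, first resumed. */
void resume_all(const struct power_hooks *mods, size_t n)
{
    for (size_t j = n; j-- > 0;) {
        mods[j].resume();
    }
}

/* ---- Hypothetical stubs standing in for real module code ---- */

/* Each stub appends a letter to trace[] so the call order can be checked:
 * lowercase = suspend, uppercase = resume. */
char trace[16];
size_t trace_len;
int fail_ble_suspend; /* set nonzero to simulate a BLE shutdown failure */

static void record(char c)
{
    if (trace_len < sizeof trace - 1) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

static int charger_suspend(void) { record('c'); return 0; } /* e.g. pullups off */
static int charger_resume(void)  { record('C'); return 0; }
static int leds_suspend(void)    { record('l'); return 0; } /* e.g. suspend PWM */
static int leds_resume(void)     { record('L'); return 0; }
static int ble_suspend(void)
{
    if (fail_ble_suspend) {
        return -1;
    }
    record('b'); /* e.g. clean BLE shutdown */
    return 0;
}
static int ble_resume(void)      { record('B'); return 0; } /* e.g. BLE re-init */

const struct power_hooks modules[] = {
    { "charger", charger_suspend, charger_resume },
    { "leds",    leds_suspend,    leds_resume },
    { "ble",     ble_suspend,     ble_resume },
};
```

The rollback matters for sleep paths: if BLE refuses to shut down, the charger and LEDs are brought back rather than left half-suspended.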

I’ll be honest. I spent a lot of time going in circles at this point: toggling Kconfig options, disabling peripherals one by one, re-measuring, second-guessing previous changes. I even pulled in SEGGER SystemView to look at the execution traces, checking when the system entered the idle thread, how often it woke, and how long it stayed asleep. Zephyr handles low-power entry automatically in the idle thread, so SystemView could at least confirm the sleep/wake pattern looked sane. But it still wasn’t telling me why the current was high.

The problem with all of this is that you’re effectively guessing. You’re changing things and hoping the number goes down, without really understanding why it’s high in the first place.

That’s when I decided to go lower and look at what the hardware was actually doing.

Tracing Interrupts with GDB

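The first thing I wanted to know was whether something was waking the CPU from sleep. On Cortex-M targets running Zephyr, every interrupt dispatched through the software ISR table (the normal IRQ_CONNECT path) passes through _isr_wrapper. Direct ISRs registered with IRQ_DIRECT_CONNECT bypass it, and some BLE controllers use those for radio timing, so keep that blind spot in mind. For everything else, breaking on this function gives you a window into every interrupt that fires.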

(gdb) break _isr_wrapper

That alone isn’t very useful, though. You need to know which interrupt hit. On Cortex-M, the IPSR register holds the active exception number. Exception numbers below 16 belong to the core itself (reset, HardFault, SysTick and so on), so subtracting 16 gives you the IRQ number. I wrote a user-defined GDB command to do this:

(gdb) define irqn
Type commands for definition of "irqn".
End with a line saying just "end".
>printf "IPSR=%u  IRQ=%d\n", $ipsr, (int)$ipsr-16
>end

Now when the breakpoint hits:

(gdb) irqn
IPSR=33  IRQ=17
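The same arithmetic is worth having in C, for example when decoding IPSR values logged from a fault handler. A minimal sketch: the function and macro names are illustrative, not part of Zephyr or CMSIS. The exception number occupies bits [8:0] of IPSR (the same bits appear in the combined xPSR), so the value is masked before subtracting 16. On target, the CMSIS intrinsic __get_IPSR() reads the register.

```c
#include <stdint.h>

/* Exception slots reserved by the Cortex-M core before external IRQ 0. */
#define CORTEX_M_IRQ_OFFSET 16

/* Convert a raw IPSR (or xPSR) value into a device IRQ number.
 * Only bits [8:0] hold the active exception number, so mask first.
 * Returns a negative value for system exceptions (e.g. SysTick = -1)
 * and -16 when no exception is active (thread mode). */
int irq_from_ipsr(uint32_t ipsr)
{
    int exception = (int)(ipsr & 0x1FFu);
    return exception - CORTEX_M_IRQ_OFFSET;
}
```

A negative result means a system exception (for example SysTick at -1 or PendSV at -2) rather than a peripheral IRQ; indexing an IRQ table with it would read outside the table.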

Better, but an IRQ number alone doesn’t tell you much. I wanted the actual ISR symbol, the function that’s about to run. Zephyr maintains an internal table (_sw_isr_table) that maps IRQ numbers to handler functions. A slightly more advanced command:

(gdb) define whoirq
Type commands for definition of "whoirq".
End with a line saying just "end".
>set $irq = (int)$ipsr - 16
>printf "IPSR=%u IRQ=%d\n", $ipsr, $irq
>p _sw_isr_table[$irq]
>printf "isr symbol: "
>info symbol _sw_isr_table[$irq].isr
>end

(gdb) whoirq
IPSR=33 IRQ=17
$1 = {arg = 0x0 <_vector_table>, isr = 0x2581 <rtc_nrf_isr>}
isr symbol: rtc_nrf_isr + 1 in section text
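Conceptually, _isr_wrapper performs the same lookup that whoirq does by hand: take the active exception number from IPSR, subtract 16, index _sw_isr_table, and call isr(arg). Here is a host-side model of that dispatch step, as a sketch only. The entry layout mirrors the {arg, isr} pair GDB printed, but the type, table, and function names are hypothetical stand-ins, and Zephyr’s real wrapper also does extra bookkeeping (tracing hooks, for instance).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one software ISR table entry: the argument
 * passed to the handler and the handler itself ({arg, isr} in GDB). */
struct isr_entry {
    const void *arg;
    void (*isr)(const void *arg);
};

#define NUM_IRQS 48 /* illustrative size only */

/* Host-side stand-in for _sw_isr_table (zero-initialised: no handlers). */
struct isr_entry isr_table[NUM_IRQS];

/* Model of the dispatch step: convert IPSR to an IRQ number and call the
 * registered handler with its argument. Returns 0 on success, or -1 if
 * the IRQ is a system exception, out of range, or has no handler. */
int dispatch_from_ipsr(uint32_t ipsr)
{
    int irq = (int)(ipsr & 0x1FFu) - 16;

    if (irq < 0 || irq >= NUM_IRQS || isr_table[irq].isr == NULL) {
        return -1;
    }
    isr_table[irq].isr(isr_table[irq].arg);
    return 0;
}

/* Example handler: counts its invocations through its argument. */
void counting_isr(const void *arg)
{
    unsigned int *count = (unsigned int *)arg;
    (*count)++;
}
```

The {arg, isr} pairing is why the same handler (nrfx_isr, say) can serve several peripherals: the argument tells it which one fired.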

Now we’re talking. But stepping through interrupts manually is tedious. The final step is to automate it. Attach the command to the breakpoint and let GDB log every interrupt as it fires:

(gdb) commands
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>silent
>whoirq
>c
>end

Let it run and you get a stream of every interrupt hitting the CPU:

IPSR=23 IRQ=7
$8 = {arg = 0x35390 <__device_dts_ord_12>, isr = 0x279fd <saadc_irq_handler>}
isr symbol: saadc_irq_handler + 1 in section text
IPSR=33 IRQ=17
$9 = {arg = 0x0, isr = 0x296a1 <rtc_nrf_isr>}
isr symbol: rtc_nrf_isr + 1 in section text
IPSR=16 IRQ=0
$11 = {arg = 0x3367b <nrfx_power_clock_irq_handler>, isr = 0x33895 <nrfx_isr>}
isr symbol: nrfx_isr + 1 in section text
...

This gives you a clear picture of what’s waking the device. In my case, I expected to see the RTC ISR firing every 4 seconds for the LFRC calibration, and a battery voltage sample once a minute. That’s exactly what the trace showed. Nothing unexpected was firing. So the problem was likely a peripheral left running.
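With a long capture, it is quicker to count hits per handler than to read the stream line by line. This is a minimal host-side sketch. It assumes the GDB output has been saved to a text file (GDB’s set logging command can do this) and keeps the "isr symbol: <name> + 1 in section text" format shown above. The function names and the fixed-size table are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_HANDLERS 32
#define MAX_NAME 64

/* One row of the tally: handler symbol and how often it appeared. */
struct isr_count {
    char name[MAX_NAME];
    unsigned int hits;
};

/* If line contains "isr symbol: <name> ...", copy <name> into out
 * (at most MAX_NAME - 1 chars) and return 1; otherwise return 0. */
int parse_isr_symbol(const char *line, char out[MAX_NAME])
{
    const char *tag = "isr symbol: ";
    const char *p = strstr(line, tag);

    if (p == NULL) {
        return 0;
    }
    /* %63s stops at whitespace, i.e. before " + 1 in section text". */
    return sscanf(p + strlen(tag), "%63s", out) == 1;
}

/* Count one hit for name, adding a row if it is new. Returns the new
 * number of distinct handlers, or -1 if the table is full. */
int tally_add(struct isr_count *table, int used, const char *name)
{
    for (int i = 0; i < used; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i].hits++;
            return used;
        }
    }
    if (used == MAX_HANDLERS) {
        return -1;
    }
    snprintf(table[used].name, MAX_NAME, "%s", name);
    table[used].hits = 1;
    return used + 1;
}

/* Tally every ISR symbol line in a saved log. Returns the number of
 * distinct handlers recorded in table. */
int tally_log(FILE *log, struct isr_count *table)
{
    char line[256];
    char name[MAX_NAME];
    int used = 0;

    while (fgets(line, sizeof line, log) != NULL) {
        if (parse_isr_symbol(line, name)) {
            int next = tally_add(table, used, name);
            if (next < 0) {
                break; /* table full: later lines are not counted */
            }
            used = next;
        }
    }
    return used;
}
```

Comparing the counts against what you expect over the capture window (RTC hits roughly every 4 seconds plus the once-a-minute battery sample) makes an unexpected source stand out. Handlers behind a shared dispatcher such as nrfx_isr are counted together; tallying on the IRQ= field instead would separate them.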

Reading Peripheral Registers

The interrupt trace ruled out unexpected wakeups, so the current had to be coming from a peripheral that was still enabled.

Every peripheral on the nRF has a base address and a set of registers at defined offsets. The QDEC (quadrature decoder), for example, sits at base address 0x40012000, with its ENABLE register at offset 0x500:

Instance  Base address  Description
QDEC      0x40012000    Quadrature decoder

Register  Offset  Description
...
ENABLE    0x500   Enable the quadrature decoder
...

The ENABLE register holds a single one-bit field: 0 is disabled, 1 is enabled. Simple.
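Firmware reads the same register with base-plus-offset arithmetic and a volatile pointer. A minimal sketch: the macro and function names are hypothetical, the constants are the QDEC values from the tables above, and dereferencing the real address only makes sense on the target. On a host, the helper can be exercised with a plain variable standing in for the register.

```c
#include <stdbool.h>
#include <stdint.h>

/* QDEC base address and ENABLE register offset (from the tables above). */
#define QDEC_BASE          0x40012000u
#define QDEC_ENABLE_OFFSET 0x500u

/* Absolute register address: peripheral base + register offset. */
#define REG_ADDR(base, offset) ((uintptr_t)(base) + (uintptr_t)(offset))

/* Bit 0 of ENABLE: 0 = disabled, 1 = enabled. */
#define ENABLE_BIT 0x1u

/* Read a 32-bit memory-mapped register. volatile forces a real load on
 * every call, which is what hardware registers need. */
uint32_t reg_read32(const volatile uint32_t *reg)
{
    return *reg;
}

/* True if the ENABLE register at reg reports the peripheral as enabled.
 * On target: periph_enabled((const volatile uint32_t *)
 *                REG_ADDR(QDEC_BASE, QDEC_ENABLE_OFFSET)) */
bool periph_enabled(const volatile uint32_t *reg)
{
    return (reg_read32(reg) & ENABLE_BIT) != 0;
}
```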

I had a hunch about the QDEC specifically because I knew it was using pm_device_runtime_get/pm_device_runtime_put for power management, while the other peripherals were using the direct pm_device_action_run API. Worth checking first.

Reading it directly from GDB:

(gdb) x/1xw 0x40012500
0x40012500:     0x00000001

Enabled, even though the firmware was supposed to have suspended it. After some thought, I figured that the runtime PM reference counting was preventing the device from actually powering down. Swapping to pm_device_action_run (the direct API, which bypasses the reference count) fixed it. Around 200uA dropped off immediately.

Fixed. Sleep current back to target. I went home satisfied.
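One way reference counting can keep a peripheral powered is an unbalanced get/put pair, and a toy model makes the failure mode easy to see. This is a deliberate simplification for illustration, not Zephyr’s implementation (the real runtime PM code also handles asynchronous puts, locking and driver callbacks), and every name in it is hypothetical. The device only suspends when its usage count returns to zero, so a single get() without a matching put() keeps it powered indefinitely, while a direct suspend ignores the count entirely.

```c
#include <stdbool.h>

/* Toy model of a reference-counted (runtime-PM-style) device. */
struct toy_device {
    unsigned int usage; /* get() calls not yet matched by put() */
    bool active;        /* true while the peripheral is powered */
};

/* Take a reference; the first reference powers the device up. */
void toy_get(struct toy_device *dev)
{
    if (dev->usage++ == 0) {
        dev->active = true;
    }
}

/* Drop a reference; only the last put() suspends the device.
 * An unmatched put() is ignored rather than underflowing. */
void toy_put(struct toy_device *dev)
{
    if (dev->usage == 0) {
        return;
    }
    if (--dev->usage == 0) {
        dev->active = false;
    }
}

/* Direct suspend: powers down regardless of outstanding references
 * (loosely analogous to issuing a suspend action directly). */
void toy_suspend_direct(struct toy_device *dev)
{
    dev->active = false;
}
```

In the model, the leftover reference is the bug: either balancing every get with a put or suspending directly ends with the device powered down.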

The Next Day

I came back the next morning to do some final qualification, and it was broken again. The device was sitting at around 70uA, well above the 30uA target. The current profile looked quite different from the QDEC issue, though: the device was waking up very frequently.

I went back to the interrupt trace. This time the RTC ISR was firing every 5-10ms, way more frequently than the expected 4-second calibration interval. Something was scheduling constant wakeups.

After some digging, it turned out that a git restore had reverted a Kconfig change and re-enabled the RTT shell. The RTT shell backend polls for input on a kernel timer, and on the nRF52 the kernel’s system timer runs on the RTC, so it schedules RTC wakeups at a much higher rate. This effectively prevented the device from ever properly sleeping.

A mundane cause for a confusing symptom. Disabled the shell, re-verified, and the device was back on target.
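The second failure (RTC hits 5-10ms apart instead of seconds apart) is also easy to check mechanically once timestamps are available. This is a minimal sketch. It assumes you have the time of each rtc_nrf_isr hit in milliseconds, sorted ascending, however you capture them. The function name and the threshold you pass in are hypothetical choices. The function counts gaps shorter than the minimum interval you expect and reports the shortest gap seen.

```c
#include <stddef.h>
#include <stdint.h>

/* Count consecutive wakeups closer together than min_gap_ms.
 * timestamps_ms must be sorted ascending. The shortest gap seen is
 * written to *shortest_ms (UINT32_MAX if there are fewer than two
 * samples) when shortest_ms is not NULL. */
size_t count_short_gaps(const uint32_t *timestamps_ms, size_t n,
                        uint32_t min_gap_ms, uint32_t *shortest_ms)
{
    size_t short_gaps = 0;
    uint32_t shortest = UINT32_MAX;

    for (size_t i = 1; i < n; i++) {
        uint32_t gap = timestamps_ms[i] - timestamps_ms[i - 1];
        if (gap < shortest) {
            shortest = gap;
        }
        if (gap < min_gap_ms) {
            short_gaps++;
        }
    }
    if (shortest_ms != NULL) {
        *shortest_ms = shortest;
    }
    return short_gaps;
}
```

With the expected 4-second calibration cadence, a threshold of, say, 1000ms should flag few or no gaps on a healthy trace, while a trace like the one from that second morning would flag nearly every gap.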

Takeaway

Both issues ended up being firmware-side. The QDEC’s power management API wasn’t actually powering it down. The shell was scheduling RTC wakeups every few milliseconds. Neither was obvious from the code alone.

What solved both was the same approach: stop guessing, go below the abstraction, and look at what the hardware is actually doing. An interrupt trace told me the sleep behaviour was correct the first time, and wrong the second. A register read told me exactly which peripheral was still enabled. No ambiguity, no back-and-forth.

These aren’t advanced techniques. Breaking on _isr_wrapper and reading a memory address with x/1xw are about as basic as GDB gets. But they penetrate the abstraction layers that normally sit between you and the hardware. When you’re stuck going in circles, that’s usually what you need.