This is a repost of the 38C3 talk by Aedan Cullen:
https://media.ccc.de/v/38c3-hacking-the-rp2350
Aedan references a presentation earlier on Day 1 of 38C3 about hardware fault injection of the ACE3 USB-C controller in Apple hardware: https://media.ccc.de/v/38c3-ace-up-the-sleeve-hacking-into-a...
Aedan's talk was impressive enough, and for sure he got lucky with the 3's, but that second talk is a mind-blowing level of commitment and makes me appreciate even more just how much work went into the Asahi Linux I'm running and similar reverse-engineering projects. I got stuck on just some simple audio routing on the PinePhone SoC, and this inspires me to go back and spend some more time on it. It seems the industry solution is to iterate fast on this stuff, i.e. before we get time to fully understand it. When I think how long we had to play around with the Z80 and 68000 chips, and now they're popping out new chips each year, with more and more of it custom silicon. I wonder what hardware designs we'd have in a cooperative market vs. a walled-garden competitive one.
It was a cool talk. The attacker used a well-timed glitch on the voltage supply pin shared by the USB peripheral and the OTP (one-time programmable) memory, which, unlike the CPU supply, isn't protected by hardware glitch detectors. Instead the BootROM code does multiple validating reads of both known test values and the configuration bits. The glitch is timed to allow the test values to be read, but cuts power while the configuration bits are being read. This causes the test value to persist.
It just happens that the test value represents an invalid, but useful, configuration (both ARM and RISC-V cores disabled). In this case the chip's power-on state machine boots the RISC-V cores instead of hanging the start-up sequence. Since the RISC-V cores don't have a secure-enclave mode, their debug interface is always accessible if a RISC-V core is selected.
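To make that sequence concrete, here is a toy C simulation of the failure mode as I understand it from the talk; the row numbers, bit names, test pattern and fallback policy are all invented for illustration and are not the actual BootROM or state-machine logic:

    /* Toy simulation only -- NOT the real RP2350 BootROM or power-on state
     * machine. Row numbers, bit positions and the test pattern are invented. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TEST_ROW          0x00u
    #define ARCHSEL_ROW       0x01u
    #define ARM_DISABLE_BIT   (1u << 0)
    #define RISCV_DISABLE_BIT (1u << 1)
    /* read-path check value; note it happens to decode as "both cores disabled" */
    #define OTP_TEST_PATTERN  (ARM_DISABLE_BIT | RISCV_DISABLE_BIT)

    static bool glitch_config_read;   /* drop the OTP supply during the config read? */
    static uint32_t latched;          /* value still held in the OTP read path */

    /* A glitched access never fetches the addressed word, so the previously
     * latched value "persists". */
    static uint32_t otp_read(uint32_t row)
    {
        bool glitched = glitch_config_read && (row == ARCHSEL_ROW);
        if (!glitched)
            latched = (row == TEST_ROW) ? OTP_TEST_PATTERN
                                        : 0u;   /* real fuses: ARM enabled */
        return latched;
    }

    typedef enum { BOOT_ARM, BOOT_RISCV, BOOT_HALT } boot_sel_t;

    static boot_sel_t select_architecture(void)
    {
        if (otp_read(TEST_ROW) != OTP_TEST_PATTERN)   /* validate the read path */
            return BOOT_HALT;
        uint32_t cfg = otp_read(ARCHSEL_ROW);         /* <-- the glitch lands here */
        if (!(cfg & ARM_DISABLE_BIT))   return BOOT_ARM;
        if (!(cfg & RISCV_DISABLE_BIT)) return BOOT_RISCV;
        /* "both disabled" is invalid; per the talk the hardware starts the
         * insecure RISC-V cores here instead of halting -- the crux of the attack */
        return BOOT_RISCV;
    }

    int main(void)
    {
        glitch_config_read = false;
        printf("normal boot:   %d (0=ARM, 1=RISC-V)\n", select_architecture());
        glitch_config_read = true;
        printf("glitched boot: %d (0=ARM, 1=RISC-V)\n", select_architecture());
        return 0;
    }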
The chip's reset state machine is also partly under software control, because it's used to wake the chip from deep-sleep modes. So the RISC-V cores (or their debug interface) can be used to enable the ARM debug interface and perform a partial reset that resumes from a point beyond where the state machine would disable the ARM debug interface, but early enough to reset everything else.
It's my understanding there can be no pure software fix.
The title is wrong. The author glitched the IP block with the one-time-programmable fuses, in conjunction with the logic that uses those fuse bits, not the RISC-V cores themselves.
The RP2350 was supposed to be a game-changer for RPi (the corp) and customers alike. It's an immensely powerful and well-documented platform, so it's a shame that critical bugs have stolen its thunder. First the E9 errata made me junk the RP2350 for projects where I needed to interface with any kind of tri-state logic, and now this one basically shows that the extra RISC-V cores are a liability for any security-conscious design.
If RPi wants to rescue the RP2350, they must release a new bug-fix stepping soon, or this one is done.
> E9 errata made me junk the RP2350 for projects where I needed to interface with any kind of tri-state logic
Do whatever floats your boat; the rest of us just use the documented workaround [1], i.e. either use the internal pull-up or an external pull-down resistor on the bus. Note that while tri-state devices are common, it is rarely a good idea to leave a bus line floating.
As for the OTP secret retrieval: high-security projects, e.g. second-factor authentication fobs, seem to be out, bummer. Another important use case, I'd think, is small-market specialized devices for which the manufacturer wants to delay the distribution of intellectual property, e.g. by encrypting part of the code. For that, the functionality still seems sufficient.
[1] https://datasheets.raspberrypi.com/rp2350/rp2350-datasheet.p...
Afaik the internal pull-up/-down resistors aren't strong enough to keep the pin outside of the problematic input voltage range, and external ≤8 kΩ resistors to 3.3V aren't acceptable unless you don't care about power consumption and your signal is still usable like that. It's a really annoying problem to have. I'm still waiting for a hardware revision that fixes the source of the problem instead of nasty workarounds.
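For scale (plain Ohm's law, assuming the line spends most of its time at the rail opposite the resistor): 3.3 V / 8 kΩ ≈ 0.4 mA, i.e. roughly 1.4 mW per pulled line, which adds up quickly across a wide bus in a battery-powered design.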
The title makes it sound as if RISC-V is a liability. The liability lies in it being possible to reactivate cores that were supposed to have been permanently disabled.
The RISC-V implementation on the RP2350 does not have any security features - it has no business being on silicon that was supposed to be marketed for its security features, but it is there. Then there are bits to disable the ARM and/or RV cores, but disabling the ARM cores takes priority over disabling the insecure RV cores - these are human decisions, not architectural ones.
This isn't about the technology - the organization's priorities are clearly divided between two segments: one that's trying to grow revenue by expanding into the enterprise/commercial market, and another that's trying to stick on cool stuff and prioritize fun. In this case neither won - the fun guys got kneecapped by the E9 bug making this silicon unusable for hobbyist projects, and the fun stuff kneecapped the enterprise stuff by bypassing the security bit.
(As you can imagine, I'm extremely disappointed with the RP2350 - RPi needs to focus or their hobby/maker market is done.)
It seems a product that sells for $7 pushes security validation all the way to the right - the customers.
The RISC-V cores are innocent accomplices. Having a mix of cores with and without secure boot gives attackers more tools to play with. The exploit works because, if both sets of cores are "permanently" disabled, the hardware state machine starts the RISC-V cores instead of bricking the chip. Just defaulting to the ARM cores would have prevented this exploit from working.
It also looks like the chip designers didn't know about the read persistence between the validation-pattern reads and the actual configuration reads that is achievable by glitching only the OTP memory block (the USB peripheral isn't enabled at this stage). Without access to the documentation for the OTP memory block we can only guess who deserves most of the blame for this oversight.
I'm annoyed they didn't bother to add floating-point maths to the RISC-V cores. It's useless, and that E9 bug is definitely an issue.
Most microcontrollers sold don't have hardware floating point, but they still seem to be useful. I'd bet the Cortex-M0 and -M3 vastly outsell the -M4F.
The RP2040 didn't have floating point. It is/was popular.
It's annoying but makes sense for a simple state machine / sequencer core.
and the first demonstrable success story would get $20,000 and the kudos of being the winner of the challenge
There are many "MCU break" services in the Far East that specialise in this sort of thing as their core business, and $20K USD is roughly the high end of what they'll charge for physical work, not mere voltage glitching.
Aedan desoldered the RP2350 and physically cut the copper trace feeding voltage to a pin. Then he proceeded to use voltage glitching and was pretty open about how lucky he was, or in other words how unlucky Raspberry Pi Ltd. was, for the attack to work.
Doesn't this sound like the RP2350 is "secure enough"? Like you mention, if the attacker can send a unit to an MCU Break service, it's game over already.
The problem is that this isn't an attack against a single chip's unique code-signing key, but against every RP2350 ever produced with this hardware revision. The problem can't be fixed in software. And the attack doesn't require anything fancier than a hot-air gun, an interposer board to isolate the USB_OTP_VDD pin for glitching, and a pair of steady hands.
These are all operations that only need pretty cheap and readily available equipment, so I think you can't call it secure enough.
The parameters of the challenge were apparently that you could do anything at all.
Glitching a microcontroller is pretty trivial these days. You remove the caps and short the power supply for a brief period when the debug register is read. Obviously you need to glitch the supply at exactly the right time, hence many people use another microcontroller controlling a MOSFET that shorts the supply programmatically.
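As a rough illustration of that setup (not Aedan's actual rig), glitcher firmware on a spare Raspberry Pi Pico might look something like the sketch below; the pin numbers and delays are placeholders, and in practice you need much finer timing (PIO, an FPGA, or a dedicated glitcher) than a busy-wait loop gives you:

    /* Hypothetical glitcher sketch for a spare Pico (Pico SDK).
     * Pin numbers and delays are placeholders you'd find by sweeping. */
    #include "pico/stdlib.h"
    #include "hardware/timer.h"

    #define TRIGGER_PIN 2     /* e.g. the target's reset line, used as timing reference */
    #define MOSFET_PIN  3     /* gate of the MOSFET shorting the target's supply rail */
    #define DELAY_US    150   /* trigger -> glitch offset */
    #define PULSE_US    5     /* how long the rail is shorted */

    int main(void)
    {
        gpio_init(TRIGGER_PIN);
        gpio_set_dir(TRIGGER_PIN, GPIO_IN);
        gpio_init(MOSFET_PIN);
        gpio_set_dir(MOSFET_PIN, GPIO_OUT);
        gpio_put(MOSFET_PIN, 0);

        while (true) {
            while (!gpio_get(TRIGGER_PIN)) { /* wait for target to come out of reset */ }
            busy_wait_us(DELAY_US);          /* line up with the sensitive read */
            gpio_put(MOSFET_PIN, 1);         /* short the supply... */
            busy_wait_us(PULSE_US);
            gpio_put(MOSFET_PIN, 0);         /* ...and release it */
            while (gpio_get(TRIGGER_PIN)) { /* wait for the next boot attempt */ }
        }
    }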
NRF52s had this issue too (like many others), and Nordic made new silicon revisions in 2022 (on all their products!) where they don't check the debug register on boot (and have the protection enabled by default).
It remains to be seen if such revisions completely fix fault injection attacks though.
Isn't there a chip design technique one could use to truly disable pins/functionality after programming, such as burning out pin lines? (Not saying it's foolproof, but could it have helped here?)
In theory, it's easy. But as you see, in practice, it's very difficult because the attacker doesn't have to play by your rules.
The attacker can pull arbitrary shenanigans, like in this example glitching only one power rail of many to attack a crucial part, or shining light on parts of your die. And suddenly very little of that "usual" behaviour remains.
You can look at the hardening mechanisms of Hardware Security Modules or security processors, e.g. in smart cards, to see all the effort they take in order to detect an attacker.
To come back to your original question: burning a "wire" is not what's usually done. I consider it impractical, since such a "wire" fuse would be electrically weak, impeding the performance of any signal travelling through it. The same goes for an antifuse (I interpret the "AF" in the RP2350 datasheet as an "antifuse" array), which when closed also only creates a weak electrical connection. That's why you usually use the fuse bits as inputs to CMOS switches that are then opened or closed.
Yet if you were to distribute these fuse bits and switches and put them directly next to their usage sites, I think that could achieve your goal. Still, this would mean you'd now have to route the control signals to these fuses instead, which means routing high-current or high-ish-voltage signals across your chip. So, in the end, I don't see an easy solution to this fuse dilemma.
https://en.wikipedia.org/wiki/EFuse so many
I'm having trouble understanding what this vulnerability actually means in practice. To me it just seems like "if you have physical access to the chip, you can program it"
Not program it, read the program out of it.
As if the others are "more secure"; it is a bit more complicated than that... lol. This will please the ARM FUD on RISC-V (remember that, currently, RISC-V is a death sentence for ARM...).
That said, is there a way to buy an RP2350 with the ARM cores disabled via hardware right from the start, one on which Raspberry did not pay ARM royalties?
You could argue that it was an oversight by Raspberry Pi to default to the cores without TrustZone if the configuration read from OTP was invalid. It doesn't matter that those cores are RISC-V; an ARMv6-M or ARMv7-M core would've worked too.
It's not ARM vs RISC-V per se, it's how RISC-V is implemented in the RP2350. But also, RISC-V only has a replacement for "big" TrustZone AFAIK, so it might not be easy to bridge RISC-V with TrustZone-M-aware buses/peripherals.