frabert a day ago

This has been a sore point in a lot of discussions regarding compiler optimizations and cryptographic code: how compilers and compiler engineers are sabotaging the efforts of cryptographers to make sure there are no side-channels in their code. The issue has never been the compiler, and has always been the language: there was never a way to express the right intention from within C (or most other languages, really).

This primitive we're trying to introduce is meant to make up for this shortcoming without having to introduce additional rules in the standard.
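
For context, the core trick a primitive like this is meant to protect is branchless selection via masking. A rough sketch of the logic in Python (illustrative only; this is not the builtin's actual lowering, and source-level versions of this trick are exactly what optimizers may silently turn back into branches):

```python
def ct_select(cond, a, b):
    """Return a if cond == 1 else b, with no data-dependent branch.

    Assumes 32-bit unsigned a and b. -cond is all-ones (0xFFFFFFFF)
    when cond == 1 and zero when cond == 0, so the same arithmetic
    runs regardless of cond's value.
    """
    mask = -cond & 0xFFFFFFFF
    return (a & mask) | (b & ~mask & 0xFFFFFFFF)
```

A builtin can promise what no source-level idiom can: that the generated machine code keeps this branch-free shape.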

  • GhosT078 6 hours ago

    SPARK is a very expressive language for implementing cryptographic applications. It is available for some LLVM targets (e.g. x86-64).

  • jfindper 21 hours ago

    >how compilers and compiler engineers are sabotaging the efforts of cryptographers

    I'm not exposed to this space very often, so maybe you or someone else could give me some context. "Sabotage" is a deliberate effort to ruin/hinder something. Are compiler engineers deliberately hindering the efforts of cryptographers? If yes... is there a reason why? Some long-running feud or something?

    Or, through the course of their efforts to make compilers faster/etc, are cryptographers just getting the "short end of the stick" so to speak? Perhaps forgotten about because the number of cryptographers is dwarfed by the number of non-cryptographers? (Or any other explanation that I'm unaware of?)

    • chowells 21 hours ago

      It's more a viewpoint thing. Any construct cryptographers find that runs in constant time is something that could be optimized to run faster for non-cryptographic code. Constant-time constructs essentially are optimizer bug reports. There is always the danger that by popularizing a technique you are drawing the attention of a compiler contributor who wants to speed up a benchmark of that same construct in non-cryptographic code. So maybe it's not intended as sabotage, but it can sure feel that way when everything you do is explicitly targeted to be changed after you do it.

    • stouset 21 hours ago

      It’s not intentional. The motivations of CPU designers, compiler writers, and optimizers are at odds with those of cryptographers. The former want to use every trick possible to squeeze out additional performance in the most common cases, while the latter absolutely require indistinguishable performance across all possibilities.

      CPUs love to do branch prediction to have computation already performed in the case where it guesses the branch correctly, but cryptographic code needs equal performance no matter the input.

      When a programmer asks for some register or memory location to be zeroed, they generally just want to be able to use a zero in some later operation and so it doesn’t really matter that a previous value was really overwritten. When a cryptographer does, they generally are trying to make it impossible to read the previous value. And they want to be able to have some guarantee that it wasn’t implicitly copied somewhere else in the interim.
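
      The difference in intent can be sketched quickly (Python used only for readability; CPython makes no guarantees about stray copies or interning, and real secret-wiping code is written in C against explicit_bzero/memset_s or with volatile stores):

```python
# Immutable bytes can't be wiped in place: "clearing" only rebinds
# the name, and the old object may linger in memory until reused.
secret = b"hunter2"
secret = b"\x00" * len(secret)   # original b"hunter2" object still exists somewhere

# A mutable bytearray can at least be overwritten in place.
buf = bytearray(b"hunter2")
for i in range(len(buf)):
    buf[i] = 0                   # overwrite each byte
assert buf == bytearray(7)       # all zeros now
```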

    • layer8 21 hours ago

      “Sabotage” can be used in a figurative sense that doesn’t insinuate intent. An adjacent example is “self-sabotage”, which doesn’t imply intent.

      • jfindper 19 hours ago

        Every dictionary I've looked at, wikipedia, etc. all immediately and prominently highlight the intent part. It really seems like the defining characteristic of "sabotage" vs. other similar verbs. But, language is weird, so, ¯\_(ツ)_/¯.

    • colmmacc 5 hours ago

      I don't think it's nefarious but it is sabotage. There's long been an implicit assumption that optimization should be more important than safety.

      Yes, languages do lack good mechanisms to mark variables or sections as needing constant-time operation ... but compiler maintainers could have taken the view that that means all code should be compiled that way. Now instead we're marking data and sections as "secret" so that they can be left unoptimized. But why not the other way around?

      I understand how we get here; speed and size are trivial to measure and they each result in real-world cost savings. I don't think any maintainer could withstand this pressure. But it's still deliberate.

      • aw1621107 5 hours ago

        > Now instead we're marking data and section as "secret" so that they can be left unoptimized. But why not the other way around?

        Worse cost-benefit tradeoff, perhaps? I'd imagine the amount of code that cares more about size/speed than constant-time operation far outnumbers the amount of code which prioritizes the opposite, and given the real-world benefits you mention and the relative newness of concerns about timing attacks I think it makes sense that compiler writers have defaulted to performance over constant-time performance.

        In addition, I think a complicating factor is that compilers can't infer intent from code. The exact same pattern may be used in both performance- and timing-sensitive code, so absent some external signal the compiler has to choose whether it prioritizes speed or timing. If you think more code will benefit from speed than timing, then that is a reasonable default to go with.

  • Asooka 10 hours ago

    There really ought to be a subset of C that lets you write portable assembly. One where only a defined set of optimisations are allowed and required to be performed, "inline" means always inline, the "register" and "auto" keywords have their original meanings, every stack variable is allocated unless otherwise indicated, every expression has defined evaluation order, every read/write from/to an address is carried out, nothing is ever reordered, and undefined behaviour is switched to machine-specific behaviour. Currently if you need that level of control, your only option is writing it in assembly, which gets painful when you need to support multiple architectures, or want fancy features like autocomplete or structs and functions.

    • delta_p_delta_x 10 hours ago

      > want fancy features like autocomplete or structs and functions

      I would argue that given a certain ISA, it's probably easier to write an autocomplete extension for assembly targeting that ISA, rather than autocomplete for C, or goodness forbid, C++.

      Likewise for structs, functions, jump targets, etc. One could probably set up snippets corresponding to different sorts of conditional execution—loops, if/else/while, switch, etc.

    • saagarjha 10 hours ago

      Why would you like register and auto to have meaning?

      • thyristan 9 hours ago

        Because for timing-sensitive code, those are important. If a variable is really a register, cache-based timing attacks just don't happen, because there is no cache in between.

  • fooker 21 hours ago

    > making sure there are no side-channels in their code

    Any side effect is a side channel. There are always going to be side channels in real code running on real hardware.

    Sure, you can change your code, compiler, or even hardware to account for this, but at its core that is security by obscurity.

amluto 19 hours ago

Too bad that Intel chips more or less reserve the right to take LLVM’s nice output and make it non-constant-time anyway. See:

https://www.intel.com/content/www/us/en/developer/articles/t...

Sure, you could run on some hypothetical OS that supports DOITM and insert syscalls around every manipulation of secret data. Yeah, right.

  • JoshTriplett 18 hours ago

    Last I saw, it seemed like the plan was to unconditionally enable it, and on the off chance there's ever a piece of hardware where it's a substantial performance win, offer a way to opt out of it.

    • amluto 16 hours ago

      I advocated for that, and then I completely lost track of the status.

      The whole design is ridiculous.

      • matu3ba 6 hours ago

        What would saner alternatives look like, given that any timing side effect is a potential attack vector? See https://www.hertzbleed.com/ for frequency side channels. The only option I see is dedicated security cores with fast data lanes to the CPU, similar to what Apple is doing with the Secure Enclave. Or do you have better suggestions that still allow performance and power savings?

        • amluto 3 hours ago

          A design such that it would actually make sense for a compiler to mark code that should permit data-dependent CPU optimizations differently from code that should not.

          This could be done using an opcode prefix, which would bloat code but would work perfectly. Or it could use an RFLAGS bit or a bit in MXCSR or a new register, etc.

          Almost anything would be better than an MSR that is only accessible to privileged code.

          • cesarb 3 hours ago

            > Or it could use an RFLAGS bit or a bit in MXCSR or a new register, etc.

            > Almost anything would be better than an MSR that is only accessible to privileged code.

            ARM does that: their flag (DIT) is accessible by non-privileged code. If you know the architecture has that flag, either because your -march= is recent enough or because the operating system told you so through the hwcaps or the emulated id registers, you can use it freely without needing to switch to privileged mode through a syscall.

            • JoshTriplett 3 hours ago

              The one reason I can imagine to make it privileged-only is that it could be high-overhead to switch: if a CPU conditioned various kinds of predictors on it, it might have to flush those predictors when toggling it.

  • stingraycharles 18 hours ago

    Sorry, I may be missing the point here, but reading that page doesn’t immediately make it obvious to me what that feature is. Is it some constant time execution mechanism that you can enable / disable on a per-thread basis to do… what exactly?

    • seanhunter 10 hours ago

      As a concrete example, say I have a (very naive and bad) password-checker that works like this pseudocode:

        for i = 1 to len(real_password) {
          if entered_password[i] != real_password[i] {
             return FAILURE
          }
        }

        return SUCCESS
      
      OK, now an alert attacker with the ability to very accurately record the time it takes to check the password can determine at least the length of the real password, and, because the loop exits at the first mismatching character, they can also gradually determine the password itself: the check takes longer as the attacker gets each successive character correct.

      Taking this general idea and expanding it, there are lots of places where the timing of branches of code can leak information about some secret, so in cryptographic code in particular, it’s often beneficial to be able to ensure that two branches (the success and failure branches in the above) take exactly the same amount of time, so the timing doesn’t leak information. To fix the above you would probably want to do two things: first, set a boolean on failure and continue the checking, so the “return failure quickly” path doesn’t leak information; second, compare against a fixed-width hash or something similar, so the length of the password itself isn’t a factor.

      The problem is that lots of performance optimizations (pipelining, branch prediction, etc.) work specifically against this goal: they aim to take branches quickly in the happy path of the code, because normally that’s what you want for optimal performance.

      So say instead of the above I do

        bool status = SUCCESS
        for i = 1 to hash_length {
          if hash_of_entered_password[i] != hash_of_real_password[i] {
             status = FAILURE
          }
        }

        return status
      
      …I don’t want the optimizer to realize that once status becomes FAILURE it can never become SUCCESS again, and that the loop does nothing else, and so return early. I want it to actually run the pointless comparison of the rest of the hash so the timing is exactly the same each time.

      But now, while my check is constant time, I’ve shifted the burden onto the person who writes the hash function: that has to run in constant time too, or my check will once again leak. So in general people want the ability to tell the compiler that they want a particular piece of code to run in constant time. At the moment, in the general case, I think you have to drop into inline assembly to achieve this.
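
      The accumulate-instead-of-early-exit idea is common enough that Python ships a hardened form of it as hmac.compare_digest. A sketch of the underlying logic (at source level only; an optimizer or JIT is still free to undo it, which is the whole point of the proposed builtins):

```python
import hmac

def ct_equal(a: bytes, b: bytes) -> bool:
    """Compare equal-length byte strings without an early exit:
    differences are OR-ed into an accumulator, so every byte is
    examined no matter where the first mismatch occurs."""
    if len(a) != len(b):
        return False  # length differences still leak; compare fixed-size hashes
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0

# The standard library offers a hardened version of the same idea:
assert hmac.compare_digest(b"abc", b"abc")
```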

      • codedokode 6 hours ago

        By the way, for small arrays removing branches also improves performance (according to my tests). So if you do a constant-time comparison like this:

            match = True
            for a, b in pad(entered_pass, real_pass):
                match = match and a == b
            return match
        
        Then it will be faster as well. I was surprised to find this.
      • rkagerer 9 hours ago

        Why not just always spin until a fixed number of ticks (or microseconds for slewing clocks) have passed (starting from function entry), prior to returning?

        Obviously this doesn't mitigate power usage side channel attacks, but that's not the point here.

        It's time-bound, so let's check time.
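
        Sketched as a wrapper (hypothetical helper name, and with the same catch as everywhere else in this thread: the compiler or runtime must be prevented from optimizing the wait away):

```python
import time

def run_with_floor(fn, floor_seconds):
    """Run fn(), then busy-wait until at least floor_seconds have
    elapsed since entry, so callers observe a roughly fixed duration
    regardless of how quickly fn() itself finished."""
    start = time.monotonic()
    result = fn()
    while time.monotonic() - start < floor_seconds:
        pass  # spin rather than sleep(): sleep granularity is too coarse
    return result
```

        The floor has to exceed the worst-case runtime, and as noted this does nothing about power side channels.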

        • seanhunter 8 hours ago

          You could totally do that, but in the exact same way as the above, you'd want the compiler not to optimize your spinlock away thinking it wasn't needed. My understanding is in lots of real applications, the asm code that I mentioned is in part making sure it waits specific numbers of clocks in each branch to ensure they all exactly balance.

    • wat10000 17 hours ago

      It turns off CPU features that could cause execution time to vary in a way that depends on the data being operated on.

charcircuit 13 hours ago

These are meaningless without guarantees that the processor will run the instructions in constant time and not run the code as fast as possible. Claims like cmov on x86 always being constant time are dangerous because a microcode update could change that to not be the case anymore. Programmers want an actual guarantee that the code will take the same amount of time.

We should be asking our CPU vendors to support enabling a constant time mode of some sort for sensitive operations.

  • adrian_b 6 hours ago

    Nowadays both Intel/AMD CPUs and Arm-based CPUs guarantee that a certain subset of the instructions are executed in constant time.

    For an example of a list of such instructions see:

    https://www.intel.com/content/www/us/en/developer/articles/t...

    However, cooperation from the operating system is necessary, as the constant-time execution mode may need to be enabled by setting certain CPU-control bits in protected registers (e.g. IA32_UARCH_MISC_CTL[DOITM]).

    See for instance:

    https://www.intel.com/content/www/us/en/developer/articles/t...

    CMOV is on the list of instructions with constant-time execution, but the list is valid only with the corresponding control bit set correctly.

    • cesarb 4 hours ago

      > However, cooperation from the operating system is necessary, as the constant-time execution mode may need to be enabled by setting certain CPU-control bits in protected registers (e.g. IA32_UARCH_MISC_CTL[DOITM]).

      The way ARM does this is way better, since it doesn't need help from the operating system: user-space can directly set and clear the DIT bit. Operating system cooperation is necessary only to know whether that bit exists (because the ID registers are not directly readable by user mode).

  • derangedHorse 7 hours ago

    I agree. For use cases where side channel attacks are likely to be attempted, the security of the system ultimately depends on both the software and hardware used.

  • Gibbon1 10 hours ago

    That's been one of my counters to the bitch that C isn't safe. The underlying architecture isn't safe.

    That said, WG21 and WG14 don't seem to be able to get the memo that safety is more important than single-core speed. Or, as I suspect, a bunch of members are actually malicious.

ethin 20 hours ago

So this makes me curious: is there a reason we don't do something like a __builtin_ct_begin()/__builtin_ct_end() set of intrinsics? Where the begin intrinsic begins a constant-time code region, all code within that region must be constant-time, and that region must be ended with an end() call? I'm not too familiar with compiler intrinsics or how these things work, so thought I'd ask. The intrinsic could be scoped such that the compiler can use its implementation-defined behavior freedom to enforce the begin/end pairs. But Idk, maybe this isn't feasible?

  • grogers 6 hours ago

    It'd be very hard for the compiler to enforce constant-time execution for generic code. As an example, if you wrote the naive password check that returns false at the first mismatching byte, is that a compiler error if the compiler can't transform it into a constant-time version?

  • zzo38computer 20 hours ago

    Maybe it would be better implemented as a function attribute instead

    • ethin 19 hours ago

      Or a pragma? Like how OpenMP did it?

zzo38computer 21 hours ago

I think __builtin_ct_select and __builtin_ct_expr would be good ideas. (They could also be implemented in GCC in future, as well as LLVM.)

In some cases it might be necessary to consider the possibility of invalid memory accesses (and avoid the side-channels when doing so). (The example given in the article works around this issue, but I don't know if there are any situations where this will not help.)

  • codedokode 6 hours ago

    The names are pretty unattractive though (which is not uncommon for C), to the point that one would want to avoid them in code.

  • connicpu 20 hours ago

    The side channel from memory access timings is exactly why cmov is its own instruction on x86_64. It retrieves the memory regardless of the condition value; anything else would change the timing based on the condition. If you're going to segfault, that's going to be visible to an attacker regardless, because you're going to crash.

    • wahern 20 hours ago

      AFAIU, cmov wasn't originally intended to be a guaranteed constant-time operation, and Intel and AMD won't commit to keeping it constant-time in the future. It just so happened that at one point it was implemented in constant time across CPUs; cryptographers picked up on this and began using it, and now Intel and AMD tacitly recognize this dependency. See, e.g., https://www.intel.com/content/www/us/en/developer/articles/t...

      > The CMOVcc instruction runs in time independent of its arguments in all current x86 architecture processors. This includes variants that load from memory. The load is performed before the condition is tested. Future versions of the architecture may introduce new addressing modes that do not exhibit this property.

      • adrian_b 4 hours ago

        At your link there is a link to the list of instructions that guarantee constant execution time, independent of the operands.

        The list includes CMOV.

        However, the instructions from the list are guaranteed to have constant execution time, even on any future CPUs, only if the operating system sets a certain CPU control bit.

        So on recent and future Intel/AMD CPUs, one may need to verify that the correct choice has been made between secure execution mode and fastest execution mode.

    • zzo38computer 20 hours ago

      I mean the possibility that the rest of the program guarantees that the address is valid if the condition is true but otherwise it might be valid or invalid. This is probably not important for most applications, but I don't know if there are some unusual ones where it would matter.

rurban 5 hours ago

Or just disable the fucking broken optimizer for such a function.

    #pragma GCC optimize ("O0")
  • adrian_b 5 hours ago

    Disabling optimizations does not necessarily result in more deterministic execution.

    With "-O0", the generated code normally retains a huge number of useless register loads and stores, which lead to non-deterministic timing due to contention in the use of caches and of the main memory interface. Optimized code may run only inside registers, being thus executed in constant time regardless of what other CPU cores do.

    The only good part is that this non-deterministic timing will not normally depend on the data values. The main danger of the non-constant execution time is when this time depends on the values of the processed data, which provides information about those values.

    There are cases when disabling optimization may cause data-dependent timing, e.g. if with optimization the compiler would have chosen a conditional move and without optimization it chooses a data-dependent branch.

    The only certain way of achieving data-independent timing is to use either assembly language or appropriate compiler intrinsics.