This post was written by Conrad Ludgate, an Atuin contributor
End-to-end encryption is an essential component of Atuin.
One of our core philosophies, when we created the sync service, was that we didn't want to worry about storing user data. The shell is a very sensitive system with API keys, AWS credentials, account passwords, etc. We didn't want to give the opportunity for that data to leak, either through an attack, or through a mistake on our part.
If there's one thing I have learnt as an engineer, it's that cryptography is hard. If you are an expert in cryptographic implementations or cryptanalysis, please get in touch.
This post covers my research, as a non-crypto expert, into the long-term security of Atuin history data.
Disclaimer: where reasonable, I have considered side-channel attacks. Right now, our biggest concern is attacks on the Atuin server, where the encrypted data is stored at rest. All Atuin data is stored unencrypted on your local device in order to perform search queries. Improvements to cryptographic implementations can come in later revisions if any realistic side-channel attacks are found.
All the way back in April 2021, in our V0.5 release, Ellie decided to use the NaCl standard (aka salt/libsodium) for our encryption as a tried and trusted standard.
Specifically, secretbox was the algorithm of choice.
If you're not familiar, secretbox is an implementation of authenticated symmetric encryption. This means that only the owner of the encryption key (the user) can decrypt the data, and that any attempt to tamper with the data can be detected.
Honestly, this is a great system and offers everything we needed. However, our interface to libsodium was a now-unmaintained crate called sodiumoxide, which had portability issues. Because of this, I started looking into which algorithms libsodium uses underneath and whether we could use a native Rust implementation.
Secretbox is made up of two main components: a stream cipher and a message authentication code. These are XSalsa20 and Poly1305 respectively, both designed by NaCl's author, Daniel J. Bernstein. In a brave effort, I decided to roll my own crypto and implement this XSalsa20 + Poly1305 system in Rust.
NOTE: I didn't actually implement the underlying algorithms. We are using:
- poly1305
- salsa20
from the RustCrypto project.
These algorithms are not known to be vulnerable to software-based side-channel attacks.
Back to the drawing board
After peeling back the veil that is our cryptographic implementation, I started thinking a lot more about just how secure the system is. The more I started looking, the more I noticed potential improvements.
Salsa20 and Poly1305 both date back to 2005. In another 20 years, is this system still going to be secure?
Let's take a look at some potential attacks
We don't guarantee a unique Initialisation Vector (IV) per message
We use a random 192-bit IV. There is a known attack on stream ciphers if a key + IV pair is ever reused. For all practical purposes, a random 192-bit IV is enough to avoid reuse, assuming the OS random source is any good. A birthday attack calculation suggests that it takes on the order of 10^23 messages for a one-in-a-trillion chance of collision.
This is not an issue as all of our users combined are never going to generate 10^23 entries, and we certainly aren't willing to store zettabytes of their data.
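As a rough check of that figure, here is a minimal sketch of the birthday calculation. It assumes the standard small-probability approximation p ≈ n² / (2N) for n random draws from a space of size N = 2^192; the function name is made up for illustration and is not part of Atuin.

```rust
/// Approximate number of random nonces that can be drawn from a space of
/// 2^nonce_bits before the chance of any collision reaches `p`.
/// Uses the small-probability birthday approximation p ≈ n^2 / (2 * N).
fn messages_for_collision_probability(nonce_bits: i32, p: f64) -> f64 {
    // N = 2^192 is far below f64::MAX, so a float is fine for an estimate.
    let space = 2f64.powi(nonce_bits);
    (2.0 * space * p).sqrt()
}

fn main() {
    // 192-bit random IV, one-in-a-trillion collision chance.
    let n = messages_for_collision_probability(192, 1e-12);
    println!("{:.2e} messages", n);
}
```

Under these assumptions the result is a little over 10^23 messages, in line with the estimate above.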
We use the same key for each message
Shell history is quite predictable. If you have a 2-byte history entry, it's quite likely that it's ls. Given the encrypted blob, you can start to brute-force the associated key. A proof was published stating that no attack on Salsa20 with a 128-bit key is possible with an average search time of less than 2^130 (about 10^39) random guesses.
To put that number into perspective: performing 1 billion key operations per CPU core per second, and using a suite of 1 billion CPU cores, the attack would take tens of trillions of years.
Atuin uses a 256-bit key, which is even more secure and therefore not at risk of a practical brute-force attack. It follows that we are likely safe from a known-plaintext attack.
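The arithmetic behind that estimate can be sketched in a few lines. The attacker model (1 billion cores, each doing 1 billion key operations per second) is taken from the text; the function name and the 365.25-day year are assumptions made for illustration.

```rust
/// Years needed to make 2^guess_bits guesses at `guesses_per_second`.
fn years_to_search(guess_bits: i32, guesses_per_second: f64) -> f64 {
    const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 60.0 * 60.0;
    2f64.powi(guess_bits) / guesses_per_second / SECONDS_PER_YEAR
}

fn main() {
    // Assumed attacker: 1e9 cores, each doing 1e9 key operations per second.
    let rate = 1e9 * 1e9;
    // The 2^130 bound quoted for Salsa20 with a 128-bit key.
    println!("2^130 guesses: {:.1e} years", years_to_search(130, rate));
    // Exhausting the full 256-bit key space that Atuin uses.
    println!("2^256 guesses: {:.1e} years", years_to_search(256, rate));
}
```

With these assumptions, 2^130 guesses take a few times 10^13 years, and exhausting a 256-bit key space takes more than 10^51 years.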
However, there is still the issue of key leaking. We have no key-upgrade policy. If a key is leaked, maybe through a side-channel attack, a social attack, or malware, then the only solution is to create a new account with a new key.
This is partially an issue.
What we can change
While researching these systems, I learnt of many new cryptographic techniques that some modern systems use. While the analysis above indicates that we are protected, there might be attacks we are unaware of, so keeping up with modern research is important.
We're also in the middle of redesigning our sync service. While we're already planning a big change, we might as well consider updating the encryption too.
A common approach to encrypting lots of items is the use of wrapped keys. The idea here is that each payload has an associated random encryption key. This key is then itself encrypted (wrapped) using the master key and stored with the data.
Initially, this seemed less secure to me. However, my research suggests that the master key is less vulnerable to side-channel attacks since it is used less often. It also offers no decrease in overall security, since brute-forcing the master key from a wrapped random key is just as hard as it is from any message. In the end, it's like a password manager for your encrypted data.
This would unlock some potential future upgrades.
- Key rotation is easier since you only need to re-encrypt the wrapped keys. This means much less data needs to be updated.
- Wrapped data keys can be decrypted in Hardware Security Modules (HSMs), which are designed to resist side-channel attacks.
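To make the key rotation point concrete, here is a minimal sketch of the wrapped-key data flow. Everything in it is hypothetical and not Atuin's implementation: `Record`, `encrypt`, `decrypt`, `rotate` and `toy_xor` are names invented for this example, and `toy_xor` is a deliberately insecure XOR placeholder standing in for a real authenticated cipher so that the sketch runs with the standard library alone.

```rust
type Key = [u8; 32];

/// Placeholder "cipher": XOR with the repeated key. NOT secure. It only
/// stands in for a real authenticated cipher so the data flow is runnable.
fn toy_xor(key: &Key, data: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

struct Record {
    wrapped_key: Vec<u8>, // per-record data key, encrypted under the master key
    ciphertext: Vec<u8>,  // payload, encrypted under the data key
}

fn encrypt(master: &Key, data_key: Key, payload: &[u8]) -> Record {
    Record {
        wrapped_key: toy_xor(master, &data_key),
        ciphertext: toy_xor(&data_key, payload),
    }
}

fn decrypt(master: &Key, record: &Record) -> Vec<u8> {
    // Unwrap the data key first, then use it to decrypt the payload.
    let mut data_key = [0u8; 32];
    data_key.copy_from_slice(&toy_xor(master, &record.wrapped_key));
    toy_xor(&data_key, &record.ciphertext)
}

/// Rotation re-encrypts only the 32-byte wrapped key; the payload
/// ciphertext is left untouched.
fn rotate(old_master: &Key, new_master: &Key, record: &mut Record) {
    let data_key = toy_xor(old_master, &record.wrapped_key);
    record.wrapped_key = toy_xor(new_master, &data_key);
}

fn main() {
    let old_master = [1u8; 32];
    let new_master = [2u8; 32];
    // A real implementation would draw a fresh random data key per record.
    let mut record = encrypt(&old_master, [7u8; 32], b"ls -la");
    let before = record.ciphertext.clone();

    rotate(&old_master, &new_master, &mut record);

    assert_eq!(record.ciphertext, before);
    assert_eq!(decrypt(&new_master, &record), b"ls -la".to_vec());
}
```

The point of the sketch is the shape of `rotate`: moving to a new master key rewrites one small wrapped key per record rather than every payload.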
XSalsa20 has since been superseded by XChaCha20, which builds on ChaCha20, a later cipher by the same author. It has a very similar construction, but the stream cipher has better mixing characteristics, which makes any non-brute-force attacks harder to craft.
I started to craft a new solution using these concepts. But eventually, I realised that I shouldn't be reinventing the wheel here. Across more and more of my research sessions, I kept stumbling upon PASETO. While the intended use case is security tokens, their local encryption scheme is designed such that encrypted data is safe to be shared publicly. Their V4 scheme also uses the XChaCha20 cipher, which I was initially planning to use.
In the end, I bit the bullet and decided to use the standard. The nice thing with secretbox is that existing implementations in other languages are widely available, making it easy for third parties to implement sync. If we implemented our own scheme, it would be much easier for third parties to make mistakes if they wanted to use the sync data directly.
Using PASETO, there are existing implementations that we didn't have to write. This means that we don't build software doomed to die a lonely death. It also means that we benefit directly from future versions of the specification.