Secure Untrusted Data Repository
The motivation for SUNDR is simple and powerful. Can we get trusted storage from an untrusted host?
Our goal is integrity over secrecy, we want to make sure our data has not been tampered with. Our tools are cryptographic hashes and digital signatures
A hash function takes in arbitrary data and outputs a fixed size “signature”. It is a powerful one-way function, such that an attacker cannot realistically find another piece of data that results in the same hash. These can be used to ID data and check integrity.
We secure data by storing with associated hashes. You can’t fool a client who has the correct hash. We could store larger amounts of data in a tree structure.
To enable mutable storage, we need digital signatures, which are made up of a public-private key pair. You generate a signature from data with a private key which can then be verified for the data using the associated public key.
We’re concerned that an untrusted server could give over an old past value, which these tools don’t immediately prevent. An attacker cannot create fake values, but there’s nothing guaranteeing recency.
SUNDR presents a strawman protocol. The server stores the entire log of filesystem changes. Whenever a client either modifies or fetches,
- The client has to verify the history before it does anything, making sure each signature covers everything appropriately, and that its own last operation is in the log
- It then appends its operation and signs the entire history
- Only then can the client upload back to the server
This is a strawman because it stores a painful amount of data and verifying is extremely costly. It otherwise works pretty well. The worst an attacker could do is “fork” clients from each other, permanently separating their changes. Otherwise, it cannot introduce any inconsistencies.
We call this fork consistency. It’s still not great, since we’d rather not let this happen. One way to mitigate it is to use a trusted “time box”, who signs the history every seconds. If a client sees that there are no more signs from the time box, it knows it has been forked.
SUNDR stores data in the following scheme,
- An i-table maps i-node numbers (essentially a pointer) to i-node hashes (block hashes)
- i-tables are associated with users, through user i-handles
- A group i-table directs towards i-node numbers in user i-tables
- Group i-tables are associated with group i-handles
Whenever a block is modified, we must traverse up to the i-handle
The key to maintaining fork consistency with i-handles in an efficient way is storing a version structure for every user, containing
- The user’s latest i-handle
- A version vector, storing the version number of each user, essentially how many operations that user has performed
- A signature
To write,
- We fetch each VS and validate
- We get the proper i-handle and execute, updating the v-table
- We update our VS
This protocol works fine if we maintain a global lock, but things get more confusing when we allow concurrent disjoint updates