Created: 2005-09-06 by MartinPool
Implementation branch: none yet
We wish to add the Arch feature of gpg-signed revisions. These serve as protection against accidental or malicious corruption of an archive, by testifying that a revision was actually created by the person who claims to have created it.
For each revision, create a testament of the revision, and a hash of the testament. The testament is a canonical form of all the data we consider relevant to a revision's integrity, such as the committer, commit message, etc. The testament of a revision changes if file content, file properties, paths, etc. change. Signing the testament will sign the whole tree, as will signing the sha1 of the testament.
Revisions should normally be signed by a GPG key corresponding to the committer's email address given in the revision.
I think that multiple signatures should be permitted, to allow for signature upgrades without breaking support for people who use the old signature. --AaronBentley
RobertCollins - good point, and we should discuss this at UBZ.
What about having signatures of signatures? So that if you submit something to me, and I merge it, I can sign off on it, to indicate that I agree that this is good. This is more important if I "pull" your change, since the revision text will be stored exactly as you committed it. Note that this means the name on the signature will *not* match the one in the commit log. (similar problem if I sign an unsigned commit) --JohnMeinel
RobertCollins - this is an interesting thing, it gets into the strong policy area of systems like Aegis - maybe it would be better to look at being able to leverage aegis, or have aegis leverage bzr ? (the general question is, at which point does such policy stop being core to a VCS system and become crud for us).
The design should allow multiple hashes (like Bazaar 1 does) at both levels (inventory and signature) to ease the migration if SHA-1 is broken. --MatthieuMoy
RobertCollins - migration of signatures and hashes is planned. It's important to note that baz 1 only supported multiple hashes to preserve compatibility with tla: the hash in gpg signatures is typically sha1, and baz 1 had no incremental upgrade facility for signatures.
Each revision can be determined to be in one of these states:
- Unsigned (no signature present)
- Invalid signature (i.e. the signature does not correspond to the revision testament)
- Signed by an unknown key; this should rarely happen if gpg is configured to auto-fetch keys
- Signed by a key not belonging to the purported committer
- Signed by a key not *trusted* to belong to the purported committer (including revoked/expired keys)
- Signed by an appropriate trusted key
These should give a warning or an error depending on user policy. The default may be at first that we give only warnings.
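The states and the warn-or-error policy above could be modelled roughly as follows. This is only a sketch: `SignatureState`, `SignatureError` and `check_policy` are hypothetical names, not existing bzr code, and the empty default for `fatal_states` is an assumption standing in for the "warnings only" default.

```python
from enum import Enum


class SignatureState(Enum):
    """The six states listed above (names are illustrative)."""
    UNSIGNED = "no signature present"
    INVALID = "signature does not match the revision testament"
    UNKNOWN_KEY = "signed by an unknown key"
    WRONG_KEY = "signed by a key not belonging to the purported committer"
    UNTRUSTED_KEY = "signed by a key not trusted to belong to the purported committer"
    TRUSTED = "signed by an appropriate trusted key"


class SignatureError(Exception):
    """Raised when user policy treats a signature state as an error."""


def check_policy(state, fatal_states=frozenset()):
    """Return a warning string, or None if the state needs no warning.

    States in fatal_states raise SignatureError instead of warning.
    """
    if state is SignatureState.TRUSTED:
        return None
    if state in fatal_states:
        raise SignatureError(state.value)
    return "warning: " + state.value
```

A stricter user policy would pass, for example, `fatal_states={SignatureState.INVALID}` so that a mismatching signature aborts the operation while an unsigned revision only warns.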
If a branch is upgraded, changing the representation of past revisions, their signatures will be invalidated. It may be necessary to re-sign them; possibly using a different key to that originally used, which suggests we may need a way for users to specify a "master key".
Re-signing may also be needed when a key expires or is revoked.
Decide how to invoke gpg: by directly running gpg itself, or through PyMe or some other wrapper library.
- For signing, gpg is invoked directly. For verification, PyMe is planned.
- Write out signatures when making commits. DONE.
- Propagate signatures when copying revisions between branches. DONE.
- Validate signatures of revisions
This does not address the question of what users should be making commits to a particular branch.
- It might be useful that commands that fetch revisions from a branch take a parameter specifying a list of acceptable keys.
- It might also be good to have, as discussed below, a list of globally trusted keys: not just trusted to be that person, but trusted to write non-malicious code. One could have a global default list and a per-branch list.
- Over time the format stored in the archive may change, which means signatures will break if the archive is upgraded (or conversely we may need to keep reading old formats, which may be inconvenient.) The signature may inadvertently cover information that is "accidentally" present in the archive, such as the last-changed revision, but not essential to what is signed. Dependence on internal data also makes attaching signatures to mailed changesets difficult.
So the approach taken is to have a function that produces a "testament" from a revision, containing only the information that will be attested to. Since the information in this form is only what is needed for the signature, we should still be able to reproduce it even if the internal form has changed.
The testament can itself be versioned so that we can evolve it to use, for example, stronger hash functions in the future.
In doing this we win by having exact control over the output. We are comparing for equality, not for semantic equivalence, so having parsable/robust content is not terribly important. As such it's better to use a plain text format, which gives us byte-by-byte control of the contents, rather than XML.
Sketch of contents:
    bzr signing-revision v5
    revision-id: mbp@238912-123-123-123
    committer: Martin Pool <firstname.lastname@example.org>
    date: 213897123
    timezone: 36000
    parent: robert@foodfof-123-123
    dir src
    file src/foo.c babacba8718927381232983 foo.c-123-123123
    ...
The inventory is included inline.
This can be calculated entirely in-memory and piped through GPG to make a detached signature. Alternatively it can be stored with a GPG attached signature around it, allowing it to easily be verified against a working version.
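Piping an in-memory testament through gpg could look roughly like this. The helper names are hypothetical; the sketch assumes a `gpg` binary on the PATH and uses only its `--armor`, `--detach-sign` and `--local-user` options. With no file argument gpg reads the data from stdin and writes the signature to stdout.

```python
import subprocess


def gpg_sign_command(key_id=None):
    """Build the argv for an ASCII-armoured detached signature."""
    cmd = ["gpg", "--armor", "--detach-sign"]
    if key_id is not None:
        # Select a specific secret key instead of gpg's default key.
        cmd += ["--local-user", key_id]
    return cmd


def sign_testament(testament, key_id=None):
    """Return the detached signature for testament (bytes).

    Raises subprocess.CalledProcessError if gpg exits with an error.
    """
    result = subprocess.run(
        gpg_sign_command(key_id),
        input=testament,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Nothing is written to disk in this form, which is the point of computing the testament entirely in memory.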
Lalo's brain dump
Having thought about that for a while (and partially prototyped parts of it), here's what I think:
I'd perhaps move to a more monotonic approach to signing. I don't really care that it's signed by a key matching the committer's email, but rather, whether it's signed by a key I trust.
-- MatthieuMoy: One must be careful about the definition of "trust" here. In the GPG terminology, to "trust" a key means you trust the fact that the key actually belongs to the person it says it does. What we want here is to "trust" the key in the sense "I know that person well enough to trust him/her not to introduce a backdoor".
-- RobertCollins yes, it's about setting a policy for a person. You need the GPG 'I know foo is really foo' to be able to set that policy.
So I'd rather have a "signature store", where there can be any number of signatures (0..) for any given revision. So a revision would be as trusted as the "best" key that signed it.
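As a rough illustration of that idea, a signature store might look like the following. `SignatureStore` and its integer trust levels are hypothetical, and the sketch assumes the signatures have already been verified against the testament elsewhere.

```python
from collections import defaultdict


class SignatureStore:
    """Holds zero or more signatures per revision."""

    def __init__(self, key_trust):
        # key_trust maps key id -> integer trust level (higher is better).
        # Keys missing from the mapping are treated as level 0.
        self._key_trust = dict(key_trust)
        self._signatures = defaultdict(list)

    def add(self, revision_id, key_id, signature):
        """Record one more signature for a revision."""
        self._signatures[revision_id].append((key_id, signature))

    def signatures(self, revision_id):
        """Return a copy of the (key_id, signature) pairs for a revision."""
        return list(self._signatures.get(revision_id, []))

    def trust(self, revision_id):
        """A revision is as trusted as the best key that signed it."""
        levels = [self._key_trust.get(key_id, 0)
                  for key_id, _ in self._signatures.get(revision_id, [])]
        return max(levels, default=0)
```

Adding a signature never lowers a revision's trust in this model, which matches the "monotonic" flavour of the proposal.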
The reason I would like to see this feature is that it's awesome for code review. I write some code, and submit it; you review it. Then you can either:
- merge it into your branch, and then sign my revision on it
- sign my revision and send me the detached sig
(but even if you do the former, I can still "pull" the signature from your branch.)
Then some tool like pqm can accept revisions based on how many signatures each one has and how trusted those signatures are.
As for merging - I think signing is kind of transitive, seeing as bzr is about snapshots. So I only really care about the "trustedness" of the heads. If the head of your branch is trusted, then I can merge it into mine, even if one of the parent revisions is completely unsigned.
There's a rationale for that, in case it isn't clear. When you sign your branch's head, you're signing that specific snapshot, meaning, the current state of all files. It may well be that you merged in a revision that you don't really trust enough to sign it; but, on manual merge resolution, you cleaned it up enough that you can now sign it. So I'm ready to trust your merge, even if it has a dubious parent.
Another odd "semi-use-case" - you can commit work in progress to your private branch but not sign it (or sign it with a secondary, lower-grade, no-passphrase key), so if people try to merge from it, they will get the last one with a good signature, rather than the head (unless they really *want* work in progress, in which case they say so explicitly to bzr). Alas, better than tags
-- MatthieuMoy: Would be really nice to have this. (Doesn't the Linux kernel use this kind of feature? Usually, the Changelog mentions "Signed-off-by:" for several persons for each changeset. I don't know if this corresponds to a cryptographic signature.)
- No, it specifically doesn't correspond to a cryptographic signature. It's basically a tiny copyright grant: explicit consent for the code to be distributed under the GPL, etc.
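For the work-in-progress use case described earlier, picking "the last one with a good signature" could be sketched as below. The function name is hypothetical, the history is assumed to be a simple linear list from oldest to newest, and `is_well_signed` stands in for whatever signature policy check is in force.

```python
def last_signed_revision(history, is_well_signed):
    """Return the newest revision id that passes the signature check.

    history is ordered oldest first; returns None if nothing qualifies.
    """
    for revision_id in reversed(history):
        if is_well_signed(revision_id):
            return revision_id
    return None
```

A merge that explicitly asks for work in progress would skip this step and use the head directly.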