Bazaar

Numbering Revisions

This document describes how Revisions are assigned numbers, and how this relates to the Mainline concept.

Identifying Revisions

Every Revision has a Revision ID (also called a revid), which is a long opaque string that serves as a Universally Unique ID; no other Revision anywhere will ever have the same ID, and anywhere this Revision is, it will have that ID.

However, long opaque strings have some negative properties. They aren't very human friendly; they're hard to remember, they're impossible to compare with each other and tell anything about the revisions they refer to, and so on. Numbers are much easier to deal with, and so bzr also has Revision Numbers (called revnos).

These are not part of the revision proper though, and a given revision could have many different numbers associated with it in the world. The numbers are local to a given Branch, and are derived on the fly from the revision's placement in that branch.

In this document, numbers will refer to a revno and we'll use letters to stand for the revid.

Linear History

In the case of linear history, as you get with most centralized systems like CVS, the numbering is easy. The first revision is 1, the next is 2, the next is 3, and so on.

  C (3)
  |
  B (2)
  |
  A (1)

Here we have a branch with three revisions, with revids A, B, and C. Because they're laid out in a simple linear progression, they get simple incrementing numbers. This is what you expect in simple cases; you make your first commit, it's called 1. Your second commit is 2, and your third is 3. You don't need to see A, B, or C (which are actually much longer and uglier strings than just the plain letters).

And this pattern continues as you reach revno 10, 100, 106285, and so on.

At least as long as you only have a linear history.

Branching

However, in a distributed system, you end up with a lot of branches and merging in your history. Consider the case where there are two diverged branches of a project:

   /a        /b
  ---       ---
   D (4)     E (4)
   |         |
   C (3)     C (3)
   |         |
   B (2)     B (2)
   |         |
   A (1)     A (1)

Here we have two different branches of a project. They have three revisions (A B C) in common, those being numbers 1, 2, and 3. Each of them has committed a new revision, D and E (respectively). In each branch, that new revision is called number 4 (the same revno), even though they're different revisions (different revids).
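To make the point concrete, here is a minimal Python sketch that models each branch as a plain list of revids. The helper name `revnos` is made up for illustration and is not part of bzr.

```python
def revnos(history):
    """Map revno -> revid for a linear history given oldest first."""
    return {n: revid for n, revid in enumerate(history, start=1)}

# Single letters stand in for the real (long, opaque) revids.
branch_a = revnos(["A", "B", "C", "D"])
branch_b = revnos(["A", "B", "C", "E"])

# Revnos 1-3 name the same revisions in both branches; revno 4 does not.
for n in sorted(branch_a):
    verdict = "same" if branch_a[n] == branch_b[n] else "different"
    print(n, branch_a[n], branch_b[n], verdict)
```

The revno is only a position in one branch's history, so the same number can point at different revids in different branches.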

Now let's pretend we're /a, and want to merge in /b's work. We end up with our revision graph looking like

   (5) F
       |\
   (4) D E (3.1.1)
       |/
   (3) C
       |
   (2) B
       |
   (1) A

We've added a new revision with revid F onto the end of our graph. It's given the revno 5. We've also gotten the revision with revid E from /b. But it doesn't have the revno 4 that it has in the /b branch; our revision D already has that number. We could call it 5, but that would be confusing, because it would imply that it was based off what we call 4 (D), but it's not. So we need some new way to number it.

So revid E is given a dotted revno, showing that it's not part of our mainline.

Mainline

In concept, any merge of two branches is symmetric, meaning roughly "merge these two things together". However, socially and mentally that's often not what you think of yourself as doing. You think of it asymmetrically, as "merge THAT into THIS" (often "merge THAT into MINE").

bzr exploits this asymmetry to create a concept of a mainline through the branch. For Revisions that only have a single parent (like B C D E above), we just follow their parent link. For Revisions with multiple parents, resulting from a merge (like F), we follow the left or first parent. That parent link refers to your previous state (D), rather than the right or second (or later) parents, which refer to the state (E) you merged in from "elsewhere".

The result is that this mainline is basically a history of the revisions that you made on this branch with bzr commit, rather than those that were made elsewhere and that you brought in via bzr merge. Many things in bzr make use of this distinction to give you a more direct view of what you're working on, rather than the details of what you've gotten from elsewhere.
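The "follow the left parent" rule can be sketched in a few lines of Python. This models the revision graph as a dictionary of parent lists (first parent first); the `mainline` function is a hypothetical helper written for this page, not bzr's own code.

```python
# Parents of each revision, leftmost (first) parent first.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["C"],
    "F": ["D", "E"],  # F merged E into D, so D is the left parent
}

def mainline(parents, tip):
    """Return the mainline revids, oldest first, by following first parents."""
    line = []
    rev = tip
    while rev is not None:
        line.append(rev)
        rev = parents[rev][0] if parents[rev] else None
    line.reverse()
    return line

print(mainline(parents, "F"))  # ['A', 'B', 'C', 'D', 'F']
```

E never appears in the result because it is only reachable from F through the second parent.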

Numbering

In the case of revnos, we use that split to assign the integers to the revisions on the mainline. That way every commit you make ends up with a number one higher than your previous commit. This fits a mental model of "first I did THIS, then I did THAT, next I did...".

The revisions that you got from "elsewhere", brought in via merge, get dotted revnos. In the current bzr implementation, these are numbered by finding the latest mainline revision they're descended from (in this case, C (3)) and counting from there. If we had merged multiple revs, for example starting from

   /a        /b
  ---       ---
             G (5)
             |
   D (4)     E (4)
   |         |
   C (3)     C (3)
   |         |
   B (2)     B (2)
   |         |
   A (1)     A (1)

we'd end up with a graph and numbering like

   (5) F
       |\
       | G (3.1.2)
       | |
   (4) D E (3.1.1)
       |/
   (3) C
       |
   (2) B
       |
   (1) A

The first number of the dotted revno (e.g., the 3 in 3.1.2) refers to the mainline revision these are descended from. The middle number is used for the case where we end up with multiple branches from the same place. And the last counts how far the revision is along that side path.

  • Remember that these numbers are not stored. They're derived based on the branch when they're needed. So the way they're numbered could be changed in later bzr versions.
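As a rough illustration of the scheme described above, the following Python sketch numbers the example graph. It is a simplification written for this page and is not bzr's actual algorithm: it assumes merged side paths are themselves linear, and the function name `number_revisions` and the use of 0 as the base for a side path with no mainline ancestor are assumptions.

```python
def number_revisions(parents, tip):
    """Assign revnos: integers on the mainline, dotted triples for merged revisions."""
    # Mainline: follow first parents from the tip back to the root.
    line = []
    rev = tip
    while rev is not None:
        line.append(rev)
        rev = parents[rev][0] if parents[rev] else None
    line.reverse()

    revno = {revid: (n,) for n, revid in enumerate(line, start=1)}
    branch_count = {}  # base mainline revno -> side paths numbered from it so far

    for merge in line:
        for other in parents[merge][1:]:
            # Walk the merged side path, newest first, until a numbered revision.
            path = []
            rev = other
            while rev is not None and rev not in revno:
                path.append(rev)
                rev = parents[rev][0] if parents[rev] else None
            if not path:
                continue
            # Assumption: a side path with no numbered ancestor gets base 0.
            base = revno[rev][0] if rev is not None else 0
            branch_count[base] = branch_count.get(base, 0) + 1
            for seq, revid in enumerate(reversed(path), start=1):
                revno[revid] = (base, branch_count[base], seq)

    return {revid: ".".join(map(str, n)) for revid, n in revno.items()}

parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["C"], "G": ["E"], "F": ["D", "G"],
}
numbers = number_revisions(parents, "F")
print(numbers["E"], numbers["G"], numbers["F"])  # 3.1.1 3.1.2 5
```

Nothing here is stored with the revisions; the whole table is recomputed from the graph and the branch tip, which is why the same revid can end up with a different number in another branch.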

Resync

Now, in the case above, our branch /a, which has had all this done, is a superset of the branch /b; we have everything that was in it. So now /b can pull from /a (or /a can push onto /b) to sync up. When that happens, though, /b will be the same as /a, including having the same mainline. This means that the revision E, which previously had the revno 4 in /b, will now be called 3.1.1, just like in /a. 4 will now refer to D.
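The renumbering can be seen in a small Python model, using the simpler graph where /b's tip is E and /a's tip is F. A pull is modelled here as nothing more than /b adopting /a's tip; `mainline` and `revid_at` are hypothetical helpers, not bzr functions.

```python
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["C"], "F": ["D", "E"],
}

def mainline(parents, tip):
    """Return the mainline revids, oldest first, by following first parents."""
    line = []
    rev = tip
    while rev is not None:
        line.append(rev)
        rev = parents[rev][0] if parents[rev] else None
    line.reverse()
    return line

def revid_at(parents, tip, revno):
    """Return the revid that an integer revno refers to in a branch with this tip."""
    return mainline(parents, tip)[revno - 1]

before_pull = revid_at(parents, "E", 4)  # /b's own tip is E
after_pull = revid_at(parents, "F", 4)   # after pulling, /b's tip is F
print(before_pull, after_pull)  # E D
```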

In many workflows, this isn't desirable. In those situations, people tend to sync up with each other via merge rather than push/pull. Using merge will never change your existing numbering; it'll just add new pieces onto the end. In the situation above, if /b does a merge, it will merge in two new revisions: D and F. And the new commit will be H. /b is now a superset of /a, and can push onto it or be pulled from, but with the same caveat (the numbering will now reflect /b's). Often /a won't want that, so it would use merge instead of pull to sync up.

This can create an infinite loop, though. /a merges, but the only thing to merge is that revision H, which doesn't have any file changes. It just connects existing revisions together. /a does that, and creates a new revision I (with again no changes). But now /a has something extra that /b can merge, and create J. Back to /a, merge that and create K... and so on, with no actual changes, just endless ping-ponging of revision graphs.
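The ping-pong can be simulated with a toy Python model. Each merge is represented only by the new revision it commits and that revision's two parents; the `merge` and `ancestors` helpers are assumptions made for this sketch and do not correspond to bzr internals.

```python
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["C"], "G": ["E"], "F": ["D", "G"],
}
tips = {"a": "F", "b": "G"}

def ancestors(parents, tip):
    """Return every revid reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents[rev])
    return seen

def merge(parents, tips, into, other, new_revid):
    """Merge branch `other` into `into` and commit; return False if nothing is new."""
    if tips[other] in ancestors(parents, tips[into]):
        return False
    parents[new_revid] = [tips[into], tips[other]]
    tips[into] = new_revid
    return True

# /b merges /a (H), /a merges /b (I), /b merges /a (J), /a merges /b (K), ...
turns = [("b", "a", "H"), ("a", "b", "I"), ("b", "a", "J"), ("a", "b", "K")]
results = [merge(parents, tips, into, other, rev) for into, other, rev in turns]
print(results)  # [True, True, True, True]
print(tips)     # {'a': 'K', 'b': 'J'}
```

Every turn finds something new to merge, because the other side's latest merge commit is never yet an ancestor of our tip, even though from H onward no file contents change.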

There's no inherent technical solution to that. The only thing to do really is don't do that. If there's nothing meaningful to merge, don't bother merging.

Confusing?

It's possible this could cause confusion, because the revision '4' can mean different things in different branches. However, the revision 'E' always means the same thing, so the revid can be used when it's necessary to disambiguate.

Leaving aside exotic situations, the only way to change the mainline is to push or pull onto the branch. When the branch is moved forward only via commit (perhaps with a preceding merge), the mainline is the same, so all existing numbers remain unchanged.

Generally, for any given project, there's at least one branch that serves as a trunk, and is used in such a way that the mainline never changes. It's only merged into, not pushed over. There is an append_revisions_only branch configuration variable which can be set to help with this; it will make bzr refuse to do any operations that would change the mainline. In this situation, the revnos on that branch will be stable over time and can be usefully referred to.
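The idea of "refuse anything that would change the mainline" can be modelled as a simple check: moving the branch tip is allowed only if the old tip is still on the new mainline. This Python sketch is a model of that rule under that assumption, not bzr's implementation; `mainline` and `keeps_mainline` are hypothetical names.

```python
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["C"], "F": ["D", "E"],
}

def mainline(parents, tip):
    """Return the mainline revids, oldest first, by following first parents."""
    line = []
    rev = tip
    while rev is not None:
        line.append(rev)
        rev = parents[rev][0] if parents[rev] else None
    line.reverse()
    return line

def keeps_mainline(parents, old_tip, new_tip):
    """True if moving the tip from old_tip to new_tip only appends to the mainline."""
    return old_tip in mainline(parents, new_tip)

# /a committing the merge F on top of D only appends: existing revnos stay put.
print(keeps_mainline(parents, "D", "F"))  # True
# /b (tip E) pulling F would push E off the mainline, so it would be refused.
print(keeps_mainline(parents, "E", "F"))  # False
```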

Generally, the more transient a branch is, the less useful long-term are its revision numbers. Still, they're useful in the short-term while you're working on it.
