Created: 2006-08-21 by RobertCollins
See also: BzrCherrypickMetadata
Record cherry picks as additional revision parents, with a non-transitive marker. Incrementally bring in UI and merge features based on this data.
Cherry picking is a commonly desired feature, and even with smart merge algorithms that use line identity we do need to be able to identify the lines originating from non-fully-merged revisions.
We've previously discussed storing extra parents on merges that have a merged-parent, but are not a full commit. We have not implemented that at this point, and it's somewhat orthogonal to the key discussion here.
To implement it we need to have our model understand that a referenced revision may not in fact be fully-merged. Even if it's not fully-merged, we will want to keep the revision around.
This spec is not complete; it's primarily a brain dump that I (RobertCollins) hope interested parties will follow through on.
Cherrypicks are Arcs on the graph
When we do a merge, we are including all the work done in another branch into our branch. That is, we record a new snapshot with parents of our branch's tip, and the other branch's tip. If we talk about the changes themselves, then we need to talk about the arcs on the graph - where a node is a revision snapshot, and an arc is a path from one node to another.
So when you do a cherrypick today by doing merge -r x..y BRANCH, this results in a merge of the arc (x,y). Now, (x,y) may cover a number of revisions, merges and the like, some of which may pre-date x. Imagine that x is B and y is E. If the revision graph is:
A: B:[A] C:[A] D:[C] E:[B,D]
then the arc (B, E) will include lines created by any of the commits B, C, D, E even though C might have been committed before B. This is similar to the history-shortcut problem we have encountered in other situations. Most importantly, until we do a merge of (A,B) we cannot use *any* of B, C, D, E as parents for full-merges because they will inappropriately remove content due to the way three-way merge works.
In the case above we can call C, D, E 'partially merged' or 'cherrypicked'. I prefer 'partially merged' here as it's quite precise and does not conflate the arc that is the cherrypick with the presence of lines from the revision: B is not partially merged because it was the beginning of the arc that was cherrypicked, and no lines introduced in/attributed to B will be included in the merge output.
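As a rough illustration of the example above, the set of partially merged revisions for an arc can be computed as the ancestry of the arc's end minus the ancestry of its start. This is only a sketch: the `GRAPH` dictionary and the `ancestry` and `partially_merged` helpers are made up for this page and are not part of bzr.

```python
# Hypothetical revision graph from the example: revision id -> list of parents.
GRAPH = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["C"],
    "E": ["B", "D"],
}


def ancestry(graph, tip):
    """Return tip plus every revision reachable from it through parent links."""
    seen = set()
    pending = [tip]
    while pending:
        rev = pending.pop()
        if rev in seen:
            continue
        seen.add(rev)
        pending.extend(graph[rev])
    return seen


def partially_merged(graph, start, end):
    """Revisions whose lines the arc (start, end) may bring in.

    The start of the arc and its ancestors are excluded, matching the
    definition of 'partially merged' given above.
    """
    return ancestry(graph, end) - ancestry(graph, start)


print(sorted(partially_merged(GRAPH, "B", "E")))  # ['C', 'D', 'E']
```

Note that C shows up even though it is not a descendant of B, which is exactly the history-shortcut effect described above.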
Partially merged revision
Key expectations of a partially merged revision that is present in a repository:
- All its file texts can be recreated.
- All its ancestors are present.
That is, if A is partially merged into B, A should be fetched from one repository to another when B is fetched. And check should report that A is a ghost if it is not present in a repository that B is in. This property will hold true through any sequence of partial or full merges.
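The two expectations can be sketched with a toy model. Everything here is an assumption made for illustration: the `Revision` class, its `partial_parents` field, and the `revisions_to_fetch` and `find_ghosts` helpers are hypothetical names, and a repository is modelled as a plain dictionary rather than anything bzr provides.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    revision_id: str
    parents: list = field(default_factory=list)          # fully merged parents
    partial_parents: list = field(default_factory=list)  # partially merged parents


def revisions_to_fetch(repo, tip):
    """Everything that must travel with tip: full and partial parents alike."""
    needed = set()
    pending = [tip]
    while pending:
        rev_id = pending.pop()
        if rev_id in needed or rev_id not in repo:
            continue  # already handled, or a ghost in the source repository
        needed.add(rev_id)
        rev = repo[rev_id]
        pending.extend(rev.parents + rev.partial_parents)
    return needed


def find_ghosts(repo):
    """Revisions referenced as a full or partial parent but absent from repo."""
    ghosts = set()
    for rev in repo.values():
        for parent in rev.parents + rev.partial_parents:
            if parent not in repo:
                ghosts.add(parent)
    return ghosts


# A is partially merged into B.
source = {
    "base": Revision("base"),
    "A": Revision("A", parents=["base"]),
    "B": Revision("B", parents=["base"], partial_parents=["A"]),
}
print(sorted(revisions_to_fetch(source, "B")))  # ['A', 'B', 'base']

# A repository that holds B but not A should have A reported as a ghost.
target = {"base": source["base"], "B": source["B"]}
print(sorted(find_ghosts(target)))  # ['A']
```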
Doing cherrypicking requires some model changes. Some of those are easier than others. What do we need to do?
- Extend the tree interface to include partially-merged revisions in its parents list.
- Extend working tree to allow setting partially-merged revisions.
- Teach merge -r X..Y to set the partially merged revisions on the tree it is merging into.
- Record partial-merge ancestors in the parents of fileid-revisions, allowing us to record file level ancestry during cherrypick operations.
- Record partial-merge ancestors for inventory data. For example, the ability to indicate that a merge was performed on a subtree would allow subtree checkouts to be implemented more sanely.
- Have merge detect when it's being asked to fill-in the revisions around a previous cherrypick, and merge more smartly there. (i.e. in my
- Teach missing to not list partially merged revisions in its output.
- Give merge a facility to keep doing cherrypicks from a branch once you start that, to allow you to skip over a revision.
More below in 'open issues'.
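To make the missing item in the list above concrete, here is a minimal sketch of the filtering it implies. The function name and its arguments are hypothetical and do not reflect how bzr's missing command is implemented; the partially merged set is assumed to be available from the tree's parents as proposed above.

```python
def missing(other_ancestry, local_ancestry, local_partial):
    """Revisions of the other branch that are neither fully nor partially merged.

    other_ancestry: ordered list of revision ids in the other branch
    local_ancestry: set of revision ids fully merged locally
    local_partial:  set of revision ids recorded as partially merged locally
    """
    return [
        rev
        for rev in other_ancestry
        if rev not in local_ancestry and rev not in local_partial
    ]


# After cherrypicking the arc (B, E) onto a branch that only has A,
# C, D and E are partially merged, so only B is still reported.
print(missing(["A", "B", "C", "D", "E"], {"A"}, {"C", "D", "E"}))  # ['B']
```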
- Another use case (which hasn't been mentioned) is "cherry unpick": you have merged something already, and you decide you want to remove it and mark it as unmerged (so you have the option to merge it later). The specific use case was something like:
- Merge feature X and commit
- Do some more work (several commits)
- Realize that feature X wasn't quite as ready as you thought.
- Unmerge/unpick feature X.
- feature X gets some more work done on it.
- Merge feature X (in its entirety) back later on.
- Is it more efficient to record the partially merged revisions, or the arcs that led to them? Ghosts interact with this too. I think working with arcs for this is probably more compact more of the time.
- Representation in bzr 0.10 format repositories?
- Having some way to make this work without a format upgrade - even at a performance cost - would be nice.
- Representation in 'cherrypick ready' repositories.
- There should be an optimised representation we can implement which will answer partial merge ancestry queries quickly and efficiently. This probably involves knit format upgrades and an extension to the knit api to support that.
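On the question above of recording partially merged revisions versus the arcs that led to them, the following sketch compares the two on the example graph. It is not a proposed storage format: `expand_arc` and the dictionary-based graph are assumptions for illustration only. A parent missing from the graph is treated as a ghost and not followed, so an arc can only be expanded as far as the available history allows.

```python
def ancestry(graph, tip):
    """Return tip plus every reachable ancestor; ghosts are kept but not followed."""
    seen = set()
    pending = [tip]
    while pending:
        rev = pending.pop()
        if rev in seen:
            continue
        seen.add(rev)
        pending.extend(graph.get(rev, ()))
    return seen


def expand_arc(graph, arc):
    """Turn a stored arc (start, end) into the revisions it partially merges."""
    start, end = arc
    return ancestry(graph, end) - ancestry(graph, start)


graph = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"], "E": ["B", "D"]}
arcs = [("B", "E")]

explicit = set()
for arc in arcs:
    explicit |= expand_arc(graph, arc)

print(len(arcs) * 2, "ids stored as arcs;", len(explicit), "ids stored explicitly")
# 2 ids stored as arcs; 3 ids stored explicitly
```

The arc form stays at two ids however many revisions it spans, at the cost of a graph walk on each query; the explicit form grows with the arc but answers membership directly.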
Questions and Answers
- Is this a typo?
In "Cherrypicks are Arcs on the graph", you say "(B, E) will include lines created by any of the commits B, C, D, E". I'm not sure what you mean exactly by "arc" (maybe the explanation should be clarified), but I understand it to contain lines that are in E but not in B (i.e. the diff). By this definition, this line should probably read "any of the commits C, D, E". This would also correlate better with the explanation of "partially merged" that comes a few lines later.