Launchpad Entry: https://blueprints.launchpad.net/bzr/+spec/foo
Created: 2007-04-06 by AaronBentley
A new file format for representing diffs against multiple parents
This format may reduce the storage size of repositories, and also make it easier to produce texts efficiently.
When we merge, we produce files with multiple parents. Frequently, the right-hand parent contributes significantly to the text of the resulting file. By including the right-hand parent in the delta, we can produce a much smaller delta.
But we should not leave out the left-hand parent, because it contains all the changes made on the current branch since the merge base.
Therefore, deltas against multiple parents are desirable.
This format may be used for storing data in repositories, for non-human-readable bundle formats, and for the smart server protocol.
Because the output includes line numbers, the format provides an easy way to generate a text from a series of deltas without generating intermediate forms of that text.
The format consists of a series of "insertion" and "common text" sections. Example:
    i 3
    New line 1
    New line 2
    New line 3

    c 1 5 4 10
    c 0 20 14 5
Insertion sections are indicated by the letter "i", and describe new text that is not present in any parent. The "i" is followed by the number of lines to insert. The final newline of the section is considered the "insertion terminator", and is not part of the text. Thus,
    "i 1\nmy dog has fleas\n"
is a representation of "my dog has fleas", while
    "i 1\nmy dog has fleas\n\n"
represents "my dog has fleas\n".
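A minimal Python sketch of writing an insertion section, assuming each text is held as a list of lines that keep their own newlines (as str.splitlines(keepends=True) would give for ordinary text). The function name format_insertion is hypothetical. Appending one extra newline after the lines is what makes the final newline a terminator rather than part of the text:

```python
from typing import List


def format_insertion(lines: List[str]) -> str:
    """Serialize inserted lines as an "i" section (hypothetical helper).

    Each element of lines keeps its own trailing newline; only the last
    line of a file may lack one. The newline appended after the lines is
    the insertion terminator and is not part of the inserted text.
    """
    header = "i %d\n" % len(lines)
    return header + "".join(lines) + "\n"
```

With this helper, format_insertion(["my dog has fleas"]) yields "i 1\nmy dog has fleas\n", and format_insertion(["my dog has fleas\n"]) yields "i 1\nmy dog has fleas\n\n", matching the two examples above.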
In a merge, inserted lines indicate changes the user made after the merge (e.g. during conflict resolution).
Common-text sections are indicated by the letter "c", followed by four non-negative integers separated by spaces, and are terminated by a newline. The numbers mean:
- The parent that the text is common with, indexed from 0
- The line number of the beginning of the section, in the parent text
- The line number of the beginning of the section, in the child text
- The number of lines in the section
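The sketch below parses a delta into its two kinds of section and rebuilds the child text from the parents' line lists. It is illustrative Python, not bzr's implementation: the Insertion and Common classes and the split_lines, parse_delta and apply_delta functions are hypothetical names, and line numbers are assumed to be 0-based. Lines are split on "\n" only, because str.splitlines() would also break on characters such as "\r".

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Insertion:
    lines: List[str]   # new lines, each keeping its own newline


@dataclass
class Common:
    parent: int        # index of the parent, from 0
    parent_pos: int    # first line of the section in the parent (0-based here)
    child_pos: int     # first line of the section in the child (0-based here)
    num_lines: int     # number of lines in the section


Hunk = Union[Insertion, Common]


def split_lines(text: str) -> List[str]:
    """Split on "\\n" only, keeping the newline on each line."""
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])   # final line without a newline
    return lines


def parse_delta(text: str) -> List[Hunk]:
    lines = split_lines(text)
    hunks: List[Hunk] = []
    i = 0
    while i < len(lines):
        header = lines[i]
        i += 1
        if header.startswith("i "):
            count = int(header.split()[1])
            if count < 1 or i + count > len(lines):
                raise ValueError("bad insertion section: %r" % header)
            body = lines[i:i + count]
            i += count
            if not body[-1].endswith("\n"):
                raise ValueError("missing insertion terminator")
            # The last newline read belongs to the terminator, not the text...
            body[-1] = body[-1][:-1]
            # ...unless a bare "\n" follows: then that line is the terminator
            # and the stripped newline was part of the text after all.
            if i < len(lines) and lines[i] == "\n":
                body[-1] += "\n"
                i += 1
            hunks.append(Insertion(body))
        elif header.startswith("c "):
            parent, parent_pos, child_pos, num_lines = map(int, header.split()[1:])
            hunks.append(Common(parent, parent_pos, child_pos, num_lines))
        else:
            raise ValueError("unknown section header: %r" % header)
    return hunks


def apply_delta(hunks: Sequence[Hunk], parents: Sequence[Sequence[str]]) -> List[str]:
    """Rebuild the child's lines from the hunks and the parents' lines."""
    child: List[str] = []
    for hunk in hunks:
        if isinstance(hunk, Insertion):
            child.extend(hunk.lines)
        else:
            # Hunks are applied in order, so child_pos is not needed here.
            start = hunk.parent_pos
            child.extend(parents[hunk.parent][start:start + hunk.num_lines])
    return child
```

For example, with parents ["a\n", "b\n", "c\n"] and ["x\n", "y\n"], the delta "i 1\nfresh\n\nc 0 1 1 2\nc 1 0 3 1\n" rebuilds to ["fresh\n", "b\n", "c\n", "x\n"].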
Entries 2-4 of a common-text section correspond to the (a, b, size) triples returned by difflib's SequenceMatcher.get_matching_blocks() when it is applied to lists of lines, so implementing this format should be relatively simple.
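As a sketch of how the matching blocks could drive delta generation, the hypothetical make_delta function below matches the child against every parent with difflib.SequenceMatcher and, at each child position, greedily takes whichever parent's block covers the most remaining lines; unmatched lines become insertion sections. The greedy choice is an assumption made for illustration, not necessarily the best or the intended strategy, and the sketch uses the 0-based positions that get_matching_blocks() reports.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple


def make_delta(parents: Sequence[List[str]], child: List[str]) -> str:
    """Greedy multi-parent delta (hypothetical helper, 0-based positions)."""
    # Collect (parent, parent_pos, child_pos, size) for every non-empty
    # matching block; get_matching_blocks() ends with a size-0 sentinel.
    blocks = []
    for index, parent in enumerate(parents):
        matcher = SequenceMatcher(None, parent, child, autojunk=False)
        for a, b, size in matcher.get_matching_blocks():
            if size:
                blocks.append((index, a, b, size))

    out: List[str] = []
    pending: List[str] = []  # child lines not yet covered by any parent

    def flush() -> None:
        # Emit pending lines as an "i" section with its terminator newline.
        if pending:
            out.append("i %d\n" % len(pending) + "".join(pending) + "\n")
            pending.clear()

    pos = 0
    while pos < len(child):
        # Pick the block that covers pos and extends furthest past it.
        best: Optional[Tuple[int, int, int, int]] = None
        for index, a, b, size in blocks:
            remaining = b + size - pos
            if b <= pos and remaining > 0 and (best is None or remaining > best[3]):
                best = (index, a + (pos - b), pos, remaining)
        if best is None:
            pending.append(child[pos])
            pos += 1
        else:
            flush()
            out.append("c %d %d %d %d\n" % best)
            pos += best[3]
    flush()
    return "".join(out)
```

When two parents offer equally long blocks, this sketch simply keeps the earlier parent; a real implementation might instead favour the left-hand parent explicitly or aim for fewer, longer sections.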
Is it worthwhile to use bisection to locate sections? If so, it may be worth reserving a marker character to introduce sections.
Is it valuable to include the position in the output as part of insertion sections?
This format does not describe annotation. I presume that the annotated text would be delta'd, rather than making annotations part of the delta format.