Bazaar

Summary

A new file format for representing diffs against multiple parents

Rationale

This format may reduce the storage size of repositories and make it easier to produce texts efficiently.

Further Details

When we merge, we produce files with multiple parents. Frequently, the right-hand parent contributes significantly to the text of the resulting file. By including the right-hand parent in the delta, we can produce a much smaller delta.

But we should not leave out the left-hand parent, because it contains all the changes in the current branch relative to the merge base.

Therefore, deltas against multiple parents are desirable.

Assumptions

Use Cases

The format may be used for storing data in repositories, in non-human-readable bundle formats, and in the smart server protocol.

Because each common-text section records line numbers in the output text, the format provides an easy way to generate a text from a series of deltas without generating intermediate forms of that text.
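The idea in the last paragraph can be sketched as follows. This is only an illustration under stated assumptions: the `Version` class, the `get_line` and `get_text` helpers, and the tuple form `("i", lines)` / `("c", parent, parent_pos, child_pos, n)` (mirroring the sections described under Implementation) are hypothetical, and parsing is skipped. `get_line` finds the section covering a requested line and, for a common-text section, follows the reference into the parent, so neither parent of the merge is ever built as a full text.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical in-memory form of a delta (not defined by this spec):
#   ("i", [line, ...])                         an insertion section
#   ("c", parent, parent_pos, child_pos, n)    a common-text section
Hunk = Union[Tuple[str, List[str]], Tuple[str, int, int, int, int]]


class Version:
    def __init__(self, parents: List[str], hunks: List[Hunk]) -> None:
        self.parents = parents  # parent names, indexed by the "c" parent number
        self.hunks = hunks


def get_line(store: Dict[str, Version], name: str, k: int) -> str:
    """Return line k (0-based) of version `name` without building any full text."""
    while True:
        version = store[name]
        pos = 0  # child line number at which the current hunk starts
        for hunk in version.hunks:
            if hunk[0] == "i":
                lines = hunk[1]
                if pos <= k < pos + len(lines):
                    return lines[k - pos]  # the line was inserted in this version
                pos += len(lines)
            else:
                _, parent, parent_pos, child_pos, n = hunk
                if child_pos <= k < child_pos + n:
                    # Follow the reference into the parent and look again there.
                    name = version.parents[parent]
                    k = parent_pos + (k - child_pos)
                    break
                pos = child_pos + n
        else:
            raise IndexError("line %d not found" % k)


def line_count(version: Version) -> int:
    pos = 0
    for hunk in version.hunks:
        pos = pos + len(hunk[1]) if hunk[0] == "i" else hunk[3] + hunk[4]
    return pos


def get_text(store: Dict[str, Version], name: str) -> str:
    return "".join(get_line(store, name, k) for k in range(line_count(store[name])))


# A base text, two branches, and a merge whose delta refers to both parents.
STORE = {
    "base": Version([], [("i", ["a\n", "b\n", "c\n"])]),
    "left": Version(["base"], [("c", 0, 0, 0, 2), ("i", ["L\n"]), ("c", 0, 2, 3, 1)]),
    "right": Version(["base"], [("i", ["R\n"]), ("c", 0, 0, 1, 3)]),
    "merge": Version(["left", "right"], [("c", 1, 0, 0, 1), ("c", 0, 0, 1, 4)]),
}
print(get_text(STORE, "merge"), end="")  # prints R, a, b, L, c on separate lines
```

Each lookup costs one walk per version in the chain, which is the trade-off for never materialising the intermediate texts.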

Implementation

The format consists of a series of "insertion" and "common text" sections. Example:

i 3
New line 1
New line 2
New line 3

c 1 5 3 10
c 0 20 13 5

Insertion sections are indicated by the letter "i" and describe new text that is not present in any parent. The "i" is followed by the number of lines to insert. The last newline of the section is considered the "insertion terminator" and is not part of the text. Thus,

i 1
my dog has fleas

(that is, "i 1\nmy dog has fleas\n") is a representation of "my dog has fleas", with no trailing newline, while the same section followed by one extra empty line ("i 1\nmy dog has fleas\n\n") represents "my dog has fleas\n".
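A minimal sketch of the insertion-terminator rule, assuming lines are split on "\n" only and keep their line endings. The helper names `split_lines`, `encode_insertion` and `decode_insertion` are hypothetical and not part of any existing API.

```python
from typing import List, Tuple


def split_lines(text: str) -> List[str]:
    # Split on "\n" only and keep the line endings; str.splitlines() would
    # also split on "\r", form feeds and other characters.
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])  # final line without a trailing newline
    return lines


def encode_insertion(lines: List[str]) -> str:
    # Every line but the last must end with "\n". The terminator newline is
    # always appended, so text that ends in "\n" gains an extra empty line.
    return "i %d\n" % len(lines) + "".join(lines) + "\n"


def decode_insertion(patch: List[str], start: int) -> Tuple[List[str], int]:
    # Decode the "i" section whose header is patch[start]; return the inserted
    # lines and the index of the first patch line after the section.
    count = int(patch[start].split()[1])
    if count < 1:
        raise ValueError("an insertion section needs at least one line")
    body = patch[start + 1:start + 1 + count]
    body[-1] = body[-1][:-1]  # the last newline is the insertion terminator
    end = start + 1 + count
    if end < len(patch) and patch[end] == "\n":
        body[-1] += "\n"  # an extra empty line: the text itself ended with "\n"
        end += 1
    return body, end


for text in ["my dog has fleas", "my dog has fleas\n"]:
    patch = encode_insertion([text])
    lines, _ = decode_insertion(split_lines(patch), 0)
    print(repr(patch), "->", repr(lines[0]))
```

Splitting on "\n" alone matters here: a text line containing a bare "\r" must not be broken into two patch lines, or the line count in the header would no longer match.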

In a merge scenario, inserted lines indicate changes made by the user after the merge (e.g. during conflict resolution).

Common-text sections are indicated by the letter "c", followed by four non-negative integers separated by spaces, and are terminated by a newline. The numbers mean:

  1. The parent that the text is common with, indexed from 0
  2. The line number of the beginning of the section in the parent text, counting from 0
  3. The line number of the beginning of the section in the child text, counting from 0
  4. The number of lines in the section
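Together, the two section types are enough to rebuild a child text from its parents. The following is a sketch, assuming 0-based line numbers and hypothetical function names `parse_delta` and `apply_delta`. Because insertion sections carry no position, the child position in each common-text section can be derived from the sections before it; this sketch uses it only as a consistency check.

```python
from typing import List, Tuple, Union

# ("i", lines) for an insertion, ("c", parent, parent_pos, child_pos, n) for common text
Hunk = Union[Tuple[str, List[str]], Tuple[str, int, int, int, int]]


def split_lines(text: str) -> List[str]:
    # Split on "\n" only, keeping line endings.
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return lines


def parse_delta(text: str) -> List[Hunk]:
    patch = split_lines(text)
    hunks: List[Hunk] = []
    i = 0
    while i < len(patch):
        fields = patch[i].split()
        if fields and fields[0] == "i":
            count = int(fields[1])
            if count < 1:
                raise ValueError("an insertion section needs at least one line")
            body = patch[i + 1:i + 1 + count]
            body[-1] = body[-1][:-1]  # drop the insertion terminator
            i += 1 + count
            if i < len(patch) and patch[i] == "\n":
                body[-1] += "\n"  # extra empty line: the text ends with "\n"
                i += 1
            hunks.append(("i", body))
        elif fields and fields[0] == "c":
            parent, parent_pos, child_pos, n = (int(f) for f in fields[1:])
            hunks.append(("c", parent, parent_pos, child_pos, n))
            i += 1
        else:
            raise ValueError("unexpected line %r" % patch[i])
    return hunks


def apply_delta(parents: List[List[str]], hunks: List[Hunk]) -> List[str]:
    child: List[str] = []
    for hunk in hunks:
        if hunk[0] == "i":
            child.extend(hunk[1])
        else:
            _, parent, parent_pos, child_pos, n = hunk
            # The child position should equal the number of lines emitted so far.
            if child_pos != len(child):
                raise ValueError("section expected at child line %d" % len(child))
            child.extend(parents[parent][parent_pos:parent_pos + n])
    return child


left = ["a\n", "b\n", "L\n", "c\n"]
right = ["R\n", "a\n", "b\n", "c\n"]
delta = "c 1 0 0 1\nc 0 0 1 4\ni 1\nend\n\n"
print("".join(apply_delta([left, right], parse_delta(delta))), end="")
```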

Entries 2-4 match the (i, j, n) triples returned by SequenceMatcher.get_matching_blocks() when it is applied to the parent's and the child's line lists, in that order. Implementation of this format should be relatively simple.
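One way to produce these sections is to run difflib's SequenceMatcher against each parent and then, at each child position, pick the matching block with the longest remaining run; child lines covered by no block become insertions. The greedy choice and the `make_delta` name are assumptions of this sketch: the spec does not say how to choose among parents, and greedy selection does not guarantee the smallest delta.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def encode_insertion(lines: List[str]) -> str:
    # The final newline is the insertion terminator and is always appended.
    return "i %d\n" % len(lines) + "".join(lines) + "\n"


def make_delta(parents: List[List[str]], child: List[str]) -> str:
    # Collect (parent, parent_pos, child_pos, n) for every matching block.
    blocks = []
    for index, parent in enumerate(parents):
        matcher = SequenceMatcher(None, parent, child, autojunk=False)
        for parent_pos, child_pos, n in matcher.get_matching_blocks():
            if n:  # skip the final (len(a), len(b), 0) sentinel
                blocks.append((index, parent_pos, child_pos, n))

    out: List[str] = []
    pending: List[str] = []  # child lines not covered by any parent
    k = 0
    while k < len(child):
        # Greedy choice (an assumption): the block with the longest run from k.
        best: Optional[Tuple[int, int, int, int]] = None
        for index, parent_pos, child_pos, n in blocks:
            if child_pos <= k < child_pos + n:
                run = child_pos + n - k
                if best is None or run > best[3]:
                    best = (index, parent_pos + (k - child_pos), k, run)
        if best is None:
            pending.append(child[k])
            k += 1
            continue
        if pending:
            out.append(encode_insertion(pending))
            pending = []
        out.append("c %d %d %d %d\n" % best)
        k += best[3]
    if pending:
        out.append(encode_insertion(pending))
    return "".join(out)


left = ["a\n", "b\n", "L\n", "c\n"]
right = ["R\n", "a\n", "b\n", "c\n"]
print(make_delta([left, right], ["R\n", "a\n", "b\n", "L\n", "c\n", "end\n"]), end="")
```

When a run starts part-way through a matching block, entries 2 and 3 are shifted by the same offset, so each section still describes a slice of some matching block.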

UI Changes

Code Changes

Schema Changes

Data Migration

Discussion

Is it worthwhile to use bisection to find sections? If so, it may be worth reserving a marker character to introduce sections.

Is it valuable to include the position in the output as part of insertion sections?

This format does not describe annotation. I presume that the annotated text would be delta'd, rather than making annotations part of the delta format.

Unresolved Issues

Questions and Answers
