Bazaar

Summary

We should add support for copying and combining versioned files/directories/symlinks.

Rationale

  • More accurate file history vs simply adding a new file.
  • No loss of history metadata when importing from a tool that records copies.
  • When using Bazaar as a front-end to tools supporting copy, no need to switch back to the foreign tool to record the copy.
  • Changes applied to both original and copied text when merging across a file copy boundary.

Further Details

Assumptions

File copying is a relatively rare operation so the amount of metadata needed to support it is small for most projects. (If fast-import streams are any guide, even the largest projects have a small number of copies recorded.)

The vast majority of merges are not across a file copy boundary.

A primary reason for file copying is splitting a large module/class into multiple pieces. That implies a larger number of conflicts when merging, at least if a simple approach is taken.

File combining is even rarer than file copying and isn't needed in an initial implementation.

Use Cases

  • annotate
  • log
  • copying data back into foreign branches
  • not having to look back into history to figure out file ids in Subversion. This actually requires more than simply tracking copies; it also requires that more than one InventoryEntry with the same file-id can exist, which makes it all significantly more complicated. Probably not a good idea.

  • possibly doing copy-aware or file-split merges at some point in the future. (This is a bit like people who have their heads frozen in the hope that sometime in the future it will be possible to resurrect them *and* someone will want to. There is a similar issue here: people will only be motivated to do this kind of copying when they know the merge will work.)

Implementation

UI Changes

New commands:

bzr copy ORIGINAL NEWFILE

Output changes:

  • status - A new status flag is needed to indicate a copy.

  • log - Copies should be implicitly followed as renames are. There should be no special options to enable this.

  • diff - Should probably show the copy as it does for renames.

Interaction with other operations

How should copy combine with rename and delete if both are done in the same commit? As a general rule, the information recorded should be a summary given in the context of the basis tree.

Case 1:

   bzr copy A B
   bzr rename B C

This should show in status/log as "A copied to C".

Case 2:

   bzr copy A B
   bzr rename A C

This should show in status/log as "A copied to B; A renamed to C".

Case 3:

   bzr rename A B
   bzr copy B C

This should show in status/log as "A renamed to B; A copied to C". (It's "A copied to C" instead of "B copied to C" because B doesn't exist in the basis_tree.)

Case 4:

   bzr rename A B
   bzr copy A C

This should show in status/log as "A renamed to B; A copied to C".

Case 5:

   bzr copy A B
   bzr rm B

This is a no-op.

Case 6:

   bzr copy A B
   bzr rm A

This could be recorded as "A renamed to B". Is that the best answer?
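
The rule that "the information recorded should be a summary given in the context of the basis tree" can be sketched in a few lines of Python. This is only an illustration, not bzrlib code: the `summarise` function, the tuple-based operation format and the output strings are all hypothetical. It covers Cases 1, 2, 3, 5 and 6 above; Case 4 is left out because the sketch resolves copy sources against the working tree, where A no longer exists after the rename. For Case 6 it assumes the "A renamed to B" answer, which is still an open question.

```python
def summarise(basis_paths, operations):
    """Summarise copy/rename/rm operations relative to the basis tree.

    Hypothetical helper: operations are tuples such as
    ("copy", "A", "B"), ("rename", "A", "B") or ("rm", "A").
    """
    # Map each working-tree path to (path in the basis tree, is_original).
    tree = {path: (path, True) for path in basis_paths}
    for op, *args in operations:
        if op == "copy":
            src, dst = args
            origin, _ = tree[src]
            tree[dst] = (origin, False)
        elif op == "rename":
            src, dst = args
            tree[dst] = tree.pop(src)
        elif op == "rm":
            (path,) = args
            del tree[path]
        else:
            raise ValueError("unknown operation: %r" % (op,))

    summary = []
    for origin in sorted(basis_paths):
        originals = [p for p, (o, is_orig) in tree.items()
                     if o == origin and is_orig]
        copies = sorted(p for p, (o, is_orig) in tree.items()
                        if o == origin and not is_orig)
        if originals:
            if originals[0] != origin:
                summary.append("%s renamed to %s" % (origin, originals[0]))
        elif copies:
            # Assumption (Case 6): the original is gone but a copy survives,
            # so the first copy is reported as a rename.
            summary.append("%s renamed to %s" % (origin, copies.pop(0)))
        else:
            summary.append("%s removed" % origin)
        summary.extend("%s copied to %s" % (origin, c) for c in copies)
    return summary


# Case 3: rename A B, then copy B C.
print(summarise(["A"], [("rename", "A", "B"), ("copy", "B", "C")]))
```

Tracking the basis path of every working-tree path is what lets Case 3 report "A copied to C" even though the command that was typed named B.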

Merge interaction

Obviously the biggest problem is how merge would behave. For example:

   $ bzr branch foo bar
   $ cd foo
   foo$ bzr copy a b
   foo$ echo a >> a
   foo$ echo b >> b
   foo$ bzr commit -m "Cloned a to b"
   foo$ cd ../bar
   bar$ bzr mv a c
   bar$ echo c >> c
   bar$ bzr commit -m "Renamed a to c"
   bar$ bzr merge ../foo

And now what is merged where, and why?

Attempted answer (PierreAntoineChampin):

I'm not sure there is an ultimate and definitive answer to that question. My guess is the appropriate strategy depends on the underlying semantics of the copy, which is unreachable. So a choice should be made and made explicit, so that users are not surprised and can override it if they want to.

Mercurial, for example, behaves strangely in this particular case: changes in bar/c are only merged into foo/a, whereas if you only change bar/a without renaming it, changes are merged into both foo/a and foo/b. The latter is reasonable and well argued in http://hgbook.red-bean.com/hgbookch5.html , so the most regular behaviour would be, IMO, to merge changes into both copies and to rename the source file.

But this is definitely a feature that Bazaar should have.

Code Changes

At a minimum, Inventory deltas need to be extended to allow the specification of a copy. Right now, a delta entry has 4 fields:

  old-path, new-path, file-id, inventory-entry

Different operations are specified as follows:

  • add - old-path=None

  • delete - new-path=None

  • modify - old-path=new-path

  • rename - old-path != new-path (and neither is None).
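
As a quick illustration of the four cases above, here is a minimal Python sketch that classifies a delta entry. The function name `classify_delta_entry` is hypothetical and the entry is modelled as a plain 4-tuple; it is not meant to mirror the actual bzrlib code.

```python
def classify_delta_entry(entry):
    """Return the operation implied by one (hypothetical) delta entry.

    The entry is a 4-tuple:
    (old_path, new_path, file_id, inventory_entry).
    """
    old_path, new_path, file_id, inventory_entry = entry
    if old_path is None and new_path is None:
        raise ValueError("a delta entry needs at least one path")
    if old_path is None:
        return "add"
    if new_path is None:
        return "delete"
    if old_path == new_path:
        return "modify"
    return "rename"


print(classify_delta_entry((None, "b.txt", "b-id", object())))      # add
print(classify_delta_entry(("a.txt", "c.txt", "a-id", object())))   # rename
```

Note that nothing in these four fields can distinguish a copy from a plain add, which is why one of the extensions discussed next is needed.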

To specify a copy, one option is to add an additional field, something like:

  old-path, new-path, file-id, inventory-entry, source-file-id

That would probably be quite a disruptive change though, as all the places generating and processing deltas would need changing. A better option might be to extend the data for InventoryEntry objects with a source_id field. A copy operation would then look the same as an add, but the content inside inventory-entry would be richer.

The addition of source_id would be made to InventoryFile, InventoryDirectory and InventorySymlink. I'm not sure, but my initial reaction is that copying of nested trees isn't needed and therefore InventoryReference objects don't need a source_id field. Is that right?

For efficiency, I suspect we also want to record the source_revision in addition to the source_id. Assuming source_id and source_revision are propagated to subsequent inventories and not just recorded in the original record, that would allow merge (say) to detect whether it's crossing a file/directory/symlink copy boundary without needing to traverse lots of history to find when the copy occurred.
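
To make the idea concrete, here is a rough Python sketch of an entry carrying source_id and source_revision, plus a check that a merge could use to notice a copy boundary without walking history. The `Entry` class, the dict-based inventories and `crosses_copy_boundary` are all hypothetical stand-ins for illustration, not the real InventoryEntry API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified stand-in for an inventory entry."""
    file_id: str
    path: str
    source_id: Optional[str] = None        # file-id this entry was copied from
    source_revision: Optional[str] = None  # revision the copy was taken from


def crosses_copy_boundary(entry, base_inventory):
    """True if `entry` is a copy that did not yet exist in the merge base.

    `base_inventory` is assumed to be a dict mapping file-id to Entry.
    Only the propagated source_id is consulted, so no history is traversed.
    """
    return (entry.source_id is not None
            and entry.file_id not in base_inventory
            and entry.source_id in base_inventory)


base = {"a-id": Entry("a-id", "a")}
copied = Entry("b-id", "b", source_id="a-id", source_revision="rev-1")
print(crosses_copy_boundary(copied, base))        # True
print(crosses_copy_boundary(base["a-id"], base))  # False
```

In this sketch source_revision is only carried along; a real merge would presumably use it to pick the right version of the source text to compare against.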

Adding that field implies a new serialiser/deserialiser and disk format. It would therefore be nice to include this change, if agreed, in the upcoming chk-based format.

Schema Changes

There are lots of possibilities here. If adding source_id/source_revision to inventory entries isn't desirable, another option is to add a copy-sources file to the repository metadata. The data would be:

  • new-file-id -> old-file-id, old-revision-id
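
A minimal Python sketch of such a look-aside table follows, assuming it is simply a mapping from new file-id to an (old-file-id, old-revision-id) pair. The `copy_chain` helper and the sample ids are hypothetical; the on-disk layout of a copy-sources file is left open.

```python
def copy_chain(copy_sources, file_id):
    """Follow copy records back from `file_id` to its earliest known source.

    `copy_sources` maps new-file-id -> (old-file-id, old-revision-id).
    Returns a list of (source_file_id, source_revision_id), nearest first.
    """
    chain = []
    seen = {file_id}
    while file_id in copy_sources:
        source_id, source_revision = copy_sources[file_id]
        if source_id in seen:
            raise ValueError("cycle in copy records at %r" % (source_id,))
        chain.append((source_id, source_revision))
        seen.add(source_id)
        file_id = source_id
    return chain


# Hypothetical records: b was copied from a, and later c was copied from b.
copy_sources = {
    "b-id": ("a-id", "rev-1"),
    "c-id": ("b-id", "rev-5"),
}
print(copy_chain(copy_sources, "c-id"))
```

Because the table is keyed by the new file-id, each copied file has exactly one source, which fits the assumption that file combining is not needed initially.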

If a directory is copied, perhaps a record should be added for each child?

Using a separate table would be desirable if copy support was done in a plugin. It really ought to be a core feature long term though.

Data Migration

This is a new feature. I don't think existing data ought to be changed. If users want this, I think it would be reasonable to create a new repository via something like ...

   bzr fast-export --find-copies . > xxx.fi
   bzr init-repo new-xxx
   bzr fast-import xxx.fi

Discussion

For a first version it would be enough to track file copies in the storage format, so that no information is lost when someone wants to copy a file.

As merging is expected to be a bit more difficult, it would be OK to create a conflict so that the user needs to resolve it manually.

Questions and Answers

Should copy create two new file-ids or just the one new file-id for the destination? There are arguments both ways:

  • if the file is going to be split, why should one half be more important than the other?
  • one file is implicitly more important - the user chose to keep its name - so it ought to keep the original file-id.

Temporary Solution

< lifeless> metze: so, for splitting files, the git approach can work for us - which is that we can always come back and infer later :)

< lifeless> metze: but thats a cop-out I know. the short answer is that file copies done *right* are a post 1.0 feature. If its really common to split files, we can probably start recording a look-aside set of data for copies in the short term, which would mean no history is lost and when we get real support it would be converted and used.
