Bazaar

Status

Line-ending conversion is implemented in Bazaar now, as of the 1.14rc release candidates. See the documentation for the feature.

To use this feature, you only have to upgrade your local (client-side) Bazaar. Even if it talks to an older server, it will still do the line-ending conversion correctly on your end.

The rest of this page is probably out-of-date, as it describes an earlier stage of this feature. Anyone who has enough knowledge to do so should update it.

Summary

Request from Alexander Belchenko:

  • I mean that text files may use different line endings on different platforms: LF, CRLF, CR, other? At present bzr stores in the repository the actual line endings of the edited file. I have often faced a problem where, after patching some of the bzrlib/* Python files on Windows, the patch utility changed all line endings to CRLF, and the following diff showed the whole file content as deleted and then re-added with the patch changes. So I need to convert the patched files back to LF line endings manually. It's not difficult, but sometimes I forget to do it. I think it would be useful if the inventory stored, for each text file, additional metadata about the line endings used:
  • Actual (the same as in the file on disk, as in the current bzr implementation)
  • LF
  • CRLF
  • CR
  • other?

    In this case, when an edited file passes through the diff command, differences in line endings can be ignored if the metadata says that the file is stored with LF line endings. When the file is committed, it will be stored in the repository with LF-only line endings regardless of its actual state on disk (and the file on disk may be changed to keep it in sync). Maybe this idea is too stupid, but I face different line endings every day during multiplatform development.
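
The request above can be sketched roughly as follows. This is only an illustration, not bzr code: the function names and the `eol` metadata values ("actual", "LF", "CRLF", "CR") are hypothetical and simply mirror the list in the request.

```python
import re

# CRLF is tried before a lone CR, so "\r\n" counts as a single line break.
_LINE_BREAK = re.compile(rb"\r\n|\r")

# Hypothetical per-file metadata values, mirroring the list in the request.
EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n", "CR": b"\r"}


def to_lf(data: bytes) -> bytes:
    """Convert CRLF and CR line endings to LF."""
    return _LINE_BREAK.sub(b"\n", data)


def to_stored_form(data: bytes, eol: str) -> bytes:
    """Return the bytes to commit for a file whose metadata is `eol`.

    "actual" keeps the on-disk bytes untouched; any other value rewrites
    every line ending to the requested convention.
    """
    if eol == "actual":
        return data
    return to_lf(data).replace(b"\n", EOL_BYTES[eol])


def same_ignoring_eol(a: bytes, b: bytes) -> bool:
    """True when two versions differ only in their line endings."""
    return to_lf(a) == to_lf(b)
```

With something like this, a file that the patch utility rewrote to CRLF would compare equal to its LF original under `same_ignoring_eol`, and `to_stored_form(data, "LF")` would give the bytes to commit.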

Rationale

This should cover the _why_: why is this change being proposed, what justifies it, where we see this justified.

John Whitley:

In my experience, line-ending management is a central use case for a version control system. In most development shops I've worked in, there has been a wide mix of platforms, editors, and tools that devs use to work with versioned files. Some of these environments are not well-behaved. Some tools don't preserve line endings, either flipping files to another convention or generating files with mixed line endings. Other tools require a specific line-ending convention to work correctly. The real nightmare from this comes during merging: a file with a trivial change on one line and some or all of its line endings 'flipped' really screws up proper merge behavior.

Further Details

In place of Description of Issue here, add your own title that provides a description of the issue, or intended functionality, or proposed change. You can have subsections that better describe specific parts of the issue; you can also include here subsections like the following:

Generalization

The way Mercurial recently implemented this seems like a good idea. They have a configurable filter for converting the file between on-disk and canonical format. The file is converted to canonical form for diffing and patching (though a third filter converting the diff to on-disk form might do better).

Mercurial matches the filters by filenames and the filters can only be external commands so far. IIRC the file is not versioned there.

I think that plugin-provided classes would be slightly better and that there should be both versioned and unversioned configuration sources. Actually there is a place for three configuration sources: a local default, a versioned property (stored in the inventory?) and a local override. The local default would be used for setting the initial value for added files, while the local override would allow turning the filters off for particular files (perhaps a file has the LocaleCharset filter, but your locale is sufficiently weird that the file fails to convert).
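
A minimal sketch of how the three configuration sources might interact, assuming filters are identified by name. The function names and the dictionary layout are hypothetical; nothing here is an existing bzr API.

```python
from typing import Dict, List

FilterList = List[str]


def filters_for_added_file(local_default: FilterList) -> FilterList:
    """Initial versioned value for a newly added file: a copy of the local default."""
    return list(local_default)


def effective_filters(
    path: str,
    versioned: Dict[str, FilterList],
    local_override: Dict[str, FilterList],
) -> FilterList:
    """Filters to apply to `path` in this working tree.

    A local override wins over the versioned property, even when it is an
    empty list (filters turned off for that file on this machine only).
    """
    if path in local_override:
        return list(local_override[path])
    return list(versioned.get(path, []))
```

The local default is consulted only once, when a file is added; after that the versioned property travels with the branch, and the override stays private to one working tree.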

  • I mean that text files may use different line endings on different platforms: LF, CRLF, CR, other?

Well, the next thing someone will request is charset conversion, then keyword expansion and who knows what else. Pluggable filters are a reasonable way to go.

Actually, I'd suggest a pluggable diff for the text/binary stuff too. So the configuration would not say whether a file is binary, but rather name a diffing algorithm for it. So the manifest would end up containing something like:

<filters>
        <filter class="NativeLineEnding" />
        <filter class="LocaleCharset" />
        <filter class="Keywords" />
</filters>
<diff class="TextDiff" />

And binary files would simply have:

<filters />
<diff class="BinaryDiff" />

That would leave room for adding, e.g.:

<diff class="OpenDocumentDiff" />

without changing the format, except that the repository that actually uses the new diff would not be readable by bzr without that diff method. Adding filters would not even have this problem, because filters can be skipped if they are not available.
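
A rough sketch of such a filter chain, in which filter names that are not registered locally are simply skipped. The registry, the function names and the way a filter is represented (a pair of byte-to-byte converters) are assumptions made for illustration; only the filter names come from the manifest example above.

```python
import os
from typing import Callable, Dict, Iterable, List, Tuple

Converter = Callable[[bytes], bytes]

# Hypothetical registry: filter name -> (on-disk to canonical, canonical to on-disk).
REGISTRY: Dict[str, Tuple[Converter, Converter]] = {}


def register(name: str, to_canonical: Converter, to_disk: Converter) -> None:
    REGISTRY[name] = (to_canonical, to_disk)


def _to_lf(data: bytes) -> bytes:
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def _to_native(data: bytes) -> bytes:
    return _to_lf(data).replace(b"\n", os.linesep.encode("ascii"))


register("NativeLineEnding", _to_lf, _to_native)


def _available(names: Iterable[str]) -> List[Tuple[Converter, Converter]]:
    """Look up filters by name, silently skipping any that are not installed."""
    return [REGISTRY[name] for name in names if name in REGISTRY]


def to_canonical(data: bytes, names: Iterable[str]) -> bytes:
    """Apply the listed filters in order, e.g. before diffing or committing."""
    for read, _write in _available(names):
        data = read(data)
    return data


def to_disk(data: bytes, names: Iterable[str]) -> bytes:
    """Undo the filters in reverse order when writing the working file."""
    for _read, write in reversed(_available(names)):
        data = write(data)
    return data
```

Here a manifest that lists "Keywords" still works on an installation that only provides "NativeLineEnding"; the unknown filter is ignored rather than treated as an error.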

-- JanHudec

John Whitley replied:

  • While I'm all in favor of a pluggable filter system as a general and extensible implementation tool, I think that it's really important to have first-class support for managing line-endings on a per-branch and per-file basis. As such, I'm keen to see that support be a property of the branch and its files -- not just a property of a single user's bzr installation.

The requirements are not contradictory. Basically there should be two basic diff algorithms (text and binary) and five basic filters (native-line-endings, LF-line-endings, CR-line-endings, CRLF-line-endings, native-charset) in core. Others could be provided.

The diff module should also handle generating weaves, that is, splitting the file into chunks that will become "lines" in the weave.

-- JanHudec

Assumptions

Use Cases

  • Some developers work on Unix, using gcc to compile and a text editor that always ends its lines with LF. Other developers work on Windows, using tools that screw up horribly if lines are not properly ended with CRLF. So they need to convert the files to their platform-specific line endings.
  • Developers doing some work that requires them to write non-ASCII text, but having their systems set to different encodings.
  • Versioning documents in OpenDocument (zipped XML) or Microsoft Document XML (plain XML, IIRC) formats. Appropriate pluggable filters and diff algorithms would allow them to see changes and merge the documents and would make the storage more efficient.

  • Having the toolchain stored alongside the sources under version control. We do this at work when using less common tools (compilers for various embedded systems and such). A pluggable binary diff would make the storage more efficient.
  • Having latest revision stored automatically in built binaries. Well, that can be done as part of the build too.
  • Having last modification time and author of last change stored in web pages.
  • Generating automatic changelog like GNU Arch did?

Implementation

The proposed approach as of 2007-02 is:

  • add VersionedProperties to the inventory

  • add a versioned file property that says what line endings a file wants
  • add a dotfile or control file setting the default properties by glob, assigned when the file is added
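
As an illustration of the last point, defaults by glob could be resolved along these lines. The rule list, the "native" and "actual" values and the function name are all hypothetical; the proposal does not define a file format or value names.

```python
import posixpath
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Hypothetical rules as they might be read from a dotfile:
# (glob pattern, line-ending property). The first matching rule wins.
RULES: List[Tuple[str, str]] = [
    ("*.bat", "CRLF"),
    ("*.sh", "LF"),
    ("*.png", "actual"),
    ("*", "native"),
]


def default_eol(path: str, rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the property to record when `path` is added, or None if no rule matches."""
    name = posixpath.basename(path)
    for pattern, value in rules:
        if fnmatchcase(name, pattern):
            return value
    return None
```

The value would be looked up once, at `add` time, and then stored as the file's versioned property, so later changes to the dotfile would not silently change files that are already versioned.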

Discussion

This section should house the larger issues that need discussing; you can sprinkle XXXs around the page if you want to keep the smaller open issues in context.

Unresolved Issues

What's the status on this? All the jibber jabber on this page seems like overcomplication! At the minimum, before any fancy custom filter/plugin stuff, just cover the most common use case and at least do what CVS does. On Windows, when getting a text file (determine text-ness heuristically from file contents!) from the repository, add CRs, and when putting a text file into the repository, strip out CRs. No? It's not perfect, but it's a sane default.

  • The CVS approach often corrupts files. But we probably can use it in at least a couple of places, most notably that diff should probably ignore line-ending changes by default. -- MartinPool 2007-02-01 07:09:48

    I would not say the CVS approach often corrupts files. Its heuristics usually get the file format right, and in the rare cases where they don't, a "cvs admin" to change the keyword mode to binary (attribute) solves that (it disables both keyword expansion and line-ending translation). But yes, if a file is committed from Windows with an extension that CVS thinks is text, badness will result unless one overrides the automatic format selection. -- HenrikNordström 2008-04-12 00:01:13
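
For reference, the CVS-style default discussed above could look roughly like this. The text-detection heuristic (no NUL byte in the first few kilobytes) is an assumption chosen for the sketch, not what CVS or bzr actually does, and the function names are hypothetical.

```python
def looks_like_text(data: bytes, sample_size: int = 8000) -> bool:
    """Guess text-ness from content: no NUL byte in the leading sample."""
    return b"\0" not in data[:sample_size]


def on_checkout(data: bytes, windows: bool) -> bytes:
    """On Windows, give text files CRLF line endings in the working tree."""
    if windows and looks_like_text(data):
        # Normalise first so an existing CRLF does not become CR CR LF.
        return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return data


def on_commit(data: bytes, windows: bool) -> bytes:
    """On Windows, strip the CRs from text files before storing them."""
    if windows and looks_like_text(data):
        return data.replace(b"\r\n", b"\n")
    return data
```

The corruption risk raised in the replies shows up when the guess is wrong: a binary blob such as `b"\x01\x02\r\n\x03"` contains no NUL byte, so `on_commit` would treat it as text and drop the `\r`.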

Questions and Answers