Launchpad Entry: https://blueprints.launchpad.net/bzr/+spec/i18n
Created: 2007-04-25 by AlexanderBelchenko
Use gettext or a similar mechanism to internationalize all messages emitted by bzr.
Although English is the de facto lingua franca of the Internet, people around the world prefer to talk with their computers and programs in their native languages. Bazaar aims to be a "friendly version control system", so the need to internationalize Bazaar's user interface is beyond question (?).
For collaborative work on translating Bazaar's messages, we propose to use the Rosetta tool: https://translations.launchpad.net/bzr/
This specification should cover the following questions:
- the process of extracting string resources from the Python sources of bzrlib, creating the PO template (POT), and synchronizing the existing template in Rosetta with the new one
- the code changes required to properly support localized messages
- using translations without installing bzr (for bzr developers and translators): the required directory layout and filename conventions
- the required changes to the release process, and conventions for including translations in releases of Bazaar (including ports and packages for various platforms)
1. By "all messages" we mean:
- the usual messages generated by bzr commands
- help for commands
- help topics
- errors raised during command execution (e.g. NoWorkingTree)
- error messages based on Python's IOError and OSError exceptions (No such file, File exists, Permission denied, etc.)
- errors raised while parsing command-line parameters
2. Bazaar today is both a command-line tool (bzr) and a library (bzrlib). Internationalization of bzrlib should be transparent to other clients.
3. All strings in bzrlib are marked for translation with a pass-through function, and UI code should explicitly call the gettext function to obtain the translated message before writing to stdout/stderr.
4. All existing tests use only English messages to test basic functionality. To check that bzrlib's gettext support works as expected, new blackbox tests are needed. These should use special predefined test translations (e.g. test.mo) to isolate them from the current user's encoding.
The usual gettext approach is to mark strings for localization with an N_() pass-through function. To extract such strings into a POT (template) file, a special tool is needed: xgettext (from GNU gettext) or the pygettext.py script. The latter comes with the standard Python installation. We chose xgettext because it has a --join option; why this option is needed is described later.
One BIG issue for bzrlib is its use of docstrings as help for commands:
- The docstring is indented, so it needs dedenting.
- The help messages are too big for translators. They should be split by paragraph.
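Both problems can be handled before export. The sketch below uses inspect.cleandoc() for dedenting and splits on blank lines. help_paragraphs() and cmd_example are hypothetical names used only to illustrate the idea; real bzr help text may need smarter splitting (e.g. for reST-style sections).

```python
import inspect


def help_paragraphs(docstring):
    """Split a command docstring into translatable paragraphs.

    inspect.cleandoc() removes the common indentation and leading or
    trailing blank lines; paragraphs are separated by blank lines.
    """
    text = inspect.cleandoc(docstring)
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


class cmd_example(object):  # hypothetical command, for illustration only
    """Show an example.

    This is the first paragraph of
    the long help.

    :Examples:
        bzr example
    """
```

Relative indentation inside a paragraph (such as the example command) is preserved, so translators see it as it will be displayed.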
Because Python is an interpreted language, using N_() many times may slow down bzr startup. For example, N_("First paragraph") + N_("Second paragraph") + ... may have a non-negligible cost.
So we extract docstrings our own way where possible. A new hidden command named export-pot finds commands and help topics in the registry and outputs their help as PO-formatted messages. export-pot also exports error messages from bzrlib.errors.
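Emitting "PO-formatted messages" mostly means escaping strings and writing msgid/msgstr pairs. This is a minimal sketch of what export-pot's output step might look like; po_quote(), po_entry() and export_pot() are hypothetical helpers, the "#:" reference strings are illustrative, and the POT header entry is omitted for brevity.

```python
def po_quote(text):
    """Escape text as the body of a PO string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\t", "\\t")
                .replace("\n", "\\n"))


def po_entry(msgid, reference=None):
    """Format one untranslated PO entry (msgstr left empty)."""
    lines = []
    if reference:
        lines.append("#: %s" % reference)
    parts = msgid.split("\n")
    if len(parts) == 1:
        lines.append('msgid "%s"' % po_quote(msgid))
    else:
        # Multi-line message: one quoted chunk per line, each keeping
        # its trailing newline escape.
        lines.append('msgid ""')
        for i, part in enumerate(parts):
            if i < len(parts) - 1:
                part += "\n"
            if part:
                lines.append('"%s"' % po_quote(part))
    lines.append('msgstr ""')
    return "\n".join(lines) + "\n"


def export_pot(messages, out):
    """Write PO entries for (reference, msgid) pairs, skipping duplicates."""
    seen = set()
    for reference, msgid in messages:
        if msgid in seen:
            continue
        seen.add(msgid)
        out.write(po_entry(msgid, reference))
        out.write("\n")
```

Each help paragraph would be passed in as its own msgid, so a changed paragraph invalidates only that one translation.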
After bzr export-pot > bzr.pot, xgettext --join -o bzr.pot collects the messages marked by N_() and gettext(). These two commands are executed by make update-pot.
Working with Rosetta
Rosetta can import a POT file from a branch, so updating bzr.pot regularly may be enough to keep the English messages up to date.
When releasing bzr, both the source package and the binary package should contain the translated messages in some format. bzr's branch should also be tagged with the po files. So we should download the po files from Rosetta and commit them to bzr's branch.
Currently, the qbzr, bzr-explorer and TortoiseBZR maintainers manually request a download of the po files through a web form. Rosetta builds a tarball and then sends a notification by e-mail. The maintainer downloads the tarball from the link in the notification mail. This is a tiresome task and hard to automate.
So I want to use Rosetta's autoexport feature. Rosetta can commit po files to a registered branch daily. The workflow with autoexport will be:
- Check out the po files from the exported branch.
- Put them in Bazaar's po/ directory.
This workflow can easily be automated.
But I don't know much about the exported branch. Manual updating will be needed at the start.
If the exported branch can be customized, PQM may be able to merge the updated po files into bzr's branch.
Required code changes
1. Provide a command for exporting docstrings (the export-pot command described above; make update-pot runs it).
2. setup.py should automatically compile PO files to binary MO files and install them on the system as well.
3. Provide a transparent N_() function for all bzrlib clients. (To allow bzrlib.i18n to be imported lazily, should N_() live in bzrlib/__init__.py?)
4. Enable run-time translation of messages for all commands except selftest.
5. When bzrlib.i18n.install() is called, it searches for MO files with translations. It should look in the application directory first, then in the location specific to the current platform (see the section below on directory structure).
6. UI code explicitly calls the bzrlib.i18n.gettext() function to obtain the translated version of a message.
7. Add a .l10n() method to the BzrError class. It works like .__str__() but returns the translated message. This allows plugins to implement their own translation systems.
8. Add a Command.help_l10n() method. It works like Command.help() but returns translated help. This allows plugins to implement their own translated help.
9. The help command shows translated help topics. Currently, help topics are just strings, not classes, so translation should happen while getting documents from the registry. topic_registry can accept text or a callable, so a callable returning the translated message should be registered.
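Items 5 to 7 can be sketched together. install(), gettext(), disable(), and the simplified BzrError/NoWorkingTree classes below are hypothetical stand-ins, not the real bzrlib code. The "platform location" is assumed to be gettext's built-in default (localedir=None, i.e. share/locale under the Python prefix); a real port might need platform-specific paths.

```python
import gettext as _gettext
import os

_translations = _gettext.NullTranslations()


def _locale_dirs(app_dir):
    """Candidate MO directories, application directory first."""
    # None makes gettext use its built-in default directory.
    return [os.path.join(app_dir, "locale"), None]


def install(app_dir, languages=None):
    """Load the first bzr catalog found; fall back to English."""
    global _translations
    for localedir in _locale_dirs(app_dir):
        try:
            _translations = _gettext.translation(
                "bzr", localedir=localedir, languages=languages)
            return True
        except OSError:  # no bzr.mo for these languages here
            continue
    _translations = _gettext.NullTranslations()
    return False


def gettext(message):
    return _translations.gettext(message)


def disable():
    """Test knob: switch back to untranslated (English) messages."""
    global _translations
    _translations = _gettext.NullTranslations()


class BzrError(Exception):
    """Simplified stand-in for bzrlib.errors.BzrError."""

    _fmt = "Unknown error"

    def __init__(self, **kwargs):
        Exception.__init__(self)
        self.__dict__.update(kwargs)

    def __str__(self):
        return self._fmt % self.__dict__

    def l10n(self):
        """Like __str__, but formats with the translated _fmt."""
        return gettext(self._fmt) % self.__dict__


class NoWorkingTree(BzrError):
    _fmt = "No WorkingTree exists for %(base)s."
```

Keeping __str__ untranslated means .bzr.log and existing tests keep seeing English, while UI code calls l10n().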
For all existing tests we disable translations and check messages against the original English strings.
Special blackbox tests with a predefined test MO file should be used to test the UI code, checking:
- that output of any translated Unicode message does not cause UnicodeEncodeError
- that the output is actually translated
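Instead of committing a binary test.mo, a test could build one in memory. build_mo() and load_test_catalog() are hypothetical helpers; they write the GNU MO layout (magic number, original and translation string tables, NUL-terminated strings) with no hash table, which Python's gettext.GNUTranslations does not need. The Russian string is only sample data.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Return the bytes of a minimal MO catalog for ``messages``.

    A UTF-8 header entry (msgid "") is added so that non-ASCII
    translations decode correctly.
    """
    catalog = dict(messages)
    catalog[""] = "Content-Type: text/plain; charset=UTF-8\n"
    keys = sorted(catalog)  # GNU tools expect msgids in sorted order
    ids = [k.encode("utf-8") for k in keys]
    strs = [catalog[k].encode("utf-8") for k in keys]
    n = len(keys)
    orig_table = 7 * 4               # header is seven 32-bit words
    trans_table = orig_table + n * 8
    data_start = trans_table + n * 8
    entries, data = [], b""
    for s in ids + strs:
        entries.append((len(s), data_start + len(data)))
        data += s + b"\0"            # every string is NUL-terminated
    # magic, revision, count, table offsets, hash size 0, hash offset 0
    out = struct.pack("<7I", 0x950412de, 0, n, orig_table, trans_table, 0, 0)
    for length, offset in entries:
        out += struct.pack("<2I", length, offset)
    return out + data


def load_test_catalog(messages):
    """Load ``messages`` as a GNUTranslations catalog without disk I/O."""
    return gettext.GNUTranslations(io.BytesIO(build_mo(messages)))


catalog = load_test_catalog({"No working tree.": "Нет рабочего дерева."})
translated = catalog.gettext("No working tree.")
# A terminal that cannot show Cyrillic must get '?' rather than an error.
safe_bytes = translated.encode("ascii", "replace")
```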
Using translations in development
We should provide a simple way to check, test and use translations without actually installing bzr. For this we need a special Makefile target that compiles the text PO files with translated messages into binary MO format.
Proposed directory structure:
bzr.dev
 |
 +--- bzrlib
 |     |
 |     +--- locale   (new)
 |
 +--- contrib
 +--- doc
 +--- man1
 +--- po         (new)
 +--- tools
The po directory will store the POT and PO files: bzr.pot is the template and lang.po is the translation for each language.
The bzrlib/locale directory will store the compiled MO files as lang/LC_MESSAGES/bzr.mo (where lang is the language code).
The po directory and its contents should be under version control, either as part of bzr.dev or as a separate nested tree (once nested trees are stable and mainstream).
The bzrlib/locale directory will store auto-generated files and therefore should not be under version control.
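The Makefile target could delegate to a small script that maps po/lang.po onto bzrlib/locale/lang/LC_MESSAGES/bzr.mo. mo_targets() and compile_all() are hypothetical names; the compile step assumes GNU gettext's msgfmt is on PATH.

```python
import os
import subprocess


def mo_targets(po_dir, locale_dir, domain="bzr"):
    """Map <po_dir>/<lang>.po to <locale_dir>/<lang>/LC_MESSAGES/<domain>.mo."""
    targets = []
    for name in sorted(os.listdir(po_dir)):
        lang, ext = os.path.splitext(name)
        if ext != ".po":
            continue  # skips bzr.pot and any other files
        mo = os.path.join(locale_dir, lang, "LC_MESSAGES", domain + ".mo")
        targets.append((os.path.join(po_dir, name), mo))
    return targets


def compile_all(po_dir="po", locale_dir=os.path.join("bzrlib", "locale")):
    """Compile every translation with msgfmt (needs GNU gettext tools)."""
    for po, mo in mo_targets(po_dir, locale_dir):
        os.makedirs(os.path.dirname(mo), exist_ok=True)
        subprocess.check_call(["msgfmt", "--check", "-o", mo, po])
```

Because bzrlib/locale is generated, it can simply be deleted and rebuilt, which is consistent with keeping it out of version control.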
Required changes to release process
1. When development enters the feature-freeze phase, it should also enter a message freeze: no changes to translatable messages may be merged. This way translators can work on the current release undisturbed.
2. When development enters the feature+message freeze phase, the release manager or i18n maintainer will upload the current template and the po files with translations for the new release series. This probably requires review by the Rosetta admins (again), so we need to do it well before the first release candidate.
3. Before preparing a release candidate, the release manager should download all translations from Rosetta and put them in the po directory of the source tree.
4. The source archive (tar.gz) should include the po directory and its contents, as well as any helper scripts setup.py needs to build the MO files and install them on the system.
5. It's OK to assume that translations will improve during the release-candidate period, so the po files should be updated for each new candidate and for the final release (as described in item 3 above).
1) bzrlib should provide a default _() pass-through function.
2) ensure that all tests run in the C locale by default
3) provide default encoded stream wrappers around stdout/stderr
4) ensure that all output of bzr error handlers and bzr commands goes through the encoded streams. XXX Appropriate tests can be written for this, but we need some dedicated translations to use in them.
5) implement new setup.py commands: build_mo and install_mo
6) a new module, bzrlib.i18n, should implement all the steps required to import gettext and provide a translation function gettext() for bzr. For testing, there must be a knob to enable/disable translations at runtime.
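For task 3, one way to get an encoded stream in Python 3 is io.TextIOWrapper with errors="replace" over a byte stream. encoded_stream() is a hypothetical helper, and bzrlib of this era would likely have used codecs writers instead. Note that closing (or garbage-collecting) the wrapper also closes the underlying stream, so a wrapper around the real stdout must be kept alive.

```python
import io
import locale


def encoded_stream(binary_stream, encoding=None):
    """Wrap a byte stream so text is encoded for the user's terminal.

    Characters the encoding cannot represent become '?' instead of
    raising UnicodeEncodeError.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False) or "ascii"
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors="replace", line_buffering=True)
```

With this in place, all command and error output can be written as Unicode text, and the wrapper alone decides how it reaches the terminal.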
Most blackbox and UI tests expect English (untranslated) messages. Some UI code probably needs special testing in a non-English locale. It's hard to say exactly right now, so we initially assume the test suite must always run in the C locale.
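Running tests in the C locale can be done by overriding the environment variables that Python's gettext consults (LANGUAGE, LC_ALL, LC_MESSAGES, LANG, in that order); gettext never loads a catalog for "C". c_locale() is a hypothetical test fixture sketched under that assumption.

```python
import contextlib
import gettext
import os

_LOCALE_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")


@contextlib.contextmanager
def c_locale():
    """Temporarily force the C locale for message lookup."""
    saved = {name: os.environ.get(name) for name in _LOCALE_VARS}
    try:
        for name in _LOCALE_VARS:
            os.environ[name] = "C"
        yield
    finally:
        # Restore the caller's environment exactly.
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
```

Inside the fixture, gettext.translation("bzr", fallback=True) returns a plain NullTranslations, so every message stays in English regardless of the developer's own locale.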
_ has a special meaning in Python; using _() as a function might not be safe in the debugger.
Where are setup.py commands like build_mo implemented? qbzr and TortoiseBZR use an extras directory for libraries used by setup.py.
Should bzr use ugettext (unicode) instead of gettext (utf-8)?
Which error messages should be translated? What about .bzr.log? I think internal errors don't need to be translated. The custom message extractor should extract messages from classes in bzrlib.errors that inherit from BzrError, have internal_error set to False, and specify _fmt.
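The proposed filtering rule could look like this. translatable_errors() is a hypothetical helper; it takes only classes that define their own _fmt (to avoid exporting an inherited format twice) and treats a missing internal_error as internal, to err on the side of not translating. The stand-in classes are illustrative, not the real bzrlib.errors.

```python
import inspect
import types


def translatable_errors(module, base):
    """Yield (name, _fmt) for user-facing error classes in ``module``."""
    for name, cls in sorted(vars(module).items()):
        if not (inspect.isclass(cls) and issubclass(cls, base)):
            continue
        if getattr(cls, "internal_error", True):
            continue  # internal errors stay in English
        fmt = cls.__dict__.get("_fmt")  # only a _fmt defined on this class
        if fmt:
            yield name, fmt


# Usage with a stand-in module (classes are illustrative only).
fake_errors = types.ModuleType("fake_errors")


class BzrError(Exception):
    internal_error = False
    _fmt = "Unknown error"


class NoWorkingTree(BzrError):
    _fmt = "No WorkingTree exists for %(base)s."


class BzrCheckError(BzrError):
    internal_error = True
    _fmt = "Internal check failed: %(msg)s"


for cls in (BzrError, NoWorkingTree, BzrCheckError):
    setattr(fake_errors, cls.__name__, cls)

messages = list(translatable_errors(fake_errors, BzrError))
```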
What about help_topics? It already has en/ and es/ directories. An argument for moving to the gettext system is per-paragraph traceability of translations (updated and added paragraphs can be shown in English).
How should help text of commands from plugins be handled? Idea: add an i18n() method to the command class, which plugins override to provide translated text.
Should the bzrlib.i18n module be lazily imported? => Profile!