Nested repositories

Our intention is to integrate a subset of the functionality of the ForestExtension into the core of Mercurial, while maintaining simplicity. This isn't quite a design document: it's more an exploration of the different design decisions that might make sense, and what the tradeoffs are.


1. Similar concepts in other systems

git

[http://www.kernel.org/pub/software/scm/git/docs/git-submodule.html git-submodule man page]
[http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#submodules git user manual on submodules]

svn

[http://svnbook.red-bean.com/en/1.0/ch07s03.html svnbook section on externals]

2. Goals

The goal is to be able to use multiple repositories as a single, loosely coupled, unit. A "parent" has a notion of several "modules" that live under it. In at least some cases, performing a command in the parent should affect the modules.

By "loosely coupled", we mean that repositories are largely independent.

Relationships are hierarchical and one-to-many: a parent knows about its modules, but they do not know about their parent or sibling repositories.

2.1. Use cases

Here are the important needs we would like to at least consider.

"Vendor branch": a pile of code that is almost never touched by developers, but that is needed to build a project.
Modular development: a system composed of largely independent units that do not need to be versioned together.
Partial views: a developer who only needs to work with two out of twelve modules should not have to download or deal with the other ten.

2.2. Terminology

Names used by various systems for this concept:

vendor branch
external
submodule
forest

I'm arbitrarily choosing "module".

3. Managing modules

Modules are listed explicitly, in a file named ".hgmodules" in the root of the tree (similar to ".hgignore" and ".hgtags"). This file contains one or more ConfigParser-style entries like so:

[foo/bar]
url = http://hg.example.com/bar
branch = default
rev = tip

[quux]
url = http://hg.example.com/quux
rev = 9117c6561b0b
optional = true

This file is intended to be read and written by machine. If you edit it by hand, there is no guarantee that comments or formatting will be preserved.
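
As an illustration of the machine round-trip, here is a minimal sketch that reads and rewrites ".hgmodules" using Python's standard configparser module. This is an assumption made for the example: Mercurial has its own config parser, and nothing here is its API. The sketch also shows why comments cannot be relied upon: configparser discards them when reading, so a rewritten file loses them.

```python
import configparser
import io

# Sample contents of a .hgmodules file, including a hand-written comment.
SAMPLE = """\
# vendor code, do not touch
[foo/bar]
url = http://hg.example.com/bar
branch = default
rev = tip

[quux]
url = http://hg.example.com/quux
rev = 9117c6561b0b
optional = true
"""

def read_modules(text):
    """Parse .hgmodules text; comment lines are skipped by configparser."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser

def write_modules(parser):
    """Serialize the entries back to text, as a tool rewriting the file would."""
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()

modules = read_modules(SAMPLE)
rewritten = write_modules(modules)
print(modules.sections())   # ['foo/bar', 'quux']
print("#" in rewritten)     # False: the comment did not survive
```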

The name of a section is the location under the working directory of the parent where the module should be placed. It must be a relative path. Other items are as follows:

"url": URL from which to clone the repository.
- If omitted, the "default" URL of the parent (from its ".hgrc") is used as a base, followed by a slash, then by the path of the module.
- Thus, if the parent's URL is "http://hg.example.com/foo" and the module's path within the parent is "bar/baz", the default URL for the module would be "http://hg.example.com/foo/bar/baz".
"branch": branch of the module to check out.
- If omitted, "default" is used.
"rev": changeset ID or tag to check out.
- If omitted, the tip of the given branch is used.
"optional" (boolean): is this module really needed, or simply available?
- If an optional module is present locally, it will be affected by commands that operate in modules.
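
The defaults above can be applied mechanically once the file is parsed. The following is a minimal sketch, again using configparser as a stand-in for Mercurial's own parser. The Module class and resolve function are hypothetical names, and treating a missing "optional" as false is an assumption, since the text does not say.

```python
import configparser
import posixpath
from dataclasses import dataclass
from typing import Optional

@dataclass
class Module:
    path: str            # location under the parent's working directory
    url: str
    branch: str
    rev: Optional[str]   # None means "tip of `branch`"
    optional: bool

def resolve(parser, parent_default_url):
    """Apply the documented defaults to every entry of a parsed .hgmodules."""
    modules = []
    for path in parser.sections():
        if posixpath.isabs(path):
            raise ValueError(f"module path must be relative: {path}")
        sec = parser[path]
        # Default URL: the parent's "default" URL, a slash, then the module path.
        default_url = parent_default_url.rstrip("/") + "/" + path
        modules.append(Module(
            path=path,
            url=sec.get("url", default_url),
            branch=sec.get("branch", "default"),
            rev=sec.get("rev"),  # omitted: tip of the branch
            optional=sec.getboolean("optional", fallback=False),  # assumed default
        ))
    return modules

parser = configparser.ConfigParser(interpolation=None)
parser.read_string("[bar/baz]\n\n[quux]\nurl = http://hg.example.com/quux\noptional = true\n")
mods = resolve(parser, "http://hg.example.com/foo")
print(mods[0].url)   # http://hg.example.com/foo/bar/baz
```
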
BrendanCully responds:
- How about we make .hgmodules a directory, where the paths underneath it map to the position of the module in the working directory. For example, a file called .hgmodules/foo/bar would check out the repository pointed to by the contents of bar to the path foo/bar in the working directory.
- The contents of the module file could simply be in hg's extended URL format, e.g. http://server/path#identifier. This requires very little parsing by either man or machine, and can be passed directly to the internal clone/update code.
- The advantage of using subdirectories is that you don't have to worry about merge conflicts when two different modules are updated.
- The simple URL scheme may be slightly less expressive than the format described above, but I think it matches well with clone, and any deficiencies in the URL format should be addressed in the common URL parser. Frankly I'm not sure why you'd need to specify both branch and rev though, since rev is strictly more precise.
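
Brendan's directory layout is straightforward to read. Below is a minimal sketch, assuming one file per module under ".hgmodules/", each holding a single URL with an optional "#identifier" suffix. It simply splits on the first "#", the way the URL#revision syntax suggests; read_module_dir is a hypothetical name, not existing Mercurial code.

```python
import os

def read_module_dir(root):
    """Map working-directory paths to (url, identifier) from .hgmodules/.

    Each file .hgmodules/<path> holds one URL, optionally followed by
    "#identifier" (a branch, tag, or changeset ID).
    """
    base = os.path.join(root, ".hgmodules")
    modules = {}
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # The module's path in the working directory, with "/" separators.
            rel = os.path.relpath(full, base).replace(os.sep, "/")
            with open(full, encoding="utf-8") as f:
                text = f.read().strip()
            url, sep, ident = text.partition("#")
            modules[rel] = (url, ident if sep else None)
    return modules
```

Because each module lives in its own file, two people updating different modules touch different files, which is the merge-conflict advantage described above.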

What should the user interface to this file be? It should be formatted such that a user can edit it directly, if need be. But ...

Do we modify the add, remove, and rename commands to edit it?
Do we add a "hg module" command that will edit it?

Probably the latter.

4. Important open questions

Does it only make sense to think about modules when we have a working directory? If not, where do modules live when we don't have a working directory? (It would be technically possible to separate a module's working directory from its repository, for example, though I'm not sure we want to go there.)

For now, I'm assuming that if there's no working directory, there are no modules.

Here's another sticky question without an obvious answer: By default, should commands that operate in the working directory recurse into modules?

The alternative I lean towards is not to recurse unless explicitly instructed to. Probably only a few commands should be aware of modules at all.

This model assumes that modules will usually only be read, and checked out at a fixed revision, such that automatically running status queries or updates in them makes little sense: they won't change often enough to be worth the effort. This is in line with the usual use of externals in SVN, and with CVS vendor branches.

For people who would be actively developing in multiple repositories, however, this provides poor support. If you have a better idea, let's hear it! Note that the existing config mechanism (the [defaults] section of an hgrc file) lets you add a "--modules" option to whatever commands you think need it.

If a command like "add" is run in a parent repository's working directory, and given a path to a file in a module's working directory, what should its behaviour be? The current behaviour is to complain and fail: should this remain?
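Whatever the answer, the parent first has to decide whether a path falls inside a module. A minimal sketch of that check, assuming module paths are relative to the parent's root with "/" separators; owning_module and check_add are hypothetical helpers. The longest matching module path wins, and check_add mimics the current complain-and-fail behaviour.

```python
import posixpath

def owning_module(path, module_paths):
    """Return the module whose working directory contains `path`, or None."""
    path = posixpath.normpath(path)
    best = None
    for mod in module_paths:
        mod = posixpath.normpath(mod)
        # Match the module directory itself or anything below it,
        # but not a sibling that merely shares a prefix ("foo/barx").
        if path == mod or path.startswith(mod + "/"):
            if best is None or len(mod) > len(best):
                best = mod
    return best

def check_add(paths, module_paths):
    """Refuse to add any file that lives inside a module."""
    for p in paths:
        mod = owning_module(p, module_paths)
        if mod is not None:
            raise ValueError(f"{p} is inside module {mod}")
```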

5. User interface changes

5.1. The module command

We add the "module" command, for managing modules. It has several subcommands.

"add" introduces a single new module. A local copy of the repository must already be present. Options:
- "-r": the revision to use.
- "-b": the branch to use.
- "-u": the URL to use.
"remove" removes one or more modules.
"record" updates the changeset ID associated with each module. Uses the working directory's parent from each module. Aborts if any module has zero or two parents.
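
The "record" subcommand amounts to copying each module's working-directory parent into its entry. Below is a minimal sketch with plain dictionaries standing in for the parsed ".hgmodules" and for each module's parents. The input shapes and the record name are hypothetical, and leaving modules that are not present locally untouched is an assumption the text does not settle.

```python
def record(modules, working_parents):
    """Return a copy of `modules` with each present module's rev updated.

    modules: module path -> dict of settings (url, rev, ...), as in .hgmodules.
    working_parents: module path -> list of working-directory parent IDs,
        for modules present locally (a real command would query each
        module's repository).
    Nothing is returned until every module has been checked, so an abort
    leaves the caller's data unchanged.
    """
    updated = {}
    for path, entry in modules.items():
        if path not in working_parents:
            # Assumption: a module that is not present locally keeps its rev.
            updated[path] = dict(entry)
            continue
        parents = working_parents[path]
        if len(parents) != 1:
            # Zero parents (nothing checked out) or two (uncommitted merge).
            raise RuntimeError(f"abort: module {path} has {len(parents)} parents")
        updated[path] = dict(entry, rev=parents[0])
    return updated
```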

To clone optional modules, do we extend the behaviour of the built-in clone command, or add a "clone" command here?

5.2. Changes to existing commands

5.2.1. Uniform option naming

We introduce a standard -M / --modules option for commands that need to become module-aware. The option name is the same everywhere, but its interpretation depends on the command.

5.2.2. clone

If invoked with -U to avoid an update, this simply does not clone any modules.
For behaviour without -U, see "update" below.
Should we special-case a local clone, where the repository we're cloning has a working directory and modules?

5.2.3. update

If a required module is missing, it is cloned and updated.
If an optional child module is missing, nothing happens.
The -M / --modules option causes each module to be updated to whatever revision is appropriate, based on the current contents of ".hgmodules".
The content of the ".hgmodules" file in the working directory is used to decide which children to clone and update.
In other words, changes to the ".hgmodules" file do not need to be committed in order to have an effect, like for the ".hgignore" file.
Children are not inspected or updated until work in the parent is complete: this traversal is breadth-first, not depth-first.
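
The update rules above can be expressed as a breadth-first plan. A minimal sketch follows, assuming every module path is given relative to the top-level parent and that a hypothetical children_of callable returns a module's own ".hgmodules" entries as (path, optional) pairs. The plan_update name and the step labels are illustrative only.

```python
from collections import deque

def plan_update(root_modules, present, children_of, update_present=False):
    """Return the ordered (action, path) steps for a recursive update.

    root_modules: (path, optional) pairs from the parent's working-directory
        .hgmodules.
    present: set of module paths that already exist locally.
    children_of: callable giving a module's own (path, optional) pairs.
    update_present: corresponds to passing -M / --modules.
    """
    steps = [("update", ".")]            # the parent is handled first
    queue = deque(root_modules)          # FIFO queue gives breadth-first order
    while queue:
        path, optional = queue.popleft()
        if path not in present:
            if optional:
                continue                 # missing optional module: nothing happens
            steps.append(("clone", path))    # clone, then update
        elif update_present:
            steps.append(("update", path))
        # This module's own modules are only visited after its whole level.
        queue.extend(children_of(path))
    return steps

kids = {"a": [("a/x", False)]}
print(plan_update([("a", False), ("b", True), ("c", False)], {"a"},
                  lambda p: kids.get(p, [])))
# [('update', '.'), ('clone', 'c'), ('clone', 'a/x')]
```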

5.2.4. add, remove, rename

What should these commands do if asked to operate on a module, or a directory containing a module?
- Modify ".hgmodules" to add, remove, or rename a module?
- Print a warning advising ... something else to be done?
- Remain untouched?

5.2.5. pull

Accepts a -M / --modules option, to pull in modules as well as this repository.
If both --modules and --update are specified, both this repository and each module are updated.
Not clear whether the order of execution (relative to the parent) matters.
If one pull fails, do the others continue, or does everything come to a halt?
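
Both answers to the failure question are easy to support behind a single switch. A minimal sketch, where action is a hypothetical callable that performs the pull in one repository and raises on failure; run_in_all and keep_going are illustrative names, not existing options.

```python
def run_in_all(repos, action, keep_going):
    """Run `action` in each repository and collect failures.

    repos: ordered repository paths (parent first, then modules).
    keep_going: True continues after a failure; False halts at the first one.
    Returns (path, exception) pairs for the repositories that failed.
    """
    failures = []
    for path in repos:
        try:
            action(path)
        except Exception as exc:
            failures.append((path, exc))
            if not keep_going:
                break
    return failures
```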

5.2.6. push

Accepts a -M / --modules option, to push from modules as well as this repository.
Must push from all children (depth first) before the parent. Otherwise, if a push has only partially completed, a remote user who pulls the parent may find that its ".hgmodules" refers to module revisions that have not yet been pushed, and will be unable to update.
If one push fails, do the others continue, or does everything come to a halt?
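
The required push order is a post-order traversal: every module is pushed before the repository whose ".hgmodules" refers to it. A minimal sketch, assuming a hypothetical children_of callable that returns the module paths listed in a repository's ".hgmodules".

```python
def push_order(path, children_of):
    """Return repositories in push order: children depth-first, parent last."""
    order = []
    for child in children_of(path):
        order.extend(push_order(child, children_of))
    order.append(path)   # only after all of its modules
    return order

tree = {".": ["a", "b"], "a": ["a/x"]}
print(push_order(".", lambda p: tree.get(p, [])))
# ['a/x', 'a', 'b', '.']
```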

5.2.7. incoming, outgoing

Accept -M / --modules options, to operate in modules as well as this repository.

5.2.8. tag

Accepts a -M / --modules option, to tag in modules as well as this repository.

5.2.9. status

Accepts a -M / --modules option. This simply lists modules: it does not recurse into modules.
We can identify modules that are present with "M"; but what do we do for modules that are missing? What about optional modules?
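
One possible answer, sketched below purely for discussion: mark present modules with "M" as proposed, reuse "!" (which hg status already uses for missing files) for a missing required module, and omit missing optional modules altogether. The module_status name and input shapes are hypothetical.

```python
def module_status(modules, present):
    """List modules with a one-letter code, in the style of `hg status`.

    modules: (path, optional) pairs from .hgmodules.
    present: set of module paths that exist locally.
    """
    lines = []
    for path, optional in sorted(modules):
        if path in present:
            lines.append(f"M {path}")
        elif not optional:
            lines.append(f"! {path}")   # required but missing
        # A missing optional module is simply not listed in this sketch.
    return lines
```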

5.3. Questionable commands

Here are some possible behaviours for commands where it's really not clear that being module-aware makes sense at all.

5.3.1. commit

Accepts a -M / --modules option, to commit inside modules.
If a commit message is not explicitly provided, do we use the parent's commit message in every module, or prompt for a new message in each?
- Probably the former.

We have the possibility of rolling every commit back if any commit fails, when using --modules. Do we want to do this?
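
If we do want all-or-nothing behaviour, the shape is simple: commit in order, and on failure undo the commits already made, most recent first. A minimal sketch, where commit and rollback are hypothetical callables (a real implementation might use something like hg rollback). The same message is used in every repository, following the "probably the former" answer above.

```python
def commit_all(repos, commit, rollback, message):
    """Commit in every repository, undoing earlier commits if one fails."""
    done = []
    try:
        for path in repos:
            commit(path, message)
            done.append(path)
    except Exception:
        # Undo in reverse order so each repository returns to its prior state.
        for path in reversed(done):
            rollback(path)
        raise
    return done
```

Note that a rollback can itself fail, so this gives a best-effort guarantee rather than a true transaction across repositories.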

5.3.2. Next sticky question

If we make "commit" module-aware, why not status, diff, and all the rest?
