Nested repositories

Our intention is to integrate a subset of the functionality of the ForestExtension into the core of Mercurial, while maintaining simplicity. This isn't quite a design document: it's more an exploration of the different design decisions that might make sense, and what the tradeoffs are.

1. Similar concepts in other systems

git

svn

Perforce

2. Goals

The goal is to be able to use multiple repositories as a single, loosely coupled unit. A "parent" has a notion of several "modules" that live under it. In at least some cases, performing a command in the parent should affect the modules.

By "loosely coupled", we mean that repositories are largely independent.

Relationships are hierarchical and one-to-many: a parent knows about its modules, but they do not know about their parent or sibling repositories.

2.1. Use cases

Here are the important needs we would like to at least consider.

2.2. Terminology

Names used by sundry systems:

I'm arbitrarily choosing "module".

3. Managing modules

Modules are listed explicitly, in a directory named .hgmodules in the root of the tree (suggested by BrendanCully). Each directory under .hgmodules corresponds to a module that will be present in the working directory. For example, a directory .hgmodules/foo/bar contains information about a module that will be located in foo/bar in the working directory.

The files and directories under .hgmodules are intended to be read and written by machine.

For each module, its directory must contain the following files:

The repository directory structure for the .hgmodules given above looks like this:

  parent-repo-dir/
    .hg/
    .hgmodules/foo/bar/default
    .hgmodules/quux/default
    <working dir content from parent-repo-dir>
    foo/bar/
      .hg/
      <working dir of foo/bar module>
    quux/
      .hg/
      <working dir of quux module>
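
The mapping from .hgmodules entries to working directory paths can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function name list_modules is made up, and it assumes that a module is recognised by the presence of a "default" file in its .hgmodules directory, as in the layout above.

```python
import os

def list_modules(root):
    """Return working-directory paths of the modules declared in root/.hgmodules.

    Assumption: a directory under .hgmodules describes a module if it
    contains a file named 'default'.
    """
    meta = os.path.join(root, ".hgmodules")
    modules = []
    for dirpath, _dirnames, filenames in os.walk(meta):
        if "default" in filenames and dirpath != meta:
            # .hgmodules/foo/bar/default describes the module at foo/bar
            rel = os.path.relpath(dirpath, meta)
            modules.append(rel.replace(os.sep, "/"))
    return sorted(modules)
```

For the tree shown above, list_modules("parent-repo-dir") would return the two paths foo/bar and quux. A repository without a .hgmodules directory yields an empty list.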

The configuration files in these directories are plain text, but not intended to be edited by hand. How do we modify them?

  1. Do we modify the add, remove, and rename commands to edit them?

  2. Do we add an "hg module" command that will do some or all of the editing?

Probably the latter.

3.1. Discussion

4. Important open questions

Does it only make sense to think about modules when we have a working directory? Presumably yes, but this means that an "hg update" or similar command may need a network connection in order to clone missing modules.

For now, I'm assuming that if there's no working directory, there are no modules.

Here's another sticky question without an obvious answer: By default, should commands that operate in the working directory recurse into modules?

The alternative that I lean towards is to not recurse unless explicitly instructed to. Most probably, only a few commands should be aware of modules at all.

This model assumes that modules will usually only be read, and checked out at a fixed revision, such that automatically running status queries or updates in them makes little sense: they won't change often enough to be worth the effort. This is in line with the usual use of externals in SVN, and with CVS vendor branches.

For people who would be actively developing in multiple repositories, however, this provides poor support. If you have a better idea, let's hear it! Note that the existing config mechanism lets you add a "--modules" option to whatever commands you think need it.

If a command like "add" is run in a parent repository's working directory, and given a path to a file in a module's working directory, what should its behaviour be? The current behaviour is to complain and fail: should this remain?

What about nested nested repositories? If one of my modules has its own .hgmodules tree, should a command issued at the root level also recurse into those "sub-modules"? I guess so.

  /root/
    .hg/
    .hgmodules
    module1/
      .hg/
    module2/
      .hg/
      .hgmodules
      module21/
        .hg/
      module22/
        .hg/
    module3/
      .hg/

In the structure above, should a command issued at the root level also take module21 and module22 into account? What if only module21 is listed in the .hgmodules of module2? And what if module22 is recorded as a module of root?
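
One possible answer to these questions, sketched in Python: recurse into each module's own .hgmodules, visit only modules that some repository actually lists, and visit each at most once. The names declared and walk_modules are hypothetical, and the sketch again assumes that a "default" file under .hgmodules marks a module. It is meant to make the policy concrete, not to settle it.

```python
import os

def declared(repo):
    """Paths (relative to repo) that have a 'default' file under repo/.hgmodules."""
    meta = os.path.join(repo, ".hgmodules")
    found = []
    for dirpath, _dirs, files in os.walk(meta):
        if "default" in files and dirpath != meta:
            found.append(os.path.relpath(dirpath, meta))
    return sorted(found)

def walk_modules(repo, seen=None):
    """Yield every module reachable from repo, depth first, each at most once."""
    seen = set() if seen is None else seen
    for rel in declared(repo):
        path = os.path.normpath(os.path.join(repo, rel))
        if path in seen:
            continue  # e.g. module22 listed by both root and module2
        seen.add(path)
        yield path
        # A module may itself be a parent: follow its own .hgmodules.
        yield from walk_modules(path, seen)
```

Under this policy, if only module21 is listed by module2, then module22 is ignored unless root lists it directly. If both root and module2 list module22, it is visited once.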

5. User interface changes

5.1. The module command

We add the "module" command, for managing modules. It has several subcommands.

To clone optional modules, do we extend the behaviour of the built-in clone command, or add a "clone" subcommand here (+1)?

5.2. Changes to existing commands

5.2.1. Uniform option naming

We introduce a standard -M / --modules option for commands that need to become module-aware. The name of the option is standard: its interpretation can change, depending on the command.

5.2.2. clone

5.2.3. update

Ideas that probably don't make sense:

5.2.4. add, remove, rename

5.2.5. pull

JesseGlick: I would expect pull -u (or fetch) with --modules to first update the parent, then inspect its updated .hgmodules to see what modules might be there that also need to be updated.
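
The ordering described in this comment could look roughly like the following Python sketch. It is not an implementation of the proposed option: pull_update_with_modules is a made-up name, list_modules stands for whatever function reads .hgmodules, and the run parameter exists only so the ordering can be exercised without a real repository. The only hg invocation used is the existing "hg pull -u".

```python
import os
import subprocess

def hg(args, cwd):
    """Run an hg command in cwd; raises CalledProcessError if it fails."""
    subprocess.run(["hg"] + args, cwd=cwd, check=True)

def pull_update_with_modules(parent, list_modules, run=hg):
    """Pull and update the parent first, then every module it now declares.

    Returns the declared modules that have no repository in the working
    directory yet (these would have to be cloned, which is not shown here).
    """
    run(["pull", "-u"], parent)
    missing = []
    # Read .hgmodules only after the parent update, so that modules added
    # or removed by the incoming changesets are taken into account.
    for rel in list_modules(parent):
        path = os.path.join(parent, rel)
        if os.path.isdir(os.path.join(path, ".hg")):
            run(["pull", "-u"], path)
        else:
            missing.append(rel)
    return missing
```

Modules that appear in .hgmodules for the first time after the parent update are reported rather than pulled, which is where the question of cloning missing modules comes back in.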

5.2.6. push

5.2.7. bundle

JesseGlick: I'm not sure what bundle --modules should do, actually. The current format can only bundle changesets from one repo.

5.2.8. incoming, outgoing

5.2.9. tag

5.2.10. branch, branches

5.2.11. status

5.2.12. identify

5.3. Questionable commands

Here are some possible behaviours for commands where it's really not clear that being module-aware makes sense at all.

5.3.1. commit

We have the possibility of rolling every commit back if any commit fails, when using --modules. Do we want to do this?

JesseGlick: commit --modules would be nice (for a forest of loosely synchronized repositories) but not essential.
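
To make the rollback question concrete, here is a minimal Python sketch of the all-or-nothing behaviour. The names commit_all, commit and rollback are hypothetical: the two callables stand for whatever performs and undoes a commit in a single repository (for example wrappers around "hg commit" and "hg rollback"), and a failure is assumed to be signalled by an exception.

```python
def commit_all(repos, commit, rollback):
    """Commit in each repository in order; undo everything if one fails.

    commit(repo) and rollback(repo) are supplied by the caller.
    Returns the list of repositories committed to on success.
    """
    done = []
    try:
        for repo in repos:
            commit(repo)
            done.append(repo)
    except Exception:
        # Undo the commits that already succeeded, most recent first,
        # then let the original error propagate to the caller.
        for repo in reversed(done):
            rollback(repo)
        raise
    return done
```

Note that this is not atomic: if anything else touches one of the repositories between its commit and the rollback, undoing the last transaction may no longer do the right thing. That is part of what makes the question above a real design decision.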

5.3.2. Next sticky question

If we make "commit" module-aware, why not status, diff, and all the rest?



NestedRepositories (last edited 2008-07-01 18:31:37 by BrendanCully)