Conversion of ClearCase repositories to Mercurial

IBM Rational ClearCase is a commercial revision control system; for details, refer to http://en.wikipedia.org/wiki/ClearCase. Some or all of the names mentioned above may be registered trademarks.

This article describes how a conversion of a ClearCase VOB to Mercurial could theoretically work.

Like SCCS, RCS and CVS, standard ClearCase handles revisions for each file individually, so each file has its own revision numbers, assigned labels and so on. Usually each file and directory has a mainline with versions numbered /main/1 to /main/N, and can additionally have several branches starting at defined revisions. Furthermore, each revision can have multiple labels assigned.

Files and directories are presented to a user in so-called VOBs (versioned object bases) through the ClearCase filesystem. What the user sees depends on a so-called view specification, in which the user can fine-tune which label, revision number or point in time should be used for an entire VOB, for a subdirectory with all its descendants, or for a single file or directory. As a result, different users can see different data under the same directory path.

When importing a ClearCase VOB, one has to decide whether to start from the very beginning or at a certain point in time with all elements present at that time. Depending on this decision, one can set up a time-based ClearCase view specification so that either a nearly empty or an already filled repository is visible.

When not starting at the beginning, all files visible in an active view that uses this view specification must be copied into a new directory structure and put into a new Mercurial repository using the usual hg init; hg add; hg commit sequence.
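The snapshot step could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name snapshot_view, the default commit message and the paths are hypothetical, and the view is assumed to be set already so that the VOB can be read like an ordinary directory tree. The hg commands are returned as argument lists and are not executed here.

```python
import shutil


def snapshot_view(view_root, repo_dir, message="Initial import from ClearCase"):
    """Copy the tree visible in the view and return the hg commands to run.

    view_root and repo_dir are hypothetical example paths; repo_dir must
    not exist yet. Nothing is executed: the caller can pass each command
    to subprocess.run with cwd=repo_dir.
    """
    shutil.copytree(view_root, repo_dir)
    return [
        ["hg", "init"],
        ["hg", "add"],
        ["hg", "commit", "-m", message],
    ]
```

Returning the commands instead of running them keeps the sketch usable as a dry run before anything is written to a real repository.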

After that, the ClearCase find command can be used to detect most of the changes that have been applied since. For example:

cleartool find /yourpath -version "brtype(main) && created_since(yourdate)" -exec "echo \$CLEARCASE_XPN"

will list all new elements that have been created after a certain date (usually the date of the latest Mercurial repository update should be used here). Here, elements means new file revisions and new directory revisions, as ClearCase versions directories as well.

For new files, versions with the extension /main/1 typically hold the first usable content, while /main/0 versions can mostly be ignored.
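As a rough sketch, the output of the find command could be parsed in Python like this. The function names split_xpn and usable_versions are hypothetical. The sketch assumes that the extended path names use the default "@@" separator and that skipping non-numeric version leaves (such as a checked-out version) is acceptable; /main/0 versions are dropped as described above.

```python
def split_xpn(xpn):
    """Split 'path@@/main/3' into (path, branch, leaf), all as strings."""
    path, sep, version_path = xpn.partition("@@")
    if not sep:
        raise ValueError("not a version-extended path name: %r" % xpn)
    branch, _, leaf = version_path.rpartition("/")
    return path, branch, leaf


def usable_versions(lines):
    """Parse find output lines and keep only versions numbered 1 or higher."""
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path, branch, leaf = split_xpn(line)
        # Assumption: non-numeric leaves and the empty version 0 are skipped.
        if not leaf.isdigit() or int(leaf) == 0:
            continue
        result.append((path, branch, int(leaf)))
    return result
```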

Note that the list returned by the command above is not sorted in any way, so it must be ordered first, e.g. by calling stat on the full path specifications to obtain the modification date (which here is also the creation date). Alternatively, cleartool describe can be used to obtain the date.
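The stat-based ordering could look like the following Python sketch. The function name sort_by_time is hypothetical, and the stat parameter is only there so that the function can be tried without a ClearCase view; inside a view, the default os.stat would be applied to the full extended path names.

```python
import os


def sort_by_time(xpns, stat=os.stat):
    """Return the path names ordered by modification time, oldest first."""
    # Ties are broken by path name to keep the order deterministic.
    return sorted(xpns, key=lambda p: (stat(p).st_mtime, p))
```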

Once the elements are sorted by time, they can be processed one by one in that order.

For each element, the comment and the user can be obtained from ClearCase with commands like cleartool describe -fmt '%c' or ... '%u', respectively.
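A minimal Python sketch of this metadata step is shown below. The helper names describe_cmd, commit_cmd and fetch_metadata are hypothetical. The sketch assumes that the checkin time is available as a Unix timestamp (for example from stat), that it is passed to hg commit as UTC, and that an empty ClearCase comment is replaced by a placeholder, since Mercurial does not accept an empty commit message.

```python
import subprocess


def describe_cmd(xpn, fmt):
    """Build the cleartool describe command for one format string."""
    return ["cleartool", "describe", "-fmt", fmt, xpn]


def commit_cmd(comment, user, unix_time):
    """Build an hg commit command that keeps comment, user and time."""
    # hg accepts dates as "<unix timestamp> <timezone offset in seconds>".
    date = "%d 0" % unix_time
    # Assumption: a placeholder stands in for an empty ClearCase comment.
    return ["hg", "commit", "-m", comment or "(no comment)",
            "-u", user, "-d", date]


def fetch_metadata(xpn, run=subprocess.check_output):
    """Ask ClearCase for the comment and the user of one version."""
    comment = run(describe_cmd(xpn, "%c"), text=True).strip()
    user = run(describe_cmd(xpn, "%u"), text=True).strip()
    return comment, user
```

The run parameter only exists so that the functions can be exercised with a stub where cleartool is not installed.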

Some problems remain, notably files and subdirectories that have been removed, renamed or moved.

To handle these, the directory elements returned by the find command above must be analyzed in detail, which might be difficult, because several files or subdirectories could have been removed or renamed in between. A move within the same directory can be detected by comparing the inode numbers of the files in the old and new versions of the directory. However, files can also be moved between directories, in which case two directory elements are checked out and checked in again in ClearCase. This does not necessarily happen at the same time.
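The inode comparison for a single directory could be sketched in Python as follows. The function names list_inodes and diff_directory are hypothetical, and the sketch assumes that every inode number appears under only one name within a directory version (no hard links).

```python
import os


def list_inodes(directory):
    """Map each entry name in a directory to its inode number."""
    return {entry.name: entry.inode() for entry in os.scandir(directory)}


def diff_directory(old, new):
    """Compare two versions of one directory given as name -> inode maps.

    Returns (renamed, removed, added), where renamed is a list of
    (old_name, new_name) pairs and the other two are lists of names.
    """
    old_by_inode = {ino: name for name, ino in old.items()}
    new_by_inode = {ino: name for name, ino in new.items()}
    common = old_by_inode.keys() & new_by_inode.keys()
    renamed = sorted((old_by_inode[i], new_by_inode[i]) for i in common
                     if old_by_inode[i] != new_by_inode[i])
    removed = sorted(old_by_inode[i]
                     for i in old_by_inode.keys() - new_by_inode.keys())
    added = sorted(new_by_inode[i]
                   for i in new_by_inode.keys() - old_by_inode.keys())
    return renamed, removed, added
```

A rename found this way could then be replayed with hg rename, while removed and added entries still need the cross-directory analysis described above.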

After an incremental conversion is done, it is a good idea to check, using a recursive find call and a recursive md5sum, whether the new version of the Mercurial repository is identical to the ClearCase repository.
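The same check can also be sketched in Python instead of find and md5sum. The function names tree_digests and compare_trees are hypothetical. The sketch assumes that both trees are reachable as ordinary directories, that the .hg metadata directory should be left out, and that only file contents matter (empty directories are not compared).

```python
import hashlib
import os


def tree_digests(root, skip_dirs=(".hg",)):
    """Map each relative file path under root to the MD5 of its content."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            md5 = hashlib.md5()
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    md5.update(chunk)
            digests[os.path.relpath(full, root)] = md5.hexdigest()
    return digests


def compare_trees(clearcase_root, hg_root):
    """Return the sorted relative paths that are missing or differ."""
    left = tree_digests(clearcase_root)
    right = tree_digests(hg_root)
    return sorted(p for p in left.keys() | right.keys()
                  if left.get(p) != right.get(p))
```

An empty result means that both trees contain the same files with the same contents.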

This is especially needed when the ClearCase MultiSite extension is used and replicas are updated for elements that have remote mastership. It can happen that, for example, a file is edited at 08:00h, the incremental ClearCase to Mercurial conversion runs at 09:00h and detects nothing, and at 10:00h the replica is mirrored from the remote side. In this situation, looking only at changes that happened later than 09:00h is not sufficient, because the 08:00h file change will not be found. Theoretically, if the time of checkins must be preserved, an incremental conversion must not be attempted for files created after the last mirroring from the remote side.
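One way to apply this rule is to hold back everything that is newer than the last mirroring, as in the following Python sketch. The function name split_by_sync is hypothetical, and the sketch assumes that the time of the last mirroring is known from elsewhere and that each version is given as a (timestamp, path name) pair.

```python
def split_by_sync(versions, last_sync):
    """Split (timestamp, xpn) pairs at the time of the last mirroring.

    Returns (convert_now, hold_back), both sorted by time. Versions newer
    than last_sync are held back, because older remote changes may still
    arrive with the next mirroring.
    """
    convert_now = sorted(v for v in versions if v[0] <= last_sync)
    hold_back = sorted(v for v in versions if v[0] > last_sync)
    return convert_now, hold_back
```

In the example above, a conversion run at 09:00h with a last mirroring at, say, 07:00h would hold back everything created after 07:00h, so that the 08:00h change can be converted in the right order once it arrives at 10:00h.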

ClearCaseConversion (last edited 2012-11-06 14:39:59 by abuehl)