15-FEB-2011: CVS Sucks.

Bad CVS. Bad.

CVS sucks. I think most people know this by now, so I'm not bringing it up to inform anyone of any particular way in which it sucks, but because it's causing problems for me right now.

Tcl has been using CVS for the past 12 years. For the last few years, its repositories were hosted on SourceForge's CVS servers. Recently, SourceForge shut down its CVS servers after a security incident in which some servers may have been compromised [1]. Fortunately, the Tcl project had a backup of its CVS repositories.

I worked with the Tcl project to convert their CVS repository to a distributed version control system (DVCS), specifically Fossil.

The conversion did not go well.

The first thing we tried was using "cvs2svn" (which also supports CVS-to-Git conversion) to convert the CVS repository to Git, then exporting the result with "git fast-export", whose output "fossil import --git" understands. This had the undesirable side effect of turning all the CVS tags into Fossil branches.

The next thing we tried was "git cvsimport", followed by piping "git fast-export" into "fossil import --git". This did not fix the problem of all the tags being turned into branches either. This is apparently because, given the stream that "git fast-export" produces, "fossil import --git" cannot differentiate between tags and branches.

NetBSD user Brad Harder then found Joerg's "cvs2fossil" utility [2] and pointed us to it. We got much better results from it, but it did not convert tags at all (though at least it didn't turn them into branches). I wrote a simple script to generate the tags from the Fossil repository and the SHA1 hashes of the tagged files. This revealed that, over the years, CVS users had removed branches from some files while leaving intact the tags that pointed to revisions on those branches. This broke stuff.
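The idea behind that tag script can be sketched roughly as follows. This is a minimal Python sketch, not the script I actually used: it assumes each CVS tag has already been reduced to a map from file path to the SHA1 of the tagged file's contents, and that each Fossil check-in's manifest is available in the same form. The names `find_checkin_for_tag` and `assign_tags`, and all of the sample data, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data model: file path -> SHA1 of that file's content.
Manifest = Dict[str, str]


def find_checkin_for_tag(tagged: Manifest, checkins: Dict[str, Manifest]) -> Optional[str]:
    """Return the first check-in (in the order given) whose manifest has every
    tagged file at exactly the tagged SHA1, or None if no check-in does."""
    for checkin_id, manifest in checkins.items():
        if all(manifest.get(path) == sha1 for path, sha1 in tagged.items()):
            return checkin_id
    return None


def assign_tags(tags: Dict[str, Manifest],
                checkins: Dict[str, Manifest]) -> Tuple[Dict[str, str], List[str]]:
    """Map each tag name to a matching check-in; collect tags with no match."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for tag, tagged in tags.items():
        checkin_id = find_checkin_for_tag(tagged, checkins)
        if checkin_id is None:
            # No single check-in holds all of these file versions together,
            # e.g. because a branch under the tag was removed in CVS.
            unmatched.append(tag)
        else:
            matched[tag] = checkin_id
    return matched, unmatched


# Illustrative data: check-in IDs, tag names and "SHA1s" are placeholders.
checkins = {
    "c1": {"generic/tcl.h": "sha-a", "unix/Makefile.in": "sha-b"},
    "c2": {"generic/tcl.h": "sha-c", "unix/Makefile.in": "sha-b"},
}
tags = {
    "core-8-5-0": {"generic/tcl.h": "sha-c", "unix/Makefile.in": "sha-b"},
    "broken-tag": {"generic/tcl.h": "sha-a", "unix/Makefile.in": "sha-d"},
}
matched, unmatched = assign_tags(tags, checkins)
print(matched)    # {'core-8-5-0': 'c2'}
print(unmatched)  # ['broken-tag']
```

A tag that ends up in `unmatched` is exactly the kind of breakage described above: no check-in in the converted repository contains all of its file versions at once.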

The underlying problem is that CVS operates differently from modern version control systems. CVS records nothing to indicate that a group of files changed together (a check-in or changeset); instead, each changed file is versioned independently. Conversions away from CVS always suffer from this "impedance mismatch", especially with old, crusty repositories that have had tags applied to an arbitrary collection of file revisions (some of which may never have existed in the repository at the same time!) or that have had branches removed.
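Because CVS never records a check-in, a converter has to guess which per-file revisions belong together. Tools such as cvs2svn use heuristics along the lines of "same author, same log message, close together in time". The sketch below is a simplified Python version of that idea, not the algorithm any particular tool uses; the `FileRevision` type, the 300-second `fuzz` window, and the sample authors, files and revisions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileRevision:
    path: str
    revision: str   # CVS per-file revision number, e.g. "1.42"
    author: str
    log: str
    timestamp: int  # seconds since the epoch


def group_into_changesets(revs: List[FileRevision], fuzz: int = 300) -> List[List[FileRevision]]:
    """Guess changesets: consecutive revisions with the same author and log
    message, each within `fuzz` seconds of the previous one and touching a
    different file, are assumed to belong to one commit."""
    changesets: List[List[FileRevision]] = []
    # Sort so that candidate members of one commit become adjacent.
    ordered = sorted(revs, key=lambda r: (r.author, r.log, r.timestamp))
    for rev in ordered:
        if changesets:
            current = changesets[-1]
            last = current[-1]
            if (last.author == rev.author and last.log == rev.log
                    and rev.timestamp - last.timestamp <= fuzz
                    and all(r.path != rev.path for r in current)):
                current.append(rev)
                continue
        changesets.append([rev])
    # Present the guessed changesets in chronological order.
    changesets.sort(key=lambda cs: cs[0].timestamp)
    return changesets


revs = [
    FileRevision("generic/tclIO.c", "1.10", "alice", "Fix channel bug", 1000),
    FileRevision("generic/tclIO.h", "1.4", "alice", "Fix channel bug", 1030),
    FileRevision("unix/configure.in", "1.7", "bob", "Update configure", 1010),
    FileRevision("generic/tclIO.c", "1.11", "alice", "Fix channel bug", 5000),
]
changesets = group_into_changesets(revs)
print([[r.revision for r in cs] for cs in changesets])
# [['1.10', '1.4'], ['1.7'], ['1.11']]
```

Any grouping like this is a guess: two unrelated commits that share an author and log message inside the window get merged, and a slow commit can get split in two. And no amount of grouping helps a tag whose revisions never coexisted.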

In the end, we managed to identify the branches for the files that had tags set and replace them all, one by one, to recreate the exact tags and complete the conversion to Fossil.

Hooray.