SVN-to-git bridge (for practice)

Aug 10, 2014 • adridg

Some time ago, I wrote that I needed to be less of a scaredy-cat about git (in particular so as to get back into KDE development, and the rat's-nest of git repositories there was scaring me off -- in that sense I'm a data point in what Paul Adams is writing about). The best way of learning is by doing, so I looked for something to do with git that would basically force me to use it regularly.

That really means "find a way to use git at work-work", since most of my development hours happen there now (largely administrative number crunching in Python).

A bit of background: at work-work we have a central SVN repository. It has a non-standard naming scheme: trunk is called development, and branches live in the releases subdirectory. The only branches are for actual releases. For various reasons we are also still on SVN 1.5, so we don't have any of the more modern merge and branch features that SVN has grown since. As a result, developers do feature work directly in trunk rather than in feature branches, and we end up with some pretty confusing history of interleaved commits.

I have 38 minutes on the train between Arnhem and Utrecht that I could use effectively for small development tasks: typo fixes, message improvements, adding unit tests, that kind of thing. But I don't want to end up with one big set of changes for all the little things I do on the train; I want sensible commits, one logical change after another.

So basically I want simple feature branches and offline commits. It needs to linearize history and integrate with a weird SVN setup. Previously I used Mercurial and its hgsubversion extension to get this effect; now I wanted to do the same with git, for learning purposes (and then I can futz with the KDE repositories again).

Most of what I eventually built to give me a nice git-based workflow that meshes with the central SVN repository is based on a series of blog posts from TF Nicolaisen. They were really useful.

Anyway, my workflow now looks like this:

1. Pick a ticket N from our bug tracker (it's TRAC, with some customized statistics modules I wrote and a handful of third-party TRAC-hacks for planning purposes),
2. Update my git repo from SVN with git pull,
3. Start a git branch for my work on the ticket with git checkout -b ticket-N,
4. Do my thing, with as many commits and experimental branches as needed, and then clean up (remove debug commits, maybe merge some small steps) with git rebase -i,
5. Rebase onto the updated upstream SVN with git svn rebase,
6. Push the whole thing into SVN with git svn dcommit,
7. Drop the branch, since it's in SVN now; the detached commits will get garbage collected eventually.

That's a cromulent git workflow, and except for excessive rebasing and the push to SVN at the end, usable for regular git work as well.
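As a rough illustration, the per-ticket steps above can be wrapped in two small shell functions. This is a sketch, not the author's tooling: run, ticket_start and ticket_finish are hypothetical names, it assumes you start from the local git-svn branch created during the client setup, and it uses that local git-svn branch as the base for the interactive rebase. Setting DRY_RUN only prints the commands.

```shell
# Hypothetical helpers for the ticket workflow; source this file with ". ./ticket.sh".
# Assumes the current branch is the local git-svn branch (git checkout -t mirror/git-svn).

# Print a command, then run it unless DRY_RUN is set to something non-empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Refresh from the git mirror, then branch off for ticket $1.
ticket_start() {
    run git pull &&
    run git checkout -b "ticket-$1"
}

# Tidy history, replay onto current SVN, push to SVN, then drop the branch.
# Each step runs only if the previous one succeeded. -D is needed because the
# local git-svn branch has not moved, so git would not consider ticket-N merged.
ticket_finish() {
    run git rebase -i git-svn &&
    run git svn rebase &&
    run git svn dcommit &&
    run git checkout git-svn &&
    run git branch -D "ticket-$1"
}
```

For example, (DRY_RUN=1; ticket_start 1234) prints "+ git pull" and "+ git checkout -b ticket-1234" without running anything. The subshell keeps DRY_RUN from leaking into later calls, since some shells keep a variable assigned in front of a function call.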

The setup I've ended up with looks like this. On the server side there are three repositories. First is the central SVN repository; this is the official and canonical source, and other developers commit to SVN normally. Then there's the git fetch repository (git-project-fetch below), which pulls revisions from SVN and turns them into git commits; it contains only the SVN commits. Third is a bare git repository (git-project), which is where I pull from and where the fetch repository pushes to. This bare repository also holds a few things that don't come from SVN: I push branches there when I want to share them over git with other machines I work on, and there are a few tags marking events in the history of the repository.

Client-side, there's whatever clones I make of the bare repository.
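To make the hand-off between the three repositories concrete, here is a small sketch of how git's wildcard refspecs rename the branch on each hop. map_ref is a hypothetical helper written for illustration (git does this matching internally), and it only handles refspecs with exactly one "*" on each side. The refspecs used are the push line from the fetch repository's .git/config in the setup steps, and the default fetch refspec that git clone -o mirror gives a client. It assumes git svn keeps its branch in the fetch repository at refs/remotes/git-svn, which is what the client-side git checkout -t mirror/git-svn implies; since git 2.0, git svn defaults to an origin/ prefix for its remote-tracking refs, so it's worth checking with git for-each-ref refs/remotes in the fetch repository.

```shell
# Hypothetical helper: map a ref name through a wildcard refspec "src:dst".
# Handles a leading "+" (force) and exactly one "*" on each side; nothing else.
map_ref() {
    spec=${1#+}                 # drop the optional force flag
    src=${spec%%:*}             # pattern left of the colon
    dst=${spec#*:}              # pattern right of the colon
    ref=$2
    src_pre=${src%%\**}         # text before the "*"
    src_post=${src#*\*}         # text after the "*"
    case "$ref" in
        "$src_pre"*"$src_post") ;;
        *) return 1 ;;          # ref is not covered by this refspec
    esac
    mid=${ref#"$src_pre"}       # the part matched by "*"
    mid=${mid%"$src_post"}
    echo "${dst%%\**}$mid${dst#*\*}"
}

# Hop 1: the fetch repo pushes to the bare repo (the "push" line in .git/config).
map_ref 'refs/remotes/*:refs/heads/*' refs/remotes/git-svn
# prints refs/heads/git-svn

# Hop 2: a client cloned with "git clone -o mirror" fetches from the bare repo.
map_ref '+refs/heads/*:refs/remotes/mirror/*' refs/heads/git-svn
# prints refs/remotes/mirror/git-svn
```

The client's git checkout -t mirror/git-svn works because the branch arrives as refs/remotes/mirror/git-svn. The recipe's git svn init --prefix=mirror/ is presumably there so that git svn's own tracking ref lands on that same name; git config svn-remote.svn.fetch in a client clone shows where it actually points.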

Getting this set up was a matter of configuring and cloning the right bits. I mostly followed the steps described by TF Nicolaisen, with three differences: I needed to get the authors map just right ahead of time; I'm only git-bridging a single branch from SVN (namely development, which is our trunk); and I'm not interested in the ancient (and gnarly) history, only in fairly recent SVN commits. So here's what I did:

Figure out what part of the repository is interesting; for me, that was the development/ branch in SVN, from revision 28754 onwards.
Figure out who is committing to the repository. SVN only has usernames, while git needs a name and an email address, so this requires a map of SVN authors to git authors; in a git svn authors file each line has the form svnuser = Full Name <email>. It's also pretty much essential that the author in the mapping file matches the author and SVN username you configure in client-side git clones (the same SVN revision imported with a different author becomes a different git commit), or you'll get a multitude of branches, all twisty and all very much alike. After some history examination (in SVN) and discussion with the other developers who might use git about useful git author names, we ended up with a file like this:

adriaan = Adriaan
bassie = Bassie

I ended up committing this file into SVN so that it would be available, and could be updated, for general use. It lives in development/.git-authors, i.e. in the root of what I'm going to follow with git.
Do the initial clone of the repository. Unlike TF Nicolaisen's recipe, I have an authors file (a copy exported from SVN, because it has to exist before the clone runs), and I don't use the standard layout. I did this on the server, so that the SVN repository is local. The repository lives at /home/svn/project, and the fetching repo will be git-project-fetch:

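cd /home/svn
# export the authors map from SVN first; it has to exist before the clone runs
svn export file:///home/svn/project/development/.git-authors author-map
# -A is --authors-file; -r 28754:HEAD skips the older history
git svn clone -A author-map -r 28754:HEAD file:///home/svn/project/development git-project-fetch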
Set up the bare repo, which will be git-project:

cd /home/svn
git init --bare git-project
Configure the fetching repo to push changes:

cd /home/svn/git-project-fetch
git remote add origin ../git-project
Then modify .git/config so it reads as follows (again, this is all according to TF Nicolaisen's recipe, only with a restricted SVN tree, a starting revision, and an author map):

[remote "origin"]
  url = ../git-project
  fetch = +refs/remotes/*:refs/remotes/origin/*
  # publish what git svn fetched (refs/remotes/*) as branches of the bare repo
  push = refs/remotes/*:refs/heads/*

Because there's only one branch here (namely development), there's no need to configure which branch needs to be checked out by default.
Add a post-commit hook to update the git repositories:

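#!/bin/sh
# SVN post-commit hook: /home/svn/project/hooks/post-commit (must be executable).
# lockfile(1), from procmail, keeps overlapping commits from running two updates at once;
# if the lock is still busy after one retry, this run gives up and the next one fetches everything new.
if /usr/bin/lockfile -2 -r1 /tmp/project-gitsvn ; then
  ( cd /home/svn/git-project-fetch && /usr/bin/git svn fetch && /usr/bin/git push origin )
  rm -f /tmp/project-gitsvn
fi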
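Since the authors map has to be right before the clone runs, a quick check of the file can save a re-clone. Each line should have the form svnuser = Full Name <email>; git svn appears to skip lines it can't parse, so a missing address is likely to show up later as an unknown-author error rather than as a complaint about the file. The sketch below uses a hypothetical helper, check_authors, that flags lines not of that shape. It is deliberately stricter than git svn (it insists on an @ in the address) and ignores blank lines and lines starting with #.

```shell
# Hypothetical helper: check a git svn authors file before "git svn clone -A".
# Expected line format:  svnuser = Full Name <email>
# Blank lines and lines starting with "#" are ignored. Prints each bad line as
# FILE:LINE: text and returns 1 if any were found, 0 otherwise.
check_authors() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }
        $0 !~ /^[^= \t]+[ \t]*=[ \t]*[^<]*[^< \t][ \t]*<[^<> \t@]+@[^<> \t]+>[ \t]*$/ {
            printf "%s:%d: %s\n", FILENAME, FNR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Used on the server before the initial clone, that would be something like: check_authors author-map && git svn clone -A author-map -r 28754:HEAD file:///home/svn/project/development git-project-fetch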

At this point, every SVN commit gets pulled into the fetch repo by git svn and then pushed into the bare repository as if it were a normal git repo. The server side is done. On the client side, I set up ssh access to the SVN (and now git, too) server. Then getting a correctly configured client-side clone goes as follows:

Use ssh URLs for access to both the git and the SVN repositories (for SVN the scheme is svn+ssh://), and point git svn at the same development path the server-side fetch repo follows:

git clone -o mirror ssh://project.example.com/home/svn/git-project
cd git-project/
git checkout -t mirror/git-svn
git svn init --prefix=mirror/ svn+ssh://project.example.com/home/svn/project/development
git svn dcommit

That last dcommit has nothing to commit, since the repo has just been cloned; it just makes git svn rebuild its mapping from SVN revision numbers to git commits. It's not really necessary, since that will happen with the first real SVN commit from git anyway.
Client side, we also need to set up the git author and the SVN authors file so that they match the server side. Failing to configure these consistently will cause lots of extra commits to show up in your local clone. Note that I'm configuring this in the repo-local configuration, not globally, so it doesn't interfere with the recommended KDE git setup. It uses the authors file I previously checked into SVN at the root of the development branch, which is now the root of the git repo:

cd git-project/
git config user.name "Adriaan"
git config user.email "ade@example.com"
git config svn.authorsfile .git-authors
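One thing worth checking on a fresh client clone: git svn recognises the mirrored commits by the "git-svn-id: URL@REV UUID" line at the end of each commit message, and that URL has to match the one the clone's git svn is configured with. If they differ, git svn is likely to report that it cannot determine upstream SVN information. Here the fetch repository reads SVN through a file:// URL while clients use an ssh URL, so the two can easily diverge; the git svn documentation describes svn-remote.<name>.rewriteRoot (set with --rewrite-root) for the case of running git svn on the server via file:// while users see a different URL. The helpers below are hypothetical, not part of the recipe; they print both URLs and return 1 on a mismatch, assuming a single-path git svn init as above.

```shell
# Hypothetical helpers: compare the SVN URL recorded in the mirrored commits
# with the URL this clone's git svn matches against.
# Source this file inside the client clone, then run check_clone.

# Read "git-svn-id: URL@REV UUID" lines on stdin and print URL.
svn_id_url() {
    sed -n 's/^git-svn-id: \(.*\)@[0-9][0-9]* [^ ]*$/\1/p'
}

# Same input, print REV.
svn_id_rev() {
    sed -n 's/^git-svn-id: .*@\([0-9][0-9]*\) [^ ]*$/\1/p'
}

check_clone() {
    recorded=$(git log -1 --format=%B mirror/git-svn | svn_id_url)
    configured=$(git config svn-remote.svn.url)
    rewrite=$(git config svn-remote.svn.rewriteRoot || true)
    expected=${rewrite:-$configured}   # rewriteRoot takes precedence when set
    echo "git-svn-id URL on mirror/git-svn: ${recorded:-(none found)}"
    echo "URL git svn matches against:      $expected"
    if [ -n "$recorded" ] && [ "$recorded" = "$expected" ]; then
        echo "URLs match"
    else
        echo "URL mismatch: git svn will probably not recognise the mirrored commits" >&2
        return 1
    fi
}
```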

And with that, I've got a git clone I can use for feature branches on the train, and that can easily push back to SVN. After using this for a few months I've finally gotten comfortable enough with git (feature branches, sometimes futzing around to massage history into a usable form, and the rest of the git tools) to touch KDE git repositories again.