Creating a two-way SVN <> GitHub sync

Intro

When migrating PDCLib from Bitbucket / Mercurial to a local Subversion repository, I wanted to provide the advocates of distributed VCS with a way to get the PDCLib sources “their way”.

As Erin Shepherd pointed out in one of the many emails we exchanged, Git seems to have pretty much won the Version Control battle. I still much prefer Subversion, but I realize that being present on GitHub would certainly not hurt the project.

But that meant keeping the Subversion repo and the GitHub repo in sync… and Erin was quite certain that git-svn was the worst of both worlds combined. So it'd better be a real Git repository, and not some SVN / Git chimera – I wanted to provide “real” Git access to the PDCLib sources.

There were multiple how-tos available online on how to achieve a two-way sync, but none of them really worked the way they were supposed to. Ben Lobaugh's article came really close, and was the most helpful.

In the end, I figured a write-up that is not missing one or two crucial steps would be nice, so I did one.

Initial Setup

So I had the SVN repo at svn://rootdirectory.ddns.net/pdclib. To complicate matters, that repo had two major branches in it, trunk and shepherd, which I would have to synchronize with two just-as-separate Git branches.

I set up a GitHub account and a PDCLib project, and initialized the project with a .gitignore. It would have been “cleaner” to start with an empty repository, but I wanted to showcase how to deal with this two-way sync if the Git repository is not empty.

Get a local clone

This is easy:

git clone git@github.com:/DevSolar/pdclib

You need to have copied your SSH pubkey to your GitHub account for this kind of authorization to work (click on your avatar in the top-right corner, select “Settings” from the drop-down, then “SSH and GPG keys” in the left sidebar).

Change into the freshly cloned Git directory:

cd pdclib

Tell Git about SVN

Now we tell Git that there is another repository which we want to fetch data from (a.k.a. “setting up a remote” in Git parlance). As it is a Subversion repo, we need to use git svn for that:

git svn init -s svn://rootdirectory.ddns.net/pdclib --prefix=svn/

The -s option tells git svn that the repository is using the trunk / tags / branches setup common for SVN, and --prefix=svn/ puts the resulting remote-tracking branches under the name svn/ (so that trunk becomes remotes/svn/trunk). See the documentation for git svn for other options.

Map the authors

Subversion logs commits with the user's login name. For Git, the form Firstname Lastname <email@example.com> is usually preferred. You can create an “authors file” that does this mapping. My authors.txt looks like this:

solar = Martin Baute <solar@rootdirectory.de>
erin = Erin Shepherd <erin.shepherd@e43.eu>
cycl0ne = cycl0ne <claus@poweros.de>
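
If the repository has many committers, a skeleton of this file can be generated instead of typed. The following is a minimal sketch, assuming the usual `svn log --quiet` output of the form "r992 | solar | date"; the function name authors_skeleton and the @example.invalid placeholder address are made up for illustration, and the real names and addresses still have to be filled in by hand.

```shell
# Turn `svn log --quiet` output (read from stdin) into an authors-file
# skeleton with one "login = login <login@example.invalid>" line per login.
# Name and address are placeholders, to be edited by hand afterwards.
authors_skeleton() {
    awk -F ' [|] ' '/^r[0-9]+ [|] / {
        print $2 " = " $2 " <" $2 "@example.invalid>"
    }' | sort -u
}
```

It would be used along the lines of svn log --quiet svn://rootdirectory.ddns.net/pdclib | authors_skeleton > authors.txt.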

Fetch the data

Now we can fetch all the revision data from Subversion. For this, we need git svn again because we are talking to a SVN repo:

git svn fetch --authors-file=authors.txt

If git svn finds a user name not mapped in authors.txt, it will give an error message.

After this step is complete (which might take a while), the latest Subversion information is available to Git.
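
To avoid running into the unmapped-user error halfway through a long fetch, the authors file can be checked beforehand. This is only a sketch: unmapped_authors is a hypothetical helper, not a git svn feature, and it assumes the "login = Name <email>" layout shown above.

```shell
# Read SVN login names from stdin (one per line) and print every login
# that has no "login = ..." entry in the authors file given as $1.
unmapped_authors() {
    authors_file=$1
    while IFS= read -r login; do
        [ -n "$login" ] || continue
        # awk exits 0 if some line has exactly this login left of the "=".
        awk -v want="$login" -F ' *= *' \
            '$1 == want { found = 1 } END { exit !found }' "$authors_file" \
            || printf '%s\n' "$login"
    done
}
```

The list of logins can come from anywhere, for example from the second column of svn log --quiet. An empty result means every listed login has an entry in the file.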

Git Branches

Let's set up the shepherd branch in Git.

git branch --no-track shepherd

I selected --no-track because, for all practical purposes, the shepherd branch is a separate project. That might not apply to your project, or be a dumb idea outright, but it's what I did.

Sync Branches

Now here comes the trick. Git users do not really like git svn, as it is clunky to use. Ideally, a Git user would not “see” the SVN plumbing at all – and that is exactly what we will do here.

We set up two additional branches for the sole purpose of synchronizing SVN <> GIT:

git branch --no-track trunksvn
git branch --no-track shepherdsvn

Then we link up both of these branches to their respective SVN remote. (This is where Ben Lobaugh's tutorial missed a step, the checkout of the sync branch):

git checkout trunksvn
git reset --hard remotes/svn/trunk

If you look at the contents of your directory, you will now see the trunk version of your project.

Make sure that everything is up-to-date (which it will be anyway at this point, but get into the habit early). We are talking to Subversion again, so git svn it is:

git svn rebase

Now switch to the target branch, and merge in the sync branch. We need to tell Git explicitly that it is OK to merge master (which contains nothing but .gitignore at this time) with the sync branch despite the two having nothing in common (yet). This is where an empty Git repo would have been easier, but I wanted to show you the option to make it work anyway:

git checkout master
git merge trunksvn --allow-unrelated-histories

We do the same thing for the second branch-to-be-synced:

git checkout shepherdsvn
git reset --hard remotes/svn/shepherd
git svn rebase
git checkout shepherd
git merge shepherdsvn --allow-unrelated-histories

Push Branches

We can now push the shepherd branch to upstream / origin (i.e. GitHub), marking our local branch as “tracking” that upstream branch in the process:

git push --set-upstream origin shepherd

Then we switch to our local master branch (which is already tracking upstream / origin), and push that as well:

git checkout master
git push

Any Git user cloning our GitHub repo now will see only master and shepherd. Neither the two sync branches nor the SVN remote will be visible to them, nor do they need to touch git svn, which is as it should be.

Sync SVN -> Git

To update the Git repo with changes made to SVN, we follow these steps (in our local clone directory which does have the SVN remote and the sync branches):

git checkout trunksvn
git svn rebase
git checkout master
git merge trunksvn
git push origin master

Or, for the shepherd branch:

git checkout shepherdsvn
git svn rebase
git checkout shepherd
git merge shepherdsvn
git push origin shepherd

Sync Git -> SVN

To update the SVN repo with changes made to Git, we follow these steps (in our local clone directory which does have the SVN remote and the sync branches):

git checkout master
git pull origin master
git checkout trunksvn
git svn rebase
git merge --no-ff master
git commit
git svn dcommit

Or, for the shepherd branch:

git checkout shepherd
git pull origin shepherd
git checkout shepherdsvn
git svn rebase
git merge --no-ff shepherd
git commit
git svn dcommit
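
Since both directions are fixed sequences of commands, they lend themselves to being wrapped in small shell functions. What follows is a sketch rather than a finished tool: the function names and the DRY_RUN switch are my own invention, and the Git → SVN function leaves out the explicit git commit on the assumption that a conflict-free --no-ff merge already records the merge commit. If any step fails (a merge conflict, for example), the chain stops and the rest is up to you.

```shell
#!/bin/sh
# Hypothetical wrappers around the two sync procedures described above.

# Run a command, or only print it when DRY_RUN=1 (to rehearse a sync).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# sync_svn_to_git SYNC_BRANCH TARGET_BRANCH    (e.g. trunksvn master)
sync_svn_to_git() {
    sync_branch=$1
    target_branch=$2
    run git checkout "$sync_branch" &&
    run git svn rebase &&
    run git checkout "$target_branch" &&
    run git merge "$sync_branch" &&
    run git push origin "$target_branch"
}

# sync_git_to_svn TARGET_BRANCH SYNC_BRANCH    (e.g. master trunksvn)
# Assumption: the merge is conflict-free, so no separate "git commit".
sync_git_to_svn() {
    target_branch=$1
    sync_branch=$2
    run git checkout "$target_branch" &&
    run git pull origin "$target_branch" &&
    run git checkout "$sync_branch" &&
    run git svn rebase &&
    run git merge --no-ff "$target_branch" &&
    run git svn dcommit
}
```

With DRY_RUN=1 set, sync_git_to_svn shepherd shepherdsvn only prints the commands it would run, which makes it easy to compare against the lists above before letting it touch the repositories.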

Re-Building

If, for some reason, you lose that Git setup created above, you need to re-build it:

git clone git@github.com:/DevSolar/pdclib
cd pdclib
git svn init -s svn://rootdirectory.ddns.net/pdclib --prefix=svn/

Now you need to re-build the authors.txt file (as above). Then comes the tricky part: If your SVN repository has moved ahead of your Git repository (i.e. you sync Git → SVN), you need to re-build the setup with the SVN revision you left off with. You can find the revision number of each synced SVN commit in git log, in the git-svn-id:

commit 5950958ff57391789d9a164a56cd1ed87dedaa12 (HEAD -> master, origin/master, origin/HEAD)
Merge: 02e56d5 ec5835f
Author: Martin Baute <solar@rootdirectory.de>
Date:   Tue Feb 2 10:59:34 2021 +0100

    Merge branch 'trunksvn'

commit ec5835f129d8f9629d334657d4c31b40d6190724
Author: solar <solar@bcf39385-58cc-4174-9fcf-14f50f90dd47>
Date:   Mon Feb 1 21:15:12 2021 +0000

    git-svn-id: https://srv183.svn-repos.de/dev34/pdclib/trunk@992 bcf39385-58cc-4174-9fcf-14f50f90dd47
                                                               ^^^

It is a bit easier to have the computer extract the number for you:

REVISION=$(git log | grep git-svn-id | head -n1 | sed -e "s/.*@//" -e "s/ .*//")
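
A slightly stricter variant of the same idea is sketched below. It only looks at lines that actually begin with git-svn-id: (after indentation) and only accepts digits after the @, so an @ elsewhere in a commit message cannot confuse it. The function name latest_svn_revision is made up for this example.

```shell
# Read `git log` output on stdin and print the SVN revision number of the
# first (i.e. newest) git-svn-id line found.
latest_svn_revision() {
    sed -n 's/^ *git-svn-id: .*@\([0-9][0-9]*\) .*/\1/p' | head -n 1
}
```

It slots into the same place as the one-liner: REVISION=$(git log | latest_svn_revision).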

Now fetch everything from your SVN repository up to that revision:

git svn fetch -r0:$REVISION --authors-file=authors.txt

Set up the sync branch, and link it to the SVN remote:

git branch --no-track trunksvn
git checkout trunksvn
git reset --hard remotes/svn/trunk

Now rebase the branch to what you already fetched. The --local option keeps git svn from connecting to the repository (which would fetch the revisions by which SVN is ahead, which we do not want at this point).

git svn rebase --local

Now we merge the sync branch to our master. This is basically a no-op, but it sets the merge point from which we will proceed.

git checkout master
git merge trunksvn --allow-unrelated-histories

Now we are set up again, and can sync the SVN revisions we left out previously by the “normal” procedure.

git checkout trunksvn
git svn rebase
git checkout master
git merge trunksvn
git push origin master

Conclusion

I hope this little how-to helps you settle the holy VCS war.

software/svnsyncgit.txt · Last modified: 2021/05/14 22:23 by solar