Splitting a Mercurial repository: HgSplit

Wed 09 February 2011 by jj

Here at the R&D lab we use Mercurial for our code versioning.

One of the problems we faced was that sometimes we would commit big files like PDFs or raw data into a repository. This is fine as long as the repository remains for internal eyes only. But at some point later we want to publish our code, and having such garbage in the repository is totally not cool.

So I wrote this tool.

Basically, it allows you to clone a part of an existing repository, keeping the history intact (commit date, author, and message), while applying a filter on the tracked files.

So now we can duplicate our repository while purging all traces of our ugly data files; we can also split a repository holding both code and documentation into two separate repositories, etc.


As usual, our code is published under an open-source license.

This particular code is published under the terms of the WTFPLv2.


You can grab your copy from GitHub.


If you value your work, always work on a copy of your main repository, just to be safe.

The program takes a few arguments, which are pretty self-explanatory.

Usage: hgsplit [options]
    -s, --repo <path>                path to the repository to split
    -d, --subrepo <path>             path to the directory that will hold the repo
    -x, --exclude                    ignore files in the filelist, include all others
    -r, --regex                      the file list is a list of regexps
    -i, --initial-commit <commitid>  start from this commit (linear commit number - 0 == init)
    -f, --final-commit <commitid>    end at this commit (linear commit number)
    -v, --verbose                    be verbose
    -l, --list <listfile>            file holding a list of files, one per line
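
To make the interplay of the file list with -x and -r a bit more concrete, here is a minimal sketch of how such a filter could decide whether a tracked file is kept. It is not taken from hgsplit.rb: the helper name keep_file? and its keyword arguments are made up for illustration, and the real tool may well do it differently.

```ruby
# Hypothetical helper (not from hgsplit.rb): decides whether a tracked file
# survives the split, given the entries of the file list.
#   regex:   treat each entry as a regexp (like -r) instead of an exact path
#   exclude: drop listed files and keep all others (like -x)
def keep_file?(path, entries, regex: false, exclude: false)
  listed = entries.any? do |entry|
    regex ? Regexp.new(entry).match?(path) : entry == path
  end
  # Without exclude the list says what to keep; with it, what to drop.
  exclude ? !listed : listed
end

files = ["src/main.rb", "doc/paper.pdf", "data/run1.raw"]
kept  = files.select { |f| keep_file?(f, ['\.raw$'], regex: true, exclude: true) }
puts kept.inspect   # prints ["src/main.rb", "doc/paper.pdf"]
```

Without exclude, the same list would keep only data/run1.raw, which is the "extract one part" use case rather than the "purge" one.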


We have our main repository in repo/. It holds a bunch of files, including some *.raw files that we want out.

# first, make a copy of the repo just to be safe
$ hg clone repo repo_copy

# split the repo, ignoring all files matching the given regexp
$ ruby hgsplit.rb -s repo_copy -d repo_clean -r -x '\.raw$'
subrepository saved to repo_clean/, saved 34 changes from 35 total.

# now we can shamelessly publish our code
$ rm -r repo_copy
$ cd repo_clean
$ hg push git+ssh://github.com/me/awesome_repo/
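
A note on the "saved 34 changes from 35 total" line above: one plausible reading is that a changeset touching only filtered-out files has nothing left to commit and is skipped. The sketch below models that idea, together with the -i/-f commit range. It is an assumption about the behaviour, not the actual hgsplit.rb code; Changeset and split_history are hypothetical names.

```ruby
# Hypothetical model of a changeset: linear revision number plus touched files.
Changeset = Struct.new(:rev, :files)

# Keep the changesets in [first, last] that still touch at least one file
# once the filter block has been applied to their file lists.
# Assumption: changesets left with no files are dropped entirely.
def split_history(changesets, first: 0, last: nil, &keep)
  last ||= changesets.map(&:rev).max
  changesets
    .select { |cs| cs.rev >= first && cs.rev <= last }
    .map    { |cs| Changeset.new(cs.rev, cs.files.select(&keep)) }
    .reject { |cs| cs.files.empty? }
end

history = [
  Changeset.new(0, ["main.rb"]),
  Changeset.new(1, ["run1.raw"]),              # only raw data, nothing left
  Changeset.new(2, ["main.rb", "run2.raw"]),
]
kept = split_history(history) { |f| f !~ /\.raw$/ }
puts "saved #{kept.size} changes from #{history.size} total."
```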