Aug 07 2012
 

Sometimes you need to check that every commit in a branch builds correctly. For example, after a rebase, it is possible that you or diff made a mistake during the operation. The build can be run against all commits of the current branch with the following one-liner (split here for readability):

for COMMIT in $(git log --reverse --format=format:%H origin/master..HEAD); do
    git checkout "${COMMIT}" ;
    make -j8 1>/dev/null || { echo "Commit ${COMMIT} does not build";  break; }
done

The idea is simple: we build the list of commits with git log, using a format string that prints only the hash. The --reverse option makes the list start from the oldest commit. For each commit, we check it out and run the build command. If the build fails, we exit the loop.

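When the loop stops, the working directory contains the code of the non-building commit (or of the last commit if everything builds), in detached HEAD state. Thus, don’t forget to get back to the original branch ORIG_BRANCH by running git checkout ORIG_BRANCH.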
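Going back to the original branch can also be automated. Here is a minimal sketch, assuming you start on a branch (not a detached HEAD) with a clean working tree. The function name check_commits and its arguments are made up for this example and are not part of git.

```shell
#!/bin/sh
# check_commits BASE CMD [ARGS...]
# Hypothetical helper (not a git command): runs CMD on every commit of
# BASE..HEAD, oldest first, then returns to the branch we started from.
check_commits() {
    base=$1; shift
    # Name of the current branch; this fails if HEAD is detached.
    orig=$(git symbolic-ref --short HEAD) || return 2
    status=0
    for commit in $(git rev-list --reverse "${base}..HEAD"); do
        git checkout --quiet "${commit}" || { status=2; break; }
        if ! "$@" >/dev/null; then
            echo "Commit ${commit} does not build"
            status=1
            break
        fi
    done
    # Always go back to the starting branch, whatever happened above.
    git checkout --quiet "${orig}"
    return ${status}
}
```

With this helper, the loop above would be run as check_commits origin/master make -j8.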

Aug 02 2010
 
I recently faced the challenge of rewriting a git repository. It had two problems:
  • The first problem was small: a user had committed with a badly configured git, so the e-mail and user name were not set correctly.
  • The second problem seemed trickier: I needed to split the git repository into two different ones. To be precise, each of the two directories at the root (src and deps) had to become the root of its own repository.
I then dug into the documentation, which led me directly to ‘filter-branch’, the solution to both of my problems. The name of the command is almost self-explanatory: it is used to rewrite branches.

Splitting the git repository

A quick read of ‘git help filter-branch’ convinced me to give the ‘--subdirectory-filter’ option a try:
--subdirectory-filter
Only look at the history which touches the given subdirectory. The result will contain that directory (and only that) as its project root. Implies --remap-to-ancestor
Thus, to split out the directory, I simply had to copy my repository with a clone and run the filter command:
git clone project project-src
cd project-src
git filter-branch --subdirectory-filter src
Doing the same for the deps directory gave me my two new repositories, ready to go. At one point during this cleanup, though, I wanted to avoid losing my directory structure: I wanted to keep the ‘src’ directory inside the ‘src’ repository. Thanks to the examples at the end of ‘git help filter-branch’, I found this trickier command:
git filter-branch --prune-empty --index-filter \
 'git rm -r --cached --ignore-unmatch deps' HEAD
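This literally does the following: for each commit, run the given command against the index (--index-filter). That command removes (rm) recursively (-r) everything under the ‘deps’ directory from the index (--cached), without failing when the directory is absent (--ignore-unmatch). If a commit becomes empty, it is dropped from the history (--prune-empty).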
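To make the difference between the two approaches concrete, here is a small sketch on a throwaway repository. The file names, the identity and the temporary directory are invented for the demo. Only the two filter-branch invocations come from the text above.

```shell
#!/bin/sh
# Build a toy repository with 'src' and 'deps' at its root (all names here are
# invented for the demo), then split it in the two ways described above.
# Recent git versions print a notice before running filter-branch; this
# variable only silences that notice.
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
git init -q "${demo}/project"
(
    cd "${demo}/project" || exit 1
    git config user.name "Demo User"
    git config user.email "demo@example.com"
    mkdir src deps
    echo 'int main(void) { return 0; }' > src/main.c
    echo 'bundled library' > deps/lib.txt
    git add src deps
    git commit -q -m "Add src and deps"
    echo 'more bundled stuff' > deps/more.txt
    git add deps
    git commit -q -m "Touch deps only"
)

# Variant 1: 'src' becomes the root of the new repository.
git clone -q "${demo}/project" "${demo}/project-src"
( cd "${demo}/project-src" &&
  git filter-branch --subdirectory-filter src >/dev/null 2>&1 )

# Variant 2: 'deps' is removed, 'src' stays a directory.
git clone -q "${demo}/project" "${demo}/project-keep"
( cd "${demo}/project-keep" &&
  git filter-branch --prune-empty --index-filter \
      'git rm -r --cached --ignore-unmatch deps' HEAD >/dev/null 2>&1 )

# List the files recorded in the tip commit of each result.
echo "subdirectory-filter:"
git -C "${demo}/project-src" ls-tree -r --name-only HEAD
echo "index-filter:"
git -C "${demo}/project-keep" ls-tree -r --name-only HEAD
```

On this toy history, the first listing should contain main.c at the root, while the second keeps it as src/main.c. The commit that only touched deps should disappear from both results.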

Shrinking the resulting repository

The ‘deps’ directory was known to take a lot of disk space, so I checked the size of the ‘src’ repository. My old friend ‘du’ sadly told me that the split repository had the same size as the whole one! There is something tricky here. After googling a little bit, I found out (mainly by reading Luke Palmer’s post) that git never destroys a commit immediately. It remains present as an object in the .git/objects directory. To ask for an effective removal, you have to tell git that some objects have expired and can now be destroyed. The following command destroys all objects that have been unreachable for more than one day:
git gc --aggressive --prune=1day
Unreachable objects are objects that exist but aren’t reachable from any of the reference nodes. This definition is taken from ‘git help fsck’. The ‘fsck’ command can be used to check the validity and connectivity of objects in the database. For example, to display unreachable objects, you can run:
git fsck --unreachable
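One caveat, offered as a hedged note: filter-branch keeps a backup of the rewritten refs under refs/original/, and the reflog still points at the old commits, so those objects may still count as reachable. The checklist in ‘git help filter-branch’ suggests removing both before collecting garbage. Below is a sketch of that sequence, to be run inside the filtered repository. The name shrink_repo is made up for this example, and the operation is destructive, so only try it on a copy.

```shell
#!/bin/sh
# shrink_repo: hypothetical helper wrapping the cleanup steps listed in the
# filter-branch manual. Destructive: the pre-rewrite history is gone afterwards.
shrink_repo() {
    # 1. Drop the backup refs that filter-branch stores under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "${ref}"
        done
    # 2. Expire every reflog entry so the old commits are no longer referenced.
    git reflog expire --expire=now --all &&
    # 3. Repack and delete all unreachable objects right away.
    git gc --quiet --aggressive --prune=now
}
```

Running du -sh .git before and after shrink_repo shows whether anything was reclaimed. In a clone, remote-tracking refs such as origin/master still point at the unfiltered history, so they may have to be removed first (for example with git remote rm origin) before the old objects become unreachable.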

Fixing committer name

My problem with badly authored commits still remained. From the documentation, the --env-filter option was the one I needed. The idea is that the filter is run for every commit of the branch, with the commit’s data exposed in environment variables:
GIT_COMMITTER_NAME=Eric Leblond
GIT_AUTHOR_EMAIL=eleblond@example.com
GIT_COMMIT=fbf7d74174bf4097fe5b0ec559426232c5f7b540
GIT_DIR=/home/regit/git/oisf/.git
GIT_AUTHOR_DATE=1280686086 +0200
GIT_AUTHOR_NAME=Eric Leblond
GIT_COMMITTER_EMAIL=eleblond@example.com
GIT_INDEX_FILE=/home/regit/git/oisf/.git-rewrite/t/../index
GIT_COMMITTER_DATE=1280686086 +0200
GIT_WORK_TREE=.
If you modify one of them and export the result, the commit will be modified accordingly. For example, my problem was that the commits from ‘jdoe’ were in fact from ‘John Doe’, whose e-mail is ‘john.doe@example.com’. I thus ran the following command:
git filter-branch -f --env-filter '
if [ "${GIT_AUTHOR_NAME}" = "jdoe" ]; then
GIT_AUTHOR_EMAIL=john.doe@example.com;
GIT_AUTHOR_NAME="John Doe";
fi
export GIT_AUTHOR_NAME
export GIT_AUTHOR_EMAIL
'
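The filter above only touches the author fields. If the badly configured git also left a wrong committer identity, the same idea applies to the GIT_COMMITTER_* variables. Here is a hedged sketch covering both, plus a small check of the identities left in the history. ‘jdoe’ and ‘John Doe’ come from the example above; the function names fix_identity and list_identities are invented for illustration.

```shell
#!/bin/sh
# fix_identity: rewrite both author and committer identity for commits made
# as 'jdoe'. FILTER_BRANCH_SQUELCH_WARNING only hides the notice shown by
# recent git versions.
fix_identity() {
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --env-filter '
        if [ "${GIT_AUTHOR_NAME}" = "jdoe" ]; then
            GIT_AUTHOR_NAME="John Doe"
            GIT_AUTHOR_EMAIL="john.doe@example.com"
        fi
        if [ "${GIT_COMMITTER_NAME}" = "jdoe" ]; then
            GIT_COMMITTER_NAME="John Doe"
            GIT_COMMITTER_EMAIL="john.doe@example.com"
        fi
        export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
        export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
    ' HEAD
}

# list_identities: print each distinct "author / committer" pair together
# with the number of commits carrying it.
list_identities() {
    git log --format='%an <%ae> / %cn <%ce>' | sort | uniq -c
}
```

Running list_identities before and after fix_identity gives a quick way to confirm that no ‘jdoe’ entry is left.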
Git shows here once again that it was made by semi-god hackers who developed it to solve their own source management problems.