Sunday, July 13, 2014

OpenShift Git History Clean Up

Recently, I found that git history was starting to fill up the allowed disk space on my gear, because I always deployed the built war files to the gear through git, and git stores the entire binary delta every time there is a new push.
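To get a feel for how quickly this adds up, here is a minimal sketch on a throwaway repository. The file name app.war and the 100000-byte size are stand-ins I made up for illustration; random bytes are used because, like an already-compressed war file, they barely shrink when git compresses them.

```shell
#!/bin/bash
# Throwaway repository; app.war is a hypothetical stand-in for a built war file.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

# Size of loose objects in KiB, as reported by git count-objects.
loose_size_kib() {
  git count-objects -v | awk '/^size:/ {print $2}'
}

for n in 1 2 3; do
  # Each "build" replaces the whole binary with new, incompressible content.
  head -c 100000 /dev/urandom > app.war
  git add app.war
  git commit -q -m "Build $n"
  echo "after build $n: $(loose_size_kib) KiB of objects"
done

final_size=$(loose_size_kib)
```

Each commit should add roughly the full size of the file, which is the growth pattern I was seeing on the gear.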

So I started looking for a definitive solution to the problem.

The first solution I came across is from an OpenShift Online forum answer that describes how to clean up the remote repo within the gear:

https://www.openshift.com/forums/openshift/how-to-erase-all-history-from-a-git-repository-on-openshift-and-start-over-with

This solution works rather well. It removes the entire history, so I used it for quite some time.
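#!/bin/bash

# Assumes gear_ssh_url and gear_git_url are already set.

# Clean up gear info: remove the bare repo on the gear and recreate it empty
ssh "$gear_ssh_url" 'cd git && repo=$(ls -d *.git) && rm -rf "$repo" && git init --bare "$repo"'

# Create a new local repo to overwrite history (a new git clone would also work; I just found it faster to create a new repo and point it at the remote repo)
git init

# Stage everything so the commit below has content
git add -A .

current_date=$(date)
git commit -m "Automatic Push as of $current_date due to code change"

git remote add origin "$gear_git_url"
git push origin master -u --force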



Recently, however, I realized that I should keep at least one commit of history as a backup, so that I can roll back more easily. (I found binary deployment harder to manage than git, though maybe I do not understand its full feature set yet.) So I started looking for a git solution that cleans up history.

The following is the other solution I found:

http://stackoverflow.com/questions/11929766/how-to-delete-all-git-commits-except-the-last-five

It works very well: it removes the commit history and therefore reduces the size, but it also leaves at least one earlier commit that I can revert to.

One thing I did not realize at first about this solution is that it only affects the local repository until a new commit pushes the change, along with the cleaned-up repo data, to the remote repo. Without that, if somebody clones the repo again, all the history is still there, because the change has not reached the remote repo yet.
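One way to see that a rewrite is local only is to compare the tip of the local branch with what the remote reports. This is a minimal sketch on throwaway repositories: a local bare repo stands in for the gear's remote, and build.txt and the branch name trimmed are made-up names for illustration.

```shell
#!/bin/bash
# Throwaway setup: a local bare repo plays the role of the gear's remote (assumption).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "build $n" > build.txt
  git add build.txt
  git commit -q -m "Build $n"
done
git remote add origin "$work/remote.git"
git push -q origin master

# Rewrite the local history down to a single commit, without pushing.
git checkout -q --orphan trimmed
git commit -q -m "Trimmed history"
git branch -M trimmed master

# Compare what the local branch and the remote currently point at.
local_tip=$(git rev-parse master)
remote_tip=$(git ls-remote origin refs/heads/master | cut -f1)
local_count=$(git rev-list --count master)
remote_count=$(git --git-dir="$work/remote.git" rev-list --count master)
echo "local:  $local_tip ($local_count commits)"
echo "remote: $remote_tip ($remote_count commits)"
```

Until something is pushed, the remote keeps reporting the old tip and the old number of commits, so a fresh clone would still download the full history.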

So I made some adjustments:

#!/bin/bash

# Based on http://stackoverflow.com/questions/11929766/how-to-delete-all-git-commits-except-the-last-five

# Name of the branch that is currently checked out
current_branch="$(git rev-parse --abbrev-ref HEAD)"
current_head_commit="$(git rev-parse "$current_branch")"
echo "Current branch: $current_branch $current_head_commit"

# A B C D (D is the current head commit), C is new_history_begin_commit
new_history_begin_commit="$(git rev-parse "$current_branch~1")"
echo "Recreating $current_branch branch with initial commit $new_history_begin_commit ..."

# Start a parentless branch from that commit's tree, reusing its message and author
git checkout --orphan new_start "$new_history_begin_commit"
git commit -C "$new_history_begin_commit"

# Replay the newer commits on top of the new root, then drop the helper branch
git rebase --onto new_start "$new_history_begin_commit" "$current_branch"
git branch -d new_start

# Drop the old commits from the local repo
git reflog expire --expire=now --all
git gc --prune=now

# A push is still required for the remote to take effect; toggle a marker file
# so that there is a change to commit, otherwise the push will not go through
if [ -f .invoke_update ]; then
      rm -f .invoke_update
else
      touch .invoke_update
fi

git add -A .
current_date=$(date)
git commit -m "Force clean up history $current_date"
git push origin master --force

It first cleans up the local repo as described in the post, then makes a dummy commit and pushes the change to the remote.
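Before running this against a real gear, the same steps can be tried on throwaway repositories. The sketch below is only an illustration under a few assumptions: a local bare repo stands in for the gear's remote, build.txt stands in for the deployed war file, and the history is the four commits A B C D from the comment in the script.

```shell
#!/bin/bash
# Throwaway setup: a local bare repo plays the role of the gear's remote (assumption).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# History A B C D, pushed to the stand-in remote.
for n in A B C D; do
  echo "$n" > build.txt
  git add build.txt
  git commit -q -m "Build $n"
done
git remote add origin "$work/remote.git"
git push -q origin master
before_count=$(git --git-dir="$work/remote.git" rev-list --count master)

# Same trimming steps as the script: keep C as the new root and replay D on top.
keep_from=$(git rev-parse master~1)
git checkout -q --orphan new_start "$keep_from"
git commit -q -C "$keep_from"
git rebase -q --onto new_start "$keep_from" master
git branch -q -d new_start
git reflog expire --expire=now --all
git gc -q --prune=now

# Dummy commit, then force push the rewritten branch.
touch .invoke_update
git add -A .
git commit -q -m "Force clean up history"
git push -q --force origin master

after_count=$(git --git-dir="$work/remote.git" rev-list --count master)
echo "Commits on remote before clean up: $before_count"
echo "Commits on remote after clean up:  $after_count"
```

With this setup the remote goes from 4 commits to 3: the recreated C, the replayed D, and the dummy commit.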

Thanks.

Sincerely,
Danil