How to merge, 3/3

In the previous article, I described many of the activities one should learn and master to be able to merge efficiently. But why does it matter at all?

Because developing in multiple branches significantly increases average software quality in production.

I used to hate the word “bug-free”. We all know that any software has bugs. What I call “bug-free” today is software whose only remaining bugs are either unknown to the team or cost more to fix than the fix would bring in benefits. This is a realistically achievable state, although not every team manages it. To achieve it, you must have enough time and money, and (more importantly) the combined experience and competence of the team, together with the actual development process, must ensure that bugs are fixed faster than new ones are introduced. Wow, actually, this would be an interesting and easily traceable metric, but I digress, and this is a topic for another post.

Having achieved this bug-free state and released the software into production, you can start developing the next feature. This will inevitably lower the quality of the trunk HEAD: sometimes the system won’t even compile, often it will crash at run time, and almost always its feature set will be so incomplete that it cannot be released into production.

As long as there is only one new feature, bug fix, or change request in progress at any given time, this problem is irrelevant. Eventually the changes will be finished, the known bugs will be fixed, and the software will reach the bug-free state and be released.

Except that, in the real world, it is never possible to work on only one thing at a time. A more realistic scenario (like, 99.9999% of all popular websites have it) is that you have to fix previously unknown bugs suddenly found in production, and introduce a small change to provide a landing page for yet another marketing campaign or to run yet another A/B test, and work on at least one big new feature.

In this situation, to ensure bug-free quality with only one branch, one needs to orchestrate all development on the timeline so that every change reaches the bug-free state on the same day, so that on that very day a release can be built and rolled out. No matter how hard you try, this means that some changes will have to wait before they can be started, some implemented changes will have to wait before they can be released, and no change, however small, can be predictably implemented with bug-free quality in less than several weeks. Oh, yeah, and this also means that project managers will be pronounced saints by the Catholic Church, given how many Christian virtues they must possess to successfully direct this orchestra.

So how do we resolve this conflict between the average time-to-market for a change and the bug-free state of the system?

One solution to this problem is to adopt continuous deployment. That is, every time a new commit is checked in to the trunk, it is built and unit-tested, and if the tests pass, it is automatically deployed into production. This way, we get the shortest possible time-to-market. New features can be developed behind new URLs unknown to the general public. But to keep the system in an acceptably bug-free state on average, bugs found in production have to be fixed very promptly, which is impossible if you have a 40-hour week and a team concentrated in a single time zone.
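
The gate described above can be sketched in a few lines of shell. This is only an illustration: build, run_unit_tests, deploy, and on_commit are hypothetical stand-ins that just print what a real pipeline would do, and abc123 is a made-up commit id.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real build, test, and deployment steps.
build()          { echo "building $1"; }
run_unit_tests() { echo "testing $1"; }
deploy()         { echo "deploying $1 to production"; }

# Deploy a commit only if both the build and the unit tests succeed.
on_commit() {
    commit=$1
    if build "$commit" && run_unit_tests "$commit"; then
        deploy "$commit"
    else
        echo "commit $commit rejected; production unchanged" >&2
        return 1
    fi
}

on_commit abc123   # made-up commit id
```

The point of the sketch is that nothing between the check-in and production requires a human, which is exactly why bugs that slip past the tests have to be fixed so promptly.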

Another solution is to use multiple branches. Essentially, every branch has its own, independent quality state. If you consider the git development model described in the previous article, the master branch is by definition bug-free 90% of the time. It stops being bug-free only when a new bug is found, and only for the period during which a hotfix branch is developed, released, and merged into master. Release branches are spawned from the develop branch when it is in the “pre-bug-free” state, that is, feature-complete but buggy. The release branch is thoroughly tested, its bugs are fixed, and when it reaches the bug-free state, it can be released and merged into master. The develop branch, as well as the feature branches, may be in any state at all. Typically, feature branches are merged into develop only after becoming feature-complete, so that the develop branch never has an incomplete (and therefore unreleasable) feature set.
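
To make this branch flow concrete, here is a minimal sketch that replays it in a throwaway repository. The branch names (feature/landing-page, hotfix/1.0.1, release/1.1), the file names, and the change helper are made up for illustration. Merging the hotfix back into develop is my assumption; the paragraph above only mentions master.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # independent of the local default branch name
git config user.name "Branch Demo"        # placeholder identity, local to this repo
git config user.email "demo@example.com"

# Hypothetical helper for this sketch: append a line to a file and commit it.
change() {
    echo "$1" >> "$2"
    git add "$2"
    git commit -q -m "$1"
}

change "release 1.0" app.txt              # master: the bug-free production state
git checkout -q -b develop

# A feature lives on its own branch and is merged only when feature-complete.
git checkout -q -b feature/landing-page develop
change "landing page" landing.txt
git checkout -q develop
git merge -q --no-ff -m "Merge feature/landing-page" feature/landing-page

# Meanwhile, a production bug is fixed on a hotfix branch cut from master.
git checkout -q -b hotfix/1.0.1 master
change "fix production bug" hotfix.txt
git checkout -q master
git merge -q --no-ff -m "Merge hotfix/1.0.1" hotfix/1.0.1
# Assumption: the fix also goes back into develop so the next release keeps it.
git checkout -q develop
git merge -q --no-ff -m "Merge hotfix/1.0.1 into develop" hotfix/1.0.1

# A release branch is cut from develop, stabilized, then merged into master.
git checkout -q -b release/1.1 develop
change "fix bug found in testing" app.txt
git checkout -q master
git merge -q --no-ff -m "Release 1.1" release/1.1

git --no-pager log --oneline --graph --all
```

The graph printed at the end shows that the hotfix reached master without waiting for the feature, and the feature reached master without blocking the hotfix.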

In this development model, you can have virtually any time-to-market, from one hour for a hotfix up to a year for a very low-priority feature that sits in a separate branch and is worked on only when somebody happens to have some spare time. And not only does this approach not compromise software quality, it also increases the project managers’ expected time before their first heart attack!

Unfortunately, multi-branch development can only be employed by teams in which every member is a trained merger. And this is not always the case.

Good news: everybody can start merge training at will. Here is how it can be done:

  • Install git
  • Learn git basics
  • Perform
    cd your_working_dir
    git init
    git add .
    git commit -m "Initial commit"
  • Configure your legacy VCS to ignore the .git folder in the working directory.
  • Now, when you start developing a new feature
    git checkout -b my_feature_branch
  • After some changes
    git add file1 file2
    git commit -m "Commit message"
  • And when your change is ready to be committed to the legacy VCS:
    git checkout master  # now get latest changes from your VCS. Git has extensions for many popular VCSes.
    git merge my_feature_branch # at this point, you could possibly get a merge conflict. This is your chance!
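
If your day-to-day work does not produce a conflict soon enough, you can provoke one on purpose. The following is a minimal sketch that does so in a throwaway repository; the file name settings.txt, its contents, and the committer identity are made up for the exercise.

```shell
#!/bin/sh
set -eu

# Create a throwaway repository so nothing real can be damaged.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q
git symbolic-ref HEAD refs/heads/master   # independent of the local default branch name
git config user.name "Merge Trainee"      # placeholder identity, local to the sandbox
git config user.email "trainee@example.com"

# Initial commit on master.
printf 'greeting = hello\n' > settings.txt
git add settings.txt
git commit -q -m "Initial commit"

# Change the line on a feature branch...
git checkout -q -b my_feature_branch
printf 'greeting = hi there\n' > settings.txt
git commit -q -am "Feature: friendlier greeting"

# ...and change the same line differently on master.
git checkout -q master
printf 'greeting = good day\n' > settings.txt
git commit -q -am "Master: formal greeting"

# The merge stops with a conflict; '|| true' keeps 'set -e' from aborting here.
git merge my_feature_branch >/dev/null 2>&1 || true

# List the files that still need manual resolution.
conflicted=$(git diff --name-only --diff-filter=U)
echo "Conflicted files: $conflicted"
```

From here you can open settings.txt, resolve the conflict markers by hand or with git mergetool, and finish with git add and git commit.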

Alternatively, if you feel you’re smart enough to learn it the hard way (I’m definitely not), you could use GitHub:

  • Pick some popular and often forked repository on GitHub.
  • Clone it locally.
  • Find some interesting merge commit (start with trivial merges! You have to understand the code and architecture of an unknown project AND merge. This is hard.)
    git log --merges 
  • There will be a string “Merge: <some SHA> <some other SHA>”; these are the HEADs of the two branches that took part in the merge (as they were before the merge)
  • Fetch the state immediately before the merge, create your own training branch, and merge
    git checkout <some SHA>
    git checkout -b my_merge_training
    git merge <some other SHA>
  • Look at the changes on both branches, and note the commit SHA of the point where they forked
    git show-branch <some SHA> <some other SHA>
  • Get a list of all merge conflicts
    git status --short | grep '^UU'
  • For each file in the list, read the change history from the fork point to the HEAD of each branch
    git log <fork point SHA>..<some SHA> -- <file>
    git log <fork point SHA>..<some other SHA> -- <file>
  • For each file, perform the merge
    git mergetool -y <file>
  • Commit, then compare your result with the original merge
    git commit -m "My merge result"
    git diff HEAD..<merge commit SHA>
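
The SHA bookkeeping in these steps can also be scripted. Here is a minimal sketch, run against a throwaway repository with a single trivial merge so that it works without network access; in a real exercise you would run only the second half inside your clone. The file names and variable names are made up, and using git merge-base to find the fork point is my substitution for reading it off the git show-branch output.

```shell
#!/bin/sh
set -eu

# Build a throwaway repository that contains one (trivial) merge commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # independent of the local default branch name
git config user.name "Merge Trainee"      # placeholder identity, local to this repo
git config user.email "trainee@example.com"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "Base"

git checkout -q -b topic
echo "topic work" > topic.txt
git add topic.txt
git commit -q -m "Topic work"

git checkout -q master
echo "master work" > master.txt
git add master.txt
git commit -q -m "Master work"
git merge -q --no-ff -m "Merge topic" topic

# The exercise itself: pick the latest merge commit and look up its parents.
merge_commit=$(git rev-list --merges -n 1 HEAD)
first_parent=$(git rev-parse "$merge_commit^1")
second_parent=$(git rev-parse "$merge_commit^2")

# The fork point is the best common ancestor of the two parents.
fork_point=$(git merge-base "$first_parent" "$second_parent")
echo "Fork point: $fork_point"

# Redo the merge on a training branch that starts at the first parent.
git checkout -q "$first_parent"
git checkout -q -b my_merge_training
git merge -q --no-ff -m "My merge result" "$second_parent"

# 'git diff --quiet' exits with 0 when the two trees are identical.
if git diff --quiet HEAD "$merge_commit"; then
    echo "Your merge matches the original"
fi
```

With a merge that actually conflicts, the git merge step stops and hands control to you; the comparison at the end then tells you how close your resolution came to the one the project's maintainers chose.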

Have fun!
