How to merge, 2/3

In the previous article, I implied that merging is harder and riskier only because many of us don’t know how to do it right.

Well, this is what I’ve learned, so stop worrying and master merging in these three simple steps:
1) Accept that you’re not done until your work is merged.
2) Learn how to quickly and effectively regain full control in a merge situation.
3) Learn how to make decisions about the changes, and how to use the merge tool to apply them effectively.

The first step is more about changing your habits than about taking any particular action. Developers working with a traditional centralized VCS tend to quickly develop the notion of the Holy Trunk (or Holy HEAD). This is the sacred place where all the latest, juiciest source code gets committed. Committing to it gives them a warm, fuzzy feeling of being safe and sound. Being disconnected from it for prolonged periods of time makes them anxious and fearful, because with every day of disconnection, the upcoming merge looms larger.

On the other hand, you can decide to live in a more pluralistic world, where you always have some branches, and some code traveling up or down between them. Simply adopting this other point of view lets you treat merging as a normal, ordinary part of the development process.

In case some of my readers are not aware of a typical git development process, here is a very short description.

You have the master branch, and its HEAD always corresponds to the exact state currently released in production. When a hotfix is required, a hotfix branch is created from master’s HEAD; the hotfix is made and released, and then merged both into master and into the develop branch.

The develop branch is basically the place where the work of teams working in parallel gets integrated. A CI server could periodically build develop’s HEAD and run the unit tests. When development of a new feature starts, a so-called feature branch is created from develop’s HEAD. All developers fetch it locally and work in there. If needed, they create their own local sub-branches from that feature branch, for example to temporarily test some idea. When local development reaches some logical point, the developer fetches the HEAD of the feature branch, merges it with their latest state, and commits the new HEAD of the feature branch. Periodically, someone fetches the latest HEAD of develop and merges it into the feature branch, just to keep the merge scope small, stay up to date, or even profit from reusing code created in the meantime by other teams. When the feature branch reaches a mature state, the HEAD of develop is merged into it to pick up the very latest changes of other teams, and then its HEAD is merged into develop to “graduate” the feature into the integration phase.

At some point, the HEAD of develop reaches a potentially releasable state. A new release branch is created from develop’s HEAD, where test builds can be done, QA testing can be performed, and bugs can be fixed. Meanwhile, other teams can continue working on features not planned for the nearest release. Eventually, the HEAD of the release branch is released, and then it gets merged into the master branch and back-merged into the develop branch.

As you can see, merging is not only normal and ordinary in this process, it is an essential part of the whole and is performed very frequently. When I say frequently, I mean it. While using git, I was performing around ten merges per day. Of course, 8 of 10 were only formal merges (this is when one of the branches being merged hasn’t changed since forking), and of the remaining two, one could be performed by git automatically. The one remaining merge conflict per day I had to resolve manually.
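To make the idea of a “formal merge” more tangible, here is a minimal Python sketch. It models the history as a toy dictionary that maps each commit to its parents, and calls a merge formal when one branch head is an ancestor of the other. The commit names and the functions `ancestors` and `classify_merge` are made up for this illustration; this is not how git itself is implemented.

```python
def ancestors(graph, commit):
    """Return the set of `commit` plus every commit reachable through parents."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def classify_merge(graph, target, source):
    """Return "formal" if one head is an ancestor of the other, else "real"."""
    if target in ancestors(graph, source) or source in ancestors(graph, target):
        # One of the two branches has not changed since the fork point.
        return "formal"
    # Both branches have moved on since forking; changes must be combined.
    return "real"


# Hypothetical history: "c" and "d" both fork from "b".
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}
```

With this toy history, merging `c` into `b` is formal because `b` has not moved since the fork, while merging `d` into `c` is a real merge.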

So, how do you solve a merge conflict every day and remain a sane and happy person?

To merge two branches, one has to go back in time to the point where they forked, then analyse the changes made in each branch and re-apply all of them to the state at the fork. A decent VCS will do this for you automatically, marking possible merge conflicts and launching a 3-way merge tool. What you now have locally is a potentially ready-merged state, which might or might not contain merge conflicts.
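The decision rule behind such a 3-way merge can be sketched in a few lines of Python. This is a deliberate simplification: real tools compare text line by line, whereas here each version is assumed to be a dictionary mapping a unit name (say, a field or function) to its text. The function name `three_way_merge` and the sample data are invented for this example.

```python
def three_way_merge(base, ours, theirs):
    """Merge two versions against their common ancestor.

    Each argument maps a unit name to its text; a missing key means the
    unit does not exist in that version. Returns (merged, conflicts).
    """
    merged = {}
    conflicts = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o      # both sides agree (or neither changed it)
        elif o == b:
            result = t      # only their branch changed this unit
        elif t == b:
            result = o      # only our branch changed this unit
        else:
            conflicts.append(name)  # both changed it differently
            continue
        if result is not None:      # None means the unit was deleted
            merged[name] = result
    return merged, conflicts


# Hypothetical versions of the same file at the fork and in each branch.
base = {"price": "int price;", "name": "char *name;"}
ours = {"price": "long price;", "name": "char *name;"}
theirs = {"price": "double price;", "name": "char *name;", "tax": "int tax;"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

In this example `name` is taken unchanged, `tax` is picked up from the other branch, and `price` is reported as a conflict because both branches changed it in different ways.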

There are two kinds of merge conflicts: those the VCS can detect and those it cannot. Start by eliminating the conflicts detected by the VCS. To do that, you have to regain full control of the situation. Here is how you do it:

Merge tools typically launch with the first merge conflict pre-focused, and they try to guess the result automatically in an effort to provide a starting point. You’d better resist this urge to act, step back, and understand the underlying concurrent changes. When I say “changes”, I mean semantic changes to the software, put in the context of the corresponding larger task, for example “introducing a new property to class Article to accommodate a new set of prices according to Issue 1234”. Your goal should be to learn about all concurrent changes involved in all merge conflicts, from both branches.
To achieve this goal, first look at the diffs between the latest and the previous versions, separately in each branch.
If this doesn’t help, navigate from the file change to the corresponding commit, then read the commit message and all changes in all files of that commit to get a complete overview of what is going on.
If you still cannot form a coherent picture of the change, or even see the intention behind it, go to the developer who made it and talk with them, or have them come by and merge together.
Note that if you still remember exactly what has happened in your own branch, you halve the effort. That’s why it is a good idea to merge often (see above).
Besides, a reduced merge scope also reduces the probability that the other branch contains _several_ changes affecting the same line of code, which is always harder to handle.
Note that practicing all of the above regularly helps a lot, _and_ also improves your source code reading skills.
Here is another trick: sometimes, one of the branches contains a lot more changes than the other. You can cut a corner by understanding only the branch with fewer changes, then fetching the state of the other branch and re-applying the changes from the first branch on top of it.

Once all related changes are understood, it is time to decide which changes will make it into the merged version:

Sometimes, one change is made obsolete by another change.
Sometimes, a change has been made without a good reason, so it is a good thing to revert it. And yes, you’re doing some sort of “active code review” during the merge, and code reviews are considered to be generally a good thing to do.
Sometimes, applying changes in the right order is straightforward. And sometimes it is not, and additional architectural decisions have to be made first. Don’t hesitate to discuss them with the team.
Having a “no code ownership” policy will help a lot here, because it not only allows, but virtually forces every team member to be able to make important decisions about every part of the system they’re changing. Nevertheless, feel free to tell your teammates about the merge conflict and your decision.

Once you have decided on the desired outcome, the rest is just a bunch of source code edits. Make sure your merge tool supports at least syntax coloring, and allows you to navigate freely between conflicts without forcing you to resolve the first conflict before you can proceed to the next one. And of course, you should know how to use the merge tool just as well as you know the source code editor of your IDE. For C development with git under Ubuntu, I’ve found “meld” to be sufficient.

The next very important step is verification of the merge result. First, it is possible to introduce typos while making source code edits. Second, there can be merge conflicts undetected by the VCS (remember them?). This can be the case, for example, when a class with the same name is added in both branches but stored in different files. As most VCSes are still too dumb to understand the semantics of the programming language, they won’t mark it as a conflict.
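As an illustration of how such an undetected conflict could be spotted, here is a rough Python sketch that scans a merged source tree for class names declared in more than one file. It is only a heuristic under strong assumptions: the sources are passed in as a dictionary of path to text, the regular expression only understands simple `class Name` declarations, and a duplicate name is merely a candidate to look at (the same name in two namespaces can be perfectly legitimate). The file names and the function `find_duplicate_classes` are hypothetical.

```python
import re
from collections import defaultdict

# Matches simple declarations such as "class Article" or "public class Article".
CLASS_PATTERN = re.compile(
    r"^\s*(?:public\s+|final\s+|abstract\s+)*class\s+(\w+)", re.MULTILINE
)


def find_duplicate_classes(files):
    """Return {class_name: sorted list of paths} for names declared in 2+ files."""
    locations = defaultdict(set)
    for path, source in files.items():
        for name in CLASS_PATTERN.findall(source):
            locations[name].add(path)
    return {
        name: sorted(paths)
        for name, paths in locations.items()
        if len(paths) > 1
    }


# Hypothetical merged tree: each branch added its own Article class.
merged_tree = {
    "model/Article.java": "public class Article {\n}\n",
    "shop/Article.java": "class Article {\n}\n",
    "shop/Cart.java": "public class Cart {\n}\n",
}
duplicates = find_duplicate_classes(merged_tree)
```

Here `duplicates` lists `Article` with both paths, which is exactly the kind of situation worth a second look before committing the merge.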

Here is how you verify:

Do a full, clean recompile of the merged version.
Run the unit tests. Having unit tests helps a lot (and not only in this situation).
Where unit tests are not economically feasible (e.g. for the UI), manual testing, at least smoke testing, should be performed. I usually also fully test the main scenario of the feature I was working on; in case of a merge error, at least my own change will be correct ;-)
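The automatable part of this checklist could be scripted roughly as follows. This is a minimal sketch, assuming every verification step is a command that returns a non-zero exit code on failure; the function name `verify_merge` is invented for this example, and the actual build and test commands depend entirely on your project.

```python
import subprocess
import sys


def verify_merge(steps):
    """Run (label, command) steps in order and stop at the first failure.

    Returns the label of the failed step, or None if every step passed.
    """
    for label, command in steps:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            print(f"verification failed at step: {label}", file=sys.stderr)
            return label
    return None
```

A call might look like `verify_merge([("build", ["make", "clean", "all"]), ("unit tests", ["make", "test"])])`, where both `make` targets are placeholders for whatever your build system provides. Manual smoke testing still has to follow by hand.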

Note that this verification cannot find every possible issue with the merge. But it is exactly the same amount of verification that is usually done when writing new code (even without any branching or merging), so it is definitely “good enough”.

A verified merge can be committed.

In the next blog post, I’ll try to explain why learning to merge is worth it, and how to start learning.

Maxim Fridental
