When coding, I tend to produce a lot of turds. Many of them even get committed to some repository somewhere because I use the repo as a means to checkpoint my thinking. My colleagues often suffer from this, and I get the Pointy Hat of “you broke the build” quite regularly. Sometimes it’s even granted after I have committed the anti-turd that fixes the build again. But the net effect across a day or a coding session is usually an improvement, even if the intermediate steps are not pleasant.

[[ When I say a piece of code scores three turds, that stands for “Total Un-Resolved Defects”, and it is a documented measure of software quality; the EBN, for instance, counts these things as well, but I did not want to use the term there because of the humongously bad social effects it would have. In this vein I thought about some nice illustrations for this entry, but couldn’t find a tasteful icon of a dog turd. Lazyweb, don’t fail me now! ]]

So it would go with feature development as well, or with cleanup work. Let’s take a hypothetical situation in which a diligent contributor picks up one of the junior jobs – fixing spelling errors in the API documentation comments. That might take a few days, and it’s conceivable that the contributor even goes back and changes the wording after fixing the spelling errors. There’s no guarantee that this particular junior job is done in one sitting; it might be done in several sessions spread across days, and given the possibility of going back and revising earlier changes, I can see it being done in multiple commits as well.

Or is this just my sloppy practice of committing early and committing often rearing its ugly head? Should we push for a “don’t commit until it’s complete and done” policy? Such a policy doesn’t seem supportive of community work to me at all.

As a result, I find myself in a position where I can point to two revisions and say “from here to there is a complete thought, a consistent and coherent unit of development that makes sense to view as a whole.” But there’s a lot of crap in between. I can, a priori, say that there is no nutritional value in any of the revisions strictly between here and there, unless you’re interested in finding out my work patterns, commit rates and use of profanity in commit messages.

So if I were to do private development in a local repository, I would continue to work like this: commit partial fixes and intermediate steps towards a fix. But then I would like to publish the result in a cleaned-up fashion: as one giant leap for mankind that fixes all the spelling errors in kdelibs, for instance. Doing that – where picking suitable “giant leaps” is important, as you don’t want to power plant and you don’t want to cause conflicts with others – makes history more understandable and prevents needless breakage. It’s a pain for researchers, though, as a cleaned-up history is less interesting as an object of study.

With SVK (the offline SVN mirroring thing) you can “lump” a bunch of commits when pushing them somewhere else. The resulting log message (usually) includes all the log messages of the constituent commits, but the whole is just one net change across the commits that you have lumped. I really like it because it lets me express my meaningful (to the outside world) units of change independently of the practical units of change in my own process.
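
As a rough illustration of what lumping means, here is a toy Python model. It assumes a commit can be represented as a mapping from file path to the file’s new contents (None for a deletion); it is not SVK’s implementation, and `Commit`, `apply`, `lump` and the sample files are hypothetical. Intermediate churn that cancels out, like the README rewording that later gets reverted, disappears from the net change, while every constituent log message survives.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Toy model (hypothetical): path -> new file contents, None meaning "deleted".
Changes = Dict[str, Optional[str]]


@dataclass
class Commit:
    message: str
    changes: Changes


def apply(tree: Dict[str, str], changes: Changes) -> Dict[str, str]:
    """Return a new tree with the changes applied (the input is not modified)."""
    result = dict(tree)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def lump(base: Dict[str, str], commits: List[Commit]) -> Commit:
    """Collapse a run of commits into one commit carrying only the net change."""
    final = base
    for commit in commits:
        final = apply(final, commit.changes)
    net: Changes = {}
    for path in set(base) | set(final):
        if base.get(path) != final.get(path):
            net[path] = final.get(path)  # None if the file no longer exists
    # Keep every constituent log message, one per line.
    message = "\n".join(f"- {c.message}" for c in commits)
    return Commit(message, net)


base = {"kurl.h": "// teh URL clas", "README": "hello"}
session = [
    Commit("fix 'teh'", {"kurl.h": "// the URL clas"}),
    Commit("WIP: reword README", {"README": "hello, world"}),
    Commit("fix 'clas'", {"kurl.h": "// the URL class"}),
    Commit("revert README rewording", {"README": "hello"}),
]
lumped = lump(base, session)
print(lumped.changes)  # {'kurl.h': '// the URL class'}
print(lumped.message)  # one "- ..." line per constituent commit
```

The net change is computed against the starting tree rather than by replaying history, which is exactly why the reverted README edit vanishes.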

Finding the right tools for this same action for git or for Mercurial has been a bit of a wander around the Interwebs.

There’s cherry-picking, which is either git-cherry-pick(1) or hg transplant, but both of those seem to move changesets (i.e. commits) around individually and do not do the merging into one that I would like. It seems quilt (the patch-management tool from the Linux kernel world) inspired a similar patch-queue mechanism in Mercurial. I didn’t look for better git tools, but ended up with Mercurial concatenating changesets, which looks like just what I want; there’s a pure Mercurial solution and a patch-queue solution.

This makes it possible to pull in patches one by one from a devel repository, then merge them into one coherent change and push that to another repository. If you look at my previous ramble around DVCSsen, that kind of “merge right” on the development highway means that changes become more high-level as they head towards the exit (or winter). At some point you might want to stop aggregating, though – I don’t think a giant commit in winter saying “Update KDE 4.1 to KDE 4.2” is a meaningful changeset. So, as usual, there is a middle ground to be found between aggregating changes into meaningful units and preserving useful history.
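
To make the contrast between cherry-picking and concatenating concrete, here is a toy Python sketch of the pull-then-fold workflow. It assumes each changeset maps file paths to whole new file contents; `Repo`, `cherry_pick`, `fold` and the sample data are hypothetical, and none of this is how git or Mercurial implement anything. Cherry-picking copies the three devel changesets across as three changesets, while folding yields a single changeset whose first line is the kind of high-level summary a commit digest could use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Toy model (hypothetical): path -> new file contents, None meaning "deleted".
Changes = Dict[str, Optional[str]]


@dataclass
class Changeset:
    message: str
    changes: Changes


@dataclass
class Repo:
    name: str
    history: List[Changeset] = field(default_factory=list)


def cherry_pick(src: Repo, dst: Repo, indices: List[int]) -> None:
    """Copy the selected changesets across one at a time; each stays separate."""
    for i in indices:
        dst.history.append(src.history[i])


def fold(changesets: List[Changeset], summary: str) -> Changeset:
    """Concatenate changesets into one, keeping the original messages as detail.

    For each path the last change wins, so applying the result has the same
    effect as applying the inputs in order.
    """
    merged: Changes = {}
    for cs in changesets:
        merged.update(cs.changes)
    details = "\n".join(f"  * {cs.message}" for cs in changesets)
    return Changeset(f"{summary}\n\n{details}", merged)


devel = Repo("devel", [
    Changeset("fix 'teh' in kurl.h", {"kurl.h": "the URL clas"}),
    Changeset("oops, fix my own typo", {"kurl.h": "the URL class"}),
    Changeset("fix 'recieve' in kio.h", {"kio.h": "receive data"}),
])

# Cherry-picking: the target gets every intermediate step.
picked = Repo("summer-picked")
cherry_pick(devel, picked, [0, 1, 2])

# Pull the patches in one by one, then push a single folded changeset.
staged: List[Changeset] = []
for cs in devel.history:
    staged.append(cs)
summer = Repo("summer")
summer.history.append(fold(staged, "Fix spelling errors in API documentation"))

print(len(picked.history), len(summer.history))  # 3 1
print(summer.history[0].changes)
```

Deciding where to stop folding is not something the sketch can answer; that middle ground remains a human judgement call.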

A rule of thumb that presents itself is this: things pushed to summer should be complete enough to carry a high-level commit message that is immediately useful to the commit digest. Let’s save Danny some time that way, ok?