Having discussed (or blogged) a bit about sprints and their productivity, maybe we should collectively sit down and try to figure out what makes a sprint successful. That means defining what a sprint is – i.e. what the artifacts under study are – and what we mean by successful.

[But before we go on: Bluestorm, thanks for your comments, by the way, on gaming the system. I hadn’t heard the term ‘affective grounds’ before, but it covers a lot of the effects we’re seeing here. Also, I should add as a disclaimer that I work with Paul every week and drink beer with him too, so he and I are sort of on the same page.]

So, I’m going to shake a definition of sprint out of my sleeve here. For one thing, I’m going to rule out events that are too small (i.e. Kevin visits Aaron and they hack) and too large (everyone goes to Akademy and we hack).

A sprint is a gathering, in one physical location for a limited duration (less than 7 days), of a small group of developers (or contributors; don’t let me rule out, say, a usability sprint; a group is five to 15 people) of a defined project (you should be able to say ‘this is the <foo> sprint’ and have it understood; not all of the developers need to belong to the project already) with a definite goal (stated beforehand; this tries to rule out ‘let’s get together and hack’ and asks for a little more vision) and a daily focus on working towards that goal (this suggests some minimal management during the sprint; Paul can best explain how the SQO-OSS ones go).

Actually, that’s a short sentence if you leave out all the parenthetical addenda. There are some arbitrary constraints in there; I’d be happy to hear arguments for why some of them should be adjusted or relaxed.

This brings us to successful. Given a sprint – the object under study – can we say whether it is successful? Or how successful? I don’t particularly care for a [0.0, 1.0] scale, but it is useful. We might want to introduce adjective labels for levels of success (such as ‘not’, ‘partial’, ‘adequate’, ‘unqualified’, ‘rousing’). Bear in mind that the notion of success must be measurable, which means that the general well-being and happiness of developers does not count. That’s not really measurable, is it?
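To make that concrete, here is a minimal sketch (in Python) of how a numeric score might be binned into those adjective labels. The thresholds are entirely made up; they would need calibrating against sprints we already agree on.

    def success_label(score: float) -> str:
        """Map a success score in [0.0, 1.0] to a rough adjective label."""
        # Hypothetical thresholds, purely for illustration.
        bands = [
            (0.2, 'not'),
            (0.4, 'partial'),
            (0.6, 'adequate'),
            (0.8, 'unqualified'),
        ]
        for upper, label in bands:
            if score <= upper:
                return label
        return 'rousing'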

Let’s turn aside for a moment: suppose we have a definition of success. We can go through our historical cases of sprints and calculate the success score. Then we have to validate the results. Was the KDE PIM sprint in the Netherlands a ‘rousing’ success, as the numbers would suggest? Of particular interest are the negative results (finding a sprint that the numbers say really wasn’t a success; we would need attendee validation that this is the case) and the mismatches (say the attendees found something a rousing success but the numbers say it was just so-so). In both cases we need to check back with the attendees for a qualitative validation of the sprint. It would also be very useful to have some other metrics applied, indicating some other form of success (e.g. retrospective function point analysis). Both are big jobs, and all of that rests on the assumption that we can define success in some global fashion for sprints anyway.

Well, before we get to defining success, we should consider what we can actually measure about a sprint. Because that’s the raw stuff we can use to define success. There’s not much point in defining success based on the colours of socks worn at the event unless we start consistently taking pictures of feet. And that’s just icky. I’ll list some things that I can come up with:

  • Number of attendees; number of days; number of attendee-days.
  • Number of new introductions (pairs [a,b] who haven’t met before, also known as the n00bness quotient and indicative of how the community for the given project is growing).
  • Number of commits by attendees; total commits during the event; number of commits to the project; average commits per day relative to the medium-term average over the period [t-60, t-3] for the project, where t is the sprint start (in days) and the -3 is there to account for travel (see the sketch after this list).
  • Amount of beer drunk by attendees (we don’t have this data for past events though).
  • Number of new participants still active in the project at t+30. Proportion of all participants still active at t+30.
  • Lines of IRC log during sprint.
  • Number of bugs closed in Bugzilla during sprint. Number of bugs raised in [t,t+180] in the code added during the sprint (note: that’s hard to measure).
  • Number of design documents added to KDE SVN. Number of design pages added to Techbase. Number of Techbase changes.
  • Mail messages by attendees to project mailing lists during the event. Relative amount before, during and after.
  • Length of time with increased commits-per-day in the project (at least historically, we have seen that a sprint means a boost in the number of commits per day for some time, six to ten weeks, after the end of the sprint).
  • Number of dot stories about the sprint. Number of CNet articles about the sprint. Number of press articles in general. Number of times the sprint is mentioned on the project’s website.
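As a sketch of the relative-commit-gain item above: assuming we already have, per project, a list of commit days (as ordinal day numbers), plus the sprint start day t and the sprint length, the metric might be computed roughly like this (the data layout and function names are mine, purely for illustration):

    def commits_per_day(commit_days, start, end):
        """Average commits per day over the closed interval [start, end]."""
        span = end - start + 1
        count = sum(1 for d in commit_days if start <= d <= end)
        return count / span

    def relative_commit_gain(commit_days, t, sprint_length):
        """Sprint-time commit rate relative to the medium-term baseline.

        The baseline window is [t-60, t-3]; the -3 leaves out the travel
        days just before the sprint, as noted in the list above.
        """
        baseline = commits_per_day(commit_days, t - 60, t - 3)
        during = commits_per_day(commit_days, t, t + sprint_length - 1)
        if baseline == 0:
            # A previously quiet project that commits anything at all
            # during the sprint gets an arbitrarily large gain.
            return float('inf') if during > 0 else 1.0
        return during / baseline

The same commit-day lists, restricted to individual attendees, would give the per-attendee numbers, and counting which new attendees still show up in them after t+30 would give the retention figures.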

Most of these are measurable, or could be made measurable by sending an observer to the event. My interpretation of “success” in a sprint would be some combination of retention of new developers, retention of developers overall, the relative commit gain and how long it lasts, along with some extra consideration of non-SVN artifacts (as Anne-Marie points out, writing a design document can be more important than code monkeying).
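Just to show what such a combination could look like as a formula rather than a sentence, here is a deliberately naive sketch; every weight and normalisation constant is invented for illustration, and the point is only that each ingredient is something we can actually measure per sprint:

    def sprint_success_score(new_dev_retention, all_dev_retention,
                             commit_gain, boost_weeks, artifact_score):
        """Combine the ingredients into a single number in [0.0, 1.0].

        new_dev_retention: fraction of first-time attendees still active at t+30
        all_dev_retention: fraction of all attendees still active at t+30
        commit_gain:       sprint commit rate / baseline rate (earlier sketch)
        boost_weeks:       weeks the elevated commit rate persisted afterwards
        artifact_score:    manual 0..1 rating of design docs, Techbase pages, etc.
        """
        # Squash the open-ended quantities into [0, 1] so no term dominates.
        gain_term = min(commit_gain / 3.0, 1.0)    # treat a 3x boost as maximal
        boost_term = min(boost_weeks / 10.0, 1.0)  # ten weeks of afterglow as maximal
        weights_and_terms = [
            (0.25, new_dev_retention),
            (0.20, all_dev_retention),
            (0.20, gain_term),
            (0.15, boost_term),
            (0.20, artifact_score),
        ]
        return sum(w * v for w, v in weights_and_terms)

The weights are the real argument to be had, of course; the code just makes it obvious that they exist and have to be chosen.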

That’s my initial stab at defining success for a sprint, without any intention to validate it or actually calculate it. What’s yours?