Velocity targeting and velocity inflation

Continuing my mini-series on filling an iteration, velocity and all that, I want to flag up a big, big mistake: Velocity Targeting, which leads to Velocity Inflation.

Velocity targeting happens when someone says: “We did 15 points last iteration, let’s aim for 20 this iteration”. And when the team fails to meet 20 they say something like: “What happened? We didn’t meet our target?” – or perhaps they start assuming that because the target is 20 they can adjust the plans and message to stakeholders accordingly.

We can all fall into this trap: it’s called Hope. We hope for a better world. It gets dangerous when the person issuing such statements is in some position of authority, e.g. the word “manager” or “leader” is in their title, and they start issuing communications with the target as reality.

Given a few iterations the team will meet the target. However the means they use to meet the target may not be what is expected. And these may well create problems later on.

For example, the team might skimp on testing or skip refactoring. This results in a short-term speed-up which sacrifices long-term maintainability and flexibility. You might actually choose to do this, with the agreement of the authority person, but it should be a conscious decision in which you accept the long-term slowdown.

A more subtle but systematic problem is Velocity Inflation. In this case the team start giving larger estimates, so when work is done the amount done is greater, so velocity rises. The same amount of work is done but the point value is higher. (This can be a conscious or sub-conscious thing, I expect it is more often sub-conscious.)

In some ways this too is natural. Team members want to appear more successful, they want to achieve more, they want to please people – especially those who are in authority and set targets. But, and this is the real danger of Velocity Inflation, it undermines your ability to predict the future work capacity of the team because yesterday’s values can’t be compared to tomorrow’s.
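To make the arithmetic concrete, here is a minimal sketch in Python. All the figures and the helper name `iterations_to_finish` are invented for illustration; they are not measurements from any real team. The sketch assumes a backlog that was estimated before the inflation set in, a team whose real throughput stays at 15 points, and a reported velocity that has drifted up to the target of 20.

```python
import math

def iterations_to_finish(backlog_points, velocity):
    """Forecast how many iterations a backlog needs at a given velocity."""
    return math.ceil(backlog_points / velocity)

# Hypothetical figures: the remaining backlog was estimated at 120 points
# on the old, un-inflated scale.
backlog_points = 120

# The team really completes 15 old-scale points per iteration...
real_velocity = 15
# ...but inflated estimates on new work make the reported velocity read 20.
reported_velocity = 20

honest_forecast = iterations_to_finish(backlog_points, real_velocity)
inflated_forecast = iterations_to_finish(backlog_points, reported_velocity)

print(honest_forecast)    # 8
print(inflated_forecast)  # 6
```

The same team doing the same amount of work now appears to need six iterations instead of eight, purely because the two numbers being divided are no longer on the same scale.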

Velocity inflation is just like financial inflation: rational expectations and the Lucas critique need to be considered. It should come as no surprise that I’m going to quote Goodhart’s Law again:

  • “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

Targets are good when they help people to stretch and reach goals, but in setting them you need to be aware of the side-effects. Simply to advocate targets is a violation of Deming’s eleventh principle of management:

  • “11 a. Eliminate work standards (quotas) on the factory floor. Substitute leadership.
  • 11 b. Eliminate management by objective. Eliminate management by numbers, numerical goals. Substitute leadership.”

The solution is simple: don’t do it.

If you (quite naturally) want velocity to rise you have to look elsewhere. That will be the subject of my next blog entry.

Filling an iteration too well

I want to stick with the theme of “how do I fill an iteration?” for a couple more entries. There are a lot of little nuances here, and what works for one team at one time might not be the best thing for another team, or even the same team at a different time.

I appreciated Ed’s comments on my last entry; I think they go to show how small variations work well for individual teams. However, sometimes variations hold problems.

Sometimes you come across a team that completes exactly the amount of work (measured in abstract points) during an iteration as they forecast they would at the start. For example: a team says it will do 10 points of work in the next iteration, and two weeks later they count up and they did 10 points of work.

Occasionally this happens; that’s not unusual. Over time I’d expect it to happen more often, but I don’t expect it to happen every iteration. When it does, something is probably wrong.

Statistically this is just very unlikely to happen. Yes, a team will do roughly the same amount of work, but exactly the same? No. They are not doing the same work or the same tasks, the same events will not occur, and the same people won’t work on the same things – and if they did we would expect them to get more done.

So when a team regularly scores the same points, week after week, as they expect to (and by implication the same number they did the previous iteration) then some force is making it happen. The real number should be higher or lower.
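One rough way to spot this pattern in your own records is to look for long runs of exact matches. The sketch below is only an illustration: the function name `exact_match_streak` and the sample history are invented, and how long a run has to be before it counts as suspicious is a judgement call, not something the code decides.

```python
def exact_match_streak(history):
    """Return the longest run of consecutive iterations in which the
    points completed exactly equalled the points forecast."""
    longest = 0
    current = 0
    for forecast, completed in history:
        if forecast == completed:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

# Hypothetical (forecast, completed) pairs, one per iteration.
history = [(10, 9), (10, 10), (10, 10), (10, 10), (10, 10), (10, 12)]

print(exact_match_streak(history))  # 4
```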

If the real number should be lower it means the team is busting a gut to do the work. Maybe they are working long hours, or maybe they are cutting on quality. Either way the work pace is unsustainable and problems are being stored up.

If the real number should be higher it means the team is satisficing. That is, they have the capacity to do more work but they are not taking it on so there is spare capacity. Sustainable yes, but not as productive as they could be.

I recently worked with a team led by a manager who would not allow the team to take on more work than they could guarantee doing. He did this because he didn’t want to explain to the project manager running the project that the team hadn’t done as much as they had scheduled.

The project managers in this company wanted predictability. If that is important to you then that’s a reasonable thing to do. But this company was trying to “go faster”. The managers were making a trade off: less work for more consistency week-on-week.

My preferred method is to always schedule slightly more work than the team expects to do. This way there is more work to do if the team find things go well. And if they don’t go well, or if they go badly, then nobody should have been expecting everything to get done anyway.

Two ways to fill an iteration

The fixed length iteration is a key part of most Agile methods. But the question is: how do you know what (i.e. how much) to put in the iteration?

There are two ways to determine how much is enough but what might be less obvious is that they are alternatives. It is easy to outline both and pretend they work together but really they don’t fit together.

The first way, let’s call it the Scrum way, is to set a Goal. The team then commit to making this Goal. The objective is to find a Goal which is both achievable and challenging, and useful but small enough to fit in an iteration.

In this model the Product Owner comes along with some idea of what they want, the team talk about it with the Product Owner and, through conversation, come to an agreement on what can be achieved. The team then go for it.

The team “do what it takes” – if they need to work long hours, they do; if they need to bang heads together, they do; if they need to spend their own money, they do. Given this, one would assume that in those iterations where the team don’t have to move heaven and earth they could take things easy. If in one iteration they work 14-hour days, a reasonable quid pro quo would be that in iterations where they can finish a day early, they do.

If at some point the team decide they can’t make the Goal, or the Product Owner says the Goal is compromised – things have changed and it is no longer wanted – then the team declare an Abnormal Termination of Sprint and everything starts over.

The problem with this way of doing things is around sizing the Goal. According to the Scrum literature, by aiming for a Goal the team rally round and choose to stretch themselves. The danger is that the development team start satisficing, that is: they promise enough to keep people happy but not so much that they are ever in danger of failing to meet the Goal.

Those who worry about satisficing probably also worry about what happens if the team meet the Goal early. This isn’t really explained in the Scrum literature I’ve looked at but I’m told that there should be a quick, mini-planning meeting with the Product Owner and some new work accepted. At which point I wonder: what happened to fixed iterations?

Conversely, looked at from the opposite point of view: what is to stop the business side putting upon the developers and exploiting the situation to set difficult, time-consuming Goals?

There are plenty of developers in the world who have been bullied by their business partners into giving estimates which meet the business desire but have no real relation to the amount of time and effort it will really take.

Either way, the fundamental problem remains: how do you know how big to make the Goal?

Unless both sides change their mindset then the Goal driven model doesn’t really change anything. And changing mindset on both sides is a big task. One which doesn’t fit well with a gradual adoption approach.

Actually, I don’t think many people use the Goal driven approach to filling an iteration. If they did then I would expect to hear about teams having spare time more often, and I would expect to hear about more Abnormal Terminations of Sprints. I don’t hear of either so I don’t think this technique is used very often.

The second way of deciding how much to put in an iteration is to use an empirical measurement, i.e. velocity. Let’s call this the XP way.

In this model the Product Owner proposes some work as before. The development team do some quick estimates on how much effort is involved, the work is prioritised and work begins. The first time you do this you don’t know how much work to put in the iteration – how could you? You’ve not done this before. But the second time you can count how much work you did before and use this as a guide.

Iteration on iteration this count becomes more accurate, and it’s what we call velocity. The technique of using the previous velocity to project the amount of work in the next iteration is called yesterday’s weather.

I prefer to use this technique and when I do I always slightly overload the team in the next iteration. That is, I schedule slightly more work than I expect them to do and everyone knows there is slightly more work than is expected to get done. In other words we expect something not to be done.

There is no point in scheduling even more work because the team isn’t expected to do it. There is no point in scheduling less work because it might be that the team has spare capacity.

If we get lucky all the work we scheduled gets done, and if it doesn’t… well nobody should be expecting to get everything done, it’s just a question of how much.
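As a sketch of the slight-overload idea in Python: take stories in priority order and stop as soon as the planned total passes the previous iteration’s velocity. The function name `fill_iteration`, the story names and the point values are all made up for illustration, and “stop once you are just over last time’s velocity” is only one possible reading of “slightly more”.

```python
def fill_iteration(backlog, last_velocity):
    """Plan an iteration using yesterday's weather plus a slight overload.

    backlog is a list of (story, points) pairs in priority order.
    Stories are taken from the top until the planned total exceeds
    last_velocity, so the plan always holds a little more than the
    team is expected to finish.
    """
    planned = []
    total = 0
    for story, points in backlog:
        if total > last_velocity:
            break
        planned.append(story)
        total += points
    return planned, total

# Hypothetical prioritised backlog; the team did 10 points last iteration.
backlog = [("login", 3), ("search", 5), ("export", 2),
           ("help page", 1), ("admin screen", 5)]

planned, total = fill_iteration(backlog, last_velocity=10)
print(planned)  # ['login', 'search', 'export', 'help page']
print(total)    # 11
```

The stories are never reordered or skipped to make the numbers fit, so priority stays in the Product Owner’s hands; the lowest item in the plan is simply the one everybody knows may not get done.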

I don’t update the estimates with actuals because that would be mixing apples and oranges. Estimate counts go in, estimate counts come out.

There are several problems with this technique. You can’t be sure what will get done and what won’t, and for some people that’s a big issue. Secondly, people like to relate the estimated effort levels to hours or days. When that happens the estimates become less accurate.

More importantly, the Goal driven method aims to stretch the team by challenging them to do something bold. Using velocity and the yesterday’s weather approach doesn’t even start to do this. The team are immediately satisficing.

Both these techniques have been advocated and used but what isn’t pointed out very often is that they are alternatives. If you use them together you definitely are satisficing to meet a Goal. Using commitment to stretch the team goes out the window. Unless of course you set stretch goals, in which case you are ignoring velocity.

Things to do to improve code quality

As I mentioned a couple of posts ago, I was recently out in Oslo teaching a course on Lean software development. One of the points I make is: Quality is free (or at least cheaper) provided you invest in improving quality.

This section of the course included an exercise where I ask the participants to think of things they could do to improve code quality. On this occasion the exercise went particularly well and resulted in the list in the picture below:

Let’s run through these one by one – not necessarily in the order on the sheet:

Test Driven Development: if there is one practice above all others which contributes to better code quality and fewer bugs it is TDD. On the plus side it can be used on any type of project, Agile, Waterfall or other. Its roots go back a long way but it was a forgotten practice until XP resurrected it. When run as part of a continuous integration cycle with frequent automated builds and tests the practice is Unit Testing on steroids.

However, it doesn’t just happen by mandating it. Most developers don’t know how to do it; they need training and help (coaching). Even then it is going to be a learning experience, so don’t expect it to become prevalent overnight.

(And before you say “But we have a legacy system with 1 million lines of code so it won’t work for us” please read my implications of the power law.)
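As a rough sketch of the test-first rhythm, assuming Python and its standard unittest module: the tests are written first and fail, and then just enough code is added to make them pass. The function name `velocity` and the test cases are invented for illustration.

```python
import unittest

def velocity(completed_story_points):
    """Sum the points of the stories finished in one iteration.

    In test-first style this function is written only after the tests
    below exist and have been seen to fail.
    """
    return sum(completed_story_points)

class VelocityTest(unittest.TestCase):
    def test_no_completed_stories_gives_zero(self):
        self.assertEqual(velocity([]), 0)

    def test_points_of_completed_stories_are_summed(self):
        self.assertEqual(velocity([3, 5, 2]), 10)

# Run the tests programmatically so the example also works when pasted
# into an interactive session or a build script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(VelocityTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Hooked into a continuous integration build, a suite like this runs on every check-in, which is where the practice starts to pay for itself.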

Acceptance Test Driven Development (ATDD) is the next level up from unit test based TDD. Here those making the requests for development not only specify their acceptance criteria but do so before any development happens, and do so in a way that they can be automatically executed. In many cases professional Testers need to work with the “Customers” to create such tests.

Continuous Integration (CI): This is a valuable practice on its own – making sure code builds and new code doesn’t break anything that already exists. When coupled with TDD and ATDD to create automated, repeatable test suites, it is an order of magnitude more valuable.

Pair Programming: The controversy over pair programming seems to have died down, but then so too have examples of people actually doing it. A shame really. It is instant code review, it is two-heads better than one (think of commercial pilots or surgical teams). It also allows developers to focus intensely on the work in hand – few distractions from telephone calls, e-mails, SMS, and all the other rubbish that distracts us so easily.

Code review: The next best thing to pair programming. If people won’t pair then at least code review. Put in place a lightweight process which happens as soon after the code is written as possible. The big formal processes so many of us learnt about in school aren’t practical – only NASA can really afford them anyway. With a lightweight process you will get 80% of the benefit for 20% of the cost.

Static analysis tools: in the past static analysis tools have gotten a bad name for themselves. The current generation are a lot better and while they are not a true substitute for a code review (because in a code review both reviewer and reviewee learn) they are very cheap to use. Sure you might have to buy a license but once you’ve done that and set them up in the build system they run every time code is checked in and can highlight potential issues very quickly.

Coding standards: Traditionally I’m not a fan of coding standards. In my experience too many teams waste too much time debating and arguing over coding standards, and when they are put in place they can be used as a tool for some developers to bully others. However, if you can overcome those problems then they have a valuable contribution to make.

Start by having a group discussion – face-to-face, not over e-mail or on a mailing list – about what could be in a coding standard. Find the areas of agreement and have three or four categories: a very few items as “mandatory”, more items as “recommended” and more as “candidates.” This third group are possible candidates for inclusion in recommended or mandatory but need some consideration. The fourth group is for things you agree not to standardise on.

Then review these guidelines every three or four months. Promote some from candidate to recommended, and from recommended to mandatory, and if some aren’t working then remove them or demote them. (This recommendation is broadly in line with Les Hatton’s suggestions in Safer C.)

Then, don’t use coding standards as part of your review. Developers should follow them out of honour. But just in case you miss one, automate them. Set your static analysis tools up to run your coding standards against code which is checked in. Remove the human from the loop and remove the bullying.

Automate: In case it hasn’t sunk in yet, most of the suggestions so far can be automated, and should be automated. Not automating them means they take time to do and are therefore expensive in the long run. Automation might cost in the short run but it makes things cheaper overall.

Refactoring (& refactoring tools): The whole point of refactoring is to improve the code quality and, more importantly, the overall design. If it isn’t then something is wrong. You can, and people do, refactor without automated unit tests but this is equivalent to a high-wire act without a safety net. With the safety net in place refactoring should be a frequent activity and one which doesn’t take up lots of time.

As an old C++ hand I’m always impressed by the refactoring tools available to Java and C# developers. These should lead to more frequent, quicker and safer refactorings.

Hopefully it is immediately obvious how the above can lead to better code. Some of the other items on the list aren’t so obvious but I think they are worth including.

Show and Tell (early): maybe not immediately obvious why this one should lead to better code but it will. By regularly showing potential customers of the software what they are getting developers need to keep their code near to release state. This forces more, smaller, steps in development.

The second reason why this helps is that feedback comes more regularly. This provides positive guidance on what is going right and will point out when things are going in the wrong direction.

Finally, if developers are scared to show their work in progress to users and customers then it’s time to run up the red flags and look for trouble.

User Tests extends this reasoning. User tests provide another line of testing which helps detect problems early.

Similarly, working on smaller pieces/projects provides for more small steps. Before each step there is an opportunity for re-adjustment and course correction both at the work planning level and at the code level.

Finally, Team cohesion is important because without it the team are running in different directions and doing different things with the code. Part of team cohesion must be a shared view on the development objectives, the design ideas in the code and what makes for good code.

This isn’t an exhaustive list, just the ones my students in Oslo came up with; if you have any more suggestions please add a comment, thanks.