Measure Agile Metrics that Actually Work

I think the Agile community needs to change how it measures success for Agile teams. The ways we gather metrics, and the information we seek from those metrics, are actually getting in the way of what’s most important: making working software.

The way I see it, there are two major problems:

The observer effect: The observer effect states that observing a process can impact its output. For instance, telling a team that you’ll be keeping a close eye on their velocity might cause that team to overestimate their items in order to increase their velocity. This is especially dangerous when working with story points, since there’s no objective way to check the validity of an estimate. Should your Scrum team stop using story points?

[Image: the observer effect in Agile]

The streetlight effect: The streetlight effect is our human tendency to look for answers where it’s easy to look rather than where the actual information is. For instance, counting the lines of code produced is easy, but it doesn’t tell us anything about the quality of the application, the functionality it provides or even its effectiveness.

[Image: the streetlight effect in Agile]

So let’s apply these concepts to some common metrics. What’s easy to measure?

Unit Tests written: Most agile developers write a lot of unit tests, and test-driven development creates even more tests (both of which create better quality code). So measuring a developer’s productivity by the number of tests they create must be good!

Actually, the observer effect kills this one dead. Telling a developer that they’ll be measured on the number of tests they write ensures they’ll create many tests with no respect to the quality of those tests. Our goal is not to ship tests; our goal is to ship working code. I’ll take fewer, better tests over more crappy tests any day.

Individual Velocity: Once again the observer effect makes this a bad metric. If a developer knows he’s being individually graded on his performance and also knows that he only gets credit for the things he specifically works on then he’s actively discouraged from contributing to the group. He’s placed in the very un-agile situation of competing with his team rather than contributing to it.

In a perfect world, an Agile team is collaborating, interacting, discussing and reviewing almost everything they do. This is a good thing for building quality software and solving problems fast, but this level of interaction makes it nigh impossible to separate a person’s individual productivity from the group’s. So don’t try. You’ll simply hurt your team’s ability to make good software.

Team Velocity: This is one of the most misunderstood metrics in all of Scrum. A team’s velocity is unique to them. It simply can’t be compared to another team. Let’s say that team A estimates a certain amount of work at 50 pts for a sprint and team B estimates that same work at 150 pts for the same sprint. Now if both teams finish their sprint successfully then team A has a velocity of 50 pts and team B has a velocity of 150 pts. Which team is more productive? Neither. They both did the same amount of work.

Comparing velocity across teams is particularly evil because it encourages teams to fudge the numbers on their estimates, which can affect the team’s ability to plan their next sprint. If the team can’t properly plan a sprint, then that puts your entire release in danger of shipping late.

For more about your Scrum team’s velocity, you can check out an earlier blog post I wrote.

Okay smart guy, what metrics should we use?

Glad you asked. We measure productivity by the working software we deliver. We measure actual output rather than contributing factors. This approach is more Agile because it frees the team to build software in whatever way best contributes to their success rather than whatever way creates better metric scores. It’s also much more logical, since working software is something that we can literally take to the bank (after it’s been sold, of course).

So what are the actual new metrics?

Value Delivered: You’ll need your product owner for this. Ask him to give each user story a value that represents its impact on his stakeholders. You can express this as an actual dollar amount or as an arbitrary number on some scale. At the end of each sprint you’ll have a number that tells you how much value you’ve delivered to your customers through the eyes of the product owner.

This metric does not measure performance; instead, it measures impact. Ideally your product owner will prioritize higher value items towards the top of the backlog, and thus each sprint will deliver the maximum value possible. If you’re working on a finite project with a definite end in sight, your sprints will start out very high value and gradually trend towards delivering less and less value as you get deeper into the backlog. At some point, the cost of development will eclipse the potential value of running another sprint. That’s typically a good time for the team to switch to a new product.
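To make the arithmetic concrete, here is a minimal Python sketch of how value delivered could be tallied per sprint. The `Story` class, the two helper functions and the flat per-sprint cost are all assumptions made for illustration; the value numbers are on whatever scale your product owner picked.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    value: float  # value assigned by the product owner (dollars or arbitrary units)
    sprint: int   # sprint in which the story was delivered


def value_by_sprint(stories):
    """Sum the product owner's value numbers for each sprint, in sprint order."""
    totals = {}
    for story in stories:
        totals[story.sprint] = totals.get(story.sprint, 0) + story.value
    return dict(sorted(totals.items()))


def first_sprint_below_cost(totals, sprint_cost):
    """Return the first sprint whose delivered value fell below the
    (assumed flat) cost of running a sprint, or None if there is none."""
    for sprint, value in totals.items():
        if value < sprint_cost:
            return sprint
    return None


if __name__ == "__main__":
    # Hypothetical backlog: value trends downward as the backlog empties.
    delivered = [
        Story("Checkout", 80, 1),
        Story("Search", 40, 1),
        Story("Export", 30, 2),
        Story("Themes", 10, 3),
    ]
    totals = value_by_sprint(delivered)
    print(totals)
    print(first_sprint_below_cost(totals, sprint_cost=25))
```

This only works if value and cost are expressed in the same units; with an arbitrary value scale you would have to compare the trend by eye instead.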

On Time Delivery: People sometimes tell me that Agile adoption failed at their company because they couldn’t give definite delivery dates to their clients. I don’t buy this. One thing that an Agile team should definitely be able to do is deliver software by a certain date. It’s possible that a few stories may not be implemented, but those are typically the lowest value stories that would have the least impact on the client. That being said, a team’s velocity should be reasonably steady; if it goes up or down, it should do so gradually. Wild swings in velocity from sprint to sprint make long-term planning harder to do.

Here’s the metric: if a team forecasts 5 stories for an upcoming sprint and they deliver 5 stories then they earn 2 points toward this metric. If they deliver 4 stories or they deliver less than 2 days early (pick your own number here) then they earn one point. If they deliver more than 2 days early or they only deliver 3 (out of 5) stories they earn no points. At the end of a quarter or the end of a release or the end of the year the team will be judged by how accurately they can forecast their sprints.
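The scoring rule above could be sketched in Python roughly as follows. The function names and parameters are hypothetical, and a few choices are my own assumptions because the rule doesn’t spell them out: “4 out of 5” is generalized to “missed exactly one story”, finishing exactly at the early threshold scores zero, delivering more than forecast is not penalized, and when a sprint is both short on stories and early, the lower of the two scores wins.

```python
def forecast_points(forecast, delivered, days_early=0.0, early_threshold=2.0):
    """Score one sprint's forecast accuracy as 0, 1 or 2 points."""
    missed = forecast - delivered
    if missed <= 0:
        story_score = 2   # delivered everything that was forecast
    elif missed == 1:
        story_score = 1   # e.g. 4 out of 5 stories
    else:
        story_score = 0   # e.g. 3 out of 5 stories

    if days_early <= 0:
        timing_score = 2  # finished when the sprint ended
    elif days_early < early_threshold:
        timing_score = 1  # a little early
    else:
        timing_score = 0  # so early that the forecast was clearly too low

    # Assumption: the worse of the two scores decides the result.
    return min(story_score, timing_score)


def forecast_accuracy(sprints):
    """Fraction of possible points earned over a quarter, release or year.

    `sprints` is a list of (forecast, delivered, days_early) tuples.
    """
    if not sprints:
        return 0.0
    earned = sum(forecast_points(f, d, e) for f, d, e in sprints)
    return earned / (2 * len(sprints))


if __name__ == "__main__":
    quarter = [(5, 5, 0), (5, 4, 0), (5, 5, 3), (6, 6, 1)]
    print(forecast_accuracy(quarter))
```

Pick your own `early_threshold`, as the rule suggests; the point is that both under-delivering and padding the forecast cost the team points.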

So what we’re measuring is value delivered to the customer and on-time delivery of that software, which are the only two real metrics you can literally cash checks with.

  • Nicholas

    Great examples and cartoons in article about measuring the correct agile metrics. Well done Axosoft, keep up the great work!

    • SeanMcHugh

      Thanks Nicholas!

  • Gary McKay

    Awesome. Any ideas on how this can be added to Ontime?

    • SeanMcHugh

      Short of writing a custom report (OnPremises installations can do this), I don’t think there are any specific dashboard widgets or tools that you could use for these, other than creating a custom field and exporting to a spreadsheet for the math.

      If you want to see something in the product let us know: http://community.axosoft.com/iframe.aspx?Tab=Features

  • Andy Dent

    Thanks, especially for the succinct explanation of how these metrics hurt (reminds me of reading Weinberg twenty years ago). Unfortunately, I can see at least a couple of ways the “value delivered” metric also starts to generate problems. In no particular order of significance….

    Firstly, unless you have a very savvy customer, how are you going to convince them to assign high value to stories that are primarily about delivering engineering improvements to the code? This metric works against refactoring.

    Secondly, if you’re working in sprints, you need a mix of sizes of stories to be tackled (“gravel” to fill in the holes). A team could get more “value delivered” points by just delivering two big, high-value stories, then sitting doing nothing for a day, compared to doing one big story and a bunch of little ones.

    Thirdly, I am not sure if it works with epics where a bunch of stories are needed to deliver a coherent feature. We need either a scheduling mandate in addition or a booster algorithm to make sure all the essential stories in an epic are done (acknowledging that sometimes we trim our epics and good story partitioning is about making this option feasible).

    So, like all purely metric-driven approaches, this too can fail. The number-one failure indicator for an agile team is how much they are measured by their metrics without managers having the ability to intervene.

    I do think you’ve raised the importance of value numbers on stories and it would be great to see “value planning poker” being held with stakeholders.

    • SeanMcHugh

      I agree, metrics are best used as a tool for a manager with a common sense approach rather than a replacement.

      That being said, I think the “Value Delivered” metric is still worthwhile but truthfully, not one that I’ve seen implemented as of yet (it’d be interesting to see what changes with the teams that adopt it and how it affects them).

      I agree, unless you have a very savvy Product Owner it does seem like there will be a tendency to implement more functionality and do less refactoring. But hopefully the team is refactoring as they go (http://artofsoftwarereuse.com/2010/04/10/continuous-refactoring-is-a-wise-investment/), and in a perfect world the team and the product owner are working together closely enough that the team can make a case that refactoring or redesign IS high value because it will increase the speed of feature development in the future (think of it as an investment). This might be a case of me being spoiled by our Product Owner here at Axosoft being a developer and project manager himself, but I don’t think it’s too far a stretch. It might also be a case of “No Plan Survives Contact with the Enemy”.

      I don’t necessarily agree with the second point. If two large stories provide more value than 1 large story plus several smaller stories, then you should work on the two larger stories. That doesn’t mean that the team sits on their thumb once those two large stories are done, though. In fact, it means that those two large stories should be forecasted for the sprint plus a smaller amount of “gravel”. Assume a sprint capacity of 120 days’ worth of work (not a 120 day sprint but, let’s say, a 30 day sprint with 4 developers). If the highest value story is 70 days’ worth of work, you take it. If the next highest value story is 40 days, you take it as well. If the next item is 11 days, you skip it (you only have 10 days of capacity left to work with, so 11 is out) and move on to the next one.

      On the third point, I have to say that I distrust algorithms from a project management perspective. They tend to make things less transparent and they offer the illusion of not being biased. To complete themes or groups of stories that speak to some larger functionality, you would simply prioritize and value each story in the theme higher than the functionality you’d rather wait on.

      Once again, value delivered is not a metric that I’ve seen implemented yet so I’m curious to see A) if it does get adopted by the Agile community and B) how it changes once out in the wild.

  • hubter tech

    I personally think your article is amazingly well-written. I concur with much of the information you have in this article and will read it over again to consider some others. I really like your style.

  • http://twitter.com/ytechie Jason Young

    Excellent post. I actually googled “scrum value points”, because I plan on starting to assign values to each card, and measure value delivered.

    Andy made the comment about refactoring. You were dead on in your response. If it’s something that will increase productivity or improve the product, then do it, because it should increase the value points delivered!

    Once I realized cost had nothing to do with value delivered, it was liberating. Cost is bad, value is good. Why promote your team to feel good when more cost is incurred?

    Also, our plan is to use this for prioritization. My thought is to take value and divide by cost. The result can be sorted descending, and there is your optimal order to work on them (there needs to be some judgement of course).

  • Vin Sam

    I like this article, but I think ‘PO value points’ as a metric is being used only because story points are not being used correctly. That is an underlying issue which one needs to focus on rather than coming up with a new metric that is given out from the other end of the same process. Because Agile when done correctly would translate to Story Point Estimates given by Dev = Value Points for PO. If value points are meant to indicate relative prioritization within the Sprint Backlog, then it’s a different dimension in itself and not the right metric by itself.

    I completely agree with the streetlight effect and observer effect explained here. But the observer effect applies to any metric; even Value Points can be mangled by the estimator. Going back to Story Points, if the team starts blowing up estimates to increase velocity, one should be able to gauge this using two things: 1. the subtasks into which the story is broken down and their individual estimates, and 2. daily scrums. If the team is blowing up estimates to increase velocity, a good PO can easily catch this: the team will either run out of work or will obviously be slacking. In fact, if the observer effect is applied over a long period of time without any change, the team will start estimating based on its new blown-up numbers and they will become the new norm, averaging it all out. So the observer should use the series after the observation announcement rather than compare before and after.

    What I’m getting to is that while the suggested metric seems like a good idea it’s essentially a different way of doing the same thing. It doesn’t actually solve any problem.
    What’s the difference?
    Dev says: “Oh yeah! I delivered 30 story points!”
    PO says: “Heck no! You delivered 5 value points!”
    Stats can be bodged either way, both by the apple seller and the orange seller. Can’t really compare ’em, can we :)

    So one should rather fix process and more importantly *people inefficiencies* than relying blindly on any of these metrics.

  • Vin Sam

    Addendum to previous post:

    …. I think a good way would be to use end-to-end metrics: both Value Points and Story Points, to truly know “how much effort equates to what value”. That way we’ll get a good gauge of productivity.