What’s the real value in QA for cloud consumer services?
I got to thinking about this – again for the umpteenth time – during an informal lunch conversation with a group of developers and QA’ers discussing challenges in their jobs. This friend of mine, who is a very senior individual at his social network startup, was telling us a story about the problems that he has with a group of external developers who build their service. They tend to be a bit like a group of cowboys that shoot code from the hip without thinking through what they are doing, or with much regard for what effect this has on the production environment. I said to him that it sounded like a recipe for disaster and asked if this resulted in a lot of catastrophic failures like their site going down. Surprisingly he answered, not really.
Their situation is they don’t have a dedicated QA group or tester to run regression tests on builds that come out. This is done by someone on their management team who has an intimate knowledge of their front-end application, and he manages to catch the really bad problems before they make it to production. I asked if this was always the case, to which my friend said, not all the time but generally. He went on to relate to me that one of the good things about them being a cloud service was that they could patch problems on the fly if they had to.
I agreed, in quite a few of the places I worked in, patching-on-the-fly was a standard tactic for fixing post deployment problems. But it’s time consuming for all the individuals that are involved, which if it turns out to be a lengthy issue to fix means time off core development work. If you’re doing releases every week, then this can add up to a significant amount of hours, all of which has to be paid for, either in people having to do more hours to meet development schedules, or development schedules being pushed back. In a worst case scenario, it’s both. The end result, your service delivery starts slowing down.
I’ve mentioned nothing about the aggravation that your users experience, but this is not really as serious I don’t think. After all users are a bit like young kids, they are easily upset, but they are also easily placated. As long as you fix things in a hurry and don’t make them wait too long to be able to get back to doing what they were doing, they will forgive you. It will even give them something to twitter about. I survived when my social network service when down for two hours… I read a BOOK! Which can give you enough infamy to get your name out there if you’re new. To be fair though, there is a line-in-the-sand to a users acceptance of failure, it’s different for each service, but once you cross it, users will start to chant your name in angry denunciation. This is bad because it soon follows that they go somewhere else. Typically once you’ve burned a user badly enough, they are never coming back.
I don’t generally put user satisfaction into a QA benefit for the modern world anymore. Even the most technically minded, and thus socially unaware software developer, keenly understands that user satisfaction is everything. It’s the one true deciding factor in whether an online consumer service lives or dies. This doesn’t need to be explained, everyone just gets it. Time however is something else. While everyone in a development group understands that time is of critical importance, you’d be hard pressed to get agreement on what’s the best way to use it. There are hundreds of documented ways to structure a project to -1- organise teams, and -2- organise development activities to achieve milestones. But the goal is always one thing; to build an application that consumers can use – and someone can make money from – in the shortest possible time.
For development, the real benefit in QA is in making things happen in “the shortest possible time”!
There is a time hit involved with getting a bunch of QA/test people involved on a project as they need time to make test cycles on builds. But if they are involved early enough, it means that they catch the bad problems before they become really bad problems. The worst kind of problems are those that take your site down. Once that happens, every manager and his team gets involved to work the issue until it’s resolved. If you’re lucky it’s a few minutes, if not, then the hours start stacking up.
While there are models for software development that are based around code going straight from development to production, it’s a theory I’ve never seen work in the real world. That’s not to say it can’t, it’s just to say I’ve never seen it happen. Eventually code going from development directly to production breaks production and chaos ensues. People are pulled off their core work to fix issues. The end result being lost time.
The goal then for any QA group is to structure itself in the most effective way to facilitate the rapid flow of work from development and turn it around through a series of test activities (cycles) so it can be deployed to production in the shortest possible time. I’ll be talking about this in more detail the posts to come.