Thursday, November 6, 2008

Role play modelling

Virtually every software development methodology stresses a need for specifying how the software end product is expected to work before starting the programming. But the truth is that writing specifications is boring. And most programmers, being lazy people (in good meaning of "lazy". Yes, I believe there is a good meaning) try to avoid boring work. So, when the specs are written, they are often incomplete and don't cover all the possible scenarios. Moreover, as the requirements change, these specs become outdated and, hence, even less useful. I wonder if there is a way to replace the boring specs with something which is fun. One possible way to go may be a "Role play-style modelling". Just think about it. Usually software product consists of a number of modules communicating to each other. Each module does a specific task. In a typical software project, different developers are assigned to implement different modules. Now imaging that before diving into coding, you have a kind of a role-playing game, where each of the programmers is actually pretending to be a module he is going to program. It goes like that:
(Dave is a module that reads an XML file and converts it into a spreadsheet, Bob is a module that is responsible for taking the spreadsheet and sending it to users as an e-mail attachment) Dave: Trying to open the XML file. Wait a minute, what if it's not a well-formed XML, what am I supposed to do then? I should probably send an error message back. Ok, I read it and successfully transformed. Now I notify Bob that it's ready. Bob, we need to decide on this notification protocol. Now, it's your turn. Bob: Ok, first I need to create an e-mail. Where do we get the e-mail address and subject line from – should we ask the user? This is something to find out. Then I need to check that the e-mail address is well-formed and the subject line is not empty. Then I read the spreadsheet file Dave provided and attach it. Dave, how big this file can be? Dave: I reckon it's up to 1 Mb, but I need to clarify it. Bob: I need to find out if I need to compress it. Then I try to send the e-mail. If it is sent successfully, I report Ok status back to Dave. If it's failed, I report failure status with the error message. ... And so on.
That would give all the participants the understanding of how the end product should work. And, assuming the programmers are taking notes as they do it, every one will end up with a mini-spec just for themselves. Another advantage of such approach is that it would encourage people to discuss every possible situation and ask "what if" questions. And asking them is the most important part of the whole process. While there is at least one outstanding question, the spec is incomplete. So, after our game, Bob and Dave would depart on harassing whoever appropriate, like business analysts or even end users, to extract the answers from them. And once they have the answers, we would repeat the game again. And would do it over and over again, until everything unknown is ruled out. It sounds like a odd approach, but it just might be odd enough to work.

Thursday, October 30, 2008

The recruitment landscape is changing...again

A few days ago I went to a presentation by an Oracle recruiter. He talked about the ways Oracle searches for the potential candidates and and hires them. One particular thing that struck me was that Australian Oracle recruitment team does not use specialised recruitment web sites. About a decade ago, looking for a job was all about reading newspaper ads. Then along came job web sites, and the recruitment landscape changed forever. I am myself of a younger generation, and looking for a job for me was never associated with newspapers. When I look for a job, I go to a web sites such as Seek or and search there. I take as given that all the jobs I may be interested in are posted there, either directly of through recruitment agents. This approach is more convenient for both job advertisers and seekers, because it allows them to find each other more easily. And most of the activity is concentrated around a few dominant players such as aforementioned Seek and Monster. This is because of so-called network effect: such web sites are created to connect people to each other, in our case job seekers and advertisers, and because the number of possible connections is proportional to the square of participants, the probability of "getting connected" through a large job web site is much higher than through a small one. Thus, size does matter. Indeed, if I want to find a job, the first thing I do, like Willie Sutton robbing banks because "Because that's where the money is", is look at the largest local job site, because this is where the jobs are. But this is changing. As I already said, I was surprised to hear that Australian Oracle recruiters do not use job web sites. Instead, they exploit a relatively new phenomenon of Internet social network services. Social networks benefit from network effect at the scale of magnitude comparing to the traditional job sites. This is because the network size of job sites depend on the number of job advertisers paying for placing their ads. In other words, id depends on supply of ads. And if supply side does not increase, the network will not grow. On the other hand, the growth rate of a social network is not limited by either supply or demand. As long as new members keep subscribing, network will grow. Even if they stop, the existing members will still be able to establish new connections until all the members are connected to each other. And this number, as I said before, is a square of the number of members. A good example is LinkedIn, which is effectively a tool for building a professional social network. It allows you to easily find your co-workers, provided they are registered with the site, as well as connect with potential employers. So, what is there for employers? The benefit for them is that they are able to build their network, even if they are not actively hiring. And after that, when they need it, they will be able to instantly find the right candidates through their connections. It is like having a job seekers bank at their disposals, almost like a database of resumes. But this bank will update and grow by itself. This leads us to an interesting conclusion: within this new paradigm if you're not online, you cannot be found. And if you cannot be found, you can't be hired. Just because you are outside of the network. All in all, we are witnessing a birth of a new recruitment methods. Who knows where it is going to lead us. Maybe in 5 years it will be mandatory for job seekers to have a Second Life avatar and all the interviews will be conducted in virtual reality. Australian Oracle recruitment team has a blog, they also follow LinkedIn, Twitter and MySpace networks.

Tuesday, October 14, 2008

The Issue of Risk

Once during one of the projects I was asked what was the risk of something going wrong. I remember I answered that we, programmers, do not assess risks. We just do everything possible to prevent them. “There is no point”, I said, “to calculate the chances and the impact of a file not being in the incoming directory upon the start-up of the process if we can implement a simple check and avoid the disaster”. Needless to say, I was wrong. Nowadays I work in a bank. And one of the things I learned about banks, they all have departments responsible for managing risks. A bank in its business plays a risky game - it lends money. It is risky because there's always a possibility that it will not get them back. And banks learnt to manage these risks. They ask: "What can go wrong? What can we do to prevent that? And what can we do to minimize the impact if we won't manage to prevent it?" Software industry is much much younger than banking. But we are proud of ourselves, because we use cutting-edge technology and most modern methodologies. Yet, we haven't learnt some basic principles other industries employ for decades. We don't manage risks. I've seen a few projects screw-ups, all for various reasons: a technology chosen wasn't robust enough to support the solution, the staff's skills were not strong enough to implement the solutions, requirements were unrealistic... The cause may be different, but the result is always the same - failure, or infinite slippage of the deadline, which is the same as a failure. And in every case the failure seemed to have been unexpected. It puzzled me - why could nobody predict it? And it happens over and over again. I am a faithful reader of The Daily WTF blog. Every day it publishes failure stories like the ones I witnessed. So, if we are so smart, why does it keep happening over and over again? Because in a software project nobody asks what can go wrong. Not all risks are the same. Some risks can be assessed and eliminated. Such as a risk that the existing or planned hardware will not be able to manage the workload. This kind of risk is calculable. We can do a stress test, measure the execution time, do basic calculations and determine not only that we need a new server with more powerful CPU, but how exactly powerful it needs to be. Others are harder to predict. How would you assess a risk of your project manager being a crook and ruining a project? And, finally, there's a mother of all risks – a highly improbable event, the Black Swan, as Nassim Nicholas Taleb calls it in his book "The Black Swan: The Impact of the Highly Improbable". (I would highly recommend it to everyone who wants to learn more about risks) What, for example, is going to happen to your project if all the programmers decide to quit at the same time? Once I worked on a project that employed so called custom rules engines for performing complex batch calculations on a huge dataset - about a hundred million rows. We had a few thousand rules defined by business people, and in order to get a result we needed to apply every rule to the source dataset. As you can imagine, it was slow. Scanning a hundred million rows is not fast, especially when you need to do it a thousand times. So, a strategic decision was made to speed it up. One of our local geniuses came up with a brilliant, as he thought, idea: instead of applying a thousand rules hundred thousand times each, we will reformat the existing rules so that they form one monstrous rule, containing all thousand, which can be applied to the source dataset in one pass. And so we did. One thing, though, we didn't take into account: when combined all together, the rules formed a frankensteinian SQL expression more than 1000000 character long (that's right, more than a million). Yes, it worked, and the process became reasonably fast. But what was the price paid? Imagine that you are a support programmer. You are quite unhappy with your life, and your job in particular, because you're fed up with those stupid users and their problems. And on one evening, just as you were going to go home, you get a call. Apparently, the rules engine failed. You open the log and what do you see? A million characters long SQL and at an error message the end, saying that an error happened somewhere in there. Yep, somewhere in those million chars. You see, that's the way Oracle works. It tells you that an error has happened and tells which one (let's say, division by zero), but doesn't tell where exactly within the given SQL it has happened. So, what are you going to do, step by step? Don't know about you, but I would start writing a resignation letter. And I wasn't the only one confused. No one I asked could answer that question. You may ask, what does it have to do with risk? There is a direct connection. We had certain risk that the system being built couldn't handle the workload. The amount of that risk was known. We could calculate how much time it took to process the data set, extrapolate it to the given requirements and find out that it took 10 times more time than was acceptable. We could resolve it by buying 10 times more powerful server, or cluster of less powerful servers. In any case, the risk and the cost associated with eliminating those risks were calculable. After a strategic decision was made to solve the performance problem by much more complex rules engine, the risk became incalculable. Yes, they prevented the performance problem, but introduced another and much more serious: in a case of a single failure the system was going to go down for undefined time. Because no one would be able to fix it. And the undefined downtime would mean an unlimited loss for the business, because in a business world time is always money. So, what they did is substituted an imminent, but assessable risk, that, in fact, could be prevented, with a less probable, but potentially much more severe which that could not be assessed. I don't believe that solving a minor problem by introducing a much severe one, and hiding a possibility of a failure under the rug is a proper risk management. I'd rather called it a suicide.

Popular Posts