System failure
In this episode, we explore what happens when one part of a system fails and brings everything else down with it
A few weeks back, in ‘Service with a smile’, I wrote about what happens when someone has designed a system well, and has planned properly for the times when something important stops working. This time, we’ll find out what happens when someone doesn’t plan for such things…
My friend is in Canada, and he needs to get to Oregon for business. It’s a bit of a messy route - three flights and the related stopovers - but he should be able to get there well in time for a week’s worth of business meetings.
But when he gets to the US border at Montreal, things start to go horribly wrong - not just for him, but for hundreds of other people…
Where it starts… well, we’ll have to use a bit of conjecture, but the best guess is that some bright spark in US immigration or some other agency had a really clever idea. (Yeah, one of those ‘clever ideas’…) And it centres on the way they handle checked baggage - you know, the bags you hand over at check-in.
In the old way, they’d push the bags through various scanners, and pull them aside if they saw anything odd. But the catch was that they then had to find the owner of that bag - who could be absolutely anywhere in the airport.
What they did know, though, was that at some point the owner of the bag would have to pass through the immigration check, and have their carry-on baggage checked too.
So up came the clever idea: why not combine all of these things into one great big glorious all-encompassing automated system? Scan the checked bags, and send the results to the carry-on check, so that the person would be right there when all of the checks came together. Okay, the only tricky bit would be that passengers would have to sit in a separate waiting area until their checked baggage had been cleared, and then be told via a screen that they could move on to the next line. But it’d all be automated, so they shouldn’t have to wait any time at all, really. Easy!
No prizes for guessing what happened next…
Yep, the automation failed. Completely. It just didn’t work.
So apparently the baggage crews had to do something like take photographs of the scanner screens, and then copy the results manually into another system, which eventually fed through to the line for the carry-on check. Which now also didn’t work, because the signals coming from that system didn’t match what this part of the system had been expecting.
The whole process was supposed to take seconds; it actually took hours. Literally. Hundreds of people standing around for hour after hour in the waiting room, watching with increasing desperation for their name to appear on a screen, while the screen also showed the time of their flight getting closer and closer. The seemingly lucky ones saw their name appear at last - and went on to another slow, slow, slow-moving queue that was hundreds of people long.
A few people did eventually get through the whole mess just in time to catch their flights.
Most didn’t. Hundreds and hundreds and hundreds of people.
And if they couldn’t catch their flight, they were then told that they were now in the checking area illegally - because they no longer had a flight - and had to leave at once.
At which point they ran into the next system failure. When they’d gone through the first stage of passport checks, they’d technically left Canada - so even though the airport itself was, of course, still in Canada, they now had to immigrate back into Canada again. Which they couldn’t, because the automated immigration pre-screening system needed to know which flight they’d come in on. They hadn’t come in on any flight, because they’d never actually left Canada, and the only flight they’d booked was supposed to be going out of the airport, not coming in. So the automated screening system couldn’t make sense of what was going on: it needed a manual work-around before it would let them back in at all, and even then they couldn’t get through until they’d retrieved their baggage from US customs - which, of course, was still broken. Hence another multi-hour queue to sort out the mess.
If the airline can’t get people onto the flight, they’re supposed to provide an alternative route, and a hotel if necessary. When my friend at last reached the airline desk, several hours later, yes, they did manage to sort out a route for him, sort of - a string of disjointed flights with long stopovers - but they’d run out of hotel vouchers. They would get him a hotel, they said, but he’d have to wait a while until they’d got permission to open a new batch of vouchers.
Another three hours’ wait.
He did get to the hotel. Eventually. And he had exactly one hour’s rest there before he had to go back to the airport for his replacement flight.
Yes, he did at last arrive in Oregon. Almost two days late. And even though he was utterly exhausted by then, he still had to go to work straight away. He’d already missed four of his meetings. And no-one was happy at all.
The moral of this story, I suppose?
Well, the first point would be that, if we’re working on creating change, we need to be very wary of the risks of building a system that depends on everything working perfectly all of the time.
Always work out what’s likely to happen if anything does go wrong, and design ways around it, right from the start.
Always look for knock-on effects on other organisations’ systems - such as, in this example, how people who are now stranded without the right kind of ticket can get back out to where they started. And work with those system-designers to change their systems so that they can cope when your systems fail.
And always, always have a human-based back-up, where trained people can take on the load and help sort out the mess, as we saw working in the ‘Service with a smile’ example.
That’d be a start, I guess.
Better than that mess at Montreal, anyway.
(Many thanks to my friend N. for his story - though I’ll admit I’m relying on memory, and may not have all of the details exactly right. For obvious reasons, I’d best not give his full name...)