Friday, May 20, 2011

Curious Case #1: The Power Problem

The first real job that came along for Columbia Interlocking Services was at a small software company called Forest Park. They had several racks of servers, and in one rack the servers kept burning out their motherboards. The service provider would come and replace the motherboard, and then later it would happen again, sometimes on another server. It was just the kind of mystery we loved.

Charlie was coming onsite for our regular Tuesday morning consulting anyway, so it was convenient to take the afternoon off to handle this new challenge. Usually we go straight to lunch and then go to the customer after, but this was a cold call on a hot issue and we decided we had better go right away.

The traffic was light but it still took 30 minutes to complete the 4-mile trip door to door, and that was enough time for us to argue over the music, the best route, window position and what was for lunch. Charlie said he thought my car had a misfiring spark plug, but it all seems good to me.

We get to the place and find that two of the Nasties from Northwest have already been here and replaced all the power components in the server room, power strip and cables included, and even had an electrician onsite to change the circuit breaker just to eliminate suspects. Then last night another server gets smoked, and now this morning their CEO is ready to fire somebody.

We met their local IT leader, Harish, whose name I made the mistake of pronouncing with two syllables. He rudely interrupted to correct me, blurting out "Harsh!" At first I thought he was describing his personality, but then I realized that was how he said his name. In spite of my recent training in international cultural sensitivity, I agreed to use his pronunciation. We have a policy not to argue with clients.

First thing Charlie does is stand in back of the problem rack and check each cable, starting at the bottom, making sure each is plugged in all the way even though he knows they have just replaced them all. About halfway up he stops and says "Here it is" in a matter-of-fact voice. He is holding up a network cable, to everyone’s amazement. We were so sure the problem was power related that no one had looked at the network.

It turns out there had once been a remote network device in this rack that required power over the Ethernet cable. This unusual setup originally required a red cable with a yellow label showing it was not a standard link. Somewhere over the years the cable was removed, and eventually someone plugged a regular network cable into the power-injected jack. Amazingly, a server would work for a while before failing, so no one thought to blame the network cable until Charlie said he could feel the power in the cable.

Harish didn't believe it until we proved it with a power meter. It was only 5 Volts of Direct Current, but that was enough to eventually kill a machine. In hindsight it is easy to blame the service provider for replacing the motherboard without finding the root cause, but these motherboards have the network adapter built in, which makes them more vulnerable and harder to debug than servers with a separate network adapter card.

Later on we were at the Flower Blossom restaurant filling our plates from the All You Can Eat buffet while pretending we knew how to use chopsticks. I had a better grip than Charlie but we both ended up with forks to finish our plates. That's when Charlie told me the rest of the story about how he fell off the power line pole after accidentally connecting himself to the wrong two wires. It smoked both his gloves then burned all the hair off the top of his head and it never did grow back. But it did leave him with the uncanny ability to feel electricity.
