When we first rolled out the Citrix Servers we had a few issues, and a number of support calls led to the creation of a generic ticket category called Citrix Server Down. We then spent time and money to stabilize the farm, and the servers are now very reliable, with no outages for months and months. But it's too late: every problem that comes along gets thrown into the Citrix Server Down bucket.
Every morning the report comes in from our Bangalore office that the Citrix Server is Down Again. The twist is that their morning is our night, so for me the calls come in between 8pm and 11pm with that same old complaint. These users sit in various ODCs (Offshore Design Centers), which are basically branch offices off our branch office in India. They have laptops and Linux workstations that are maintained by a third party.
Here are the trouble report solutions for a typical series of tickets.
Day 1: Users can't even get to Google, so obviously it's a network issue. Turns out all the local desktops were patched, which broke their networking software. The same thing happens on the second Tuesday of each month with the latest Microsoft patch release.
Day 2: Some of the users cannot connect some of the time. I finally found a friendly user who admitted it only happens when they use Firefox instead of Internet Explorer. It is true that Firefox is faster in most cases, but since it would not connect, it wasn't really faster.
Day 3: Half the users can't connect half the time because they are trying to use “the other network” that they claim is faster. Ping and traceroute show they are not taking the right route. Turns out they are trying to use the unsecured network, which is faster but also a security violation.
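The quickest way to spot this from my side is to look at the route itself. A minimal sketch of the check, where the sanctioned VPN subnet (10.10.0.0/16), the sample hops, and the gateway are all made-up values for illustration:

```shell
# Sketch: flag sessions taking "the other network" by checking the first hop.
# The sample below stands in for live `traceroute citrix-gw` output.
sample='1  192.168.5.1  1.2 ms
2  203.0.113.7  40.1 ms'
first_hop=$(printf '%s\n' "$sample" | awk 'NR==1 {print $2}')
case "$first_hop" in
  10.10.*) echo "route OK via $first_hop" ;;
  *)       echo "WRONG ROUTE: first hop $first_hop is not the secure VPN" ;;
esac
```

In practice the sample variable would be replaced by live traceroute output from the user's workstation, and the subnet pattern by whatever your secure network actually uses.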
Day 4: No issues...Holiday in India.
Day 5: Many users cannot connect and all users have slow performance. As soon as I log on to one of their machines to investigate, there is a big honking error message about a virus outbreak. I end up helping the Local Admin build a cleanup USB stick to pass around to all the users.
Day 6: All users on one server are running slow. The CPU is maxed out at 100% by a reckless user running file-sharing software. Kill the process but not the user.
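Isolating the hog without touching the user's session comes down to killing by PID. A rough sketch, with the process and the PID 4242 purely hypothetical:

```shell
# Sketch: find the CPU hog on a maxed-out server without nuking the session.
ps -eo pid,pcpu,user,comm --sort=-pcpu | head -5   # biggest CPU consumers first
# Say the top entry is the file-sharing client at PID 4242 (hypothetical):
#   kill -TERM 4242               # ask it to exit cleanly
#   sleep 5 && kill -KILL 4242    # escalate only if TERM is ignored
# Killing the PID ends the rogue program; the user's Citrix session survives.
```

Going after the PID instead of logging the user off is what keeps the ticket from turning into a second complaint.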
Day 7: One server locked up with all users frozen. A runaway process has pushed memory usage past the limit: their backup software is running at the wrong time because the system clock is off. Corrected the time and killed the backup.
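Catching the clock drift before the backup fires again is scriptable. A minimal sketch: in practice the reference timestamp would come from an NTP query, and both the hardcoded reference value and the 300-second threshold below are stand-ins:

```shell
# Sketch: alert when the clock has drifted far enough to misfire scheduled jobs.
reference=1230768000          # epoch seconds from a trusted source (stand-in value)
local_now=$(date +%s)
drift=$(( local_now - reference ))
abs_drift=${drift#-}          # strip any leading minus sign for the comparison
if [ "$abs_drift" -gt 300 ]; then
  echo "clock off by ${drift}s - fix it before the backup window"
fi
```

A check like this, run from cron or a monitor, would have flagged the bad clock before the backup ever launched at the wrong hour.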
In every case it was not the server at fault, but I still had to fix it to prove it was their fault and not mine. After too many reports they formed an Action Team to meet every day, with special reports to all the big shots, in spite of my protests. That finally led to a meeting where they got the ODC Local Admin on the line, and he sheepishly admitted that putting calls in the server-down queue was the only way to get anything fixed. I will yell sooner next time…