Website and Res System Down

Hopefully this is nothing more than a power outage and not a cyber attack... there have been scores of cyber attacks on major corporations within the last few months...
 
Hopefully this is nothing more than a power outage and not a cyber attack... there have been scores of cyber attacks on major corporations within the last few months...
Good point freedom...how do we know it wasn't a conspiracy between the Federal Reserve and illegal aliens??
 
As much as I hate to quote anything from anet, there is always room for an exception. From a poster:
There is no valid excuse whatsoever for a single datacenter failure to affect a passenger-carrying operation. The CIO would be handing me his resignation, or at the very least their VP of Datacenter Ops would be. There should be as many 9s as possible, at least in any operational process. They could allow some things like email to fail, sure, but the phones, revenue systems, gates, ramp, all of that stuff has to be as close to absolute as can be achieved.

If I ever had something like this happen, I would fire myself immediately.


A good point indeed!
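For anyone wondering what "as many 9s as possible" actually buys you, the arithmetic is simple. A rough sketch (illustrative only, nothing airline-specific):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Downtime budget per year implied by each availability tier.
	year := 365.25 * 24 * time.Hour
	for _, avail := range []float64{0.99, 0.999, 0.9999, 0.99999} {
		downtime := time.Duration((1 - avail) * float64(year))
		fmt.Printf("%.3f%% uptime allows about %v of downtime per year\n",
			avail*100, downtime.Round(time.Minute))
	}
}
```

Two nines is roughly 88 hours of downtime a year; five nines is about 5 minutes. A res system that disappears for even a couple of hours has already blown through several years' worth of a five-nines budget.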
 
A valid comparison is what kind of experience US' airline peers have had with system reliability over the past several years. I don't know, but I don't think AA, CO, DL, or UA have had system failures of this magnitude... though I'm not sure.

Also, while the failure was widespread, it did not appear to last very long... does anyone know how long everything was really down?

Other carriers have had multi-hour failures of separate/component systems.
 
A valid comparison is what kind of experience US' airline peers have had with system reliability over the past several years. I don't know, but I don't think AA, CO, DL, or UA have had system failures of this magnitude... though I'm not sure.

Also, while the failure was widespread, it did not appear to last very long... does anyone know how long everything was really down?

Other carriers have had multi-hour failures of separate/component systems.

The only one in recent memory was the Comair failure.

In December 2004, a glitch developed in Comair's flight crew scheduling software, known as the SBS Legacy System. It forced the company to shut down all operations during the busy holiday season: 1,100 flights were cancelled and 30,000 passengers were grounded. During the disaster the company maintained that the winter storm that hit the Ohio Valley was the main part of the problem, not their IT system. The storm did cause Comair to cancel or delay more than 90 percent of its flights between December 22nd and 24th, but it was only part of the problem, not the single cause of it. On Christmas Day the SBS legacy system, which was nearly two decades old, crashed. What no one at the company knew was that the system crashed because it had reached its limit. The system had an antiquated counter that logged schedule changes, and by that day it had logged more than its monthly limit of 32,768 changes. The weather caused so many schedule changes that the counter finally hit its ceiling and the system shut down. All the flights for December 25th were wiped out, and most of those for the 26th. They had no backup system, and their software vendor needed a full day to repair it.
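That 32,768 figure is the tell: it is exactly 2^15, the capacity of a 16-bit signed counter. The published accounts only say the counter hit its monthly limit, so treat this as an illustrative sketch (in Go, obviously not the language SBS was written in) of how that kind of counter quietly rolls over and takes the application down with it:

```go
package main

import "fmt"

func main() {
	// Hypothetical stand-in for the SBS schedule-change counter.
	// A 16-bit signed integer tops out at 32,767; the 32,768th
	// increment wraps around to a negative number.
	var changes int16 = 32765
	for i := 0; i < 5; i++ {
		changes++ // Go wraps silently on signed-integer overflow
		fmt.Println(changes)
	}
	// Prints: 32766, 32767, -32768, -32767, -32766
	// Any limit check or index built on that counter now sees a
	// nonsense value, and the application falls over even though
	// the rest of the system is perfectly healthy.
}
```

Once a counter like that goes negative, better weather doesn't bring the schedule back; only a vendor fix or a counter reset does.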

By the time the problem was resolved, the damage had already been done. Delta, which had acquired Comair in 2000, lost almost all of the profit Comair had earned in the previous quarter. The system failure cost them $20 million.

If I recall correctly, one of the senior managers at Comair voluntarily tendered his resignation shortly thereafter. Luckily for Comair, US Airways had its legendary "Christmas Meltdown" at the same time, which caught the media's attention more than the Comair event did.

Consider that some of the systems in place have source code dating back to the early 1960s. In fact, Frank Lorenzo bought Eastern Airlines in order to get what Eastern called "System One," which all of you know as SHARES. There are some US agents floating around the system who were actually trained on System One and can make SHARES purr like a kitten if you, as a customer, get stuck. Two that I met are in BOS.

The system failure is yet another example of the high cost of cheap and the spreadsheet mentality of current management.

If you look at US IT from the Beery era going forward, you'll notice a few things.
SHARES was selected based solely on the estimated cost savings over SABRE. Does SHARES now perform as well as SABRE does? From where I sit, the answer is NO! Has it improved since US's other IT debacle, known as the Res Migration? Yeah, it has, mainly because it had no place to go but up. Is it "World Class" or on par with other Star partners? Again, I'd have to say NO.

When you run a bare-bones IT operation, things like redundancy and disaster recovery take a back seat. US has likely saved many millions more than this will cost. The thinking goes: yeah, it cost a few million, but over the last 3 quarters we saved $X by not doing the upgrades, and even though this cost us $Y we still saved enough to meet our targets, so we all get our bonus.

This is the way the current team operates, and you see it at every turn. NOWHERE in their spreadsheet-driven world is there room for the Customer, or for the fact that a great many people who were affected may never fly US Airways again. They don't care, because they're focused on this quarter and not two years from now when Mr. & Mrs. Volvo and their 2.2 kids go to see Mickey. It doesn't mean they're evil, it's just who they are, and like Oprah says, "When people show you who they are, BELIEVE THEM."
 
To be fair to IT, I would imagine that they, at some point, recommended system redundancy (aka backup systems). However, it was probably seen as an unnecessary cost. In the years I was in IT at Texaco, our backup systems had backups for critical operations.
 
To be fair to IT, I would imagine that they, at some point, recommended system redundancy (aka backup systems). However, it was probably seen as an unnecessary cost. In the years I was in IT at Texaco, our backup systems had backups for critical operations.

Excellent point, Jim. Backup and redundancy are "Costs" right up until the day AFTER a system failure, when the finger pointing starts and the dollars start rolling out the door to address the problem.

Having sold into IT environments, it has always amazed me how wide a range of redundancy, backup, and security companies have (or don't have), and the size of the company didn't seem to matter. Over the years I've been in some places so lax that I thought to myself, "If I pulled that big power cable out of the wall, I could likely shut down the company," while others had double and triple redundancy.

There is a company in the Midwest that does disaster recovery, and their computer facility is underground in an abandoned salt mine. Supposedly it can withstand a nuclear blast directly overhead.

This stuff is tricky: how much protection is enough, and what does it cost?
 
If all of those disparate systems are unavailable, my guess is it's a network problem, which may or may not be within US's control.

That assumption only holds up if US lacks proper redundant systems and they all failed. Could happen, but it's unlikely if they were implemented properly.

This stuff is tricky: how much protection is enough, and what does it cost?

A properly sized UPS, generator set, and transfer switch plus the diesel for today's fun can be had for less than $250k.

IOW, if US is claiming a "brownout" did it, they either did not spend the money for backup power, did not test it adequately, or had a cascading failure across their power infrastructure. Even then, you can avoid the whole thing by having your important applications clustered, load balanced, or replicated to a second data center with a different power feed. This is not rocket science for a Fortune 1000 company.
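To make the second-data-center point concrete, here is a minimal sketch of the health-check-and-failover idea. The endpoints and URLs below are made up for illustration; a real deployment would do this with proper clustering, load balancers, or DNS failover rather than a toy loop like this:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Hypothetical endpoints, one per data center on independent power feeds.
// These URLs are placeholders, not anything US actually runs.
var endpoints = []string{
	"https://res-dc1.example.com/health",
	"https://res-dc2.example.com/health",
}

// pickHealthy returns the first endpoint that answers its health check,
// so traffic can shift to the surviving site when one facility goes dark.
func pickHealthy() (string, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	for _, url := range endpoints {
		resp, err := client.Get(url)
		if err != nil {
			continue // site unreachable, try the next data center
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			return url, nil
		}
	}
	return "", fmt.Errorf("no healthy data center found")
}

func main() {
	target, err := pickHealthy()
	if err != nil {
		fmt.Println("total outage:", err)
		return
	}
	fmt.Println("routing traffic to", target)
}
```

The point is that the routing decision is made against live health checks, not against the assumption that the primary facility still has power.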
 
