Friday, August 21, 2009

Bresnan Outage - Update

Over the course of the last week I had the opportunity to speak twice with Shawn Beqaj, Vice President of Public Affairs for Bresnan Communications, about last Thursday and Friday's outage that affected every Bresnan Internet and phone service customer. Every single one.

Mr. Beqaj is a very good P.R. person. He is accessible, gracious under pressure, exceedingly knowledgeable about his company's activities, and sufficiently vague when he feels he needs to be.

We spoke first about the stated root cause of last week's outage: the corruption of DNS routing tables on one of their servers, which in turn sent that corrupt table to all other routing servers, resulting in what is called a broadcast storm that crippled the network.

Mr. Beqaj stated that this problem was not the result of any hacker activity.

I then asked about the relationship between Internet routing and the provision of phone service, and my concerns that the Internet was not as stable and secure as more traditional methods of telephone system connectivity.

Mr. Beqaj replied that Bresnan uses what is called a soft switch to provide dial tone and other services to its telephone customers. These switches are the bridges between the essential core elements of legacy circuit-switched telephone networks, which use a dedicated circuit for each connection made on the network, and packet-switched networks, which leverage multi-modal transmission capabilities to send call traffic, along with data and other media, as packets of data that are assembled and disassembled at the server level before being sent to their respective destinations.

Boy, was that a techie mouthful or what?

Having a basic understanding of these types of switches, I understood where Mr. Beqaj was coming from. He said that Bresnan holds itself up to any Incumbent Local Exchange Carrier, such as Qwest, with regard to the serviceability and reliability of the products and services it provides. Mr. Beqaj also stated that it was "inarguable ... that redundancies in place didn't operate as anticipated" last week.

With regard to notifying 9-1-1 centers about outages, Mr. Beqaj said that different states have different notification requirements, and that he did not know the exact requirements or how they vary from state to state. He did say that Colorado has the most stringent requirements of the states that Bresnan serves. These requirements are available here.
I did verify with the PUC that the outage was reported to them as required.

I also received several e-mails regarding the earlier post about this outage, at least two from people who seem to have a greater degree of knowledge about these systems than the average person. Like so many that are already out on the web at places like DSL Reports, these posts put forth ideas about allegedly reliable backups for DNS routing and 9-1-1 availability, none of which I can independently verify or confirm based on my own knowledge, so they're not appearing here.

Mr. Beqaj displayed little patience for some of these assertions, stating in an e-mail that "... it isn’t productive to enter into a public debate on network architecture among laypeople". He also asserted:
"Bresnan’s engineering team are among the best in the business and what’s more, the more than $1.3 billion we have spent in the infrastructure of our Rocky Mountain footprint is proof positive that this person’s insinuations are not only wrong but offensive as well."
I've worked long enough around technology to know that the most expensive systems are not necessarily the most reliable, especially if they were built for any purpose other than optimizing service to the end user. I've seen expensive trunked radio systems go up to fulfill the ego of a system administrator and the sales quota of an account executive, only to see the system scrapped well short of its intended life because it couldn't serve the needs of the public safety professionals on the other end.

This isn't to say that the Bresnan network is a boondoggle; on the contrary, the company is normally very professional and reliable, with products and services that are responsive to the customer's needs and/or desires, to the extent that applicable law and competent business practices will allow.

I do think that I pay too much, and that services should be provided with more choice available to the subscriber. For example, please deliver me from HSN, QVC, and the Jewelry Channel. I'll take Free Speech TV instead.

I'm expecting that service delivery and reliability will improve significantly in the wake of this incident, but to quote Mr. Beqaj, "the smartest, best system engineers never speak in absolutes". I'm hoping that makes them Jedi Knights, while the bean counters, who know that 2+2 = 4, are the Sith Lords.

Nevertheless, I'm making sure that the Colorado PUC is informed of my experience with this outage, and what I would like to see happen in the future. This includes:

  • Complete disclosure of the nature and cause of the failure, the corrective actions taken to address it, and what preventative measures are planned or underway.
  • The nature of contingency plans to allow for the protection of critical infrastructure and provision of 9-1-1 service, as well as telephone access to customer service and network operations personnel.
Thanks to Shawn Beqaj for his accessibility. Have a good weekend ahead.
