Cenic.org

CalREN Service Restoration Expectation and Priorities

This document has been prepared at the request of, and in consultation with, the CENIC Technical Advisory Councils. It describes in general terms the extent to which local disaster recovery planning should or should not assume the availability of the CalREN networks and related services in the event of a disaster.

I. Geographic Scope

The expected availability of CalREN network services will vary with the scope and nature of a disaster. This document does not attempt to list every possible disaster; rather, it provides information about general classes of disaster based on their geographic scope.

Local Disaster Impacting a Customer Site:
In this scenario, a local issue (e.g., power outage, fire, or flood) results in an outage for a particular site or a small number of sites. Network connectivity for other sites will generally be unaffected. Restoration time may be brief or prolonged depending on the nature of the disaster.

Local Disaster Impacting a CalREN Hubsite or AT&T Central Office:
In this scenario, a local issue results in an outage at a CalREN hubsite (an aggregation point at which many customer sites connect to CalREN) or at an AT&T facility that serves multiple sites. All connections served through the impacted site would be down for a period on the order of hours or days. Customer sites with a diversely routed connection that does not use the impacted site would route through that alternate connection. Network services (e.g., ISP or peering connections, video gatekeepers, domain name servers) located in an impacted hubsite would be down; however, critical servers are located in multiple hubsites, and the alternate servers would assume the load.

Local Disaster Impacting the CalREN NOC:
In this scenario, a local issue results in the CalREN Network Operations Center (NOC) being inaccessible or uninhabitable. CENIC would activate its NOC disaster recovery plan and relocate services to a nearby campus. NOC services could be unavailable for several hours during this recovery; however, network services would not be impacted by an outage of the NOC.

Widespread Disaster:
A widespread disaster (for example, a major earthquake in the Los Angeles area) would result in outages for many sites and possibly multiple hubsites. Although CalREN is designed to survive most single points of failure, a large disaster of this sort would probably render large portions of the network inaccessible, perhaps for a prolonged period of time.

II. Third-party Services

CENIC makes use of contracted services for many purposes: telco circuits, colocation space, utility power, etc. While in many cases a service level for restoration is in place, a sufficiently large disaster would strain the restoration ability of service providers and it is possible that those service levels would not be met. For example, a typical colocation space maintains 8-16 hours of generator fuel on-site; while in a limited-scope disaster this supply can be replenished, in a more widespread disaster other customers would be competing for that same limited supply of fuel.

III. Restoration Priorities

Since the original development of the CalREN-DC and CalREN-HPR networks, it has been recognized that the DC network requires a higher level of reliability, while the more experimental HPR network (and ad-hoc services made available via CalREN/XD) can tolerate larger outages. This principle was incorporated into the design of the various networks.

The CENIC TACs have provided CENIC with guidance as to the relative priority of service restoration for the CalREN networks, as follows:

The CENIC TACs recognize the difference between the various CalREN networks and support the higher-availability nature of DC and the more experimental, research-oriented nature of HPR and XD. Although the SLA spells out these differences for ordinary events, the TACs would like to explicitly state that these differences should also be observed for extraordinary, outside-SLA events, such as major disasters and other serious events that cause widespread network outages. The TACs believe that CENIC resources should initially be devoted to restoration of service of the DC network. Where there is infrastructure common to multiple networks, it is certainly acceptable to devote resources to restoring the function of that infrastructure regardless of the fact that it may support networks other than DC. However, where there is a choice as to whether to devote resources to recovering service on either DC or other networks, it is the TACs’ expectation that resources will first be targeted at the DC network.

The TACs expect that restoration of services on a particular network refers to all supported core services of that network. Given that IPv4 and IPv6 represent the core protocols on the DC network, the TACs expect that the restoration of service will initially include both IPv4 and IPv6 core protocols. The TACs further expect that services that operate on top of the core protocols, such as video conferencing and other services that CENIC operates, can be restored at the discretion of CENIC staff once the core protocols are operating.
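
For sites that want to reflect this guidance in their own disaster recovery planning tools, the ordering described above can be sketched roughly as follows. This is only an illustration and not a CENIC procedure: the names Task, NETWORK_RANK, LAYER_RANK, and restoration_order are hypothetical, and two points are assumptions rather than statements from the guidance, namely that HPR and XD are ranked equally and that DC overlay services are placed ahead of core restoration on the other networks.

```python
from dataclasses import dataclass

# Hypothetical ranks: a lower value is restored first.
# DC precedes HPR and XD per the TAC guidance. Ranking HPR and XD
# equally is an assumption; the guidance does not order them.
NETWORK_RANK = {"DC": 0, "HPR": 1, "XD": 1}

# Shared infrastructure first, then core protocols (IPv4 and IPv6 on DC),
# then services that run on top of the core protocols.
LAYER_RANK = {"infrastructure": 0, "core": 1, "overlay": 2}


@dataclass(frozen=True)
class Task:
    name: str
    networks: frozenset  # networks that depend on this item
    layer: str           # "infrastructure", "core", or "overlay"


def priority(task):
    # An item inherits the rank of the highest-priority network it
    # supports, so infrastructure common to DC and another network is
    # treated as DC work.
    network_rank = min(NETWORK_RANK[n] for n in task.networks)
    return (network_rank, LAYER_RANK[task.layer])


def restoration_order(tasks):
    # sorted() is stable, so items with equal priority keep their
    # original relative order.
    return sorted(tasks, key=priority)
```

Sorting on a (network, layer) pair keeps the rule that DC comes first dominant, and uses the layer only to order work within the same network rank. In practice, overlay services are restored at the discretion of CENIC staff, so their position in such a list is advisory at most.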

IV. Financial Responsibility

The financial responsibility for service restoration generally rests with CENIC, its contractors, and its Charter Associates.