Advanced Services Startup, Failure, and Recovery

Contents

1   Advanced Services Startup, Failure, and Recovery
2   Verifying the Status of ASPs, ASP Pools, and ASP Groups
3   Managing ASP Failure and Recovery
3.1   Handling ASP Failures
3.2   Monitoring for ASP Down Alarms
3.3   Understanding ASP Failover Behavior
3.4   Automatic Software Reset of ASPs
3.5   ASP Shutdown and Reload

Glossary

Reference List

Copyright

© Ericsson AB 2009–2011. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.

Disclaimer

The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

Trademark List
SmartEdge is a registered trademark of Telefonaktiebolaget LM Ericsson.
NetOp is a trademark of Telefonaktiebolaget LM Ericsson.

1   Advanced Services Startup, Failure, and Recovery

This document describes the behavior of the Advanced Services Processor (ASP) during startup, failure, and recovery. The ASP is the device that provides services on the Advanced Services Engine (ASE) card. Every ASE card has two ASPs. For Security Service, each ASP can be configured to provide separate instances of the Security Service and support for high availability or load balancing.

For Distributed Control Plane (DCP), all ASPs are consumed by the DCP service type.


Caution!
Risk of substantial delay when an ASE card is restarted. An ASE card can take several minutes to complete a shutdown and restart.

2   Verifying the Status of ASPs, ASP Pools, and ASP Groups

With the NetOp™ EMS software, use the card active view, ASP pool active view, and ASP group active view to verify the status of ASPs. These views provide a read-only display of the current status of the ASPs on a card, in an individual ASP pool, or in an individual ASP group; see Reference [1].

With the Command Line Interface (CLI) of the SmartEdge® OS, use the show asp, show asp pool, and show asp group commands; see Reference [2].

The card active view and the show asp detail command display the following information for each ASP on the card:

The ASP pool active view and the show asp pool detail command display the following information:

The ASP group active view and the show asp group detail command display the following information:

3   Managing ASP Failure and Recovery

This section describes how to manage ASP failure and understand the impact an ASP failure can have. It also provides information you can use to minimize that impact.

3.1   Handling ASP Failures

An operational ASP may become non-operational due to operator action, software faults, or hardware faults.

All events that render an ASP non-operational (shutdown of the ASP, shutdown of the ASE card, removal of the physical ASE card, deletion of an ASP configuration, deletion of an ASE card or ASP from the inventory, software faults, failure of the ASE card hardware, and so on) are mapped to either of the following states:

The fault handling differs depending on whether a backup ASP is available and whether the failure is transient or permanent:

When an ASP recovers after it is deemed to have permanently failed, or when a new ASP is added to the ASP pool, the ASP is assigned as follows:

Rebalancing causes IPsec tunnels or subscriber sessions to go down before they are reestablished on the new ASP. In both cases, there is a potential for traffic loss for the affected tunnels or subscribers.

You can use the show asp, show asp group, and show asp pool commands to check the operational state of the ASP; see Section 2.

3.2   Monitoring for ASP Down Alarms

A critical alarm is raised when an ASP on a configured ASE card goes down for any reason other than an explicit shutdown of the card by a user.

See Table 1 for details about each of the two possible alarms.

Table 1    ASP Down Alarm Descriptions

Description    Severity    Probable Cause       Service Affecting
ASP 1 down     Critical    Processor Problem    Yes
ASP 2 down     Critical    Processor Problem    Yes

ASP down alarms are raised for the slot of the SmartEdge router that contains the ASE card of the failed ASP.

Use the Fault view in the NetOp client to monitor alarms. You can filter this view to show only alarms, and sort the view by severity. The ASP down alarm will appear in the Fault view when Network, or the appropriate proxy or domain, is selected in the network navigator, as well as when the affected SmartEdge router or slot is selected in the object navigator. For more information, see the "Faults" chapter of Reference [3].
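The filter-and-sort workflow described above can also be mimicked offline, for example on alarm data copied out of the Fault view. The following Python sketch is not NetOp functionality: the Alarm record, the SEVERITY_RANK ordering, and the slot numbers are assumptions made for illustration. Only the alarm descriptions and their severities are taken from this document.

```python
from dataclasses import dataclass

# Assumed ranking, most severe first. This ordering is an illustration,
# not something taken from the NetOp documentation.
SEVERITY_RANK = {"Critical": 0, "Major": 1, "Minor": 2, "Warning": 3}


@dataclass(frozen=True)
class Alarm:
    slot: int                # hypothetical slot number of the ASE card
    description: str
    severity: str
    service_affecting: bool


def asp_down_alarms(alarms):
    """Return only the 'ASP <n> down' alarms, most severe first, then by slot."""
    selected = [
        a for a in alarms
        if a.description.startswith("ASP ") and a.description.endswith(" down")
    ]
    unknown = len(SEVERITY_RANK)  # unknown severities sort last
    return sorted(
        selected,
        key=lambda a: (SEVERITY_RANK.get(a.severity, unknown), a.slot),
    )


# Slot numbers below are made up for the example.
sample = [
    Alarm(4, "ASP 2 down", "Critical", True),
    Alarm(2, "ASP 1 missing service association", "Major", True),
    Alarm(2, "ASP 1 down", "Critical", True),
]

result = asp_down_alarms(sample)
for alarm in result:
    print(alarm.slot, alarm.description, alarm.severity)
```

Sorting on a (severity rank, slot) pair keeps alarms for the same slot together within each severity level, which mirrors the fact that ASP down alarms are raised against the slot that contains the ASE card.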

When an ASP alarm is raised, details are available that indicate the root cause of the failure.

Example 1   ASP Fault Isolation

Source: Card
Severity: Major
Description: ASP 1 missing service association
Service Affecting: TRUE

Source: Card
Severity: Major
Description: ASP 2 missing service association
Service Affecting: TRUE
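If fault details such as those in Example 1 are captured as text, they can be split into records for further checking. The following Python sketch assumes that the records always use the "Key: Value" layout shown in Example 1 and are separated by blank lines; the function name parse_fault_records is hypothetical and is not part of any Ericsson tool.

```python
def parse_fault_records(text):
    """Split blank-line-separated 'Key: Value' blocks into dictionaries."""
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the record being built, if any.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon are ignored
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


# Text copied from Example 1.
EXAMPLE = """\
Source: Card
Severity: Major
Description: ASP 1 missing service association
Service Affecting: TRUE

Source: Card
Severity: Major
Description: ASP 2 missing service association
Service Affecting: TRUE
"""

records = parse_fault_records(EXAMPLE)
service_affecting = [
    r["Description"] for r in records if r.get("Service Affecting") == "TRUE"
]
print(service_affecting)
```

Using partition on the first colon keeps any later colons inside the value, so a description that itself contains a colon is not truncated.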

3.3   Understanding ASP Failover Behavior

The following examples illustrate many of the possible ASP failover behaviors:

3.4   Automatic Software Reset of ASPs

An automatic software reset of an ASP occurs when a critical application or one of the data plane cores fails.

An automatic software reset is triggered when:

3.5   ASP Shutdown and Reload

You can use the following commands available from the SmartEdge OS to shut down the ASE card or its ASPs and restart the card:

The shutdown process for all of these commands takes two minutes to complete. When you issue a shutdown command, a message warns you not to physically remove the ASE card from the SmartEdge chassis for at least two minutes. This delay allows the ASP configurations to be backed up to the flash memory on the ASE card. When reloading an ASE card, ensure that traffic processing is not impacted while the card is out of service.
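The two-minute rule above can be turned into a simple check in an operations script. The following Python sketch is only an illustration: the function names and the timestamp are hypothetical, and the result is a lower bound based on elapsed time. It does not confirm that the backup to flash memory has actually finished.

```python
from datetime import datetime, timedelta

# Minimum wait taken from the shutdown message described above.
MIN_WAIT = timedelta(minutes=2)


def earliest_safe_removal(shutdown_issued_at, min_wait=MIN_WAIT):
    """Return the earliest time at which the card may be physically removed."""
    return shutdown_issued_at + min_wait


def safe_to_remove(shutdown_issued_at, now, min_wait=MIN_WAIT):
    """Return True once at least min_wait has elapsed since the shutdown command."""
    return now >= earliest_safe_removal(shutdown_issued_at, min_wait)


# Hypothetical time at which the shutdown command was issued.
issued = datetime(2011, 1, 1, 12, 0, 0)

print(earliest_safe_removal(issued))
print(safe_to_remove(issued, issued + timedelta(seconds=90)))
print(safe_to_remove(issued, issued + timedelta(minutes=2)))
```

Passing the current time as an argument, rather than reading the clock inside the function, keeps the check easy to test and makes the wait period explicit in the calling script.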


Glossary

ASE
Advanced Services Engine

ASP
Advanced Services Processor

CLI
Command Line Interface

IPsec
Internet Protocol Security

Reference List

[1] Advanced Services Configuration and Operation Using the NetOp EMS Software, 1553-CRA 119 1170/1
[2] Security Service Command Reference, 1/190 80-CRA 119 1170/1-V1
[3] Fault Management, 6/1543-CRA 119 1171/1