Copyright
© Ericsson AB 2009–2011. All rights reserved. No part of this document may be reproduced in any form without the written permission of the copyright owner.
Disclaimer
The contents of this document are subject to revision without notice due to continued progress in methodology, design and manufacturing. Ericsson shall have no liability for any error or damage of any kind resulting from the use of this document.

1 Introduction
The SmartEdge® Storage Engine (SSE) provides Network File System (NFS) services to store large amounts of data for clients and applications internal to the SmartEdge router, including the Cross-Connect Route Processor controller card (XCRP4).
The SSE card stores Call Data Records (CDRs) from the SmartEdge router. The SSE can store a large amount of data records for a number of hours without requiring file extraction. The SSE can also store logs and bulk statistics, as well as event-based performance statistics. Data can be extracted through FTP or SFTP.
The SSE can be installed in any I/O slot in the SmartEdge 600, SmartEdge 1200 or SmartEdge 1200H chassis. It includes two field-replaceable Hard Disk Drives (HDDs) mounted in an HDD carrier that can be inserted and ejected from the SSE card without ejecting the SSE card; see Reference [1] for information on installation. The SSE card can operate with one or two HDDs inserted; each HDD can be replaced separately without interrupting read/write activity.
The SSE card can be configured for redundancy; see Section 2.5.
This document describes how to configure the SSE card, retrieve data records, display SSE information, and perform maintenance tasks. It also describes diagnostics, statistics, logging, and alarms for the SSE card and SSE disks.
2 SSE Card Configuration
Configuring an SSE card involves provisioning the SSE card, partitioning the SSE disks, and setting up a redundancy scheme.
- Note:
- If the SSE card is installed prior to provisioning it, check
for any alarms against the card.
[local]Redback(config)#show system alarm all
If the card is down, or other related problems are reported, see Section 5 for information about how to clear the fault before you attempt to provision the card.
2.1 Provision an SSE Card
You can also provision an SSE card using NetOp™ EMS; see Reference [7].
- Provision an SSE card before or after installing the card
in the SmartEdge chassis to make the SSE card operational.
[local]Redback(config)#card sse slot
- Create the SSE group. Indicate whether the group supports
network or disk redundancy. Redundancy
must be specified when provisioning the group for the first time.
- Note:
- The Converged Packet Gateway (CPG) supports a single hard disk for each SmartEdge Storage Engine (SSE) card.
[local]Redback(config)#sse group group_name {network-redundant [raid-0] | disk-redundant}
- Optional. Provide a description of the group.
[local]Redback(config-SE-group)#description text
- Optional. Partition the SSE disks. See Section 2.2.
- Optional. For network-redundant SSE groups, configure
the redundant group to always use the primary SSE card as active when
available.
[local]Redback(config-SE-group)#revert
- Assign the SSE card to an SSE group:
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- Assign the SSE card to an SSE group. Each SSE card can
only be assigned to one SSE group. Each SSE group can have at most
two SSE cards assigned.
[local]Redback(config-card)#bind sse group group_name [secondary]
- Commit the transaction.
To view the configured SSE card, enter the following command in any mode:
>show configuration sse
2.1.1 Configuration Examples
Configure a revertive network-redundant SSE.
[local]Redback(config)#card sse 2
[local]Redback(config-card)#exit
[local]Redback(config)#card sse 3
[local]Redback(config-card)#exit
[local]Redback(config)#sse group sse_group_1 network-redundant
[local]Redback(config-SE-group)#description SSE group 1
[local]Redback(config-SE-group)#partition p01 size 5 disk 1
[local]Redback(config-SE-partition)#alarm low-partition-space raise-at 70 clear-at 65
[local]Redback(config-SE-partition)#exit
[local]Redback(config-SE-group)#partition p02 size 5 disk 2
[local]Redback(config-SE-partition)#alarm low-partition-space raise-at 70 clear-at 65
[local]Redback(config-SE-partition)#exit
[local]Redback(config-SE-group)#revert
[local]Redback(config-SE-group)#exit
[local]Redback(config)#card sse 2
[local]Redback(config-card)#bind sse group sse_group_1
[local]Redback(config-card)#exit
[local]Redback(config)#card sse 3
[local]Redback(config-card)#bind sse group sse_group_1 secondary
[local]Redback(config-card)#commit
Configure a disk-redundant SSE.
[local]Redback(config)#card sse 4
[local]Redback(config-card)#exit
[local]Redback(config)#sse group sse_group_2 disk-redundant
[local]Redback(config-SE-group)#exit
[local]Redback(config)#card sse 4
[local]Redback(config-card)#bind sse group sse_group_2
[local]Redback(config-card)#commit
Configure a standalone SSE card with one partition.
[local]Redback(config)#card sse 11
[local]Redback(config-card)#exit
[local]Redback(config)#sse group sse_group_3 network-redundant
[local]Redback(config-SE-group)#description SSE group 3
[local]Redback(config-SE-group)#partition p01 size 5
[local]Redback(config-SE-partition)#alarm low-partition-space raise-at 70 clear-at 65
[local]Redback(config-SE-partition)#exit
[local]Redback(config-SE-group)#exit
[local]Redback(config)#card sse 11
[local]Redback(config-card)#bind sse group sse_group_3
[local]Redback(config-card)#commit
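The configuration examples all follow the same pattern: provision the cards, create the group, then bind each card and commit. As a minimal sketch, the following Python helper assembles that command sequence from the syntax shown in Section 2.1. The function name and its parameters are hypothetical and are not part of SmartEdge OS or NetOp EMS; the helper only builds text and does not connect to a router.

```python
def sse_group_commands(group_name, slots, redundancy="network-redundant", raid0=False):
    """Build the CLI lines that provision SSE cards and bind them to one group.

    Hypothetical helper: it only formats the commands documented in Section 2.1.
    """
    if redundancy not in ("network-redundant", "disk-redundant"):
        raise ValueError("redundancy must be network-redundant or disk-redundant")
    if not 1 <= len(slots) <= 2:
        raise ValueError("an SSE group can have at most two SSE cards")
    if raid0 and redundancy != "network-redundant":
        raise ValueError("raid-0 is an option of network-redundant groups only")

    lines = []
    # Provision each card first.
    for slot in slots:
        lines += [f"card sse {slot}", "exit"]

    # Create the group; redundancy is specified when the group is first created.
    group_line = f"sse group {group_name} {redundancy}"
    if raid0:
        group_line += " raid-0"
    lines += [group_line, "exit"]

    # Bind the first card as primary and the second (if any) as secondary.
    for index, slot in enumerate(slots):
        bind = f"bind sse group {group_name}"
        if index == 1:
            bind += " secondary"
        lines += [f"card sse {slot}", bind]
        if index < len(slots) - 1:
            lines.append("exit")

    lines.append("commit")
    return lines


if __name__ == "__main__":
    for line in sse_group_commands("sse_group_1", [2, 3]):
        print(line)
```

The helper rejects a third card because each SSE group accepts at most two SSE cards. Partition and revert commands are left out to keep the sketch short.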
2.2 Partition the SSE Disks
You can create multiple partitions on each HDD and configure the partition size.
- Note:
- CPG supports a single hard disk for each SSE card.
- Enter SSE configuration mode.
[local]Redback(config)#sse group group_name
- Partition the SSE disks.
[local]Redback(config-SE-group)#partition name [size size_value] [disk disk_num] [non-mirror]
- Configure the partition to generate an alarm when the
partition is low in space; see Table 6.
[local]Redback(config-SE-partition)#alarm low-partition-space raise-at raise_percentage clear-at clear_percentage
- Commit the transaction.
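The raise-at and clear-at percentages form a hysteresis band, so the alarm does not toggle repeatedly when usage hovers near one threshold. The following Python sketch illustrates the idea. It assumes that the alarm is raised when usage reaches raise_percentage and cleared when usage falls to clear_percentage or below; the exact comparison used by the SSE software is not stated in this document, and the function name is hypothetical.

```python
def update_low_space_alarm(alarm_active, percent_used, raise_at, clear_at):
    """Return the new low-partition-space alarm state for one usage sample.

    Assumed semantics (not confirmed by the product documentation):
    raise at or above raise_at, clear at or below clear_at, otherwise hold.
    """
    if clear_at >= raise_at:
        raise ValueError("clear_at is expected to be lower than raise_at")
    if not alarm_active and percent_used >= raise_at:
        return True
    if alarm_active and percent_used <= clear_at:
        return False
    return alarm_active


if __name__ == "__main__":
    state = False
    for used in (60, 70, 68, 66, 65, 69):
        state = update_low_space_alarm(state, used, raise_at=70, clear_at=65)
        print(used, "raised" if state else "clear")
```

With raise-at 70 and clear-at 65, a partition that climbs to 70 percent and then drops to 66 percent keeps the alarm raised; under these assumptions the alarm clears only once usage is back at 65 percent.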
2.3 Delete a Partition
Delete a partition if you need to free disk space, if you plan to create another partition with the same name but a different size, if the partition has read errors, or if the partition has failed on-demand diagnostics (ODD) test cases. Any data in the deleted partition is lost.
[local]Redback#delete partition sse slot disk_num partition_name
To delete a partition without deleting data, use the no partition command in SSE group configuration mode. If you configure the same partition under the same group in the future, the data from the previously configured partition is available.
[local]Redback(config-SE-group)#no partition name
2.4 Deprovision an SSE Card
You cannot deprovision an SSE card while it is bound to an SSE group. In addition, if the card is not configured for redundancy, you must shut it down with the shutdown command before deprovisioning it.
Caution! Risk of data corruption and loss of charging records. Removing an SSE card without first shutting it down can cause file corruption. To avoid the risk, perform the following steps:
- Remove the association between the SSE card and the SSE
group.
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- If the SSE card is assigned to an SSE group, remove
the association.
[local]Redback(config-card)#no bind sse group
- Alternatively, if the SSE card is not assigned to an
SSE group, shut it down.
[local]Redback(config-card)#shutdown
- Wait 15 seconds for the card to shut down completely.
- Deprovision the SSE card.
[local]Redback(config)#no card sse slot
2.5 Redundancy
The SSE provides high-availability file storage through a redundancy scheme using Redundant Array of Independent Disks (RAID) 1 on the same slot or across different slots. The SSE supports the following redundancy schemes:
- Disk RAID 1 redundancy
- Network RAID 1 redundancy, with or without RAID 0 disks
- Nonredundant
2.5.1 Disk RAID 1 Mode
Disk RAID 1 redundancy mirrors data to both SSE disks.
In this configuration, both SSE disks must be compatible with each other and supported. If the SSE disks are incompatible, one disk will be started while the other is put out of service. If the SSE disks do not match but are compatible, the SSE disks will be brought up in a degraded fashion, causing a minor mismatch alarm.
Configure Disk RAID 1 Redundancy
[local]Redback(config)#sse group group_name disk-redundant
Hot Swap in Disk RAID 1 Redundancy
To swap out a hard drive without shutting down the SSE card in Disk RAID 1 mode, first shut down the hard drive to detach it from the redundancy scheme.
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- Put the SSE disk in maintenance mode by disabling the
SSE disk.
[local]Redback(config-card)#shutdown [disk disk_num]
- Commit the transaction.
- Remove the disabled SSE disk and install a replacement SSE disk; see Reference [1] for instructions.
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- Take the SSE disk out of maintenance mode by enabling
the SSE disk.
[local]Redback(config-card)#no shutdown [disk disk_num]
- Commit the transaction.
In a normal maintenance procedure, the newly inserted SSE disk is either unformatted or blank. If the replacement disk has pre-existing data on it, that data cannot be accessed. If there is enough space to create the configured partitions, the partitions are created and the data is mirrored from one SSE disk to the other; if there is not enough space on the newly inserted disk, the data is not mirrored, but the partitions continue to serve clients from the first disk.
If the partitions on the new SSE disk are matched (or blank), but the inserted SSE disk has a different speed than the existing one, a minor MISMATCH alarm is raised for supported SSE disks.
Caution! Removing an SSE disk without first putting it into maintenance mode (shutdown), or shutting down an active SSE disk during data synchronization, can cause data loss or data corruption.
2.5.2 Network RAID 1 Mode
Network RAID 1 redundancy mirrors data onto two separate SSE cards through an internal IP network. By default, an SSE card maintains two separate hard drives. You can optionally configure the two hard drives on an SSE card with RAID 0, which creates a logical group to provide double the capacity.
In Network RAID 1 redundancy, all partitions on the same SSE card will be all-primary or all-secondary.
You can run the SSE card with only one hard drive inserted and operational but still participate in the Network RAID 1 redundancy scheme; however, any faults or failures can result in data loss or corruption. RAID 0 requires both SSE disks to be inserted and operational.
You cannot combine Disk RAID 1 redundancy and Network RAID 1 redundancy.
In Network RAID 1 redundancy, the standby SSE becomes the active SSE due to a failure condition (failover) or intentional switchover.
You can configure an SSE card with nonmirrored partitions using the partition command in SSE group configuration mode. On failover or switchover, the backup partition takes over without any data from the previously active disk.
Configure Network RAID 1 Redundancy
- To configure Network RAID 1 redundancy, configure the
SSE group as network-redundant.
[local]Redback(config)#sse group group_name network-redundant [raid-0]
- Assign each SSE card to the same SSE group
[local]Redback(config)#card sse slot
[local]Redback(config-card)#bind sse group group_name [secondary]
In a network-redundant configuration, the active SSE card displays a green LED, and the standby SSE card displays a yellow LED. Verify the status using the show hardware card detail command.
The following conditions can lead to failover in Network RAID 1 redundancy:
- Critical problems detected in runtime diagnostics
- Critical condition detected in XCRP environmental monitoring
- Software exception, connectivity failure, or timeout
- Card removal
- HDD removal
- HDD failure
- Card or disk shutdown on the active unit
- Note:
- If only one disk is shut down and the other disk is running on RAID 1, no switchover occurs.
- Active card is removed from SSE group (no bind sse group)
- reload card command issued on the active unit
Optional. On primary SSE failover, the secondary takes the active redundancy state and continues to support data transactions on the SSE group. For network-redundant SSE groups, you can configure the system to use the primary SSE as the active device, when it becomes available again.
- Enter SSE configuration mode.
[local]Redback(config)#sse group group_name
- Optional for network-redundant SSE groups. Configure the
revert command to use the primary as the active device, when it becomes
available again.
[local]Redback(config-SE-group)#revert
Data synchronization runs on all mirrored partitions as the standby SSE card comes into service, and the revert remains pending until synchronization is complete on all mirrored partitions. Reverting is prevented after a successful manual switchover from primary to secondary; in that case, you must use a manual switchover to switch the active SSE card back from secondary to primary.
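The revert rules can be summarized as a small decision function. The following Python sketch is only a model of the behavior described in this section, assuming the secondary SSE card is currently active; the function and parameter names are hypothetical and do not correspond to any SmartEdge OS interface.

```python
def revert_action(revert_configured, primary_in_service, partitions_synced,
                  last_switch_was_manual):
    """Model the revert decision while the secondary SSE card is active.

    partitions_synced holds one boolean per mirrored partition.
    Returns "revert", "pending", or "none".
    """
    # Without the revert command, or without an in-service primary, nothing happens.
    if not revert_configured or not primary_in_service:
        return "none"
    # A successful manual switchover to the secondary blocks automatic revert.
    if last_switch_was_manual:
        return "none"
    # Revert waits until every mirrored partition has finished synchronizing.
    if not all(partitions_synced):
        return "pending"
    return "revert"


if __name__ == "__main__":
    print(revert_action(True, True, [True, False], False))  # one partition still syncing
    print(revert_action(True, True, [True, True], False))   # all partitions in sync
```

When the function returns "none" because of a manual switchover, the operator switches back with the sse group switch-over command.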
To manually switch over to the primary or secondary SSE:
[local]Redback#sse group switch-over group_name
Hot Swap in Network RAID 1 Redundancy
To swap out a hard drive in Network RAID 1 mode, shut down the SSE card to detach it from the redundancy scheme, swap out one or both hard drives, then take the SSE card out of maintenance mode.
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- Put the SSE card in maintenance mode by disabling the
SSE card.
[local]Redback(config-card)#shutdown
- Commit the transaction. If the SSE card is active, the peer SSE card will take over its active role.
- Remove the disabled SSE card and install one or two replacement SSE disks; see Reference [1] for instructions.
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- Take the SSE card out of maintenance mode.
[local]Redback(config-card)#no shutdown
- Commit the transaction.
Caution! Removing a hard drive without first putting it into maintenance mode (shutdown) can cause data loss or data corruption.
2.5.3 Nonredundant Mode
You can configure the SSE card in a nonredundant mode, either with or without RAID 0 enabled. Nonredundant mode is a subset of network-redundant mode with only one card bound to the group. If RAID 0 is enabled, the SSE card appears as a single hard drive; if RAID 0 is not enabled, the SSE card appears as two separate hard drives. You can also insert a single hard drive; however, any faults or failures can result in data loss or corruption. RAID 0 requires two SSE disks.
Configure a Nonredundant SSE Card
- To configure an SSE card in nonredundant mode, configure
the SSE group as network-redundant.
[local]Redback(config)#sse group group_name network-redundant [raid-0]
- Assign one SSE card to the SSE group
[local]Redback(config)#card sse slot
[local]Redback(config-card)#bind sse group group_name [secondary]
To swap out a hard drive without shutting down the SSE card in nonredundant mode, shut down an individual SSE disk and swap out the hard drive. While the SSE disk is in maintenance, data recording to the disabled SSE disk is not available.
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- Put the SSE disk in maintenance mode by disabling the SSE disk.
[local]Redback(config-card)#shutdown [disk disk_num]
- Commit the transaction.
- Remove the disabled SSE disk and install a replacement SSE disk; see Reference [1] for instructions.
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- Take the SSE disk out of maintenance mode by enabling
the SSE disk.
[local]Redback(config-card)#no shutdown [disk disk_num]
- Commit the transaction.
The system reconfigures the hard drive.
Caution! Removing a hard drive without first putting it into maintenance mode (shutdown) can cause data loss or data corruption.
3 Data Record Retrieval
Mount points are created automatically when you create partitions on the SSE disks. To access the directory and manage directories and files, use the exec-level CLI commands listed in Table 1.
Command | Description |
---|---|
cd | Change current working directory |
copy | Copy a file |
delete | Delete a file |
directory | List contents of a directory |
format | Format an SSE disk |
edit | Edit a file with vi |
mkdir | Make a directory |
more | Display the contents of a file |
mount | Mount file system |
pwd | Display current working directory |
rename | Rename a file or directory |
rmdir | Remove a directory |
ssh | Execute SSH/SSHD commands |
telnet | Telnet to host |
unmount | Unmount file system |
The directory structure for accessing partitions on the SSE card is /sse/sse_group_name/sse_partition_name. The following example displays a list of the files in a directory on the SSE card for the cde partition in an SSE group named sae_app.
[local]Redback#cd /sse/sae_app/cde
[local]Redback#dir
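A client script that needs to address files on a partition can build the path from the group and partition names. The following Python sketch shows one way to do this, following the /sse/sse_group_name/sse_partition_name layout described above. The function name is hypothetical, and the file name in the usage example is invented for illustration.

```python
import posixpath


def sse_partition_path(group_name, partition_name, *parts):
    """Return /sse/<group>/<partition>[/...] for the given names."""
    for name in (group_name, partition_name):
        # Reject empty names and names that would change the directory depth.
        if not name or "/" in name:
            raise ValueError(f"invalid name: {name!r}")
    return posixpath.join("/sse", group_name, partition_name, *parts)


if __name__ == "__main__":
    print(sse_partition_path("sae_app", "cde"))
    # "records.cdr" is a made-up file name used only for this example.
    print(sse_partition_path("sae_app", "cde", "records.cdr"))
```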
External clients can extract data records by connecting to the Ethernet management port using SSH, logging on, and then using FTP to connect to a remote system, to which the data will be transferred. The client is responsible for deleting files as required—for example, as a periodic background cleanup operation, when the partition is running low on disk space, or when switchover or failover occurs. The SSE card does not delete files.
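Because the SSE card never deletes files, the client needs its own cleanup policy. The following Python sketch shows one possible policy: delete the oldest files first until usage falls to a target percentage. It is an illustration only. The function name is hypothetical, the file list is assumed to have been gathered by the client beforehand, and used space is approximated as the sum of the file sizes, which ignores file system overhead.

```python
def files_to_delete(files, capacity_bytes, target_percent):
    """Pick the oldest files to delete until usage is at or below the target.

    files is a list of (name, mtime, size_bytes) tuples.
    Returns the selected file names, oldest first.
    """
    used = sum(size for _name, _mtime, size in files)
    target_bytes = capacity_bytes * target_percent / 100
    selected = []
    # Walk the files from oldest to newest modification time.
    for name, _mtime, size in sorted(files, key=lambda item: item[1]):
        if used <= target_bytes:
            break
        selected.append(name)
        used -= size
    return selected


if __name__ == "__main__":
    sample = [("a", 1, 40), ("b", 2, 30), ("c", 3, 20)]
    print(files_to_delete(sample, capacity_bytes=100, target_percent=65))
```

A client would normally delete only files that it has already transferred successfully; that check is outside this sketch.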
4 Display SSE Information
Show commands display a variety of information for the SSE card. Enter show commands in any mode.
To display the following information... | Enter this command... |
---|---|
Administrator sessions on a system. | show administrators [sftp-session] [active] |
Chassis installed and configured cards and their status. | show chassis |
Summary of power allocation for the current SmartEdge chassis configuration. | show chassis power [inventory] |
Current configuration of the SmartEdge router or the contents of a previously saved configuration file on the local file system. | show configuration [card [slot]] [verbose] |
Current configuration of all SSE groups on the system. | show configuration sse |
Results of the last completed test from the diag on-demand card slot [disk disk_num] [level level_count] [loop loop_count] command. | show diag on-demand card slot [disk disk_num] [level level_count] [loop loop_count] |
Disk counters for the SSE card. | show disk sse counters slot [disk_num] |
Disk information for the SSE card. | show disk sse slot [disk_num] |
Information about the system hardware. | show hardware [card slot] [detail] |
SSE group or SSE partition counters. | show sse {group \| partition} counters [group_name [partition_name]] |
SSE group or SSE partition information. | show sse {group \| partition} [group_name [partition_name]] [detail] |
System-level alarms. | show system alarm [all \| sse [group_ID [partition_ID]]] |
5 Fault Management
This section describes the alarms reported by SSE cards, SSE disks, SSE groups, and partitions. It also describes the procedure to enable SSE MIB notifications for SSE disk errors, and maintenance tasks including disabling the SSE card or SSE disks, formatting the SSE card, and reloading the SSE card. See Alarms and Probable Causes for a description of alarm conditions and their probable causes in the SmartEdge chassis and in controller cards, carrier cards, line cards, and their ports.
5.1 SSE Alarms
Faults are reported separately for SSE cards, SSE disks, SSE groups, and partitions. See Table 3, Table 4, Table 5, and Table 6 for alarm descriptions.
You can configure the threshold for low-partition-space alarms using the following command in SSE partition configuration mode:
alarm low-partition-space raise-at raise_percentage clear-at clear_percentage
Verify card- and disk-level alarms using the show hardware slot detail command. Verify group- and partition-level alarms using the show sse group group_name detail command.
Description | Severity | Probable Cause | Service Affecting |
---|---|---|---|
ASE ASP 1 down | Critical | processorProblem | Yes |
ASE ASP 2 down | Critical | processorProblem | Yes |
NFS server service down | Major | operationFailure | Yes |
Disk type mismatch | Warning | replaceableUnitTypeMismatch | No |
CPU Crash | Critical | processorProblem | Yes |
DIMM revision mismatch | Critical | replaceableUnitProblem | |
Description | Severity | Probable Cause | Service Affecting |
---|---|---|---|
Hard disk health degraded | Minor | replaceableUnitProblem | No |
Hard disk failed | Major | diskFailure | Yes |
Hard disk missing | Major | replaceableUnitMissing | Yes |
Hard disk not supported | Major | replaceableUnitTypeMismatch | Yes |
Hard disk out of service | Minor | diskFailure | No |
Hard disk voltage failure | Major | diskFailure | Yes |
Hard disk overheating: extremely hot | Major | diskFailure | Yes |
Hard disk overheating: temperature hot | Minor | diskFailure | No |
Hard disk read failure | Major | diskFailure | Yes |
Hard disk power-on diagnostic failed | Major | diskFailure | |
Description | Severity | Probable Cause | Service Affecting |
---|---|---|---|
SSE group block device not connected | Minor | nooperationNotification(1) | No |
SSE group manual switch in progress | Major | operationNotification | Yes |
SSE group auto switch in progress | Major | operationNotification | Yes |
SSE group switch completed | Warning | operationNotification | No |
SSE group switch failed | Major | operationNotification | Yes |
SSE group auto switch waiting to restore | Minor | operationNotification | No |
SSE group not operational | Major | operationFailure | Yes |
SSE group block device failed | Major | operationFailure | Yes |
(1) Probable causes: In some cases, the block device gets into a state where it is not able to resolve conflicts between the primary card and secondary card. Because of this, the primary and secondary instances of the block device are disconnected and the data is not synchronized. Solution: a) Remove the standby SSE from the group by issuing the configure command, the card sse slot command, the no bind sse group command, and the commit command. b) With the standby SSE out of the group and before switch-over, format the disks on the card by issuing the format sse slot 1 and format sse slot 2 commands. c) Add the card back to the SSE group using the configure command, the card sse slot command, the bind sse group group_name secondary command, and the commit command.
Description | Severity | Probable Cause | Service Affecting |
---|---|---|---|
SSE group partition not operational(1) | Major | operationFailure | Yes |
SSE group partition sync in progress | Minor | operationNotification | No |
SSE group partition data sync failed | Major | operationFailure | Yes |
SSE group partition full | Major | operationNotification | Yes |
SSE group partition low space | Minor | operationNotification | No |
SSE group partition not operational at standby(2) | Major | operationFailure | Yes |
(1) Probable causes: a) The disk does not have enough space to create the partition; b) Another partition of the same name but with a different size already exists on the disk from a previous configuration. Solution: Use the delete partition command to free up disk space or remove the existing partition, or use the format sse command to remove all user-configured partitions on the disk. The format sse command can only be run on an SSE card that is not bound to any SSE group.
(2) Probable causes: a) The disk does not have enough space to create the partition; b) Another partition of the same name but with a different size already exists on the disk from a previous configuration. Solution: Use the delete partition command to free up disk space or remove the existing partition, or use the format sse command to remove all user-configured partitions on the disk. The format sse command can only be run on an SSE card that is not bound to any SSE group.
5.2 Enable SSE MIB Notifications for SSE Disk Errors
You must enable SSE Management Information Base (MIB) notifications for SSE disk errors to trigger a Simple Network Management Protocol (SNMP) trap when an operation error occurs on any of the hard disks on any of the SSE cards.
- Enter SNMP server configuration mode.
[local]Redback(config)#snmp server
- Enable SSE MIB notifications for SSE disk errors.
[local]Redback(config-snmp-server)#traps ssemib
5.3 Maintenance
Before performing maintenance operations, including swapping out an SSE disk in the SSE card, disable the SSE disk or SSE card.
5.3.1 Disable the SSE Card
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- Disable the SSE card.
[local]Redback(config-card)#shutdown
- Commit the transaction.
5.3.2 Disable an SSE Disk on the SSE Card
- Enter card configuration mode.
[local]Redback(config)#card sse slot
- Disable an SSE disk on the SSE card.
[local]Redback(config-card)#shutdown [disk disk_num]
- Commit the transaction.
5.3.3 Format the SSE Card
Before formatting the SSE card, remove the association between the SSE card and the SSE group. If you try to format the SSE disk while the SSE card is a member of an SSE group, you will receive an error and the command will not take effect.
- If the SSE card is assigned to an SSE group, remove the
association.
[local]Redback(config-card)#no bind sse group
- Format the SSE card.
[local]Redback#format sse slot disk_num
The following example removes the association between the SSE card in slot 2 and the SSE group to which it is bound, and then formats disk 1 on the SSE card in slot 2.
[local]Redback(config)#card sse 2
[local]Redback(config-card)#no bind sse group
[local]Redback(config-card)#exit
[local]Redback(config)#exit
[local]Redback#format sse 2 1
5.3.4 Reload
Use the reload card command to reload the SSE card. Use the reload disk command to reload an SSE disk on the SSE card.
Reloading an SSE disk gracefully shuts down the SSE disk before rereading the data on it, which avoids corrupting data. Reload an SSE disk in the SSE card if you encounter problems while mounting an SSE disk or with redundancy in Network RAID 1 mode on a given SSE disk. Reloading an SSE disk has lower impact than reloading the whole SSE card. The reload disk command is equivalent to removing and reinserting the SSE disk. If you issue this command on the active SSE card during data synchronization on any partition, the following warning message appears: "Executing the command during data synchronization on any of the partitions will cause data corruption." Use the reload disk command in exec mode.
[local]Redback#reload disk slot_num disk_num
5.4 Troubleshooting
This section includes troubleshooting instructions for problems you may encounter during SSE card configuration or operation.
5.4.1 Recover from Uncorrected File System Error
If a file system corruption occurs on an SSE disk and cannot be automatically repaired, an fsck error log is created and the show diag on-demand command reports an uncorrected file system error and the location of the fsck log, as shown in the following example:
[local]Redback#show diag on-demand card 11 disk 1 detail
Slot Number : 11
Disk Number : 1
Serial Number : G4xxxxxxxxxxAA
Detected Serial Number : G4xxxxxxxxxxAA
Controller Serial Number: D202G390840865
Test Level : 2
Loop Count : 1
Start Time : 09:34:08 06/30/2009 (UTC)
Completion Time : 10:07:01 06/30/2009 (UTC)
Test Summary : 1 Failure
Test Results
Loop 1:
HDD R/W Verify Test : Passed
HDD CLIE Verify Test : Passed
HDD FS Surface Check : Failed
Test Failure Details:
- HDD FS Surface Check, slot 11, component 1
DIAG_TEST_FAILURE
File system errors left uncorrected.. See fsck log : /p01/vx/odd/odd-slot10-fsck-sda4.error
View the fsck log for detailed error information; for example:
Ericsson# start sh
# cat /p01/vx/odd/odd-slot10-fsck-sda4.error
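When diagnostics output is captured to a file, the failed tests and the fsck log location can be pulled out with a short script. The following Python sketch is one way to do that. It assumes that each test result appears on its own line in the form "name : Passed" or "name : Failed" and that the log location follows the text "See fsck log :", as in the example above; the function name is hypothetical.

```python
import re


def summarize_diag(output):
    """Return (failed_test_names, fsck_log_path) from captured diag output.

    Assumes one "name : Passed|Failed" result per line (layout assumption).
    fsck_log_path is None when no log location is reported.
    """
    failed = []
    fsck_log = None
    for line in output.splitlines():
        result = re.match(r"\s*(.+?)\s*:\s*(Passed|Failed)\s*$", line)
        if result and result.group(2) == "Failed":
            failed.append(result.group(1))
        log = re.search(r"See fsck log\s*:\s*(\S+)", line)
        if log:
            fsck_log = log.group(1)
    return failed, fsck_log


SAMPLE = """\
HDD R/W Verify Test : Passed
HDD CLIE Verify Test : Passed
HDD FS Surface Check : Failed
File system errors left uncorrected.. See fsck log : /p01/vx/odd/odd-slot10-fsck-sda4.error
"""

if __name__ == "__main__":
    print(summarize_diag(SAMPLE))
```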
To manually recover from the file system error, perform the following steps:
- If the SSE card is assigned to an SSE group, remove the
association.
[local]Redback(config-card)#no bind sse group
- Format the SSE card.
[local]Redback#format sse slot disk_num
- Reassign the SSE card to the SSE group:
[local]Redback(config-card)#bind sse group group_name
5.4.2 Recover from the Disconnected Group Block Device Error
In some cases, the block device gets into a state where it is not able to resolve conflicts between the primary card and secondary card. Because of this, the primary and secondary instances of the block device are disconnected and the data is not synchronized. In other words, CDR resilience may be lost if an SSE card switch-over from active to standby is taking place when the cards are not in sync.
When this problem occurs, the show sse group detail command indicates that the block device is not connected, and the following symptoms are observed:
- The standby SSE card (9) fails to become active after the reload card 6 command is issued (6 indicates the active card).
- The node reboots, and no traffic is forwarded during the switch-over.
The following example displays the status of the active and standby cards before triggering the SSE card switch-over:
[local]Redback# show sse group detail
Name : SSE_group1
ID : 1
Description :
------------------------------------------------------------------------
State : Up
Redundancy : network-redundant
Disk Mode : Independent
Revert : no revert
Switch Reason : No Reason
Switch Failed Reason: No Reason
Alarms : NONE
Partition(s) :
--------------------------------------------
Name : cdr
ID : 1
Group Name : SSE_group1
Group ID : 1
State : Up
Size (GB) : 120
Percent Used : 1
Disk : 1
Mirrored : Enabled
Alarm Low Space : Enabled
Trigger Percentage : 80 (clear 70)
Alarms : NONE
Primary Slot : 6
--------------------------------------------
Redundancy State : Active
Slot State : Up
Disk ID(s) Ready : 1
Total Size (GB) : 276
Data Status : Up-To-Date
Active Alarms : NONE
Secondary Slot : 9
--------------------------------------------
Redundancy State : Standby
Slot State : Up
Disk ID(s) Ready : 1
Total Size (GB) : 276
Data Status : Up-To-Date
Active Alarms : NONE
The following example displays the contents of /proc/drbd for slot 6 and 9. Unknown indicates that the cards are not in sync:
root@sse-slot06:/root> cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by eabinfte@eselnlx1063, 2011-01-31 12:35:32
0: cs:StandAlone st:Primary/DUnknown ds:UpToDate/DUnknown r---
ns:0 nr:0 dw:1172 dr:5162 al:22 bm:4 lo:0 pe:0 ua:0 ap:0 oos:29136
root@sse-slot09:/root> cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by eabinfte@eselnlx1063, 2011-01-31 12:35:32
0: cs:WFConnection st:Secondary/DUnknown ds:UpToDate/DUnknown C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:69 lo:0 pe:0 ua:0 ap:0 oos:1052672
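The cs, st, and ds fields on the device line of /proc/drbd carry the connection state, the local/peer roles, and the local/peer disk states. As a minimal sketch, the following Python function extracts those fields from one device line and flags a peer whose disk state is reported as unknown, which this section treats as a sign that the cards are not in sync. The function name and the returned keys are hypothetical, and the field layout is assumed to match the DRBD 8.2.6 output shown in this section.

```python
import re


def parse_drbd_status(line):
    """Extract connection state, roles, and disk states from one device line.

    Expects fields such as "cs:StandAlone st:Primary/DUnknown ds:UpToDate/DUnknown".
    """
    fields = dict(re.findall(r"\b(cs|st|ds):(\S+)", line))
    local_role, _, peer_role = fields.get("st", "").partition("/")
    local_disk, _, peer_disk = fields.get("ds", "").partition("/")
    return {
        "connection": fields.get("cs"),
        "local_role": local_role,
        "peer_role": peer_role,
        "local_disk": local_disk,
        "peer_disk": peer_disk,
        # An unknown peer disk state is taken here to mean "not in sync".
        "peer_known": bool(peer_disk) and "Unknown" not in peer_disk,
    }


if __name__ == "__main__":
    status = parse_drbd_status(
        "0: cs:StandAlone st:Primary/DUnknown ds:UpToDate/DUnknown r---"
    )
    print(status["connection"], status["peer_known"])
```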
The following example displays the status of the card after the failed switch-over:
[local]Redback# show sse group detail
Name : SSE_group1
ID : 1
Description :
------------------------------------------------------------------------
State : Down
Redundancy : network-redundant
Disk Mode : Independent
Revert : no revert
Switch Reason : Card Not Ready
Switch Failed Reason: Active Error
Alarms : SSE group switch failed
Partition(s) :
--------------------------------------------
Name : cdr
ID : 1
Group Name : SSE_group1
Group ID : 1
State : Down
Size (GB) : 120
Percent Used : 0
Disk : 1
Mirrored : Enabled
Alarm Low Space : Enabled
Trigger Percentage : 80 (clear 70)
Alarms : SSE group partition not operational
SSE group partition not operational at standby
Primary Slot : 6
--------------------------------------------
Redundancy State : Standby
Slot State : Down
Disk ID(s) Ready : None
Total Size (GB) : 0
Data Status : Not Connected
Active Alarms : NONE
Secondary Slot : 9
--------------------------------------------
Redundancy State : Active
Slot State : Down
Disk ID(s) Ready : None
Total Size (GB) : 0
Data Status : Not Connected
Active Alarms : NONE
The following example displays the contents of /proc/drbd for slots 6 and 9.
- Note:
- Not Connected is no longer present. Instead, the status is now listed as UpToDate.
root@sse-slot06:/root> cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by eabinfte@eselnlx1063, 2011-01-31 12:35:32
 0: cs:WFBitMapT st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:21 lo:0 pe:0 ua:0 ap:0 oos:125829056

root@sse-slot09:/root> cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by eabinfte@eselnlx1063, 2011-01-31 12:35:32
 0: cs:WFBitMapT st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:69 lo:0 pe:0 ua:0 ap:0 oos:1052672
The show sse group detail command causes the CLI to hang:
[local]Redback# show sse group detail (no prompt)
The show system alarm all command does not display any SSE-related messages:
Timestamp Type Source Severity Description
---------------------------------------------------------------------------
Feb 1 09:20:12 chassis Minor Chassis power failure - side A2
Feb 1 09:20:12 chassis Minor Chassis power failure - side B1
The CDR directory is not available:
[local]Redback# start shell
# cd /sse/SSE_group1/
# ls -l
#
Because the two instances of the block device are disconnected, data is not synchronized between the primary and secondary cards. To recover from this failure, take the standby SSE out of the group, format the disks on the card, and then add the card back to the group:
- Remove the standby SSE from the group:
# configure
# card sse slot
# no bind sse group
# commit
- Format the disks on the standby card:
# format sse slot 1
# format sse slot 2
- Add the card back to the SSE group:
# configure
# card sse slot
# bind sse group group name secondary
# commit
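The recovery steps above can be assembled into a single ordered command list, for example to paste into a change record before running them by hand. The following Python sketch only builds text; it does not connect to the router. The function name recovery_commands is hypothetical, and the sketch assumes that slot and group name in the commands above are placeholders for the standby card's slot number and the SSE group name, and that the trailing 1 and 2 in the format commands are disk numbers.

```python
def recovery_commands(slot, group_name, disks=(1, 2)):
    """Build the command sequence for the recovery steps, in order.

    Assumption: 'slot' and 'group name' in the documented commands are
    placeholders, and the trailing number in 'format sse' is the disk number.
    """
    # Step 1: remove the standby SSE card from the group.
    remove = ["configure", f"card sse {slot}", "no bind sse group", "commit"]
    # Step 2: format each disk on the standby card.
    fmt = [f"format sse {slot} {disk}" for disk in disks]
    # Step 3: add the card back to the group as the secondary.
    rejoin = [
        "configure",
        f"card sse {slot}",
        f"bind sse group {group_name} secondary",
        "commit",
    ]
    return remove + fmt + rejoin


# Example values taken from the output earlier in this section:
# slot 9 is the secondary slot of SSE_group1.
for command in recovery_commands(9, "SSE_group1"):
    print(command)
```

Keeping the three steps as separate lists makes the required order explicit: the card leaves the group before its disks are formatted, and rejoins only afterward.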
6 Diagnostics and Monitoring
You can determine the hardware status of the SSE card by using LEDs and the results of the Power-on Diagnostics (POD); see Reference [4] or Reference [5] for details. Run On-Demand Diagnostics (ODD) for detailed diagnostics during operation.
6.1 On-Demand Diagnostics
For general information about ODD, see Reference [4] or Reference [5].
ODD performs detailed diagnostics on the SSE card or an SSE disk on the SSE card.
6.1.1 Card-Level ODD Testing
To run ODD on the SSE card:
- Put the SSE card in maintenance mode using the shutdown command in card configuration mode.
- Run ODD.
[local]Redback# diag on-demand card slot [level level_count] [loop loop_count]
The SSE card automatically reboots after ODD completes. The completion of the ODD is logged, including any failures encountered.
6.1.2 Disk-Level ODD Testing
You can run ODD on an SSE disk on the SSE card if the SSE card is configured in Disk RAID 1 redundancy or nonredundant mode.
To run ODD on a disk on the SSE card:
- Put the SSE disk in maintenance mode using the shutdown [disk disk_num] command in card configuration mode.
- Put the SSE disk in ODD mode.
[local]Redback# on-demand-diagnostics
- Run ODD on the appropriate SSE disk.
[local]Redback# diag on-demand card slot [disk disk_num] [level level_count] [loop loop_count]
The SSE card does not automatically reboot after ODD completes. The completion of the ODD is logged, including any failures encountered.
6.1.3 Disk-Level ODD Repair
You can run ODD repair to fix file system errors on an SSE disk. To run ODD repair:
- Put the SSE disk in maintenance mode using the shutdown [disk disk_num] command in card configuration mode.
- Put the SSE disk in ODD mode.
[local]Redback# on-demand-diagnostics
- Run ODD on the appropriate SSE disk with the repair option.
[local]Redback# diag on-demand card slot [disk disk_num] repair
6.2 Statistics
Retrieve SSE disk I/O statistics using the show disk sse counters slot [disk_num] command in any mode. Retrieve Network RAID 1 counters using the show sse {group | partition} counters [group_name [partition_name]] command in any mode.
6.3 Logging
Log messages are recorded for SSE card and SSE disk insertion, removal, and failures. The software running on the SSE card also logs events; software log messages with a severity level of notification or higher are displayed in the output of the show log command, prefixed by sse.
Completion of on-demand diagnostics for the SSE card and SSE disks is logged in the same way as for line cards.
Failure logs are displayed when you run the show diag on-demand card slot [disk disk_num] [level level_count] [loop loop_count] detail command. Failure reporting on ODD for third-party components is PASS/FAIL with a brief description based on application return codes. The location of relevant logs on the compact flash disks of the XCRP card is provided for more details.
See Logging for general information on logging, and logging configuration and operations.
7 NetOp EMS Support for SSE Card
You can use the NetOp Element Management System (EMS) to provision an SSE card and to view SSE information, including status, statistics, traps, and alarms. See Reference [7].
8 Command Hierarchy
config
  sse group
    description
    revert
    partition
      alarm low-partition-space
  card sse
    bind sse group
    shutdown disk
  snmp server
    traps ssemib
exec
  clear sse group counters
  clear diag on-demand card
  clear disk sse counters
  delete partition
  diag on-demand card
  format sse
  reload card
  reload disk
  sse group switch-over
all modes
  show administrators
  show chassis
  show chassis power
  show configuration
  show configuration sse
  show diag on-demand
  show disk sse
  show disk sse counters
  show hardware
  show sse {group | partition}
  show sse {group | partition} counters
  show system alarm
  show version
Glossary
ASE | Advanced Services Engine
CDRs | Call Data Records
EMS | Element Management System
HDDs | Hard Disk Drives
MIB | Management Information Base
NFS | Network File System
ODD | On-Demand Diagnostics
POD | Power-on Diagnostics
RAID | Redundant Array of Independent Disks
SNMP | Simple Network Management Protocol
SSE | SmartEdge® Storage Engine