Tech Note 001
Title: Troubleshooting guidelines (1) - General
Updated: April 2005
 
From time to time you may experience issues with backups, archives or restores. This usually results in a failed operation. Always make sure that you check the status and logs of completed processes. If an operation has failed, you should attempt to ascertain the cause of the failure and re-run the operation until it completes. In the case of backups and archives, your data may not be safe until the operation has successfully completed.
If, after following the guidelines in this document, you suspect that the issue is hardware-related, please refer to Tech Note 002: Troubleshooting guidelines (2) - Diagnosing hardware faults.
It is important that you re-try the job before contacting support staff. To diagnose an issue correctly, you must be able to reproduce the problem, and support staff will usually ask you to re-run the process with the debug option enabled (see Tech Note 003: Setting debug, for further information). If the process you are running uses a large data set, re-running it may not be feasible due to time constraints. In this case, re-try the job using a smaller data set, but make sure that everything other than the data set is exactly the same, including the volumes and drives used by the original process.
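The rule that a re-try must differ from the original job only in its data set can be checked mechanically if you write down both jobs' settings. The sketch below is purely illustrative: FlashNet does not expose jobs as Python dictionaries, and the keys (`operation`, `group`, `volumes`, `drives`, `data_set`) and their values are hypothetical.

```python
from typing import Dict, List

# Hypothetical job settings written down by hand; these keys are
# illustrative only and are not FlashNet configuration names.
Job = Dict[str, object]


def differences_besides_data_set(original: Job, retry: Job) -> List[str]:
    """List every setting, other than the data set, that differs between jobs."""
    keys = set(original) | set(retry)
    return sorted(
        key for key in keys
        if key != "data_set" and original.get(key) != retry.get(key)
    )


original_job: Job = {
    "operation": "backup",
    "group": "Group1",
    "volumes": ["VOL001", "VOL002"],
    "drives": ["drive0"],
    "data_set": "/projects/full",
}

# The re-try uses a smaller data set but, by mistake, a different drive.
retry_job: Job = dict(original_job, data_set="/projects/sample", drives=["drive1"])

print(differences_besides_data_set(original_job, retry_job))  # ['drives']
```

An empty list means that the re-try differs from the original job only in its data set.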
 
Finding the problem
When a process fails, the FlashNet process log usually gives a good indication of the reason. Every log contains a list of messages that outline the progress of the operation. Each message in a FlashNet log begins with a four-character code that denotes the type of message: a letter followed by three digits, such as I031 or F082. There are three basic types of message: I(nformation), W(arning) and F(atal). Information messages provide general information about the progress of the job. Warning messages highlight items that you should examine but that are not serious enough to cause a fatal error; for example, files that could not be backed up are highlighted with warning messages. Fatal messages indicate a catastrophic failure of the process, which is halted at the point of the failure.
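If you want a quick overview of a long process log, you can count its messages by type. This is a minimal sketch, assuming that each message line starts with its code (a type letter followed by three digits); the exact FlashNet log layout may differ, and the sample lines are invented except for the F082 text quoted in this note.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumption: a message line starts with its type letter (I, W or F)
# followed by three digits, e.g. "I031 ..." or "F082: ...".
CODE_RE = re.compile(r"^\s*([IWF])(\d{3})\b")
TYPES = {"I": "Information", "W": "Warning", "F": "Fatal"}


def message_type(line: str) -> Optional[str]:
    """Return 'Information', 'Warning' or 'Fatal', or None if the line has no code."""
    match = CODE_RE.match(line)
    return TYPES[match.group(1)] if match else None


def summarize(lines: Iterable[str]) -> Counter:
    """Count the messages of each type in a process log."""
    return Counter(t for t in map(message_type, lines) if t is not None)


# Invented sample lines; only the F082 text is quoted from this note.
sample = [
    "I031 drive status report",
    "W105 file could not be backed up",
    "F082: 10 Spanning failed because no media are available in group 'Group1'",
    "continuation text without a code",
]
print(summarize(sample))
```

Lines without a recognizable code are ignored rather than guessed at.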
In the case of a failed operation, the log will contain at least one Fatal error. The actual message will vary depending on the cause of the failure, but the first thing to look for in a failed FlashNet process log is the 'Fxxx' message. If the fatal message is caused by a failure of the drive or medium (this may be indicated by an I031 message preceding the failure, though not all I031 messages report errors), please refer to Tech Note 002: Troubleshooting guidelines (2) - Diagnosing hardware faults.
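The two checks described above (find the first Fxxx message, then look for I031 messages before it) can be sketched as follows. This assumes that each message line begins with its code, such as I031 or F082; the function names and the sample I031 text are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumption: message lines start with their code, e.g. "I031 ..." or "F082: ...".
FATAL_RE = re.compile(r"^\s*F\d{3}\b")
I031_RE = re.compile(r"^\s*I031\b")


def first_fatal(lines: List[str]) -> Optional[Tuple[int, str]]:
    """Return (index, text) of the first Fxxx message, or None if there is none."""
    for index, line in enumerate(lines):
        if FATAL_RE.match(line):
            return index, line
    return None


def i031_before(lines: List[str], index: int) -> List[str]:
    """Return the I031 messages that precede the line at `index`."""
    return [line for line in lines[:index] if I031_RE.match(line)]


log = [
    "I031 drive status report",  # invented text
    "F082: 10 Spanning failed because no media are available in group 'Group1'",
]
found = first_fatal(log)
if found:
    index, text = found
    print("First fatal message:", text)
    print("Preceding I031 messages:", i031_before(log, index))
```

An I031 line is only a hint: not all I031 messages report errors, so read its text before concluding that the fault lies in the hardware.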
First steps
In many cases the Fatal message indicates the exact problem. For example, if the current volume becomes full and the group contains no more volumes, FlashNet displays the message: F082: 10 Spanning failed because no media are available in group '%s' (where %s is the group name). You can then rectify the issue (in this case, by making sure that enough media are present in the destination group or the excess group) and re-try the job. If the job now completes, the issue is fixed.
If the problem persists…
If the message doesn't give a clear indication of the problem, or if the re-tried job fails after remedial measures, make sure that the installation is in a 'good' state. If a process is killed outside of FlashNet, or if FlashNet is unable to complete a process or movement within the library, the files that FlashNet uses to record the state of the hardware devices may not be updated correctly, so the FlashNet UI no longer has an accurate picture of the hardware. For example, if a failure occurs after an autochanger drive has unloaded a volume but before the robot has returned the volume to its slot, the drive reports as empty: it has indeed unloaded the volume, but the tape is still sitting in the drive. In this case subsequent operations will fail.
It's important, therefore, to ensure that all hardware is in a good state, and that the FlashNet UI is updated correctly to reflect this. Make sure that all drives are unloaded (except those in use by other operations), and that all volumes have been returned to their original slots in the autochanger. The FlashNet Autochanger Setup window must be an accurate reflection of the contents of the autochanger.
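Checking that the Autochanger Setup window matches the library amounts to comparing two inventories location by location. The sketch below is hypothetical: FlashNet does not provide these dictionaries, so you would fill them in by hand from the Autochanger Setup window and from a physical check of the library. The example reproduces the stuck-tape case described above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical snapshots: location (slot or drive) -> volume label, None if empty.
# These are filled in by hand; they are not read from FlashNet.
Inventory = Dict[str, Optional[str]]


def mismatches(ui_view: Inventory,
               physical: Inventory) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {location: (what the UI shows, what is really there)} where they differ."""
    locations = set(ui_view) | set(physical)
    return {
        loc: (ui_view.get(loc), physical.get(loc))
        for loc in sorted(locations)
        if ui_view.get(loc) != physical.get(loc)
    }


# The UI believes VOL003 is back in slot 3 and drive 0 is empty,
# but the tape is actually still sitting in the drive.
ui_view: Inventory = {"slot3": "VOL003", "drive0": None}
physical: Inventory = {"slot3": None, "drive0": "VOL003"}

for loc, (shown, actual) in mismatches(ui_view, physical).items():
    print(f"{loc}: UI shows {shown}, library holds {actual}")
```

An empty result means the two views agree and the job can be re-tried.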
Once you are sure that the installation is 'good', re-try the job.
Further assistance
If all else fails, you will need to escalate the problem to your Authorized Xinet Integrator (AXI). AXIs are trained by SGL and Xinet personnel and can provide assistance on most FlashNet and FlashWeb issues. If necessary, they will escalate the issue to Xinet and SGL.
Before passing the issue to your AXI, make sure that the problem is reproducible: your AXI will also ask for a debug log, which means re-running the job with debug enabled. You'll need to provide the original process log (/<FlashNet_home_directory>/database/process_logs) and the debug log (/<FlashNet_home_directory>/error_logs).
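If you collect these logs often, a small script can pack them into a single archive to send to your AXI. This is a sketch that assumes only the two locations given above; the function name, the process log file name and the output path are hypothetical, and it archives the whole error_logs directory because this note does not name the individual debug log file.

```python
import tarfile
from pathlib import Path
from typing import List


def bundle_logs(flashnet_home: str, process_log: str, out_file: str) -> List[str]:
    """Pack one process log and the error_logs directory into a .tar.gz file.

    Hypothetical helper: you supply the FlashNet home directory, the name of
    the failed job's process log and the output path. Returns the archived names.
    """
    home = Path(flashnet_home)
    log_path = home / "database" / "process_logs" / process_log
    error_dir = home / "error_logs"

    with tarfile.open(out_file, "w:gz") as tar:
        # Store the process log under a short, predictable name.
        tar.add(str(log_path), arcname=f"process_logs/{process_log}")
        # Add the whole error_logs directory (recursively) for the debug log.
        tar.add(str(error_dir), arcname="error_logs")

    # Re-open the archive to report what was written.
    with tarfile.open(out_file, "r:gz") as tar:
        return sorted(tar.getnames())


# Example call (all three arguments are placeholders for your own values):
# bundle_logs("/path/to/FlashNet_home", "failed_job.log", "/tmp/flashnet_logs.tar.gz")
```

Write the archive outside the error_logs directory so that it does not end up inside itself.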
 
