5.3 Interpreting Log File Messages
5.3.2 The /usr/adm/dpujobmgr.log File
This file logs all dpumanager daemon activity. It reports background
diagnostic errors, the register status at the time an error is reported, and
ACU kernel information. The log is especially helpful when you are trying to
determine what caused a program to abort.
The following example shows a control store parity error (FLTCOD=0x2). Control
store parity errors can occur on a PE array PCB, the ACU PCB, or the backplane.
(dpu0) Tue Apr 30 07:04:11 1991 (21) DPU fault: SWOPT=5; HWOPT=5;
FLTCOD=0x2; PMSTAT=0x0; PMEMECC=0
For this type of error, check the parity LED on each PE PCB for a clue to which
PCB is generating the error.
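When many fault entries have to be reviewed, it can help to pull the register fields out of each DPU fault record programmatically. The following Python sketch is not part of the DECmpp software; it assumes the record layout seen in the example above (KEY=value pairs separated by semicolons, wrapped over two lines), and the name `parse_dpu_fault` is hypothetical. The only fault code it labels is the one this section documents.

```python
import re

# Sample copied from the example above; the record wraps onto a second line.
SAMPLE = (
    "(dpu0) Tue Apr 30 07:04:11 1991 (21) DPU fault: SWOPT=5; HWOPT=5;\n"
    "FLTCOD=0x2; PMSTAT=0x0; PMEMECC=0\n"
)

# Field names are taken from the sample record; others may exist.
FIELD = re.compile(r"(SWOPT|HWOPT|FLTCOD|PMSTAT|PMEMECC)=(0x[0-9a-fA-F]+|\d+)")

# Only the code described in this section is listed here.
KNOWN_FLTCOD = {0x2: "control store parity error"}


def parse_dpu_fault(text):
    """Return the fields of a DPU fault record as integers, or None."""
    if "DPU fault:" not in text:
        return None
    # int(value, 0) accepts both decimal ("5") and hexadecimal ("0x2") forms.
    return {key: int(value, 0) for key, value in FIELD.findall(text)}


fields = parse_dpu_fault(SAMPLE)
label = KNOWN_FLTCOD.get(fields["FLTCOD"], "not described in this section")
print(fields)
print("FLTCOD meaning:", label)
```

Any FLTCOD value other than 0x2 is reported as not described here rather than guessed at.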
This example shows the execution of an mpshutdown command and the execution of
a dpumanager command.
(dpu0) Mon May 13 10:39:44 1991 Termination signal received; shutting down
(dpu0) Tue Jul 21 15:40:10 1992 Starting up; Version 2.2.0
(dpu0) Tue Jul 21 15:40:11 1992 loading microcode file: "/usr/mpp/etc/mp12ucode.wo"
(dpu0) Tue Jul 21 15:40:18 1992 loading ACU kernel file: "/usr/mpp/etc/acuk"
(dpu0) Wed Jul 22 12:10:55 1992 ACU kernel timeout (command 6)
ECSR=0x4002, QCSR=0x0, PTACCESS=0, CPC=0xffff00a8
(dpu0) Wed Jul 22 12:10:55 1992 Save of context failed, killing job; pid = 10333
(dpu0) Wed Jul 22 12:10:55 1992 ACU kernel command error (!ECSR<Run>)
(dpu0) Wed Jul 22 12:10:55 1992 unable to abort user -- reloading ACU kernel
(dpu0) Wed Jul 22 12:10:55 1992 Job context lost in system reset; pid = 10333
(dpu0) Wed Jul 22 12:10:55 1992 Job context lost in system reset; pid = 10355
(dpu0) Wed Jul 22 12:10:55 1992 loading ACU kernel file: "/usr/mpp/etc/acuk"
(dpu0) Wed Jul 22 12:10:56 1992 (@1) DPU fault: SWOPT=4; HWOPT=4;
FLTCOD=0x200; PMSTAT=0x0; PMEMECC=0
(dpu0) Wed Jul 22 12:10:56 1992 (@2) 15 PEs had errors
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x678 (board
5, cluster 3,6, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x674 (board
5, cluster 3,5, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x64c (board
4, cluster 3,3, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x638 (board
1, cluster 3,6, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x634 (board
1, cluster 3,5, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x60c (board
0, cluster 3,3, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x27c (board
5, cluster 1,7, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x278 (board
5, cluster 1,6, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x260 (board
5, cluster 1,0, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x25c (board
4, cluster 1,7, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x23c (board
1, cluster 1,7, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x238 (board
1, cluster 1,6, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x21c (board
0, cluster 1,7, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x44 (board 4,
cluster 0,1, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x4 (board 0,
cluster 0,1, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 18:24:32 1992 (@1) DPU fault: SWOPT=3; HWOPT=4;
FLTCOD=0x400000; PMSTAT=0x0; PMEMECC=0
(dpu0) Wed Jul 22 18:24:32 1992 (@2) 1822 PEs had errors (only 21 reported)
(dpu0) Wed Jul 22 18:24:32 1992 (@2) PE fault: PE number=0x1c (board 0,
cluster 0,7, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 18:24:32 1992 (@2) PE fault: PE number=0x1b (board 0,
cluster 0,6, PE-in-cluster 0,3); error bits=0x2 ( ROUTER )
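A long run of PE fault entries like the ones above is easier to read when summarized. The following Python sketch tallies PE fault entries by board number. It assumes the entry layout shown in this example, including entries that wrap onto a second line; the function name `pe_faults` is hypothetical, and the script is an illustration, not a tool supplied with the system.

```python
import re
from collections import Counter

# Three entries copied from the example above, including the line wraps.
SAMPLE = """\
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x678 (board
5, cluster 3,6, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x64c (board
4, cluster 3,3, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
(dpu0) Wed Jul 22 12:10:56 1992 (@2) PE fault: PE number=0x44 (board 4,
cluster 0,1, PE-in-cluster 0,0); error bits=0x2 ( ROUTER )
"""

PE_FAULT = re.compile(
    r"PE fault: PE number=(0x[0-9a-fA-F]+) \(board (\d+), "
    r"cluster (\d+),(\d+), PE-in-cluster (\d+),(\d+)\); "
    r"error bits=(0x[0-9a-fA-F]+) \(\s*([^)]*?)\s*\)"
)


def pe_faults(text):
    """Return one dict per PE fault entry found in the log text."""
    # Collapse line wraps so that each entry can be matched as one string.
    flat = " ".join(text.split())
    return [
        {
            "pe": int(m[1], 16),
            "board": int(m[2]),
            "cluster": (int(m[3]), int(m[4])),
            "error_bits": int(m[7], 16),
            "error_name": m[8],
        }
        for m in PE_FAULT.finditer(flat)
    ]


faults = pe_faults(SAMPLE)
by_board = Counter(f["board"] for f in faults)
for board, count in sorted(by_board.items()):
    print(f"board {board}: {count} PE fault(s)")
```

Note that the log itself may list only some of the failing PEs (for example, "1822 PEs had errors (only 21 reported)"), so a tally built from the entries can undercount.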
5–10 Using Diagnostic Software